The 8008, 8080, 8086, i386, and amd64 instruction sets are, unfortunately, usually given in hexadecimal; but they are dramatically more readable in octal. The 8080 opcode map in particular can be drawn rather neatly using octal.
Consider this segment of amd64 machine code:
400575: ba 02 00 00 00 mov $0x2,%edx
40057a: be 64 06 40 00 mov $0x400664,%esi
40057f: bf 01 00 00 00 mov $0x1,%edi
400584: e8 a7 fe ff ff callq 400430 <write@plt>
400589: ba 01 00 00 00 mov $0x1,%edx
40058e: be 41 10 60 00 mov $0x601041,%esi
400593: bf 00 00 00 00 mov $0x0,%edi
400598: e8 a3 fe ff ff callq 400440 <read@plt>
Here it is in octal:
400575: 272 002 000 000 000 mov $0x2,%edx
40057a: 276 144 006 100 000 mov $0x400664,%esi
40057f: 277 001 000 000 000 mov $0x1,%edi
400584: 350 247 376 377 377 callq 400430 <write@plt>
400589: 272 001 000 000 000 mov $0x1,%edx
40058e: 276 101 020 140 000 mov $0x601041,%esi
400593: 277 000 000 000 000 mov $0x0,%edi
400598: 350 243 376 377 377 callq 400440 <read@plt>
Here you can see that, for example, all the “load immediate” instructions are “27x”, with “x” representing the register: 2 for %edx, 6 for %esi, 7 for %edi, and so on. As it turns out, there are precisely 8 registers that can be addressed in this way, corresponding to the 8 octal digits. And these register numbers are consistent across instructions; in this i386 code from httpdito, we can see 0 representing %eax, 1 representing %ecx, 2 representing %edx again, and 3 representing %ebx (yes, the numbers are not in the same order as the letters):
804811c: 120 push %eax
804811d: 122 push %edx
804811e: 350 354 377 377 377 call 0x804810f
8048123: 132 pop %edx
8048124: 130 pop %eax
...
8048130: 102 inc %edx
8048131: 271 353 226 004 010 mov $0x80496eb,%ecx
8048136: 061 333 xor %ebx,%ebx
8048138: 103 inc %ebx
8048139: 103 inc %ebx
By contrast, in hexadecimal, the immediate-load instruction ba 02 00 00 00 and the “pop %edx” instruction 5a represent %edx as “a”, while “push %edx” and “inc %edx” are 52 and 42 respectively, representing %edx as “2”. Moreover, note that in hexadecimal, both “push” and “pop” of registers are “5x”, while in octal they are “12x” and “13x” respectively.
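To make the comparison concrete, here is a minimal Python sketch that renders hex bytes in octal and reads the register out of the last octal digit. The names `to_octal`, `describe`, `REGS`, and `FAMILIES` are made up for this illustration, and the table covers only the one-byte opcode families mentioned above, not the whole instruction set.

```python
# i386 register numbers, as encoded in the last octal digit of these opcodes.
REGS = ["eax", "ecx", "edx", "ebx", "esp", "ebp", "esi", "edi"]

# Opcode families discussed above, keyed by their first two octal digits.
# The "inc" entry applies to i386 only: in 64-bit mode the 10x bytes are
# REX prefixes rather than instructions.
FAMILIES = {0o10: "inc", 0o12: "push", 0o13: "pop", 0o27: "mov $imm32,"}


def to_octal(hex_bytes):
    """Render a string of hex bytes such as 'ba 02' as 3-digit octal."""
    return " ".join(f"{int(h, 16):03o}" for h in hex_bytes.split())


def describe(opcode):
    """Name the family and register for an opcode byte, or return None."""
    family = FAMILIES.get(opcode >> 3)
    if family is None:
        return None
    return f"{family} %{REGS[opcode & 7]}"


if __name__ == "__main__":
    print(to_octal("ba 02 00 00 00"))
    for byte in (0x52, 0x5A, 0x42, 0xBA):
        print(f"{byte:02x} = {byte:03o}: {describe(byte)}")
```

In hexadecimal the register field straddles the two digits, which is why %edx shows up as either “2” or “a”; shifting by three bits instead of four is all it takes to line the digits up with the fields.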
So this is the sense in which I say 8086, i386, and amd64 machine code are dramatically more readable in octal. The octal digits correspond neatly to the bitfields in the instruction encoding, in most cases. But even the 8086 opcode map is rather large.
By contrast, every 8080 opcode is a single byte, though some are followed by one or two bytes of immediate data, so a full opcode table is only 256 cells. It, too, is more comprehensible in octal than in hexadecimal: the leading octal digit organizes the instruction set into four 64-opcode “pages”. In some cases, though, an inconveniently located two-bit field identifies one of the 8080’s 16-bit register pairs rather than a single 8-bit register. A simple permutation of the rows and columns ameliorates this.
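As a rough sketch of what that permutation does (the names `ORDER`, `PAIRS`, `split`, and `pair_of` are invented here, not taken from any 8080 tool): the register pair lives in the upper two bits of a three-bit field, so listing the even digits before the odd ones puts the four pairs side by side.

```python
# Row and column order: even digits first, then odd digits.
ORDER = [0, 2, 4, 6, 1, 3, 5, 7]
PAIRS = ["BC", "DE", "HL", "SP"]


def split(opcode):
    """Split an 8080 opcode byte into its (page, x, y) octal digits."""
    return opcode >> 6, (opcode >> 3) & 7, opcode & 7


def pair_of(digit):
    """Register pair selected by the upper two bits of a 3-bit field."""
    return PAIRS[digit >> 1]


if __name__ == "__main__":
    print(split(0o166))                       # page 1, x = 6, y = 6
    print([pair_of(x) for x in ORDER])        # pairs run BC DE HL SP twice
    print([x & 1 for x in ORDER])             # low bit: four 0s, then four 1s
```

With the digits in this order, the leftover low bit of the field splits the table cleanly into a left half and a right half, each of which can hold one register-pair instruction.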
(Beware! None of this has been tested, and it would be surprising if I had found all the errors in it.)
The 0xy page is largely register-pair operations, occupying three or four columns, with three single-register operations, occupying eight columns:
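| 0xy | x = 0 | 2 | 4 | 6 | 1 | 3 | 5 | 7 |
|---|---|---|---|---|---|---|---|---|
| register | B | D | H | M | C | E | L | A |
| register pair | BC | DE | HL | SP | BC | DE | HL | SP |
| y = 0 | NOP | - | - | - | - | - | - | - |
| 2 | STAX | STAX | SHLD | STA | LDAX | LDAX | LHLD | LDA |
| 4 (INR = increment register) | INR | INR | INR | INR | INR | INR | INR | INR |
| 6 (MVI = mov immediate) | MVI | MVI | MVI | MVI | MVI | MVI | MVI | MVI |
| 1 (DAD = double add) | LXI | LXI | LXI | LXI | DAD | DAD | DAD | DAD |
| 3 | INX | INX | INX | INX | DCX | DCX | DCX | DCX |
| 5 | DCR | DCR | DCR | DCR | DCR | DCR | DCR | DCR |
| 7 | RLC | RAL | DAA | STC | RRC | RAR | CMA | CMC |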
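The layout of this page can be cross-checked with a small decoder. This is only a sketch with made-up names (`decode_page0` and its lookup tables); it uses the standard 8080 assembler mnemonics, in which register pairs are written B, D, H, and SP, and it leaves out the immediate data and addresses that follow some of these opcodes.

```python
REGS = "BCDEHLMA"               # single registers, indexed by x
PAIRS = ["B", "D", "H", "SP"]   # assembler names for BC, DE, HL, SP

# y = 2: loads and stores, indexed by x.
MEMORY_OPS = ["STAX B", "LDAX B", "STAX D", "LDAX D",
              "SHLD", "LHLD", "STA", "LDA"]
# y = 7: rotates and flag operations, indexed by x.
ACCUMULATOR_OPS = ["RLC", "RRC", "RAL", "RAR", "DAA", "CMA", "STC", "CMC"]
SINGLE = {4: "INR", 5: "DCR", 6: "MVI"}
DOUBLE = {1: ("LXI", "DAD"), 3: ("INX", "DCX")}


def decode_page0(opcode):
    """Return the mnemonic for an opcode on the 0xy page, or None."""
    if opcode >> 6 != 0:
        raise ValueError("not on the 0xy page")
    x, y = (opcode >> 3) & 7, opcode & 7
    if y == 0:
        return "NOP" if x == 0 else None    # 010 through 070 are unassigned
    if y == 2:
        return MEMORY_OPS[x]
    if y == 7:
        return ACCUMULATOR_OPS[x]
    if y in SINGLE:
        return f"{SINGLE[y]} {REGS[x]}"
    # y is 1 or 3: the low bit of x picks the operation,
    # and the upper two bits pick the register pair.
    return f"{DOUBLE[y][x & 1]} {PAIRS[x >> 1]}"


if __name__ == "__main__":
    for x in (0, 2, 4, 6, 1, 3, 5, 7):
        print([decode_page0(x << 3 | y) for y in (0, 2, 4, 6, 1, 3, 5, 7)])
```

Note that the lookup lists are indexed by x in plain numeric order; only the printing loop uses the permuted 0, 2, 4, 6, 1, 3, 5, 7 order.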
The 1xy page is entirely devoted to the MOV instruction, except for 166, which would logically be MOV M, M but is instead HLT.
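| 1xy | x (dest): B = 0 | D = 2 | H = 4 | M = 6 | C = 1 | E = 3 | L = 5 | A = 7 |
|---|---|---|---|---|---|---|---|---|
| y (src): B = 0 | MOV | MOV | MOV | MOV | MOV | MOV | MOV | MOV |
| D = 2 | MOV | MOV | MOV | MOV | MOV | MOV | MOV | MOV |
| H = 4 | MOV | MOV | MOV | MOV | MOV | MOV | MOV | MOV |
| M = 6 | MOV | MOV | MOV | HLT | MOV | MOV | MOV | MOV |
| C = 1 | MOV | MOV | MOV | MOV | MOV | MOV | MOV | MOV |
| E = 3 | MOV | MOV | MOV | MOV | MOV | MOV | MOV | MOV |
| L = 5 | MOV | MOV | MOV | MOV | MOV | MOV | MOV | MOV |
| A = 7 | MOV | MOV | MOV | MOV | MOV | MOV | MOV | MOV |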
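Decoding this page takes only a few lines. The sketch below uses a hypothetical function name, `decode_page1`, and the standard 8080 operand order of destination first.

```python
REGS = "BCDEHLMA"   # register encoding shared by both operand fields


def decode_page1(opcode):
    """Decode an opcode on the 1xy page as MOV dest,src (or HLT)."""
    if opcode >> 6 != 1:
        raise ValueError("not on the 1xy page")
    dest, src = (opcode >> 3) & 7, opcode & 7
    if dest == 6 and src == 6:
        return "HLT"            # 166 takes the place of MOV M,M
    return f"MOV {REGS[dest]},{REGS[src]}"


if __name__ == "__main__":
    for opcode in (0o101, 0o176, 0o167, 0o166):
        print(f"{opcode:03o}: {decode_page1(opcode)}")
```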
The 2xy page consists of single-operand instructions that implicitly act on the accumulator A, but unlike the 0xy page, the operand is in the final octal digit, not the middle one. Laid out consistently with the other pages, each instruction therefore occupies a column rather than a row:
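| 2xy | x = 0 | 2 | 4 | 6 | 1 | 3 | 5 | 7 |
|---|---|---|---|---|---|---|---|---|
| y (src): B = 0 | ADD | SUB | ANA | ORA | ADC | SBB | XRA | CMP |
| D = 2 | ADD | SUB | ANA | ORA | ADC | SBB | XRA | CMP |
| H = 4 | ADD | SUB | ANA | ORA | ADC | SBB | XRA | CMP |
| M = 6 | ADD | SUB | ANA | ORA | ADC | SBB | XRA | CMP |
| C = 1 | ADD | SUB | ANA | ORA | ADC | SBB | XRA | CMP |
| E = 3 | ADD | SUB | ANA | ORA | ADC | SBB | XRA | CMP |
| L = 5 | ADD | SUB | ANA | ORA | ADC | SBB | XRA | CMP |
| A = 7 | ADD | SUB | ANA | ORA | ADC | SBB | XRA | CMP |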
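Here the middle digit selects the operation and the final digit selects the source register, which a short sketch makes explicit. The names `OPS` and `decode_page2` are hypothetical; the mnemonics are the standard 8080 ones.

```python
REGS = "BCDEHLMA"
# Accumulator operations, indexed by the middle octal digit x.
OPS = ["ADD", "ADC", "SUB", "SBB", "ANA", "XRA", "ORA", "CMP"]


def decode_page2(opcode):
    """Decode an opcode on the 2xy page as an operation on A."""
    if opcode >> 6 != 2:
        raise ValueError("not on the 2xy page")
    x, y = (opcode >> 3) & 7, opcode & 7
    return f"{OPS[x]} {REGS[y]}"


if __name__ == "__main__":
    for opcode in (0o200, 0o226, 0o257, 0o276):
        print(f"{opcode:03o}: {decode_page2(opcode)}")
```

In numeric order each operation is followed by its carry, borrow, or otherwise related sibling (ADD then ADC, SUB then SBB), which is why the permuted column order separates them.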
Finally, the 3xy page contains all the control-flow and stack operations, plus some miscellaneous operations; some operate on register pairs, some on registers, some on neither. Three of these operations (Rcc, Jcc, Ccc) contain a 3-bit condition-code operand instead of a register operand, and the RST instruction contains an interrupt vector number.
| 3xy | x = 0 | 2 | 4 | 6 | 1 | 3 | 5 | 7 |
|---|---|---|---|---|---|---|---|---|
| register | B | D | H | M | C | E | L | A |
| register pair | BC | DE | HL | SP | BC | DE | HL | SP |
| condition | NZ | NC | PO | P | Z | C | PE | M |
| y = 0 | Rcc | Rcc | Rcc | Rcc | Rcc | Rcc | Rcc | Rcc |
| 2 | Jcc | Jcc | Jcc | Jcc | Jcc | Jcc | Jcc | Jcc |
| 4 | Ccc | Ccc | Ccc | Ccc | Ccc | Ccc | Ccc | Ccc |
| 6 | ADI | SUI | ANI | ORI | ACI | SBI | XRI | CPI |
| 1 | POP | POP | POP | POP PSW | RET | - | PCHL | SPHL |
| 3 | JMP | OUT | XTHL | DI | - | IN | XCHG | EI |
| 5 | PUSH | PUSH | PUSH | PUSH PSW | CALL | - | - | - |
| 7 | RST | RST | RST | RST | RST | RST | RST | RST |
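The condition-code and RST rows are regular enough to decode mechanically. The sketch below, with made-up names (`CONDS`, `PREFIX`, `decode_conditional`), handles only those four rows and returns None for everything else on the page.

```python
# Condition codes, indexed by the middle octal digit x.
CONDS = ["NZ", "Z", "NC", "C", "PO", "PE", "P", "M"]
# Final digit y selects conditional return, jump, or call.
PREFIX = {0: "R", 2: "J", 4: "C"}


def decode_conditional(opcode):
    """Decode Rcc, Jcc, Ccc, and RST on the 3xy page; None otherwise."""
    if opcode >> 6 != 3:
        raise ValueError("not on the 3xy page")
    x, y = (opcode >> 3) & 7, opcode & 7
    if y in PREFIX:
        return PREFIX[y] + CONDS[x]
    if y == 7:
        return f"RST {x}"       # the vector number is the middle digit
    return None


if __name__ == "__main__":
    for opcode in (0o302, 0o310, 0o374, 0o377, 0o311):
        print(f"{opcode:03o}: {decode_conditional(opcode)}")
```

As with the register pairs, the low bit of the condition field distinguishes a condition from its opposite (NZ and Z, NC and C), so the permuted column order puts the four negative senses on the left and the four positive ones on the right.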
The 8080 is interesting to me not just for nostalgic reasons (many of my first computers in the 1980s were Z80-based) but because it’s nearly the smallest existing computer demonstrably capable of self-hosted software development with an assembler and running a usable user interface, at least if you have a character generator or printer connected to it. The PDP-8 and LGP-30 are simpler, but John Cowan tells me most PDP-8 development was actually done on a PDP-10 and cross-compiled, as with modern embedded microcontrollers, and the LGP-30 was normally programmed in machine code, with the programmer doing the “assembly” beforehand with pencil and paper. By contrast, although much significant software for the 8080 was written on a PDP-10 (notably Microsoft BASIC), much of it was written under CP/M on the 8080 itself.
Wirth’s RISC for Oberon and James Bowman’s J1A Forth CPU are other reasonable candidates, and both are fairly inspired designs with much simpler instruction sets than the 8080, but I think both require more transistors than the 8080, and the available software for them is somewhat lacking.
The GreenArrays F18A CPU design requires, I think, fewer transistors than the 8080 (certainly the MuP21 did) and has a simpler and much more powerful instruction set, but almost no software exists for it, and in particular there is no published self-hosted development environment, as far as I know. (The 18-bit address space is nine times the size of the 8080’s, but the chips made so far have only 64 words of RAM per CPU.)
By contrast, the 8080 has existing self-hosted assemblers as well as compilers for Turbo Pascal, Fortran, small-c, Tiny-C, and BASIC; computer algebra systems; display text editors; CP/M, which includes the assembler, a rudimentary filesystem, file management utilities, a REPL, and a debugger; and at least two free-software operating systems — Drew DeVault’s KnightOS and David Given’s CP/Mish. Yet you can fit the whole 8080 instruction set on a sheet of paper, and its full documentation is a 15-page chapter in the Intel manual.
This is an inspiring example of what is possible, even if the 8080 instruction set itself is kind of clumsy and lame, with the benefit of 40 years of hindsight. Its very imperfection is encouraging — it shows that even deeply flawed hacks can have enduring value and even achieve greatness.
I just learned that there's a public-domain full C compiler for CP/M written in assembly: BD Software C (aka "BDS C") was dedicated to the public domain in 2002. According to p. 264 of the August 1983 issue of Byte, this was one of the fastest C compilers available for CP/M, and it supported a fairly complete version of the C language (well, for the platform).
I, Leor Zolman, hereby release all rights to BDS C (all binary and source code modules, including compiler, linker, library sources, utilities, and all documentation) into the Public Domain. Anyone is free to download, use, copy, modify, sell, fold, spindle or mutilate any part of this package forever more. If, however, anyone ever translates it to BASIC, FORTRAN or C#, please don't tell me.
Leor Zolman
9/20/2002
From my point of view, at least, the availability of this software catapults the 8080 architecture from being a vaguely plausible but implausibly inconvenient architecture to program for, to being a simple architecture with a viable self-hosting development toolchain.