An 8080 opcode map in octal

Kragen Javier Sitaker, 2019-08-28 (updated 2019-11-24) (11 minutes)

The 8008, 8080, 8086, i386, and amd64 instruction sets are, unfortunately, usually given in hexadecimal; but they are dramatically more readable in octal. The 8080 opcode map in particular can be drawn rather neatly using octal.

i386 and amd64 examples

Consider this segment of amd64 machine code:

  400575:       ba 02 00 00 00          mov    $0x2,%edx
  40057a:       be 64 06 40 00          mov    $0x400664,%esi
  40057f:       bf 01 00 00 00          mov    $0x1,%edi
  400584:       e8 a7 fe ff ff          callq  400430 <write@plt>
  400589:       ba 01 00 00 00          mov    $0x1,%edx
  40058e:       be 41 10 60 00          mov    $0x601041,%esi
  400593:       bf 00 00 00 00          mov    $0x0,%edi
  400598:       e8 a3 fe ff ff          callq  400440 <read@plt>

Here it is in octal:

  400575:       272 002 000 000 000             mov    $0x2,%edx
  40057a:       276 144 006 100 000             mov    $0x400664,%esi
  40057f:       277 001 000 000 000             mov    $0x1,%edi
  400584:       350 247 376 377 377             callq  400430 <write@plt>
  400589:       272 001 000 000 000             mov    $0x1,%edx
  40058e:       276 101 020 140 000             mov    $0x601041,%esi
  400593:       277 000 000 000 000             mov    $0x0,%edi
  400598:       350 243 376 377 377             callq  400440 <read@plt>

Here you can see that, for example, all the “load immediate” instructions are "27x", with "x" representing the register: 2 for %edx, 6 for %esi, 7 for %edi, and so on. As it turns out, there are precisely 8 registers that can be addressed in this way, corresponding to the 8 octal digits. And these register numbers are consistent across instructions; here we can see (in some i386 code, from httpdito), 0 representing %eax, 1 representing %ecx, 2 representing %edx again, and 3 representing %ebx (yes, the numbers are not in the same order as the letters):

 804811c:       120                             push   %eax
 804811d:       122                             push   %edx
 804811e:       350 354 377 377 377             call   0x804810f
 8048123:       132                             pop    %edx
 8048124:       130                             pop    %eax
...
 8048130:       102                             inc    %edx
 8048131:       271 353 226 004 010             mov    $0x80496eb,%ecx
 8048136:       061 333                         xor    %ebx,%ebx
 8048138:       103                             inc    %ebx
 8048139:       103                             inc    %ebx

By contrast, in hexadecimal, the immediate-load instruction ba 02 00 00 00 and the “pop %edx” instruction 5a represent %edx as “a”, while “push %edx” and “inc %edx” are 52 and 42 respectively, representing %edx as “2”. Moreover, note that in hexadecimal, both “push” and “pop” of registers are “5x”, while in octal they are “12x” and “13x” respectively.

So this is the sense in which I say 8086, i386, and amd64 machine code are dramatically more readable in octal. The octal digits correspond neatly to the bitfields in the instruction encoding, in most cases. But even the 8086 opcode map is rather large.

The 8080 opcode map

By contrast, every 8080 opcode is a single byte, though some are followed by one or two bytes of immediate data, so a full opcode table is only 256 cells. It, too, is more comprehensible in octal than in hexadecimal, organizing the instruction set into four 64-byte “pages”, although it has some cases where an inconveniently-located two-bit field identifies one of the 8080’s 16-bit register pairs rather than a single 8-bit register. A simple permutation of the rows and columns ameliorates this.

(Beware! None of this has been tested, and it would be surprising if I had found all the errrors in it.)

The 0xy page is largely register-pair operations, occupying three or four columns, with three single-register operations, occupying eight columns:

0xy x
B D H M C E L A
BC DE HL SP BC DE HL SP
0 2 4 6 1 3 5 7
y 0 NOP -
2 STAX STA LDAX LHLD LDA
4INR (increment register)
6MVI (mov immediate)
1LXI DAD (double add)
3INX DCX
5DCR
7RLC RAC DAA STC RRC RAR CMA CMC

The 1xy page is entirely devoted to the MOV instruction, except for 166, which would logically be MOV M, M but is instead HLT.

1xy x (dest)
B D H M C E L A
0 2 4 6 1 3 5 7
y
(src)
B 0 MOV
D 2
H 4
M 6 HLT
C 1
E 3
L 5
A 7

The 2xy page consists of single-operand instructions that implicitly act on the accumulator A, but unlike the 0xy page, the operand is in the final octal digit, not the middle one. If laid out consistently with the other pages, this makes the instructions columns:

2xy x
0 2 4 6 1 3 5 7
y
(src)
B 0 ADD SUB ANA ORA ADC SBB XRA CMP
D 2
H 4
M 6
C 1
E 3
L 5
A 7

Finally, the 3xy page contains all the control-flow and stack operations, plus some miscellaneous operations; some operate on register pairs, some on registers, some on neither. Three of these operations (Rcc, Jcc, Ccc) contain a 3-bit condition-code operand instead of a register operand, and the RST instruction contains an interrupt vector number.

3xy x
B D H M C E L A
BC DE HL SP BC DE HL SP
NZ NC PO P Z C PE M
0 2 4 6 1 3 5 7
y 0 Rcc
2 Jcc
4 Ccc
6ADI SUI ANI ORI ACI SBI XRI CPI
1POP POP
PSW
RET - PCHL SPHL
3JMP OUT XTHL DI - IN XCHG EI
5PUSH PUSH
PSW
CALL -
7RST

Why?

The 8080 is interesting to me not just for nostalgic reasons (many of my first computers in the 1980s were Z80-based) but because it’s nearly the smallest existing computer demonstrably capable of self-hosted software development with an assembler and running a usable user interface, at least if you have a character generator or printer connected to it. The PDP-8 and LGP-30 are simpler, but John Cowan tells me most PDP-8 development was actually done on a PDP-10 and cross-compiled, as with modern embedded microcontrollers, and the LGP-30 was normally programmed in machine code, with the programmer doing the “assembly” beforehand with pencil and paper. By contrast, although much significant software for the 8080 was written on a PDP-10 (notably Microsoft BASIC), much of it was written under CP/M on the 8080 itself.

Wirth’s RISC for Oberon and James Bowman’s J1A Forth CPU are other reasonable candidates, and both are fairly inspired designs with much simpler instruction sets than the 8080, but I think both require more transistors than the 8080, and the available software for them is somewhat lacking.

The GreenArrays F18A CPU design requires, I think, fewer transistors than the 8080 (certainly the MuP21 did) and has a simpler and much more powerful instruction set, but almost no software exists for it, and in particular there is no published self-hosted development environment, as far as I know. (The 18-bit address space is nine times the size of the 8080’s, but the chips made so far have only 64 words of RAM per CPU.)

By contrast, the 8080 has existing self-hosted assemblers as well as compilers for Turbo Pascal, Fortran, small-c, Tiny-C, and BASIC; computer algebra systems; display text editors; CP/M, which includes the assembler, a rudimentary filesystem, file management utilities, a REPL, and a debugger; and at least two free-software operating systems — Drew DeVault’s KnightOS and David Given’s CP/Mish. Yet you can fit the whole 8080 instruction set on a sheet of paper, and its full documentation is a 15-page chapter in the Intel manual.

This is an inspiring example of what is possible, even if the 8080 instruction set itself is kind of clumsy and lame, with the benefit of 40 years of hindsight. Its very imperfection is encouraging — it shows that even deeply flawed hacks can have enduring value and even achieve greatness.

BDS C

I just learned that there's a public-domain full C compiler for CP/M written in assembly; BD Software C (aka "BDS C") was dedicated to the public domain in 2002. According to p. 264 of Byte August 1983, this was one of the fastest C compilers available for CP/M, and supported a fairly complete version of the C language (well, for the platform.)

I, Leor Zolman, hereby release all rights to BDS C (all binary and source code modules, including compiler, linker, library sources, utilities, and all documentation) into the Public Domain. Anyone is free to download, use, copy, modify, sell, fold, spindle or mutilate any part of this package forever more. If, however, anyone ever translates it to BASIC, FORTRAN or C#, please don't tell me.

Leor Zolman
9/20/2002

From my point of view, at least, the availability of this software catapults the 8080 architecture from being a vaguely plausible but implausibly inconvenient architecture to program for, to being a simple architecture with a viable self-hosting development toolchain.

Topics