Notes on reading eForth 1.0 for the 8086

Kragen Javier Sitaker, 2007 to 2009 (5 minutes)

These are notes on the original 8086 eForth model, eForth 1.0 by Bill Muench and C. H. Ting, 1990.

The assembly-language parts of eForth are:

That's 157 instructions' worth of primitives in all, in 31 primitives, plus 14 more instructions in the boot code. The rest of the system is written in Forth. It would be about 31 instructions fewer if there were a central NEXT instead of an indirect jump at the end of each word.

I don't have an assembler that can assemble it or a disassembler that can disassemble the compiled version, but if it's similar to Bill Muench's updated “8086 eForth ITC16i 971014.1”, there should be about 4.3 bytes per instruction, which would make this a 675-byte kernel of an interpreter. (That later Forth also has CHAR+, CHAR-, CHARS, CELL+, CELL-, CELLS, and REDIRECT as primitives, but omits lower-case next.)

The 161 high-level Forth colon definitions in EFORTH.SRC are another 3078 words of text (according to wc) and so are probably about another 6000 bytes. There's an 8814-byte span of NULs in the middle of the .COM file, which totals 15600 bytes, leaving 6786 bytes that might be meaningful; this is pretty close to my estimate of 6675 bytes, which leaves out the user variables and so on.
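As a quick check of the arithmetic above, here is a small Python sketch. It only uses the figures quoted in these notes; the variable names are made up for illustration, and the 4.3 bytes per instruction and the 6000 bytes of colon definitions remain guesses.

```python
# Figures quoted in the notes; none of them are measured here.
primitive_instructions = 157
bytes_per_instruction = 4.3      # guessed from the later ITC16i eForth
colon_definition_bytes = 6000    # rough guess for the 161 colon definitions
com_file_bytes = 15600
nul_span_bytes = 8814

kernel_bytes = round(primitive_instructions * bytes_per_instruction)
estimated_bytes = kernel_bytes + colon_definition_bytes
meaningful_bytes = com_file_bytes - nul_span_bytes
difference = meaningful_bytes - estimated_bytes

print(kernel_bytes, estimated_bytes, meaningful_bytes, difference)
# prints: 675 6675 6786 111
```

So the estimate falls about 111 bytes short of the non-NUL part of the file, which is plausible given that it ignores the user variables.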

If we use token threading, 3078 words of text are probably closer to 3000 bytes, but the token table is another 1024 bytes.
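To make the token-threading idea concrete, here is a minimal sketch in Python rather than Forth or assembly. It assumes one-byte tokens, so each word reference in a definition costs one byte instead of a 16-bit address. The 1024-byte table figure would be consistent with 256 entries of 4 bytes each, but that breakdown is a guess. The token numbers, the `table` layout, and the `run` function are all hypothetical.

```python
# Hypothetical one-byte token assignments.
LIT, EXIT, DUP, MUL, ADD, SQUARE = range(6)

def dup(stack):
    stack.append(stack[-1])

def mul(stack):
    stack.append(stack.pop() * stack.pop())

def add(stack):
    stack.append(stack.pop() + stack.pop())

# Token table: a primitive is a Python function, a colon definition is a
# byte string of tokens ending in EXIT. LIT and EXIT are handled inline.
table = {
    DUP: dup,
    MUL: mul,
    ADD: add,
    SQUARE: bytes([DUP, MUL, EXIT]),   # : SQUARE DUP * ;  (3 bytes)
}

def run(program, table):
    """Token-threaded inner interpreter; returns the data stack."""
    data, rstack = [], []
    code, ip = program, 0
    while True:
        token = code[ip]
        ip += 1
        if token == EXIT:
            if not rstack:
                return data
            code, ip = rstack.pop()       # return to the caller
        elif token == LIT:
            data.append(code[ip])         # next byte is an inline literal
            ip += 1
        else:
            entry = table[token]
            if isinstance(entry, bytes):  # colon definition: nest
                rstack.append((code, ip))
                code, ip = entry, 0
            else:                         # primitive: just call it
                entry(data)

program = bytes([LIT, 3, SQUARE, LIT, 4, SQUARE, ADD, EXIT])
result = run(program, table)              # computes 3*3 + 4*4
```

The trade-off is the one described above: definitions shrink to roughly one byte per word, at the cost of the table and an extra lookup on every word executed.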

To implement the I/O words in a Forth on Linux, we’d probably want a “syscall” word written in machine language. It would take up to 6 arguments, presumably the system call number plus up to five parameters, because ?RX needs select(), which takes five arguments. Apparently the parameters go into %ebx, %ecx, %edx, %esi, and %edi, in that order, according to my disassembly of select.o from my libc.a.
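The reason ?RX needs select() is that it must report whether a character is waiting without blocking. The following Python sketch models that behaviour, assuming the usual eForth stack effect ( -- c T | F ). It is a stand-in, not the machine-language word: Python's `select.select` wraps the system call and takes four arguments rather than five, and `rx_query` is a hypothetical name. A pipe stands in for the terminal so the example runs without keyboard input.

```python
import os
import select

def rx_query(fd):
    """Model of ?RX ( -- c T | F ): poll fd without blocking."""
    # A timeout of 0 makes select() return immediately.
    readable, _, _ = select.select([fd], [], [], 0)
    if not readable:
        return (False,)
    data = os.read(fd, 1)
    if not data:                 # end of file: report "no character"
        return (False,)
    return (data[0], True)

# Demonstration with a pipe in place of the terminal.
read_end, write_end = os.pipe()
before = rx_query(read_end)      # nothing written yet
os.write(write_end, b"A")
after = rx_query(read_end)       # one byte is now waiting
os.close(read_end)
os.close(write_end)

print(before, after)
# prints: (False,) (65, True)
```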

In a direct-threaded system, there’s a penalty of a few extra bytes for each colon definition, probably five.

“Unnecessary” primitives here include:

“next” (the lowercase looping one): R> 1- >R I 0= (IF)

R@: R> DUP >R or : R@ R> R> TUCK >R >R ;

over: : over >R DUP R> SWAP ;

That is, it would be possible to avoid defining them as primitives.
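The replacement definitions for R@ and over can be checked by simulating the two stacks. This is a rough Python model, not eForth itself: the `run` helper and the sample values are made up for illustration, and the return address that a colon-defined R@ would find on top of the return stack is represented by the number 1000.

```python
def run(words, data, rstack):
    """Execute a space-separated string of Forth words on two Python lists."""
    for word in words.split():
        if word == "R>":
            data.append(rstack.pop())
        elif word == ">R":
            rstack.append(data.pop())
        elif word == "DUP":
            data.append(data[-1])
        elif word == "SWAP":
            b, a = data.pop(), data.pop()
            data.extend([b, a])
        elif word == "TUCK":             # ( a b -- b a b )
            b, a = data.pop(), data.pop()
            data.extend([b, a, b])
        else:
            raise ValueError("unknown word: " + word)
    return data, rstack

# R@ inlined: the return stack holds only the value 42.
inline_r_fetch = run("R> DUP >R", [], [42])

# R@ as a colon definition: the caller's return address (1000) sits on
# top of 42, so it has to be moved out of the way and put back.
colon_r_fetch = run("R> R> TUCK >R >R", [], [42, 1000])

# over: ( a b -- a b a )
over_result = run(">R DUP R> SWAP", [1, 2], [])
```

In each case the data stack ends up as the primitive would leave it, and the return stack is restored to its starting state.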

The 24 different instructions used are NOP, CALL, LODSW, JMP, MOV, CLI, STI, INT, CLD, IRET, XOR, JZ, OR, JNZ, PUSH, POP, CMP, XCHG, JC, SUB, ADD, CWD, AND, and RCL.
