Eur-Scheme

Kragen Javier Sitaker, 2007 to 2009 (13 minutes)

So I was thinking about how to simplify Ur-Scheme further. Ur-Scheme is about 1600 lines of code so far. Something like half or a third of that is assembly code, represented in Scheme, mostly implementing the basic types (strings, conses, closures, heap variables, symbols, booleans, characters, nil, fixnums) and the basic control structures (procedure call and return, including tail-call elimination, closure creation, conditionals, sequences). The primitives thus defined total about 1100 lines of assembly-language output, so they're probably somewhere around 1000 instructions. (But it doesn't have a garbage collector yet!)

Having these things implemented in assembly definitely helps it reach the reasonable performance level that it does. But it would be nice to have a system that was a lot simpler and more portable, even if it was slower.

eForth's approach

The eForth Model 1.0 was a Forth system written to be maximally portable; the x86 version apparently contains 171 machine-code instructions in about 28 primitives, and everything else is built up on top of that in interpreted Forth words. The primitives were something like the following set:

I don't remember how eForth accomplished rightward bit shifts (e.g. for division), and I don't have it handy at the moment.

The Eur-Scheme Ideal

Could we build a Scheme with a similar structure — as little as possible in assembly code? Would it be simpler?

Ideally we'd be able to implement not only the normal Scheme types, but also as much as possible of the run-time system, in Scheme, on top of a minimal set of machine-code primitives. That would mean heap allocation, garbage collection, some parts of closure handling, variadic function calls, and all the types mentioned above; the basic system would only have to support (non-closure) function call and return, some conditional (I'm using %ifeq), and set!; and primitives for accessing memory.

Two Approaches To cons

The implementation of quote in a compiler is kind of annoying; it's straightforward in an interpreter or a load-and-go compiler, but in a compiler that might run in a different interpreter, and that generates an output file rather than compiling things into memory, its representations of data have to be compatible with those produced by the definition used at run-time, regardless of whether that run-time definition is in Scheme, C, assembly, or something else.

Ur-Scheme's pair structure, created by cons, consists of three machine words in order: the magic number 0x2ce11ed to identify it as a cons, the car, and the cdr.

Here are the two definitions of cons from Ur-Scheme, the run-time definition and the compile-time definition used by quote:

;; We define a label here before the procedure prologue so that other
;; asm routines can call cons
(add-to-header (lambda () (text) (label "cons")))
(define-global-procedure 'cons 2
  (lambda ()
    (emit-malloc-n 12)
    (mov (const cons-magic) (indirect tos))
    (mov tos ebx)
    (get-procedure-arg 0)
    (mov tos (offset ebx 4))
    (pop)
    (get-procedure-arg 1)
    (mov tos (offset ebx 8))
    (pop)))
;; Compile a quoted cons cell.
(define (compile-cons car-contents cdr-contents labelname)
  (rodatum labelname)
  (compile-word cons-magic)
  (compile-word car-contents)
  (compile-word cdr-contents)
  (text))

I've been considering a slight variation in which assembly-emitting routines return the location in which their result was placed. That would shorten the run-time code for cons above to the following slight variation; I'm not sure if this is clearer or not, but it's certainly briefer, at 6 lines of body instead of 9.

(define-global-procedure 'cons 2
  (lambda ()
    (mov (const cons-magic) (indirect (emit-malloc-n 12)))
    (mov tos ebx)
    (mov (get-procedure-arg 0) (offset ebx 4))
    (pop)
    (mov (get-procedure-arg 1) (offset ebx 8))
    (pop)))

It would be ideal if the two could be generated from a single structure specification, but at present there are only three in-memory types that can be generated at compile-time: pairs, strings (which are variable-sized and mostly consist of bytes), and symbols (which point to strings). So it's not clear that such a struct facility would be a net win for simplicity.

Primitives for Accessing Memory: The Eur-Scheme Approach

So the garbage collector, the heap allocator, and the stuff out of which pairs and strings and so on are constructed need to be able to access raw memory. And it's important that the garbage collector and the heap allocator not accidentally call the heap allocator — you could easily get an infinite recursive loop; so their interface to raw memory shouldn't require heap-allocation.

Perhaps the best way to do this would be to support basically a subset of C: allow routines to declare local variables that are just machine words, which can be used as pointers, are allocated on the stack or in machine registers, and are somehow ignored by the garbage collector (because they aren't necessarily Scheme values or pointers to Scheme heap objects).

But I'm going to talk about another approach, which I think will probably be simpler. You have a pointer data type, like the string and pair data types of normal Scheme, and an explicit stack to save and restore pointers on; and a set of pointers that exist from the time the program starts, and are therefore not heap-allocated. In order to

For concreteness, there could be eight callee-saves pointers, called %g0, %g1, %g2, %g3, %g4, %g5, %g6, and %g7, and eight caller-saves pointers, called %c0, %c1, %c2, %c3, %c4, %c5, %c6, and %c7. There might be other pointers around. For example, there might be some pointer constants created by the compiler, or pointer variables allocated on the heap by routines that aren't part of the garbage collector or memory allocator.

Then we need only a set of primitives for accessing them:

These 20 or so primitives are more or less sufficient for implementing things like the garbage collector and high-level data structures. None of them are particularly complicated; here's a possible implementation of the body of *load!:

        mov (%ebp), %eax        # fetch argument zero, %p0 (src)
        call ensure_pointer     # error unless it's a pointer object
        mov 4(%eax), %ebx       # fetch its contents (the location to load from)
        mov (%ebx), %ebx        # fetch the pointed-to value from memory
        mov 4(%ebp), %eax       # fetch argument one, %p1 (dest)
        call ensure_pointer     # error unless it's a pointer object too
        mov %ebx, 4(%eax)       # store into its contents.

Those seven instructions are wrapped inside seven more instructions for procedure prologue and epilogue. This suggests that the total number of assembly instructions that you need to write to implement these primitives on a new CPU is somewhere around 7 * 20 = 140, plus the basic conditional, procedure call and return, and variable access. So you could probably port such a Scheme to a new CPU architecture with 300-400 lines of code or so, around twice the amount needed to port the eForth Model 1.0.

Of those 150-200 instructions that you need to write, some are in procedure prologues and epilogues that get duplicated 20 times for those 20 or so primitives. So you might end up with something like 400-500 lines of assembly output for the primitives.

Two More Approaches to cons

Again, the nested-assembly version of cons looks like this:

(define-global-procedure 'cons 2
  (lambda ()
    (mov (const cons-magic) (indirect (emit-malloc-n 12)))
    (mov tos ebx)
    (mov (get-procedure-arg 0) (offset ebx 4))
    (pop)
    (mov (get-procedure-arg 1) (offset ebx 8))
    (pop)))

The "subset of C" approach might look like this:

(define (cons car cdr)
  (let-pointer ((rv (malloc-n 12)))
    (pointer-write (indirect rv) cons-magic)
    (pointer-write (offset rv 4) (pointer-val car))
    (pointer-write (offset rv 8) (pointer-val cdr))
    rv))

The Eur-Scheme approach might look like this:

(define (cons car cdr)
  (*store! cons-magic (malloc-n 12 %c0)) ; cons-magic is a pointer constant
  (*store! (*put! car %c1) (*add! %c0 4 %c2))
  (*store! (*put! cdr %c1) (*add! %c0 8 %c2))
  (*get c0))

cons doesn't have to save and restore the values of those pointers because they're all caller-saves, and cons isn't calling anything else that might use them for its own purposes.

I think that's a substantial improvement in both readability and simplicity over the current assembly version, and it's not obviously worse than the subset-of-C version. This example leads me to suspect that defining routines in this fashion, rather than in assembly, would make for a considerably more concise and comprehensible Scheme system than Ur-Scheme.

References

Ur-Scheme is a subset-of-Scheme compiler I wrote to learn how to write compilers. It compiles itself; it's reasonably fast, despite being safe, and very small.

Topics