Programming paradigms for tiny microcontrollers

Kragen Javier Sitaker, 2007 to 2009 (6 minutes)

Concurrent actors in a microcontroller

Suppose you have a microcontroller with a small RAM (say, 2kiB) and you want to run bigger actor-oriented programs. Perhaps you can "page" the actors in and out to an external serial memory?

What I have in mind is that you can have many messages in flight at the same time --- enqueued in the RAM --- and some subset of actors resident. Whenever an actor sends a message to another actor, the message is enqueued, and the execution engine merely repeatedly searches for a message to deliver to a resident actor. When there are no more such messages, it "pages out" some actors to the serial memory to make space to "page in" some other actor that has messages waiting for it.

It's possible for the set of in-flight messages to get too big for RAM. There are a couple of tactics we could use in this case.

First, we could simply page out a chunk of the message queue to the serial memory. This will universally work, but it might make it hard to figure out what actors would be good choices to page in later.

Second, we could look for message deliveries that will diminish the number of in-flight messages instead of increasing it. In the classical Actors model and in Erlang, each handling of a message returns a new state for the actor, rather than mutating the actor's state during the handling of the message; if the system implemented this, then any message delivery could simply be undone by returning the message to the queue, deleting whatever outgoing messages the execution produced, and keeping the old actor state instead of using the new one. So the system could simply try each pending message, one after another, until it finds one that reduces the number of in-flight messages.

Third, we could page out actors to make more room for the message queue.

Of these three, the first strategy is apparently mandatory; the other two might be useful optimizations.

2kiB of RAM is enough to hold 1024 16-bit quantities; if a typical message contains four of these (destination actor, message name, and two arguments) then we could handle a queue of 256 messages at once. If half the RAM is dedicated to actor storage, we could handle 128. This is the level of concurrency at which the system would be most efficient; having fewer concurrently in-flight messages would worsen the choices of which actors to page in next, and possibly reduce the amount of work a particular actor could do before getting paged back out, while having more would not improve that choice, but would require time to be spent paging messages in and out.

When considering the size of each actor on its way to or from memory, it's important to remember that we have to include its code as well as its data. The data might be only 2-10 words, but the code will probably be much larger. So it's probably worthwhile to take this into account in decisions of which actors to page in and out, and to share the code between objects when possible, just as in a traditional in-memory object system without concurrency.

There are several drawbacks to this scheme:

Concurrent tree-space transformation in a microcontroller

Suppose instead that we run Aardappel in the microcontroller. In place of in-flight messages, we have the trees of the tree space, which of course we page out to memory; in place of stateful actors, we have stateless rewrite rules, which rewrite one or more trees into zero or more trees. (Approximately.) In the Aardappel implementation, all the rewrite rules for a particular "type" of tree get collected by the compiler and compiled into a single function, where the "type" is the atom at the beginning of the expression for the tree.

So for a relatively simple system, we repeat the following process:

(This needs some refinement in case all the trees of the most common type aren't currently rewritable.)

I'm not sure this is really very different from the other proposal, but I think it is likely to work better for the following reasons:

Topics