A variety of software systems have used some kind of automatic dependency graph tracking to automatically recompute things, from Lotus 1-2-3 (and derived spreadsheets) up to current JavaScript frameworks like ReactJS and current big-data frameworks like Apache Spark. I’m thinking about some ideas related to this, especially inspired by Spark, and I thought I would look around and see what already exists.
It turns out a lot of fucking things related to this already exist, so I thought I’d write down a summary of some of them.
There are way too many for me to summarize, though.
Candidates not yet noted below: make; React; Meteor; redo; materialized views in databases; STMs; git-annex; database indexes; tabled predicates in Prologs; Merkle trees; immediate-mode GUIs.
Bob Frankston said in March 2015 that VisiCalc, the original spreadsheet, didn’t include “natural order” (i.e. dependency-driven) recalculation, in order to fit into 16 kilobytes. He was responding to a tweet from Mitch Kapor explaining that Rick Ross had implemented that for the first time, in 1982, in Lotus 1-2-3. VisiCalc, instead, had an option to recalculate by rows or by columns. Since 1-2-3, though, spreadsheets default to dependency-order recalculation.
Dependency-order recalculation is comparatively easy for spreadsheets (although infamous thieves Rene Pardo and Remy Landau still got a US patent on it in 1983; Pardo claims to have done it in an “electronic spreadsheet” called LANPAR in 1969, but his lawyer denied it in court), because the total number of things that could possibly be recalculated is human-scale, all of them can be recalculated in only a single way, and so you can simply enumerate them and do a topological sort.
Nowadays, dependency-order recalculation is approximately unnecessary in spreadsheets; computers are so fast that human-scale spreadsheets could recalculate in milliseconds, so recalculating the whole spreadsheet after every keystroke would be reasonable. (They don’t, but that’s another story.)
Tor: https://blog.torproject.org/category/tags/deterministic-builds
Chromium: https://www.chromium.org/developers/testing/isolated-testing/deterministic-builds
Debian: https://wiki.debian.org/ReproducibleBuilds
Firefox: https://bugzilla.mozilla.org/show_bug.cgi?id=885777
ccache is a build accelerator specific to C, C++, and Objective-C. It hashes your source code, include files, compiler (well, typically just its size, mtime, and name), command-line options, etc., with MD4, and stores the compiler’s output (including e.g. warning messages) in an on-disk cache in, normally, your home directory; future recompilation attempts whose inputs haven’t changed will just reuse the previous compiler output.
In theory, this means that you could get most of the benefits of
make
for C and especially C++ programs by just wrapping your
compiler in ccache
inside your build script; rerunning the build
script would rehash all your source files, copy the object files into
your current directory, and then relink the executable. Depending on
what you’re compiling, this might be almost instantaneous, or it might
be very slow.
Because Unix provides ccache
with no reliable way to get a secure
hash of the source files from the filesystem, it has to read them in
their entirety to figure out whether they have reverted to an old
version. I tried it just now on my netbook on a tiny three-kilobyte
GLUT program, and it ended up reading about a megabyte of .h
files
in order to figure out that it could safely reuse the 2.7kB .o
file
from a previous compilation, taking 39 milliseconds in all, even
though it made only about 365 system calls.
https://www.cs.berkeley.edu/~matei/papers/2012/nsdi_spark.pdf
https://spark.apache.org/docs/latest/streaming-programming-guide.html
http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-177.pdf http://www.vestasys.org/
Avery Pennarun
https://github.com/apenwarr/bup/blob/89ac418d84e29ba482bbd21ebc1172c2d1ff5507/DESIGN https://github.com/bup/bup https://bup.github.io/
In Stallman & Sussman 1976, describing their pre-SPICE circuit simulator, we find, “If a user changes some part of the circuit specification (a device parameter or an imposed voltage or current), only those facts depending on the changed fact need be ‘forgotten’ and re-deduced, so small changes in the circuit may need only a small amount of new analysis.” They are describing their invention of “dependency-directed backtracking”, which later became known as a “truth maintenance system”, and it’s built with generalized constraint propagation, which is substantially more general than the unidirectional dependencies mentioned in the other systems above, and one that supports finding a contradiction and backtracking from it to undo the set of incorrect guesses that led to it, and avoid that set in the future. You could use this kind of system, for example, to solve Sudoku puzzles rapidly.
A TMS, like the other systems above, remembers how every datum was deduced, but it does so not in order to promote computational efficiency by caching results, but rather to track the sources of problems — in this case, logical contradictions.
http://www.umut-acar.org/self-adjusting-computation