One of the approaches the STEPS program wanted to try was to “separate implementation from optimization”. I don’t know that they ever got anywhere with that (certainly I never saw anything in their public reports that could be described that way) but I think it’s a really interesting idea.
Conventional programming practice unashamedly mixes together the specification of what your program does with a lot of implementation choices about how to do it efficiently. Of the languages I’ve used, the one that is purest at describing just the semantics of the desired computation is perhaps Scheme, though it’s by no means the theoretical limit in that regard. What are the ways Scheme and other programming languages mix these things together?
(See also Generic programming with proofs, specification, refinement, and specialization.)
Scheme is pretty abstract.
In Scheme you don’t explicitly specify what type your variables are,
when to deallocate things, how to represent enumerated choices in
machine registers, whether you’re calling a function or invoking a
macro. You don’t specify whether a call to a function will return to
the callsite, through a saved continuation somewhere up the stack from
the callsite, or through a saved continuation to somewhere on another
stack. Iteration uses the same syntax as recursion. If you use lists
or vectors for your records, you never have to declare record types,
and you never have to say which record type a list or vector
represents, although the built-in field accessors have names like
cadddr
.
But Scheme is relentlessly monomorphic, at least outside the numerical tower — the standard procedures for lists, for example, work only on lists, not arbitrary sequences. To index into a string or to index into a list, you use different procedures, so you can’t write a procedure that works implicitly on either one. It’s unapologetically eager (although continuations provide some escape from this, in that they allow a procedure to suspend evaluation until a result is ready; because it has the implicit variable capture semantics later adopted by nearly all modern programming languages, its closures also provide an escape valve, though a rather syntactically clumsy one). Your record types must be explicitly defined (if your system implements SRFI 9 so that you have record types). And if you need both a function and its inverse, you must define them separately — more generally, given an n-ary relation, there exist 2ⁿ functions from subsets of its columns to sets of values for the other columns, and in Scheme each of these must generally be reified as a separate procedure.
By contrast, in Lua or JS, you can implicitly (“schemalessly”) define a record type, with reasonable accessor syntax, just by using it; in Lua syntax:
dragStart = {x=ev.p.x, y=ev.p.y}
The schemaless Scheme equivalent is something like
(let ((drag-start (cons (x (p ev)) (y (p ev)))))
  …)
but then you have to access the x
and y
fields of drag-start
with car
and cdr
, or define functions wrapping them (which might
have to have longer names than x
and y
).
If you use a SRFI-9 declaration like this one in Scheme:
(define-record-type :point (make-point x y) point? (x get-x) (y get-y))
then you can instead write
(let ((drag-start (make-point (x (p ev)) (y (p ev)))))
  …)
and then you have (get-x drag-start)
and (get-y drag-start)
as you
would want. But you still can't use the same accessors x
and y
for your new record type that were used for the fields of (p ev)
;
this is a specific case of a larger issue, which is that Scheme makes
polymorphism cumbersome and often impossible.
In Python, Smalltalk, or Ruby, you can generally define a new type that can be used in place of an existing type as an argument to a function or method, simply by implementing the interface that the function or method expects of that argument. Consider this very simple Python function from Dercuano:
def pluralize(noun, number):
    return noun if number == 1 else noun + 's'
This is intended for use with a string and an integer, returning a string, as you’d expect. A Scheme equivalent:
(define (pluralize noun number)
  (if (= number 1)
      noun
      (string-concatenate (list noun "s"))))
But all the Python version demands of the “integer” is that it be
usefully comparable to 1, and all it demands of the “string” is that
you can concatenate a string to it and get a value of the same type.
So, for example, you can partially evaluate pluralize
(and a class
of similar functions) with respect to the number
argument by passing
the following Template
object as an argument instead of the string:
class Template:
    # A placeholder standing in for the not-yet-supplied string; concatenating
    # onto it builds up a template instead of performing the concatenation.
    def __radd__(self, other):
        return Template_cat(templatify(other), self)
    def __add__(self, other):
        return Template_cat(self, templatify(other))
    def of(self, value):
        return value

class Template_cat(Template):
    # The concatenation of two templates.
    def __init__(self, a, b):
        self.a = a
        self.b = b
    def of(self, value):
        return self.a.of(value) + self.b.of(value)

class Template_k(Template):
    # A constant embedded in a template.
    def __init__(self, value):
        self.value = value
    def of(self, value):
        return self.value

def templatify(val):
    return val if isinstance(val, Template) else Template_k(val)
This amounts to partial evaluation by way of abstract interpretation with a non-standard semantics, but it’s done entirely at the level of ordinary code by passing an argument that implements an interface. To do something like this with Scheme, you have to step up to a metalevel and write a sort of compiler rather than just implementing metamethods.
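For instance, with those definitions in hand, partially evaluating the pluralize above with respect to the number argument looks something like this (a usage sketch of mine, not code from Dercuano):

plural_template = pluralize(Template(), 3)    # builds Template_cat(Template(), Template_k('s'))
print(plural_template.of('dragon'))           # prints "dragons"
print(pluralize(Template(), 1).of('dragon'))  # prints "dragon": the bare Template is just the hole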
So you could reasonably argue that the Python pluralize
is a more
abstract specification of a computation than the Scheme version.
As many of us learned from SICP, although Scheme is eager, you can
define lazy streams in Scheme by delaying evaluation with lambda
:
(define (ints-from n) (lambda () (cons n (ints-from (+ n 1)))))
Or by using delay
:
(define (ints-from-delay n) (delay (cons n (ints-from-delay (+ n 1)))))
But — related to the above issue about polymorphism — in standard
Scheme, you cannot pass these lazy streams to things that expect
regular lists. (A Scheme that provides implicit force
would allow
this; as I understand it, supporting this possibility is the reason
delay
and force
are in the standard, but neither of the Schemes I
tried it on has implicit force
.) So a caller that expects a regular
list as a return value is constraining the callee to produce it
entirely before returning. And a callee that expects a regular list
as an argument is constraining the caller to produce it entirely
before the callee begins execution.
In mathematics, it’s entirely commonplace to deal with infinite sequences on equal terms (heh) with finite sequences, and even infinite sequences of infinite sequences, etc. The decision to compute only a finite subset of an infinite sequence is, in many cases, a sort of optimization decision — computing too many terms is relatively harmless, consuming only extra time and memory, and computing all the terms would take infinite time and therefore require some kind of hypercomputation.
This kind of thing frequently results in dual implementations: an
eager implementation for when all the results are needed (more
convenient to handle in Scheme and in most languages) and a lazy one
for when it’s important to produce results lazily. Usually these are
provided through separate operations (functions, methods, or
operators), though in some cases, it’s just different options to the
same operation; for example, in the perlop documentation for the
pattern-match operator m//
:
The “/g” modifier specifies global pattern matching — that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.
In scalar context, each execution of “m//g” finds the next match, returning true if it matches, and false if there is no further match. The position after the last match can be read or set using the “pos()” function; see “pos” in perlfunc.
By contrast, in Python, you can write a generator function, which creates a generator object when initially invoked; both the generator object and the builtin list object implement the builtin iterator protocol, so things that just use the iterator protocol (by, for example, iterating or converting to a tuple) will not notice the difference between a list and a generator.
In Dercuano, for example, this function is a generator; it generates a lazy sequence of (subject, verb, object) triples that form the main data store of the Dercuano application:
def load_triples(filename):
    with open(filename) as f:
        last_subj = None
        for line in f:
            if not line.strip():
                continue
            fields = tuple(unquote(field.replace('+', '%20'))
                           for field in line.split())
            if line[0] in string.whitespace:
                fields = (last_subj,) + fields
            for fn in fields[2:]:
                yield (fields[0], fields[1], fn)
            last_subj = fields[0]
This parses some vaguely RDF-N3-like text saying things like this:
7805-6-volts written 2019-06-08
    updated 2019-06-10
    concerns electronics pricing
array-intervals written 2019-06-23
    concerns math intervals arrays apl python c programming algorithms numpy
Its return value is passed to this function, which generates some in-memory hash tables for quick access:
def as_relations(triples):
    relations = {}
    for subj, verb, obj in triples:
        if verb not in relations:
            relations[verb] = relation.Relation()
        relations[verb].put(subj, obj)
    # Precompute inverse relations eagerly
    for verb in relations:
        ~relations[verb]
    return relations
Right now, the execution of as_relations
is interleaved with that of
load_triples
. If load_triples
is changed to build up a list of
3-tuples and return it, or if load_triples
’s caller does so by
invoking list()
on its return value before passing it to
as_relations
, the code of as_relations
continues to produce the
same results, but the file access is no longer interleaved with the
hashing, and an additional big temporary data list is built up in
memory. As it turns out, this makes it run slightly faster. (So I
made the change.)
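Concretely, that change amounts to an extra list() at the call site of the functions above, something like the following sketch (the actual call site isn’t shown here):

# Lazy: file reads interleaved with the hashing in as_relations.
relations = as_relations(load_triples(filename))
# Eager: read and parse the whole file first, building a temporary list.
relations = as_relations(list(load_triples(filename)))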
The ability to hide optimization decisions behind interfaces like this is powerful, but I’m kind of overselling Python’s implementation of the iterator interface here. If you iterate over a list a second time, you get the same items, unless you’ve changed it; but iterators in Python are single-pass, and so you cannot iterate over a generator a second time. Worse, if you attempt to do so, you silently get no items, rather than an error message. And, because Python’s semantics have pervasive mutability, it’s easy to write a generator that unintentionally produces a different sequence of values depending on exactly when the iteration happens. (In the above example, this could happen if the contents of the file changed, which is a problem that goes beyond Python; see Immutability-based filesystems: interfaces, problems, and benefits.)
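A two-line demonstration of that silent second pass (not from Dercuano):

triples = ((s, v, o) for s, v, o in [('a', 'concerns', 'b')])  # a generator, not a list
print(list(triples))   # [('a', 'concerns', 'b')]
print(list(triples))   # []: silently exhausted, with no error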
Laziness is useful for a few different reasons: to avoid building up big in-memory temporary data structures that will just be thrown away again; to do computations in finite time involving infinite sequences like the continued-fraction expansion of φ or π; and to provide partial results sooner from things that can take a long time to finish, including jobs that use heavy computation, I/O, and remote procedure calls, which are a sort of I/O.
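As a sketch of the second point (mine, not Dercuano’s), the convergents of φ’s continued fraction [1; 1, 1, …] form an infinite sequence, but a lazy consumer takes only as many as it needs:

from fractions import Fraction
from itertools import islice

def phi_convergents():
    # Convergents of [1; 1, 1, ...] are ratios of consecutive Fibonacci
    # numbers; conceptually this sequence never ends.
    a, b = 1, 1
    while True:
        yield Fraction(b, a)
        a, b = b, a + b

print(list(islice(phi_convergents(), 6)))
# [Fraction(1, 1), Fraction(2, 1), Fraction(3, 2), Fraction(5, 3), Fraction(8, 5), Fraction(13, 8)]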
(Haskell is the champion of laziness, avoiding both Python’s repeated-iteration problem and Python’s implicit-mutation-dependency problem, but I don’t know Haskell, so I don’t have a good example to hand.)
Consider this Prolog “predicate”, which is to say, n-ary relation:
x(7805-6-volts, written, 2019-06-08).
x(7805-6-volts, updated, 2019-06-10).
x(7805-6-volts, concerns, electronics).
x(7805-6-volts, concerns, pricing).
x(array-intervals, written, 2019-06-23).
x(array-intervals, updated, 2019-06-10).
x(array-intervals, concerns, math).
x(array-intervals, concerns, electronics).
We can interactively query the topics covered by How to get 6 volts out of a 7805, and why you shouldn’t as follows:
?- x(7805-6-volts, concerns, Topic).
Topic = electronics ;
Topic = pricing
So we can think of x
as a function which, when executed, gives the
topics for How to get 6 volts out of a 7805, and why you shouldn’t. But we could make the note name an
input variable and think of it as a function from note names to (sets
of) topics:
?- Note = 7805-6-volts, x(Note, concerns, Topic).
Note = 7805-6-volts,
Topic = electronics ;
Note = 7805-6-volts,
Topic = pricing ;
false.
(Calling back to the previous topic, this sequence is lazy; you can
say Note = 7805-6-volts, findall(Topic, x(Note, concerns, Topic),
L).
to get the whole list at once.)
Also, though, x
is the inverse function, from topics to note names:
?- Topic = electronics, x(Note, concerns, Topic).
Topic = electronics,
Note = 7805-6-volts ;
Topic = electronics,
Note = array-intervals.
But we can also invoke x
as a function to return a list of all
note–topic pairs:
?- x(Note, concerns, Topic).
Note = 7805-6-volts,
Topic = electronics ;
Note = 7805-6-volts,
Topic = pricing ;
Note = array-intervals,
Topic = math ;
Note = array-intervals,
Topic = electronics.
Or a function from note names to last-updated dates:
?- Note = array-intervals, x(Note, updated, Date).
Note = array-intervals,
Date = 2019-6-10.
If we compose x
with itself, we can even, for example, find all the
pairs of notes last updated on the same date:
?- x(Note1, updated, Date), x(Note2, updated, Date).
Note1 = Note2, Note2 = 7805-6-volts,
Date = 2019-6-10 ;
Note1 = 7805-6-volts,
Date = 2019-6-10,
Note2 = array-intervals ;
Note1 = array-intervals,
Date = 2019-6-10,
Note2 = 7805-6-volts ;
Note1 = Note2, Note2 = array-intervals,
Date = 2019-6-10.
But even without composing x
with itself or selecting from it, we
can construct eight different functions from it by declaring some of
its columns to be inputs and others to be outputs; by currying or
partially evaluating those functions with respect to some subset of
their input arguments (in the relational view, selecting a subset of
rows from the relation), we can get a much larger set of functions out
of it. As written, there are four functions with the first column
fixed to How to get 6 volts out of a 7805, and why you shouldn’t and four more with it fixed to
Reducing the cost of self-verifying arithmetic with array operations; four each with the second column fixed to each of
its three values; and four (much less interesting) functions each with
the third column fixed to each of its six values. This is 44 more
functions, plus the 8 uncurried ones, for a total of 52, plus the
functions with more than one curried argument (which are less
interesting).
Defining 52+ functions is not too bad for 8 lines of code, and you won’t have to go searching through a many-page API document to figure out how to use these 52+ functions, either.
As I said before, though, Prolog hardwires a depth-first search
strategy, which fails to terminate for many searches, and is
exponentially inefficient for many others. So the extent to which you
can do this kind of thing in Prolog directly is fairly limited. The
usual example is append
, which is a built-in Prolog predicate; we can
ask SWI-Prolog for a listing of it as follows:
?- listing(append/3).
lists:append([], A, A).
lists:append([A|B], C, [A|D]) :-
    append(B, C, D).
true.
These two or three lines of code define the list-append relation among lists; thinking of it in the usual way, we can call it to append two lists:
?- append([h,e,l,l,o], [v,e,n,u,s], L).
L = [h, e, l, l, o, v, e, n, u|...].
But these two lines of code define eight functions, just like the
x/3
predicate defined above. We can ask for all pairs of lists that
append to another list:
?- append(A, B, C).
A = [],
B = C ;
A = [_G249],
C = [_G249|B] ;
A = [_G249, _G255],
C = [_G249, _G255|B]
We can ask for every pair of lists that can be appended to form a given list:
?- append(A, B, [v,e,n,u,s]).
A = [],
B = [v, e, n, u, s] ;
A = [v],
B = [e, n, u, s] ;
A = [v, e],
B = [n, u, s] ;
A = [v, e, n],
B = [u, s] ;
A = [v, e, n, u],
B = [s] ;
A = [v, e, n, u, s],
B = [] ;
false.
We can ask if one list is a prefix of another list (and, implicitly, what’s left over):
?- append([h,e,l,l], X, [h,e,l,l,o]).
X = [o].
?- append([h,e,l,l], X, [v,e,n,u,s]).
false.
Or, more interestingly, a suffix:
?- append(Hello, [n,u,s], [v,e,n,u,s]).
Hello = [v, e] ;
false.
That’s a lot of functionality for two lines of code!
You can define Scheme’s standard append
function in a precisely
analogous way:
(define (append a b)
  (if (null? a)
      b
      (cons (car a) (append (cdr a) b))))
But you can’t run a Scheme procedure “backwards” in the way the above Prolog predicates are being used, so if you want starts-with and ends-with predicates, much less lists of sublists, you need to program those separately.
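In Python, for example, getting what append(A, B, [v,e,n,u,s]) gives for free means writing the splits, the prefix test, and the suffix test as separate functions; a sketch of mine, not from Dercuano:

def splits(xs):
    # Every (prefix, suffix) pair whose concatenation is xs, the analogue
    # of querying append(A, B, xs) with A and B unbound.
    return [(xs[:i], xs[i:]) for i in range(len(xs) + 1)]

def starts_with(prefix, xs):
    return xs[:len(prefix)] == prefix

def ends_with(suffix, xs):
    return len(suffix) <= len(xs) and xs[len(xs) - len(suffix):] == suffix

print(splits(list('venus')))                     # the six pairs Prolog enumerated above
print(starts_with(list('hell'), list('hello')))  # True
print(ends_with(list('nus'), list('venus')))     # True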
This kind of thing is a really tempting pointer to what Will Byrd calls “relational programming”, where instead of programming procedures or functions (or, as Total Functional Programming points out, partial functions) we program relations. Modern Prolog systems are doing some work with tabling and integer programming to expand the scope of this kind of thing, but they’re incurably handicapped by Prolog’s depth-first evaluation model. Byrd and Kiselyov’s system miniKANREN is, like the database query mini-language in SICP, an exploration of making this kind of thing much more general than it is in Prolog. Its second most impressive feat is that, given a relational specification of an interpreter, it was able to deduce a self-reproducing program, a quine, for that interpreter. (Its most impressive feat is that Chung-Chieh Shan and Oleg Kiselyov extended it to support probabilistic programming five years before anyone else realized probabilistic programming was a useful thing to do.)
As I understand it — and I may have this part wrong — miniKANREN uses a sort of breadth-first search strategy which is guaranteed to find a solution if one exists, but, like Prolog’s strategy, is not guaranteed to terminate. My intuition is that this means that for many problems — in some sense, most problems — it will take exponentially longer than a program that always knows the right choice to make, or even a version of miniKANREN that knows the right choice to make more often, or gets to it sooner. This suggests that some kind of optimization-hint information could yield an exponential speedup, so adding some kind of optimization to your miniKANREN program (whether separated or, as is traditional, mixed into it) could yield big wins.
Naturally, Byrd and Kiselyov implemented miniKANREN in Scheme, although Clojure also includes a version of it.
From a certain point of view, the Scheme append
procedure is an
optimized version of the Prolog append/3
predicate: it’s a version
of the predicate specialized to the case where the first two arguments
are ground and the third argument is a free variable.
Consider this Python implementation of the method of secants (see Using the method of secants for general optimization, and note that these implementations tend to report success by dividing by zero):
def sec(f, x0, x1, e):
    y = f(x0), f(x1)
    while abs(y[1]) > e:
        x0, x1 = x1, x1 - y[1]*(x1 - x0)/(y[1] - y[0])
        y = y[1], f(x1)
    return x1
Here is a Scheme equivalent, although it may not be the best way to express the algorithm in Scheme:
(define (sec f x0 x1 e)
  (letrec ((loop
            (lambda (y0 y1 x0 x1)
              (if (> (abs y1) e)
                  (let ((xn (- x1 (/ (* y1 (- x1 x0))
                                     (- y1 y0)))))
                    (loop y1 (f xn) x1 xn))
                  x1))))
    (loop (f x0) (f x1) x0 x1)))
If it terminates, this approximates a root of the function f
by
finding the intersection point of the X-axis and the secant line
through (xᵢ₋₁, f(xᵢ₋₁)) and (xᵢ, f(xᵢ)); in the usual cases,
this is a slightly more efficient approximation of Newton–Raphson
iteration, since it converges with degree φ to Newton’s 2, but only
requires a single computation of y(x) per iteration, rather than
separate computations of y(x) and ẏ(x). (In some cases, not
needing to compute the derivative is also useful, although if the
derivative doesn’t actually exist, this method may not converge.)
The thing I want to point out here is that, assuming floating-point,
this procedure computes in constant space, but by virtue of doing so,
it erases all the intermediate values of x1
and y
. If you just
want the root, that may be okay, but if you want to analyze the
performance of the algorithm, it may be useful to save those
intermediate results.
Related to the previous item about laziness — in this form, the
algorithm fails to terminate at times — notably when you give it real
starting points for a function whose roots are complex, such as
sec(lambda x: x**2 + 2, 0, 1.0, .0000001)
, although it works fine
with other starting points such as sec(lambda x: x**2 + 2, 0, 1.0 +
1.0j, .0000001)
. Even if it does manage to terminate, it can run for
many iterations, and in some applications it would be best to
terminate its execution early, for example if a concurrent search from
a different starting point finds a solution.
If you sometimes want the constant-space behavior and sometimes want the whole sequence of values or guaranteed termination, most programming languages require you to restructure the procedure as a generator of a lazy sequence, so that a concurrent consumer has the opportunity to discard its intermediate values or abort the iteration. For example:
def sec_seq(f, x0, x1):
    y = f(x0), f(x1)
    while True:
        yield x1
        x0, x1 = x1, x1 - y[1]*(x1 - x0)/(y[1] - y[0])
        y = y[1], f(x1)
This allows you, for example, to take the 10th value while discarding all previous values, to compute the first 10 values, or to interleave the iterative computation in other ways:
>>> list(itertools.islice(sec_seq(lambda x: x**2 + 2, 0, 1.0 + 1.0j), 9, 10))
[(-2.2572481470524732e-21+1.4142135623730951j)]
>>> list(itertools.islice(sec_seq(lambda x: x**2 + 2, 0, 1.0 + 1.0j), 10))
[(1+1j),
(-1+1j),
2j,
(-0.2+1.4j),
(-0.03448275862068967+1.4137931034482758j),
(1.3877787807814457e-17+1.411764705882353j),
(2.9885538387977862e-05+1.4142135620573204j),
(-2.5897366406179418e-08+1.4142135623730947j),
(-8.007108329513308e-22+1.4142135623733687j),
(-2.2572481470524732e-21+1.4142135623730951j)]
The original Python and Scheme code is seen to have mixed together the space optimization of discarding the intermediate values with the implementation of the underlying algorithm. Scheme is in some sense often more abstract than Python with regard to this sort of irreversibility, in that you write this loop with a call whose tail position the Scheme system is supposed to recognize, rather than explicitly overwriting variables with new values; but, on a standard Scheme system, you can’t construct some kind of context where executing the original Scheme example above gives you access to all of the intermediate values. And in fact the surgery needed to convert the Scheme version of the algorithm to optionally produce the whole sequence of values, interleaved with the execution of the consumer, is more extensive than on the Python version.
(The situation is somewhat worse even than this suggests: in both Python and Scheme, if the arguments are arbitrary-precision rational objects (or, in some Schemes, integers), the algorithm is not constant-space; moreover, the Scheme standard more or less explicitly permits garbage collection to be broken, and Scheme space behavior in most cases depends on garbage collection.)
But we see that Python and Scheme permit us to write this algorithm in such a way that the invoking context can preserve the intermediate values or discard them, as it wishes. (Doing this in general requires using FP-persistent data structures and avoiding mutation entirely, which is totally impractical in Python, and pretty hard to do without sacrificing efficiency in Scheme.) But what would an algorithm specification language look like that entirely omitted erasure?
The first algorithm presented in an algorithms course is often Euclid’s algorithm for computing greatest common divisors. Consider this Python implementation:
def gcd(a, b):
    while b:
        a, b = b, a % b
    return a
Or its Scheme equivalent, although the Scheme standard already provides this function:
(define (gcd a b)
  (if (= b 0)
      a
      (gcd b (remainder a b))))
Like the method-of-secants code, this computes a sequence of intermediate values before arriving at its final answer, and these intermediate values are lost. (However, at least for the usual data types, in this case termination is guaranteed.) It turns out that, as explained in Using Aryabhata’s pulverizer algorithm to calculate multiplicative inverses in prime Galois fields and other multiplicative groups, the lost values are actually very useful; you can use them to efficiently compute the inverses of the elements of a multiplicative group or find that no such inverses exist, because they allow you to compute not only g = gcd(a, b) but also coefficients s and t such that sa + tb = g. The easiest way to do this in Python is, first, to restructure the algorithm as a recursive algorithm like the Scheme implementation:
def gcd(a, b):
    return a if b == 0 else gcd(b, a % b)
Second, to push the termination test down the stack one level:
def gcd(a, b):
    r = a % b
    if r == 0:
        return b
    return gcd(b, r)
Finally, to augment the return value with the additional information:
def egcd(a, b):
    q, r = divmod(a, b)    # r = a - qb
    if r == 0:
        return 0, 1, b     # 0a + 1b = b
    s, t, g = egcd(b, r)   # sb + tr = g
    assert s*b + t*r == g
    return t, s - t*q, g   # sb + t(a - qb) = g = ta + (s - tq)b
For example, if invoked as egcd(81, 18)
, this returns (1, -4, 9),
because 1·81 - 4·18 = 9, their greatest common divisor; so in ℤ/81ℤ,
18 has no multiplicative inverse. But, if invoked as egcd(93, 26)
,
it returns (7, -25, 1), because 93 and 26 are relatively prime, and
7·93 - 25·26 = 1, so in ℤ/93ℤ, the multiplicative inverse of 26 is
-25, which is to say, 68.
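So, for instance, a small wrapper around the egcd above (my sketch, not part of Dercuano) can compute multiplicative inverses, or report that none exists:

def modinv(a, m):
    # Inverse of a in Z/mZ: egcd(m, a) gives x, y, g with x*m + y*a = g;
    # when g = 1, y*a is congruent to 1 mod m, so y is the inverse.
    x, y, g = egcd(m, a)
    if g != 1:
        raise ValueError("%d has no inverse mod %d (gcd is %d)" % (a, m, g))
    return y % m

print(modinv(26, 93))   # 68, as computed above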
In a sense, the simpler gcd functions above are optimizations of egcd: they are iterative and they avoid doing work and allocating memory to calculate s and t when they aren’t needed. You can of course write the extended version and hope that your compiler is smart enough to detect this situation, but you will probably be disappointed, and moreover you will probably have no way of knowing. (Even if you grovel over assembly listings to verify it once, you likely won’t notice if future changes to the code break the optimization.)
Wouldn’t it be better if you could write the extended version and assert that, at a given callsite, the algorithm used should be iterative? If the compilation failed because that assertion couldn’t be satisfied, wouldn’t it be good to be able to explain to the compiler how to make the optimization and what assumptions justified the optimization, and have the compiler search for a counterexample for the compilation error message if your assumptions were wrong, or at least suggest a lemma you could try to give it a proof for?
So we’ve seen particular ways in which even in Scheme we must entangle optimization with implementation, but in more mainstream programming languages, the situation is even more extreme.
Consider this Golang code from my maze program:
func (m Maze) Show(output io.Writer) {
    h, v := m.Horizontal, m.Vertical
    for r := 0; r < len(h); r++ {
        s := []byte{}
        for c := 0; c < len(v[0]); c++ {
            k := context{
                up:    v[r][c],
                down:  v[r+1][c],
                left:  h[r][c],
                right: h[r][c+1],
            }
            s = append(s, glyphs[k]...)
        }
        output.Write(append(s, '\n'))
    }
}
Using a maze previously generated by the rest of the program with Kruskal's algorithm, this produces output like the following maze, which has one path through it:
╻╺━┳┳━━━━┳┳━┳━━┳┳┳━━━┳┳━━━━━┳━━┳━┳┳━┳━━━━━━━━┳━━┳━━┳┳┳┳━┳━┳┳┳━┳━┳━━━━┳━━┳━┳━┳━━┓
┃╺━┫┗┳━╸┏┫┗┓┗┓┏┫╹┗━━╸┃┣━╸╺━━┻╸╺╋┓╹┃╺┻┳━┓┏┳╸┏┓┣━┓┣╸┏┫┃╹╹┏╋╸┃╹╹╻┗╸┗╸┏┳┳┛╻┏┛╻┃╺┻┓╻┃
┣╸╺┫┏┻━╸╹┗┓┗╸╹┃┗━╸╺━┓╹┗━┓┏╸╺┓╺┳┛┃╻┃╺┳┫╺╋┛┃╻╹┗┛╺┫┗┓╹┃┗━┓╹╹╻┗┳━╋╸╻╺┓╹╹┗┳┛┗╸┗╋╸╺┛┣┫
┃╺┓┃╹┏╸┏┓╺┫┏╸╺╋━╸┏┓╻┣╸╺┳┻┫╺┓┣┳╋╸┗┫┣╸┃┗╸┗╸┃┣╸╺━┓╹┏┫┏┛╺┳┛┏━┻┓┣╸╹╻┃╻┃╺┳━╋┓╻┏╸┗┓╺━┛┃
┃╻┣┛╻┗┓┃┃╺┻┛┏┓┗┳━┛┗┻╋━┓┣┓┃╺┫╹╹┣╸╻┃┗╸┃╻┏━━┫┗┳━┳┛╻╹┃┃┏┳╋╸┗━┓╹╹╻╺┫┃┗╋┓┗┓╹┗┛┗━┓┃┏━╸┃
┃┣┫┏┛╻┗┛┗┓╻╻╹┗┳┛╻┏┳━┫╺┫╹┃╹╻┣╸╻┗╸┣┛┏╸╹┗╋╸╻┗┓┗┓┗━┫╺┛┃╹╹┗╸╺┓┃╺┓┣╸┃┃╻┃┃╺┛┏╸╻╻┏┻┛┗┳┓┃
┃╹╹┗┳┻┓┏┓┃┗╋╸╺┻╸┃┃╹╻╹┏┻┓┗┓┣┻━┛┏┓┗┓┣━┓╺╋━┫╺┛┏┫╺━┫╺┳┫┏╸╻┏┓┗┻┳┫┣┓┣┫┗┫┗━━┻┓┗┫┣┓┏━┫┗┫
┣┳┳━┻╸┗┫┗┻┳┛╻┏━╸┗╋━┛┏┛╺╋┓╹┣┳╸╻╹┃┏┛╹╺╋┓╹╺╋┳╸╹┣╸╻┣┓┃┗╋━┛╹┣━╸┃┣┛┃┃┗┳┻━╸╻┏┛╻┗┛╹╹╻╹┏┫
┃╹┗╸┏┓┏┛╺┓┣━╋┛╻╺━┛╻╻┃╻┏┛┣╸┃┣╸┣━┛┣━┓╻┃┣━┓┃┃╻╺┻┳┛╹┣┛╻┃┏╸╺╋┓╺┛┗┓┗┻╸┗┓┏╸┗╋━┻┳┓╺┳┛╻╹┃
┣━┳┓╹┃┃╺┳┫┣╸┣╸┃┏┳┓┗┫┗┫┃╺┻━┛╹╺╋┓╻┗╸╹┗┫┗╸┃┃╹┣┓╻┗┓╻┗┓┗┻┫┏╸┃┗╸╻╺┫╺━┓╺┻┻╸╺┫╻╺┛┣━┫╻┃╻┃
┃┏┫┗┓┃╹╻╹┃┣╸┣┳┫┃╹┗┓┃╺┫╹┏━╸╻╺━┛┗┻┓╻┏╸┃┏╸┗┻╸╹┗╋━┻┛┏┻┓┏┫┣╸┗━┓┣╸┃┏┳╋╸╻╻╻╺┛┃╺━┫╺╋┫┣┫┃
┃╹┃╺╋┻┳┛╺┻┛╺┛╹┗┛╻╺┻┫╺┛╻┣━┓┗━┳┳┓╻┣┻┻╸┃┣┳┳┳┳╸╻┗┳━╸╹╻╹╹┃┣┳╸┏┻┻┳╋┛╹┗╸┣┻┛╻╻┗━━╋┓╹╹┃┗┫
┣╸┣┓┃╻┃╺┳┳╸╻┏┓╺┓┃╻╺┫╺━┫╹┏┫╺━┛┃┣┛┃╻┏╸┗┫╹╹╹┃┏┛╺┫╻╻╺┫╻┏┻┫┗┓┃╻╻╹┃┏┓╺┳┛┏╸┗╋━━┳┛╹╻╺┫╺┫
┃╺┛╹┃┃┣━┫╹╺┫╹┣━┛┗┻┳┫╺┓┣╸╹┗┓╺┓┃┣━┫┣┻╸╻┃┏┳╸┣┻┓╻┗┫┗┳┫┗┻╸╹╺┛┃┣╋━┫┃┗┳┻━┫╺┓┗╸╺┛╺┓┣━┫╺┫
┣┳┓╻┣┫╹╻╹╻╺┫┏┻╸╺┳╸╹╹╺┻╋━┓╺┫╺╋┫┣┓╹┗┓╻┃┗┫┗┓╹╺┫┗┓╹╺┫┗━┓┏┳╸╻╹╹╹╺╋┻┓┗━╸┃╻┃╻┏┓┏┳┛┗┓╹╻┃
┃┃╹┃╹┗╸┣━╋┳┻┻┳┓╺┫╻╻┏━━┛┏┫┏┻━┫┃╹┗╸┏┻┻┫╻┃┏┻━┓┗━┻╸╺┫╺━┛╹┣╸┣╸╻╻╻╹╻┃╺┓╻┗╋┻┛╹┣┛┣━┓┗╸┃┃
┃╹╻┗━┓╺┫┏┫┣╸╺┛┗╸┣┫┃┃┏╸╺┛┗┫┏╸┃╹╺┳━┛╺┳┫┃╹┗╸╺┻┓╻╻┏╸┃╺┳╸╻┗━┫┏┫┣┻━┫╹┏╋┻╸┃┏┳┳╋┓╹╻┃╻╻┣┫
┃╻┃┏━┛╻╹┃┃┣━━━┓╻┃╹┗┛┣┳╸╺┓┣┻╸┗━╸┗╸┏┓╹┗╋━━╸╺━┻┛┣┻╸┣┓┃╺┫┏╸┗┛┃┗╸╻┗━┫┃╺━┫╹┃╹╹╹╺┫┗╋┫┃┃
┣┛┃┗━┓┃╻╹╹┗┳━╸╹┗┫╻╻╺┫╹╻╺┫┃┏━╸╻┏╸╻╹┗━┳┛╻╺┓╻╻╺┳┛┏╸╹┣┻╸┃┃╻╺━╋╸╻┗┳╸┃╹╺━╋╸╹╻╺━┳┛╺┫╹╹┃
┗━┻━━┻┻┻━━━┻━━━━┻┻┻━┻━┻━┻┻┻━━┻┻━┻━━━┻━┻━┻┻┻━┻━┻━━┻━━┻┻┻━━┻━┻━┻━┻━━━┻━━┻━━┻━━┻━╸╹
If you translate this function directly to Python, it’s about half as much code:
def show(self, output):
    h, v = self.horizontal, self.vertical
    for r in range(len(h)):
        s = []
        for c in range(len(v[0])):
            k = (v[r][c],
                 v[r+1][c],
                 h[r][c],
                 h[r][c+1])
            s.append(glyphs[k])
        output.write(''.join(s) + '\n')
What was the other half? It’s not, mostly, that Python uses shorter
tokens. As far as I can tell, the only really missing part is the
types (and struct fields), although those are only about 62 of the
extra 300 bytes of code. In Python, we don’t have to declare that our
hash key is a context
or name its fields, declare the type of the
receiver, declare the type of the output stream, or declare the output
buffer as a byte slice.
A (non-modern) C++ or Java version would be far worse — it would have type annotations on almost every line of the code.
A non-word-for-word translation
The Golang function above was actually from a translation from a Python program which didn’t use a hash table to produce the characters but rather a search over parallel arrays:
def render(h, v):
    for hline, upline, downline in zip(h, v[:-1], v[1:]):
        row = []
        for r, l, u, d in zip(hline[1:], hline[:-1], upline, downline):
            row.append(random.choice(
                [c for i, c in enumerate(chars)
                 if l == left[i] and r == right[i]
                 and u == top[i] and d == bottom[i]]))
        yield ''.join(row)
This ended up being almost as long as the Golang version, a somewhat unfair comparison since it leaves out the Golang initialization code for the glyphs hash table:

for i, glyph := range []rune(chars) {
    k := context{
        up:    topbs[i] == '1',
        down:  bottm[i] == '1',
        left:  lefts[i] == '1',
        right: right[i] == '1',
    }
    glyphs[k] = []byte(string(glyph))
}
The Python version detects even method name misspellings only at runtime; if you said output.wriet instead of output.write, nothing will complain until that line actually runs. Similarly, it detects type errors, where you’re
calling it with no argument or a non-output-stream argument, at
runtime. And if one of the things in glyphs
isn’t actually a
string, or one of the keys isn’t a 4-tuple as it should be, or h
or
v
isn’t a two-dimensional array of strings of the right size, that’s
also a runtime error. Moreover, if you got the keys in the 4-tuple in
the wrong order, it won’t be obvious and Python won’t complain.
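To be concrete about when Python does complain (a sketch, not part of the maze program):

import io

def show_broken(output):
    output.wriet('hello\n')   # misspelled on purpose

# Defining the function raises nothing; only calling it does:
try:
    show_broken(io.StringIO())
except AttributeError as err:
    print(err)   # e.g. '_io.StringIO' object has no attribute 'wriet'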
By contrast, most of these are checked at compile-time by the Golang
compiler. It constructs a simple formal proof that the output
argument has a Write
method: it’s an io.Writer
interface value,
and those have Write
methods, QED. The only one it doesn’t check is
the array size, a check it does do at runtime. But the Golang
compiler catches a fairly large fraction of bugs by using simple
theorem proving at compile time, bugs that the Python program would
need a test suite to catch. (I didn’t write the test suite.)
The Golang program also runs a lot faster. It generates a 301×301 maze (comparable to the Python program’s 300×300 maze) in 400 milliseconds, while the Python program takes 4.9 seconds to do the same task with the same algorithm. Both are single-threaded, so, crudely, the Python program is spending 92% of its time doing computations the Golang program doesn’t need to do. (The Golang program could in theory be benefiting from concurrent garbage collection, but its user+system time never exceeds the wallclock time by more than about 3%, so it can’t be benefiting much.)
A lot of those computations are the ones the compiler did at compile-time in order to find bugs, using the type annotations. Other optimizations, though, were choices of how to represent the program’s state in memory.
Declaring the output buffer as a byte slice was an optimization;
originally it was a string appended to with +=
. Indeed, having an
explicit output buffer at all was an optimization; previously the code
used fmt.Fprintf(output, "%c", glyphs[k])
. And the context
type,
with its four boolean fields, is presumably both considerably faster
to copy around and gentler on the garbage collector than Python’s
tuple of reference-counted pointers.
It’s tempting to think that proofs and type specifications are at a higher level of abstraction than a straightforward implementation, while an optimized implementation is working at a lower level of abstraction than that straightforward implementation. And you could imagine a way for that to be true, where you have a sort of cascade from high-level specifications (“the program must not crash”, “there must be only a single path through the maze”) down to lower-level more detailed specifications (“use Kruskal’s algorithm”) which fill out the high-level specifications into something closer to the metal, down to low-level micro-optimizations (“represent a cell context as a tuple of four booleans, represent those booleans as single bits in a byte, and index the glyphs numerically by context rather than by hashing”). But the above example shows that this is not really the way things work today. Perhaps it could be?
Ideally, you could separate the decisions about how to represent state (use a byte buffer, use a hash, use an array, use a four-byte struct) from the higher-level specification of the algorithm so that you could separately prove that the algorithm, abstractly, solves the problem it's needed to solve, and that the state representation is a valid state representation for that algorithm (that is, its semantics faithfully implements the semantics the algorithm needs). Given a system that could construct such proofs, you could automatically conduct a heuristic search for state representations that provide the required properties.
If an algorithm is written in a sufficiently abstract way, it can be automatically specialized to different contexts (given, for example, context-specific implementations of the operations it needs), while proofs of its properties can be written in terms of the abstract algorithm, perhaps with reference to properties of the concrete representation it's using. The C++ STL is one example of this approach, but also, for example, you can use Newton’s divided-differences polynomial-interpolation algorithm to perform polynomial interpolation in different fields — including, for example, Galois fields, where it provides not numerical approximation but Shamir secret sharing. But this can only be done automatically if the original specification of polynomial interpolation is written in such a way as to not be limited to floating-point.
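Here is a rough sketch of mine (names and details invented; this is not Dercuano code) of what “not limited to floating-point” can mean: Newton’s divided differences written purely in terms of +, -, *, and /, so the same procedure interpolates over floats, exact rationals, or a Galois-field element class that implements those operators:

from fractions import Fraction

def newton_interpolate(xs, ys):
    # Divided-difference coefficients, computed in place; only field
    # operations are used, so any field-like number type works.
    coeffs = list(ys)
    for j in range(1, len(xs)):
        for i in range(len(xs) - 1, j - 1, -1):
            coeffs[i] = (coeffs[i] - coeffs[i - 1]) / (xs[i] - xs[i - j])

    def p(x):
        # Horner-style evaluation of the Newton form.
        acc = coeffs[-1]
        for i in range(len(xs) - 2, -1, -1):
            acc = acc * (x - xs[i]) + coeffs[i]
        return acc

    return p

p = newton_interpolate([Fraction(0), Fraction(1), Fraction(2)],
                       [Fraction(1), Fraction(3), Fraction(7)])
print(p(Fraction(3)))   # 13, exactly: these points lie on x**2 + x + 1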
Two very common optimizations: representing integers as fixed-length
n-bit bitstrings and approximating arithmetic on them with arithmetic
in the ring ℤ/2ⁿℤ (“integer arithmetic”, in
the computer sense); and representing members of ℚ or ℝ as
floating-point numbers and approximating arithmetic on them with
floating-point arithmetic. The first is precisely correct in the
cases where we have no overflow or underflow, or where the overflows
and underflows cancel out. (In ANSI C, undefined behavior deprives us
of the second possibility unless we use unsigned
values.) The
second is approximately correct when the rounding errors are not too
large, which can be detected dynamically using interval arithmetic,
affine arithmetic, and friends, or statically using numerical analysis
(see Reducing the cost of self-verifying arithmetic with array operations for more on this relationship.) Also,
sometimes, we use fixed-length bitstrings with “integer division” to
approximate the richer arithmetic of ℚ or ℝ, with rounding-error
considerations similar to those of floating-point.
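A couple of lines make the nature of those two approximations concrete (a sketch; Python’s own integers are arbitrary-precision, so the 32-bit wraparound is simulated with a mask):

def add32(a, b):
    # Two's-complement 32-bit addition: exact only as long as no overflow
    # occurs (or the overflows cancel out).
    s = (a + b) & 0xFFFFFFFF
    return s - 2**32 if s >= 2**31 else s

print(add32(2**31 - 1, 1))   # -2147483648, not the true sum 2147483648
print(0.1 + 0.2 == 0.3)      # False: floating point only approximates the reals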
These are not minor or even constant-factor optimizations. Consider the recurrence that gives rise to the Mandelbrot set† — in ℂ this is zᵢ = zᵢ₋₁² + z₀, but for calculation we normally translate it into ℝ²:
xᵢ = xᵢ₋₁² - yᵢ₋₁² + x₀
yᵢ = 2xᵢ₋₁yᵢ₋₁ + y₀
If (x₀, y₀) ∈ ℚ², then all of the pairs in the recurrence will be in ℚ², so we can straightforwardly compute them precisely on a computer using arbitrary-precision arithmetic. However, if we do this, in most cases, the computation takes time exponential in i, because the numerators and denominators of the xᵢ and yᵢ double in length every iteration! The first time I implemented the Mandelbrot set in Squeak Smalltalk, this is what happened to me, because in Squeak, integer division produces arbitrary-precision rationals by default, not floating-point numbers.
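You can watch this blowup happen with Python’s exact rationals (a quick sketch of mine; the starting point is arbitrary):

from fractions import Fraction

x0, y0 = Fraction(-1, 3), Fraction(1, 7)    # an arbitrary exact starting point
x, y = x0, y0
for i in range(8):
    x, y = x*x - y*y + x0, 2*x*y + y0
    print(i + 1, len(str(x.denominator)))   # the digit count roughly doubles each iteration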
So optimizing this computation by approximating the results with floating-point arithmetic makes it possible to calculate new items of the recurrence in constant time, and without the possibility of a memory-exhaustion exception, while the precise computation takes exponential time and also exponential memory.
Being able to express an algorithm like the computation of the Mandelbrot recurrence separately from a specification of the concrete realization of its operations in a computer could allow us to prove results about the algorithm in the abstract, then automatically specialize it for a particular concrete domain (such as 32-bit floating-point numbers) and determine which of these proofs still holds for that concrete approximation. For example, a proof that it computes terms in linear time would fail to hold for an arbitrary-precision realization.
As I mentioned above, the Python and Scheme implementations above of
the method of secants have the same feature — if applied to
arbitrary-precision rational objects, they can consume amounts of
space and time that increase exponentially with the number of
iterations, even when they erase previous results, precisely because
the expression of the algorithm in those languages can be abstract
over numeric types. So, although we could imagine a system which
would allow us to mechanically verify a constant-space guarantee for
that algorithm in the case where we erase intermediate values and use
constant-space numeric types, neither Python nor Scheme is that
system; and, although we could imagine a system which would allow us
to use the intermediate values computed by that algorithm without
explicitly exporting them with a yield
, neither Python nor Scheme is
that system either. We enjoy the worst of both worlds.
† I think Gaston Julia, not Mandelbrot, first called attention to the astonishing features of this simple recurrence, but like nearly everyone else alive, I owe my introduction to it to Mandelbrot.
All of the above is a sort of laundry list of ways that these different optimizations get mixed into our source code daily, leading to duplication of code, poor performance (because the cost of an optimization is so high), and lack of adaptability to new situations. But it doesn’t go very far toward exploring how we could really separate them in practice. Suppose you write the Mandelbrot recurrence the way we write recurrence relations in math books, as in my example above.