Commentaries on reading Engelbart’s “Augmenting Human Intellect”

Kragen Javier Sitaker, 2018-12-24 (updated 2018-12-25) (25 minutes)

My commentaries on reading Engelbart’s 1962 “Augmenting Human Intellect: A Conceptual Framework” for the first time, rather belatedly in 2018.

The date 1962 is rather crucial in understanding this work, because he talks a lot about computers, and computers were changing rapidly around this time. 1962 is the same year Sutherland (mentioned on p. 72) wrote Sketchpad, published in 1963, in which he invented graphical user interfaces, object-oriented programming, computer-aided design, and constraint-based programming, for example; and 1962 is the year IBM published their book on Project Stretch, edited by Werner Buchholz, a project which ran from 1954 to 1961. The first integrated circuits were fabricated around this time, the first real compilers were written in 1958–9, and the Lisp paper was published in 1959, based on an implementation on a vacuum-tube IBM computer. DEC had just been founded, and Spacewar! had just been written. The IBM 360 project, with its ambition to unify “business” and “scientific” computing on a single general-purpose computer, had not yet begun. Timesharing, like compilation, was an avant-garde technique — Stretch was designed to support timesharing operating systems, but most computers did not use them, or indeed any operating system. Nowadays, instead of “timesharing”, we call it “multitasking”.

(I am not sure when Expensive Typewriter, the first interactive word processor, was written.)

In short, this report was published immediately after the creation of the compiler, the high-level programming language (“problem-oriented language (e.g., ALGOL or COBOL)”, p. 17), the computer game, the operating system, and multitasking (“what is known as ‘time sharing’”, p. 70), and immediately before the publication of the graphical user interface and object-oriented programming.

Moreover, this was five years after Sputnik, and airlines had just started flying jet airplanes ten years before, raising attainable human velocity to some 300 m/s (600 mph in archaic units), up from the 8 m/s that had reigned from the domestication of the horse until the introduction of the automobile seven decades earlier; supersonic jets were being flown by the military, but the Concorde wouldn’t offer passenger supersonic service until 1976. In retrospect, the continuous, spectacular increase in human velocity Engelbart could look back on ended almost precisely in 1962, with the exception of a few dozen astronauts.

This was also the time period when full automation was being touted as an imminent end to work as such, before the difficulties of robotics were appreciated. Engelbart doesn’t mention this directly, but his report is part of the same intellectual ferment that sprang up to try to make sense of the possibilities of the new technologies.

Given this atmosphere of radical ferment, it’s interesting that a significant part of Engelbart’s paper (pp. 48–56) is dedicated to discussing “As We May Think”, published in 1945, 17 years earlier; Engelbart quotes its description of the Memex in its entirety. I’m not sure when Engelbart came upon the paper (and I missed my opportunity to ask him) but that 17-year gap is notable. He cites Licklider’s “Man-Computer Symbiosis”, various list-processing things (including IPL-V and LISP 1.5, on pp. 65–66), and a couple of items in passing from Ross Ashby, but in nothing like the depth he dedicates to the Memex.

Engelbart’s famous “Mother of all Demos” in 1968 was the result of six years of work at SRI attempting to realize the vision he set out in this 1962 report.

Are we augmented yet?

On p. 3, Engelbart explicitly cites the velocity progress I mention above as his antecedent:

there is no particular reason not to expect gains in personal intellectual effectiveness from a concerted system-oriented approach that compare to those made in personal geographic mobility since horseback and sailboat days.

This leads me to ask, 56 years later, have we achieved those gains? Can I do intellectual things now that would have taken 150 people to do in 1962, or can I do things now in a day that would have taken me four to six months then?

In a few domains, I think I can. In purely calculational terms, I have balanced on my paunch a laptop capable of about a hundred billion multiplies per second; its RAM is 4 gibibytes with a memcpy speed of about 2 gigabytes per second (4 gigabytes/second or 32 gigabits/second unidirectional), and its SSD holds 100 gigabytes of data and can be read at 75 megabytes per second (600 megabits per second). Stretch’s disk was 8 megabits per second, or 125 000 64-bit words per second, with a total size of 2 megawords (16 megabytes) and a seek time of 150 milliseconds (p. 20 of Buchholz), and it needed about 2.5 microseconds per instruction (p. 32 of Buchholz), some of which could be multiplies; its memory cycle time was 2.1 microseconds per 64-bit word (p. 17 of Buchholz), giving 32 megabits per second, with a maximum of 2¹⁸ words (2 mebibytes). Moreover, Stretch was shared by all of Los Alamos National Labs, which I think was several hundred people at the time, though perhaps only a few dozen of them had access to the computer.

(On pp. 66–67 Engelbart describes the capabilities of a typical computer of his day: 100,000 6-bit bytes of RAM costing 60¢–US$1.50 per byte, 2–10μs per machine instruction, with a megabyte-sized magnetic drum costing 5¢ per byte, or a hundred-megabyte-sized disk costing 0.14¢ per byte.)

So, roughly speaking, my laptop is 250,000 times faster than Stretch at multiplying (and arithmetic in general), 1000 times faster than Stretch at accessing its RAM, of which it has 2048 times as much, and 75 times faster than Stretch at accessing bulk storage, of which it has 6000 times as much; and this computing power is distributed among about 1000 times fewer people.
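As a sanity check, the arithmetic behind those ratios can be redone in a few lines of Python from the figures quoted above (the laptop figures are the rough nominal ones given earlier, not measurements):

    # Rough laptop-vs-Stretch ratios, using only the figures quoted above.
    laptop_multiplies_per_s = 100e9        # ~10^11 multiplies per second
    laptop_ram_bits_per_s = 32e9           # 32 gigabits/s unidirectional
    laptop_ram_bytes = 4 * 2**30           # 4 GiB
    laptop_disk_bits_per_s = 600e6         # 600 megabits/s
    laptop_disk_bytes = 100e9              # 100 GB

    stretch_instructions_per_s = 1 / 2.5e-6  # 2.5 us per instruction
    stretch_ram_bits_per_s = 64 / 2.1e-6     # 2.1 us per 64-bit word
    stretch_ram_bytes = 2**18 * 8            # 2^18 64-bit words = 2 MiB
    stretch_disk_bits_per_s = 8e6            # 8 megabits/s
    stretch_disk_bytes = 2e6 * 8             # 2 megawords of 64 bits = 16 MB

    print(laptop_multiplies_per_s / stretch_instructions_per_s)  # 250,000
    print(laptop_ram_bits_per_s / stretch_ram_bits_per_s)        # about 1000
    print(laptop_ram_bytes / stretch_ram_bytes)                  # 2048
    print(laptop_disk_bits_per_s / stretch_disk_bits_per_s)      # 75
    print(laptop_disk_bytes / stretch_disk_bytes)                # 6250, roughly 6000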

As a consequence of this extreme arithmetic power, it can do things like real-time raytracing, software-defined radio, and speech recognition. Mostly, however, this arithmetic is devoted to decompressing porn and other video. I don’t have any speech-recognition software on here.

I can and occasionally do use it for tasks like 3-D design, and designing and testing signal-processing algorithms in IPython, and I have a library of many technical papers, including the Engelbart report I’m currently reading. Due largely to format incompatibilities, I don’t have an easy way to search through the library for key words, although searching through an individual book takes only a few seconds. Quoting one document in another, as I did above, often requires manual retyping — evince’s user interface sometimes doesn’t respond to mouse drags, and it doesn’t offer an interface to easily display an excerpt from a PDF or DjVu file, even though those file formats contain internal indices to permit random access.

I can easily prepare a document plotting a bunch of equations and simulations, complete with nicely-typeset equations, using IPython; TeX and HTML provide me with document-preparation capabilities Engelbart’s whole team couldn’t pull together in 1962, even leaving aside the ability to distribute the results immediately to the world.
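For example, here is a minimal sketch of the kind of thing I mean, assuming numpy and matplotlib (whose mathtext typesets the equation in the title without needing a TeX installation):

    # Plot a damped cosine with its equation typeset in the title.
    import numpy as np
    import matplotlib.pyplot as plt

    f = 5                                    # an arbitrary frequency in Hz
    t = np.linspace(0, 1, 1000)
    x = np.exp(-3 * t) * np.cos(2 * np.pi * f * t)

    plt.plot(t, x)
    plt.title(r'$x(t) = e^{-3t} \cos 2 \pi f t, \quad f = 5\,\mathrm{Hz}$')
    plt.xlabel('t (seconds)')
    plt.savefig('damped-cosine.png')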

I wrote a Tetris game a couple of days ago. It took me three or four hours, using the C programming language from the 1970s, slightly augmented with type-checking from the 1980s. Using a more modern programming language would have simplified it only slightly. It doesn’t handle key-repeat in the way I would like, which is substantially more difficult due to decisions baked into the X11 protocol in the 1980s to make terminal emulators easier to write; fixing that will probably take several more hours.

When checking one of the assertions in Engelbart’s report, I quickly calculated how many words you’d need in a lexicon for it to cover half of the words in a sample corpus (95, for the British National Corpus). This took me 4 minutes.
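The calculation was roughly the following shape (a reconstruction, not the original script; the frequency-list filename and its count-then-word-per-line format are assumptions):

    # How many of the most common words cover half of the corpus?
    # 'bnc-frequencies.txt' is a hypothetical file of "count word" lines.
    counts = sorted((int(line.split()[0]) for line in open('bnc-frequencies.txt')),
                    reverse=True)
    total = sum(counts)
    running = 0
    for rank, count in enumerate(counts, 1):
        running += count
        if 2 * running >= total:
            print(rank, 'words cover half of the corpus')
            break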

Wikipedia, Stack Exchange, arXiv, the Internet Archive, and other sites with books and academic papers are perhaps the things that most closely approximate Engelbart’s vision of jet-like “gains in personal intellectual effectiveness”, since there I can easily get information about a wide range of topics. Unfortunately, though, they can only provide answers to questions that someone else has already learned the answers to; they are immensely useful for describing known consequences, but they are not useful for working out the consequences of situations I just dreamed up.

Consequently, I do most of my noodling in the same way I would have 56 years ago, with typed notes and a pencil and paper, but with quicker access to the library of knowledge represented by Wikipedia and the like, and the easier revision Engelbart suggests on p. 13 (though Emacs is considerably more fluid than the OCR stylus he suggests, more like the text editing process he suggests on p. 16, pp. 77–80, and p. 84). I have instant access to the worldwide community of knowledge workers, but this is useless; they largely waste their time on trivialities and infighting.

Nothing approaching Engelbart’s vision of the augmented architect has been realized, despite the extensive computerization of architectural drafting (p. 5):

He checks to make sure that sun glare from the windows will not blind a driver on the roadway, and the "clerk" computes the information that one window will reflect strongly onto the roadway between 6 and 6:30 on midsummer mornings. … Finally he has the "clerk" combine all of these sequences of activity to indicate spots where traffic is heavy in the building, or where congestion might occur, and to determine what the severest drain on the utilities is likely to be.

I conclude that, so far, Engelbart’s unfinished revolution is still unfinished.

The failure of diffusion

Remarks on pp. 16–17 express Engelbart’s forlorn hope that a centrally-planned research effort could “greatly accelerate” the evolutionary process of diffusion of innovations:

Normally the necessary equipment would enter the market slowly; changes from the expected would be small, people would change their ways of doing things a little at a time, and only gradually would their accumulated changes create markets for more radical versions of the equipment. Such an evolutionary process has been typical of the way our repertoire hierarchies have grown and formed.

But an active research effort, aimed at exploring and evaluating possible integrated changes throughout the repertoire hierarchy, could greatly accelerate this evolutionary process. The research effort could guide the product development of new artifacts toward taking long-range meaningful steps; simultaneously, competitively-minded individuals who would respond to demonstrated methods for achieving greater personal effectiveness would create a market for the more radical equipment innovations. The guided evolutionary process could be expected to be considerably more rapid than the traditional one.

Perhaps if he had studied the accumulating literature on, for example, diffusion of agricultural innovations in poor countries, he might have had a better plan. In retrospect, it took 20 years for even the beginnings of his innovations to be adopted by office workers, and 40 years for them to be broadly adopted by society.

How did Engelbart conceive training would happen?

Engelbart put a lot of emphasis on the importance of “training”, which is an important part of the process of diffusion of innovations, but he seems to have lacked an anthropological or ethnographic conception of how this training would come about; or at least he is silent on the topic.

It’s interesting that Engelbart takes as his starting point “a man”, that is to say, an individual, rather than an institution, grappling with problems. (The unexamined sexism in the terminology was de rigueur in formal contexts in the US in 1962.) To my mind, much of the underlying zeitgeist of the work is collectivist, in the sense that Engelbart imagines institutions supporting the individual (“pursuit by an enlightened society”); he speaks of “diplomats, executives, social scientists, life scientists, physical scientists, attorneys, designers,” perhaps with the implicit presumption that the users of his system will be supported not only with a powerful computer system but also with secretaries, administrative staff, a purchasing department, and so on. His “H-LAM/T” framework (p. 9) cites the necessity of “training”, but never discusses the social context of the training — is it provided to new employees by a company, individually undertaken by problem-solvers who want to get augmented, or required of the whole population by a government?

In the 1980s, when some of Engelbart’s ideas finally gained broad adoption, training was precisely the point on which he remained outside the mainstream, and which made his continuing research progressively less relevant to that mainstream; CHI or HCI (already purged of the implicit sexism of the term “man-machine interface”, or “man-artifact interface” as on p. 20) began to take advantage of findings from cognitive science with an eye to market competitiveness, while Engelbart did not.

On training, pp. 9–10:

[W]hile an untrained aborigine cannot drive a car through traffic, because he cannot leap the gap between his cultural background and the kind of world that contains cars and traffic, it is possible to move step by step through an organized training program that will enable him to drive effectively and safely.

Of course, part of this is that the first part of the report is purely descriptive; it is not prescribing courses of action so much as it is describing the human world as Engelbart sees it. On p. 30 he talks a bit more about the circumstances of training in the context of possible research programs:

For instance, some research situations might have to disallow changes which require extensive retraining, or which require undignified behavior by the human. Other situations might admit changes requiring years of training, very expensive equipment, or the use of special drugs.

(In this context it is worth pointing out that the Tuskegee Experiment was still ongoing at the time, no such thing as an IRB existed, LSD was still legal, and the MKUltra research program was in full swing, although it wouldn’t be revealed to the public until 1975.)

This is unfortunate in part because the advent of computer games showed that computers were capable of producing fairly extreme learning performance by setting up a sort of Skinner box. But Engelbart was writing too early, and on the wrong coast, to have observed Spacewar!, and in any case computer games didn’t really have a significant number of players until the 1970s. Consequently his focus, in practical terms, was mostly on designing new artifacts (in his H-LAM/T breakdown) to perform work for humans, then training humans to use them, rather than on designing new forms of training.

The lack of emphasis on communication

As I said above, Engelbart’s focus is primarily individualist, focusing on the individual in their efforts to solve problems, but implicitly collectivist; he treats communication and community as secondary to individual thought and action, e.g., p. 22:

Humans made another great step forward when they learned to represent particular concepts in their minds with specific symbols. Here we temporarily disregard communicative speech and writing, and consider only the direct value to the individual of being able to do his [sic] heavy thinking by mentally manipulating symbols instead of the more unwieldly [sic] concepts which they represent.

In a sense, this is the same lacuna represented by the mysterious absence of discussion of the social context of training.

He says “temporarily,” but so far I have not encountered the point where he returns to analyze communication in that way.

On pp. 90–91, he does mention the utility of rich hypertext structures with typed links for communication, and even talks a bit about pedagogy:

Well, when you ever get handy at roaming over the type of symbol structure which we have been showing here, and you turn for this purpose to another person's work that is structured in this way, you will find a terrific difference there in the ease of gaining comprehension as to what he has done and why he has done it, and of isolating what you want to use and making sure of the conditions under which you can use it. This is true even if you find his structure left in the condition in which he has been working on it--that is, with no special provisions for helping an outsider find his way around. But we have learned quite a few simple tricks… Some of these techniques are quite closely related to those used in automated-instruction programming--perhaps you know about 'teaching machines?'

The power of the unpredictable and of irrationality

Engelbart touts unpredictable processes, claiming that the power to use them is one of the largest benefits of the computerized augmentation he proposes (p. 45):

When the course of action must respond to new comprehension, new insights and new intuitive flashes of possible explanations or solutions, it will not be an orderly process. Existing means of composing and working with symbol structures penalize disorderly processes very heavily, and it is part of the real promise in the automated H-LAM/T systems of tomorrow that the human can have the freedom and power of disorderly processes.

Simultaneously, though, he perceives the subconscious mind as mostly an enemy due to its irrationality:

Clinical psychology seems to provide clear evidence that a large proportion of a human's everyday activity is significantly mediated or basically prompted by unconscious mental processes that, although "natural" in a functional sense, are not rational. … It may be that the first stages of research on augmenting the human intellect will have to proceed without being able to do anything about this problem [emphasis mine] except accommodate it as well as possible.

On reasoning and argument structures

On p. 62 Engelbart is discussing the process of revising a research design:

…when several consideration statements bore upon a given product statement, and when that product statement came to be modified through some other consideration, it was not always easy to remember why it had been established as it had. Being able to fish out the other considerations linked to that statement would have helped considerably.

Modern automated software testing supplies this in a somewhat restricted form: if you change the code and some tests start failing, you can go back and read those tests to find out what other considerations led to the code behaving as it had; but of course not everything can be described in code yet. Proof assistants perform a similar task in a more rigorous fashion for mathematical proofs. Requirements-management systems like DOORS also attempt to do this for more general structures of non-machine-readable statements.
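Here is a toy sketch of what I mean (mine, not Engelbart’s, and not from any real project): each test records a consideration bearing on a product statement, so that when a later change breaks the behavior, the failing test names the forgotten consideration.

    # Product statement: prices are parsed as exact integer cents, never as
    # binary floating point.
    def parse_price(text):
        dollars, _, cents = text.strip().lstrip('$').partition('.')
        return int(dollars) * 100 + int((cents or '0').ljust(2, '0')[:2])

    def test_no_float_rounding():
        # Consideration: 0.1 + 0.2 != 0.3 in binary floating point, which in
        # this fiction once produced off-by-a-cent totals; hence integer cents.
        assert parse_price('$0.10') + parse_price('$0.20') == parse_price('$0.30')

    def test_bare_dollar_amount():
        # Consideration: the upstream feed sometimes omits the cents field.
        assert parse_price('$3') == 300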

He goes into more detail about his proposed antecedent-consequent links on pp. 84–88.

Chording keysets and premature commitment

On pp. 74–79, Engelbart describes the chording keysets that his system used into the 1980s:

He could hit a great many combinations of keys on his keyset--i.e., any one stroke of his hand could depress a number of keys, which gave him over a thousand unique single-stroke signals to the computer with either hand.

He was quite optimistic about the speed of this system, with a reasonable information-theoretic basis:

It seems that, for instance, the 150 most commonly used words in a natural language made up about half of any normal text in that language. Joe said that it was quite feasible to learn and use the single-stroke abbreviations for about half of the words he used, but beyond that each added percent began to require him to have too many abbreviations under his command. …

(As it turns out, the 95 most commonly used words, without lemmatization or even stemming, make up half of the text in the British National Corpus; the 150 most common words make up 54.43% — though this disregards the tail of words that occurred fewer than 5 times, which might indeed amount to the 9% of the total that would make his statement correct. Thanks to augmentation, I could calculate this in about four minutes by writing an 8-line Python script. Doing the same exercise on words extracted from an earlier draft of this note, I needed 179 words, drawn from the 223 most common words in the BNC, to reach 50% of the words in this note; some common words, such as “she”, “her”, and “know”, did not appear. 6.3% of the total words in this note, such as “1968”, “lemmatization”, and “keysets”, don’t appear in the BNC at all. Incidentally, the point in the curve at which a given word makes up 0.1% of the corpus — so it could conceivably be efficient to enter it with a 10-bit abbreviation code — is just about that 50th percentile (95 words) in the BNC; this note is too short to give good statistics at that level. This greater rigor brought the total to almost 40 minutes rather than 4.)

A whole word so abbreviated saved typing all the letters as well as the spaces at either side of the word, and a word-ending abbreviated by a single stroke saved typing the letters and the end-of-word space. He claimed that he could comfortably rattle off about 180 words a minute--faster than he could reasonably talk. … He made some brief references to statistical predictions that the computer could make regarding what you were going to type next, and that if you got reasonably skillful you could "steer through the extrapolated prediction field" as you entered your information…

In practice, though, the chording keysets constructed by his team didn’t really reach above about 50 words per minute, according to anecdotes. Part of this may have been that they ended up using only one key per finger (the “over a thousand” quoted above suggests two bits per finger, e.g., four keys per finger, or two keys per finger that can be concurrently depressed), but another part may have been the lack of the extensive abbreviation dictionary he described.
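The combinatorics behind that parenthetical, assuming a five-finger keyset: one binary key per finger gives only 31 nonempty chords, so getting “over a thousand” single-stroke signals needs roughly two bits, i.e. four distinguishable states, per finger.

    # Distinct nonempty chords on a five-finger keyset, as a function of how
    # many states each finger can signal (counting "idle" as one state).
    for states_per_finger in (2, 3, 4):
        chords = states_per_finger ** 5 - 1   # subtract the all-idle non-chord
        print(states_per_finger, 'states per finger:', chords, 'chords')
    # Gives 31, 242, and 1023 chords respectively; 1023 is the "over a thousand".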

This leads one to ask: why didn’t Engelbart give up on his own keyset design at some point and use a Stenotype keyboard? Stenotype operators regularly reach well over the 180 words per minute he was hoping for. Instead, he apparently continued to hope that the design he came up with in 1962, before doing any experiments, would eventually pan out. That was a terrible idea; it prevented him from taking advantage of the 20+ years of experience he gained in the meantime.

Similarly, the outliner structure of NLS and Augment already appears on p. 84.
