The ultimate capacity of human memory if spaced-practice memorization works as advertised

Kragen Javier Sitaker, 2017-01-04 (updated 2017-01-08) (14 minutes)

Here I am going to consider learning with ulterior motives — the learning we do in order to become more human and more capable, the ostensible motive of an educational system, rather than the learning we do purely for pleasure. None of what follows applies to recreational learning, because it all considers time applied to studying as a cost, not a benefit.

The spaced-practice psychology literature generally finds that the ideal spacing interval is 10% to 20% of the time you want to remember something: say, 15 years if you want to remember it for your entire life. You would think that this would imply you need exponentially increasing study intervals, since it would seem that your chances of remembering that ∫cot x dx = ln |sin x| + C if you hear it once every 15 years are very poor indeed. So far, though, no convincing experimental evidence confirms this apparent common-sense conclusion; the studies that have been done did not find evidence that such “expanding-interval” spaced practice is superior.
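As a rough illustration of the 10%-to-20% rule of thumb, here is a minimal Python sketch. The 75-year retention target, the 13-year window, and the function names are assumptions chosen for illustration, not figures from the psychology literature.

```python
def ideal_gap_range(retention_years, low=0.10, high=0.20):
    """Return the (shortest, longest) study gap in years, under the rule
    of thumb that the gap should be 10% to 20% of the retention time."""
    return retention_years * low, retention_years * high


def max_presentations(window_years, gap_years):
    """How many presentations fit in a window if they are gap_years apart."""
    return int(window_years // gap_years) + 1


# Assumption: "your entire life" means roughly 75 years of retention.
shortest, longest = ideal_gap_range(75)
print(f"gap between {shortest:.1f} and {longest:.1f} years")

# Assumption: a 13-year window, such as the span of K-12 schooling.
print(max_presentations(13, longest), "presentation(s) fit in 13 years")
```

Under these assumptions the longest gap is about 15 years, so a 13-year window has room for only a single presentation.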

However, it seems clear that the schooling system as it stands is at the opposite extreme. K-12 is 13 years of 36 weeks of 30 hours of torture, about 14000 hours of institutional child abuse in total. Abusive treatment aside, that’s a massive cost: about 7 years of normal working hours or 2.4 years of waking hours. You would like such a massive cost to be well spent, providing an equivalently massive benefit, for example in education. But by concentrating those hours in 13 years, we guarantee that any learning point with a 15-year practice interval is presented at most once; the possible degree of recall after a single presentation is very low even when testing is the next day, let alone decades later.
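The cost figures above can be checked with a few lines of arithmetic. This is only a sketch: the 2000-hour working year and the 16 waking hours per day are assumptions that happen to reproduce the numbers in the text.

```python
# Figures quoted in the text.
YEARS, WEEKS_PER_YEAR, HOURS_PER_WEEK = 13, 36, 30
k12_hours = YEARS * WEEKS_PER_YEAR * HOURS_PER_WEEK

# Assumptions, not from the text: 50 weeks x 40 hours of work per year,
# and 16 waking hours per day.
WORK_HOURS_PER_YEAR = 50 * 40
WAKING_HOURS_PER_YEAR = 16 * 365

working_years = k12_hours / WORK_HOURS_PER_YEAR
waking_years = k12_hours / WAKING_HOURS_PER_YEAR

print(f"{k12_hours} hours of K-12")
print(f"about {working_years:.1f} working years")
print(f"about {waking_years:.1f} waking years")
```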

So, in order to achieve any kind of reasonable level of recall at all, schools waste the students’ time with massed practice, guaranteeing that nearly all of what is taught will be forgotten by adulthood. Worse, the waste of practice-massing in schools is fractal, happening at many levels:

  1. K-12 schooling is massed in 13 years rather than being spread throughout a person’s entire life;
  2. taking courses on different topics each year guarantees the loss of most learning even before K-12 schooling ends;
  3. summer vacation guarantees the loss of nearly all new learning from the last months of the school year, and although this could be avoided by devoting those months exclusively to review of earlier material, this is not done;
  4. moving from unit to unit within each course, with little attention given to material from previous units, guarantees the loss of most learning from the course even before the end of the course;
  5. pre-scheduled large examinations incentivize students to “cram”, massing their practice in the days immediately before the examination, in order to cheat the examination into indicating a level of mastery of the material that they do not in fact possess, or, more precisely, will lose within a week or two;
  6. and, although the evidence for this is less clear in psychological studies, hour-long classes are very likely to result in poorer retention from one day to the next than if they were split into two half-hour chunks at different times of day. Teachers assign homework to compensate somewhat for the loss of retention, but this worsens rather than improves the waste of time.

Psychological studies have shown 40% increases in learning throughput (that is, 70% as much study time to reach the same level of achievement) from adding a single level of distribution to practice sessions, which is something more than two standard deviations. If we could increase learning throughput by 40% six times in succession, the total increase would be 650%. A person whose life had not been wasted by any of these six levels of massing could — very speculatively! — perhaps learn seven times as much per hour of study, reaching apparently superhuman levels of intellectual achievement, at no increase in total study time.
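The compounding here is simple enough to sketch in Python. It assumes, as the speculation above does, that each level of distribution multiplies throughput independently by the same factor; nothing in the studies cited guarantees that.

```python
def compounded_speedup(gain_per_level, levels):
    """Total throughput multiplier if each level multiplies throughput
    by (1 + gain_per_level), independently of the others (an assumption)."""
    return (1.0 + gain_per_level) ** levels


single = compounded_speedup(0.40, 1)
total = compounded_speedup(0.40, 6)

print(f"one level: {1 / single:.0%} of the original study time")
print(f"six levels: {total:.2f}x throughput, a {total - 1:.0%} increase")
```

Six levels at 40% each come to a multiplier of about 7.5, which is where the figures of 650% and roughly seven times come from.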

However, this learning would be spread across a lifetime rather than concentrated in the 5–18 range. If we spread it across the 70-year-long 5–75 age range, it would be only 200 hours per year (33 minutes per day). If we were to accept the optimistic 650% improvement speculation above, this would be equivalent to some 1500 hours of regular K-12 schooling per year, only moderately better than the 1080 hours in the standard system. You would only start to see a really significant difference after age 18, when the students using a program rationally designed to optimize their learning, rather than subjugate them and provide fake “educational achievement” on tests, continued learning, and their knowledge and competence continued to grow, while the victims of industrial-age schooling instead began the inexorable intellectual decline that is such a universally-remarked-upon phenomenon today.

(These numbers change a bit if we include “higher education”, which, for an undergraduate degree, is typically about 5 years of about 40 hours a week, 28 weeks a year, between lectures, labs, and homework, about 5600 more hours; most of the same criticisms can be applied to it, some a fortiori. This would bring us to almost 20000 lifetime hours of schooling, 280 hours a year if spread across 70 years.)
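A short sketch of the spreading arithmetic, using only the figures given in the text. The function names are illustrative, and the compounded multiplier is the speculative one from earlier, not a measured value.

```python
K12_HOURS = 13 * 36 * 30        # years x weeks x hours per week
HIGHER_ED_HOURS = 5 * 28 * 40   # years x weeks x hours per week
SPREAD_YEARS = 75 - 5           # study spread over ages 5 to 75


def hours_per_year(total_hours, years=SPREAD_YEARS):
    """Average yearly study time if total_hours is spread evenly."""
    return total_hours / years


def minutes_per_day(yearly_hours):
    """Convert hours per year into minutes per calendar day."""
    return yearly_hours * 60 / 365


k12_rate = hours_per_year(K12_HOURS)
with_college_rate = hours_per_year(K12_HOURS + HIGHER_ED_HOURS)

# Speculative: six compounding 40% gains in learning throughput.
equivalent_k12_hours = k12_rate * 1.4 ** 6

print(f"K-12 only: {k12_rate:.0f} h/yr, {minutes_per_day(k12_rate):.0f} min/day")
print(f"with higher education: {with_college_rate:.0f} h/yr")
print(f"speculative equivalent: {equivalent_k12_hours:.0f} K-12 hours per year")
```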

This even allocation of 200 hours per year represents a tradeoff, though, and probably not a good one. If you learn a skill at 10 years old and die at 75, you can use that skill to your advantage for 65 years. But if you learn it at 65 years old, you can only use it to your advantage for 10 years. This is the logic that underlies the traditional soul-destroying system of studying first, then working, and in itself it isn’t flawed; it just doesn’t take into account the psychological discoveries made since the 18th century. It’s yet another form of the exploration-exploitation tradeoff, a general feature of bandit problems.

A better tradeoff would weight the expense of learning toward the beginning of life, while leaving the human more time to exploit their skills toward the end of their life. The optimal curve depends on many factors, including your model of how skills improve life utility, how studying fatigues you, how risk-tolerant you are about dying early or late, the probability of dramatic human life extension, and so on, but a linear reduction from 400 hours per year at 5 years old down to 0 hours per year at 75 seems likely to capture the majority of the benefits from the optimal curve.

On this study schedule, you start by studying 66 minutes per day on your 5th birthday and taper down by about 56 seconds per year (154 milliseconds per day) until reaching 0 on your 75th birthday. On your 18th birthday, when you would normally graduate from high school, you are down to studying almost 54 minutes per day, and you’ve spent 4740 hours of study, about a third of what the victims of high school have. But, because they were in a system optimized to waste their time, you’ve learned more than twice as much.
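The linear schedule can be sketched as follows. It treats the taper as continuous and uses a 365-day year, both of which are simplifying assumptions. With exact arithmetic the total at age 18 comes out near 4700 hours, slightly below the figure in the text, which presumably reflects rounding.

```python
START_AGE, END_AGE = 5, 75
START_RATE = 400.0  # hours per year at START_AGE


def yearly_rate(age):
    """Study rate in hours per year: a linear taper from START_RATE at
    START_AGE down to 0 at END_AGE, and 0 outside that range."""
    if not START_AGE <= age <= END_AGE:
        return 0.0
    return START_RATE * (END_AGE - age) / (END_AGE - START_AGE)


def cumulative_hours(age):
    """Hours studied from START_AGE up to `age` (area of a trapezoid)."""
    age = min(max(age, START_AGE), END_AGE)
    return (yearly_rate(START_AGE) + yearly_rate(age)) / 2 * (age - START_AGE)


def daily_minutes(age):
    """Minutes of study per calendar day at a given age."""
    return yearly_rate(age) * 60 / 365


for age in (5, 18, 75):
    print(f"age {age}: {daily_minutes(age):.1f} min/day, "
          f"{cumulative_hours(age):.0f} hours so far")
```

The lifetime total is the area of a triangle, 400 × 70 / 2 = 14 000 hours, which matches the K-12 total that the schedule was chosen to equal.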

How much is that? If we take the 1.8 minutes per “card” in Gwern’s review of spaced repetition systems as a good approximation, it would be about 158 000 “cards”, somewhat more than the users of these systems report having memorized, somewhat more than the number of words in the Oxford English Dictionary. Of course, most knowledge cannot be separated into isolated “facts” or “cards” in this way, but I suspect that improves the situation for our optimized learners — when you work on a real-world problem, you are inevitably practicing many procedural skills at once, while recalling a flash card only reminds you of a single fact.

This suggests that the speculation in my 2010 post, “could we learn a new foreign language every week?”, might actually be plausible, as unlikely as it sounds. The 70-hour budget mentioned there works out to about 2 hours per year or about 2300 “cards”, which is not far from the size of existing Anki decks designed to reach basic competency in a foreign language; that post estimates that you really need more like 8500 cards’ worth, which would work out to 260 hours rather than 70.
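The conversions between study hours and “cards” used in the last two paragraphs can be written out as a small sketch. The 1.8 minutes per card is the figure cited in the text; treating it as a constant lifetime cost per card is a simplification.

```python
MINUTES_PER_CARD = 1.8  # total review time per card, as cited in the text


def cards_for_hours(hours):
    """Number of cards that a given study budget can maintain."""
    return hours * 60 / MINUTES_PER_CARD


def hours_for_cards(cards):
    """Study budget needed to maintain a given number of cards."""
    return cards * MINUTES_PER_CARD / 60


print(f"4740 hours -> {cards_for_hours(4740):.0f} cards")
print(f"70 hours   -> {cards_for_hours(70):.0f} cards")
print(f"8500 cards -> {hours_for_cards(8500):.0f} hours")
```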

If you devote an hour a day to practice in a system like Anki, it’s reasonable to add about 30 “cards” per day, so you can’t compress those 8500 cards into less than about 280 days, and that only works if that’s the only subject in which you’re adding new “cards”. While this takes into account that you’re continuing to practice other things during that time, it seems likely that learning will be more efficient if you aren’t adding material entirely from a single subject and instead space out the addition of similar things. I’ve found, for example, that when memorizing the atomic weights in the periodic table, I frequently confuse elements that I add on the same day (especially if they’re otherwise similar), and in my hanzi recall practice, I’m currently confusing the characters 是 and 在, even though they look nothing alike, just because they’re the same kind of thing and I learned them at about the same time.

However, if you add those 8500 cards over the course of, say, four years, you could easily speak three foreign languages at a basic level by the time you’re 18, at a total cost of about 800 hours (three times the roughly 260 hours per language estimated above) out of the 4700 total prescribed above.
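To compare the two pacing options, here is a minimal sketch. The 30 new cards per day and the four-year spread come from the text; the helper names and the 365-day year are assumptions for illustration.

```python
import math


def days_to_add(cards, new_per_day):
    """Minimum number of days needed to introduce all the cards."""
    return math.ceil(cards / new_per_day)


def new_cards_per_day(cards, years):
    """Average daily rate of new cards if spread evenly over `years`."""
    return cards / (years * 365)


fastest = days_to_add(8500, 30)
relaxed = new_cards_per_day(8500, 4)

print(f"at 30 new cards/day: {fastest} days")
print(f"over four years: about {relaxed:.1f} new cards/day")
```

Spreading one language over four years means adding only about six new cards a day, which leaves most of each day's new material free for other subjects.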

(It seems likely, though not certain, that acquiring languages has a critical period in childhood and later becomes much more difficult or even impossible to do in the same way, so it might be better for small children to spend more of their time on foreign-language learning and less on, say, math and history.)

At 8500 cards per language and the 158 000 cards estimated above for study through age 18, you could expect to speak about 19 languages with basic fluency if you were to focus entirely on foreign-language learning, all in about a third of the time a normal person loses to K-12 education. This may not be the best use of your time, but it gives a very plausible example of the kind of superhuman intellectual achievements we could expect with an optimized learning system, even if all we optimize is the practice schedule. Note that this number is not computed from the speculative exponential computation of the waste of traditional K-12 education above, but from observed time per card in existing spaced-repetition systems and a speculative guess that you can achieve basic fluency (roughly equivalent to that of a six-year-old native speaker, who has some 6000 vocables) with 8500 cards per language.

As some kind of indication that such large improvements are possible, I’ve been studying the Hebrew aleph-bet using Anki for the past week. So far I’m at 80% correct on the review cards and have spent a total of 21 minutes on its 22 cards over those 7 days. I’d probably be doing better if I hadn’t added half the cards all at once on Saturday, resulting in confusion between some letters like taf and tet; it will probably take me a total of more than 40 minutes to finish memorizing the whole thing. But how long do Israeli children or Hebrew students normally spend learning the alphabet? I think it’s typically several hours.

Above I’ve talked about how you could spend the same 14 000 hours of K-12 education, or the 20 000 hours of schooling through an undergraduate degree, in a much more efficient manner, learning several times as much, at the same cost in time and probably without even learning more slowly. However, in practice, the Jevons paradox will probably kick in. Just as improving the efficiency of steam engines, allowing them to do more work on the same amount of coal, led to their use in new applications and increased the total consumption of coal, it seems likely that if time spent learning things is dramatically more effective, people will consider it worthwhile to learn more things.

So, a typical optimized learner probably will not spend only an hour a day studying in this new, more efficient manner; they might be unable to resist spending two or three hours a day, even if the learning is not quite as efficient, learning perhaps 50% or 100% more. At this rate, an average learner might complete the equivalent of an undergraduate degree’s 20 000 hours in 6000 hours, sometime around age 12. (XXX actually do the calculation!)
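One rough way to sanity-check the age estimate is sketched below. It is not the calculation the parenthetical asks for, only a back-of-the-envelope version under loud assumptions: study starts at age 5, continues every day of the year at a constant rate, and the 6000-hour figure is taken as given.

```python
def finish_age(total_hours, hours_per_day, start_age=5.0):
    """Age at which total_hours of study is reached, assuming constant
    daily study every day of the year from start_age (an assumption)."""
    return start_age + total_hours / (hours_per_day * 365)


# Implied efficiency: 20 000 conventional hours done in 6000 hours.
implied_speedup = 20000 / 6000

for daily in (2, 3):
    print(f"{daily} h/day: done at about age {finish_age(6000, daily):.1f}")
print(f"implied speedup: about {implied_speedup:.1f}x")
```

Under these assumptions the finishing age falls between roughly 10 and 13, which brackets the estimate of around age 12.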
