Short words

Kragen Javier Sitaker, 2019-12-10 (updated 2019-12-11) (4 minutes)

I like programming-language tokens to be short, and when they are alphabetic, I like them to be comprehensible words. This can make code look like Urbit, but no matter. In particular I think Perl's "my" is significantly better than "var", "val", or "let", and JS's "const" is a lose. Similarly, Darius Bacon's language Cant uses "#yes" and "#no" rather than "true" or "false".

The most frequent 128 three-letter words in English, according to the British National Corpus, are:

the and was for you are not but had his she her all has one can who him its two out did now may new any see how get way our got own too say erm day yes man use put old why off end men set yet six war car saw let far law big act job age run try pay ten mrs ago ask few air god sir lot cos bed tax top art cut bad per boy bit son sea red nor gon low buy sat met cup oil led lay eye arm win hot sun ran box sit tea won sex add aid dog key mum bar eat mhm gas hit dad dry fit inc aim due die leg ian bus aye ltd tom

Most of these are at least real words, though "Inc.", "Ltd.", "Mrs.", "cos", "gon", "mhm", "Ian", and "erm" did make it in there, some of which may not be real words, depending on your definition. The last word in the list, "tom" (presumably mostly "Tom"), occurs 5063 times.
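As a rough sketch of how a list like this can be produced, assuming nothing about the actual corpus files: count lowercased alphabetic tokens, keep the ones with the right number of letters, and take the most frequent. The function names `count_words` and `top_words` and the sample sentence are made up for illustration. Folding everything to lowercase is also how names like "Tom" and "Ian" would end up in the list as "tom" and "ian".

```python
import re
from collections import Counter

def count_words(text):
    """Count lowercased alphabetic tokens in text."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def top_words(counts, length, n):
    """Return the n most frequent words of exactly `length` letters."""
    same_length = [(w, c) for w, c in counts.items() if len(w) == length]
    # Sort by descending count, then alphabetically to break ties.
    same_length.sort(key=lambda wc: (-wc[1], wc[0]))
    return [w for w, _ in same_length[:n]]

# A made-up sentence standing in for a real corpus.
sample = "the cat and the dog saw the man and the cat ran off"
counts = count_words(sample)
print(" ".join(top_words(counts, 3, 4)))  # prints "the and cat dog"
```

With a real frequency list, the same `top_words(counts, 3, 128)` call would give a 128-word vocabulary of three-letter tokens.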

There are 39 two-letter words that occur more frequently than "tom" does:

of to in it is on he be by at as or we an do if so no up my me go er us oh mr mm ca am uk wo na de st dr ah ii tv ec

These are mostly words, but it starts getting pretty dubious at the end there.
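The two-letter list is cut off by a frequency threshold (the 5063 occurrences of "tom") rather than by a fixed list size. A minimal sketch of that kind of cut, with a hypothetical helper `words_above` and toy counts that only exercise the function:

```python
from collections import Counter

def words_above(counts, length, threshold):
    """Words of exactly `length` letters occurring more than `threshold` times,
    most frequent first, ties broken alphabetically."""
    return sorted(
        (w for w, c in counts.items() if len(w) == length and c > threshold),
        key=lambda w: (-counts[w], w),
    )

# Toy counts from a made-up sentence, not corpus data.
counts = Counter("it is on if it is so it is on or off".split())
cutoff = counts["so"]  # plays the role that the count of "tom" plays above
print(words_above(counts, 2, cutoff))  # prints ['is', 'it', 'on']
```

A threshold cut lets the list size vary with word length, which is why the same bar yields very different counts for two-letter and four-letter words.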

There are 320 four-letter words that are as common as "tom", but 320 is too large a vocabulary. The most common 128 are:

that with have this they from were been will what said more them some into then time like only your just also know well very than most over back much many yeah work down make good such year must last take even here come both does made same when want life need used home each part went look came four give mean next case find long five says took away seen fact less done area help hand best head side days john left week form face room tell able high told half eyes keep once road open full knew feel ever name mind door body book main show upon gave real view line city felt kind idea read sort care else free thus past love play land gone

The most common 128 words in English, other than those mentioned above, are:

a i which there their would about could other these people first should think between years being those because three through still after right going before government might under however world another while again against never something thought house number different really children within always without local system great during small place although little things social group second quite party every company women later given important point information national often school money public night further found since better around british having thing london taken perhaps state family water though already possible nothing where business large young whether enough development country almost council power until himself political become times service members change problem doing court towards major anything others police either problems interest probably asked available labour today education

A potential disadvantage of using real words in your programming language is that people are more likely to want them as identifiers, and depending on the language design, that may or may not be possible.
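Python makes a convenient illustration of the collision, since its standard `keyword` module reports which words are reserved. The helper name `usable_as_identifier` is made up for this sketch; the word list mixes tokens mentioned above with common short words that Python happens to reserve.

```python
import keyword

def usable_as_identifier(word):
    """True if `word` is a valid Python identifier and not a reserved word."""
    return word.isidentifier() and not keyword.iskeyword(word)

for word in ["my", "let", "yes", "if", "for", "not"]:
    print(word, usable_as_identifier(word))
# my True
# let True
# yes True
# if False
# for False
# not False
```

Other designs avoid the clash. Perl's variables carry sigils, so "my" cannot collide with a variable name, and Python's newer soft keywords such as `match` are recognized only in particular syntactic positions and so remain usable as identifiers.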