Better be weird

Kragen Javier Sitaker, 2019-06-17 (updated 2019-06-24) (9 minutes)

(Revised from some comments I posted on the orange website.)

Why is it crucially important, as Feynman thought, to disregard others’ opinions in order to be productive? In his case, part of the problem was that he was suffering from a fear of failure due to others’ high expectations of him. He was so worried about looking foolish that he couldn’t play with new ideas in the way that made him productive. But there’s a deeper and broader reason, one that goes far beyond performance anxiety.

In fields like art, programming, and physics, you’d better be doing something weird. If you’re doing something mainstream, the same thing thirty other people around the world are doing, you’re all competing to make the same thing — paint the same painting, write the same text editor, prove the same theorem about black holes. Twenty-nine of you are going to get scooped by whoever is the hardest-working, the smartest, the best-supported institutionally, or whatever combination of those turns out to be the deciding factor. Your chance of being in that 97% who have totally wasted their efforts? 97%.

And if you’re doing something really mainstream, like writing the next big client-side JavaScript framework, the one that will replace React, watch out! Your chances are a lot worse than that, because there are hundreds of thousands of people who fight with React every day and are frustrated with its shortcomings. Your chances are literally millions to one, unless you work at Microsoft or Google and have management support to beat those Facebook fuckers.

But if you’re doing something offbeat, working on a problem† that’s mainstream enough to be interesting if you’re successful but not mainstream enough that dozens of people are already spending their weekends trying to solve it, you have a much better chance of finding a niche for your project. Maybe it’s non-mainstream because people take for granted that it can’t be solved (in which case they might be right, as with the reactionless drive — the objective here is to be weird in your project goal, not your epistemology); maybe because they don’t understand why it would be important to solve it (“Where’s the market?”), and you do; maybe, as with OpenSSL, it’s an important problem, but there’s no way to get paid for solving it.

A thousand hackers writing a thousand versions of the same library in the same way are only epsilon more productive than one hacker. A thousand hackers writing a thousand different libraries are almost a thousand times as productive.

Winning the lottery? Well, that’s pretty much out of your control — but if you do decide to waste your money on the lottery, don’t pick a number lots of other people are picking. Then you’ll have to split the already-improbable winnings N ways.

But what if you’re determined to solve a mainstream problem anyway, one where a thousand people are also trying to solve it? Then you need all the outcome variance you can get! If, of those thousand hackers, 500 are using a very conservative approach that is guaranteed to solve the problem with some quality metric 10 ±1 in 26 weeks ±2 weeks (these being the standard deviations, not some kind of 95% confidence interval), while the other 500 are using all kinds of wild approaches that give them quality metric exp(ln(4)±ln(4)) in time exp(ln(52)±ln(4)) weeks, it’s pretty much guaranteed that the “winner” is going to be someone with a totally insane approach that hacked together a library with quality 49 in only 14 weeks. It’s not going to be one of the 26-week plodders, because 14 weeks is 6 standard deviations out on their distribution (vs. 0.95 out on the crazy hackers’ distribution), and quality 49 is 39 standard deviations out (vs. 1.8 out on the crazy hackers’ distribution).
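The arithmetic above is easy to check by simulation. Here’s a quick Monte Carlo sketch in Python with NumPy — the 500/500 split and the distribution parameters come straight from the paragraph above; everything else (the trial count, the two win criteria) is made up for illustration. It asks how often the best-quality library, and the earliest ship, come from the wild-approach group.

```python
import numpy as np

rng = np.random.default_rng(0)
trials, n = 2000, 500
wild_best_quality = 0   # trials where a wild hacker ships the best library
wild_ships_first = 0    # trials where a wild hacker ships soonest

for _ in range(trials):
    plod_q = rng.normal(10, 1, n)                     # quality 10 ± 1
    plod_t = rng.normal(26, 2, n)                     # time 26 ± 2 weeks
    wild_q = rng.lognormal(np.log(4), np.log(4), n)   # exp(ln 4 ± ln 4)
    wild_t = rng.lognormal(np.log(52), np.log(4), n)  # exp(ln 52 ± ln 4)
    if wild_q.max() > plod_q.max():
        wild_best_quality += 1
    if wild_t.min() < plod_t.min():
        wild_ships_first += 1

print(wild_best_quality / trials, wild_ships_first / trials)
```

Run it and both fractions come out at essentially 1.0: with 500 draws from a distribution that fat-tailed, somebody always lands far beyond anything the plodders can reach, in both directions.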

In fact, it’s even worthwhile to sacrifice expectation to get higher variance in these situations. If your only hope of winning is to beat everyone else in a single round, you should do whatever will increase your minuscule chance of a home run, regardless of how it affects your chances of striking out.
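A toy simulation makes the tradeoff concrete. All the numbers here are invented for illustration: 999 rivals with quality 10 ± 1 in a winner-take-all contest, a “safe” entrant with a slightly *better* mean, and a “risky” entrant with a worse mean but five times the spread.

```python
import numpy as np

rng = np.random.default_rng(1)
trials, rivals = 20_000, 999
wins_safe = wins_risky = 0

for _ in range(trials):
    best = rng.normal(10, 1, rivals).max()   # the field's best result
    # safe entrant: better expectation, ordinary variance
    wins_safe += rng.normal(10.5, 1) > best
    # risky entrant: worse expectation, much higher variance
    wins_risky += rng.normal(8, 5) > best

print(wins_safe / trials, wins_risky / trials)
```

The risky entrant wins an order of magnitude more often, despite the lower expected quality — exactly the home-run-vs.-strikeout logic above.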

It’s still not socially optimal for people to behave this way — note that here you have 1000 hackers whose aggregate productivity is only about 20× the productivity of an average plodder — but if you’ve gotten suckered into competing for a mainstream niche, that’s the way to play the game.

All of the above is for the simplified situation where you’re working on a project by yourself. In a teamwork situation, the relevant actor is your team, not you individually. Do not write your code in Clojure if the rest of the team is working in Ruby. Do not try to solve only problems that nobody else on the team thinks are important.

And this advice definitely does not apply to a situation where doing the same thing someone else already did is valuable. If you’re making a sandwich, there’s no reason it needs to be different from the sandwich someone else is making across the street. They’re two different sandwiches. If someone eats the sandwich across the street, it’s gone and it can’t feed your customer. They’re going to be happy if you make them a sandwich they like, even if it’s a little worse than the sandwich across the street; even if there are many just like it, this sandwich is theirs. This is very different from the situation in software, where one sandwich feeds everybody in the world at once, except people with celiac disease. Nobody is going to be happy that you wrote a web browser from scratch for them instead of just installing Firefox. Web browsers are winner-take-all. Sandwiches aren’t.

You might think that after you work through a few iterations, the distribution would start to approach a normal distribution — the Central Limit Theorem at work — and the mean would start to matter more. But this isn’t the case as often as you might think. If you’re hacking on free software, as you should be, then after the first iteration, everyone can start using the clearly-much-better thing that is already working, rather than wasting months finishing their own inferior versions. And the person who wins the race that time around may not be the same one who won the first time.

The other thing is that there are distributions like the Cauchy distribution that are so heavy-tailed that they don’t even have a mean, let alone a variance. The Law of Large Numbers doesn’t apply to them at all! And even for more ordinary heavy-tailed distributions that do have means, like the lognormal distribution (relevant here since it’s empirically the distribution of how much we misestimate tasks by), the Law of Large Numbers takes a lot more than “a few times” to start making the distribution of the sum look normal. Try it! Pop open Jupyter and convolve the lognormal distribution with itself a few times! How long does it take before it even starts to look Gaussian? How long before it looks Gaussian even on a log-linear plot, where the heavy tail stands out?
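If you’d rather not fire up Jupyter, here’s roughly that experiment sketched in Python with NumPy, using σ = ln 4 as in the earlier example (the sample sizes and choices of k are arbitrary): sum k independent lognormal task estimates and watch the sample skewness, which would be near zero for anything Gaussian.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = np.log(4)       # misestimation spread, as in the example above
samples = 100_000

def sample_skewness(x):
    # standardized third moment; ~0 for a Gaussian
    z = (x - x.mean()) / x.std()
    return (z ** 3).mean()

skews = {}
for k in (1, 4, 16, 64):
    # total time for k tasks = sum of k independent lognormal misestimates
    total = rng.lognormal(0.0, sigma, (samples, k)).sum(axis=1)
    skews[k] = sample_skewness(total)
    print(k, round(skews[k], 2))
```

Even at k = 64 — far more than “a few” convolutions — the skewness is still conspicuously positive; the heavy tail dies off like 1/√k, not all at once.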

On the other hand, if you took my first piece of advice and you’re working on something sufficiently weird, you no longer need to worry about increasing your variance further to beat the pack. There is no pack. Instead, anything you achieve will be a positive contribution; so, instead of grasping at straws to avert the almost-certain failure of competitors in a winner-take-all game, try to maximize your expectation of performance. That might mean increasing variance or it might mean decreasing variance, and it might depend on your utility function as well as the objective probabilities.

† I recognize that a painting is not “solving” a “problem”, but many of the same principles apply.
