In Dan Bernstein's grant proposal to the NSF from 2001, Circuits for Integer Factorization: A proposal, he writes:
A philosophical note. I always thought that common general-purpose computers were the pinnacle of realistic computational power. Special-purpose computer architectures, such as Lehmer's bicycle chain sieve or Pomerance's Cracker or Shamir's TWINKLE, were at best a constant factor faster. Quantum computers are asymptotically faster for many computations, but it is unclear whether they can actually be built.
I also thought that parallel computing reduced the time, not the cost, of computations. Ten processors might perform a computation in one tenth the time of a single processor, but they are ten times as expensive, so the cost of the conmputation remains the same.
I was wrong. Schimmler's machine, with m² processors, can be built for m^{2+o(1)} dollars, just like a single-processor computer with m^{2+o(1)} bits of memory. It can sort m² numbers in time m^{1+o(1)}, while the single-processor machine needs time m^{2+o(1)}. The cost of the computation has dropped from m^{4+o(1)} to m^{3+o(1)}.
I have a hard time believing this, but Bernstein is pretty reliable; maybe I should go back and read more about Schimmler's machine.