Cheap textures

Kragen Javier Sitaker, 2018-10-28 (updated 2019-05-05) (5 minutes)

So I’ve been thinking about how to render low-computational-cost textures in 2D, for a few different reasons.

Objectives

Ultra-low-power e-paper computing systems; the objective here is for the energy cost of generating the texture data to be comparable to the energy cost of updating the e-paper to display it. As outlined in Keyboard-powered computers, updating an e-paper display costs on the order of 200 nJ per pixel update, while common 32-bit low-power processors use on the order of 300 pJ per instruction. This means we can afford something like 700 instructions per pixel, if we’re content to have the battery life be half of what it could be. But the returns diminish: at 175 instructions per pixel, computation already costs only about a quarter as much as the display update, so reducing below that can increase your battery life by at most about 25%.
Live video generation in software from low-memory computers; the objective here is to be able to display more interesting things on the screen than you have memory space for a full framebuffer. As mentioned in Notes on the STM32 microcontroller family, NTSC composite video has space for about 200 kilopixels in living color, but common 32-bit microcontrollers like the STM32 line have only 4–256 KiB of RAM, despite running at up to 200 MIPS. To the extent that the display contents can be encoded in RAM with many pixels per byte, then rendered in software to a scanline “framebuffer” or a “framebuffer” containing a few scanlines, they can be far richer within these limitations. 200 kilopixels at 30Hz is 6 megapixels per second. At, for example, 96 MIPS, that’s a budget of 16 instructions per pixel, or 32 instructions per pixel if you’re satisfied with a 200×500 display.
Providing more powerful graphical primitives for GUI systems. Old GUIs, up to around 1998, were so constrained by the slow computers they ran on that they had to update the framebuffer incrementally, because there wasn’t enough time to redraw the entire screen in between screen updates. Modern GUIs’ programming model and appearance mostly mindlessly ape the GUIs of that epoch, although the IMGUI paradigm, gradients, filters, and transparency are making their way in; Self’s cel-animation-inspired effects have made their way into modern GUIs as animated transitions, notably in window managers and jQuery animations; CSS is popularizing arbitrary transform matrices; and Google’s Material Design is leading a move to physically-based rendering in user interfaces. But there’s a much wider range of possibilities available. Even the CPU of the laptop I’m typing this on averages over 6 64-bit or 128-bit instructions per clock cycle at 1.6 GHz, or 10,000 MIPS, and its LCD screen is only 1920×1080 at 60Hz, or 124 megapixels per second. This is a computational budget of about 80 CPU instructions per pixel. Moreover, its integrated GPU (see Notes on the Intel N3700 i915 GPU in this ASUS E403S laptop) is capable of doing maybe 50 billion single-precision multiply-accumulates per second (invoked from 12.5 billion instructions), or twice that many in half-precision, and supports OpenCL; if that figure is correct, it gives a computational budget of about 400 multiply-accumulates per pixel (invoked from 100 4×SIMD instructions).
Reducing GUI latency and tearing. A 60Hz screen redraw already takes 16.7 ms, which Dan Luu has convincingly shown is significant in user experience. Double-buffering adds an additional 16.7 ms of latency, and doing it out of sync with the vertical refresh adds an additional random latency that ranges from 0 to 16.7 ms. Typical keystroke-to-screen latencies on modern computers are in the range of 100 ms, and decreasing that by 25% would be a significant improvement. The hardware constraints here are the same as in the previous item: 80 CPU instructions per pixel or 400 GPU operations per pixel.
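The per-pixel budgets above are simple divisions, and it may help to see them in one place. The following is only a back-of-the-envelope sketch in Python using the round figures quoted in the text; the function and variable names are made up for illustration.

```python
# Per-pixel budgets from the round figures quoted in the text.
# All names here are illustrative, not taken from any library.

def per_pixel(ops_per_second, pixels_per_second):
    """Operations available for each pixel drawn."""
    return ops_per_second / pixels_per_second

# E-paper: the limit is energy rather than time.
EPAPER_J_PER_PIXEL = 200e-9      # ~200 nJ per pixel update
CPU_J_PER_INSTRUCTION = 300e-12  # ~300 pJ per instruction
epaper_break_even = EPAPER_J_PER_PIXEL / CPU_J_PER_INSTRUCTION  # about 667

def battery_life_fraction(instructions_per_pixel):
    """Battery life relative to a system that spends no energy computing."""
    compute = instructions_per_pixel * CPU_J_PER_INSTRUCTION
    return EPAPER_J_PER_PIXEL / (EPAPER_J_PER_PIXEL + compute)

# NTSC from a microcontroller: 200 kilopixels at 30 Hz, at an assumed 96 MIPS.
ntsc = per_pixel(96e6, 200e3 * 30)
ntsc_small = per_pixel(96e6, 200 * 500 * 30)

# Laptop: 1920x1080 at 60 Hz, 10,000 MIPS CPU, 50 billion GPU MACs per second.
laptop_pixels_per_second = 1920 * 1080 * 60
laptop_cpu = per_pixel(10_000e6, laptop_pixels_per_second)
laptop_gpu = per_pixel(50e9, laptop_pixels_per_second)

if __name__ == "__main__":
    print(f"e-paper break-even: {epaper_break_even:.0f} instructions/pixel")
    print(f"battery life at 175 instructions/pixel: "
          f"{battery_life_fraction(175):.0%} of the maximum")
    print(f"NTSC: {ntsc:.0f} (or {ntsc_small:.0f} at 200x500)")
    print(f"laptop CPU: {laptop_cpu:.0f}, GPU: {laptop_gpu:.0f} MACs")
```

With these inputs the break-even point comes out a little under the “something like 700” in the text, which is consistent with the order-of-magnitude nature of the energy figures.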

So, all of these different ways that you can use low-computational-cost texture generation have different computational budgets (175 instructions per pixel for e-paper, 16 for bitbanging NTSC, 80 instructions and 400 multiply-accumulates for the laptop scenarios), but they’re all kind of in the same ballpark. Moreover, they’re in a completely different world from the Macintosh with its 8-MHz 1-Dhrystone-MIPS 68000 and its 9" 512×342 CRT, which I am assuming refreshed at around 60 Hz, giving it a pixel rate a bit over 10 MHz, or about 10 pixels per instruction. Measured in instructions per pixel against that notional “dot clock”, the three budgets are, respectively, 1750 times, 160 times, and 800 times more generous (not counting the GPU, which is arguably something like 4000 times more generous).
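The comparison with the Macintosh can be checked the same way. This is a rough sketch under the text’s own assumptions (1 MIPS, a 60 Hz refresh, and the rounded figure of 10 pixels per instruction); the names are invented for illustration.

```python
# Ratio of modern per-pixel budgets to the original Macintosh's,
# under the assumptions stated in the text.
MAC_PIXELS_PER_SECOND = 512 * 342 * 60   # 10,506,240: "a bit over 10 MHz"
MAC_INSTRUCTIONS_PER_SECOND = 1e6        # 1 Dhrystone MIPS

# Exact value is about 0.095; the text rounds it to 0.1,
# that is, 10 pixels per instruction.
mac_budget_exact = MAC_INSTRUCTIONS_PER_SECOND / MAC_PIXELS_PER_SECOND
MAC_BUDGET_ROUNDED = 0.1

budgets = {
    "e-paper": 175,
    "NTSC": 16,
    "laptop CPU": 80,
    "laptop GPU": 400,
}

speedups = {name: b / MAC_BUDGET_ROUNDED for name, b in budgets.items()}

if __name__ == "__main__":
    for name, ratio in speedups.items():
        print(f"{name}: about {ratio:.0f} times the Macintosh budget")
```

Using the exact pixel rate rather than the rounded one raises each ratio by about 5%, which does not change the conclusion.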

Techniques

Solid color filling
Linear gradients
Polygon filling
Alternating pixels (vertical, horizontal, or checkerboard)
LFSR uniform white noise
Adding images
Tiling
Mirrored tiling
Character generation
Strength-reduction
Bit-sliced cellular automata
Sierpinski textures
Bresenham’s algorithms
Splines
1-D and 2-D palette mapping
Zoneplates
Moiré
Thresholding/signum
Texture mapping
Animation
- Transitions
- Temporal dithering
- Dynamical systems
Domain displacement
- Subpixel jittering
Filtering
Perlin noise
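
To give a feel for how cheap some of the techniques listed above can be, here is a minimal sketch of three of them (alternating pixels, a Sierpinski texture, and LFSR noise) as per-pixel functions. It is written in Python for readability rather than speed, and the specific choices are assumptions for illustration: the `x & y` formulation of the Sierpinski pattern, the 16-bit Galois LFSR with tap mask 0xB400, the seed 0xACE1, and all function names. Each one costs only a handful of integer operations per pixel, well inside even the 16-instruction NTSC budget when written in C or assembly.

```python
def checkerboard(x, y):
    """Alternating pixels: 1 where x and y have different parity."""
    return (x ^ y) & 1

def sierpinski(x, y):
    """Sierpinski triangle: lit where x and y share no set bits."""
    return 1 if (x & y) == 0 else 0

def lfsr16_step(state):
    """Advance a 16-bit Galois LFSR by one step.

    The tap mask 0xB400 is a commonly used maximal-length choice,
    so any nonzero state cycles through all 65535 nonzero values.
    """
    lsb = state & 1
    state >>= 1
    if lsb:
        state ^= 0xB400
    return state

def lfsr_noise(width, height, seed=0xACE1):
    """Rows of pseudorandom bits, one LFSR step per pixel."""
    state = seed
    rows = []
    for _ in range(height):
        row = []
        for _ in range(width):
            state = lfsr16_step(state)
            row.append(state & 1)
        rows.append(row)
    return rows

def render(texture, width, height):
    """Draw a per-pixel function as lines of '#' and '.'."""
    return ["".join("#" if texture(x, y) else "." for x in range(width))
            for y in range(height)]

if __name__ == "__main__":
    print("\n".join(render(sierpinski, 16, 16)))
    print()
    print("\n".join(render(checkerboard, 16, 4)))
    print()
    for row in lfsr_noise(16, 4):
        print("".join("#" if bit else "." for bit in row))
```

Taking one bit per LFSR step gives white-ish noise that is adequate for texturing; consecutive bits of the same state word are correlated, so taking several bits per step would need more care.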

To read
