Summary (TL;DR)
Hand-written assembly nearly doubled Quake's framerate, from 22.7 to 42.2 fps on a 233 MHz Pentium MMX. The largest gains came from D_DrawSpans8 (12.6 fps), R_DrawSurfaceBlock8_mip* (4.2 fps), and D_Polyset* (2.2 fps). Techniques included loop unrolling, self-modifying code to relieve register pressure, overlapping FDIV with integer work, jump tables to avoid branch mispredictions, and exploiting the Pentium's dual pipelines and free FXCH instruction to hide FPU latency. TransformVector, for example, computed its three dot products in parallel rather than serially, so that no instruction stalled waiting on the result of the one before it.
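The TransformVector idea can be sketched in C. This is a minimal illustration of the dependency structure only, not Quake's actual code: the function names, parameter names, and the `vec3_t` typedef here are assumptions made for the example. The serial form finishes one dot product before starting the next, so each add waits on the multiply just ahead of it. The interleaved form issues three independent multiplies per stage, which is the ordering that lets a pipelined FPU keep working while earlier results are still in flight.

```c
#include <assert.h>

typedef float vec3_t[3];  /* hypothetical typedef for this sketch */

/* One dot product: every add depends on the multiply just before it. */
static float dot3(const vec3_t a, const vec3_t b)
{
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2];
}

/* Serial form: three dependent chains, executed one after another. */
static void transform_serial(const vec3_t in, const vec3_t right,
                             const vec3_t up, const vec3_t fwd, vec3_t out)
{
    assert(in && right && up && fwd && out);
    out[0] = dot3(in, right);
    out[1] = dot3(in, up);
    out[2] = dot3(in, fwd);
}

/* Interleaved form: each stage holds three operations that do not
 * depend on each other, so one can start while another is in flight. */
static void transform_interleaved(const vec3_t in, const vec3_t right,
                                  const vec3_t up, const vec3_t fwd,
                                  vec3_t out)
{
    assert(in && right && up && fwd && out);

    /* Stage 1: first term of all three dot products. */
    float x = in[0] * right[0];
    float y = in[0] * up[0];
    float z = in[0] * fwd[0];

    /* Stage 2: second term of all three. */
    x += in[1] * right[1];
    y += in[1] * up[1];
    z += in[1] * fwd[1];

    /* Stage 3: third term of all three. */
    x += in[2] * right[2];
    y += in[2] * up[2];
    z += in[2] * fwd[2];

    out[0] = x;
    out[1] = y;
    out[2] = z;
}
```

Both functions compute the same result. A modern optimizing compiler is free to reschedule either version, so the source ordering here only shows the idea; on the Pentium the interleaving was written by hand in assembly, with FXCH used to bring each pending result to the top of the FPU stack.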