The Quake2World engine draws a fair amount of attention on account of its performance when compared to other Quake- and Quake2-derived engines. On certain hardware, it does benchmark as much as 400% faster than the Quake2 3.21 engine source, upon which it was originally based. And so I figured I'd take a moment to talk about some of my optimizations.
Quake2 was released in 1997. Hardware acceleration was only available on higher-end PCs, and features like multitexture and vertex arrays, which are commonplace today, were not widely available then. So naturally, Quake2's rendering techniques appear very dated in 2008. Multitexture entered the OpenGL specification as an ARB extension alongside version 1.2.1, and it is available on most 2nd generation hardware (TNT or newer). I strongly recommend cleaning up the renderer and removing any non-multitexture rendering paths.
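As a sketch, a single lit pass might bind a surface's diffuse texture to unit 0 and its lightmap to unit 1, along these lines (the function name is illustrative rather than the engine's, and the multitexture entry points assume a GL 1.3+ header or an extension loader):

    #include <GL/gl.h>

    /* Bind a diffuse texture to unit 0 and a lightmap to unit 1 so that a
     * surface can be drawn lit in a single pass. */
    static void R_BindMultitexture(GLuint diffuse, GLuint lightmap) {

        glActiveTexture(GL_TEXTURE1);  /* unit 1: the lightmap, modulated over the diffuse */
        glEnable(GL_TEXTURE_2D);
        glBindTexture(GL_TEXTURE_2D, lightmap);
        glTexEnvi(GL_TEXTURE_ENV, GL_TEXTURE_ENV_MODE, GL_MODULATE);

        glActiveTexture(GL_TEXTURE0);  /* unit 0: the diffuse texture */
        glBindTexture(GL_TEXTURE_2D, diffuse);
    }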
Vertex arrays are a bit more effort to introduce correctly, but are also one of the keys to achieving higher geometric complexity in your scenes. Vertex buffer objects are rather easy to slot in right next to your vertex array code if you plan your GL state management intelligently. The advantage of using these techniques is that you can reduce the number of API calls per frame (e.g. glVertex3fv) by an entire order of magnitude.
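Here is a minimal sketch of what that looks like, assuming the buffer object entry points are available (GL 1.5+ headers or an extension loader) and using illustrative names rather than the engine's own:

    #include <stddef.h>
    #include <GL/gl.h>

    typedef struct {
        GLfloat position[3];
        GLfloat texcoord[2];
    } r_vertex_t;

    static GLuint r_vertex_buffer;

    /* Upload all static vertexes once, at level load. */
    void R_UploadVertexes(const r_vertex_t *vertexes, GLsizei count) {
        glGenBuffers(1, &r_vertex_buffer);
        glBindBuffer(GL_ARRAY_BUFFER, r_vertex_buffer);
        glBufferData(GL_ARRAY_BUFFER, count * sizeof(r_vertex_t), vertexes, GL_STATIC_DRAW);
    }

    /* Bind the shared buffer and draw a contiguous run of vertexes. */
    void R_DrawVertexes(GLint first, GLsizei count) {
        glBindBuffer(GL_ARRAY_BUFFER, r_vertex_buffer);

        glEnableClientState(GL_VERTEX_ARRAY);
        glEnableClientState(GL_TEXTURE_COORD_ARRAY);

        /* With a buffer bound, the pointer arguments are byte offsets into it. */
        glVertexPointer(3, GL_FLOAT, sizeof(r_vertex_t), (void *) offsetof(r_vertex_t, position));
        glTexCoordPointer(2, GL_FLOAT, sizeof(r_vertex_t), (void *) offsetof(r_vertex_t, texcoord));

        glDrawArrays(GL_TRIANGLES, first, count);

        glDisableClientState(GL_TEXTURE_COORD_ARRAY);
        glDisableClientState(GL_VERTEX_ARRAY);
        glBindBuffer(GL_ARRAY_BUFFER, 0);
    }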
The smartest way to do this is to assemble massive precomputed vertex arrays for all static geometry at level load. This includes the world model (.bsp) and all non-animated mesh models (.obj, .md3, etc.). World surfaces need not hold references to all of their vertexes and texture coordinates; instead, they can simply hold an integer offset into the shared arrays created for the .bsp. With arrays in place, drawing a series of surfaces reduces to binding the arrays in client state and calling glBindTexture and glDrawArrays for each surface.
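A hedged sketch of that layout, again with illustrative names: since BSP faces are convex polygons, each surface can be drawn as a triangle fan over its run of shared vertexes.

    #include <GL/gl.h>

    /* Each world surface stores only its texture and a window into the
     * shared arrays built at level load. */
    typedef struct {
        GLuint texnum;   /* diffuse texture bound for this surface */
        GLint first;     /* offset of the surface's first vertex in the shared arrays */
        GLsizei count;   /* number of vertexes in the surface */
    } r_bsp_surface_t;

    /* Drawing a surface is then just a bind and a single draw call. */
    static void R_DrawBspSurface(const r_bsp_surface_t *surf) {
        glBindTexture(GL_TEXTURE_2D, surf->texnum);
        glDrawArrays(GL_TRIANGLE_FAN, surf->first, surf->count);
    }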
Texture binds (glBindTexture) are rather expensive too, so to minimize them per frame, you should group the world surfaces by texture at level load. I use a level of indirection via pointer arrays to accomplish this. Arrays of surface pointers are assembled according to world texture, and I iterate over these arrays after marking the visible surfaces each frame. In this way, we draw by material, rather than back-to-front as the BSP recursion naturally yields. This does pose problems for overlapping transparent objects, but there are workarounds for those cases.
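Building on the surface sketch above, the per-texture grouping might look something like this (the bucket type, its size, and the function name are illustrative assumptions, not the engine's actual code):

    #define MAX_BUCKET_SURFACES 4096

    /* One bucket per world texture, refilled each frame with the surfaces
     * that were marked visible. */
    typedef struct {
        GLuint texnum;
        const r_bsp_surface_t *surfaces[MAX_BUCKET_SURFACES];
        int num_surfaces;
    } r_surface_bucket_t;

    static void R_DrawBucket(const r_surface_bucket_t *bucket) {
        int i;

        if (!bucket->num_surfaces)
            return;

        glBindTexture(GL_TEXTURE_2D, bucket->texnum);  /* one bind for the whole group */

        for (i = 0; i < bucket->num_surfaces; i++) {
            const r_bsp_surface_t *surf = bucket->surfaces[i];
            glDrawArrays(GL_TRIANGLE_FAN, surf->first, surf->count);
        }
    }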
There are additional benefits to decoupling your BSP recursion strategy from your rendering functions as described above. The flexibility gained here opens the door to multithreading, allowing one core to tackle the BSP work while another adds particles and entities to the view, or sorts transparent objects for the Painter's algorithm.
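A rough sketch of that split using POSIX threads, where R_MarkSurfaces, R_AddEntities and R_AddParticles are stand-ins for whatever the engine's actual scene-preparation functions are:

    #include <pthread.h>

    /* Placeholder declarations; the real engine functions will differ. */
    void R_MarkSurfaces(void);
    void R_AddEntities(void);
    void R_AddParticles(void);

    static void *R_MarkSurfaces_Thread(void *data) {
        (void) data;
        R_MarkSurfaces();  /* recurse the BSP, marking visible surfaces */
        return NULL;
    }

    /* Prepare the scene on two cores, then join before drawing begins. */
    void R_PrepareScene(void) {
        pthread_t bsp_thread;

        pthread_create(&bsp_thread, NULL, R_MarkSurfaces_Thread, NULL);

        R_AddEntities();   /* meanwhile, build the entity and particle lists */
        R_AddParticles();

        pthread_join(bsp_thread, NULL);
    }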
These are probably the most drastic and beneficial changes I've made. There are many others, and I hope to find more. "The fastest way to do something is to not do it." Keep that phrase in mind, and keep plugging away at it.