Hyperthreading illustrated

I'm fond of saying hyperthreading is a lie. It's true though; a dual hyperthreaded core is nowhere near as awesome as a four real cores. That's more provocative than useful, so let me draw you some pictures.

If you zoom way out, a single core, dual cores, and a single hyperthreaded core look like this:

Note how the hyperthreaded core is really a single core with an extra set of registers and instruction/stack pointers. The reason hyperthreading is a lie is you can't actually run four processes, or even four threads, at the same time. At best, you can run two threads with two others ready in the wings.

I'm a dilletante of processor architecture at best, but I think I can explain why chip designers would do this.

My best guess as to why you would design and release a hyperthreaded core is to increase the number of instructions you can retire (i.e. mark as completely executed) per clock cycle. Instructions retired per cycle is one of the primary metrics processor architects use for judging a design.

The enemy of instructions retired per clock cycle is memory accesses and branch mispredictions. When a processor has to go access a cache line or worse, something out in main memory, it has nothing to do but wait. When a branch (i.e. a conditional or loop) is incorrectly speculatively executed (how awesome is it that processors start executing code paths before they even know if its the right thing to do?) they end up in the same predicament. Cache misses and branch mispredictions are at best a recipe for some overhead, and at worst a recipe for waiting around on main memory.

Hyperthreading attempts to solve this problem by keeping extra programs (threads or processes) waiting in the wings, ready to start executing as soon as another pauses due to needing something from main memory. This means our execution units, the things that actually do math and execute the logic of a program, are (almost) always utilized and retiring instructions. And that gets us to our happy place of a higher instructions retired per clock cycle.

Why not just throw more execution units in and have real cores ready to work? I'm not sure, there must be something about how modern processor pipelines work that I don't know which makes it too expensive to implement. That said, hyper threading (as I understand it) is a pretty clever hack for improving the efficiency of a processor.