Making code faster requires insight into the particulars of how computers work. Processor instructions, hardware behavior, data structures, concurrency; it’s a lot of black art. Here’s a few things to read on the forbidden lore of fast programs:
Fast interpreters are made of machine sympathy. Implementing Fast Interpreters. What makes the Lua interpreter, and some JavaScript interpreters, so quick. Includes assembly and machine code details. Juicy!
Lockless data structures, the easy way. A Java lock-free data structures deep dive. How do those fancy java concurrent libraries work? Fancy processor instructions! Great deep dive.
Now is an interesting time to be a bottleneck. Your bottleneck is dead. Hardware, particularly IO, is advancing such that bottlenecks in code are exposed. If you’re running on physical hardware, especially if you have solid-state disks, your bottleneck is probably language-bound or CPU-bound code.
Go forth, read a lot, measure twice (beware the red herrings!), and make faster programs!