Worse: premature optimizations. The real ones, I mean (I heard you Joe, and I agree ;-). The ones where people sacrifice readability and evolvability on the altar of performance (really? did you measure it concretely?). In most cases, a simple but wise choice of relevant types and data structures within our code saves us lots of time and energy without creating maintainability nightmares.
If you want to develop low-latency and scalable solutions, it is obvious that you should know the core mechanisms of your platform (.NET, Java, C++, Windows, Linux, but also processors, RAM, network stacks and adapters...): how the GC works, how memory and CPU cache lines are kept coherent, etc.
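To make the cache-line point concrete, here is a minimal sketch in Java (chosen only because it is one of the platforms listed above; the same effect shows up in .NET or C++). It walks the same array twice, once sequentially and once with a page-sized stride, and the timing gap you will observe is essentially the cost of cache misses. It is an illustration, not a rigorous benchmark (no JIT warm-up, no isolation from the OS scheduler).

```java
// Illustrative sketch only: sequential vs. strided traversal of the same array.
// The sequential pass is prefetcher- and cache-friendly; the strided pass
// touches a new cache line (and page) on almost every access.
public class CacheWalk {
    public static void main(String[] args) {
        int size = 1 << 24;                    // 16M ints = 64 MB
        int[] data = new int[size];
        long sum = 0;

        long t0 = System.nanoTime();
        for (int i = 0; i < size; i++) {       // sequential walk
            sum += data[i];
        }
        long sequentialNs = System.nanoTime() - t0;

        t0 = System.nanoTime();
        int stride = 1024;                     // 4 KB jumps between consecutive reads
        for (int start = 0; start < stride; start++) {
            for (int i = start; i < size; i += stride) {   // strided walk, same element count
                sum += data[i];
            }
        }
        long stridedNs = System.nanoTime() - t0;

        System.out.printf("sequential: %d ms, strided: %d ms (sum=%d)%n",
                sequentialNs / 1_000_000, stridedNs / 1_000_000, sum);
    }
}
```

Both loops do exactly the same number of additions; only the memory access pattern changes, which is the whole point.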
But do you have in mind the cost of elementary operations (in terms of time and CPU cycles) when you are coding? How long does a classic network hop take? How long does a typical I/O read take? How long does a memory access take, depending on where the data currently sits (register, cache, main memory)?
As a reminder, here are some figures (some borrowed from Joe Duffy's blog) that you should definitely post in front of your development desk. If you don't want to improve your code blindly, it's important to know what things cost.
- a register read/write (nanoseconds, single-digit cycles)
- a cache hit (nanoseconds, tens of cycles)
- a cache miss to main memory (nanoseconds, hundreds of cycles)
- a disk access including page faults (micro- or milliseconds, millions of cycles)
- a local network hop with kernel-bypassing RDMA over 10 Gigabit Ethernet (sub-10 microseconds)
- a LAN network hop (100-500 microseconds)
- a WAN network roundtrip (milliseconds or seconds, many millions of cycles)
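And if you want to get a feel for a couple of these figures on your own machine, a rough sketch like the one below is enough (again in Java; numbers vary wildly with the hardware, the OS page cache and JIT warm-up, and the temp-file path and sizes are arbitrary choices of mine, not a proper benchmark harness).

```java
// Rough sketch: compare a cache-resident memory scan with a small file read.
// Expect nanoseconds-per-scan for the hot array, microseconds for the file if
// the OS page cache has it, milliseconds if it has to hit the disk.
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

public class CostOfThings {
    public static void main(String[] args) throws IOException {
        // A 4 KB array scanned repeatedly should stay in L1/L2 cache.
        int[] hot = new int[1024];
        long sum = 0;
        long t0 = System.nanoTime();
        for (int rep = 0; rep < 1_000_000; rep++) {
            for (int v : hot) sum += v;
        }
        System.out.printf("hot memory: ~%d ns per 4 KB scan (sum=%d)%n",
                (System.nanoTime() - t0) / 1_000_000, sum);

        // A 4 KB file read through the filesystem.
        Path tmp = Files.createTempFile("cost", ".bin");
        Files.write(tmp, new byte[4096]);
        t0 = System.nanoTime();
        byte[] bytes = Files.readAllBytes(tmp);
        System.out.printf("4 KB file read: %d us (%d bytes)%n",
                (System.nanoTime() - t0) / 1_000, bytes.length);
        Files.delete(tmp);
    }
}
```

The absolute values don't matter much; the orders of magnitude between the two, and between them and a network roundtrip, are what should stick in your head.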