Monday 14 February 2011

Write performant code: keep some fundamental figures in mind

It's funny to see the amount of time spent (I would say lost) by some developers optimizing their code in places where the ROI will be peanuts at the end of the day ;-(

Worse: premature optimizations. The real ones, I mean (I heard you Joe, and I agree ;-). The ones where people sacrifice readability and extensibility on the altar of performance (really? did you concretely measure it?). In most cases, a simple but wise choice of relevant types and data structures within our code saves us lots of time and energy without creating maintainability nightmares.
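
To make this concrete, here is a minimal Java sketch (Java being one of the platforms discussed in this post; the scenario and class name are mine, purely for illustration). Holding the same data in an ArrayList versus a HashSet turns each membership test from an O(n) scan into an O(1) hash lookup:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class StructureChoice {
    public static void main(String[] args) {
        int n = 100000;
        List<Integer> list = new ArrayList<Integer>();
        Set<Integer> set = new HashSet<Integer>();
        for (int i = 0; i < n; i++) {
            list.add(i);
            set.add(i);
        }

        // Membership tests against the list: each contains() is an O(n) scan.
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            list.contains(i);
        }
        long listNanos = System.nanoTime() - start;

        // The same tests against the set: each contains() is an O(1) hash lookup.
        start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            set.contains(i);
        }
        long setNanos = System.nanoTime() - start;

        System.out.printf("list: %d ms, set: %d ms%n",
                listNanos / 1000000, setNanos / 1000000);
    }
}
```

Same readability, no exotic trick: just the right structure for the access pattern.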

If you want to develop low-latency and scalable solutions, it is obvious that you should know the core mechanisms of your platform (.Net, Java, C++, Windows, Linux, but also processors, RAM, network stacks and adapters...): how the GC works, how memory and processor cache lines are kept coherent, etc.
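
For instance, the cache-line point can be felt directly in code. A minimal Java sketch (my own illustration): summing the same matrix row-by-row versus column-by-column only changes the memory access pattern, yet the column-major walk touches a different cache line on almost every access:

```java
public class TraversalOrder {
    public static void main(String[] args) {
        final int n = 4096;
        int[][] grid = new int[n][n];
        long sum = 0;

        // Row-major: walks each row's backing array sequentially,
        // so every cache line fetched from RAM is fully consumed.
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                sum += grid[i][j];
            }
        }
        long rowNanos = System.nanoTime() - start;

        // Column-major: hops to a different row array on every access,
        // so nearly every load is a cache miss.
        start = System.nanoTime();
        for (int j = 0; j < n; j++) {
            for (int i = 0; i < n; i++) {
                sum += grid[i][j];
            }
        }
        long colNanos = System.nanoTime() - start;

        System.out.printf("row-major: %d ms, column-major: %d ms (sum=%d)%n",
                rowNanos / 1000000, colNanos / 1000000, sum);
    }
}
```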

But do you have in mind the cost of some elementary operations (in terms of time and CPU cycles) when you are coding? How long does a classic network hop take? How long does a typical I/O read take? How long does a memory access take, depending on whether the data is already in cache?

As a reminder, here are some figures (some are borrowed from Joe Duffy's blog) that you should definitely stick on a Post-it in front of your development desk. If you don't want to improve your code blindly, it's important to know what things cost.

  • a register read/write (nanoseconds, single-digit cycles)
  • a cache hit (nanoseconds, tens of cycles)
  • a cache miss to main memory (nanoseconds, hundreds of cycles)
  • a disk access including page faults (micro- or milliseconds, millions of cycles)
  • a local network hop with kernel-bypass RDMA over 10GbE (sub-10 microseconds)
  • a LAN network hop (100-500 microseconds)
  • a WAN network roundtrip (milliseconds or seconds, many millions of cycles)
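
You can even estimate some of these figures on your own machine. Here is a back-of-the-envelope Java sketch (my own, the class name is hypothetical; JIT warm-up and TLB effects will add noise) that approximates the cache-miss line above by pointer-chasing through a buffer far larger than the caches, so each load depends on the previous one and defeats the hardware prefetcher:

```java
import java.util.Random;

public class MainMemoryLatency {
    public static void main(String[] args) {
        int n = 1 << 24; // 16M ints (~64 MB), far larger than any CPU cache
        int[] next = new int[n];

        // Sattolo's algorithm: builds a random permutation made of a
        // single n-cycle, so chasing next[] visits every slot once.
        for (int i = 0; i < n; i++) {
            next[i] = i;
        }
        Random rnd = new Random(42);
        for (int i = n - 1; i > 0; i--) {
            int j = rnd.nextInt(i); // j < i, guaranteeing one big cycle
            int tmp = next[i];
            next[i] = next[j];
            next[j] = tmp;
        }

        // Each iteration needs the previous result: no prefetching,
        // no overlap, so we pay (roughly) one cache miss per load.
        int idx = 0;
        long start = System.nanoTime();
        for (int k = 0; k < n; k++) {
            idx = next[idx];
        }
        long nanos = System.nanoTime() - start;

        System.out.printf("~%.1f ns per dependent load (idx=%d)%n",
                (double) nanos / n, idx);
    }
}
```

The result should land roughly in the "hundreds of cycles" range quoted above; compare it against a sequential walk of the same array to see the gap.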

2 comments:

  1. Hi Thomas,

    Really nice to hear such fundamentals restated. I'd like to add that time spent 'optimising' subcomponents without understanding how they fit into the bigger picture is really a blind alley. Micro-benchmarks are too easy to get sucked into - eg one component that object pools may get killed by another that takes a 'gen0 is almost free' approach.

    It's possible to do some nice debug weaving stuff to assert things like object-pooled resources being released in a timely fashion, but really, you cannot beat implementing an end-to-end test rig early and making it part of the CI build.

  2. Hi Stelrad,
    I completely agree with your point. In my opinion, it's highly important to check performance from the beginning with a TDD approach, supported by nightly-build stress-test sessions. Improving performance and decreasing latency are never-ending topics, so you have to know when optimization sessions should start, and when to stop them (because you have to deliver something to your end-users at the end of the day ;-)
    Cheers
