Showing posts with label threading. Show all posts

Thursday, 31 January 2013

Will you "spot the deadlock" with dupdop?


On "many cores", his blog dedicated to multithreading and concurrency issues, one of my friends proposed a new and very funny trivia game: "Spot the deadlock".

In a nutshell, dupdob will regularly present code snippets that exhibit some concurrency-related problem, and let each of us work out which concurrency issue is present, why it happens, and how it can be fixed.

The idea is to leave us one week to give our answers, before all the comments are shown.


The very first (easy) episode is already available here:

http://dupdob.wordpress.com/2013/01/23/can-you-spot-the-deadlock/

Will you spot the deadlock?

Have fun!

Saturday, 2 July 2011

About Rx performances

James Miles recently shared some performance figures and explanations related to the latest Rx Performance Improvements.

With this latest v1.0 stable release, Rx seems to be entering a new era. It's very good news for .NET developers...

Sunday, 2 January 2011

A new perspective on ultra-low-latency systems

I just watched an awesome presentation by Martin Thompson and Michael Barker. They explained how they implemented their ultra-low-latency (and high-throughput) systems for the London Multi-Asset eXchange (LMAX), and it's pretty impressive: 100,000 transactions per second at less than 1 millisecond latency, in Java...

Since I've been working on FX and low-latency systems in general for several years now, I was very intrigued by their teaser (100K...). I have to admit I was thrilled by their presentation.

For those who don't have an hour to kill watching their video, here is a summary:

---HOW TO DO 100K TPS AT LESS THAN 1ms LATENCY----------------------------
  1. UNDERSTAND YOUR PLATFORM
  2. CHECK YOUR PERFORMANCE FROM THE BEGINNING
  3. FOLLOW THE TIPS
---------------------------------------------------------------------------------------------------------------


UNDERSTAND YOUR PLATFORM
  • You have to know how modern hardware works in order to build ultra-low-latency systems
  • Multi-core CPUs have arrived with bigger and smarter caches (do you really know how CPU cache synchronization works? the drawbacks of false sharing? etc.)
  • OK, the free lunch is over (for GHz), but it's time to order and use more memory!!! (144 GB servers with 64-bit addressing, for instance)
  • Disk is the new tape! (fast for sequential access); prefer SSDs for random concurrent access
  • Network is not slow anymore: 10GigE is now a commodity, and you can get sub-10-microsecond local hops with kernel-bypassing RDMA
  • (Not hardware, but) understand how the GC and the JIT work (under the hood)


CHECK YOUR PERFORMANCE FROM THE BEGINNING
  • Write performance tests first
  • Run them automatically and nightly to detect when you should start optimizing your code
  • Still no need for early, premature performance optimization
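The "performance tests first" idea can be sketched as a timed check that a nightly build would fail on regression. This is only an illustration: `processBatch` and the 50 ms budget are made up for the example, not taken from the LMAX talk.

```java
// Sketch of an automated performance check. processBatch() and the
// 50 ms budget are illustrative assumptions, not from the presentation.
import java.util.concurrent.TimeUnit;

public class PerfCheck {
    // Stand-in for the operation under test.
    static long processBatch(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) sum += i;
        return sum;
    }

    // Returns elapsed nanoseconds for one run over n items.
    static long timeRun(int n) {
        long start = System.nanoTime();
        processBatch(n);
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        long elapsed = timeRun(100_000);
        long budget = TimeUnit.MILLISECONDS.toNanos(50);
        // A nightly job would fail the build here instead of printing.
        System.out.println(elapsed <= budget ? "PASS" : "REGRESSION");
    }
}
```

Run nightly, such a check tells you *when* to start optimizing, which is exactly the point: no premature optimization, but no silent regressions either.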


FOLLOW THE TIPS
  • Keep the working set in memory (data and behaviour co-located)
  • Write cache-friendly code, aware of cache-line synchronization (the rebirth of arrays ;-)
  • Choose your data structures wisely
  • Queues are awful for concurrent access; use ring buffers instead (no stress: we just said we bought lots of memory ;-)
  • Use custom cache-friendly collections
  • Write simple, clean & compact code (the JIT always does better with simpler code; shorter methods are easier to inline)
  • Invest in modeling your domain. Also respect the single responsibility principle (one class, one thing; one method, one thing; ...) and the separation of concerns
  • Take the right approach to concurrency. Concurrent programming is about two things: mutual exclusion and visibility of changes. These can be implemented following two main approaches: i) a difficult locking approach (with context switches into the kernel), and ii) a VERY difficult lock-free approach built on atomic, non-blocking (user-space) instructions (remember how optimistic locking mechanisms are implemented within databases). You should definitely choose the second one
  • Keep the GC under control. Because the GC may pause your application, avoid it through preallocation (e.g. circular buffers) and by using a huge amount of (64-bit addressed) memory
  • Run business logic on a single thread and push the concurrency into the infrastructure, because trying to put concurrency within the business model is far too hard and too easy to get wrong. You also get the OO programmer's dream: code that is easy to write, testable, and readable. As a consequence, it should reduce your time to market.
  • Follow the Disruptor pattern, a system-level pattern that tries to avoid contention wherever possible (even with business logic running on a single thread)
---------------------------------------------------------------------------------------------------------------
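Two of the tips above (preallocated ring buffers instead of queues, atomic publication instead of locks) can be combined in a minimal sketch. The names and the API below are illustrative, not the actual Disruptor API, and this simplified version only supports a single producer and a single consumer.

```java
// Minimal single-producer/single-consumer ring buffer sketch.
// Capacity must be a power of two so "sequence & mask" replaces modulo.
// The backing array is preallocated once: no per-message GC pressure.
import java.util.concurrent.atomic.AtomicLong;

public class SpscRingBuffer {
    private final long[] buffer;
    private final int mask;
    private final AtomicLong head = new AtomicLong(); // next slot to read
    private final AtomicLong tail = new AtomicLong(); // next slot to write

    public SpscRingBuffer(int capacityPowerOfTwo) {
        buffer = new long[capacityPowerOfTwo];
        mask = capacityPowerOfTwo - 1;
    }

    // Producer side: returns false when full instead of blocking.
    public boolean offer(long value) {
        long t = tail.get();
        if (t - head.get() == buffer.length) return false; // full
        buffer[(int) (t & mask)] = value;
        tail.lazySet(t + 1); // publish: no lock, no kernel transition
        return true;
    }

    // Consumer side: returns null when empty.
    public Long poll() {
        long h = head.get();
        if (h == tail.get()) return null; // empty
        long value = buffer[(int) (h & mask)];
        head.lazySet(h + 1);
        return value;
    }
}
```

Note the design choice: sequences only ever increase, and each side owns exactly one counter, which is what makes the lock-free publication safe in the single-producer/single-consumer case.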

OK, this presentation leaves us with a lot of questions about how they implemented their systems. However, I found this video very refreshing and interesting on several points (the need for a huge amount of RAM with 64-bit addressing, the interest of kernel-bypassing network technologies with RDMA, the fact that queues are bad at handling concurrency properly, that clean and simple code doesn't prevent you from getting excellent performance, that concurrency should stay out of the business logic, that the separation of concerns allows you to have a very competitive time-to-market,... and of course that you need a write-performance-tests-first approach).


For the disruptor pattern explanations and much more, see the entire video of this presentation here: http://www.infoq.com/presentations/LMAX

Note: don't watch this video in full-screen mode, otherwise you will lose the benefit of the slides displayed below it ;-(

Cheers, and happy new year to everyone ;-)

Tuesday, 2 September 2008

The vicious deadlock situation (the one that does not freeze the GUI but leaks memory)

This post is dedicated to windows forms UI deadlock situations.

I) The classical deadlock situation (the one that freezes the UI)

This happens when the synchronization with the UI thread is done in a synchronous manner (Control.Invoke(...), SynchronizationContext.Send(...), etc.).
Solution: to fix this kind of deadlock, use an asynchronous API to delegate task execution to the UI thread.

Such asynchronous APIs include:

  • System.Threading.SynchronizationContext.Post(...)
  • Control.BeginInvoke(...) (but this may cost too much CPU due to the underlying .NET reflection usage; I'll write a post on that topic later)
  • ...
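The "delegate asynchronously, never wait" rule is not specific to Windows Forms. As a hedged illustration in Java, here a single-threaded executor stands in for the UI thread: submitting work and blocking on the result is the deadlock-prone analog of Control.Invoke, while fire-and-forget execute(...) plays the role of BeginInvoke/Post.

```java
// Sketch: a single-threaded executor models the UI thread.
// execute(...) is the asynchronous analog of Control.BeginInvoke:
// the worker never blocks on the "UI thread", so it cannot deadlock on it.
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class UiDelegation {
    static final ExecutorService uiThread = Executors.newSingleThreadExecutor();

    public static void main(String[] args) throws InterruptedException {
        CountDownLatch done = new CountDownLatch(1);
        // Worker thread delegates a "UI update" asynchronously.
        new Thread(() -> uiThread.execute(done::countDown)).start();
        done.await(); // the task is eventually run by the "UI thread"
        uiThread.shutdown();
    }
}
```

The synchronous variant would be `uiThread.submit(task).get()`: if the "UI thread" is itself waiting on the worker (directly or through a lock), both sides wait forever, which is exactly the classical situation described above.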

II) The vicious deadlock situation (the one that does not freeze the UI but leaks memory)

This kind of deadlock is slightly more difficult to diagnose. It happens when a "non-UI" thread calls the Invoke method of a Control that belongs to a closed Windows Form (this may happen just after the Form is closed).

In that case, execution blocks indefinitely on the call to Control.Invoke(), and the delegate supplied to it never runs (see here). => this prevents the blocked thread from ever releasing its lock!


Solution: avoid situations where "zombie controls" are accessed (be sure to unsubscribe from all .NET events to avoid keeping objects alive), and use one of the asynchronous APIs to delegate task execution to the UI thread.
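The "don't touch zombie controls" guard can be sketched language-neutrally: once the view is closed, flip a flag and silently drop late cross-thread posts instead of dispatching to a dead target. All names below are illustrative (this is not the WinForms API), and the dispatch is simplified to a direct call.

```java
// Sketch of guarding against "zombie" targets: after close(), late posts
// from worker threads are dropped instead of reaching a dead control.
import java.util.concurrent.atomic.AtomicBoolean;

public class ClosableView {
    private final AtomicBoolean closed = new AtomicBoolean(false);

    // Called on the UI thread when the window is closed.
    public void close() {
        closed.set(true);
        // ...also unsubscribe event handlers here so the view can be collected
    }

    // Called from worker threads; returns false when the post was dropped.
    public boolean tryPost(Runnable uiWork) {
        if (closed.get()) return false; // zombie target: drop, don't block
        uiWork.run(); // stand-in for asynchronous dispatch to the UI thread
        return true;
    }
}
```

Combined with asynchronous dispatch, this keeps a late-arriving worker from blocking forever on a control whose message pump is gone.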