Hard-Core Concurrency Considerations
Kevin Greer responded to What Haskell teaches us about writing Enterprise-scale software with some insightful emails about the pros and cons of using purely functional data structures for managing concurrency in a multi-processor environment. I’ve reproduced them (with his permission) below.
Now, your first reaction might be, “Ah, Reg is not nearly as smart as he thinks he is.”
If you feel like giving me some credit, you can keep in mind that I was not writing the definitive essay on designing concurrent software, just pointing out some interesting overlaps between what I consider to be the most academic programming language around and the most practical requirements in business applications.
But there’s something more important to take from this.
The original post was a response to someone asking whether there was any value to the stuff you read on programming.reddit.com. His query was whether reading about Haskell was practical. My response was, yes it is, and here are some examples of where functional data structures have an analogue in concurrent data structures. My thesis (that’s a two dollar word for “point”) was that many “impractical” things have a lot to teach us about things we will encounter in the pragmatic “real world.”
Java Concurrency in Practice is written by the people who actually designed and implemented the concurrency features in Java 5 and 6. If you are writing Java programs for two, four, eight, or more cores and CPUs (and isn’t that practically all server-side Java development?), owning and reading this book should be the very next step in your professional development.
Of course, most people read the post and thought, “Cool, some neat stuff about concurrency.” Nothing wrong with that. If you value tips on concurrent programming (and I certainly do), you’ll find Kevin’s emails very insightful.
And if you are still wondering whether you should look at “impractical” and “academic” material like Haskell, Erlang, or other things not being promoted by MegloSoft, consider that one of the papers Kevin cites describes a data structure running on a 768 CPU system. Is this in a University lab somewhere? No, it is in a company that promotes itself as an Enterprise-Scale Java company.
You simply can’t assume that everything the industry tells you to learn is everything you need. Or that any one source (Cool! Raganwald has a new post on Lazy Evaluation) has the definitive answer.
You need to commit to life-long learning to be a software developer. Some of that learning is straightforward MSDN Magazine stuff, simple updates to things you already know. And some of that learning is a little further out on the curve.
Without further ado, here is one of the most comprehensive technical replies to a post on my blog I’ve received to date.
Concurrency Considerations
Hi Reg,
I was just reading your article on concurrent collections and have a few comments:
- Just because readers do not update the structure doesn’t mean that they don’t need to synchronize. The belief that they don’t is a common source of concurrency bugs on multi-processor systems.
Unless you synchronize on the root of your data-structure (or declare it as volatile), you can’t be sure that your cache doesn’t have a stale version of the data (which may have been updated by another processor). You don’t need to synchronize for the entire life of the method, as you would by declaring the method synchronized, but you still need to synchronize on the memory access. You hold the lock for a shorter period of time, thus allowing for better concurrency, but you still have to sync.
If you fail to synchronize your memory (or declare it as volatile), then your code will work correctly on a single-CPU machine but will fail when you add more CPUs (it will work on multi-core machines provided that the cores share the same cache). If you have a recursive data structure (like a tree), then you will need to synchronize on each level/node, unless of course you use a purely functional data structure, in which case you only need to synchronize on the root.
Given that you need to make a quick sync anyway, this approach is not much better than just using a read-write lock. It is slightly better because you can start updating before the readers finish reading (not a big deal for get()s, but potentially a big deal for iterators), but then again, updates are more expensive because of the copying.
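To make this first point concrete, here is a minimal sketch (class and method names are invented for illustration, not taken from any library) of a purely functional structure whose only mutable state is a volatile root reference: readers get their visibility from a single volatile read and never block, while writers synchronize only long enough to publish a new version.

    // Sketch: a persistent (purely functional) stack with a volatile root.
    // Readers never lock; the volatile read guarantees they see a fully
    // published version. Writers build a new version and publish it.
    public class PersistentStack<T> {

        // Immutable node: once constructed, never modified.
        private static final class Node<T> {
            final T value;
            final Node<T> next;
            Node(T value, Node<T> next) { this.value = value; this.next = next; }
        }

        // The only mutable field. 'volatile' gives readers visibility without locking.
        private volatile Node<T> root = null;

        // Reader: one volatile read, then walk an immutable snapshot.
        public boolean contains(T value) {
            for (Node<T> n = root; n != null; n = n.next) {
                if (n.value.equals(value)) return true;
            }
            return false;
        }

        // Writer: writers synchronize with each other so no update is lost;
        // building the new version is cheap here (one node), and readers never block.
        public synchronized void push(T value) {
            root = new Node<>(value, root);
        }
    }

An AtomicReference with compareAndSet would serve the same purpose as the synchronized writer; the essential point is that only the publication of the new root needs any synchronization at all.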
- I don’t think that you should be using Haskell as a model of “Enterprise-scale” anything. “Enterprise-scale software” usually entails “Enterprise-scale hardware,” which means, among other things, multiple CPUs. The problem is that Haskell’s purely-functional model doesn’t support multiple CPUs (it’s true, check the literature; there are exceptions for specialized pipelined architectures, but not in the general case).
The whole processing “state” is essentially one large purely-functional data-structure. The problem of synchronizing your data-structure appears to be eliminated, but it has really just been moved up to the top-level “state” structure (monad), where the problem is actually compounded. This is worse, because not only would you need to lock your structure in order to make an update, but you would in fact need to lock “the whole world”.
Some people will advocate launching one process per CPU and then using message passing to communicate between them. This is just a very inefficient (many orders of magnitude slower) way of pushing the synchronization issue off onto the OS/Network-stack. (Q: What do multi-core systems mean for the future viability of Haskell?)
- You forgot to mention the common technique of using “striping” to improve concurrency. Basically, what you do is create multiple sub-structures, each of which contains only a subset of the data. You can partition the data based on the hash of the key being stored. You then wrap the whole thing up so that it still has the same interface as the single data-structure.
The advantage of this approach is that when you use a read-write lock, you only need to lock a small portion of the data-structure rather than the whole thing. This allows multiple updates to be performed in parallel. This is how Java’s concurrent collections work: see ConcurrentHashMap, which creates 16 sub-structures by default, though you can increase the number when even more concurrency is required.
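A toy sketch of the striping idea (this only illustrates partitioning by hash; it is not how ConcurrentHashMap is actually implemented):

    import java.util.HashMap;
    import java.util.Map;

    // Toy striped map: the key space is partitioned across independent segments,
    // so updates to different segments can proceed in parallel.
    public class StripedMap<K, V> {

        private final Map<K, V>[] segments;
        private final Object[] locks;

        @SuppressWarnings("unchecked")
        public StripedMap(int stripes) {
            segments = new Map[stripes];
            locks = new Object[stripes];
            for (int i = 0; i < stripes; i++) {
                segments[i] = new HashMap<>();
                locks[i] = new Object();
            }
        }

        // Choose a segment from the key's hash (spread the bits a little first).
        private int segmentFor(Object key) {
            int h = key.hashCode();
            h ^= (h >>> 16);
            return (h & 0x7fffffff) % segments.length;
        }

        public V get(K key) {
            int i = segmentFor(key);
            synchronized (locks[i]) {          // only this stripe is locked
                return segments[i].get(key);
            }
        }

        public V put(K key, V value) {
            int i = segmentFor(key);
            synchronized (locks[i]) {          // writers to other stripes are unaffected
                return segments[i].put(key, value);
            }
        }
    }

With 16 stripes, up to 16 writers can update the map at once, provided their keys hash to different segments.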
- Have a look at Azul’s non-blocking HashMap implementation. They can do 1.2 billion hashMap operations per second (with 1%, i.e. 12 million/sec, of those being updates) on a 768-CPU machine (the standard Java ConcurrentHashMap still does half a billion ops/sec, which isn’t bad either). This shows how scalable non-functional methods can be.
I’ve never read of any Haskell or other purely-functional implementation being able to scale to anywhere near these numbers.
There’s another entire world of concurrency control, the world of independent actors communicating with fault-tolerant protocols. The world of Erlang. You can pre-order the most hotly anticipated book on the subject, Programming Erlang: Software for a Concurrent World, the definitive reference from the language's creator, Joe Armstrong.
Summary:
Using a purely-functional data-structure does make it easier to support multiple readers, but you still need to sync briefly at the “top” of the structure. Striping has the added advantage of also supporting multiple writers, and in practice, this is a much bigger win. Haskell is inherently limited to a single CPU, which doesn’t match modern hardware, especially of the “enterprise” class. Shared-memory models of concurrency have demonstrated much better performance than purely-functional models.
Best Regards,
Kevin Greer
p.s. I've actually implemented a B-tree structure for very large data sets and chose to use a purely-functional data-structure myself. My reason for doing so was that I have some very long-running read operations (~8 hours), and I obviously can't afford a read-write lock that would starve writer threads for that long.
Another nice advantage of using purely-functional data-structures is that they make it easy to implement temporal-databases that let you perform time-dependent queries.
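A sketch of that idea (the names here are invented for illustration): keep an index from commit time to the root of each published version, and an “as of” query is just a floor lookup.

    import java.util.Map;
    import java.util.NavigableMap;
    import java.util.concurrent.ConcurrentSkipListMap;

    // Sketch: because old versions of a purely functional structure are never
    // destroyed, keeping a (timestamp -> root) index gives time-dependent queries
    // almost for free. 'Root' stands in for the root type of whatever persistent
    // structure is being versioned.
    public class TemporalIndex<Root> {

        private final NavigableMap<Long, Root> rootsByTime = new ConcurrentSkipListMap<>();

        // Called whenever a writer publishes a new version.
        public void commit(long timestampMillis, Root newRoot) {
            rootsByTime.put(timestampMillis, newRoot);
        }

        // "What did the structure look like at time t?" -- the newest root at or before t.
        public Root asOf(long timestampMillis) {
            Map.Entry<Long, Root> e = rootsByTime.floorEntry(timestampMillis);
            return e == null ? null : e.getValue();
        }
    }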
I just wanted to point out that if all you have is quick reads, then purely-functional is no different from a simple read-write lock, and that a super-pure implementation, as with Haskell, doesn't scale to multiple CPUs or to highly concurrent updates. However, it can be used to good effect in combination with striping or other techniques.
(A tangential note: Java's GC is getting so good in recent releases that purely-functional data-structures are becoming much more efficient, given that they generate more garbage.)
p.p.s. One more advantage of functional data-structures (the largest advantage for me actually):
They map well to log- (or journal-) based storage. Functional data-structures never update old data but instead just create new versions. If your data-structure uses disk-based storage, then this means that you never need to go back and overwrite your file; you only need to append data to the end of the file. This has two advantages:
- This works well with the physical characteristics of hard disks. They have very high transfer rates (tens of MB/sec) but very slow seek times (~200 seeks/sec at 5 ms access time). If you are adding, say, 1 KB objects to a data-structure that requires 2 random updates to the file per addition, then you’re looking at about 100 additions per second. If, on the other hand, you write all of these updates to the end of the file, then you’re looking at something like 20,000 additions per second!
- You can’t corrupt old data with interrupted writes. The old data is always still there. The only place where corruption can occur is at the end of the file, in which case you just scan backwards until you find the previous good head of your data-structure.
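Here is a rough sketch of that append-only recovery scheme in Java (the record format and the magic marker are invented for this example):

    import java.io.IOException;
    import java.io.RandomAccessFile;

    // Sketch of the append-only idea. Record format (invented for this example):
    // [payload bytes][payload length: int][magic: long]. New versions are only ever
    // appended; recovery scans backwards from the end for the last intact record.
    public class AppendOnlyLog {

        private static final long MAGIC = 0x600DFEEDCAFEL;   // marks a completely written record
        private final RandomAccessFile file;

        public AppendOnlyLog(String path) throws IOException {
            file = new RandomAccessFile(path, "rw");
        }

        // Appending never overwrites old data, so an interrupted write can only
        // damage the (not yet trusted) tail of the file.
        public synchronized void append(byte[] payload) throws IOException {
            file.seek(file.length());
            file.write(payload);
            file.writeInt(payload.length);
            file.writeLong(MAGIC);
        }

        // Scan backwards for the newest record whose trailer survived intact.
        public synchronized byte[] lastGoodRecord() throws IOException {
            for (long markerPos = file.length() - 8; markerPos >= 4; markerPos--) {
                file.seek(markerPos);
                if (file.readLong() != MAGIC) continue;
                file.seek(markerPos - 4);
                int len = file.readInt();
                long start = markerPos - 4 - len;
                if (len < 0 || start < 0) continue;           // false marker match
                byte[] payload = new byte[len];
                file.seek(start);
                file.readFully(payload);
                return payload;
            }
            return null;                                       // empty or fully corrupted file
        }
    }

Scanning backwards byte-by-byte is crude (a payload could contain the marker bytes by coincidence), but it captures the property described above: a torn write can only ever damage the untrusted tail of the file.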
This is fantastic stuff to share, thanks, Kevin!
What other tips can readers contribute for people building highly concurrent software (besides the frequent use of JProbe Threadalyzer, of course)? What online resources (papers, articles, books) do readers recommend for the professional developer?
Labels: java, passion, popular