Amortized Costs, DOS, and Magicicadas
Jeff Atwood has just posted an interesting article,
Hashtables, Pigeonholes, and Birthdays. Hash Tables are a very interesting subject, you can take almost any aspect of them and follow it down interesting trails.
For example:
Performance: Quantifying the performance of a data structure like a Hash Table is interesting. There are measures like
Amortized Cost as well as
Worst Case Cost. This subject is one of the core themes of the book
Purely Functional Data Structures
. (You can also
download the author’s thesis on the same subject) I recommend it whether you have an interest in “purely functional” programming or not: “purely functional” data structures provide all of the imperative and side-effect oriented operations you know and love—like insertions and removals—while also providing the foundations for highly concurrent environments like threads with shared memory.
Security: One-way functions are strongly related to hash functions. They are the foundation for many of today’s security and secrecy protocols. For example, fingerprinting is used to allow mirror sites to host downloads while giving the public comfort that the downloads have not been compromised with Malware. However, if the one-way functions underlying them can be cracked—as MD5 was recently—miscreants can
substitute bad files for good at will.
Even when security does not appear to be a factor, compromising a hash function can have a security impact. As one person mentioned in one of Jeff’s comments, if the hash function for a programming language is public, attackers can deliberately engineer worst-case performance to create a Denial Of Service attack. Here’s a deliberately trivialized example:
Consider a public-facing site with free user registration. When you log in, the site performs a look up on the user name. Now obviously there’s a database somewhere. But perhaps the programmers put a cache in front of it. You could imagine such a cache being useful if cookies are used to keep someone logged in. If the cache uses a hash table, a vandal can create lots and lots of user names that hash into the same bucket. Anyone with a user name in the same bucket will face a long wait as the system performs a serial look up within the bucket.
Whimsey: One of the comments mentioned that the number of buckets is usually a prime number. Did you know that
insects figured this strategy out on their own?
Editorial/Love Fest:
And now for what I really wanted to say.
People have pointed out that Jeff often seems to be summarizing well-known computer science principles. A Reader’s Digest for programmers, as it were. The suggestion seems to be that this is not useful, since anybody with a degree or even a passing interest in programming ought to know these things. My own feeling is that Jeff’s blog is far more useful than my own.
Coding Horror is published out in the real world. It is written for real people.
This weblog operates firmly in
Lake Wobegon. It’s true! The very fact that you are reading about programming means that you are exceptional for reading it and I am exceptional for writing it. I am not going to start a flame war and say you and I are
better programmers. Who really knows? But I am very confident when I say that we are not
representative of the typical programmer.
The typical programmer does not know what a closure is and does not care. If they are added to Java he or she will not use them any ways. The typical programmer may have obtained their degree, but they were far more interested in memorizing whatever was needed to get a B on the examination then they were in the subject matter.
They may have a certification, but they do not own and have not read books outside of their immediate requirements, so Knuth is foreign to them. While they may talk of refactoring, they have never heard of
Martin Fowler
or
Joshua Kerievsky
.
When Jeff writes a Digg-able post about multiple monitors, or how to optimize memory on a Windows PC, he get readers that would never subscribe to a blog if they were told it was about programming. And I am talking about programmers. Try to wrap your head around the idea of programmers who do not find programming blogs interesting (in fact, they don’t find programming interesting at the moment).
If they are then exposed to something insightful and interesting about hash tables… Many will read it and forget it, but a few will be interested enough to delve deeper into the whys behind the whats.
I am all for that. It is remarkably easy to turn and preach to the choir. Evangelizing to the bored and uninterested… This is a frustrating and difficult job. I wish I had the zeal to reach out across the divide and bring people over to where programming is deeply fulfilling and fascinating. I’m glad Jeff is doing the job he’s doing and wish his blog every success.