Games People Play

(This is a snapshot of my old weblog. New posts and selected republished essays can be found at raganwald.com.)

Wednesday, March 05, 2008

My thesis when I wrote The Naïve Approach to Hiring People was that we can learn things about hiring people from the things we know about document classification. In that post, I talked a little about Naïve Bayesian Filters and what they teach us about selecting people to interview.

I wasn’t actually suggesting we stop using humans to select résumés or decide whom to interview based on their answers to phone screen questions. I was asking us to think about what we can transfer from our knowledge about document classification to the problem of (let’s be frank) people classification.

A few people wrote to say, “No, that won’t work, here’s why.” That’s really interesting! A completely naïve filter certainly wouldn’t work. But thinking about why it wouldn’t work teaches us more about how to do a great job of hiring the right people.

Games People Play

One of the reasons you can’t build a naïve document classifier to select candidates is that people game the system.

When people perceive that they have an enormous incentive to obtain jobs as programmers, they are motivated to subvert the process and get the job for themselves regardless of what the employer is attempting to accomplish with the interview process.¹

For that reason, they are constantly monitoring the behaviour of employers and attempting to adjust their own behaviour to manipulate employers into giving them interviews and, ultimately, a job. This is the most difficult part of hiring people, regardless of how you do it: candidates are trying to guess what you want to read or hear and will happily parrot it to you.

You may think that this is a waste of their time, because lying to get the interview will only get them thrown out of interviews, but the incentives in looking for a job are to reward anything that gets them any reasonable job offer, no matter how many times they are thrown out of interviews.

Let’s state the problem with filtering a little more explicitly: filtering works by analyzing features (years of experience, technologies, education, past employers) and looking for the features that have the highest correlation with positive outcomes. For example, if our task is to select people to interview based on asking them no more than five questions over the telephone, we could use the following protocol:

Collect a list of questions to ask from sources like Steve Yegge’s The Five Essential Phone Screen Questions and Joel Spolsky’s The Phone Screen.

When we call a candidate, select some questions to ask from our list.

Make a note of which questions the candidate answered satisfactorily and which unsatisfactorily.

If we call the candidate in for an interview, make a note of whether they were a decent interview. Not necessarily a “HIRE,” but whether we felt interviewing them was a waste of time or not.

After following this protocol for a while, we have collected an imperfect but still useful corpus: we can do some easy analysis to determine which questions have the highest correlation between satisfactory answers and worthwhile interviews. I can’t predict what your answers might be if you follow the protocol, but here’s something important to note: we are looking for the questions with the highest correlation.
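To make the “easy analysis” concrete, here is a minimal sketch, not a definitive implementation. It assumes each phone screen is logged as a record of yes/no answers plus a yes/no note on whether the interview was worthwhile. The log, the question names `q1` and `q2`, and the choice of the phi coefficient as the correlation measure are all my own assumptions for illustration.

```python
from math import sqrt

# Hypothetical screening log (invented data): which questions each candidate
# answered satisfactorily, and whether the interview was worth our time.
SCREENING_LOG = [
    {"answers": {"q1": True, "q2": True}, "worthwhile": True},
    {"answers": {"q1": True, "q2": False}, "worthwhile": True},
    {"answers": {"q1": False, "q2": True}, "worthwhile": False},
    {"answers": {"q1": False, "q2": False}, "worthwhile": False},
]


def phi(pairs):
    """Phi coefficient: Pearson correlation for two yes/no variables."""
    a = sum(1 for x, y in pairs if x and y)          # good answer, good interview
    b = sum(1 for x, y in pairs if x and not y)      # good answer, dud interview
    c = sum(1 for x, y in pairs if not x and y)      # bad answer, good interview
    d = sum(1 for x, y in pairs if not x and not y)  # bad answer, dud interview
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    # With no variation in answers or outcomes there is no signal to measure.
    return (a * d - b * c) / denom if denom else 0.0


def rank_questions(log):
    """Return (question, phi) pairs, strongest positive correlation first."""
    by_question = {}
    for record in log:
        for question, satisfactory in record["answers"].items():
            by_question.setdefault(question, []).append(
                (satisfactory, record["worthwhile"])
            )
    scores = {q: phi(pairs) for q, pairs in by_question.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


for question, score in rank_questions(SCREENING_LOG):
    print(question, round(score, 2))
```

In this toy log, `q1` predicts a worthwhile interview perfectly and `q2` tells us nothing. A real corpus would be far noisier, and it only contains outcomes for the candidates we actually chose to interview.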

Programming Collective Intelligence provides practical examples for building systems that reason based on learning from data and behaviour, such as Naïve Bayesian Filters, collaborative filters, and recommendation engines.

Armed with our “training,” we would know the five best questions to ask on the telephone. Great! We would start using them exclusively, and we would only interview people who get all five right. We wouldn’t grant that many interviews, but let’s assume that we would be happy with this trade-off. (We’ll address the nature of that trade-off in another post.)

At first, things would probably go really well. Every person we bring in for an interview would be worth the trouble, and we would be making offers to many of them. We would boast that our five questions are amazing! But then things would start to slip: we would start getting one or two duds a week, and then every other interviewee would be a dud, and before you know it almost everyone we call would give us great answers but would be a festering pile of mediocrity when they showed up for the interview.

What in Hades could have gone wrong? Of course, we know what went wrong: the job hunters would cotton on to our questions. The word would have gone around about the questions we asked. Maybe we would have been arrogant enough to discuss them on our blogs, maybe interviewees would network, maybe they would publish our questions on their own blogs. I can tell you that tech recruiters are constantly interrogating people for interview questions and then preparing their candidates by telling them what to study in advance. So if we ever hire through recruiters, we might just as well publish our interview questions on our web site.

So now anybody sending us their résumé would have memorized our five questions and would even know the right way to answer each question to get an interview. We would have been gamed.

So what can we do about it?

Game On

Before we try to work out a coping strategy from first principles, we can look around and see whether this problem has already been solved. Indeed it has; as mentioned, the cat-and-mouse game between spammers and spam filters is a very close analogue to the cat-and-mouse battle between employers and candidates.

The point of this post is to suggest that we can learn from a similar problem, not to pretend I know the exact answers. You have to do your own thinking! But here are a few ideas that have worked for me as an interviewer in the past. It is easy to see their analogue to battling spammers.

First, you need an iterative strategy. You cannot think of “The five best questions” as some sort of fixed list. The correlation between a question and the likelihood of a positive outcome for you changes over time as candidates discover what you are seeking and pretend to supply it. This is exactly like spammers writing emails: they are constantly trying to reverse-engineer filters and write letters that score well as non-spam, and the filters are constantly being updated in a Red Queen’s Race.
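One way to picture the iterative strategy is to recompute a question’s correlation period by period and retire it once the signal fades. The following is only a sketch under my own assumptions: the log entries, the `ruby` feature name, the two periods, and the 0.3 threshold are all invented for illustration.

```python
from math import sqrt

# Hypothetical log (invented data): (period, satisfactory_answers, worthwhile).
LOG = [
    (2004, {"ruby": True}, True),
    (2004, {"ruby": True}, True),
    (2004, {"ruby": False}, False),
    (2004, {"ruby": False}, False),
    # Later on, every candidate has learned to give the "right" answer.
    (2008, {"ruby": True}, True),
    (2008, {"ruby": True}, False),
    (2008, {"ruby": True}, False),
    (2008, {"ruby": True}, False),
]


def phi(pairs):
    """Phi coefficient: Pearson correlation for two yes/no variables."""
    a = sum(1 for x, y in pairs if x and y)
    b = sum(1 for x, y in pairs if x and not y)
    c = sum(1 for x, y in pairs if not x and y)
    d = sum(1 for x, y in pairs if not x and not y)
    denom = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    # If everyone gives the same answer, the question no longer discriminates.
    return (a * d - b * c) / denom if denom else 0.0


def correlation_by_period(log, question):
    """Return {period: phi} for one question, computed separately per period."""
    grouped = {}
    for period, answers, worthwhile in log:
        if question in answers:
            grouped.setdefault(period, []).append((answers[question], worthwhile))
    return {period: phi(pairs) for period, pairs in sorted(grouped.items())}


def is_stale(history, threshold=0.3):
    """True when the most recent period's correlation is below the threshold."""
    return history[max(history)] < threshold


history = correlation_by_period(LOG, "ruby")
print(history, "retire" if is_stale(history) else "keep")
```

In this toy data the feature is a perfect predictor in the first period and worthless in the second, because once every candidate supplies it there is nothing left to correlate.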

In 2004, I was granting interviews to anyone with Python, Ruby, Spring, or Hibernate experience. At the time, these were remarkably rare and had very strong positive correlation for the type of team I was building. Today, while I still respect those technologies, I doubt they have as strong a correlation. I would definitely need a strong phone screen before granting an interview.

This isn’t just about niche tools. I’m sure my more conservative colleagues will tell you that there was a time when a Microsoft programming certification meant far more than it does today. In general, as the word goes out that employers want something, the correlation between that thing and positive outcomes goes down, and employers have to search for other “features” providing higher correlation.

If you followed the link, you know what I valued in 2004. What about today? The second thing is that I’m not going to tell you (although I still ask “What’s the best work you’ve ever done, and why are you proud of it?”). Spammers use computer programs called “bots” (you knew that, of course) to sign up for free email accounts and send spam. One of the ways email services try to foil them is with CAPTCHAs.

The economics of CAPTCHAs and other means of foiling bots are simple: there is an upfront cost to the spammer to reverse-engineer whatever obstacles you put into place and write a bot that can negotiate them. Thereafter, the bot earns money for the spammer every time it encounters your obstacle. Therefore, the spammers program their bots for the obstacles that offer them the largest opportunity to profit.

According to Windows apologists, this is why Windows machines are infested with virii and Macs are not: writing a virus for OS X is just as much trouble as for Windows, but the Windows virus can infect thirty times as many machines as the OS X virus, so nobody bothers with OS X virii. Maybe true, maybe not true.
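The arithmetic behind that argument can be sketched in a few lines. This is a toy model, not a measurement: the cost, the payoff, and the encounter counts are invented numbers, and the only figure borrowed from the text is the thirty-to-one ratio between the popular target and the obscure one.

```python
def expected_profit(upfront_cost, encounters, payoff_per_encounter):
    """Profit from breaking one obstacle: recurring payoff minus one-time cost."""
    return encounters * payoff_per_encounter - upfront_cost


def worth_targeting(upfront_cost, encounters, payoff_per_encounter):
    """An attacker only bothers when the expected profit is positive."""
    return expected_profit(upfront_cost, encounters, payoff_per_encounter) > 0


# Hypothetical numbers: the same effort to break either obstacle,
# but the popular one is encountered thirty times as often.
UPFRONT_COST = 1000
PAYOFF = 1
OBSCURE_ENCOUNTERS = 100
POPULAR_ENCOUNTERS = 30 * OBSCURE_ENCOUNTERS

print(expected_profit(UPFRONT_COST, OBSCURE_ENCOUNTERS, PAYOFF))  # -900
print(expected_profit(UPFRONT_COST, POPULAR_ENCOUNTERS, PAYOFF))  # 2000
```

With identical upfront cost, the obscure obstacle loses money for the attacker while the popular one pays for itself twice over, which is the whole incentive in miniature.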

So back to hiring and questions. Training your filters as described above and retraining them from time to time is fine if the marketplace takes a while to respond to your questions. It took four years before “Ruby” went from being a must-interview-no-questions-asked to a looks-good-but-better-have-something-else-as-well. But if things are moving very quickly, the useful time of a question may fall below the amount of time needed to train questions. In that case, you can’t gather reliable statistics.

What makes the “market” move quickly? Perceived desirability. If you’re Google and your stock is on fire, people devote themselves to deciphering and gaming your hiring strategy. Or if there are a very large number of people that hire the exact same way you hire, you get the same overall effect. Going back to CAPTCHAs, if you are running Google Mail, people will spend a lot of time breaking your CAPTCHA. Or if your CAPTCHA is part of a popular package that a lot of sites use, people will take the time to break it.

The easiest—and most effective—way to secure a web site is to use an obscure CAPTCHA. Sure, it ought to be as robust as possible. But if very few people use it, the incentive for breaking it will be low. In 2004, languages like Python and Ruby were obscure by job seeker standards. Sure, if someone wanted a job with me and only me she could claim to know them and get thrown out of the interview after the first question. But who would bother faking Ruby to get an interview with me when they could fake J2EE+Struts and get a few dozen interviews with BigCos?

Today, I have a different set of hot buttons. Sure, they are different in part because I have iterated over the years and there are new items that correlate strongly with positive outcomes. But the new things are not necessarily hot things. I mentioned Python and Ruby in 2004, but I also mentioned J. J wasn’t hot then and it didn’t look like it was going to become hot, but I have had very good experiences working with APL and J people.

The main thing about my hot buttons is that I try to make sure they aren’t popular. Now you might snap your fingers and say, “Aha! The Python Paradox again!” But that isn’t it. The Python Paradox is that using certain unpopular languages that have geek “cred” increases an employer’s attractiveness to strong candidates. That’s reversing things and figuring out how to game the good candidates! This is different: it’s choosing candidates based on things that are unpopular amongst other employers specifically to avoid having candidates fake them to get a job with you.

Winning

The battle to secure the best employees is a game, and while there are no sure things in life, there are strategies that maximize the possibility of a positive outcome for employers. Selecting candidates is certainly not a simple problem amenable to naïve filtering, but thinking about document classification helps us do a better job.

Likewise, responding to candidates attempting to subvert your interview process is not as simple as iteratively training your “filter questions” and choosing obscure questions, but thinking about the battle between spammers and web sites helps us do a better job when hiring programmers.

¹ There are many reasons for this. I am going to skip right past all discussions of chicanery and posit something for consideration: if the industry does a terrible job of selecting good people, we should not be surprised that candidates do not trust us to take complete control of the interview process. If a candidate has heard that some of the people working at XYZCorp are complete bozos, what is wrong with stretching the truth in the interview to get the job? If the industry constantly trumpets how tools and architecture are more important than hiring good people, why shouldn't a less-than-stellar candidate lie their way into a job? Won't Eclipse and static typing and design patterns and BDUF ensure that they can do a serviceable job?

post scriptum: The fact is, almost every idea has holes in it. Finding them is important, but it’s just the first step. Bayesian filters are not going to be able to outperform a human for selecting candidates. True. But the next step is to figure out why they fall short and what we can do about it. There is a very lucrative business opportunity for someone to apply machine learning techniques to hiring people.

Where there’s muck there’s brass.

¶ 6:00 AM

Comments on “Games People Play”:

Gosh darn it, "virii" is not an English word, and certainly not the plural of virus. If it were Latin, it'd be the plural of "virius", not "virus", any way. In any case, "virus" doesn't have a recorded Latin plural -- it appears to be a mass noun. Reconstructions of what the plural should be range from only modifying the pronunciation ("virus" with a long "u") to "virora". None of them really work. Fortunately, there is a perfectly good English plural -- "viruses".

# posted by

Aaron Denney : 10:43 AM

Aaron:

I always thought that the plural of biological virus is biological viruses and the plural of computer virus is virii.

It turns out that things are a little more complex than that. Virii is definitely wrong. Except when person A uses it and person B understands it.

# posted by

Reginald Braithwaite : 10:52 AM

It is very difficult to study Bayesian Analysis topic. Not many good reference textbooks to study Markov chain.

I use Statistical Decision Theory and Bayesian Analysis, 2nd Edition to study. This is good reference textbook.

Do you have any other good Bayesian Analysis related textbooks recommend?

Regards,

Andy ^_^
Cocomartini Discount Online Bookstore

# posted by

must : 11:36 AM

I think another problem with your bayesian approach is that the message you're looking for is the message trying to be sent. While definitely a leap forward in terms of hiring efficiency, that's probably why it would never work to automate the process entirely like a spam filter can do.

Also, on the topic of captchas, the security through obscurity (or rather, unexpectedness) is an extremely good point. My own blog at http://smokinn.com/blog has an extremely simple captcha that I don't expect any bots to be breaking anytime soon. Also, since I implemented it myself, I'm the only one using it and that makes it completely undesirable to write an automation script to spam just my blog. I hate the scrambled word style captchas so I did something completely different.

If adoption of my blog code ever picks up (I doubt it ever will), there are many changes I can make that will make it very hard to automate the breaking but for now it's very basic. Currently, it makes you pick 2 out of 6 photos because I like the layout of that but even just making it 3 of 9 makes it a lot harder to game randomly. Also, the captcha requires clicking on the images (and doesn't use checkboxes) so that makes it harder to automate too. There isn't even any form on the page. I'm pretty happy with it and I can't wait to see if a spam bot ever gets through.

# posted by

Guillaume Theoret : 11:56 AM

# posted by

Reginald Braithwaite : 11:59 AM

Here's a good example of what you're talking about for your anecdote collection.

I once had a hiring manager who used a single question to filter out employees -- it was a question built on very simple visualization, and the kind of thing which had a pretty straightforward answer. The technical recruiter for the position not only prompted me with the question, she told me that my answer didn't include a few key phrases which he is looking to hear when he's asking the question.

Without prompting, that question is a good one, because it would have demonstrated and exercised an ability to abstract and visualize a very simple system in your head. That consulting agency I worked for, though, managed to get in a couple of other people who didn't actually have that capability...but the prompting got them past the filter.

# posted by

Robert Fischer : 1:26 PM

Well, it's pretty much as you say in this post. People will try and game the filter. The achilles heel of spam is that they're trying to give you a message you don't want. If you can identify the message (what the classification essentially aims for) then you can put some messages on one side and some other messages on the other side.

For hiring on the other hand, the message you're looking for is "I'm worth hiring". The message the candidate is looking to send is "I'm worth hiring". Once you've defined what kind of messages are worth hiring, the candidates can adjust to send that kind of message.

Spam on the other hand doesn't want to send email about my weekend, or what to do tonight or what covers are supposed to go on the tps reports. They want to defraud you, sell you pills and other less savory products. So you can classify. They can't adjust to the messages you want to receive because that's not the message they're looking to send. It's therefore possible to reach a fairly stable state (although the line can be *extremely* thin).

If you ever reach a stable state in the hiring algorithm you're screwed. Like robert said just above, just one consultancy will be able to game it trivially and get a bunch of bad candidates in.

Instead, it would be interesting to combine the naive classification with a kind of genetic algorithm, introducing random mutation in periodically to make sure your hiring process never reaches a stable state that's easy to game. You have to keep introducing new questions into your question pool and these new inputs will form into new combinations and your hiring process will be able to remain in flux while staying much more efficient than any ad-hoc approach.

This is getting really interesting. =)

# posted by

Guillaume Theoret : 6:50 PM

Guillaume:

"If you ever reach a stable state in the hiring algorithm you're screwed. Like robert said just above, just one consultancy will be able to game it trivially and get a bunch of bad candidates in.

"Instead, it would be interesting to combine the naive classification with a kind of genetic algorithm, introducing random mutation in periodically to make sure your hiring process never reaches a stable state that's easy to game. You have to keep introducing new questions into your question pool and these new inputs will form into new combinations and your hiring process will be able to remain in flux while staying much more efficient than any ad-hoc approach."

Exactly, although you are simplifying somewhat and assuming the "spammer" cares about gaming you. As I note above, the economics of hiring mean that your two defenses are changing your filter (as you note) and also avoiding popular filter characteristics to make it uneconomical to target you.

# posted by

Reginald Braithwaite : 9:57 PM

I've actually come up with a name for these (somewhat obscure) kinds of programming languages and technologies that I mention on my resume.
I refer to them as my 'Dog Whistle'.

I'm sure most recruiters probably miss them completely, but for employers and tech recruiters that are "in-the-know", it actually communicates quite a bit I believe.
It also says something about the type of company you're applying for if they pick up on it.

I guess that's just one of the ways I play the game. :)

# posted by

Dennis Hotson : 8:25 AM

Dennis:

+1!

This leads onto another subject: the games companies play to attract good candidates. This post and its predecessor both imply that there is a fixed pile of candidates, such as when you search monster.com.

But of course, if you advertise, everything is different. Where you advertise and what you say in your advertisements strongly influences the types of candidates you will attract.

I've used AdWords for recruiting, and the choice of ads makes a huge difference.

# posted by

Reginald Braithwaite : 9:29 AM
