Kate Glazebrook is principal adviser and head of growth and equality with the Behavioral Insights Team, a self-described social purpose company based in London that applies behavioral science to public policy and best business practices. The organization’s latest effort could revamp the way employee recruitment is done by focusing on analytics to remove some of the observational bias that is typical in the hiring process.

Glazebrook spoke recently to Cade Massey, Wharton practice professor of operations, information and decisions and co-director of the Wharton People Analytics Initiative, about how behavioral science is changing the way we recruit hires. Massey is co-host of the Wharton Moneyball show on Wharton Business Radio on SiriusXM channel 111, and this interview was part of a special broadcast on SiriusXM for the Wharton People Analytics Conference.

An edited transcript of the conversation follows.

Cade Massey: Tell us about your organization.

Kate Glazebrook: The Behavioral Insights Team was started in 2010 as a very small unit in the prime minister’s office in the U.K., and the basic function was to bring the literature in behavioral science to bear on public policy design and delivery. We try to draw out the great insights of colleagues who do good work in behavioral science, put them in a public policy context and also rigorously evaluate them. We do something that governments are not used to doing, which is truly testing whether or not their programs really work.

Massey: The U.K. might have had the first governmental Nudge Unit. Why do you think the prime minister’s office was especially open to this?

Glazebrook: The context is hugely important here. It was 2010, a new government came in and the book Nudge: Improving Decisions about Health, Wealth, and Happiness had just been released. It’s one of those serendipitous moments. One of the PM’s advisors had recently read Nudge and thought, “Hey, this might be a moment where we can think differently about how we design policy,” particularly in the context of austerity, where we care much more about whether programs deliver the outcomes that we really care about. There was a feeling that we wanted to tackle problems slightly differently, so I think the political context had a role to play there. It was also about the people. This was a growing field, but it was just cracking into the parlance of policy design. They saw that as an opportunity to take a bit of a risk and build a small team, and it’s been a success since then.

“We do something that governments are not used to doing, which is truly testing whether or not their programs really work.”

Massey: What were some of the early successes that got this going down the right path?

Glazebrook: When we first started, we said if in two years we haven’t delivered a tenfold increase in [savings to the government], then you should get rid of us. It was an explicit goal of the team to have a sunset clause on its creation. That put deliberate pressure on the team to go out and seek where we could really make our greatest impact. Tenfold is a big number, so of course we went and tackled some questions in tax. One of our biggest early wins came from looking at a group of people who hadn’t paid their taxes on time. We thought, is there some way we could nudge them into paying off what they owe to the government? We tested a bunch of different interventions, largely around how you talk to them in the letters the government was already sending. We found that telling people who hadn’t paid their tax on time that they were in the minority got them to pay their taxes on time.

Massey: A few years ago, [Behavioral Insights Team] decided to move out of government and into for-profit work, at least partly. Why make that transition?

Glazebrook: We spun out two years ago, and the impetus was that the team had grown as big as it could within government at the time. The thought was, “There’s a lot of demand out there from different government departments for this expertise. Why don’t we ask them to pay for it?” So it was put out to tender: does anyone want to be a co-owner of this team, with the government continuing to own part of it? The winning bid came from Nesta, which is the U.K.’s leading innovation charity supporting entrepreneurship in science, technology and the arts. We’re now structured quite interestingly: one-third owned by the government, one-third by the charity itself and one-third by the team. So we’re a social purpose mutual. Worldwide, we’re about 80 people now. Most of us are based in the U.K., but we’ve got a small team that started just six months ago in New York. We’ve got a team in Australia, and we’re about to open an office in Singapore.

Massey: You’ve spoken about a new project you are doing and testing on yourselves first. What is that all about?

Glazebrook: The problem we’re really looking to solve here is one of recruitment decision-making. Most of us know that recruitment decision-making is rife with behavioral biases, but most organizations haven’t embedded any of that knowledge of best practice into what they do. We kind of came at it in a couple of ways. One was that we spent some time in the literature and pulled out what we thought worked so we could make clearer to people what some of the pitfalls were. We were also simultaneously doing work with different public-sector organizations in the U.K., including the police and Teach First, on small nudges — small things you can do in a recruitment process that have disproportionate impact.

In the police context, for example, they were really worried about low levels of racial diversity among their officers. That was a problem for two reasons. One is the obvious one: we care about diversity, and there’s a moral and ethical imperative there. But there was also an operational imperative for them: the force needs to look and feel like the population it is serving. We went in and did a forensic evaluation of where in their recruitment process there appeared to be disproportionate drop-off by nonwhite candidates. We found it was around a situational judgment test, an online test candidates were asked to do. We didn’t change the test, but we changed the environment in which people went into it. Specifically, we looked at changing the email people get when they click on the link to do the test. Is there something we could do about the mindset people bring into that test?

The stereotype threat literature basically says that if you belong to a certain group, and there is a perception that that group is likely to underperform in a particular task, then reminding people of their group membership before they do that task will adversely affect their performance. The classic example is asking people whether or not they are female before they do a math test. Sometimes that can depress women’s performance on the test, compared with asking that same question at the end of the test.

“Most of us know that recruitment decision-making is rife with behavioral biases, but most organizations haven’t embedded any of that knowledge of best practice into what they do.”

Small, contextual factors can have impacts on people’s performance. In this particular case, there is literature to suggest that exams might be seen by particular groups as a situation where they are less likely to perform at their best. We ran a trial where there was a control group that got the usual email, which was sort of, “Dear Cade, you’ve passed through to the next stage of the recruitment process. We would like you to take this test please. Click here.” Then for another randomly selected group of people, we sent the same email but changed two things. We made the email slightly friendlier in its tone and added a line that said, “Take two minutes before you do the test to have a think about what becoming a police officer might mean to you and your community.” This was about bringing in this concept of you are part of this process, you are part of the community and becoming a police officer is part of that. We were trying to break down the perception that the police force isn’t for them because it doesn’t look like them.

Massey: This is exactly what people talk about when they talk about nudges. This is bringing social science to bear in a very subtle way that doesn’t restrict anybody’s freedom.

Glazebrook: Correct.

Massey: So what did you find?

Glazebrook: Interestingly, if you were a white candidate, the email had no impact on your pass rate: about 60% of white candidates passed in either condition. But it basically closed the attainment gap between white and nonwhite candidates. It increased nonwhite candidates’ chance of passing that test by 50%, just from adding that line and changing the email format. That was an early piece of work that reminded us of the thousands of ways we could be rethinking recruitment practices to influence the kind of social outcomes we care about.

Massey: What alternative are you creating to the traditional method of reading the CV and making a hiring decision that way?

Glazebrook: We spent about a year burying ourselves in the behavioral literature and wanted to come up with the things that would really affect a recruitment decision. We settled on four different features. Three of them are about changing the choice architecture, for want of a better word, or the environment in which people make assessments. The fourth is about what you are actually assessing in a recruitment process. I’ll go through them very quickly. The first is anonymization: low-hanging fruit but hugely important. When I am reading your application, it’s really nice that you are Cade, but I just want to attend to the details that matter. Remove the distractions of name, background, university attended and so forth. Easy.

The second one is about how we actually assess, rather than thinking about recruitment the way we usually do, which is to start with your CV and read it top to bottom. There’s a real risk that if I really like what you said on page one, it’s very hard for me to objectively assess your page two or page three. Alternatively, if you’ve got a spelling error on page one, it’s very hard for me not to feel, subconsciously, that maybe you’re not the candidate I’m looking for. Now, in certain jobs a spelling error might be a strong indicator of whether you are a good hire. Attention to detail can matter, but we don’t want it to matter disproportionately. So we built in horizontal comparison, which not only reduces halo effects, because you compare candidates’ answers to the same question against one another, but also reduces cognitive load. I’m asking you, “Who’s got the best answer to question No. 1?”
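The horizontal comparison Glazebrook describes can be sketched in code. This is a minimal, hypothetical illustration of the idea, not the Behavioral Insights Team’s actual software; the function name and data shapes are invented. It anonymizes candidates behind random IDs and regroups their answers so reviewers score one question at a time, in a freshly shuffled order per question:

```python
import random

def horizontal_review(applications, seed=0):
    """Regroup applications so reviewers score one question at a time.

    applications: dict mapping candidate name -> list of answers
    (one answer per question, same question order for everyone).

    Returns one batch per question, each a shuffled list of
    (anonymous_id, answer) pairs, plus an id -> name key for
    de-anonymizing only after all scoring is done.
    """
    rng = random.Random(seed)
    names = list(applications)
    # Assign anonymous IDs in a random order so IDs reveal nothing.
    ids = {name: f"C{i:03d}"
           for i, name in enumerate(rng.sample(names, len(names)))}
    n_questions = len(next(iter(applications.values())))
    batches = []
    for q in range(n_questions):
        batch = [(ids[name], applications[name][q]) for name in names]
        rng.shuffle(batch)  # fresh order per question to limit halo effects
        batches.append(batch)
    key = {cid: name for name, cid in ids.items()}
    return batches, key
```

The design point is that a reviewer never sees a whole application at once, so a strong (or weak) page one cannot color the rest.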

“Small, contextual factors can have impacts on people’s performance.”

The third feature is really about how you harness the wisdom of the crowd. A lot of recruitment decisions can be time costly. What is the optimal crowd size to mitigate the risk of screening out your best candidate? We ran some interesting experiments on this and basically settled on three. Most organizations can afford to have three people help hire their best people, right? If we live in a knowledge economy, that’s crucially important.
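The “crowd of three” can be illustrated with a small sketch, again a hypothetical example rather than the team’s actual system, assuming each of the three reviewers scores every candidate independently on a numeric scale before any discussion:

```python
from statistics import mean

def crowd_score(scores_by_reviewer):
    """Average independent reviewer scores per candidate.

    scores_by_reviewer: dict mapping reviewer -> dict of
    candidate -> numeric score. Averaging across reviewers
    smooths out any one reviewer's idiosyncrasies.
    """
    candidates = set().union(
        *(scores.keys() for scores in scores_by_reviewer.values())
    )
    return {
        c: mean(scores[c] for scores in scores_by_reviewer.values())
        for c in candidates
    }
```

Independence matters here: if reviewers see each other’s scores first, their judgments correlate and the crowd adds little.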

The fourth one is really about testing what works. What is predictive of performance on the job is testing you on the things you’re actually going to do, rather than the things you’ve done in the past.

Massey: You have some data from your initial studies. Are you happy with the way these things are looking?

Glazebrook: Yeah, we’re really excited. We basically tested this process against a standard CV process, and we found that we wouldn’t have hired 60% of our ultimate hires had we used CVs — which is to say we would have chucked out their CVs in the first round, and it turned out they were our best candidates.