Can crowd science help eliminate biases?

Roxanne Bauer

A new and exciting field is emerging that combines the insights of analytics with those of psychology; it is known as crowd science. It is already a fairly pervasive industry, involving not just data scientists but also behavioral economists, marketers, and entrepreneurs.

Crowd science analyzes data (through data mining, algorithms, statistical modeling, and other techniques) and then applies psychological or behavioral theories to make sense of the analyses. It is sometimes referred to as the "guinea pig" economy because it uses consumer tests, often without the consumer realizing it, to obtain its data and, therefore, its insights.

One of the most popular forms of crowd science is A/B testing, whereby website visitors are shown different interfaces or versions of the same site. The way each visitor navigates through the site is then tracked to determine which version is more appealing or effective. One reason A/B testing is so helpful is that it divides users into a control group and a treatment group, allowing those running the experiment to determine not just what the issues are but how to solve them. It also allows decision-makers to test for biases in project design and implementation.
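
To make the mechanics concrete, here is a minimal sketch of an A/B test in Python: visitors are randomly assigned to a control or a treatment group, conversions are counted, and a two-proportion z-test checks whether the difference is larger than chance would explain. The visitor numbers and conversion rates are invented for illustration; they are not drawn from any of the examples discussed in this article.

```python
import random
import math

# Minimal A/B test sketch: visitors are randomly split into a control
# group (version A) and a treatment group (version B), and the conversion
# rate of each group is compared. The "true" rates below are hypothetical.

random.seed(42)

TRUE_RATE_A = 0.08   # hypothetical conversion rate of the existing page
TRUE_RATE_B = 0.10   # hypothetical conversion rate of the new variant

def simulate_visit(version):
    """Simulate whether a single visitor converts on the given version."""
    rate = TRUE_RATE_A if version == "A" else TRUE_RATE_B
    return random.random() < rate

counts = {"A": 0, "B": 0}
conversions = {"A": 0, "B": 0}

for _ in range(20_000):                      # 20,000 simulated visitors
    version = random.choice(["A", "B"])      # random assignment to a group
    counts[version] += 1
    conversions[version] += simulate_visit(version)

p_a = conversions["A"] / counts["A"]
p_b = conversions["B"] / counts["B"]

# Two-proportion z-test: is the observed difference larger than chance?
p_pool = (conversions["A"] + conversions["B"]) / (counts["A"] + counts["B"])
se = math.sqrt(p_pool * (1 - p_pool) * (1 / counts["A"] + 1 / counts["B"]))
z = (p_b - p_a) / se
p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))

print(f"A: {p_a:.3%}  B: {p_b:.3%}  z = {z:.2f}  p-value = {p_value:.4f}")
```

In practice the same logic runs on live traffic rather than simulated visits, but the structure is the same: random assignment, tracking, and a comparison of rates.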

Its applications are many and varied. Amazon uses A/B testing to design and redesign its checkout screen, Facebook uses it to curate content for the news feed, and Google runs thousands of A/B tests each day to improve search and other services.

In 2008, the campaign team of Barack Obama, then a presidential candidate and now US President, used A/B testing to optimize donation rates. The team, led by A/B testing guru Dan Siroker, tested every combination of four buttons and six different media (three images and three videos) against each other on the campaign's website. Every visitor to the splash page was randomly shown one of these combinations, and the team tracked whether or not visitors signed up to "join the movement". The winning combination, a photograph of Obama and his family paired with a button to "learn more", had a sign-up rate of 11.6%, compared with 8.26% for the original page. According to Siroker, this may well have translated into 2,880,000 more email addresses collected, 288,000 more volunteers, and an additional $60 million in donations. Had the campaign team not run the A/B test, they most likely would have led with a video on the splash page, as this was the most popular option among the campaign staffers. Clearly, testing their biases helped them. In this case, Siroker and his team may have fallen prey to projection bias, assuming that the general public would find the same combination appealing that the staffers did. Only through A/B testing could they uncover the discrepancy.
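
As a rough check on those figures, the arithmetic below computes the relative lift from the two sign-up rates quoted above and shows how an estimate like "2,880,000 more email addresses" could be reached. Only the two rates come from the article; the 10 million baseline for total addresses collected is an assumption made purely for illustration.

```python
# Back-of-the-envelope arithmetic for the campaign figures quoted above.
# The two sign-up rates come from the article; the 10 million figure for
# total email addresses collected is an assumption, used only to show how
# an estimate like "2,880,000 more addresses" could be derived.

rate_winner = 0.116      # sign-up rate of the winning combination
rate_original = 0.0826   # sign-up rate of the original page

relative_lift = (rate_winner - rate_original) / rate_original
print(f"Relative lift: {relative_lift:.1%}")            # roughly +40%

assumed_total_signups = 10_000_000   # hypothetical total addresses collected
# Of those, the share that would have been missed at the original rate:
extra_signups = assumed_total_signups * (1 - 1 / (1 + relative_lift))
print(f"Extra sign-ups implied: {extra_signups:,.0f}")  # close to 2,880,000
```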

Likewise, in a report by PwC, David Thompson, the chief information officer of Western Union, is quoted as saying, "We test certain combinations of fee and foreign exchange and see if it has an effect on volume and customer satisfaction […] We are now able to get responses back from the technologies much quicker, and over a large dataset." Western Union also used A/B testing to revamp the company's website, experimenting with different offers and looks in different countries. Now, the company is able to offer country-specific deals and localize the experience with commercial buttons tailored to each user. This gives Western Union the ability to distinguish between new and returning customers, male and female clients, and other differentiating characteristics, reducing what is known as group attribution error: the tendency to believe that the characteristics of an individual (in this case, one customer) reflect those of the group as a whole (in this case, the entire customer base).

The applications of A/B testing go beyond website design. Tomkin and Charlevoix (2014) used A/B testing to assign students of a Coursera Massive Open Online Course (MOOC) to two different experiences: one without any interaction between students and a teacher, and another with interaction between students and teachers in forums. The study found that interaction with a teacher did not significantly affect course completion so long as student-to-student interaction was prominent. The authors argue, "The absence of the professor did not impact the activity of the forums – the participants did generate their own knowledge in this arena" (p. 75).

Others in the online learning community also use A/B testing. The Bill & Melinda Gates Foundation recommends the use of A/B testing to improve digital courseware, as do Salman Khan of Khan Academy, Sebastian Thrun of Udacity, and Andrew Ng of Coursera, all of whom run organizations that provide MOOCs. In one A/B test, Udacity compared a color version of a lesson against one in black and white. "Test results were much better for the black-and-white version," says Thrun. "That surprised me." Likewise, Ng states, "We see every mouse click and keystroke. We know if a user clicks one answer and then selects another, or fast-forwards through part of a video." Through A/B testing, says Ng, Coursera discovered that its practice of e-mailing students with reminders of upcoming course deadlines actually made them less likely to continue the courses, while e-mails with summaries of their recent activity in the course boosted engagement significantly.

However, it is important to note a few caveats. A/B testing requires a lot of data, good estimates of baselines, and sound variable management, all of which make it difficult to do well. Moreover, the results of A/B tests do not describe how multiple factors interact with one another, and they do not predict long-term reactions.
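
To give a sense of the "lot of data" caveat, the sketch below uses a standard sample-size approximation for a two-proportion test to estimate how many visitors each variant would need in order to detect a modest lift. The baseline conversion rate, the lift, and the conventional significance and power levels are all hypothetical choices, not figures from the article.

```python
import math

# Rough sample-size estimate for a two-proportion A/B test, illustrating
# why such tests need a lot of traffic. The baseline rate and the lift to
# detect are hypothetical; alpha = 0.05 (two-sided) and power = 0.80 are
# conventional choices, giving z_alpha = 1.96 and z_beta = 0.84.

def sample_size_per_arm(p1, p2, z_alpha=1.96, z_beta=0.84):
    """Approximate visitors needed in each group to detect a change from p1 to p2."""
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil(((z_alpha + z_beta) ** 2) * variance / (p2 - p1) ** 2)

baseline = 0.05            # hypothetical 5% conversion rate
lift_to_detect = 0.005     # want to detect a 0.5 percentage-point change

n = sample_size_per_arm(baseline, baseline + lift_to_detect)
print(f"Roughly {n:,} visitors per variant")   # on the order of tens of thousands
```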

The point is not that A/B testing can give us all the answers, but that it allows data scientists and social scientists alike to make small tweaks to an interface and then see, overnight, how those tweaks affected user behavior. It tests biases in design architecture and provides a way for organizations to determine what is and is not working before suggesting solutions.

This article was first published by the World Bank’s People, Spaces, Deliberation blog. Publication does not imply endorsement of views by the World Economic Forum.

Author: Roxanne Bauer is a consultant to the World Bank’s External and Corporate Relations, Operational Communications department (ECROC).

Image: An illustration picture shows a projection of binary code on a man holding a laptop computer, in an office in Warsaw June 24, 2013. REUTERS/Kacper Pempel.
