• Academia is under scrutiny for its systemic gender imbalances.
• To understand gendered patterns, researchers used machine-learning methods to study reference letters written for candidates for entry-level positions in economics.
• They found that, due to unconscious biases, women are more likely to be praised for being hardworking than for their ability.

Academia faces increased scrutiny due to gender imbalances (Valian 1999), and this is especially true of economics (Lundberg 2020). Recent empirical work has documented that the career pipeline for women is ‘leaky’, with women dropping out of the profession at critical junctures such as between earning a PhD and becoming an assistant professor, or between assistant and associate professorships (Lundberg and Stearns 2019). In a recent paper (Eberhardt, Facchini, and Rueda 2022), we study the first step in the academic career of an economist – the junior ‘job market’. This is the stage at which the ‘leak’ in the pipeline has grown the most over the past decade (Lundberg and Stearns 2019) and which so far has not received much systematic attention in the literature (Lundberg 2020).

The academic job market in economics is uniquely structured and centralised. Every autumn, universities post job advertisements and potential applicants prepare a ‘job market package’. This package consists of one or more academic papers, a CV, and a set of reference letters written by scholars familiar with the candidate (henceforth, ‘referees’). In this market, candidates, referees, and hiring committees interact via centralised platforms. The same package is used for most jobs, making the marginal cost of an additional application very low. Reference letters are typically not tailored to any particular institution (Coles et al. 2010). As a result, any institution receiving many applications is likely to observe a sample that is arguably representative of the reference letters in the market.

A unique dataset of reference letters

We use a unique dataset encompassing all applications for entry-level positions received by a research-intensive university in the UK between 2017 and 2020. Applying natural language processing tools, we analyse the text of over 9,000 reference letters written in support of 2,800 candidates. We find that a standard letter contains a lengthy discussion of the candidate’s job market paper and some mention of their additional research, teaching and other skills. The final section summarises the candidate’s academic abilities and recruitment prospects. Our research focuses mainly on this final section.

After transforming this text into a ‘term frequency-inverse document frequency representation’, we borrow from cognitive psychology and linguistics to quantify whether letters written for female candidates emphasise systematically different attributes from those written for male candidates.
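
The representation mentioned above can be illustrated with a minimal sketch. It assumes the plainest textbook weighting (relative term frequency times the log of inverse document frequency) and whitespace tokenisation; the weighting variant and preprocessing actually used in the paper are not described here, and the toy letters are hypothetical.

```python
import math
from collections import Counter


def tfidf(documents):
    """Return one {term: weight} dict per document.

    Weight = (term count / document length) * log(n_docs / document frequency).
    This is an assumed textbook variant, not necessarily the authors' choice.
    """
    tokenised = [doc.lower().split() for doc in documents]
    n_docs = len(tokenised)
    # Document frequency: in how many letters does each term appear?
    doc_freq = Counter(term for tokens in tokenised for term in set(tokens))
    weights = []
    for tokens in tokenised:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical toy "letters", not taken from the dataset.
letters = [
    "she is a diligent and hardworking researcher",
    "he is a brilliant and creative researcher",
    "she is a diligent teacher",
]
vectors = tfidf(letters)
```

Terms that appear in every letter (such as "is") receive a weight of zero, while terms that are rare across letters (such as "hardworking") are weighted up, which is why this representation is useful for spotting distinctive language.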


We use two complementary methods to analyse the letters. We start with an unsupervised approach, the LASSO technique, to select the terms that best predict a candidate’s gender. Among these predictors, we observe terms related not only to research interests but also to personality traits (“determined”, “diligent”, “hardworking”, etc.).
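
To make the term-selection idea concrete, here is a small sketch of an L1-penalised (LASSO) linear regression fitted by coordinate descent. It is an illustration under assumptions: the penalty value, the linear (rather than, say, logistic) form, and the toy term matrix are all hypothetical, and the function names are ours rather than the authors'.

```python
def soft_threshold(value, threshold):
    """Shrink value towards zero by threshold, returning exactly 0.0 inside the band."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso(X, y, alpha, n_iter=200):
    """Minimise (1/2n)*||y - Xb||^2 + alpha*||b||_1 by cyclic coordinate descent.

    X and y are centred internally, so no intercept is returned.
    """
    n, p = len(X), len(X[0])
    col_means = [sum(row[j] for row in X) / n for j in range(p)]
    Xc = [[row[j] - col_means[j] for j in range(p)] for row in X]
    y_mean = sum(y) / n
    residual = [yi - y_mean for yi in y]
    coefs = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            col = [row[j] for row in Xc]
            z = sum(v * v for v in col) / n
            if z == 0.0:
                continue  # constant column carries no information
            # Correlation of column j with the residual, adding back its own fit.
            rho = sum(v * (r + v * coefs[j]) for v, r in zip(col, residual)) / n
            new = soft_threshold(rho, alpha) / z
            residual = [r - v * (new - coefs[j]) for v, r in zip(col, residual)]
            coefs[j] = new
    return coefs


# Hypothetical term indicators for eight letters (columns follow `terms`).
terms = ["diligent", "brilliant", "economics"]
X = [
    [1, 0, 1], [1, 0, 0], [1, 1, 1], [0, 0, 0],
    [0, 1, 1], [0, 1, 0], [0, 1, 1], [1, 0, 0],
]
female = [1, 1, 1, 1, 0, 0, 0, 0]
coefs = lasso(X, female, alpha=0.05)
selected = [term for term, c in zip(terms, coefs) if c != 0.0]
```

In this toy example the penalty sets the coefficient on "economics" to exactly zero, so only "diligent" (positive) and "brilliant" (negative) survive as predictors of the candidate being female. That sparsity is what makes the technique useful for picking out a short list of gendered terms.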

We then apply a supervised method by building dictionaries of words that are commonly emphasised in reference letters. These dictionaries draw on existing research on the topic. In particular, Schmader et al. (2007) propose five language categories (‘sentiments’) that are usually present in academic reference letters. These are ability traits, ‘grindstone’ traits, research terms, standout adjectives, and teaching and citizenship terms. We add the recruitment prospects of the candidate as an additional category.

Ability traits refer to the intellectual capacity of the student, and include terms such as “talent”, “brilliant”, “creative”, and so on. ‘Grindstone’ traits relate, in the words of Trix and Psenka (2003), to “putting one’s shoulder to the grindstone”, and include terms such as “hardworking”, “conscientious”, “diligent”, and so on. Research terms describe the type of research carried out (e.g. “applied economics”, “game theory”, “public economics”, etc.). Standout terms are typically superlatives such as “excellent”, “outstanding”, or “rare”. The teaching and citizenship category refers to both the candidate’s teaching skills and interactions with colleagues. Language in this group includes “good teacher”, “excellent colleague”, “friendly”, and so on. The last category, recruitment prospects, has been added to identify words that are widely used to describe the expected placement of the candidate in the highly competitive and globalised labour market for young economists. Terms in this group include “highly recommended”, “top department”, “tenure track”, and so on.
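
A dictionary-based measure of this kind can be sketched as follows. The word lists contain only the example terms quoted above, not the full dictionaries, and measuring each category as a share of the letter's words is an assumption made for illustration.

```python
import re

# Only the example terms quoted in the text; the full dictionaries are longer.
DICTIONARIES = {
    "ability": ["talent", "brilliant", "creative"],
    "grindstone": ["hardworking", "conscientious", "diligent"],
    "research": ["applied economics", "game theory", "public economics"],
    "standout": ["excellent", "outstanding", "rare"],
    "teaching_citizenship": ["good teacher", "excellent colleague", "friendly"],
    "recruitment": ["highly recommended", "top department", "tenure track"],
}


def category_counts(text):
    """Count whole-word (or whole-phrase) dictionary matches per category."""
    lowered = text.lower()
    return {
        category: sum(
            len(re.findall(r"\b" + re.escape(term) + r"\b", lowered))
            for term in terms
        )
        for category, terms in DICTIONARIES.items()
    }


def category_shares(text):
    """Express each category count as a share of the words in the text."""
    n_words = len(re.findall(r"[a-z']+", text.lower()))
    counts = category_counts(text)
    return {c: (k / n_words if n_words else 0.0) for c, k in counts.items()}


# Hypothetical closing paragraph of a letter.
sample = (
    "She is a diligent, hardworking and creative economist working on "
    "game theory. Highly recommended for any top department."
)
counts = category_counts(sample)
shares = category_shares(sample)
```

Note that a phrase such as "excellent colleague" would also trigger the standout term "excellent" in this simple version; a fuller implementation would need a rule for such overlaps.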

To validate our dictionaries, we carried out an original comprehensive survey of academic economists in research-intensive UK universities. The results of this exercise, reported in Figure 1, indicate a broad consensus between our categorisation and that of the profession.

Figure 1 - Correspondence between authors' categories and ‘wisdom of the crowd’ (Image: Vox EU)


We estimate the correlation between the importance of each sentiment in the reference letter and the gender of the candidate, controlling for overall letter length and a set of candidate and referee characteristics. The baseline results are reported in Figure 2, where we plot the estimated coefficients on the female dummy from a total of 672 different models. A darker symbol indicates a higher number of specifications that yield statistically significant estimates for the parameter of interest. Full symbols indicate significance at the 1% level across all clustering choices. Hollow symbols indicate a failure to find statistical significance at any level.
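
One such specification might look like the sketch below: an ordinary least squares regression of a standardised sentiment measure on a female dummy and controls, so that the coefficient reads in standard-deviation units. The function name, the single control (letter length) and the toy numbers are hypothetical; the paper's 672 models vary the controls and the clustering of standard errors, neither of which is reproduced here.

```python
import numpy as np


def female_coefficient(outcome, female, controls):
    """OLS coefficient on the female dummy, with the outcome standardised.

    `controls` may be a single column (e.g. letter length) or a 2-D array.
    """
    y = np.asarray(outcome, dtype=float)
    y = (y - y.mean()) / y.std()  # coefficient is then in SD units
    X = np.column_stack([
        np.ones_like(y),                    # intercept
        np.asarray(female, dtype=float),    # female dummy
        np.asarray(controls, dtype=float),  # control variable(s)
    ])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1]


# Hypothetical data: a grindstone measure built from gender and letter length.
female = np.array([1, 0, 1, 0, 1, 0])
length = np.array([800, 900, 1000, 1100, 1200, 700])
grindstone = 2.0 + 0.3 * female + 0.001 * length
coef = female_coefficient(grindstone, female, length)
```

Because the toy outcome is constructed without noise, multiplying `coef` by the standard deviation of `grindstone` recovers the built-in gap of 0.3, which is a quick check that the control for letter length is being partialled out correctly.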

Figure 2 - Baseline regression results (Image: Vox EU)

The main message from this figure is that, regardless of the institutional ranking, female candidates are significantly more likely to be associated with ‘grindstone’ terms (from 6% to 10% of a standard deviation) across all specifications. These results confirm our interpretation of the unsupervised LASSO analysis. We also observe that fewer terms related to research are used in letters supporting female candidates. Both results echo findings in other disciplines (Trix and Psenka 2003, Valian 2005). The findings appear remarkably stable across different specifications, which reassures us that unobserved confounding factors are unlikely to change the results.

Sorting of female candidates across institutions or letter writers might play an important role in explaining differences in the language used. For example, Boustan and Langan (2019) document that female representation is a persistent attribute of economics departments, and that it matters for promoting women’s careers. To address sorting across departments, we run models including candidate institution fixed effects. The results are reported in Figure 3, and suggest that among students from the same cohort, graduating from the same institution (who, for instance, were admitted to their PhD programme with arguably the same entry criteria), women are still significantly more likely to be described with ‘grindstone’ terms.

Sorting across letter writers could still explain our findings. To address this concern, we run a set of specifications including writer fixed effects. Note that we only include referees who have written two or more letters, with at least one for a female candidate. These results are also reported in Figure 3 and are broadly consistent with the patterns above.
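
The logic of writer fixed effects can be sketched with a within-writer transformation: the outcome and the female dummy are demeaned by writer, so the estimate compares letters written by the same referee. The function name and data are hypothetical, and the sketch omits the other controls used in the paper; it does apply the sample restriction described above.

```python
from collections import defaultdict


def within_writer_gap(rows):
    """Within-writer coefficient on the female dummy.

    rows: iterable of (writer_id, female, outcome) tuples, one per letter.
    """
    by_writer = defaultdict(list)
    for writer, female, outcome in rows:
        by_writer[writer].append((float(female), float(outcome)))

    num = den = 0.0
    for letters in by_writer.values():
        # Keep referees with two or more letters, at least one for a woman.
        if len(letters) < 2 or not any(f == 1.0 for f, _ in letters):
            continue
        f_mean = sum(f for f, _ in letters) / len(letters)
        y_mean = sum(y for _, y in letters) / len(letters)
        for f, y in letters:
            num += (f - f_mean) * (y - y_mean)
            den += (f - f_mean) ** 2
    if den == 0.0:
        raise ValueError("no within-writer variation in candidate gender")
    return num / den


# Hypothetical letters: (writer, female dummy, grindstone measure).
rows = [
    ("A", 1, 3.0), ("A", 0, 2.0),
    ("B", 1, 6.0), ("B", 0, 5.0), ("B", 0, 5.0),
    ("C", 1, 100.0),             # single letter: dropped
    ("D", 0, 1.0), ("D", 0, 9.0),  # no female candidate: dropped
]
gap = within_writer_gap(rows)
```

Writers A and B use very different baseline levels of grindstone language, but each describes their female candidate with one unit more of it than their male candidates, so the within-writer gap is 1.0 regardless of those level differences.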

Figure 3 - Regressions with fixed effects (Image: Vox EU)

In the same figure, we separately analyse referees with less experience of female candidates (fewer than 50% of their references were for women) and those with more experience (more than 50%). The ‘less experienced’ group appears to be the one that drives the ‘grindstone’ result.

Experience may matter for two main reasons. On the one hand, referees may vary in their perception of women, and female candidates may seek out referees who avoid stereotyping. On the other hand, it could be that referees do not differ initially, but that their exposure to female candidates leads them to update prior stereotypes. Further research is needed to disentangle these two mechanisms.

The fixed effects results also uncover a new pattern with regard to ‘ability’. Female candidates are associated with noticeably fewer ability terms, although the difference is not statistically significant, with a clearer pattern emerging in the ‘less experienced’ group. In further analysis, we show that this pattern is significant for male letter writers in this group.

Lessons learned

As academics, we all know how much time is spent writing and polishing reference letters for job market candidates. This is an occasion where we try our best to promote our students. As a result, it is unlikely that, on average, we are deliberately undermining female students by emphasising less desirable attributes. On a positive note, recent research has shown that unconscious biases can be addressed by providing the actors involved with evidence of the existence of such biases (Boring and Philippe 2021). By documenting gendered language patterns, we hope this research will be a first step towards increasing awareness of our biases and thereby reducing stereotypes in the job market.
