
Revolutionizing UN Recruitment with Machine Learning: Risks & Benefits

3 December 2023

By Katja Hemmerich


Inspired by the innovations showcased at last week’s Career Development Roundtable for International Organizations (CDR), our spotlight this week is on the use of artificial intelligence in human resources management. The UN’s Chief Executives Board (CEB) has primarily considered the potential of artificial intelligence as a programming tool to support implementation of the SDGs and articulated guiding principles for working with member states in this regard. As the high-level panel discussion on the second day of the CDR illustrated, human resources leaders see huge potential for AI, but have yet to grapple with its complexities and define governance or guiding principles for its use in human resources management across the UN family.


UNDP is already experimenting with machine learning to optimize its workforce planning, surge and mobility processes, and has highlighted its phenomenal ability to speed up the matching of internal capacities with workforce needs and to enhance data for decision-making in surge and mobility exercises. Other UN entities' experimentation with AI and machine learning is likely to grow in 2024 and 2025 as many of them present new or updated human resources strategies to their governing bodies and the Fifth Committee. This will necessitate a shared understanding with those governing bodies and their member states on how AI and machine learning can be used responsibly and ethically in human resources management, in particular in relation to recruitment.


Addressing gender and geographic disparities is a longstanding priority for UN recruitment. At the same time, experience with AI has shown its potential for bias and discrimination, even when that is an unintended consequence. Discussions between HR and member states on the use of AI in UN recruitment are therefore likely to be dominated by these concerns. To support those discussions, our spotlight this week highlights new research on different national approaches to mitigating the risk of discrimination in AI decision-making, and how these create potential opportunities to address longstanding disparities in UN recruitment. We also provide HR practitioners, HR leaders and member states with some considerations to keep in mind as AI is used more systematically in UN recruitment.

AI and the risk of discriminatory decision-making

There are different forms of artificial intelligence, but for recruitment the power of machine learning was quickly touted as a key tool to significantly improve the speed and quality of hiring. Machine learning uses algorithms to sort through huge volumes of data in seconds or minutes while continuously learning from that data as it grows, thereby improving the accuracy of the algorithm over time. Amazon, with over half a million employees globally, quickly saw the potential of machine learning to sort through the thousands of job applications it receives and match applicants' skills to open positions. In 2014 Amazon began developing a machine learning tool for recruitment with the intent of speeding up hiring, reducing the human effort involved and improving the fit between applicants and jobs. In testing the tool, however, Amazon realized that the machine was consistently proposing more male candidates for jobs, particularly in IT, and excluding qualified female candidates, leading it to scrap the experiment in 2018.
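
As a very rough illustration of what such a matching tool does under the hood, here is a minimal sketch in Python, assuming a simple text-similarity approach; the vacancy text, applicant profiles and scoring method are hypothetical and are not Amazon's actual system:

```python
# Minimal, hypothetical sketch of skills-to-vacancy matching by text similarity.
# The vacancy text, applicant profiles and scoring approach are illustrative only.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

vacancy = "Python developer with experience in data pipelines and cloud deployment"
applicants = {
    "A-001": "Five years building data pipelines in Python and AWS",
    "A-002": "Marketing specialist with social media campaign experience",
    "A-003": "Backend engineer: Python, Docker, cloud infrastructure",
}

# Represent the vacancy and every application as TF-IDF term weights.
vectorizer = TfidfVectorizer(stop_words="english")
matrix = vectorizer.fit_transform([vacancy] + list(applicants.values()))

# Rank applicants by cosine similarity to the vacancy text.
scores = cosine_similarity(matrix[0:1], matrix[1:]).flatten()
for applicant_id, score in sorted(zip(applicants, scores), key=lambda pair: -pair[1]):
    print(f"{applicant_id}: {score:.2f}")
```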


The risk of machine learning bias in recruitment is real. Similar early machine learning and AI tools also demonstrated potential for bias in other decisions that affect people's lives, for instance when quoting for insurance or deciding whether someone gets a loan or a credit card. Like recruitment, such financial decisions can significantly affect consumers' lives. As a result, a number of governments have adopted legislation to mitigate the risk of discrimination by AI, particularly in the financial sector. The different approaches they have taken offer interesting lessons for international organizations considering their own governance frameworks for artificial intelligence in decision-making, particularly in recruitment.


Some countries, like the United States, explicitly prohibit the collection and use of gender and other 'protected' demographic data in machine learning and AI data models. Other countries, like Singapore, allow both the collection and use of such protected data while prohibiting discrimination. A significant group of countries (e.g. the European Union and Canada) fall between these two extremes: they allow the collection of gender data but prohibit the use of gender as a feature in machine learning models that make lending decisions.


So how well has this worked in mitigating or eliminating discrimination? In 2019, after such legislation had been adopted, a woman in New York accused the new Apple credit card provider of gender discrimination. Her request to increase her credit limit was declined, yet her husband, whose credit record was worse than hers, was given a limit twenty times higher. When the relevant New York State authorities investigated, they found no violations of fair lending legislation, because the company had in fact not used any gender data in developing its algorithms or in taking decisions on her credit or anyone else's.


“Goldman Sachs, a partner in the Apple Card venture, stated: ‘We have not and never will make decisions based on factors like gender... we do not know your gender or marital status.’” - ABC News, "New York probing Apple Card for alleged gender discrimination after viral tweet", 11 Nov. 2019


The Unconscious Bias of Machine Learning

If an AI tool doesn’t know the gender of the applicants, and there are no gender markers in the historical data it is learning from, how can it be biased? The Goldman Sachs defense seems valid. The reality is that we as humans have unconscious biases, and these are embedded in the data about our past decisions. Even if we can’t see them ourselves, machine learning will pick up on them and mimic them in its own decision-making.


“[A] pitfall is to believe in unbiased data. All data to date has inherent biases. Thus, it is fairly simple to explain the biased outcomes of algorithms.” - Per Aarvik, "Artificial Intelligence – a promising anti-corruption tool in development settings?", 2019


In the case of Amazon’s recruitment tool, the machine was learning from ten years of data on recruitments. The tech industry remains male-dominated, so most of the applicants in that data, and most of the people Amazon actually hired over those ten years, were men. The machine learned this pattern and mimicked it when proposing candidates for jobs. Although it wasn’t programmed to look at the gender of candidates, it ‘learned’ to downgrade applicants who came from women’s colleges, and, as Reuters reported:


“It penalized resumes that included the word ‘women’s,’ as in ‘women’s chess club captain.’” - Reuters, "Amazon scraps secret AI recruiting tool that showed bias against women", 10 Oct. 2018


These were some of the key reasons why Amazon decided to abandon the tool before it was ever used in live recruitment.
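
To see how a model that never receives gender data can still learn a gendered pattern, consider the deliberately simplified sketch below. The data is synthetic and the feature names are hypothetical; the point is only that a proxy feature correlated with gender, such as attendance at a women's college, lets historical bias back in:

```python
# Sketch: a "gender-blind" model learning bias through a proxy feature.
# Synthetic data: the model never sees a gender column, but in the historical
# hiring outcomes applicants from women's colleges were rarely hired, so the
# model learns to penalize that proxy feature.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5000
years_exp = rng.integers(0, 10, n)            # qualification the job actually needs
womens_college = rng.random(n) < 0.15         # proxy strongly correlated with gender

# Historical "hired" labels reflect past human decisions: experience helped,
# but qualified applicants from women's colleges were mostly passed over.
hired = (years_exp >= 4) & ~(womens_college & (rng.random(n) < 0.8))

X = np.column_stack([years_exp, womens_college])   # note: still no gender column
model = LogisticRegression(max_iter=1000).fit(X, hired)

print("weight on years of experience :", round(model.coef_[0][0], 2))   # positive
print("weight on women's-college flag:", round(model.coef_[0][1], 2))   # negative
```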


But this doesn’t mean that AI can never be used for recruitment. A new study, which looks at how machine learning is used to make lending decisions and at the success (and failure) of different anti-discrimination efforts, offers insightful and useful lessons for those considering machine learning in recruitment. It shows not only how to mitigate the risk of bias, but also makes some concrete proposals for how machine learning could be used to eliminate some of our own human biases and potentially make better decisions than we do.

Modeling how machines learn bias and efforts to eliminate that bias

The study, which has been named one of the top 100 Global AI Solutions by the International Research Centre on Artificial Intelligence (IRCAI), focuses on how machine learning algorithms make biased lending decisions despite anti-discrimination legislation, drawing on multiple countries with different regulatory approaches. Essentially, it analyzes how the Apple Card example highlighted above can happen despite legislation, and what can be done to eliminate such problems. While not an HR example, the concepts translate quite naturally to the recruitment process and to how a machine can and should learn from past recruitment decisions to shortlist candidates for a particular job based on skills.


To demonstrate this, we summarize the study’s financial model and then highlight how it translates to HR. (For those wanting to go to the source materials, there is an executive summary of the project and its findings, as well as a more detailed journal article covering the methodology and data.) The researchers used actual publicly available data from a global lender to simulate how gender-based discrimination is learned and to assess the impact of different types of anti-discrimination laws. The model looks at how men and women are treated by the algorithm when they apply for a loan, and how the algorithm decides to recommend people for a loan or reject them.


As we’ve highlighted above, machines learn from our past human processes and decisions, which in the case of this model consisted of past data on who applies for loans and who is approved or rejected. Globally, information on past borrowers tends to comprise roughly 80% data from men and 20% from women. As a result, the algorithm learns much more about men’s behavior and inevitably makes decisions based on typically male behavior. In essence, by using historical data, the algorithm adopts our society’s historical systemic biases and tends to assume that its typical customer acts like the men in its data sample.


Both humans and machines can, however, use a gender lens to gain more nuanced insights into this data. All things being equal, current financial data shows that women are generally better borrowers than men. It also shows that individuals with more work experience are better borrowers than those with less. On the whole, women still tend to have less work experience than men, for various reasons, including that they generally remain the primary caregivers in families. For the purposes of the simulation, the researchers assume that women with three years of work experience are creditworthy and should be approved for a loan, while men need five years of experience before they are approved.


Using those parameters, the study illustrates clearly why anti-discrimination laws that prohibit the use of gendered data do not actually lead to fairer or more appropriate decisions by the algorithm. Because most of the data it learns from comes from men, it assumes that the typical borrower behaves like a man and therefore should not be approved for a loan without five years of experience. Creditworthy women with three years of experience, whom the financial firm is willing and indeed keen to lend to since they are better borrowers, are automatically excluded from the shortlist of those who should be approved. Conversely, in jurisdictions that allow consideration of gender data, the algorithm can learn to specifically include women with three years of experience while rejecting men with three years of experience, thereby creating a fairer and better-qualified pool of candidates for loans.


Figure 1 below shows how these two scenarios play out.
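
To make the gender-blind scenario concrete, here is a simplified simulation in the spirit of the study, with synthetic numbers chosen to mirror the 80/20 split and the three-year/five-year assumptions above; it is not the researchers' actual code or model:

```python
# Sketch of the gender-blind lending scenario: 80% of the training records come
# from men (creditworthy at 5+ years of experience in this simulation) and 20%
# from women (creditworthy at 3+ years). Gender itself is never given to the model.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n_men, n_women = 8000, 2000
men_exp = rng.integers(0, 11, n_men)
women_exp = rng.integers(0, 11, n_women)

# "Ground truth" creditworthiness from the study's stated assumptions.
men_good = men_exp >= 5
women_good = women_exp >= 3

X = np.concatenate([men_exp, women_exp]).reshape(-1, 1)   # years of experience only
y = np.concatenate([men_good, women_good])

model = DecisionTreeClassifier(max_depth=2).fit(X, y)

# Because most training records are male, the learned cut-off sits near the
# five-year male rule, so a creditworthy woman with three years is rejected.
print("approve applicant with 3 years?", model.predict([[3]])[0])   # False
print("approve applicant with 5 years?", model.predict([[5]])[0])   # True
```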

The researchers therefore highlight that excluding gender and other forms of protected data from machine learning does not in fact eliminate discrimination from the process, nor does it provide the lender with the best-quality pool of candidates. But the study goes further and looks at more effective ways in which machine learning could actually reduce the historical bias in our human decisions.


Can machine learning eliminate discrimination?

The research team sees significant potential for machine learning, if it is allowed to consider gender and other protected data, to actually reduce our own unconscious bias. And this is what makes machine learning a potentially revolutionary tool for UN recruitment. Not only can it sort through thousands of applications in seconds and identify the best matches for vacancies, it can potentially address the longstanding geographic and gender disparities in UN recruitment. But only if the UN adopts a governance model for AI that is not gender blind.


The researchers see the greatest potential in the Singapore example, where gender data is not only collected but can also be considered in the learning and decision-making process. Essentially, the learning and decision-making parameters can be adjusted to lead to equal outcomes. If we translate this into a recruitment example, machine learning could have affirmative action elements built into how it learns to make candidate-job matches and into its hiring recommendations, in order to address current and future geographic and gender imbalances until they disappear. Identifying and setting those parameters is not necessarily easy. More importantly, member states across the UN governing bodies will want oversight of machine learning in recruitment decisions, and the likelihood of reaching consensus on such affirmative action initiatives is low. The operational and political risks thus make this an unlikely option for UN entities.
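
For contrast, here is the same toy simulation with gender included as a feature, as the Singapore-style approach would permit; again the data and model are purely illustrative:

```python
# The same toy simulation, but with gender allowed as a feature: the model can
# now learn the separate three-year / five-year patterns directly.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(1)
n_men, n_women = 8000, 2000
men_exp = rng.integers(0, 11, n_men)
women_exp = rng.integers(0, 11, n_women)

# Feature matrix: [years_of_experience, is_woman]
X = np.column_stack([
    np.concatenate([men_exp, women_exp]),
    np.concatenate([np.zeros(n_men), np.ones(n_women)]),
])
y = np.concatenate([men_exp >= 5, women_exp >= 3])

model = DecisionTreeClassifier(max_depth=3).fit(X, y)

print("woman with 3 years:", model.predict([[3, 1]])[0])   # approved
print("man with 3 years:  ", model.predict([[3, 0]])[0])   # rejected
print("man with 5 years:  ", model.predict([[5, 0]])[0])   # approved
```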


What has the most potential for UN recruitment is the hybrid Canadian/EU model, where gender data can be collected and used in the algorithm’s learning but is excluded from the actual decision-making. In the simulation, the researchers created a level playing field in the historical data so that the machine learned customer behavior from an equal number of women and men. By increasing or upsampling the data about women, the machine learned that it should propose anyone with three years of experience for a loan. This eliminated the previous exclusion of creditworthy women and also benefited men with three years of experience, who were now also proposed for a loan, as outlined in Figure 2 below. Humans taking the final decision could still choose to screen out the men with only three years of experience. The important part is that they have a fair pool to consider and can be confident that those excluded from the pool are all consistently and equally unqualified.
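
A rough sketch of this upsampling idea, using the same toy numbers as above; the frequency-based "model" and the 0.4 approval cut-off are illustrative assumptions rather than the study's actual method:

```python
# Sketch of the hybrid approach: gender is used only to rebalance the training
# data (upsampling the women's records to parity) and is then dropped; the
# decision rule itself sees only years of experience.
import numpy as np

rng = np.random.default_rng(1)
n_men, n_women = 8000, 2000
men_exp = rng.integers(0, 11, n_men)
women_exp = rng.integers(0, 11, n_women)
men_good = men_exp >= 5        # simulation assumption: men creditworthy at 5+ years
women_good = women_exp >= 3    # simulation assumption: women creditworthy at 3+ years

def repayment_rate(exp, good, years):
    """Share of past borrowers with exactly `years` of experience who repaid."""
    return good[exp == years].mean()

# Raw 80/20 data: a three-year applicant looks like a poor risk, because most
# three-year records come from men (who needed five years in this simulation).
exp_raw = np.concatenate([men_exp, women_exp])
good_raw = np.concatenate([men_good, women_good])
print("P(repay | 3 yrs), raw data:     ", round(repayment_rate(exp_raw, good_raw, 3), 2))

# Upsample the women's records to parity, then drop gender again.
resample = rng.integers(0, n_women, n_men)
exp_bal = np.concatenate([men_exp, women_exp[resample]])
good_bal = np.concatenate([men_good, women_good[resample]])
print("P(repay | 3 yrs), balanced data:", round(repayment_rate(exp_bal, good_bal, 3), 2))

# With an assumed approval cut-off of 0.4, the three-year applicant is rejected
# on the raw data (~0.2) but proposed for approval on the balanced data (~0.5).
```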


This highlights the phenomenal potential that such a machine learning approach could have for UN recruitment. Imagine if a tool could help HR and a hiring manager sort out all the unqualified candidates fairly and in a matter of seconds, with the confidence that they were excluded on the basis of qualifications and not gender or nationality. There is no guarantee that this would solve the gender and geographic imbalances, but it is fascinating to think of how a machine could learn to shortlist candidates, if it learned from recruitment data where the sample size from each geographic region is representative of their population, and if the data from each region consisted of equal numbers of men and women.


Governance considerations for the CEB and member states

The study not only highlights some concrete approaches for HR practitioners to consider and experiment with, it also raises important governance considerations for preventing discrimination by machine learning and artificial intelligence. As guidance is developed on the use of AI and machine learning in human resources management in the UN, it would be useful for human resources practitioners, the CEB and member states to keep the following in mind:

  1. All data has inherent biases. Machines that learn from our data will adopt and learn those biases.

  2. Different approaches to preventing discrimination have yielded very different results. Evidence and comparisons of different approaches across multiple countries are crucial to understanding the implications of different governance principles and modalities.

  3. All machines and models will make mistakes. Testing and validation of results are crucial, and even after AI tools are launched, ongoing human oversight and checks are needed.

