The Ethical Benchmarking of HR Analytics
Act in such a way that you treat humanity, whether in your own person or in the person of any other, never merely as a means to an end, but always at the same time as an end.
– Immanuel Kant.
HR analytics has been defined as the systematic identification and quantification of the people drivers of business outcomes. Viewed from the perspective of Kant’s principle of humanity, it is notable that this definition fails to recognize people as an end in their own right. The question is: how do we ensure that HR analytics is ethical? In this article, we discuss the state of the art of ethical benchmarking of algorithms and provide advice for practitioners in the field.
Evaluation of algorithms
How do we evaluate the ethics of our algorithms? Legal frameworks such as the European General Data Protection Regulation (GDPR) provide guidance in differentiating right from wrong. However, what is legal is not always ethical.
GDPR establishes the right to consent, the right of access, the right to be forgotten, and the right to be informed. However, it falls short of mandating the employees’ right to be involved in the development and application of HR analytics. And as we have seen time and time again, public policy is oftentimes unable to keep up with the speed of technological development. This means that all too often employees have little or no opportunity to have their interests represented and protected.
Although existing ethical frameworks, such as the Declaration of Helsinki, the Ethical Principles of Psychologists of the American Psychological Association (APA), or the IEEE Global Initiative on Ethics of Autonomous and Intelligent Systems, allow us to tread a little further into this unknown territory, they too are often past their ‘best before’ date. For example, the last amendment to the APA principles dates from 2016.
All too often the lack of ethical and legal precedent leaves the HR analytics team with considerable autonomy. Minimal guidance and competing business interests yield fertile ground for ethical transgressions.
The role of context
As with validity, when evaluating the ethics of our decisions we need to be constantly aware that what works in one organization may not work in another. What is ultimately deemed right or wrong is oftentimes highly idiosyncratic to the context in which the decision is made and to the unique set of stakeholders involved.
Having said that, we can leverage frameworks and lessons learned from past ethical transgressions, filling in the blank spaces by standing on the shoulders of giants such as Immanuel Kant.
You have likely come across the AI horror stories. These include Amazon with its gender-biased AI recruiting tool, Google with racist facial recognition, and Facebook’s ad serving algorithm that discriminates by gender and race. Let’s assume that organizations strive to behave ethically (which may for some of us already be a stretch of the imagination). These giant companies, with seemingly endless resources, have still all fallen victim to biases in the input data, triggering unwanted and unintentional outcomes. Garbage in is garbage out. If you do not actively control for biases in the data, your interventions are at best sub-optimal.
The answer to the question of how we can assess the ethicality of HR analytics lies in benchmarks. Benchmarks measure properties and provide scores based on the ethical framework(s) they represent. This greatly facilitates a systematic approach to the evaluation of the ethics of HR analytics. Specifically, by forcing us to develop and apply standardized metrics, benchmarks allow us to encode context. This allows us to compare and contrast novel cases to the state of the art.
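To make the idea of a standardized metric concrete, here is a minimal Python sketch that computes two widely used group-fairness scores, the statistical parity difference and the disparate impact ratio, from a list of selection decisions. The field names (`group`, `selected`), the function names, and the toy data are assumptions made purely for illustration. They are not taken from any particular toolkit, and a real benchmark would combine several such scores with the contextual criteria discussed in this article.

```python
from collections import defaultdict


def selection_rates(records, group_key="group", outcome_key="selected"):
    """Return the share of favourable outcomes for each group."""
    totals = defaultdict(int)
    favourable = defaultdict(int)
    for record in records:
        totals[record[group_key]] += 1
        if record[outcome_key]:
            favourable[record[group_key]] += 1
    return {group: favourable[group] / totals[group] for group in totals}


def statistical_parity_difference(rates, unprivileged, privileged):
    """Difference in selection rates; 0 means both groups are selected equally often."""
    return rates[unprivileged] - rates[privileged]


def disparate_impact(rates, unprivileged, privileged):
    """Ratio of selection rates; 1 means both groups are selected equally often."""
    if rates[privileged] == 0:
        raise ValueError("privileged group has no favourable outcomes")
    return rates[unprivileged] / rates[privileged]


if __name__ == "__main__":
    # Hypothetical screening outcomes: 10 candidates per group.
    records = (
        [{"group": "A", "selected": True}] * 5
        + [{"group": "A", "selected": False}] * 5
        + [{"group": "B", "selected": True}] * 3
        + [{"group": "B", "selected": False}] * 7
    )
    rates = selection_rates(records)
    print("Selection rates:", rates)
    print("Statistical parity difference:",
          round(statistical_parity_difference(rates, "B", "A"), 3))
    print("Disparate impact ratio:",
          round(disparate_impact(rates, "B", "A"), 3))
```

A disparate impact ratio below 0.8 is often treated as a warning sign, following the so-called four-fifths rule used in US employment selection guidance. Whether that threshold is appropriate for your organization is exactly the kind of contextual judgment a benchmark has to make explicit.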
The name of the game is to ensure that our ethical benchmark incorporates all relevant criteria and evidence about those criteria, enabling us to arrive at a valid verdict on whether the organization-wide deployment of a particular algorithm is ethical. In this regard, ignorance is not bliss. It is indeed noteworthy that all three ethical frameworks cited earlier underscore the fundamental importance of competence.
The main themes that derive from current ethical frameworks are privacy, consent, accountability, safety and security, transparency and explainability, fairness and non-discrimination, human control of technology, professional responsibility, and promotion of human values. Interwoven with these themes is the need to establish internal, construct, and external validity (as outlined elsewhere). Each of these concerns will affect the design and implementation of the workflows during the processing and utilization of data. The themes also interact with one another. How do you guard the underlying principles around privacy if you have not secured your data? How can you truly promote human values if you do not know whether your algorithms are unduly biased against race or gender or other more artificial qualities? How can you claim professional responsibility for an algorithm that has no internal validity? Without a framework that is applied consistently across a problem domain, the focus of your ethics will vary, and so will what you need to measure to actualize ethical benchmarking. Moreover, benchmarking can play a nontrivial role in transparency and explainability.
What increases the complexity, however, is that the underlying methods for optimizing algorithms are rapidly evolving. We are now moving into an era of Automated Machine Learning (AutoML) where algorithms will choose an optimal set of algorithms that will provide optimized solutions.
A new field that explores new methods for explaining AI is called Explainable AI (XAI). It will be interesting to see how XAI methods are going to be embedded in AutoML solutions. In all likelihood, over the next decade, HR analysts will be left to deal with the interesting side cases. The authors expect that the field will be democratized and the workflow for the selection of optimized models automated.
Making it practical
With this intricate weave of ethical requirements, what can be achieved at this moment?
- First, be legal. There are already legal constraints in place around AI. Privacy law related to data processing, of which GDPR is the prime example, is one; others, such as anti-discrimination laws, will likely also apply.
A recent study for the European Commission by Prof. Frederik Zuiderveen Borgesius on the subject of AI and discrimination noted that although there are legal frameworks, they are attuned to certain categories such as bias against skin color or gender. However, AI might develop biases in new artificial classes dependent on the underlying structure of the data and how variables or features were operationalized.
- Second, explicate your values and seek to abide by them. Although Google’s catchall “Don’t be evil” phrase may leave something to be desired, it opens the door to scrutiny and critique. And when it comes to ethical benchmarking, criticism is free advice.
- Third, keep track of changes. Recognize that AI is impactful, that the competitive advantage it is likely to yield is not going away, and that it is therefore worth early investment. Documenting how we navigate the myriad decisions we are confronted with during research and development not only helps with ethical accountability, it also facilitates communication with key stakeholders.
- Fourth, look towards adjacent fields of practice. A practical AI ethical benchmark needs to focus on a specific set of properties or indicators of compliance with basic ethical principles that are measurable and related to the People Analytics problem domain. It needs to encapsulate the key operational features of AI judgment that are representative of the HR field.
As we are dealing with decisions that impact real people, we need to differentiate between diagnostic and intervention purposes. This is similar to the medical field, where you can, for example, have one device that assesses your health and another device that keeps your heart pumping and intervenes when there are irregularities. In general, the second category has a more immediate impact and should, therefore, be monitored closely. The FDA is currently reviewing how it will regulate Software as a Medical Device (SaMD).
- Fifth, review examples of actionable benchmarks such as AI Fairness 360, which uses a wide range of methods to evaluate bias. To gain experience, consider running their tour. What you will notice is that keeping bias out of the sampling of the data currently requires an elaborate understanding of the details. Humans need to be in the loop. This may of course change with time, but the technical benchmarks are only as good as the understanding of those who configure and deploy them. Unbiased training data is also about training your staff.
- Sixth, consider how to target co-development in your processes. Changing the bias in the AI training changes who is subject to the interventions. Therefore, the interventions themselves need some adjustment. Once you have removed sources of bias from your data samples, consider a new phase of co-development with those who are affected by your decisions. Only then will you have a chance of meeting the Kantian principle of humanity.
- Seventh, keep audit trails. Garbage in is garbage out, so be careful with your sample sizes and methods. For example, humans annotate data so that AI can be trained on those annotations, and through that training humans can pass on their own biases. Therefore, we recommend keeping an audit trail of what you have done during the life cycle. AI is a reflection of the organization. An auditable life cycle enables you to provide forensics later to show control and improve your processes.
- Finally, consider working as a community towards providing training, examples, and a shared space to gather experiences, ethical practices, and best-of-breed AI models and benchmarks. Our problem domain has unique requirements that we as a community are best placed to evaluate. For those looking for an example from an adjacent field, review the AI-LAB for radiologists.
A lab for HR analytics would be a central resource to grow and strengthen ideas, share fundamental knowledge, and negotiate and adopt ethical practices enforced by mechanized benchmarks. A community-friendly, future-ready training ground.
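As a minimal sketch of the audit trail recommended in the seventh point above, the Python example below keeps an append-only log in which every entry stores a SHA-256 hash of its own content plus the hash of the previous entry, so that later edits to the history become detectable. The class name `AuditTrail`, its fields, and the example actions are hypothetical and chosen for illustration only. A production system would also need durable storage, access control, and a retention policy that respects privacy law.

```python
import hashlib
import json
from datetime import datetime, timezone


class AuditTrail:
    """Append-only, hash-chained log of decisions made during a model's life cycle."""

    GENESIS_HASH = "0" * 64  # placeholder hash preceding the first entry

    def __init__(self):
        self.entries = []

    @staticmethod
    def _digest(entry):
        # Hash every field except the stored hash itself, in a stable key order.
        payload = {key: value for key, value in entry.items() if key != "hash"}
        encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(encoded).hexdigest()

    def record(self, actor, action, details):
        """Append one entry; `details` must be JSON-serializable."""
        previous = self.entries[-1]["hash"] if self.entries else self.GENESIS_HASH
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "actor": actor,
            "action": action,
            "details": details,
            "prev_hash": previous,
        }
        entry["hash"] = self._digest(entry)
        self.entries.append(entry)
        return entry

    def verify(self):
        """Return True if no entry has been altered, removed, or reordered."""
        previous = self.GENESIS_HASH
        for entry in self.entries:
            if entry["prev_hash"] != previous or entry["hash"] != self._digest(entry):
                return False
            previous = entry["hash"]
        return True


if __name__ == "__main__":
    trail = AuditTrail()
    trail.record("analyst_1", "sample_drawn", {"rows": 1200, "method": "stratified"})
    trail.record("annotator_7", "labels_added", {"guideline_version": "v2"})
    print("Trail intact:", trail.verify())
```

The hash chain does not prevent tampering, it only makes it visible, which is usually what you need when you are later asked to show how a sample was drawn or who annotated the training data.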