- New documentation standards in machine learning can enable responsible technology.
- These risk management strategies show how organizations can remain compliant while protecting their valuable intellectual property.
The growing use of machine learning (ML) amid a global drive toward digital acceleration raises the question: “Who’s minding the store?”
We’ve seen many examples of both the power and pitfalls of ML models. While organizations are increasingly aware of the downsides (our research shows many are still behind the curve), they find themselves trying to manage these risks without a set of agreed-upon tools or standards. Meanwhile, they’re keenly aware of the competitive environment and rationally protect their proprietary methods.
Governments in some regions have offered early guardrails by beginning to regulate the underlying model data; however, a global standard has not been established, and regulation of ML models has yet to catch up.
Two key components for using ML responsibly provide a prudent “start here” for organizations: model explainability and data transparency. The inability to explain why a model arrived at a particular result presents a level of risk in nearly every industry. In some areas, like healthcare, the stakes are particularly high when a model could be presenting a recommendation for patient care. In financial services, regulators may need to know why a lender is making a loan. Data transparency can ensure there is no unfair or unintended bias in the training data sets used to build the model, which can lead to disparate impact for protected classes – and consumers have what is increasingly a legally protected right to know how their data is being used.
But how can organizations developing ML models enforce explainability and transparency standards when doing so might mean sharing with the public the very features, data sets, and model frameworks that represent that organization’s proprietary intellectual property (IP)?
Managing risk through documentation
Given machine learning’s complexity and interdisciplinary nature, executives should employ a wide variety of approaches to manage the associated risks, which include building risk management into model development and applying holistic risk frameworks that leverage and adapt principles used in managing other types of enterprise risk. Executives must also simply step back regularly to consider the broad implications for employees and society when using ML.
One horizontal practice underlying these approaches is documentation, which can serve as a mechanism for both explainability and transparency while providing a proxy for aspects of the model and its contents without exposing the underlying data or model features.
Whereas standard technical documentation is created to help practitioners implement a model, documentation focused on explainability and transparency informs consumers, regulators, and others about why and how a model or data set is being used. Such documentation includes a high-level overview of the model itself: its intended purpose, performance, and provenance; information about the training data set and training process; known issues or tradeoffs with the model; identified risk mitigation strategies; and any other information that can help contextualize the technology. It can be delivered in the form of model fact sheets, score cards, algorithmic impact assessments, and/or data set labels.
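To make this concrete, the elements listed above could be captured as a simple structured record. This is an illustrative sketch only; the field names and example values are hypothetical, not drawn from any published fact-sheet standard.

```python
from dataclasses import dataclass, field

# Hypothetical model fact sheet; field names and values are illustrative,
# not a standardized documentation schema.
@dataclass
class ModelFactSheet:
    intended_purpose: str
    performance_summary: str            # headline metrics on held-out data
    provenance: str                     # who built it, when, and how
    training_data_overview: str         # description of the training set
    known_issues: list[str] = field(default_factory=list)
    risk_mitigations: list[str] = field(default_factory=list)

# Example fact sheet for a (fictional) lending model:
sheet = ModelFactSheet(
    intended_purpose="Rank loan applications for human review",
    performance_summary="AUC 0.91 on 2021 holdout set",
    provenance="Credit-risk team; retrained quarterly",
    training_data_overview="Anonymized applications, 2015-2020",
    known_issues=["Underrepresents applicants under 25"],
    risk_mitigations=["Human review required for all denials"],
)
```

The point of the structure is that it summarizes purpose, performance, and known limitations without exposing the model’s features or training data themselves.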
Listing the ingredients without revealing the recipe
A good analogy may be drawn from considering a cookie sold by a food manufacturer (an analogy inspired by the Data Nutrition Project effort, where one of our authors is a contributor in a personal capacity). While the company making such a cookie is not eager to share the proprietary recipe, food regulators require a nutrition label on the package with a list of ingredients for consumers. Consumers don’t need to know the exact recipe to make informed choices about the cookie; they only need an overview of the ingredients and a macronutrient profile presented in an understandable way.
Similarly, model documentation can become the proxy for sharing the model and its features and data sets with the world as opposed to sharing the actual “cookie recipe.” This gives others the ability to check the “health” and rationale of the model without needing to know, or being able to derive, how to recreate the model. No trade secrets are leaked.
Importantly, documentation can help organizations identify critical issues early on. In a timely example, one state government building a model to help manage COVID-19 vaccine allocation realized, through the creation of model documentation, that several regions were underrepresented and quickly recalibrated the model to correct the issue.
Documentation provides other benefits to an organization. It can, for example, help companies respond to the market. In the case of cookies, this is comparable to the blinking red light that goes off for nutritionists when a cookie is high in trans fats. Once consumers realize the harm from an excess of trans fats and decide to purchase a cookie without them, the food producer may independently decide to make a healthier recipe to meet consumer demand.
Putting documentation into practice
Organizations looking to operationalize the practice could do so with a bottom-up approach, where practitioners and business stakeholders decide what needs to be recorded about the model contents and processes. An ML or data set developer could then be provided with a checklist of questions aligned to selected categories. For a data set, for example, we’ve seen the emergence of categories such as description, composition, provenance, collection, and management.
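A checklist like this can be expressed directly in code, grouped by the categories named above. The questions below are examples we have invented for illustration, not a standard; the helper simply flags which questions a developer has not yet answered.

```python
# Illustrative data set documentation checklist, organized by the
# categories named in the text (description, composition, provenance,
# collection, management). The questions are hypothetical examples.
CHECKLIST = {
    "description": ["What does each record represent?",
                    "What is the data set intended to be used for?"],
    "composition": ["What fields does each record contain?",
                    "Are any populations under- or over-represented?"],
    "provenance": ["Who created the data set, and when?"],
    "collection": ["How was the data gathered, and with what consent?"],
    "management": ["Who maintains the data set, and how is it updated?"],
}

def unanswered(answers: dict) -> list:
    """Return (category, question) pairs that still lack an answer."""
    gaps = []
    for category, questions in CHECKLIST.items():
        for q in questions:
            if not answers.get(category, {}).get(q):
                gaps.append((category, q))
    return gaps

# A partially completed checklist flags the remaining gaps:
answers = {"description": {"What does each record represent?":
                           "One loan application"}}
gaps = unanswered(answers)
```

Making the gaps explicit turns documentation from an afterthought into a checkable step in the development workflow.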
The process of building the documentation would also require cross-functional support. Additional team members will likely include subject-matter experts (to understand implications of model and data use cases), a legal/regulatory expert, as well as those who helped curate and manage the data. Teams may also include a member of any group affected by the AI systems to ensure that the documentation is actually understood by those who will experience the impact.
Not all models require the same level of risk management. Documentation processes can take time and effort, so we recommend building this practice into a risk-assessment framework. High-stakes models are likely a good place to start. Over time, as aspects of documentation become automatable, lower-risk use cases may also become good candidates for more diligent documentation.
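One way to fold documentation into a risk-assessment framework is a simple tiering rule that maps coarse risk signals to a documentation level. The criteria and tier names below are illustrative assumptions, not a prescribed framework.

```python
# Hypothetical risk-tiering rule for deciding how much documentation a
# model warrants; the signals, thresholds, and tier names are illustrative.
def documentation_tier(affects_individuals: bool,
                       fully_automated: bool,
                       regulated_domain: bool) -> str:
    """Map coarse risk signals to a documentation level."""
    signals = sum([affects_individuals, fully_automated, regulated_domain])
    if signals >= 2:
        return "full"      # fact sheet, data set labels, impact assessment
    if signals == 1:
        return "standard"  # fact sheet and training-data overview
    return "light"         # brief model summary

# A fully automated model affecting individuals is high-stakes:
tier = documentation_tier(affects_individuals=True,
                          fully_automated=True,
                          regulated_domain=False)
```

As automation lowers the cost of producing documentation, the thresholds in a rule like this can be relaxed so lower-risk models also get fuller treatment.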
Don’t wait for a universal documentation standard
Nutrition labels are useful because they are standardized, enabling consumers to know where to check for certain ingredients. Likewise, model and data documentation is more powerful when universally applied and understood, or at least interoperable. Several ML documentation initiatives are in the works, but none has yet emerged as the standard.
Governments will further regulate in this space, and we are not advocating for specific laws or policy changes. But beginning to engage in the practice of documentation itself can act as a forcing function for collaboration and further research, while protecting your IP.