Why artificial intelligence design must prioritize data privacy

Mar 31, 2022

Einaras von Gravrock

Founder and Chief Executive Officer, Cube AI

Artificial intelligence is integral to developments in healthcare, technology, and other sectors, but there are concerns with how data privacy is regulated.
Data privacy is essential to gaining public trust in technological advances.

Data privacy is often linked with artificial intelligence (AI) models based on consumer data. Understandably, users are wary about automated technologies that obtain and use their data, which may include sensitive information. As AI models depend on data quality to deliver salient results, their continued existence hinges on privacy protection being integral to their design.

More than just a way to dispel customers’ fears and concerns, good privacy and data management practices are closely tied to a company’s core organizational values, business processes, and security management. Privacy issues have been extensively studied and publicized, and data from our privacy perception survey indicates that privacy protection is a crucial concern for consumers.

Addressing these concerns in context is crucial, and companies operating consumer-facing AI have several methods and techniques available to address the privacy concerns often linked to artificial intelligence.

Some products and services need data, but they don’t need to invade anyone’s privacy

Companies working with artificial intelligence are already at a disadvantage in the public eye in terms of privacy. According to a 2020 survey by the European Consumer Organization, 45-60% of Europeans agree that AI will lead to more abuse of personal data.

There are many popular online services and products that rely on large datasets to teach and improve their AI algorithms. Some of the data in those datasets might be considered private even by the least privacy-conscious users. Streams of data from networks, social media pages, mobile phones, and other devices contribute to the volume of information that businesses use to train machine learning systems. Thanks to overreaching personal data use and mismanagement by some companies, privacy protection is becoming a public policy issue around the world.

Much of our sensitive data is gathered to improve AI-enabled processes. The volume of data analyzed is also driven by machine learning adoption, as sophisticated algorithms need to make decisions in real time based on those datasets. Search algorithms, voice assistants, and recommendation engines are just a few solutions that leverage AI based on large datasets of real-world user data.

Massive databases might encompass a wide range of data, and one of the most pressing problems is that this data could be personally identifiable and sensitive. In reality, teaching algorithms to make decisions does not rely on knowing who the data relates to. Therefore, companies behind such products should focus on making their datasets private, with few, if any, ways to identify users in the source data, as well as creating measures to remove edge cases from their algorithms to avoid reverse-engineering and identification.
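One way to read "removing edge cases" is to drop records whose combination of attributes is so rare that it could single out a person. The sketch below is only an illustration of that idea, loosely in the spirit of k-anonymity; the function name, the field names, and the threshold `k` are all assumptions made for the example and do not come from any specific product.

```python
from collections import Counter


def suppress_rare_records(records, quasi_identifiers, k=5):
    """Keep only records whose quasi-identifier combination occurs at least k times.

    `quasi_identifiers` lists the fields that, taken together, could help
    re-identify someone (hypothetical field names in this example).
    """
    def combination(record):
        return tuple(record[field] for field in quasi_identifiers)

    counts = Counter(combination(r) for r in records)
    return [r for r in records if counts[combination(r)] >= k]


# Illustrative data only: three similar users and one outlier.
records = [
    {"age_band": "30-39", "region": "north", "clicks": 12},
    {"age_band": "30-39", "region": "north", "clicks": 7},
    {"age_band": "30-39", "region": "north", "clicks": 9},
    {"age_band": "90-99", "region": "south", "clicks": 3},  # unique combination
]

kept = suppress_rare_records(records, ["age_band", "region"], k=3)
```

With this sample data, the single record from the "90-99" and "south" combination is dropped before training, while the three records that share a common combination remain. Choosing `k` is a judgment call that depends on the dataset and the use case.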

The relationship between data privacy and artificial intelligence is quite nuanced. While some algorithms might unavoidably require private data, there are far more secure and less invasive ways to use it. The following methods are just some of the ways in which companies using private data can become part of the solution.

Designing artificial intelligence with data privacy in mind

We have talked about the issue of reverse engineering, where bad actors discover vulnerabilities in AI models and discern potentially critical information from the model's outputs. This risk is why continually changing and improving databases and training data is vital wherever AI use faces this challenge.

For instance, combining conflicting datasets in the machine learning process (adversarial learning) is a good option for identifying flaws and biases in the AI algorithm’s output. There are also options for using synthetic datasets that do not contain actual personal data, although their efficacy is still in question.

Healthcare is a leader in the governance around AI and data privacy, especially in handling sensitive private data. The sector has also done a lot of work on consent, both for medical procedures and for the handling of patient data: the risks are high, and the requirements are legally enforced.

As for the overall design of AI products and algorithms, de-coupling data from users via anonymization and aggregation is key for any business using user data to train their AI models.
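As a rough illustration of de-coupling data from users, the sketch below strips direct identifiers from each record and then reduces the data to group-level statistics. It is a minimal example under stated assumptions: the identifier list, field names, and helper functions are hypothetical, and a real pipeline would need a far more careful definition of what counts as identifying.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical list of fields treated as direct identifiers in this example.
DIRECT_IDENTIFIERS = {"name", "email", "device_id"}


def strip_identifiers(record):
    """Return a copy of the record without direct identifier fields."""
    return {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}


def aggregate_by(records, group_field, value_field):
    """Summarize value_field per group, so no individual rows are retained."""
    groups = defaultdict(list)
    for record in records:
        clean = strip_identifiers(record)
        groups[clean[group_field]].append(clean[value_field])
    return {
        group: {"count": len(values), "mean": mean(values)}
        for group, values in groups.items()
    }


# Illustrative data only.
raw = [
    {"name": "A", "email": "a@example.com", "region": "north", "minutes": 10},
    {"name": "B", "email": "b@example.com", "region": "north", "minutes": 20},
    {"name": "C", "email": "c@example.com", "region": "south", "minutes": 30},
]

summary = aggregate_by(raw, "region", "minutes")
```

Note that aggregation alone is not a guarantee: in this sample the "south" group contains a single person, so its "mean" is effectively that person's value. Very small groups may need to be merged or suppressed before the summary is used.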

There are many considerations that can strengthen privacy protection in AI companies:

Privacy at the core: put privacy protection on the developer's radar and find ways to reinforce security effectively

Anonymize and aggregate datasets, remove all personal identifiers and unique data points

Have strict control over who in the company has access to specific datasets and continuously audit how this data is accessed, as poorly controlled access has been the reason behind some data breaches in the past

More data is not always the best solution. Test your algorithms with minimized data to learn the least amount of data you need to gather and process to keep your use case viable

It is essential to provide a streamlined way to eliminate personal data at the user's request. Companies that only pseudonymize user data should then continuously retrain their models with the most up-to-date data

Leverage strong de-identification tactics, such as fully anonymized aggregated and synthetic datasets and non-reversible identifiers for algorithm training, auditing, and quality assurance

Safeguard both the autonomy and privacy of users by rethinking ways of obtaining and using critical information from third parties – examine data sources closely and only use those that gather data with clear and informed user consent

Consider the risks: could an attacker feasibly compromise user privacy using the outputs of your AI system?
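To make the idea of non-reversible identifiers from the list above more concrete, here is a minimal sketch that replaces a user ID with a keyed hash (HMAC-SHA256) from Python's standard library. The function name and the sample ID are assumptions made for illustration. Strictly speaking, this is pseudonymization rather than full anonymization, because anyone who holds the secret key can recompute the token for a known ID.

```python
import hashlib
import hmac
import secrets


def make_pseudonymizer(secret_key: bytes):
    """Return a function that maps a user ID to a stable, keyed-hash token."""
    def pseudonymize(user_id: str) -> str:
        digest = hmac.new(secret_key, user_id.encode("utf-8"), hashlib.sha256)
        return digest.hexdigest()
    return pseudonymize


# The key should be stored separately from the training data (assumption:
# access to it is restricted and audited like any other sensitive secret).
key = secrets.token_bytes(32)
pseudonymize = make_pseudonymizer(key)

token = pseudonymize("user-12345")  # hypothetical user ID
```

The same ID always yields the same token under the same key, so records can still be linked for training, auditing, and quality assurance, while the raw ID never enters the dataset. Without the key, recomputing a token from a guessed ID is not feasible.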

What is the future of data privacy and AI?

AI systems need lots of data, and some top-rated online services and products could not work without personal data used to train their AI algorithms. Nevertheless, there are many ways to improve the acquisition, management, and use of data, including the algorithms themselves and the overall data management. Privacy-respecting AI requires privacy-respecting companies.

License and Republishing

World Economic Forum articles may be republished in accordance with the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License, and in accordance with our Terms of Use.

The views expressed in this article are those of the author alone and not the World Economic Forum.