Artificial intelligence image generators bring delight - and concern

Oct 7, 2022

One user asked an AI text-to-image generator to create an image of 'a wise cat meditating in the Himalayas searching for enlightenment.' Image: Photo by Microsoft 365 on Unsplash

Spencer Feingold

Lead Editor, World Economic Forum

Advances in artificial intelligence have been used to create sophisticated text-to-image systems.
These AI text-to-image generators, already used by millions of digital artists, have sparked concerns about disinformation, harmful content, copyright protections and biases.
"It needs to be governed and regulated accordingly," the Forum's lead platform curator for artificial intelligence said.

The number of available artificial intelligence (AI) text-to-image generators is increasing rapidly, giving the public an array of new tools to create digital imagery.

Last month, the AI development company OpenAI removed its waitlist for DALL·E, an AI-powered image generator that the organization built. DALL·E, which is named after the Spanish artist Salvador Dalí and the Disney-Pixar animated film WALL-E, creates images from text descriptions that users enter into the system.

For example, as seen on the Twitter account of OpenAI CEO Sam Altman, a user could ask DALL·E to create an image of “an elephant tea party on a grass lawn” or “a wise cat meditating in the Himalayas searching for enlightenment.”

DALL·E was first unveiled in April to select users, including an array of graphic designers, authors and architects. In June, a DALL·E image made headlines after it was featured on the cover of Cosmopolitan, the first-ever AI-generated magazine cover, according to the magazine.

“While each attempt takes only 20 seconds to generate, it took hundreds of attempts,” Karen X Cheng, a digital artist who helped produce the Cosmopolitan cover, explained on Instagram. “Hours and hours of prompt generating and refining before getting the perfect image.”

The innovations and concerns around AI text-to-image generators

DALL·E was built using an OpenAI text generator known as GPT-3. The system, a 12-billion-parameter version of that model, was trained to produce corresponding images from complex text phrases. Already, DALL·E has over 1.5 million active users who are creating more than 2 million images a day, according to OpenAI.

DALL·E joins several other AI-powered image generators that have been unveiled to the public in recent months. For instance, in August, the AI research organization Stability AI released Stable Diffusion, another AI text-to-image generator. Google is also developing two text-to-image systems: Imagen and Parti, the latter an acronym for Pathways Autoregressive Text-to-Image.

Midjourney is another AI text-to-image generator that allows users to input prompts through Discord, a popular group messaging platform. The system gained widespread attention in August after an image created on Midjourney won an art contest in the United States. The win sparked intense debate within the art world about creative processes and what constitutes art.

Already, thousands of AI-generated images are being offered for sale and download on certain stock photo websites. Yet the use of AI art generators has sparked widespread copyright concerns.

In September, Getty Images—one of the world's top visual media and photo supply companies—announced it would not accept AI-generated images on its various platforms. The company said in a statement that the decision was made because of copyright liabilities and usage rights concerns.

"We will continue to support creatives who use tools, including those that may leverage AI, to enhance their original concepts and visual work in line with our acceptance policies," Getty Images' statement added. "We stand ready to work with those who want to advance AI in a socially responsible manner and one that respects personal and intellectual property rights."

Moreover, skeptics have raised the alarm over AI image generators being used to create harmful content, spread disinformation and advance negative biases and stereotypes. And while some text-to-image systems have certain guardrails—like prohibiting images of public figures such as celebrities or politicians—many people have called for a more stringent governing system to regulate the sector.

"Just because images are created by AI doesn't mean it shouldn't be governed by ethical principles or abide by national laws and regulations e.g. privacy laws, child protection laws, copyrights," said Hubert Halopé, the World Economic Forum's lead platform curator for AI and machine learning.

AI researchers are working to perfect text-to-image systems and address such concerns. OpenAI, for instance, has already unveiled its second iteration of DALL·E, DALL·E 2, which has been built to better identify misuse of personal imagery like individual faces and to counter biases when text inputs do not specify race or gender. Such inputs might include “CEO,” “firefighter” or “doctor.”

“Based on our internal evaluation, users were 12× more likely to say that DALL·E images included people of diverse backgrounds after the technique was applied,” OpenAI said in a statement.

AI-powered text-to-video systems are also being developed. And while video generators are at an earlier stage than text-to-image systems, several have debuted in recent months.

In May, researchers at China's Tsinghua University and the Beijing Academy of Artificial Intelligence unveiled an AI-powered video generator called CogVideo.

More recently, in September, Meta, the parent company of Facebook, Instagram and WhatsApp, launched Make-A-Video, an AI system that creates short video clips from text prompts. “This is pretty amazing progress,” Meta CEO Mark Zuckerberg said on Facebook.

“It's much harder to generate video than photos because beyond correctly generating each pixel, the system also has to predict how they'll change over time,” Zuckerberg added.

Meta's release came just a few weeks before Google unveiled Imagen Video, a similar AI text-to-video system. The company said in a research paper that Imagen Video is "not only capable of generating videos of high fidelity, but also having a high degree of controllability and world knowledge, including the ability to generate diverse videos and text animations in various artistic styles and with 3D object understanding."

License and Republishing

World Economic Forum articles may be republished in accordance with the Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International Public License, and in accordance with our Terms of Use.

The views expressed in this article are those of the author alone and not the World Economic Forum.