Opinion

Real-time deepfakes are rewriting the rules of child safety

Current online child safety infrastructure was built for a different internet. Image: Emily Wade/Unsplash

Ben Colman
Co-Founder and Chief Executive Officer, Reality Defender
  • Predators are increasingly using real-time generative AI voice models to impersonate children on gaming platforms.
  • Current safety protocols rely on reactive moderation instead of the real-time audio verification needed today.
  • Legislative frameworks must expand beyond visual deepfakes to regulate synthetic audio manipulation in live channels.

I was sitting in the room while my kids played Roblox when I heard something through the speakers that stopped me cold: voices that sounded like children, but almost certainly weren’t. The cadence was off. The emotional range was flat in ways a parent notices but a platform doesn’t.

It was then that I fully understood what I was hearing: other players, likely adults, using real-time AI voice models to sound like kids in online games and social platforms.

Abuse of real-time voice communication on gaming platforms, especially those used by children, is nothing new. Roblox, first released two decades ago, has over 144 million daily active users and is used by roughly half of all American children under 16. The platform made age verification mandatory in January 2026 after a wave of lawsuits alleging that its safety measures were insufficient to protect minors from predatory contact.

But Roblox is just one of many similar platforms. Discord, Fortnite and dozens of others offer live voice chat used by millions, including a sizable young audience. And they most certainly face the same horrific risk I encountered while my kids were playing.

Current safety protocols were built for a different internet

Most online child safety infrastructure assumes a basic model: verify identity at sign-up, moderate content after it’s posted, flag accounts that violate terms of service. This model made sense when risks were primarily text-based, though it has struggled with bypass attempts since its inception.

In the last few years, generative AI has made it trivially easy to manipulate a voice in real time. Open-source voice cloning tools that required technical expertise two years ago now run as consumer-grade apps. The FBI’s 2025 Internet Crime Report documented over 22,000 AI-related complaints in a single year, with voice cloning specifically identified as a growing vector in impersonation schemes.

The result is a gap that widens every month. A platform might verify a user’s age through a selfie at registration (itself easy to bypass with a deepfake) yet never analyse the audio channel where the actual conversation takes place. An adult can pass a visual identity check and then use a synthetic child’s voice for every interaction that follows.

Moderation teams are ill-prepared

Content moderation asks: Is this content harmful? This question assumes the content is real and the task is judgement. But when a voice itself is synthetic, the first question should be: Is this person who they appear to be?

Answering it is a fundamentally different technical challenge. It requires analysing audio streams in real time, not reviewing flagged posts after the fact.

The data suggests the current approach isn’t keeping pace. In 2024, the National Center for Missing & Exploited Children (NCMEC) saw a 1,325% increase in CyberTipline reports involving generative AI, climbing from 4,700 to 67,000. Reports of online enticement, which includes adults communicating with children for sexual purposes, reached over 546,000, a 192% increase from the prior year. By mid-2025, generative AI-related reports had surged to over 440,000 in just six months.

When I testified before the United States Senate Judiciary Committee on the threat deepfakes pose to society, I noted that non-consensual deepfake imagery overwhelmingly targets girls and women, and that even high school students have used the technology to harm peers and educators.

While this testimony focused on election integrity and synthetic media broadly, the threat has since moved into territory that existing frameworks fail to address: live voice channels where adults use AI to impersonate children in real time.

The progression from fabricated content to fabricated identity, or from something you see after the fact to someone you interact with live, is the shift that demands a new response.

What needs to change

Three shifts would meaningfully close this gap.

First, platforms need to treat live audio and video channels with the same rigour they apply to uploaded content. Real-time voice is now a primary interaction layer for millions of children. Detection infrastructure should match that reality.

Second, the technology to identify AI-generated voices in real time already exists, but platforms such as Roblox are not deploying it to catch these incidents at scale. They need to integrate synthetic media detection into their trust and safety operations, just as they integrated content classifiers a decade ago. This doesn’t require building from scratch; it requires applying proven detection where it’s needed most.


Third, policy needs to catch up. The TAKE IT DOWN Act, signed into law in May 2025, was a meaningful step in criminalising non-consensual deepfake imagery and requiring platforms to remove it. Its first conviction came in April 2026. Yet the Act focuses primarily on published visual content and doesn’t address real-time voice impersonation in live channels, which is the vector through which trust is built and grooming occurs. Expanding legislative scope to cover real-time audio manipulation, particularly on platforms used by minors, would address the most immediate and least-regulated gap.

The time for legislation is now

The quality of real-time voice synthesis improves with every model release, and the cost drops with every update. The platforms children use are adding more voice features, not fewer.

None of this means we should fear the technology itself. AI voice tools have extraordinary applications in accessibility, creative expression and communication. Yet when the same tools can be used to impersonate a child’s peer in a live conversation, we need verification infrastructure that operates at the speed of the interaction.

The previous generation of child safety tools was built around the idea that harmful content could be caught after it appeared. The next generation needs to verify authenticity in real time before trust is established.

That shift from reactive moderation to real-time verification is the most important infrastructure investment platforms and policy-makers can make right now. Because at the end of the day, the kids on the other end of that voice channel deserve systems as sophisticated as the threats they face.
