Emerging Technologies

When AI gets a body: How robots can learn to belong in human society

The future of AI in physical form requires rethinking how machines learn – not just from the internet, but from us.

How do we ensure robots using 'embodied' AI integrate into human society in safe, ethical and inclusive ways? Image: Getty Images/iStockphoto

Vanessa Evers
Director, Centrum Wiskunde & Informatica (CWI) Amsterdam, the Dutch national research institute for mathematics and computer science
  • Large language models like ChatGPT have captured the world's attention by drafting essays, summarizing research and even writing code.
  • But what happens when an AI-driven robot does more than just reply by text and instead engages with you in a way you find natural and acceptable?
  • The future of 'embodied' AI, such as robots, requires rethinking how machines learn – not just from the internet, but from us.

Large language models such as ChatGPT, Claude or DeepSeek have captured the world’s attention. By predicting the most likely next word, they can draft essays, summarize research and even write code.

These systems demonstrate how far AI has come in processing and generating text. But what happens when AI gains a body?


Imagine an AI-powered robot that doesn’t just predict the next word but the next appropriate action. Instead of replying in text, it could greet you, hand you an object or walk you to a meeting room in a way you find natural and acceptable.

This vision of “embodied AI” raises new questions: How can robots learn to behave in a world that is far richer and more unpredictable than text? And how do we ensure they integrate into human society in safe, ethical and inclusive ways?

Why training robots is harder than training chatbots

Today’s large language models (LLMs) are trained on trillions of words scraped from the internet, books and conversations.

They learn statistical patterns in language and are then fine-tuned with human feedback to improve safety and relevance. Training GPT-4 reportedly required roughly a petabyte of data – the equivalent of watching more than 13 years of continuous high-definition video.

Yet even that amount pales in comparison to what would be needed for robots. The physical world is far more complex than text. To navigate it, a robot would need to predict not just the next word, but the next possible event or sensory input.

Consider a simple example: a robot facing a front door. It must know that doors swing on hinges, that bells remain fixed and that walls don’t simply disappear. This requires a world model grounded in physics, objects and social norms.

While such models can be built in controlled settings like warehouses or assembly lines, they break down in everyday life with its endless variations and uncertainties.
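
To make the notion of a world model concrete, here is a deliberately toy sketch in Python. The scene, the object properties and the prediction rules are all hand-written for illustration; real embodied world models would be learned from sensor data, not coded as rules.

```python
from dataclasses import dataclass

# Toy world model: illustrative only. Real systems learn these
# regularities from experience rather than from hand-written rules.

@dataclass(frozen=True)
class ObjectState:
    name: str
    movable: bool      # can an action change its pose?
    articulated: bool  # does it move on a hinge or track when actuated?

FRONT_DOOR_SCENE = {
    "door": ObjectState("door", movable=False, articulated=True),
    "doorbell": ObjectState("doorbell", movable=False, articulated=False),
    "wall": ObjectState("wall", movable=False, articulated=False),
}

def predict(obj: ObjectState, action: str) -> str:
    """Predict the next sensory event for an action, from crude physics priors."""
    if action == "push":
        return f"{obj.name} swings open" if obj.articulated else f"{obj.name} does not move"
    if action == "press" and obj.name == "doorbell":
        return "bell rings, someone may answer"
    return "no effect"

for name in ("door", "doorbell", "wall"):
    print(name, "->", predict(FRONT_DOOR_SCENE[name], "push"))
```

Even this trivial scene needs per-object knowledge of how actions propagate; everyday environments multiply that requirement endlessly, which is exactly where hand-built models give out.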

For perspective, by age four, a child has already processed an estimated 10 terabytes of visual sensory input alone – within two orders of magnitude of the petabyte reportedly used to train GPT-4.

And that’s only counting vision, before adding in auditory, tactile, taste, proprioceptive and olfactory inputs. This embodied learning allows children to understand how the world works in ways that current AI systems cannot match.
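
A quick back-of-envelope calculation reproduces both scale estimates above. The HD bitrate (roughly 20 Mbit/s) and the child's waking hours (roughly 12 a day) are our own assumptions for the arithmetic, not figures from the sources:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600

# Assumed HD bitrate (~20 Mbit/s = 2.5 MB/s): our assumption, not a cited figure.
HD_BYTES_PER_SECOND = 2.5e6

petabyte = 1e15  # data reportedly used to train GPT-4
video_years = petabyte / HD_BYTES_PER_SECOND / SECONDS_PER_YEAR
print(f"1 petabyte ~ {video_years:.0f} years of continuous HD video")  # ~13 years

child_visual_bytes = 10e12                       # estimated visual input by age four
waking_seconds = 4 * SECONDS_PER_YEAR * 12 / 24  # assume ~12 waking hours a day
rate = child_visual_bytes / waking_seconds
print(f"child's visual stream ~ {rate / 1e3:.0f} KB/s over four years")  # ~159 KB/s
```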

What humans can teach us about robot learning

Humans are effective learners not just because of our brains, or because we learn through our bodies, but because we learn our cultures.

Philosopher Daniel Dennett argued that enculturation – that is, learning the norms, values and behaviours of one’s culture through observation, experience and instruction – explains why humans use the ‘thinking tools’ in their brains so effectively and have developed far beyond other primates.

Chimpanzees also learn socially, but without active teaching. A young chimpanzee, for instance, takes years to learn to crack a nut with a stone through passive observation and imitation, because adult chimpanzees do not actively teach the skill.

Human children, by contrast, benefit from intentional teaching by others. Over generations, these cultural adaptations are refined and accumulate, fuelling the extraordinary success of our species.

If we want robots to learn effectively, we might leverage this human proficiency for social learning and active teaching, combining embodied learning with social interaction and cultural transmission.

From social signals to active teaching

Two approaches to how that learning could happen stand out, and the best path is likely a mix of both (a minimal sketch follows the list):

  • Social reinforcement learning (SRL): Unlike chimpanzees, humans actively teach and learn socially across generations. Robots could tap into this enculturated human feedback in real time by interpreting social signals such as smiles, frowns, surprised looks, raised voices or people stepping away. These cues could guide robots toward more acceptable behaviours, much as children learn from social approval and disapproval.
  • Active teaching: Beyond passive observation, humans could play an active role in teaching robots. This could involve traditional methods such as learning from demonstration, as well as more socially grounded methods such as the subtle negative feedback people give largely subconsciously when a mistake is made, or actively choosing the most appropriate action from a set of options the robot generates in a given context. Training could happen in real environments (like hospitals or schools) or in simulations.
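
To illustrate how social reinforcement learning might work in its simplest form, here is a minimal sketch. The mapping from social signals to rewards, and the context and action names, are hypothetical; the epsilon-greedy bandit is a stand-in for whatever learning algorithm a real robot would use:

```python
import random
from collections import defaultdict

# Hypothetical mapping from observed social signals to scalar rewards.
# Real systems would infer these from vision and audio; values are illustrative.
SIGNAL_REWARD = {
    "smile": 1.0,
    "nod": 0.5,
    "frown": -0.5,
    "raised_voice": -1.0,
    "steps_away": -1.0,
}

class SocialBandit:
    """Minimal social reinforcement learner: one action-value table per context."""

    def __init__(self, actions, epsilon=0.1, lr=0.2):
        self.actions = actions
        self.epsilon = epsilon       # exploration rate
        self.lr = lr                 # learning rate
        self.q = defaultdict(float)  # (context, action) -> estimated value

    def choose(self, context):
        # Epsilon-greedy: mostly pick the best-known action, sometimes explore.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(context, a)])

    def update(self, context, action, signal):
        # Fold the observed social signal back into the value estimate.
        reward = SIGNAL_REWARD.get(signal, 0.0)
        key = (context, action)
        self.q[key] += self.lr * (reward - self.q[key])

# Usage: a greeting robot learns that waving beats honking in a hospital lobby.
robot = SocialBandit(actions=["wave", "speak_greeting", "honk"])
for _ in range(200):
    action = robot.choose("hospital_lobby")
    signal = "smile" if action != "honk" else "frown"  # stand-in for real perception
    robot.update("hospital_lobby", action, signal)
print(max(robot.actions, key=lambda a: robot.q[("hospital_lobby", a)]))
```

The point of the sketch is the loop, not the algorithm: the robot acts, a person reacts, and the reaction is folded back into the robot's estimate of what is acceptable in that context.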

Over time, these methods would help robots develop patterns of behaviour suited to their specific contexts of deployment. A hospital robot, for example, could gradually learn bedside etiquette, while a logistics robot could refine how it interacts with workers in a warehouse.

The challenge of planning and complexity

Even with social learning, robots face a further challenge: hierarchical planning.

Consider the seemingly simple goal of bringing someone to the train station. This requires managing multiple layers of subgoals: leaving the apartment, locking the door, catching a bus, going to the right platform. Each step involves unpredictable obstacles, from traffic to human interactions.

A robot cannot realistically script every movement in advance, down to individual motor commands. Instead, it must combine pre-training with the ability to adapt subgoals as situations unfold.
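
As a sketch of what hierarchical planning with replanning could look like, the toy Python below decomposes the train-station errand into subgoals and swaps in a fallback when one fails. The subgoal names and the fallback rule are illustrative; a real system would ground each step in perception and control:

```python
from typing import Callable, List, Tuple

Step = Callable[[], bool]

def step(name: str, fail_first: bool = False) -> Tuple[str, Step]:
    """Stubbed subgoal executor; a real robot would ground this in motor control."""
    state = {"tries": 0}
    def run() -> bool:
        state["tries"] += 1
        ok = not (fail_first and state["tries"] == 1)
        print(("done: " if ok else "failed: ") + name)
        return ok
    return name, run

# High-level plan for "escort a person to the train station", as layered subgoals.
PLAN: List[Tuple[str, Step]] = [
    step("leave the apartment"),
    step("lock the door"),
    step("catch the bus", fail_first=True),  # the bus is missed once
    step("walk to the right platform"),
]

def replan(failed: str) -> Tuple[str, Step]:
    # Stand-in for replanning; the guidance could also come from a human teacher.
    return step(f"fallback for '{failed}': walk to the station")

def execute(plan: List[Tuple[str, Step]]) -> None:
    """Run subgoals in order; swap in a new subgoal when one fails,
    rather than scripting every movement in advance."""
    for name, run in plan:
        while not run():
            name, run = replan(name)

execute(PLAN)
```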

Active teaching by humans – whether through demonstration or through online or offline feedback – may be essential for helping robots navigate this complexity.

Why this matters now

Robots are no longer confined to science fiction. They are delivering food in cities, assisting in warehouses and even providing companionship in care homes. As generative AI begins to inhabit physical machines, the implications multiply:

  • Opportunities: Robots can take on dangerous tasks in construction or mining, support elderly care, or respond to climate disasters.
  • Risks: Without careful design, they may displace jobs, deepen inequalities or even be weaponized. Privacy, safety and trust are at stake.
  • Choices: Leaders must decide where robots belong, who benefits from their deployment, and how to build systems that earn public trust.

A call for cultural robotics

The future of AI in physical form cannot be solved by scaling up data alone.

It requires rethinking how machines learn – not just from the internet, but from us. If robots are to share our homes, workplaces and streets, they will need to be enculturated: learning through embodied interaction, social signals and active teaching in ways that complement human development.

This is a profound challenge, but also an opportunity. By embedding cultural learning into robot training, we can shape autonomous systems that do more than perform tasks – they can integrate responsibly into the social fabric of our lives.

The question is not whether AI will gain a body, but how we will teach it to belong in our world.
