The Rise of Embodied AI

Dec 03, 2024

Why did we get LLMs first, rather than robots, in the evolution of AI? Was it random, or is there something fundamental about it?

There is a lot to it, and it’s tied to evolutionary biology.

And why, counterintuitively, has cognition turned out to be “easier to solve” than real-world movement? (We haven’t solved cognition yet, but LLMs can speak languages fluently, do math, and now run their own chains of inference.)

There is an answer: Moravec’s Paradox.

Moravec’s Paradox taught us a valuable lesson. It was articulated in the 1980s by roboticist Hans Moravec, along with AI and robotics pioneers Marvin Minsky and Rodney Brooks.

They observed that AI systems excelled at tasks requiring abstract reasoning but significantly underperformed at tasks requiring real-world interaction.

The paradox is rooted in evolutionary biology.

Skills like perception and motor control have been honed over millions of years in humans and animals, making them profoundly ingrained and unconscious.

Abstract reasoning and formal problem-solving are recent evolutionary developments requiring conscious effort and learning.

The paradox was confirmed when ChatGPT started to showcase advanced cognitive skills by 2022, while robotics still struggled.

And yet now, thanks to advances in embodied AI, which integrates perception, movement, and adaptability, are we getting closer to a “ChatGPT moment” for robots?

First of all, let’s take a step back, then move full steam ahead!

What’s the thesis behind humanoids?

While the name itself (humanoid) might seem like something from sci-fi (and indeed it is!), the thesis behind it rests on a form factor: the human body.

In other words, since the world has already been built for humans, the first robot form factor able to scale might be that of a human.

Figure AI’s founder, Brett Adcock, explains the thesis extremely well on No Priors:

Of course, there is a significant risk as these humanoids scale: the uncanny valley effect.

The uncanny valley effect refers to the discomfort people feel when encountering humanoid entities that appear almost, but not precisely, human.

For humanoid robots, that can lead to user discomfort, reduced trust, and hindered acceptance due to their near-human appearance.

That said, let’s see where we are right now!

AI Robotics

AI Robotics is a field that combines artificial intelligence (AI) with robotics.

It enables robots to perform complex tasks autonomously by integrating AI algorithms for object recognition, navigation, and decision-making tasks.

This integration enhances robot capabilities, allowing them to mimic human-like intelligence and adapt to changing environments more effectively.

AI robotics is crucial for applications like autonomous vehicles, precision manufacturing, and advanced home automation systems.

This time, though, is quite different for a simple reason: We're also entering a general-purpose revolution in robotics!

Enter general-purpose robotics via world modeling

World modeling is a crucial stepping stone for the next step in the evolution of AI.

The next frontier of general-purpose robotics depends on the evolution of “world models” or AI-based environmental maps/representations, enabling robots to predict interactions and navigate complex, dynamic settings effectively.

All major big tech players are massively investing in it.

For instance, NVIDIA just announced new advancements in world modeling that will transform how robots understand and interact with their surroundings.

Robots can now better anticipate and adapt to real-world scenarios by building detailed AI-powered representations of environments.

This breakthrough enables robots to handle tasks with greater awareness and precision, allowing smarter, more human-like automation across industries.

As a result, sectors like logistics, healthcare, and retail stand to benefit from robots that are more capable and more adaptable to diverse, complex environments.

Why does it matter?

Enhanced Environmental Understanding: Robots can build AI-powered representations of their surroundings, allowing them to predict how objects and environments will respond to their actions.
Adaptability: World modeling enables robots to navigate better and adapt to diverse, dynamic environments, making them suitable for complex, real-world applications.
Human-Like Precision: By “understanding” their environments, robots achieve more precise, natural movements, bringing them closer to human-like interactions.
Broad Industry Impact: This advancement holds transformative potential across logistics, healthcare, retail, and more, as robots can handle a wider range of tasks more accurately.
Scalable Automation: World modeling supports more intelligent, efficient automation, paving the way for robots that not only perform tasks but also learn and adjust in real time.
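
To make the idea of "predicting how the environment will respond" a bit more concrete, here is a minimal sketch, assuming a deliberately tiny, hypothetical world: a robot on a six-position track. The names (`real_step`, `learn_world_model`, `plan`) are made up for illustration and do not correspond to NVIDIA's or anyone else's actual system. The robot first records what each action does, then "imagines" action sequences inside that recorded model before committing to one.

```python
from itertools import product

# Hypothetical toy world: a robot on a 1-D track with positions 0..5.
ACTIONS = {"left": -1, "stay": 0, "right": 1}
TRACK_LENGTH = 6


def real_step(position, action):
    """The real environment: move one cell, but never leave the track."""
    return min(max(position + ACTIONS[action], 0), TRACK_LENGTH - 1)


def learn_world_model():
    """Record what each action did in each position (the toy 'world model')."""
    return {
        (position, action): real_step(position, action)
        for position in range(TRACK_LENGTH)
        for action in ACTIONS
    }


def plan(model, start, goal, horizon=5):
    """Return the shortest action sequence the model predicts will reach the goal."""
    for length in range(horizon + 1):
        for candidate in product(ACTIONS, repeat=length):
            position = start
            for action in candidate:
                position = model[(position, action)]  # imagined, not executed
            if position == goal:
                return list(candidate)
    return None  # no plan found within the horizon


model = learn_world_model()
actions = plan(model, start=1, goal=4)
print(actions)
```

Real world models are learned neural networks over camera and sensor data rather than a lookup table, but the loop is the same in spirit: predict the outcome first, act second.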

Another aspect is dexterity.

Why has dexterity become the “holy grail” of general-purpose robotics?

We humans take our dexterity for granted, yet, at this stage, it is among the hardest challenges in robotics. If solved, this problem can create the next trillion-dollar industry, as it would open up the space to general-purpose robotics.

Indeed, robot dexterity is challenging because it requires robots to handle diverse, delicate objects in unpredictable environments—something we humans do instinctively.

Achieving this demands sophisticated sensors, machine learning, and real-time adaptability to avoid damaging items or failing tasks. Unlike repetitive, controlled tasks, dexterity involves adjusting to unique shapes, textures, and weights in dynamic settings.

This complexity has made robot dexterity a “holy grail” in robotics, as it’s essential for automating tasks like sorting, packing, or even assisting in healthcare, where human-like precision and adaptability are critical.

Solving it could unlock new levels of automation across industries, reshaping labor and efficiency.

That’s why a company like Physical Intelligence got $400 million in funding led by Jeff Bezos to try to revolutionize robotics by enabling robots to handle objects with human-like precision.

Its breakthrough pi-zero software empowers robots to adapt and perform complex tasks autonomously, promising transformative impacts across logistics, healthcare, and beyond, while also raising questions about employment.

This shows impressive momentum in the field as:

Investment for Precision Robotics: Backed by Jeff Bezos and others, Physical Intelligence secured $400 million to advance robotic dexterity, aiming to give robots a human-like touch. This breakthrough could reshape logistics, retail, and other sectors by enabling robots to handle diverse objects.
Pi-zero Software: The startup’s new control software, pi-zero, uses machine learning to enable robots to perform complex tasks like folding laundry, bagging groceries, and even removing toast from a toaster. It allows robots to adjust in real time, enhancing their adaptability in unpredictable environments.
Broader Industry Impact: This innovation addresses key automation challenges as businesses seek solutions amid labor shortages, especially in warehousing and retail. The technology also holds potential for agriculture, healthcare, and hospitality, where robots could handle labor-intensive or support tasks, potentially reducing manual roles.
Industry Momentum in AI Robotics: Amazon, Walmart, and SoftBank are deploying intelligent robots to handle tasks in fulfillment, inventory, and customer service. These robots perform repetitive or labor-intensive duties, allowing human employees to focus on higher-level tasks.

Spatial intelligence is the next frontier

Spatial intelligence, through world modeling, is making impressive leaps.

Boston Dynamics’ latest Atlas robot showcases autonomous power in this video. It moves car parts with adaptive sensors and no teleoperation. Atlas performs real-time adjustments, targeting automotive factory work.

Boston Dynamics’ Atlas robot is impressive because it demonstrates true autonomy in complex tasks—picking and moving automotive parts without human guidance.

It adapts to environmental changes, like shifts in object positions or action failures, using advanced sensors and real-time adjustments.

This level of independence, especially in dynamic factory settings, sets a high bar for robotics, as most competitors still rely on pre-programmed or remote-controlled actions.

Atlas’s efficient, powerful movements save time, showcasing its potential to transform industrial automation with speed and adaptability.

But of course, a reminder that this is only a demo!

As I've shown you so far, general-purpose robotics will see incredible development in the coming decade.

However, it is a key reminder: there are still many limitations, and we don't yet know what stage of development these world models have reached!

An interesting study from MIT and Harvard really “stress-tested” LLMs on world modeling.

The researchers found that large language models (LLMs) lack a coherent understanding of the world and perform well only within set parameters.

Using new metrics, they found that AI models can navigate tasks but fail when conditions change, underscoring the need for adaptable, rule-based world comprehension models.

As per the study:

Research Findings: MIT and Harvard researchers found that large language models (LLMs) can perform tasks like giving driving directions with high accuracy yet lack a true understanding of the underlying world structure. Model performance dropped significantly when faced with changes, such as street closures.
New Metrics for World Models: The team developed two metrics—sequence distinction and sequence compression—to test whether AI models have coherent “world models.” These metrics helped evaluate how well models understand differences and similarities between states in a structured environment.
Testing Real-World Scenarios: By applying these metrics, researchers discovered that even high-performing AI models generated flawed internal maps with imaginary streets and incorrect orientations when navigating New York City.
Implications: The study suggests that current AI models may perform well in specific contexts but fail if the environment changes. For real-world AI applications, models need a more robust, rule-based understanding.
Future Directions: Researchers aim to test these metrics on more diverse problems, including partially known rule sets, to build AI with accurate, adaptable world models, which could be valuable for scientific and real-world tasks.
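
The street-closure failure mode can be illustrated with a small sketch. This is not the study's method or its metrics, just a toy comparison on a hypothetical four-intersection map: a memorized route works until a street closes, while a planner that holds an explicit map simply reroutes. All names here (`STREETS`, `route_is_valid`, `shortest_route`) are assumptions for the example.

```python
from collections import deque

# Hypothetical street map with four intersections and four two-way streets.
STREETS = {("A", "B"), ("B", "D"), ("A", "C"), ("C", "D")}


def neighbors(node, streets):
    """Intersections reachable from `node` over open two-way streets."""
    reachable = []
    for u, v in sorted(streets):
        if u == node:
            reachable.append(v)
        elif v == node:
            reachable.append(u)
    return reachable


def route_is_valid(route, streets):
    """A route is valid only if every hop uses a street that is still open."""
    return all(
        (u, v) in streets or (v, u) in streets
        for u, v in zip(route, route[1:])
    )


def shortest_route(start, goal, streets):
    """Breadth-first search over the explicit map."""
    queue, seen = deque([[start]]), {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in neighbors(path[-1], streets):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None


memorized_route = ["A", "B", "D"]      # learned on the original map
closed = STREETS - {("B", "D")}        # one street closes

print(route_is_valid(memorized_route, STREETS))  # memorized route on the old map
print(route_is_valid(memorized_route, closed))   # same route after the closure
print(shortest_route("A", "D", closed))          # planner with an explicit map
```

The memorized route looks perfectly competent until the map changes; only the planner that actually represents the streets can adapt. That is the gap the researchers are pointing at.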

So let's keep that in mind...

And to recap:

AI Robotics Integration: Combines AI with robotics for tasks like object recognition, navigation, and decision-making across industries such as logistics and manufacturing.
General-Purpose Robotics: Focused on robots handling diverse tasks with adaptability across various industries.
World Modeling: Enables robots to create AI-powered environmental maps for better prediction and navigation in dynamic settings.
NVIDIA Advances: Developing AI-powered world models to enhance robotic awareness and precision.
Boston Dynamics Atlas: Showcased autonomous factory work with real-time adaptability in moving car parts.
Dexterity in Robotics: A key challenge requiring robots to handle diverse objects in unpredictable environments.
Physical Intelligence (pi-zero): $400M funded software enabling human-like robotic dexterity for tasks like packing and healthcare assistance.
MIT & Harvard Research: Found AI struggles with dynamic real-world changes, highlighting gaps in robust world comprehension.
Spatial Intelligence: Enhances robotic capabilities in tasks requiring precise environmental awareness and adaptation.
Broader Industry Impact: Applications in logistics, healthcare, retail, and agriculture, addressing labor shortages and improving efficiency.

In the meantime, as 2024 ends, we see an impressive explosion of "humanoids."

A humanoid robot is designed to resemble the human body in shape and function, typically featuring a torso, head, arms, and legs.

These robots are created to mimic human motion and interaction, allowing them to perform tasks that require a human-like form and motion, such as walking, talking, and interacting with environments.

As of 2024, the sector is already boasting a broad set of companies working on the problem!

This is where we are right now, with a list of the top players in the field of humanoid robots:

HD Atlas (Boston Dynamics)
NEO (1X, Norway)
GR-1 (Fourier, Singapore)
Figure 01 (USA)
Phoenix (Sanctuary AI, Canada)
Apollo (Apptronik, USA)
Digit (Agility, USA)
Atlas (Boston Dynamics, USA)
H1 (Unitree, China)
Optimus Gen 2 (Tesla, USA)

More precisely:

Atlas by Boston Dynamics: A highly dynamic, fully electric humanoid robot designed for real-world applications. Atlas features an advanced control system and state-of-the-art hardware, enabling it to perform complex movements and tasks with agility and precision.
Salvius: An open-source humanoid robot project focused on creating a versatile platform for research and development. Salvius is built with a versioned engineering specification to ensure each component meets a minimum standard of functionality before integration.
Digit by Agility Robotics: A bipedal humanoid robot with a unique leg design for dynamic movement. Digit has nimble limbs and a torso packed with sensors and computers, enabling it to navigate complex environments and perform tasks in warehouses and other settings.
Figure 02 by Figure AI: The second-generation humanoid robot developed by Figure AI, designed to set new standards in AI and robotics. Figure 02 combines human-like dexterity with cutting-edge AI to support various industries, including manufacturing, logistics, warehousing, and retail.
HRP-4: A humanoid robot developed as a successor to HRP-2 and HRP-3, focusing on a lighter and more capable design. HRP-4 aims to improve manipulation and navigation in human environments, making it suitable for various research and practical applications.
Optimus by Tesla: A humanoid robot designed by Tesla to perform unsafe, repetitive, or boring tasks for humans. With Optimus, Tesla intends to leverage its expertise in AI and robotics to create a versatile and capable robotic assistant.
H1 by Unitree Robotics: Unitree's first universal humanoid robot, H1, is a full-size bipedal robot capable of running. H1 represents a significant step forward in humanoid robotics, aiming to integrate into various applications with its advanced mobility and adaptability.
Roboy: An advanced humanoid robot developed at the Artificial Intelligence Laboratory of the University of Zurich. Roboy is designed to emulate human movements and interactions, with applications in research and development of soft robotics and human-robot interaction.
RH5: A series-parallel hybrid humanoid robot designed for high dynamic performance. RH5 can perform heavy-duty dynamic tasks with significant payloads, utilizing advanced control systems and trajectory optimization techniques.
NimbRo-OP2X: An affordable, adult-sized, 3D-printed open-source humanoid robot developed for research purposes. NimbRo-OP2X aims to lower the entry barrier for humanoid robot research, providing a flexible platform for various applications and studies.

Thus, the development of humanoid robots is progressing rapidly, with significant investments and technological advancements.

In their initial use cases, these robots have the potential to transform industries by automating tasks, addressing labor shortages, and increasing efficiency!

What are the two critical commercial use cases humanoids will help address?

Humanoids might help address two major demographic shifts that can’t be managed through simple, gradual progress but only through a few technological breakthroughs.

Of course, as we speak, there are advances in biology that might extend not just our lifespan but also our healthspan (that’s what longevity research is trying to tackle). In reality, though, we don’t have much time left (probably a couple of generations) to tackle the massive demographic shifts underway.

Labor shortages

“Job displacement” makes a lot of noise, but the reality is different. Outside a few regions with a large pool of relatively specialized manufacturing workers, most parts of the world don’t have that workforce available.

We’re at a critical turning point, even in a demographically solid country like the US, where there is a wide gap between the workforce entering the labor market and the massive population retiring (known as the "Silver Tsunami"):

On top of that, for specific jobs, especially in some manufacturing niches, there is a complete lack of available workers.

That is why we’re seeing humanoids first reach commercial availability in this vertical.

And the thing is, we’re about to go into a major demographic collapse.

Addressing a major demographic collapse!

While humanoids do seem scary, there is a key point to understand in this current timeline for humanity:

This is a major issue affecting the whole world, where a combination of industrialization and urbanization has brought prosperity to billions of people, enabling all of us to live much longer lives.

Yet it has also created an irreversible demographic effect: without a new model of industrialization, countries won’t be able to meet even the bare minimum of the manufacturing needs of an entire aging population.

Not by chance, the countries facing this threat are the ones moving fastest. Take the case of South Korea, among the most critically affected from a demographic perspective:

This is a classic inverted demographic pyramid, which is as bad as it gets. What it means, briefly, is that the new generations coming not only cannot replace the old ones retiring, but they’re not even close!
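
A rough back-of-the-envelope sketch shows how quickly "not even close" compounds. The inputs are assumptions that do not come from this article: a replacement fertility rate of about 2.1 children per woman (the commonly cited figure) and South Korea's total fertility rate of about 0.72 in 2023. The function name `cohort_ratio` is made up, and the calculation ignores migration and changes in mortality.

```python
# Commonly cited fertility rate needed for a stable population (assumption).
REPLACEMENT_FERTILITY = 2.1


def cohort_ratio(total_fertility_rate, generations=1):
    """Size of a future generation relative to today's, ignoring migration and mortality."""
    return (total_fertility_rate / REPLACEMENT_FERTILITY) ** generations


# Assumed input: South Korea's total fertility rate, about 0.72 in 2023.
one_generation = cohort_ratio(0.72)
two_generations = cohort_ratio(0.72, generations=2)

print(f"{one_generation:.0%} of today's cohort after one generation")
print(f"{two_generations:.0%} of today's cohort after two generations")
```

Under these simplified assumptions, each new generation is about a third the size of the previous one, and about 12% after two generations. That is the arithmetic behind the inverted pyramid.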

South Korea, together with Japan and Italy, is among the most extreme cases of demographic collapse.

Yet the point is that the whole urbanized, industrialized world is going through this.

That is why a country like South Korea has gone as far as filling 10% of its workforce with robots.

Of course, it’s not all good here.

Just like any tech, humanoids might be used for military purposes, and we should guard against it!

Recap: In This Issue

Moravec’s Paradox: Abstract reasoning is easier for AI to master (e.g., ChatGPT) than perception and motor control, which are deeply ingrained from evolution.
Humanoids’ Design Rationale: The human form is practical because environments are built for humans.
Uncanny Valley: Near-human appearances in robots can trigger discomfort and reduce trust.
AI Robotics: Combines AI with robotics to enhance capabilities like object recognition, navigation, and decision-making.
World Modeling: AI-powered environmental maps help robots predict and adapt to real-world scenarios.
NVIDIA Advances: AI models improve robotic precision and awareness for dynamic settings.
Dexterity Challenge: Human-like dexterity is critical but difficult for robots, requiring adaptability to diverse objects and environments.
Physical Intelligence: $400M invested in pi-zero software to enable robots to handle tasks like folding laundry and bagging groceries with precision.
Boston Dynamics’ Atlas: Demonstrated autonomy in factory tasks using real-time adaptability.
MIT & Harvard Research: Found AI models struggle with dynamic real-world changes, lacking robust "world understanding."
Top Humanoid Robots (2024): Notable players include Atlas (Boston Dynamics), Digit (Agility Robotics), and Optimus (Tesla).
Demographic Collapse: Countries face labor shortages as aging populations outpace workforce replacements (e.g., South Korea filling 10% of its workforce with robots).
Commercial Use Cases: Humanoids address labor shortages and support industries like logistics, healthcare, and manufacturing.
Ethical Concerns: Potential misuse of humanoids for military purposes requires vigilance.

Ciao!

With ♥️ Gennaro, FourWeekMBA

The Business Engineer
