Understanding Embodied Intelligence: Key Components and Technologies

Currently, there are two major trends in the tech industry: one is the wave of large models sparked by ChatGPT, and the other is the wave of humanoid robots, or more broadly, embodied intelligence. After seeing the investments companies have poured into humanoid robots and the demonstrations they staged at last week's World Robot Conference, I can only say that the era of robots is approaching!

What is embodied intelligence? What are its key components?

Embodied intelligence is the ability to understand the world, interact with it, and accomplish tasks through learning and evolution in both physical and digital realms. It is generally considered to consist of a 'body' and an 'agent' that together perform tasks in complex environments.

The ultimate goal is for the agent to adapt to new environments, learn new knowledge, and solve real-world problems through interaction with the physical world (virtual or real).

  • Body: The robot body that perceives and executes tasks in physical or virtual environments.

  • Agent: The intelligent core embodied on top of the body, responsible for perception, understanding, decision-making, and control.

  • Data: The raw material used to train the agent and to make its skills generalize to new tasks and environments.

What is the cornerstone of the technology stack for embodied intelligence?

From the concept itself, the hope is that an embodied agent can help people solve real problems and thereby free up our productivity.

Returning to today's conventional model, how does a robot help solve problems? Most commonly, the requirements are defined first, and engineers then customize a solution for the specific scenario through programming or teaching; the robot itself cannot think its way to solutions beyond its code.

The embodied-intelligence model differs: the embodied agent typically carries sensors for vision and speech, and by combining visual signals with voice information it can decompose tasks, understand its environment, and then work out on its own how to accomplish the goal.

The difference between the two models is that one involves humans teaching machines to work, while the other involves robots learning to work by mimicking humans. You will find that embodied intelligence is somewhat like a combination of deep learning and traditional robotics.

  • Large models can help robots understand and digest knowledge, forming the robot’s agent;

  • The robot body continues to leverage traditional robotic knowledge to solve actual physical tasks.
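To make this division of labor concrete, here is a minimal, purely illustrative sketch of the perceive-understand-plan-act loop; every name in it (`body.sense`, `agent.decompose`, and so on) is hypothetical, not a real API:

```python
# Purely illustrative pseudocode of the agent/body split described above.
# All objects and method names are hypothetical, not a real library API.
def embodied_agent_loop(body, agent, task: str):
    """Perceive -> understand -> plan -> act, until the task is done."""
    subtasks = agent.decompose(task)              # large model breaks the task down
    for subtask in subtasks:
        while not agent.is_done(subtask):
            obs = body.sense()                    # camera frames, audio, joint states
            scene = agent.understand(obs)         # ground language in the scene
            action = agent.plan(subtask, scene)   # choose the next motion
            body.execute(action)                  # classical control carries it out
```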

What are the cutting-edge research areas in embodied intelligence?

Robot Bodies

| Robot Type | Main Application Areas | Technical Details | Representative Robots |
| --- | --- | --- | --- |
| Fixed-base Robots | Laboratory automation, education and training, industrial manufacturing | High-precision sensors and actuators, flexible programming, micron-level precision | Franka Emika Panda, KUKA iiwa, Sawyer |
| Wheeled Robots | Logistics, warehousing, security inspection | Simple structure, low cost, high efficiency, fast movement | Kiva robot, Jackal robot |
| Tracked Robots | Agriculture, construction, disaster recovery, military applications | Strong off-road capability, maneuverability, stability, and traction | PackBot |
| Quadrupedal Robots | Complex-terrain exploration, rescue missions, military applications | Multi-joint design, strong adaptability, strong environmental perception | Unitree A1, Go1, Boston Dynamics Spot, ANYmal C |
| Humanoid Robots | Service industry, healthcare, collaborative environments | Human-like form, multi-degree-of-freedom hands, able to perform complex tasks | Atlas, HRP series, ASIMO, Pepper |
| Bionic Robots | Healthcare, environmental monitoring, biological research | Mimic the movements and functions of natural organisms; flexible materials and structures | Fish robots, insect robots, soft robots |

Data Source — Simulators

Simulators play a crucial role in embodied intelligence by providing virtual environments that help researchers conduct cost-effective, safe, and highly scalable experiments and tests.

General Simulators

General simulators provide a virtual environment that closely resembles the physical world, used for algorithm development and model training, offering significant cost, time, and safety advantages.

Specific simulator case studies:

  • Isaac Sim: An advanced platform for robot and AI research simulation, featuring high-fidelity physics simulation, real-time ray tracing, and a rich library of robot models, applicable to scenarios including autonomous driving, industrial automation, and human-robot interaction.

  • Gazebo: An open-source robot research simulator that supports various sensor simulations and multi-robot system simulations, mainly used for robot navigation and control.

  • PyBullet: A Python interface to the Bullet physics engine, easy to use, supporting real-time physics simulation, mainly used for reinforcement learning and robot simulation.
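As a small taste of the PyBullet workflow just described, here is a minimal sketch (assuming `pybullet` is installed via pip) that loads a ground plane and the bundled R2D2 model, then steps the physics:

```python
# Minimal PyBullet sketch: headless physics simulation of a sample robot.
import pybullet as p
import pybullet_data

p.connect(p.DIRECT)                      # headless; use p.GUI for a window
p.setAdditionalSearchPath(pybullet_data.getDataPath())
p.setGravity(0, 0, -9.81)
p.loadURDF("plane.urdf")
robot = p.loadURDF("r2d2.urdf", basePosition=[0, 0, 0.5])

for _ in range(240):                     # simulate ~1 s at the default 240 Hz
    p.stepSimulation()

pos, orn = p.getBasePositionAndOrientation(robot)
print("robot base position:", pos)
p.disconnect()
```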

Real-World Scene-Based Simulators

These simulators create highly realistic 3D scenes by collecting real-world data, making them the preferred choice for research on embodied intelligence in household activities.

Specific simulator case studies:

  • AI2-THOR: An indoor embodied scene simulator based on Unity3D, containing rich interactive scene objects and physical properties, suitable for multi-agent simulation and complex task research.

  • Matterport3D: A large-scale RGB-D dataset of real indoor scenes, widely used as the environment source for embodied-navigation benchmarks.

  • Habitat: An open-source, large-scale simulator based on the Bullet physics engine, providing high-performance, parallel 3D simulation and rich interfaces for human-robot interaction research, well suited to reinforcement learning in embodied intelligence.

Agents

| Research Area | Main Goals | Specific Methods |
| --- | --- | --- |
| Embodied Perception | Visual simultaneous localization and mapping (vSLAM) | Traditional vSLAM (MonoSLAM, PTAM, ORB-SLAM); semantic vSLAM (SLAM++, DynaSLAM) |
| Embodied Perception | 3D scene understanding | Projection-based methods (MV3D); voxel-based methods (VoxNet); point-cloud methods (PointNet) |
| Embodied Perception | Active visual perception | Interactive environment exploration (Pinto et al.); exploration driven by viewpoint changes (Jayaraman et al.) |
| Embodied Perception | Tactile perception | Non-visual tactile sensors (BioTac); vision-based tactile sensors (GelSight) |
| Embodied Interaction | 3D visual grounding | Two-stage methods (ReferIt3D, TGNN); single-stage methods (3D-SPS, BUTD-DETR) |
| Embodied Interaction | Vision-and-language navigation (VLN) | Memory- and understanding-based methods (LVERG); future-prediction-based methods (LookBY) |
| Embodied Interaction | Embodied dialogue systems | Large-model-based dialogue systems (DialFRED); multi-agent collaboration (DiscussNav) |
| Embodied Agents | Multimodal foundation models | Multimodal data fusion and representation (VisualBERT); representative models and applications (UNITER) |
| Embodied Agents | Embodied task planning | Task decomposition and execution (HAPI); planning and execution of complex tasks (TAMP) |
| Sim-to-Real Adaptation | Embodied world models | Learning and simulating world models (Dreamer); real-world application cases (PlaNet) |
| Sim-to-Real Adaptation | Data collection and training | Creation and optimization of datasets (Gibson) |
| Sim-to-Real Adaptation | Embodied control | Control algorithms and policies (PPO); instances and applications (DRL) |
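To anchor at least one row of this table in code, here is a minimal sketch of PPO's clipped surrogate loss (the "Embodied control" row) in PyTorch; the tensor inputs and the clip coefficient are illustrative:

```python
import torch

def ppo_clip_loss(logp_new, logp_old, advantages, clip_eps=0.2):
    """PPO clipped surrogate loss over a batch of (state, action) samples."""
    ratio = torch.exp(logp_new - logp_old)                        # pi_new / pi_old
    unclipped = ratio * advantages
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # PPO maximizes the minimum of the two terms; negate to get a loss to minimize.
    return -torch.min(unclipped, clipped).mean()
```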

Basic Knowledge for Embodied Intelligence Development

Brief Introduction

Whether you are working on the body or on the agent's learning, you will find many subdivisions, but some of the foundations are shared. Below I introduce this general foundational knowledge:

  • Programming Languages and Data Structures

    • C++: for efficient embedded execution and inference-engine development; follow-up articles will be published at GuYue Academy;

    • Python: for rapid functional verification;

    • MATLAB: for quick validation of theoretical algorithms;

    • Basic data structures and algorithms;

  • ROS: A universal robot middleware for quickly deploying basic robot functionality; many LLMs can now generate typical ROS examples (a minimal publisher sketch follows after this list).

  • Deep Learning

    • Deep learning fundamentals: basic convolutional architectures such as AlexNet and ResNet; sequence models such as RNNs and LSTMs; and the self-attention-based Transformer architecture;

    • Deep learning frameworks: PyTorch;

    • (Advanced) Robot deep-learning architectures: RT-1, RT-2, AutoRT / SARA-RT / RT-Trajectory, RT-H;

  • Embedded Development

    • Common chip development, such as ST, ESP, GD, Infineon series, etc.;

    • Ability to read schematics and PCB layouts;

    • Development of general Linux kernel drivers.
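As promised in the ROS item above, here is a minimal ROS 1 (rospy) publisher sketch; the `/cmd_vel` topic and `Twist` message are the common convention for driving a mobile base, but your robot's interface may differ:

```python
#!/usr/bin/env python
# Minimal ROS 1 (rospy) publisher: drives a robot base by publishing
# velocity commands on /cmd_vel at a fixed rate.
import rospy
from geometry_msgs.msg import Twist

def main():
    rospy.init_node("simple_driver")
    pub = rospy.Publisher("/cmd_vel", Twist, queue_size=10)
    rate = rospy.Rate(10)   # publish at 10 Hz
    cmd = Twist()
    cmd.linear.x = 0.2      # move forward at 0.2 m/s
    while not rospy.is_shutdown():
        pub.publish(cmd)
        rate.sleep()

if __name__ == "__main__":
    main()
```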

Introduction to Humanoid Robot Bodies

The figure below is a schematic of the joints and structure of the Qinglong full-size general-purpose humanoid robot.

[Figure: joint layout and structure of the Qinglong full-size general-purpose humanoid robot]

Core Joints of the Robot

A robot's core joint modules mainly comprise linear joints, rotational joints, joint sensors, and the joint drive system.

The complexity of humanoid robots largely stems from their high degree-of-freedom count: each degree of freedom requires a joint in the body, which in turn pulls in a complex supply chain.

Looking back at the World Robot Conference, the component manufacturers included many different types:

[Figure: component suppliers exhibiting at the World Robot Conference]

What are the uses of these components? We need to go back to the linear joints, rotational joints, joint sensors, and joint drive systems.

  • Linear joints combine a motor with a screw, giving the robot linear motion;

  • Rotational joints combine a motor with a reducer, giving the robot rotational motion. A quick sizing sketch for both follows below.
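Here is a back-of-the-envelope sketch of what the reducer and the screw each buy you; all numbers are made-up example values, not specs of any real joint:

```python
# Rotational joint: the reducer multiplies torque and divides speed.
motor_torque = 0.5      # N*m at the motor shaft (example value)
motor_speed = 3000.0    # motor speed in rpm (example value)
gear_ratio = 100.0      # e.g. a harmonic-drive reduction
efficiency = 0.8        # assumed losses in the reduction stage

joint_torque = motor_torque * gear_ratio * efficiency   # 40 N*m at the joint
joint_speed = motor_speed / gear_ratio                  # 30 rpm at the joint

# Linear joint: the screw's lead converts revolutions into travel.
screw_lead = 0.005                                      # m of travel per revolution
linear_speed = motor_speed / 60.0 * screw_lead          # 0.25 m/s along the screw

print(joint_torque, joint_speed, linear_speed)
```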

Motors

A motor is a device that converts electrical energy into rotational kinetic energy. It typically consists of a stator and a rotor; the stator is the fixed part, while the rotor is the rotating part. When power is applied, current flows through the windings and generates a magnetic field; the interaction between the stator and rotor fields produces a torque that makes the rotor spin.
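This torque-producing behavior can be captured by the textbook first-order DC motor model. The sketch below simulates the speed response to a voltage step, neglecting winding inductance; all constants are made-up illustrative values:

```python
# Rough, illustrative simulation of a brushed DC motor's speed response
# to a voltage step (inductance neglected; constants are example values).
R, KT, KE = 1.0, 0.05, 0.05   # winding resistance [ohm], torque & back-EMF constants
J, B = 1e-4, 1e-5             # rotor inertia [kg*m^2], viscous friction [N*m*s]
V, dt = 12.0, 1e-4            # supply voltage [V], integration step [s]

omega = 0.0                   # rotor speed [rad/s]
for _ in range(int(0.5 / dt)):           # simulate 0.5 s with Euler steps
    i = (V - KE * omega) / R             # winding current (back-EMF opposes V)
    torque = KT * i                      # electromagnetic torque on the rotor
    omega += (torque - B * omega) / J * dt
print(f"steady-state speed ~ {omega:.0f} rad/s")
```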

Reducers

A reducer sits between the motor and the joint output and trades speed for torque: the output turns more slowly than the motor but with proportionally higher torque. Harmonic drives and planetary gearboxes are the types most commonly used in humanoid joints.

Screws

A screw assembly (typically a ball screw or planetary roller screw) converts the motor's rotary motion into linear motion: each revolution advances the nut by the screw's lead, so the lead and the motor speed together determine the linear velocity and force.
