As a developer working with robotic agents, I’ve seen how brittle long-horizon tasks can be. The gap between high-level instructions and low-level motor control often breaks the chain of execution. BAAI’s new approach aims to close that loop by treating the robot’s “brain” as a distinct, modular system rather than a monolithic model.
On March 29, at the “Future AI Pioneer Forum” during the 2025 Zhongguancun Forum, the Beijing Academy of Artificial Intelligence (BAAI) unveiled RoboOS, its first cross-embodiment cerebellum-cerebrum collaboration framework, and RoboBrain, an open-source embodied brain. These tools are designed to enable lightweight, rapid deployment across scenarios for multi-tasking and cross-embodiment collaboration, propelling single-machine intelligence toward swarm intelligence. They provide foundational technical support to accelerate scenario-based applications in building an open-source unified ecosystem for embodied AI.

△ Demo of cross-embodiment collaborative delivery tasks among multiple robots based on RoboOS and RoboBrain
Video Link:
https://mp.weixin.qq.com/s/APgi5k53hrJo8lpxcAkE-g
Enhancing Long-Horizon Task Capabilities: Building a Perception-Cognition-Decision-Action Closed Loop

In embodied AI scenarios, long-horizon manipulation tasks are a core capability for robots executing complex operations. The embodied brain, RoboBrain, integrates three capabilities: robot task planning, affordance perception, and trajectory prediction. By mapping abstract instructions into concrete action sequences, it enhances the robot’s ability to handle long-horizon tasks.
RoboBrain consists of three modules: a foundation model for task planning, an A-LoRA module for affordance perception, and a T-LoRA module for trajectory prediction. During inference, the model first perceives visual inputs and decomposes instruction commands into a series of executable sub-tasks, followed by performing affordance perception and trajectory prediction. RoboBrain employs a multi-stage training strategy to equip it with long-term historical frame memory and high-resolution image perception capabilities, thereby improving scene understanding and operational planning.
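The inference order described above (plan first, then affordance and trajectory for each sub-task) can be sketched roughly as follows. This is only an illustration of the control flow: `run_pipeline`, `StepResult`, and the three stub functions are hypothetical names, and nothing here reflects RoboBrain's actual API or model calls.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # x1, y1, x2, y2 in pixels


@dataclass
class StepResult:
    subtask: str
    affordance_box: Box
    trajectory: List[Point]


def run_pipeline(
    instruction: str,
    image: object,
    plan: Callable[[str, object], List[str]],
    perceive_affordance: Callable[[str, object], Box],
    predict_trajectory: Callable[[str, object], List[Point]],
) -> List[StepResult]:
    """Decompose the instruction, then run both adapters on every sub-task."""
    results = []
    for subtask in plan(instruction, image):
        box = perceive_affordance(subtask, image)   # stands in for A-LoRA
        path = predict_trajectory(subtask, image)   # stands in for T-LoRA
        results.append(StepResult(subtask, box, path))
    return results


# Hypothetical stubs standing in for the real models.
def fake_plan(instruction: str, image: object) -> List[str]:
    return [part.strip() for part in instruction.split(" then ")]


def fake_affordance(subtask: str, image: object) -> Box:
    return (0.0, 0.0, 10.0, 10.0)


def fake_trajectory(subtask: str, image: object) -> List[Point]:
    return [(0.0, 0.0), (5.0, 5.0)]


steps = run_pipeline(
    "pick up the cup then place it on the shelf",
    None,
    fake_plan,
    fake_affordance,
    fake_trajectory,
)
for step in steps:
    print(step.subtask, step.affordance_box, step.trajectory)
```

Passing the three stages in as callables keeps the sketch independent of any particular model; in a real system each would wrap a call to the corresponding module.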
RoboBrain demonstrates excellent performance in evaluations for task planning, affordance perception, and trajectory prediction.
In terms of task planning, RoboBrain outperforms six leading closed-source/open-source Multimodal Large Language Models (MLLMs), including GPT-4V and Claude 3, across multiple dimensions on robot planning benchmarks such as OpenEQA, ShareRobot (self-built), and RoboVQA, without compromising general capabilities.

I think modular brains like this could simplify agent integration in heterogeneous robot fleets. I’m watching how well the A-LoRA module handles real-world visual noise. As a builder, I see open-sourcing the brain as a big step for community-driven embodied AI improvements.
△ RoboBrain Performance on Embodied Planning Benchmarks
I looked at how RoboBrain handles the messy reality of physical tasks, specifically affordance perception. On the AGD20K test set, it achieved an average accuracy that surpassed Qwen2-VL, the then state-of-the-art open-source model. This validates its superior capabilities in instruction understanding and object attribute recognition.

△ RoboBrain Performance on Affordance Perception Benchmarks

△ RoboBrain Performance on Trajectory Prediction Benchmarks
In trajectory prediction, the operational trajectories predicted by RoboBrain exhibit high similarity to real-world trajectories. This demonstrates high precision and stability. Future iterations of RoboBrain will continue to enhance its trajectory prediction capabilities.
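The article does not say which similarity measure was used. As an assumption, two common choices for comparing a predicted 2D trajectory against a reference are the discrete Fréchet distance and the root-mean-square error. A minimal sketch of both, with made-up waypoints:

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def discrete_frechet(p: List[Point], q: List[Point]) -> float:
    """Discrete Fréchet distance via the standard dynamic-programming table."""
    n, m = len(p), len(q)
    ca = [[0.0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            d = math.dist(p[i], q[j])
            if i == 0 and j == 0:
                ca[i][j] = d
            elif i == 0:
                ca[i][j] = max(ca[0][j - 1], d)
            elif j == 0:
                ca[i][j] = max(ca[i - 1][0], d)
            else:
                best_prev = min(ca[i - 1][j], ca[i - 1][j - 1], ca[i][j - 1])
                ca[i][j] = max(best_prev, d)
    return ca[n - 1][m - 1]


def rmse(p: List[Point], q: List[Point]) -> float:
    """Root-mean-square error between waypoints paired by index."""
    if len(p) != len(q):
        raise ValueError("trajectories must have the same number of waypoints")
    return math.sqrt(sum(math.dist(a, b) ** 2 for a, b in zip(p, q)) / len(p))


# Illustrative data: the prediction is the reference shifted down by one unit.
predicted = [(0.0, 0.0), (1.0, 1.0), (2.0, 0.0)]
reference = [(0.0, 1.0), (1.0, 2.0), (2.0, 1.0)]

print(discrete_frechet(predicted, reference))
print(rmse(predicted, reference))
```

Lower values mean the predicted path stays closer to the reference. The Fréchet distance tolerates different waypoint counts, while RMSE requires matched waypoints.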
In my view, high-fidelity trajectory prediction is critical for safe robot deployment in shared spaces. I think surpassing Qwen2-VL on AGD20K suggests strong potential for complex instruction following. As a builder, I expect the closed-loop architecture to reduce the friction between planning and execution.
Currently, RoboBrain can interpret human instructions and visual images to generate action plans and evaluations based on real-time image feedback. It predicts the trajectory for each step and perceives corresponding affordances. Specifically, RoboBrain effectively utilizes environmental information and the state of interactive objects—whether captured from first-person or third-person perspectives—to generate task plans tailored to different types of robotic manipulation tasks. Based on human instructions and visual data, it provides reasonable affordance regions and demonstrates strong generalization across various scenarios, generating trajectories that are both feasible and logical.

The embodied brain RoboBrain, the cerebellum skill library, and the cross-robot data hub are core components of the cross-embodiment framework, RoboOS. The embodied brain RoboBrain is responsible for global perception and decision-making, constructing mechanisms for dynamic spatiotemporal perception, planning guidance, and feedback error correction. The cerebellum skill library handles low-latency precise execution, enabling flexible and delicate operations. The cross-robot data hub facilitates the real-time sharing of spatial, temporal, and embodiment memories, providing informational support for decision planning and optimized collaborative operations, thus forming a closed loop of perception-cognition-decision-action.
One Brain, Multiple Robots: From Single-Agent to Swarm Intelligence
As a developer who has wrestled with the fragmentation of robotic stacks, I see immediate value in BAAI’s new approach. They are tackling the “swarm” problem not by building one super-robot, but by creating an operating system that lets different robots talk to each other seamlessly. This is the kind of infrastructure work that usually gets ignored until it breaks production.
BAAI has unveiled RoboOS, a cross-embodiment framework built on a “brain-cerebellum” hierarchical architecture. It’s designed to move us from single-machine intelligence to actual swarm intelligence through modular design and intelligent task management.
Under the hood, this architecture integrates the complex perception of “RoboBrain” with the high-efficiency execution of a cerebellum skill library. The goal is stable operation in long-cycle, highly dynamic tasks. It aims for true “plug-and-play” integration between brain models like LLMs/VLMs and cerebellum skills such as grasping or navigation.
Currently, RoboOS supports a diverse hardware lineup:
- Unitree’s dual-arm robots
- Realman’s single/dual-arm robots
- Zhiyuan humanoid robots
- Unitree G1 humanoid robots
In my view, cross-embodiment support is the only way to avoid vendor lock-in for large-scale deployments. I think the cerebellum separation should let us swap execution modules without retraining the whole brain. As a builder, I have seen real-world robotics fail when hardware changes; this abstraction layer looks like a necessary fix.
The framework shares a unified memory system, covering spatial, temporal, and embodiment memories, to achieve state synchronization and intelligent collaboration among multiple robots. This breaks through traditional “information silo” limitations, enabling cross-embodiment collaborative control.
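To make the three memory types concrete, here is a minimal in-process sketch of a store that several robot threads could read and write. The class name `SharedMemory`, its methods, and the sample values are all hypothetical; the article does not describe how RoboOS actually implements or distributes this memory.

```python
import threading
import time
from typing import Dict, List, Optional, Tuple


class SharedMemory:
    """Toy store for spatial, temporal, and embodiment memory (illustrative only)."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._spatial: Dict[str, Tuple[float, float]] = {}   # place/object -> (x, y)
        self._temporal: List[Tuple[float, str, str]] = []    # (timestamp, robot, event)
        self._embodiment: Dict[str, Dict[str, object]] = {}  # robot -> state fields

    def update_location(self, name: str, xy: Tuple[float, float]) -> None:
        with self._lock:
            self._spatial[name] = xy

    def locate(self, name: str) -> Optional[Tuple[float, float]]:
        with self._lock:
            return self._spatial.get(name)

    def log_event(self, robot_id: str, event: str) -> None:
        with self._lock:
            self._temporal.append((time.time(), robot_id, event))

    def events_for(self, robot_id: str) -> List[str]:
        with self._lock:
            return [event for _, robot, event in self._temporal if robot == robot_id]

    def set_state(self, robot_id: str, **state: object) -> None:
        with self._lock:
            self._embodiment.setdefault(robot_id, {}).update(state)

    def state_of(self, robot_id: str) -> Dict[str, object]:
        with self._lock:
            return dict(self._embodiment.get(robot_id, {}))


memory = SharedMemory()
memory.update_location("office desk", (4.0, 2.5))
memory.log_event("unitree_g1", "placed apple in basket")
memory.set_state("realman", holding="fruit basket", battery=0.8)

desk = memory.locate("office desk")
print(desk, memory.events_for("unitree_g1"), memory.state_of("realman"))
```

A single lock is the simplest way to keep readers and writers consistent here. A real multi-robot deployment would need a networked store instead of a Python object.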
RoboOS also dynamically manages multi-robot task queues, supporting priority preemption and resource optimization allocation. This ensures real-time response in complex scenarios, achieving high-concurrency task scheduling. Furthermore, it dynamically adjusts strategies based on execution feedback and environmental changes, continuously optimizing task planning to enhance robustness and achieve real-time closed-loop optimization.
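Priority preemption can be illustrated with a small heap-based queue: a more urgent task displaces the one currently running, which goes back into the queue. This is a generic sketch, not RoboOS's scheduler; `Task`, `PreemptiveQueue`, and the convention that a lower number means higher urgency are assumptions made for the example.

```python
import heapq
import itertools
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Task:
    name: str
    priority: int  # assumption: lower number = more urgent


class PreemptiveQueue:
    def __init__(self) -> None:
        self._heap: List[Tuple[int, int, Task]] = []
        self._counter = itertools.count()  # tie-breaker keeps FIFO order
        self.running: Optional[Task] = None

    def _push(self, task: Task) -> None:
        heapq.heappush(self._heap, (task.priority, next(self._counter), task))

    def submit(self, task: Task) -> Optional[Task]:
        """Queue a task; return the running task it preempted, if any."""
        if self.running is not None and task.priority < self.running.priority:
            preempted = self.running
            self._push(preempted)   # the displaced task waits its turn again
            self.running = task
            return preempted
        self._push(task)
        return None

    def next_task(self) -> Optional[Task]:
        """Treat the running task as finished and start the most urgent queued one."""
        self.running = heapq.heappop(self._heap)[2] if self._heap else None
        return self.running


queue = PreemptiveQueue()
queue.submit(Task("restock shelf", priority=5))
queue.next_task()                                   # "restock shelf" starts running
preempted = queue.submit(Task("deliver fruit knife", priority=1))
resumed = queue.next_task()                         # urgent task done, resume the old one
print(preempted.name, "->", resumed.name)
```

The monotonically increasing counter in each heap entry ensures that two tasks with equal priority are never compared directly and are served in submission order.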
To demonstrate this, BAAI ran a “deliver apples and fruit knives” scenario involving three distinct robots:
- A Realman single-arm robot (transport)
- A Unitree humanoid G1 (selecting fruits)
- A Unitree dual-arm robot (selecting fruit knives)
The workflow was intricate. The Realman robot called “navigation skills” to move to the dining table. The Unitree G1 used “visual grasping skills” to select specified objects. Then, the Realman robot called “grasping skills” to lift the fruit basket and navigate to the Unitree dual-arm robot at the table. Finally, the dual-arm robot retrieved a fruit knife and placed it in the center of the basket before the Realman robot navigated to an office desk location based on “spatial memory,” delivered the items, and returned to standby.
On receiving the instruction “pick the fruit closest to the cup and deliver a fruit knife”, RoboOS decomposed the task via the embodied brain RoboBrain and distributed the sub-tasks to the three cross-embodiment robots. The brain perceived the environment through “spatial memory,” identified locations, and broke down the task into: “Unitree G1 selects apple → Realman transports fruit basket → Unitree dual-arm robot grasps fruit knife → Realman returns.”
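The decomposition above can be mimicked in a few lines: resolve “closest to the cup” against stored coordinates, build the ordered sub-task chain, and group it per robot. The coordinates, skill names, and function names below are invented for illustration, and the real system derives the plan from a model rather than from a hard-coded template.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Subtask = Tuple[str, str, str]  # (robot, skill, target)

# Hypothetical spatial memory: object name -> (x, y) on the table, in meters.
spatial_memory = {
    "cup": (1.0, 1.0),
    "apple": (1.2, 1.1),
    "banana": (2.5, 0.4),
    "orange": (0.2, 2.0),
}
fruits = ["apple", "banana", "orange"]


def closest_to(anchor: str, candidates: List[str],
               memory: Dict[str, Tuple[float, float]]) -> str:
    """Return the candidate with the smallest Euclidean distance to the anchor."""
    return min(candidates, key=lambda name: math.dist(memory[name], memory[anchor]))


def decompose(memory: Dict[str, Tuple[float, float]]) -> List[Subtask]:
    fruit = closest_to("cup", fruits, memory)
    return [
        ("Unitree G1", "visual_grasp", fruit),
        ("Realman", "transport", "fruit basket"),
        ("Unitree dual-arm", "grasp", "fruit knife"),
        ("Realman", "return", "standby"),
    ]


def dispatch(plan: List[Subtask]) -> Dict[str, List[Tuple[str, str]]]:
    """Group sub-tasks per robot while preserving their order."""
    queues: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for robot, skill, target in plan:
        queues[robot].append((skill, target))
    return dict(queues)


plan = decompose(spatial_memory)
queues = dispatch(plan)
for robot, tasks in queues.items():
    print(robot, tasks)
```

With these sample coordinates the apple is nearest to the cup, so the first sub-task matches the chain quoted in the article.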
During execution, RoboOS provides edge-cloud collaboration capabilities, decomposing tasks into skill granularities. This enables cloud-based distribution of plans via RoboBrain and edge-side execution of skills with real-time feedback. The embodied brain identified specific affordances—such as the “location of the fruit closest to the cup” and “affordance for grasping the fruit basket.” These insights are delivered via RoboOS to guide each robot embodiment in completing their tasks.
In my view, edge-cloud splitting is critical; you can’t run high-latency LLM inference on every joint movement. I think that without a shared memory system, swarm coordination is just a distributed monolith with more failure points.
Unified Ecosystem for Cross-Embodiment AI
I’ve watched the embodied AI space struggle with fragmentation. Robots are great at specific tasks but terrible at talking to each other or adapting when hardware changes. BAAI’s new RoboOS framework claims to solve this by treating the robot body and the cloud brain as a single, cohesive unit rather than siloed components.
“Plug-and-Play” Rapid Lightweight Generalized Deployment: Building a Unified Ecosystem
RoboOS is positioned as a cross-embodiment cerebellum-cerebrum collaboration framework for multi-robot systems. It targets the specific pain points of unifying access for heterogeneous embodiments, low task scheduling efficiency, and the lack of dynamic error feedback mechanisms. The architecture relies on “cerebellum-cerebrum synergy.”
On the cloud side, RoboBrain handles unified task understanding, planning decision-making, and context awareness. On the robot itself, lightweight cerebellum execution modules manage perception, cognition, decision, and action in a closed loop. This setup allows the system to dynamically perceive embodiment differences, flexibly adapt operation instructions, and automatically repair abnormal behaviors. The result is enhanced robustness and generalization in complex scenarios.
As a builder, I know heterogeneous robot access remains a massive integration headache for most teams. A unified brain model could finally simplify cross-robot task delegation. I think automatic error repair sounds promising but needs real-world stress testing.
RoboOS natively supports flexible access for heterogeneous robot embodiments using a Profile template mechanism. This allows for rapid capability modeling and adaptation without deep custom coding. The embodiment’s cerebellum module can call various skill interfaces, including open-source libraries and self-developed low-level controllers. This creates an operational system that supports modular reuse and plug-and-play functionality, significantly lowering development barriers and integration costs.
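The article does not show what a Profile template looks like. As a rough mental model, a profile could be a small declarative record of a robot's capabilities plus a registry of the skills its cerebellum exposes. Everything in this sketch (`RobotProfile`, its fields, the `skill` decorator) is hypothetical:

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class RobotProfile:
    """Hypothetical capability description for one embodiment."""
    name: str
    arms: int
    mobile: bool
    skills: Dict[str, Callable[..., str]] = field(default_factory=dict)

    def skill(self, skill_name: str) -> Callable[[Callable[..., str]], Callable[..., str]]:
        """Decorator that registers a function under a skill name."""
        def register(fn: Callable[..., str]) -> Callable[..., str]:
            self.skills[skill_name] = fn
            return fn
        return register

    def call(self, skill_name: str, **kwargs: object) -> str:
        if skill_name not in self.skills:
            raise KeyError(f"{self.name} has no skill {skill_name!r}")
        return self.skills[skill_name](**kwargs)


realman = RobotProfile(name="realman_single_arm", arms=1, mobile=True)


@realman.skill("navigate")
def navigate(target: str) -> str:
    # A real skill would call an open-source library or a low-level controller.
    return f"navigating to {target}"


result = realman.call("navigate", target="dining table")
print(result)
```

Because the planner only sees skill names, swapping the function behind `"navigate"` for a different controller would not change the calling side, which is the plug-and-play property the paragraph describes.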
On the cloud side, RoboOS provides comprehensive model adaptation and API access capabilities. It is compatible with self-developed multimodal VLMs (Vision-Language Models), acting as a pluggable brain decision engine. This architecture supports multi-robot collaboration for complex tasks in service robotics, industrial automation, smart logistics, and intelligent manufacturing.
Leveraging RoboOS’s edge-cloud integrated collaborative capabilities and dynamic scheduling mechanisms, the system boasts high scalability and portability. It aims to lay a general operating-system-level foundation for the large-scale deployment of future embodied AI ecosystems.

RoboOS is built upon FlagScale, BAAI’s parallel training and inference framework. It natively supports edge-cloud collaboration for multi-robot systems, creating a unified foundation for embodied AI. The design fully accounts for “multi-robot-multi-modal-multi-task” scenarios, offering extremely high scalability and low-latency response capabilities.
In edge-side deployment, each robot automatically establishes a bidirectional communication link with the cloud-deployed RoboBrain upon registration. Through an efficient publish-subscribe mechanism, the system achieves real-time task scheduling and state feedback. Instruction response latency is reported to stay under 10 ms, meeting the closed-loop control requirements for complex dynamic tasks.
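The publish-subscribe pattern itself is easy to show in miniature. The sketch below is an in-process toy with no networking, so it says nothing about latency; the `Broker` class and the topic names are assumptions, not RoboOS interfaces.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List

Handler = Callable[[Dict[str, Any]], None]


class Broker:
    """Minimal in-process publish-subscribe hub."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, topic: str, handler: Handler) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, message: Dict[str, Any]) -> int:
        """Deliver the message to every subscriber; return how many received it."""
        handlers = list(self._subscribers.get(topic, []))
        for handler in handlers:
            handler(message)
        return len(handlers)


broker = Broker()
feedback_log: List[Dict[str, Any]] = []


def on_task(message: Dict[str, Any]) -> None:
    # Edge side: pretend to execute the skill, then report status back.
    broker.publish("feedback/realman", {"task": message["skill"], "status": "done"})


# The robot listens for tasks; the cloud side listens for feedback.
broker.subscribe("tasks/realman", on_task)
broker.subscribe("feedback/realman", feedback_log.append)

delivered = broker.publish("tasks/realman", {"skill": "navigate", "target": "dining table"})
print(delivered, feedback_log)
```

Using one topic per direction gives the bidirectional link described above: tasks flow down on one topic and state feedback flows up on the other.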
RoboOS also addresses data management challenges. Facing massive perception and behavioral data generated during long-term operation, the system provides a memory-optimized data access engine. This supports random memory access to TB-level historical data, enabling task reproduction, anomaly tracing, and cross-task knowledge transfer. Combined with RoboBrain’s task reasoning modules, this historical data facilitates collaborative knowledge sharing among multiple robots, driving stronger intelligent evolution and autonomous learning capabilities.
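How the data access engine works internally is not described. One standard way to get random access into a large log without loading it all is to store fixed-size records and memory-map the file, so that record `i` lives at byte offset `i * record_size`. A small sketch of that idea, with an invented record layout (timestamp, x, y):

```python
import mmap
import os
import struct
import tempfile
from typing import Iterable, Tuple

# Assumed layout: little-endian float64 timestamp + two float32 coordinates.
RECORD = struct.Struct("<dff")

Record = Tuple[float, float, float]


def write_log(path: str, records: Iterable[Record]) -> None:
    with open(path, "wb") as f:
        for record in records:
            f.write(RECORD.pack(*record))


def read_record(path: str, index: int) -> Record:
    """Fetch one record by index without reading the rest of the file."""
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return RECORD.unpack_from(mm, index * RECORD.size)


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "trajectory.bin")
    write_log(log_path, ((float(i), i * 0.5, i * 0.25) for i in range(1000)))
    rec = read_record(log_path, 742)

print(rec)
```

The operating system pages in only the part of the file that is touched, which is what makes this approach workable for logs far larger than RAM.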
Additionally, FlagScale serves as the underlying support framework. It supports parallel inference across multiple devices and multi-task collaborative scheduling for large models. It can seamlessly integrate subsystems such as vision-language models, trajectory generation modules, and perception recognition systems, fully unleashing the systemic potential of embodied large models.
Currently, BAAI is collaborating with universities like Peking University, Tsinghua University, and the Chinese Academy of Sciences, along with industry partners including Galbot, Leju, Accelerate Evolution, and Unitree. They are actively building an embod
Toward Swarm Intelligence: BAAI Unveils First Cross-Embodiment Brain-Cerebellum Collaboration Framework and Open-Source Embodied AI Brain
The fragmented nature of embodied AI has long been a bottleneck for developers trying to build systems that work across different hardware. We are used to seeing models optimized for specific robots, forcing us to rewrite integration layers every time we switch platforms. BAAI’s latest release aims to solve this by decoupling the “brain” from the specific body it controls.
BAAI has unveiled RoboOS, a cross-embodiment cerebellum-cerebrum collaboration framework, alongside RoboBrain, an open-source embodied AI brain. This isn’t just another model drop; it is an infrastructure play designed to organically integrate and widely connect differently configured embodiments with diverse models. The goal is clear: accelerate cross-embodiment collaboration and enable large-scale applications in embodied AI that were previously siloed by hardware constraints.
As a builder, I hope this framework finally stops us from rewriting integration code for every new robot chassis we touch. Decoupling the brain from the cerebellum logic might also simplify how we handle real-time motor control. I think open-sourcing the core brain model is a strong signal that BAAI wants to set the standard, not just sell it.
The release represents a shift toward an ecosystem where openness, collaboration, and sharing are treated as inevitable paths to prosperity. BAAI explicitly states its willingness to join hands with more industry partners to co-draw the blueprint for this embodied AI ecosystem. For developers, this means we might soon see a shared language between different robotic hardware and AI models, reducing the friction of deployment.
To support this vision, BAAI is also releasing high-quality heterogeneous datasets specifically designed for robotic manipulation tasks. This data layer is critical because models are only as good as the scenarios they’ve been trained on. By providing both the framework (RoboOS) and the foundational intelligence (RoboBrain), along with the training data, BAAI is attempting to lower the barrier to entry for building robust embodied agents.
Open Source Links
Embodied Multimodal Brain Model RoboBrain
GitHub: https://github.com/FlagOpen/RoboBrain
High-quality heterogeneous dataset ShareRobot designed for robotic manipulation tasks
GitHub: https://github.com/FlagOpen/ShareRobot
Gitee: https://gitee.com/flagopen/share-robot
Huggingface: https://huggingface.co/datasets/BAAI/ShareRobot