RoboPearls is an editable video simulation framework for robotic manipulation. It reconstructs photo-realistic scenes with semantic features from demonstration videos. Building on a set of simulation operators, RoboPearls uses multiple LLM agents to translate user commands into specific editing functions. It also uses a VLM to analyze learning issues and generate the corresponding simulation demands that improve robotic performance.
The development of generalist robot manipulation policies has seen significant progress, driven by large-scale demonstration data collected across diverse environments. However, the high cost and inefficiency of collecting real-world demonstrations limit the scalability of data acquisition. While existing simulation platforms provide controlled environments for robotic learning, bridging the sim-to-real gap remains a challenge. To address these challenges, we propose RoboPearls, an editable video simulation framework for robotic manipulation. Built on 3D Gaussian Splatting (3DGS), RoboPearls constructs photo-realistic, view-consistent simulations from demonstration videos and supports a wide range of simulation operators, including various object manipulations, enabled by modules such as Incremental Semantic Distillation (ISD) and a 3D-regularized Nearest Neighbor Feature Matching loss (3D-NNFM). Moreover, by incorporating large language models (LLMs), RoboPearls automates the simulation production process in a user-friendly manner through flexible command interpretation and execution. Furthermore, RoboPearls employs a vision-language model (VLM) to analyze robotic learning issues, closing the simulation loop for performance enhancement. To evaluate RoboPearls, we conduct extensive experiments on multiple datasets and benchmarks, including RLBench, COLOSSEUM, Ego4D, and Open X-Embodiment, and the results demonstrate satisfactory simulation performance.
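For readers unfamiliar with the NNFM (Nearest Neighbor Feature Matching) loss, the base form introduced in Artistic Radiance Fields (ARF) matches each feature of a rendered image to its closest feature in a reference image and averages the cosine distances. The following is a minimal pure-Python sketch of that base loss only, assuming the features have already been extracted (in ARF, from a pretrained VGG network) and are given as lists of floats. It does not implement the 3D regularization used by RoboPearls, whose details are not given here, and the function names are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def cosine_distance(a: Vector, b: Vector) -> float:
    """Return 1 - cosine similarity between two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("feature vectors must be non-zero")
    return 1.0 - dot / (norm_a * norm_b)


def nnfm_loss(rendered: List[Vector], reference: List[Vector]) -> float:
    """Base NNFM loss (hypothetical helper): for every rendered feature, take the
    cosine distance to its nearest reference feature, then average over rendered
    features. The 3D regularization term of 3D-NNFM is NOT included here."""
    if not rendered or not reference:
        raise ValueError("feature sets must be non-empty")
    total = 0.0
    for f in rendered:
        # Nearest neighbor in the reference set under cosine distance.
        total += min(cosine_distance(f, g) for g in reference)
    return total / len(rendered)
```

Because each rendered feature only needs some close match anywhere in the reference set, the loss transfers local appearance statistics without enforcing pixel-to-pixel correspondence. The loss is also invariant to feature magnitude, since it uses cosine distance.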
The RoboPearls Framework. (a) RoboPearls extends the Gaussian representation to reconstruct dynamic scenes with semantic features from demonstration videos. (b) RoboPearls includes and refines various simulation operators. (c) RoboPearls leverages multiple LLM agents to automate and streamline the simulation production process following users' natural-language commands.
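One common way to connect LLM agents to editing functions is to have the agent emit a structured plan (for example, a JSON list of operator calls) and dispatch each call through a registry of operators. The sketch below illustrates that pattern under stated assumptions. The scene is a toy dictionary standing in for a reconstructed Gaussian scene, the operators `remove_object` and `recolor_object` and the `{"op": ..., "args": ...}` schema are hypothetical, and the plan string stands in for LLM output. RoboPearls' actual agent prompts and operator signatures are not described here.

```python
import json
from typing import Any, Callable, Dict, List

# Hypothetical toy scene: object name -> attributes (stand-in for a 3DGS scene).
Scene = Dict[str, Dict[str, Any]]

# Hypothetical registry mapping operator names an agent may emit to functions.
OPERATORS: Dict[str, Callable[..., Scene]] = {}


def operator(name: str) -> Callable[[Callable[..., Scene]], Callable[..., Scene]]:
    """Register a simulation operator under the given name."""
    def wrap(fn: Callable[..., Scene]) -> Callable[..., Scene]:
        OPERATORS[name] = fn
        return fn
    return wrap


@operator("remove_object")
def remove_object(scene: Scene, target: str) -> Scene:
    """Return a new scene without the target object."""
    return {k: v for k, v in scene.items() if k != target}


@operator("recolor_object")
def recolor_object(scene: Scene, target: str, color: str) -> Scene:
    """Return a new scene with the target object's color replaced."""
    updated = {k: dict(v) for k, v in scene.items()}  # copy; input is not mutated
    updated[target]["color"] = color
    return updated


def execute_plan(scene: Scene, plan_json: str) -> Scene:
    """Apply a JSON list of operator calls in order, as an agent might emit them."""
    plan: List[Dict[str, Any]] = json.loads(plan_json)
    for step in plan:
        fn = OPERATORS.get(step["op"])
        if fn is None:
            raise KeyError(f"unknown operator: {step['op']}")
        scene = fn(scene, **step.get("args", {}))
    return scene


# Usage: the plan string below stands in for an LLM agent's output.
scene = {"cup": {"color": "red"}, "block": {"color": "blue"}}
plan = (
    '[{"op": "recolor_object", "args": {"target": "cup", "color": "green"}},'
    ' {"op": "remove_object", "args": {"target": "block"}}]'
)
result = execute_plan(scene, plan)
```

Keeping the agent's output declarative means unknown or malformed operator names fail loudly at dispatch time instead of executing arbitrary code, and new operators can be added by registering another function.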