Legged robots are expected to traverse complex environments, which makes designing model-based controllers challenging due to their functional complexity. Deep reinforcement learning has therefore become a major research direction for improving the adaptability of robots in complex scenarios. In this paper, we propose Adaptive Latent Aggregation for Reliable Mimicry (ALARM), a reinforcement learning framework that enables safe and robust locomotion in legged robots using only proprioception. This work features a one-step teacher-student training paradigm built on an adaptive aggregation strategy, which effectively integrates the merits of imitation learning and reinforcement learning. The framework incorporates normalized penalized proximal policy optimization (NP3O), which penalizes constraint-violating behaviors while optimizing the locomotion policy. Our method facilitates efficient sim-to-real transfer, offering a promising approach for real-world legged robot applications.
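The adaptive aggregation can be pictured as an annealed blend of a teacher latent (from privileged information) and a student latent (from proprioception). The following Python sketch is illustrative only: the encoder architectures, the linear annealing schedule, and all names (TeacherEncoder, StudentEncoder, aggregate_latents, anneal_steps) are our assumptions, not the paper's exact implementation.

import torch
import torch.nn as nn

class TeacherEncoder(nn.Module):
    # Encodes privileged simulator information (e.g., terrain, contact states).
    def __init__(self, priv_dim, latent_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(priv_dim, 128), nn.ELU(),
                                 nn.Linear(128, latent_dim))
    def forward(self, priv):
        return self.net(priv)

class StudentEncoder(nn.Module):
    # Encodes a history of proprioceptive observations only.
    def __init__(self, hist_dim, latent_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(hist_dim, 128), nn.ELU(),
                                 nn.Linear(128, latent_dim))
    def forward(self, hist):
        return self.net(hist)

def aggregate_latents(z_teacher, z_student, step, anneal_steps=10000):
    # Assumed linear schedule: alpha decays from 1 to 0, so the policy
    # leans on the privileged (teacher) latent early in training and on
    # the proprioceptive (student) latent by the end.
    alpha = max(0.0, 1.0 - step / anneal_steps)
    # Detaching the teacher latent keeps RL gradients out of the teacher
    # path (cf. the dashed, no-backpropagation arrows in the figure).
    return alpha * z_teacher.detach() + (1.0 - alpha) * z_student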
Overview of the training method. We first normalize the state information and the last action. The teacher encoder then uses the processed privileged information to output reference latent variables, while the student encoder imitates this implicit encoding and estimates the explicit states. By adopting the adaptive aggregation strategy, the RL network can leverage the privileged information to accelerate convergence during the early stages of training, ultimately relying solely on proprioceptive feedback to traverse challenging terrains. Throughout this process, the decoder provides contextual guidance for the latent representation, and NP3O constrains the robot's behavior. The overall optimization objective consists of the supervised learning loss and the reinforcement learning loss, as sketched below. (Dashed arrows indicate that gradient backpropagation is not performed.)
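A minimal sketch of this combined objective, assuming MSE losses for latent mimicry and state estimation, a standard clipped PPO surrogate, and a hinge-style penalty on constraint violations. The coefficients, the penalty form, and the names (total_loss, penalty_coef, sl_coef, cost_limits) are hypothetical stand-ins; NP3O's exact normalized penalty is not reproduced here.

import torch
import torch.nn.functional as F

def total_loss(z_teacher, z_student, state_pred, state_target,
               ratio, advantage, costs, cost_limits,
               clip_eps=0.2, penalty_coef=1.0, sl_coef=1.0):
    # Supervised terms: the student mimics the (detached) teacher latent
    # and regresses explicit states such as body velocity.
    imitation = F.mse_loss(z_student, z_teacher.detach())
    estimation = F.mse_loss(state_pred, state_target)

    # Clipped PPO surrogate; minimizing the negative maximizes advantage.
    surrogate = -torch.min(
        ratio * advantage,
        torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantage,
    ).mean()

    # Penalized-PPO-style term: only constraint costs above their limits
    # contribute, steering the policy away from violating behaviors.
    violation = torch.relu(costs - cost_limits).mean()

    return surrogate + penalty_coef * violation + sl_coef * (imitation + estimation)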
In this paper, we propose ALARM, an end-to-end locomotion control framework for legged robots based on imitation learning and reinforcement learning. Through a single training process, it achieves a seamless transition from teacher to student. By introducing NP3O optimization, it constrains robot behavior effectively and concisely, enabling safe and robust locomotion on complex terrains using only proprioception. Additionally, ALARM transfers directly from simulation to the real world, with low computational resource consumption and fast inference. Through comparative and ablation experiments, we demonstrate that our method surpasses current state-of-the-art reinforcement learning controllers in training efficiency and locomotion performance. Our policy has been deployed on multiple quadrupedal robots, showing strong adaptability and resilience in complex environments.
@article{zhou2025alarm,
title={ALARM: Safe Reinforcement Learning with Reliable Mimicry for Robust Legged Locomotion},
author={Zhou, Qiqi and Ding, Hui and Chen, Teng and Man, Luxin and Jiang, Han and Zhang, Guoteng and Li, Bin and Rong, Xuewen and Li, Yibin},
year={2025},
journal={IEEE Robotics and Automation Letters},
volume={10},
number={7},
pages={6768-6775},
doi={10.1109/LRA.2025.3572427}
}