ALARM: Safe Reinforcement Learning with Reliable Mimicry for Robust Legged Locomotion

Qiqi Zhou1, Hui Ding1, Teng Chen1*, Luxin Man1, Han Jiang1, Guoteng Zhang1, Bin Li2, Xuewen Rong1, Yibin Li1

1 Shandong University 2 Qilu University of Technology * Corresponding author

Abstract

Legged robots are expected to traverse complicated environments, yet their functional complexity makes it challenging to design model-based controllers. Deep reinforcement learning has therefore become a major research trend for improving the adaptability of robots in complex scenarios. In this paper, we propose Adaptive Latent Aggregation for Reliable Mimicry (ALARM), a reinforcement learning framework that enables safe and robust locomotion in legged robots using only proprioception. This work features a one-step teacher-student training paradigm built on an adaptive aggregation strategy, which effectively combines the merits of imitation learning and reinforcement learning. The framework integrates normalized penalized proximal policy optimization (NP3O), which penalizes constraint-violating behaviors while optimizing the locomotion policy. Our method facilitates efficient sim-to-real transfer, offering a promising approach for real-world legged robot applications.

Method

Framework

Overview of the training method. We first normalize the state information and the last action. The teacher encoder then maps the processed privileged information to reference latent variables, while the student encoder imitates this implicit encoding and estimates the explicit states. Through the adaptive aggregation strategy, the RL network leverages privileged information to accelerate convergence during the early stages of training and ultimately relies solely on proprioceptive feedback to traverse challenging terrains. During this process, the decoder provides contextual guidance for the latent representation, and NP3O effectively constrains the robot's behavior. The overall optimization objective consists of the supervised learning loss and the reinforcement learning loss. (Dashed arrows indicate that gradient backpropagation is not performed.)
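To make the adaptive aggregation concrete, the following is a minimal PyTorch-style sketch of one way to implement it. The module names, network sizes, linear annealing schedule, and loss weighting are illustrative assumptions, not the exact implementation from the paper.

import torch
import torch.nn as nn
import torch.nn.functional as F

class AdaptiveAggregation(nn.Module):
    # Blends the teacher and student latents with a coefficient annealed
    # from 1 (teacher only) to 0 (student only) over training. All sizes
    # and the schedule here are assumptions for illustration.
    def __init__(self, priv_dim, proprio_dim, latent_dim, state_dim,
                 anneal_iters=5000):
        super().__init__()
        self.teacher_encoder = nn.Sequential(
            nn.Linear(priv_dim, 256), nn.ELU(), nn.Linear(256, latent_dim))
        # The student sees only proprioception; it outputs a latent that
        # mimics the teacher plus an explicit state estimate.
        self.student_encoder = nn.Sequential(
            nn.Linear(proprio_dim, 256), nn.ELU(),
            nn.Linear(256, latent_dim + state_dim))
        self.latent_dim = latent_dim
        self.anneal_iters = anneal_iters

    def forward(self, priv_obs, proprio_obs, it):
        z_teacher = self.teacher_encoder(priv_obs)
        student_out = self.student_encoder(proprio_obs)
        z_student = student_out[..., :self.latent_dim]
        state_est = student_out[..., self.latent_dim:]
        # Rely on privileged information early, proprioception later.
        alpha = max(0.0, 1.0 - it / self.anneal_iters)
        z = alpha * z_teacher + (1.0 - alpha) * z_student
        # Detached target: no gradient flows back into the teacher,
        # mirroring the dashed (no-backpropagation) arrows in the figure.
        imitation_loss = F.mse_loss(z_student, z_teacher.detach())
        return z, state_est, imitation_loss

# Combined objective (weights w_* are assumptions); in the full framework
# a decoder reconstruction term provides the contextual guidance:
# total_loss = rl_loss + w_imit * imitation_loss \
#            + w_est * F.mse_loss(state_est, true_state) + w_rec * decoder_loss

The aggregated latent z feeds the RL policy, so a single training run moves smoothly from teacher-driven to student-only operation without a separate distillation stage.

Likewise, here is a hedged sketch of a penalized objective in the spirit of NP3O: the clipped PPO surrogate minus a ReLU penalty on per-constraint cost surrogates, so the penalty is active only where a constraint is violated. The penalty weight kappa and the per-constraint normalization are placeholders for illustration and do not reproduce the paper's exact formulation.

def np3o_loss(ratio, adv, cost_adv, cost_excess, clip_eps=0.2, kappa=1.0):
    # ratio: pi_new / pi_old, shape (B,); adv: reward advantages, shape (B,)
    # cost_adv: per-constraint cost advantages, shape (B, K)
    # cost_excess: estimated J_C - d per constraint, shape (K,)
    #              (positive entries mean the constraint is violated)
    surrogate = torch.min(
        ratio * adv,
        torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * adv)
    # First-order surrogate of each constraint; the ReLU keeps the
    # penalty active only for violated constraints.
    cost_surr = ratio.unsqueeze(-1) * cost_adv + cost_excess
    penalty = torch.relu(cost_surr)
    # Placeholder normalization so differently scaled constraints
    # contribute comparably (standing in for the "N" in NP3O).
    penalty = penalty / (penalty.detach().mean(dim=0, keepdim=True) + 1e-8)
    return -(surrogate.mean() - kappa * penalty.sum(dim=-1).mean())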

Indoor Experiments

Disturbance Rejection


Indoor Parkour


Trolley


Outdoor Experiments

Gravel Road

Grassland


Running on Grass

Grass with Obstacle


Sloping Grassland

Asphalt Slope


Broken Porcelain Pieces

Gravel


Steep Slope


Stairs


Outdoor High Platform


60 cm High Platform


Conclusion

In this paper, we propose ALARM, an end-to-end locomotion control framework for legged robots based on imitation learning and reinforcement learning. Through a single training process, it achieves a seamless transition from teacher to student. By introducing NP3O optimization, it constrains robot behavior effectively and concisely, enabling safe and robust locomotion on complex terrains using only proprioception. Additionally, ALARM transfers directly from simulation to the real world, with low computational resource consumption and fast inference. Through comparative and ablation experiments, we demonstrate that our method surpasses current state-of-the-art reinforcement learning controllers in training efficiency and locomotion performance. Our policy has been deployed on multiple quadrupedal robots, showcasing strong adaptability and resilience in complex environments.

BibTeX

@article{zhou2025alarm,
  title={ALARM: Safe Reinforcement Learning with Reliable Mimicry for Robust Legged Locomotion},
  author={Zhou, Qiqi and Ding, Hui and Chen, Teng and Man, Luxin and Jiang, Han and Zhang, Guoteng and Li, Bin and Rong, Xuewen and Li, Yibin},
  journal={IEEE Robotics and Automation Letters},
  year={2025},
  volume={10},
  number={7},
  pages={6768--6775},
  doi={10.1109/LRA.2025.3572427}
}