SoNIC: Safe Social Navigation with Adaptive Conformal Inference and Constrained Reinforcement Learning
- Jianpeng Yao
- Xiaopan Zhang
- Yu Xia
- Zejing Wang
- Amit K. Roy-Chowdhury
- Jiachen Li
A novel algorithm for social navigation that combines adaptive conformal inference and constrained reinforcement learning with spatial relaxation, achieving state-of-the-art results in generating safe robot trajectories that adhere to social norms.
SoNIC Performance in Different Environments
Abstract
Reinforcement Learning (RL) has enabled social robots to generate trajectories without human-designed rules or interventions, which makes it more effective than hardcoded systems for generalizing to complex real-world scenarios. However, social navigation is a safety-critical task that requires robots to avoid collisions with pedestrians, and previous RL-based solutions fall short in safety performance in complex environments. To enhance the safety of RL policies, we propose SoNIC, which is, to the best of our knowledge, the first algorithm that integrates adaptive conformal inference (ACI) with constrained reinforcement learning (CRL) to learn safe policies for social navigation. More specifically, our method augments RL observations with ACI-generated nonconformity scores and, by incorporating safety constraints with spatial relaxation, provides explicit guidance for agents to use these uncertainty metrics to avoid safety-critical areas. Our method outperforms state-of-the-art baselines in terms of both safety and adherence to social norms by a large margin and demonstrates much stronger robustness to out-of-distribution scenarios.
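To make the idea of ACI-generated uncertainty more concrete, the following is a minimal sketch of the standard adaptive conformal inference update applied to pedestrian position prediction. It is an illustration under stated assumptions, not the SoNIC implementation: the class name `AdaptiveConformal`, the helper `augment_observation`, the Euclidean prediction error used as the nonconformity score, and the values of `target_alpha`, `gamma`, and `window` are all hypothetical choices. The exact ACI variant and observation layout used in SoNIC may differ.

```python
import numpy as np


class AdaptiveConformal:
    """Standard ACI update rule; all names and defaults here are illustrative assumptions."""

    def __init__(self, target_alpha=0.1, gamma=0.05, window=50):
        self.target_alpha = target_alpha  # desired long-run miscoverage rate
        self.gamma = gamma                # step size of the online update
        self.window = window              # number of recent scores used for the quantile
        self.alpha_t = target_alpha       # adaptive miscoverage level
        self.scores = []                  # history of nonconformity scores

    def radius(self):
        """Empirical (1 - alpha_t) quantile of recent scores, used as an uncertainty radius."""
        level = 1.0 - self.alpha_t
        if not self.scores or level >= 1.0:
            return float("inf")           # no data or full coverage requested: be conservative
        if level <= 0.0:
            return 0.0
        return float(np.quantile(self.scores[-self.window:], level))

    def update(self, predicted_xy, actual_xy):
        """Observe one prediction error and adapt alpha_t."""
        # Assumed nonconformity score: Euclidean distance between prediction and outcome.
        score = float(np.linalg.norm(np.asarray(actual_xy, dtype=float)
                                     - np.asarray(predicted_xy, dtype=float)))
        # err is 1 when the outcome fell outside the current uncertainty region.
        err = 1.0 if score > self.radius() else 0.0
        # ACI rule: alpha_{t+1} = alpha_t + gamma * (target_alpha - err_t)
        self.alpha_t += self.gamma * (self.target_alpha - err)
        self.scores.append(score)
        return score


def augment_observation(obs, radii):
    """Append per-pedestrian uncertainty radii to a flat observation vector (assumed layout)."""
    return np.concatenate([np.asarray(obs, dtype=float), np.asarray(radii, dtype=float)])
```

In this sketch, a missed prediction lowers `alpha_t`, which widens the next uncertainty radius, while a run of covered predictions gradually raises `alpha_t` and tightens it. A policy receiving the augmented observation can then treat larger radii as regions to keep away from.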
Key Ideas and Contributions
2) Spatial Relaxation: We propose a technique to increase the applicability of CRL in the context of social navigation by introducing spatial relaxation. Compared to previous methods, spatial relaxation provides richer cost feedback and facilitates convergence without sacrificing safety.
3) Performance Boost: Our method achieves state-of-the-art (SOTA) performance in social navigation in both safety and adherence to social norms, outperforming baselines by a large margin. Our method also shows much stronger robustness to out-of-distribution (OOD) scenarios.
Test Results in In-distribution and OOD Settings