3D Active Metric-Semantic SLAM

The GRASP Lab, University of Pennsylvania

Abstract

We address the problem of exploration and metric-semantic mapping of multi-floor, GPS-denied indoor environments using Size, Weight, and Power (SWaP)-constrained aerial robots. Most previous work in exploration assumes that robot localization is solved. However, neglecting the state uncertainty of the agent can ultimately lead to cascading errors, both in the resulting map and in the state of the agent itself. Furthermore, actions that reduce localization errors may be at direct odds with the exploration task. We develop a framework that balances the efficiency of exploration with actions that reduce the state uncertainty of the agent. In particular, our algorithmic approach for active metric-semantic SLAM is built upon sparse information abstracted from raw problem data, making it suitable for SWaP-constrained robots. Furthermore, we integrate this framework within a fully autonomous aerial robotic system that achieves autonomous exploration in cluttered 3D environments. Extensive real-world experiments show that by including Semantic Loop Closure (SLC), we can reduce robot pose estimation errors by over 90% in translation and approximately 75% in yaw, and the uncertainties in pose estimates and semantic maps by over 70% and 65%, respectively. Although discussed in the context of indoor multi-floor exploration, our system can be used for various other applications, such as infrastructure inspection and precision agriculture, where reliable GPS data may not be available.

Experiments

3D Exploration

Semantic Loop Closure

Full Video

System Overview

system diagram

Our system takes in data from an RGB-D camera and the pose estimates from the VOXL VIO module. Instance segmentation is performed on RGB images with a pre-trained deep neural network (YOLOv8). The metric-semantic SLAM module then takes in these inputs and estimates (1) a global voxel map for sampling exploration viewpoints, (2) a local voxel map for trajectory planning, (3) optimized robot pose estimates, and (4) a semantic map comprising object landmarks used to generate SLC candidates. Next, a COP-based exploration planning algorithm takes in the exploration viewpoints and plans a long-horizon exploration path (a) consisting of a sequence of viewpoints, seeking to maximize the Information Gain (IG) within the travel budget. This exploration path is then refined by inserting SLC viewpoints so that the robot can trade off exploration with uncertainty reduction. The refined path (b) is used to generate goals for the low-level trajectory planner, which constantly replans dynamically feasible 3D trajectories (c) in the local voxel map.
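To illustrate the flavor of budgeted viewpoint selection described above, the sketch below implements a simple greedy heuristic for an orienteering-style problem: repeatedly visit the candidate viewpoint with the best information-gain-per-distance ratio until the travel budget is exhausted. This is only a minimal illustration, not the COP solver from the paper; the function name, the IG values, and the greedy rule are all assumptions for the example.

```python
import math

def plan_exploration_path(start, viewpoints, budget):
    """Greedy sketch of budgeted viewpoint selection.

    `start` is the robot's current 3D position, `viewpoints` maps each
    candidate 3D position (a tuple) to its estimated information gain,
    and `budget` is the total travel distance allowed. At each step we
    pick the candidate with the highest IG per unit of travel cost,
    stopping when the next pick would exceed the remaining budget.
    """
    path, pos, remaining = [start], start, budget
    candidates = dict(viewpoints)
    while candidates:
        # Rank candidates by information gain per unit travel distance.
        best = max(
            candidates,
            key=lambda v: candidates[v] / max(math.dist(pos, v), 1e-9),
        )
        cost = math.dist(pos, best)
        if cost > remaining:
            break  # Travel budget exhausted.
        remaining -= cost
        path.append(best)
        pos = best
        del candidates[best]
    return path
```

For instance, with a budget of 5 m and candidates at 1 m (IG 5), 2 m (IG 1), and 10 m (IG 8) from the origin, the greedy rule picks the two nearby viewpoints and skips the distant one, since its travel cost exceeds the remaining budget. A real COP formulation would instead optimize the whole sequence jointly rather than committing greedily.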

BibTeX

@ARTICLE{10423804,
  author={Tao, Yuezhan and Liu, Xu and Spasojevic, Igor and Agarwal, Saurav and Kumar, Vijay},
  journal={IEEE Robotics and Automation Letters}, 
  title={3D Active Metric-Semantic SLAM}, 
  year={2024},
  volume={9},
  number={3},
  pages={2989-2996},
  keywords={Semantics;Simultaneous localization and mapping;Three-dimensional displays;Uncertainty;Planning;Autonomous aerial vehicles;Real-time systems;Aerial systems: Perception and autonomy;mapping;perception-action coupling},
  doi={10.1109/LRA.2024.3363542}
}