Real-time Progressive 3D Semantic Segmentation
for Indoor Scenes

1Singapore University of Technology and Design
2The University of Tokyo
3Deakin University
4Hong Kong University of Science and Technology

Winter Conf. on Applications of Computer Vision (WACV), 2019

Paper  /  arXiv  /  Video  /  Poster

Overview of our progressive indoor scene segmentation method. From continuous frames of an RGB-D sensor, our system performs on-the-fly reconstruction and semantic segmentation. All processing is performed frame by frame in an online fashion, making the system suitable for real-time applications.

Comparison between our method and other systems. We compare our method with other state-of-the-art real-time semantic reconstruction systems, i.e., SemanticFusion and SemanticPaint, on the SceneNN and ScanNet datasets. Results show that our method outperforms both while still running at 10–15 Hz.

The widespread adoption of autonomous systems such as drones and assistant robots has created a need for real-time high-quality semantic scene segmentation. In this paper, we propose an efficient yet robust technique for on-the-fly dense reconstruction and semantic segmentation of 3D indoor scenes. To guarantee (near) real-time performance, our method is built atop an efficient super-voxel clustering method and a conditional random field with higher-order constraints from structural and object cues, enabling progressive dense semantic segmentation without any precomputation. We extensively evaluate our method on different indoor scenes including kitchens, offices, and bedrooms in the SceneNN and ScanNet datasets and show that our technique consistently produces state-of-the-art segmentation results in both qualitative and quantitative experiments.
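To make the abstract's pipeline concrete, the sketch below illustrates the general idea of CRF inference over a supervoxel adjacency graph: each supervoxel carries unary label costs (e.g., from per-frame predictions) and a Potts-style pairwise term encourages neighboring supervoxels to agree. This is a minimal, assumed formulation with a naive mean-field update — all function and variable names are hypothetical, and the paper's actual higher-order constraints from structural and object cues are not modeled here.

```python
# Hedged sketch: label inference over a supervoxel graph with a naive
# mean-field update (illustrative assumption, not the paper's exact CRF).
import math
from collections import defaultdict

def mean_field_update(unary, edges, n_iters=10, pairwise_weight=1.0):
    """Approximate CRF inference over a supervoxel graph.

    unary:  {node: [cost per label]}, lower cost = more likely
    edges:  [(u, v, w)] supervoxel adjacency with edge weight w
    Returns {node: most likely label index}.
    """
    n_labels = len(next(iter(unary.values())))
    labels = range(n_labels)

    # Initialize per-node beliefs via a softmax over negative unary costs.
    q = {}
    for n, costs in unary.items():
        exps = [math.exp(-c) for c in costs]
        z = sum(exps)
        q[n] = [e / z for e in exps]

    # Build a symmetric neighbor list from the edge set.
    neighbors = defaultdict(list)
    for u, v, w in edges:
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))

    for _ in range(n_iters):
        new_q = {}
        for n, costs in unary.items():
            # Potts pairwise: pay a penalty proportional to how strongly
            # each neighbor believes in a *different* label.
            msg = [0.0] * n_labels
            for m, w in neighbors[n]:
                for l in labels:
                    msg[l] += pairwise_weight * w * (1.0 - q[m][l])
            exps = [math.exp(-(costs[l] + msg[l])) for l in labels]
            z = sum(exps)
            new_q[n] = [e / z for e in exps]
        q = new_q

    return {n: max(labels, key=lambda l: dist[l]) for n, dist in q.items()}
```

In a progressive, per-frame setting the unary costs would be updated as new RGB-D frames arrive and only the affected supervoxels re-inferred, which is what keeps this kind of pipeline compatible with online operation.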