JSIS3D: Joint Semantic-Instance Segmentation of 3D Point Clouds with Multi-Task Pointwise Networks and Multi-Value Conditional Random Fields

Quang-Hieu Pham1     Duc Thanh Nguyen2     Binh-Son Hua3     Gemma Roig1     Sai-Kit Yeung4    

1Singapore University of Technology and Design     2Deakin University
3The University of Tokyo     4Hong Kong University of Science and Technology

Conference on Computer Vision and Pattern Recognition (CVPR), 2019. Oral

Our proposed MT-PNet architecture for joint semantic-instance segmentation. The point cloud first goes through a feed-forward neural network that computes a 128-dimensional feature vector for each point. The network then splits into two branches: one for instance embedding and the other for semantic segmentation.
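The two-branch layout in the caption can be illustrated with a minimal NumPy sketch. Only the 128-dimensional shared feature and the two output branches come from the caption. The input size (9 values per point), the hidden width, the 13 classes, the 32-dimensional embedding, and the names `init_params` and `forward` are assumptions made for illustration. The sketch also omits any global feature pooling, so it should not be read as the authors' MT-PNet implementation.

```python
import numpy as np

def init_params(rng, in_dim=9, feat_dim=128, num_classes=13, embed_dim=32):
    """Random weights for a toy shared trunk and two heads (all sizes are assumptions)."""
    def dense(n_in, n_out):
        # He-style initialisation for a fully connected layer.
        return rng.normal(0.0, np.sqrt(2.0 / n_in), (n_in, n_out)), np.zeros(n_out)
    return {
        "trunk": [dense(in_dim, 64), dense(64, feat_dim)],
        "semantic": dense(feat_dim, num_classes),
        "embedding": dense(feat_dim, embed_dim),
    }

def forward(params, points):
    """points: (N, in_dim) array -> (semantic logits (N, C), embeddings (N, E))."""
    h = points
    for w, b in params["trunk"]:
        h = np.maximum(h @ w + b, 0.0)  # shared per-point layers with ReLU
    w_s, b_s = params["semantic"]
    w_e, b_e = params["embedding"]
    # Both branches read the same 128-dimensional per-point feature.
    return h @ w_s + b_s, h @ w_e + b_e

rng = np.random.default_rng(0)
params = init_params(rng)
points = rng.random((4096, 9))          # hypothetical block of 4096 points
logits, embeddings = forward(params, points)
labels = logits.argmax(axis=1)          # per-point semantic class prediction
```

Because both heads share one trunk, a training loss on either task updates the same per-point features, which is the usual motivation for a multi-task design of this kind.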


Deep learning techniques have become the go-to models for most vision-related tasks on 2D images. However, their power has not been fully realised on several tasks in 3D space, e.g., 3D scene understanding. In this work, we jointly address the problems of semantic and instance segmentation of 3D point clouds. Specifically, we develop a multi-task pointwise network that simultaneously performs two tasks: predicting the semantic classes of 3D points and embedding the points into high-dimensional vectors so that points of the same object instance are represented by similar embeddings. We then propose a multi-value conditional random field model to incorporate the semantic and instance labels and formulate the problem of semantic and instance segmentation as jointly optimising labels in the field model. The proposed method is thoroughly evaluated and compared with existing methods on different indoor scene datasets including S3DIS and SceneNN. Experimental results show that the proposed joint semantic-instance segmentation scheme is more robust than its individual components. Our method also achieves state-of-the-art performance on semantic segmentation.
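The requirement that points of the same instance receive similar embeddings is commonly enforced with a pull/push objective. The sketch below shows one such loss in NumPy. This page does not state which loss the authors use, so the formulation, the margins `delta_pull` and `delta_push`, and the name `pull_push_loss` are all assumptions made for illustration.

```python
import numpy as np

def pull_push_loss(embeddings, instance_ids, delta_pull=0.5, delta_push=1.5):
    """Toy embedding loss: pull points to their instance mean, push means apart.

    The margins and the exact form are assumptions, not taken from the paper.
    """
    embeddings = np.asarray(embeddings, dtype=float)
    instance_ids = np.asarray(instance_ids)
    ids = np.unique(instance_ids)
    centers = np.stack([embeddings[instance_ids == i].mean(axis=0) for i in ids])

    # Pull term: penalise points farther than delta_pull from their own center.
    pull = 0.0
    for k, i in enumerate(ids):
        dist = np.linalg.norm(embeddings[instance_ids == i] - centers[k], axis=1)
        pull += np.mean(np.maximum(dist - delta_pull, 0.0) ** 2)
    pull /= len(ids)

    # Push term: penalise pairs of centers closer than 2 * delta_push.
    push = 0.0
    if len(ids) > 1:
        gaps = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
        off_diag = ~np.eye(len(ids), dtype=bool)
        push = np.mean(np.maximum(2.0 * delta_push - gaps[off_diag], 0.0) ** 2)

    return float(pull + push)

# Two well-separated instances versus two instances collapsed to one point.
separated = np.array([[0.0, 0.0], [0.1, 0.0], [10.0, 0.0], [10.1, 0.0]])
collapsed = np.zeros((4, 2))
ids = np.array([0, 0, 1, 1])
loss_separated = pull_push_loss(separated, ids)
loss_collapsed = pull_push_loss(collapsed, ids)
```

With these inputs the separated embeddings incur no penalty, while the collapsed embeddings are penalised by the push term alone, since the two instance centers coincide.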


  title = {{JSIS3D}: {J}oint semantic-instance segmentation of 3{D} point clouds with multi-task pointwise networks and multi-value conditional random fields},
  author = {Pham, Quang-Hieu and Nguyen, Duc Thanh and Hua, Binh-Son and Roig, Gemma and Yeung, Sai-Kit},
  booktitle = {IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  year = 2019


This research project is partially supported by an internal grant from HKUST (R9429) and the MOE SUTD SRG grant (SRG ISTD 2017 131).