University College London
In this paper we introduce a new dense SLAM system that takes a live stream of RGB-D images as input and segments the scene into different objects, using either motion or semantic cues, while simultaneously tracking and reconstructing their 3D shape in real time.
Crucially, we use a multiple model fitting approach in which each object can move independently of the background and still be effectively tracked, with its shape fused over time using only the information from pixels associated with that object label. Previous attempts to deal with dynamic scenes have typically treated moving regions as outliers that are of no interest to the robot, and consequently neither model their shape nor track their motion over time. In contrast, we enable the robot to maintain 3D models for each of the segmented objects and to improve them over time through fusion. As a result, our system allows a robot to maintain an object-level description of the scene, which has the potential to support interaction with its working environment, even in dynamic scenes.
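The central idea above is that each object keeps its own model and is updated only from the pixels carrying its label. The following minimal sketch illustrates that idea. It assumes that a per-pixel label map and a per-object camera pose are already available, so segmentation and tracking are not shown. The names `ObjectModel` and `fuse_frame` are hypothetical. The sketch also uses a simple voxel-averaged point map rather than the system's actual fusion scheme.

```python
import numpy as np


class ObjectModel:
    """Hypothetical per-object model: a sparse voxel map of averaged points
    stored in the object's own coordinate frame (a simplification)."""

    def __init__(self, voxel_size=0.01):
        self.voxel_size = voxel_size
        self.voxels = {}  # voxel index -> (mean point, fusion weight)

    def fuse(self, points_cam, T_cam_from_obj):
        # Move camera-frame points into the object frame with the inverse pose,
        # so observations of a moving object land in the same voxels over time.
        T_obj_from_cam = np.linalg.inv(T_cam_from_obj)
        homo = np.hstack([points_cam, np.ones((len(points_cam), 1))])
        points_obj = (T_obj_from_cam @ homo.T).T[:, :3]
        keys = np.floor(points_obj / self.voxel_size).astype(int)
        for key, p in zip(map(tuple, keys), points_obj):
            mean, w = self.voxels.get(key, (np.zeros(3), 0.0))
            # Running weighted average, unit weight per new observation.
            self.voxels[key] = ((mean * w + p) / (w + 1.0), w + 1.0)


def fuse_frame(models, points_cam, labels, poses):
    """Route each pixel's 3D point only to the model with the same label."""
    for label, model in models.items():
        mask = labels == label
        if mask.any():
            model.fuse(points_cam[mask], poses[label])
```

In this sketch, `points_cam` is an `(N, 3)` array of back-projected pixels and `labels` is the matching `(N,)` label array. Because each model stores points in its own frame, an object that moves between frames still accumulates observations in the same voxels, provided its pose estimate is correct.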
This work has been supported by the SecondHands project, funded by the EU Horizon 2020 Research and Innovation programme under grant agreement No 643950.