Multi-Camera 3D Position Estimation Using Conditional Random Field

Shusuke Matsuda, Nattaon Techasarntikul, Hideyuki Shimonishi; Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2023, pp. 1908-1916

Abstract


To realize effective and safe human-robot collaboration in which many humans and robots complement each other in close proximity, a digital twin of the shared space plays a crucial role in monitoring the behaviors of many robots and humans simultaneously, precisely, and in real time. Constructing such a digital twin requires estimating the precise 3D positions of instances in the space, but Bluetooth sensors lack accuracy and LiDARs are costly when covering wide areas. We therefore propose the use of multiple cameras that capture overlapping videos of the space, from which the 3D positions of instances are reconstructed using geometric methods. We propose a multimodal approach that utilizes not only vision features but also position features to detect the same objects across multiple cameras, and we use a Conditional Random Field (CRF) to infer whether objects detected by multiple cameras are identical. The 3D position of an instance captured by multiple 2D cameras is then estimated geometrically. In the evaluation, we demonstrate the effects of the CRF and the multimodal approach, and achieve performance comparable to the state-of-the-art method.
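The final step the abstract describes, recovering a 3D position geometrically from the same object's 2D detections in multiple calibrated cameras, is conventionally done by linear (DLT) triangulation. The sketch below is a minimal illustration of that standard technique with two synthetic cameras, not the paper's implementation; the intrinsics, camera poses, and the `triangulate` helper are all assumptions made for the example.

```python
import numpy as np

def triangulate(P1, P2, x1, x2):
    """Linear (DLT) triangulation of one point from two views.

    P1, P2: 3x4 camera projection matrices.
    x1, x2: 2D pixel coordinates of the same object in each view.
    Returns the estimated 3D point (hypothetical helper for illustration).
    """
    # Each observation x = P X gives two linear constraints on X.
    A = np.vstack([
        x1[0] * P1[2] - P1[0],
        x1[1] * P1[2] - P1[1],
        x2[0] * P2[2] - P2[0],
        x2[1] * P2[2] - P2[1],
    ])
    # The solution is the right singular vector of A with smallest singular value.
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]  # dehomogenize

# Synthetic setup (assumed values, not from the paper):
K = np.array([[800.0, 0, 320], [0, 800, 240], [0, 0, 1]])  # shared intrinsics
P1 = K @ np.hstack([np.eye(3), np.zeros((3, 1))])              # camera at origin
P2 = K @ np.hstack([np.eye(3), np.array([[-1.0], [0], [0]])])  # 1 m baseline

X_true = np.array([0.3, -0.2, 5.0])       # ground-truth 3D position
h = np.append(X_true, 1.0)                # homogeneous coordinates
x1 = (P1 @ h)[:2] / (P1 @ h)[2]           # projection into camera 1
x2 = (P2 @ h)[:2] / (P2 @ h)[2]           # projection into camera 2

X_est = triangulate(P1, P2, x1, x2)
print(X_est)  # recovers approximately [0.3, -0.2, 5.0] in this noise-free case
```

In practice the cross-camera association step (which the paper addresses with the CRF) must be solved first, since triangulating mismatched detections yields arbitrary 3D points.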

Related Material


[bibtex]
@InProceedings{Matsuda_2023_ICCV,
    author    = {Matsuda, Shusuke and Techasarntikul, Nattaon and Shimonishi, Hideyuki},
    title     = {Multi-Camera 3D Position Estimation Using Conditional Random Field},
    booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops},
    month     = {October},
    year      = {2023},
    pages     = {1908-1916}
}