Estimating Position & Velocity in 3D Space From Monocular Video Sequences Using a Deep Neural Network
Arturo Marban, Vignesh Srinivasan, Wojciech Samek, Josep Fernandez, Alicia Casals; Proceedings of the IEEE International Conference on Computer Vision (ICCV), 2017, pp. 1460-1469
Abstract
This work describes a regression model based on Convolutional Neural Networks (CNN) and Long Short-Term Memory (LSTM) networks for tracking objects from monocular video sequences. The target application is Vision-Based Sensor Substitution (VBSS). In particular, the tool-tip position and velocity in 3D space of a pair of surgical robotic instruments (SRI) are estimated for three surgical tasks, namely suturing, needle-passing and knot-tying. The CNN extracts features from individual video frames, and the LSTM network processes these features over time and continuously outputs a 12-dimensional vector with the estimated position and velocity values. A series of analyses and experiments are carried out on the regression model to reveal the benefits and drawbacks of different design choices...
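The 12-dimensional output is consistent with two instruments, each described by a 3D position and a 3D velocity (2 x (3 + 3) = 12). The following is a minimal sketch of how such a vector could be unpacked per instrument. The component ordering (position first, then velocity, instrument by instrument) and the names `ToolState` and `split_output` are assumptions made for illustration; the abstract does not specify the layout.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Vec3 = Tuple[float, float, float]


@dataclass
class ToolState:
    """Estimated tool-tip state of one instrument (hypothetical container)."""
    position: Vec3
    velocity: Vec3


def split_output(y: Sequence[float], num_tools: int = 2) -> List[ToolState]:
    """Split a flat regression output into per-instrument states.

    Assumed layout (not stated in the abstract):
    [px, py, pz, vx, vy, vz] for instrument 0, then the same for instrument 1.
    """
    per_tool = 6  # 3 position components + 3 velocity components
    if len(y) != num_tools * per_tool:
        raise ValueError(
            f"expected {num_tools * per_tool} values, got {len(y)}"
        )
    states = []
    for i in range(num_tools):
        base = i * per_tool
        px, py, pz, vx, vy, vz = y[base:base + per_tool]
        states.append(ToolState((px, py, pz), (vx, vy, vz)))
    return states
```

In a streaming setting, a function like this would be applied to the vector produced for each video frame, giving one position and velocity estimate per instrument per time step.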
Related Material
[pdf] [supp] [bibtex]
@InProceedings{Marban_2017_ICCV,
author = {Marban, Arturo and Srinivasan, Vignesh and Samek, Wojciech and Fernandez, Josep and Casals, Alicia},
title = {Estimating Position \& Velocity in 3D Space From Monocular Video Sequences Using a Deep Neural Network},
booktitle = {Proceedings of the IEEE International Conference on Computer Vision (ICCV) Workshops},
month = {Oct},
year = {2017}
}