Gutenberg Open Science: Robust Monocular Pose Estimation of Rigid 3D Objects in Real-Time

Please use this identifier to cite or link to this item: http://doi.org/10.25358/openscience-2815

Authors:	Tjaden, Henning
Title:	Robust Monocular Pose Estimation of Rigid 3D Objects in Real-Time
Online publication date:	18-Jan-2019
Year of first publication:	2019
Language:	English
Abstract:	Being able to measure the spatial motion of arbitrary objects with high accuracy and low latency is vital for numerous higher-level tasks in many fields of application. These include, but are not limited to, robotic perception, medical navigation and mixed reality systems. Such measurements are typically obtained by consecutively estimating the object's pose, i.e. its location and orientation in three-dimensional space, relative to a known frame of reference. The most successful approaches are based on optical sensors, such as digital cameras. But despite the large body of literature and active research on this issue, fast, robust and accurate 3D object pose estimation remains a key challenge in computer vision. This dissertation presents novel approaches to visual 3D object pose estimation from 2D images. The particular feature of the proposed solutions is that they operate in real-time while requiring only a single (monocular) camera. The main parts of this work describe an innovative active infrared LED marker-based system as well as a novel algorithm for passive markerless pose estimation, both developed in the course of this thesis. For the marker-based approach, two original, nearly co-planar LED patterns are proposed. These enable high-speed, single-image pose estimation of multiple markers and robustly avoid common pose ambiguities. The proposed markerless method presents a novel combination of region-based and direct photometric pose estimation. It is enabled by a new numerical pose optimization strategy derived for the region-based part as well as an innovative statistical object segmentation model. The overall approach thereby significantly improves robustness towards challenging conditions, such as dynamic lighting, cluttered backgrounds, different object appearances, occlusions and fast and complex motion, compared to the state of the art.
It is furthermore the first capable of estimating the poses of multiple arbitrarily textured objects in real-time on a commodity laptop. In addition, a new complex dataset dedicated to the task of monocular object pose tracking has been created and made publicly available. Both proposed pose estimation solutions are extensively evaluated in numerous experiments, on the proposed dataset as well as another popular public dataset. It is also shown that these solutions have been successfully applied in various practical scenarios, where they have enabled a variety of new problem-solving opportunities.
Abstract (translated from German):	The precise and immediate measurement of the spatial motion of objects (trajectories) is an essential basis for solving numerous more abstract problems in diverse fields of application. These include, among others, robotics, medical navigation and mixed reality systems. Such trajectories are typically computed by continuously determining the object's pose, i.e. its three-dimensional position and orientation, relative to a frame of reference. The most successful are optical approaches that use sensors such as digital cameras. However, despite extensive literature and active research on this topic, fast, stable and accurate pose estimation of 3D objects remains one of the greatest challenges in computer vision. This dissertation presents novel methods for pose estimation of 3D objects from 2D images. What distinguishes these approaches is that they work in real-time with only a single (monocular) camera. The main part describes a new active system based on infrared LED markers as well as a novel algorithm for passive markerless pose estimation, both developed in the course of this work. For the marker-based approach, two innovative LED patterns are presented. These enable high-speed pose estimation of multiple markers from a single image as well as the reliable avoidance of commonly occurring ambiguities. The developed markerless method is a novel combination of region-based and direct photometric approaches. It is enabled by a new numerical pose optimization and an innovative statistical segmentation model. The overall method is currently the most stable under conditions such as dynamic lighting, cluttered backgrounds, different object surfaces, occlusions and fast and complex motion. It is also the first method to compute the poses of multiple arbitrarily textured objects in real-time on a laptop. In addition, a new, complex dataset for monocular object pose tracking was created and made available to the community. Both methods are evaluated in numerous experiments, including on the author's own dataset and another popular one. It is further shown that the methods have already been used successfully in various real-world scenarios.
DDC:	004 Data processing
Institution:	Johannes Gutenberg-Universität Mainz
Department:	FB 08 Physik, Mathematik u. Informatik
Place:	Mainz
ROR:	https://ror.org/023b0x485
DOI:	http://doi.org/10.25358/openscience-2815
URN:	urn:nbn:de:hebis:77-diss-1000025478
Version:	Original work
Publication type:	Dissertation
License:	In Copyright
Information on rights of use:	https://rightsstatements.org/vocab/InC/1.0/
Extent:	X, 212 pages
Appears in collections:	JGU-Publikationen

Files in This Item:

	File	Description	Size	Format
	100002547.pdf		95.42 MB	Adobe PDF
