This page describes work on combining inertial sensors with cameras to improve the performance of computer vision algorithms. The work was carried out at the University of Bristol as part of the Bioinspired Visual Guided Locomotion Project.
I am now a research assistant on that project, investigating bioinspired models for vision-guided locomotion. Papers so far:
Fusing Inertial Data with Vision for Enhanced Image Understanding
Using Inertial Data to Enhance Image Segmentation - Knowing camera orientation can improve segmentation of outdoor scenes
This video shows an example of the orientation-enhanced surface segmenter running in real time in a separate thread (left shows the original input video, right shows the most recently processed frame). It is being run offline on a video sequence recorded from the view of an Oculus Rift.
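If you want to reproduce that real-time setup, the pattern is simple: the display loop always shows the live input, while a worker thread segments whichever frame is newest and publishes its most recent result. Below is a minimal Python/OpenCV sketch of this pattern; the segment() function and the video filename are placeholders of mine, not the actual segmenter or data.

import threading
import cv2
import numpy as np

latest_frame = None    # most recent input frame (shared with the worker)
latest_result = None   # most recently completed segmentation (shared)
lock = threading.Lock()
running = True

def segment(frame):
    # Placeholder for the orientation-enhanced segmenter: a real version
    # would also take the camera orientation recorded for this frame.
    grey = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return cv2.applyColorMap(grey, cv2.COLORMAP_JET)

def worker():
    # Always segments the newest available frame; if processing is slower
    # than the camera, intermediate frames are simply skipped.
    global latest_result
    while running:
        with lock:
            frame = None if latest_frame is None else latest_frame.copy()
        if frame is None:
            continue
        result = segment(frame)
        with lock:
            latest_result = result

cap = cv2.VideoCapture("head_mounted_sequence.avi")  # hypothetical filename
threading.Thread(target=worker, daemon=True).start()

while True:
    ok, frame = cap.read()
    if not ok:
        break
    with lock:
        latest_frame = frame
        result = latest_result
    if result is None:
        result = np.zeros_like(frame)
    # Left: live input; right: most recently processed frame.
    cv2.imshow("input | latest segmentation", np.hstack([frame, result]))
    if cv2.waitKey(1) == 27:  # Esc quits
        break

running = False
cap.release()
cv2.destroyAllWindows()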
These two videos show side by side what happens when using only visual features from points (left) and when adding inertial information and line regions (right). Note that every frame is processed, to show what happens when processing time isn't an issue (i.e. not run in real time).
Finally, these videos show the new version (as described in the CCIS paper), which uses a fully connected dense CRF instead of an MRF for segmentation, giving much higher-fidelity results. Although it takes longer to process each frame, it is still just about fast enough for real-time use when the processing runs in its own thread.
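As a rough illustration of the dense CRF step (not the code used in the paper), the pydensecrf package implements the Krähenbühl and Koltun fully connected CRF and can refine per-pixel class probabilities as sketched below; the kernel parameters are the package's usual example values, not ones taken from the paper.

import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def densecrf_refine(rgb, probs, iters=5):
    # rgb:   H x W x 3 uint8 image
    # probs: num_classes x H x W per-pixel class probabilities (softmax output)
    n_classes, h, w = probs.shape
    d = dcrf.DenseCRF2D(w, h, n_classes)
    d.setUnaryEnergy(unary_from_softmax(probs))   # unaries are -log(P)
    # Smoothness kernel: discourages label changes between nearby pixels.
    d.addPairwiseGaussian(sxy=3, compat=3)
    # Appearance kernel: nearby pixels with similar colour prefer the same label.
    d.addPairwiseBilateral(sxy=80, srgb=13,
                           rgbim=np.ascontiguousarray(rgb), compat=10)
    q = d.inference(iters)
    return np.argmax(q, axis=0).reshape(h, w)     # refined label map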
The dataset for the VISAPP paper is available to download here (includes a readme file).
It comprises six sets of images from different video sequences, gathered with a head-mounted camera. Each image is annotated with its orientation from the Oculus Rift's inertial sensor.
The original video sequences from which these data are taken are not currently available online, but I could be persuaded to share them if you are interested. They consist of around 90 minutes of footage taken with an IDS uEye USB 2.0 camera strapped to the front of an Oculus Rift, with the live camera view being all the participant could see (this was fun). Orientation data (pitch, roll, yaw) are recorded for every frame. This would be an ideal dataset for similar work, or for anything else combining vision with orientation.
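To give an idea of how the per-frame orientation can be combined with the images, the sketch below converts pitch/roll/yaw into a rotation matrix and recovers the world "up" direction in camera coordinates, which is the kind of cue an orientation-aware segmenter can exploit. The Euler-angle convention, units and file layout here are assumptions made for illustration; the dataset's readme describes the actual format.

import csv
import numpy as np

def rotation_from_euler(pitch, roll, yaw):
    # World-from-camera rotation, assuming pitch about X, yaw about Y and
    # roll about Z, applied as yaw * pitch * roll, with angles in radians.
    cp, sp = np.cos(pitch), np.sin(pitch)
    cy, sy = np.cos(yaw), np.sin(yaw)
    cr, sr = np.cos(roll), np.sin(roll)
    rx = np.array([[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]])   # pitch
    ry = np.array([[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]])   # yaw
    rz = np.array([[cr, -sr, 0.0], [sr, cr, 0.0], [0.0, 0.0, 1.0]])   # roll
    return ry @ rx @ rz

def up_in_camera(pitch, roll, yaw):
    # Unit vector of the world "up" direction expressed in the camera frame,
    # assuming world up is +Y.
    r = rotation_from_euler(pitch, roll, yaw)
    return r.T @ np.array([0.0, 1.0, 0.0])

# Hypothetical per-frame orientation file: frame_id, pitch, roll, yaw (no header).
with open("orientations.csv") as f:
    for frame_id, pitch, roll, yaw in csv.reader(f):
        up = up_in_camera(float(pitch), float(roll), float(yaw))
        print(frame_id, up)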
Home: www.osianh.com
The Visual Information Laboratory www.bris.ac.uk/vi-lab