Daniel Wedge
July 2007


Video sequence synchronization is necessary for any computer vision application that integrates data from multiple simultaneously recorded video sequences. With the increased availability of video cameras, either as dedicated devices or as components within digital cameras or mobile phones, a large volume of video data is available to a growing range of computer vision applications that process multiple video sequences. To ensure that the output of these applications is correct, accurate video sequence synchronization is essential.

Whilst hardware synchronization methods can embed timestamps into each sequence on the fly and require no post-processing, they require specialized hardware and a camera network that is set up in advance. On the other hand, computer vision-based software synchronization algorithms can be used to post-process video sequences recorded by cameras that are not networked, such as common consumer hand-held video cameras or cameras embedded in mobile phones, or to synchronize historical videos for which hardware synchronization was not possible.

The current state-of-the-art software algorithms vary in their input and output requirements and camera configuration assumptions. Many algorithms operate without requiring knowledge of the geometry relating the two cameras; however, for most of these algorithms it is necessary to specify the sequences' frame rate ratio. Only a handful of algorithms exist that recover the frame rate ratio of two sequences, and of these, most require the cameras' geometry to be known. Most synchronization algorithms require frame-to-frame object motion or object trajectories throughout an entire sequence to be provided as input data from which the synchronization can be recovered. In this thesis, I present three algorithms for recovering both the frame offset and the ratio of the frame rates of pairs of video sequences. One of my algorithms does not require trajectory data as input, and only one of the three algorithms requires weak camera calibration.

Firstly, I present an algorithm that uses the motion of a single tracked object to synchronize two video sequences recorded by stationary cameras with fixed intrinsic parameters. The algorithm is unique in that it synchronizes a pair of video sequences using the trajectory of a single object tracked throughout each sequence and does not require camera calibration. A coarse-to-fine approach is used. At the coarse level, each sequence is divided into a number of sub-sequences, and corresponding sub-sequences from each sequence are determined. From this, an initial estimate of the ratio of frame rates is proposed and a voting scheme is used to provide bounds on the frame offset. The fine level of synchronization involves a search for the frame offset and frame rate ratio that minimize a measure of synchronization based on epipolar geometry. It is shown that this measure, whilst not new, is preferred for use in place of the reprojection error used by many synchronization algorithms.
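
The fine-level search can be illustrated with a minimal sketch. It is not the implementation from the thesis: it assumes, for illustration only, that a fundamental matrix F relating the two views is already available, that frame i of the first sequence corresponds to frame ratio * i + offset of the second, that the second trajectory may be linearly interpolated between frames, and that an exhaustive grid search stands in for whatever search strategy is actually used. All function names (epipolar_distance, interpolate, sync_cost, grid_search) and the min_overlap parameter are hypothetical.

```python
import math


def epipolar_distance(F, x1, x2):
    """Distance from point x2 (view 2) to the epipolar line F * x1, or None if degenerate."""
    # Epipolar line (a, b, c) in view 2 induced by the homogeneous point (x1, y1, 1).
    a = F[0][0] * x1[0] + F[0][1] * x1[1] + F[0][2]
    b = F[1][0] * x1[0] + F[1][1] * x1[1] + F[1][2]
    c = F[2][0] * x1[0] + F[2][1] * x1[1] + F[2][2]
    norm = math.hypot(a, b)
    if norm == 0.0:
        return None  # no usable line (e.g. x1 lies at the epipole)
    return abs(a * x2[0] + b * x2[1] + c) / norm


def interpolate(track, t):
    """Linearly interpolate a 2D track at fractional frame index t (None if out of range)."""
    if len(track) < 2 or t < 0 or t > len(track) - 1:
        return None
    i = min(int(t), len(track) - 2)
    w = t - i
    return ((1 - w) * track[i][0] + w * track[i + 1][0],
            (1 - w) * track[i][1] + w * track[i + 1][1])


def sync_cost(F, track1, track2, ratio, offset, min_overlap=5):
    """Mean point-to-epipolar-line distance under the model t2 = ratio * t1 + offset."""
    total, count = 0.0, 0
    for i, p in enumerate(track1):
        q = interpolate(track2, ratio * i + offset)
        if q is None:
            continue
        d = epipolar_distance(F, p, q)
        if d is None:
            continue
        total += d
        count += 1
    # Hypothetical safeguard: reject candidates with too few overlapping frames.
    return total / count if count >= min_overlap else math.inf


def grid_search(F, track1, track2, ratios, offsets):
    """Return (cost, ratio, offset) for the candidate pair with the lowest cost."""
    return min((sync_cost(F, track1, track2, r, d), r, d)
               for r in ratios for d in offsets)
```

In this sketch the candidate lists passed to grid_search would be centred on the coarse-level estimate of the frame rate ratio and restricted to the offset bounds produced by the voting scheme, so only a small neighbourhood is searched.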

Next, I describe an approach that synchronizes two video sequences in which an object exhibits ballistic motion. Given the epipolar geometry relating the two cameras and the imaged ballistic trajectory of an object, the algorithm uses a novel iterative approach that exploits object motion to rapidly determine pairs of temporally corresponding frames. This algorithm accurately synchronizes videos recorded at different frame rates and takes few iterations to converge to sub-frame accuracy. Whereas the first algorithm integrates tracking data from all frames to synchronize the sequences as a whole, this algorithm recovers the synchronization by locating pairs of temporally corresponding frames in each sequence.

Finally, I introduce an algorithm for synchronizing two video sequences recorded by uncalibrated stationary cameras. This approach is unique in that it recovers both the frame rate ratio and the frame offset of the two sequences by finding matching space-time interest points that represent events in each sequence; the algorithm does not require object tracking. RANSAC-based approaches that take a set of putatively matching interest points and recover either a homography or a fundamental matrix relating a pair of still images are well known. This algorithm extends these techniques using space-time interest points in place of spatial features, and uses nested instances of RANSAC to also recover the frame rate ratio and frame offset of a pair of video sequences.
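
The temporal part of such an estimate can be sketched as follows. This is a simplified illustration rather than the nested scheme described above: it runs a single RANSAC over the time coordinates of putatively matched space-time interest points only, ignores their spatial coordinates, and assumes the linear model t2 = ratio * t1 + offset. The function name ransac_temporal, the inlier threshold, the iteration count and the final least-squares refit are all assumptions made for this example.

```python
import random


def ransac_temporal(matches, threshold=0.5, iterations=500, seed=0):
    """Estimate (ratio, offset, inliers) from putative matches [(t1, t2), ...].

    Assumed model: t2 = ratio * t1 + offset, with t1 and t2 the frame times of
    a matched event in sequences 1 and 2. Outliers are mismatched events.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    best_inliers = []
    for _ in range(iterations):
        # Minimal sample: two matches define a candidate line.
        (a1, a2), (b1, b2) = rng.sample(matches, 2)
        if a1 == b1:
            continue  # degenerate sample, ratio undefined
        ratio = (b2 - a2) / (b1 - a1)
        offset = a2 - ratio * a1
        inliers = [(t1, t2) for t1, t2 in matches
                   if abs(ratio * t1 + offset - t2) <= threshold]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    if len(best_inliers) < 2:
        raise ValueError("no consistent temporal model found")

    # Refit by ordinary least squares on the largest consensus set.
    n = len(best_inliers)
    mean1 = sum(t1 for t1, _ in best_inliers) / n
    mean2 = sum(t2 for _, t2 in best_inliers) / n
    var1 = sum((t1 - mean1) ** 2 for t1, _ in best_inliers)
    cov = sum((t1 - mean1) * (t2 - mean2) for t1, t2 in best_inliers)
    ratio = cov / var1
    return ratio, mean2 - ratio * mean1, best_inliers
```

In a nested arrangement, a loop of this kind would be combined with a second RANSAC that fits a homography or fundamental matrix to the spatial coordinates of the same matches, so that a match counts as an inlier only if it agrees with both the temporal and the spatial model.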

In this thesis, it is demonstrated that each of the above algorithms can accurately recover the frame rate ratio and frame offset of a range of real video sequences. Each algorithm makes a contribution to the body of video sequence synchronization literature, and it is shown that the synchronization problem can be solved using a range of approaches.