Segmentation of Image Sequence into Scene-Coherent Parts
Abstract: The Narrative Clip is a small wearable camera that takes an image every 30 seconds. By wearing the clip for a whole day, a user captures a long image sequence of the day's events. In this thesis we segment such a sequence into its individual events automatically. Multiple sequences are segmented by humans in order to find a ground truth for each sequence. The ground truth is used to determine the performance of the algorithm and also how well individual humans are able to segment a sequence. The method presented here takes a sequence of images and tries to find the locations where the mean of the descriptors changes. The images are described using various image descriptors that capture colors, lines, textures and similar low-level features. We also introduce an indoor/outdoor classification method that combines an SVM and an HMM. The classification method is combined with the segmentation for each descriptor in order to create a combined segmentation. The indoor/outdoor classification method achieves an accuracy of 97%, which we consider a very good result. The best human segmentation has an F1-score of 0.82, while the best automatic segmentation method has an F1-score of 0.43. The conclusion is that the current system is not suitable for practical use.
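To illustrate the idea of finding where the mean of the descriptors changes, the following is a minimal sketch and not the thesis's actual algorithm. It assumes that each image is already represented as a fixed-length descriptor vector and that the sequence contains exactly one event boundary. The split criterion (minimizing the summed squared deviation from the segment means on each side) and the function name `find_mean_change` are assumptions made for this example.

```python
from math import inf
from typing import List, Sequence


def _segment_cost(vectors: Sequence[Sequence[float]]) -> float:
    """Sum of squared deviations of the vectors from their own mean."""
    n = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[d] for v in vectors) / n for d in range(dim)]
    return sum((v[d] - mean[d]) ** 2 for v in vectors for d in range(dim))


def find_mean_change(descriptors: List[Sequence[float]]) -> int:
    """Return the index k at which a new segment starts.

    Hypothetical criterion: choose k so that descriptors[:k] and
    descriptors[k:] are each as close as possible to their own mean.
    """
    if len(descriptors) < 2:
        raise ValueError("need at least two descriptors to place a boundary")

    best_k, best_cost = 1, inf
    for k in range(1, len(descriptors)):
        cost = _segment_cost(descriptors[:k]) + _segment_cost(descriptors[k:])
        if cost < best_cost:
            best_k, best_cost = k, cost
    return best_k


# Toy descriptors: three images from one scene, then two from another.
toy_sequence = [[0.0, 0.1], [0.1, 0.0], [0.0, 0.0], [1.0, 0.9], [0.9, 1.0]]
boundary = find_mean_change(toy_sequence)
print(boundary)
```

A full day's sequence contains many events, so a single-boundary search like this would have to be applied repeatedly or replaced by a multi-boundary formulation; the abstract does not say which approach the thesis takes.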