Automating the process of dividing a map image into sections: Using Tesseract OCR and pixel traversing

University essay from Linköpings universitet/Institutionen för datavetenskap

Abstract: This paper presents an algorithm for automatically dividing a simple floor plan into sections. Each section has a name, a size, and a location on the image, all of which the algorithm extracts automatically as a step toward converting a simple image into an interactive map. Section labels are obtained with tesseractJS, a wrapper for Tesseract OCR, which extracts both the text and its location. Section borders are found by pixel traversal combined with CIE76 color comparison, which yields the size and location of each section. The algorithm's performance was measured on three different maps using the metrics correctness, quality, completeness, Jaccard index, and name accuracy. Across the three maps, the per-map values for correctness, quality, completeness, average Jaccard index, and average name accuracy ranged from a low of 48% to a high of 100%, which indicates the potential of such an algorithm for automating the task of sectioning an image.
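The abstract names three techniques that a short example can make concrete: CIE76 color comparison, pixel traversal, and the Jaccard index. The following is a minimal Python sketch and not the essay's implementation, which the abstract does not describe in detail. It assumes the image is already available as a grid of CIELAB triples, that traversal is a 4-connected breadth-first flood fill from a seed pixel, and that a color difference of 2.3 is a reasonable default threshold. All function names and the threshold are hypothetical choices made for illustration.

```python
from collections import deque
from math import sqrt


def delta_e_cie76(lab1, lab2):
    """CIE76 color difference: Euclidean distance between two (L, a, b) triples."""
    return sqrt(sum((c1 - c2) ** 2 for c1, c2 in zip(lab1, lab2)))


def traverse_section(grid, start, threshold=2.3):
    """Return the set of (row, col) pixels reachable from `start` through
    4-connected neighbors whose color is within `threshold` of the start color.

    Assumption: `grid` is a list of rows, each a list of (L, a, b) triples.
    """
    rows, cols = len(grid), len(grid[0])
    seed_color = grid[start[0]][start[1]]
    seen = {start}
    queue = deque([start])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and (nr, nc) not in seen:
                # Stop at pixels that differ too much, e.g. a wall line.
                if delta_e_cie76(grid[nr][nc], seed_color) <= threshold:
                    seen.add((nr, nc))
                    queue.append((nr, nc))
    return seen


def bounding_box(pixels):
    """Smallest (min_row, min_col, max_row, max_col) box containing the pixels."""
    rs = [r for r, _ in pixels]
    cs = [c for _, c in pixels]
    return min(rs), min(cs), max(rs), max(cs)


def jaccard_index(a, b):
    """Intersection over union of two pixel sets; two empty sets count as 1.0."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0
```

In this sketch, the seed pixel could be taken from the label location reported by the OCR step, and `jaccard_index` could compare a traversed section against a hand-labeled one. Both choices are assumptions rather than details confirmed by the abstract.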
