Automated Extraction and Retrieval of Metadata by Data Mining : a Case Study of Mining Engine for National Land Survey Sweden

University essay from Institutionen för teknik och byggd miljö

Abstract: Metadata is the important information describing geographical data resources and their key elements. It is used to guarantee the availability and accessibility of the data. ISO 19115 is a metadata standard for geographical information, making the geographical metadata shareable, retrievable, and understandable at the global level. In order to cope with the massive, high-dimensional and high-diversity nature of geographical data, data mining is an applicable method to discover the metadata. This thesis develops and evaluates an automated mining method for extracting metadata from the data environment on the Local Area Network at the National Land Survey of Sweden (NLS). These metadata are prepared and provided across Europe according to the metadata implementing rules for the Infrastructure for Spatial Information in Europe (INSPIRE). The metadata elements are defined according to the numerical formats of four different data entities: document data, time-series data, webpage data, and spatial data. For evaluating the method for further improvement, a few attributes and corresponding metadata of geographical data files are extracted automatically as metadata record in testing, and arranged in database. Based on the extracted metadata schema, a retrieving functionality is used to find the file containing the keyword of metadata user input. In general, the average success rate of metadata extraction and retrieval is 90.0%. The mining engine is developed in C# programming language on top of the database using SQL Server 2005. Lucene.net is also integrated with Visual Studio 2005 to build an indexing framework for extracting and accessing metadata in database.

  AT THIS PAGE YOU CAN DOWNLOAD THE WHOLE ESSAY. (follow the link to the next page)