Multilingual Text Robots for Abstract Wikipedia – Using Grammatical Framework to generate multilingual articles on Swedish localities

University essay from Göteborgs universitet/Institutionen för data- och informationsteknik

Abstract: The vast number of Wikipedia articles and languages makes Wikipedia costly to maintain, where cost means the time and dedication required to make every article available in every language. This paper describes the development of a multilingual text robot that uses data from the Wikidata database to generate articles on Swedish localities in several languages, and discusses how such a robot can help reduce that cost. The text robot was developed using the functional programming language Grammatical Framework, the query language SPARQL, and Python. Swedish localities were chosen as the topic because Sweden has a large number of localities, because Wikipedia articles on them are sparse outside the Swedish edition, and because the same structure, with only slight variation, can describe every locality. The resulting articles contain approximately five sentences describing the locality, a bullet list of events occurring in the locality, and accompanying media, such as a picture of the locality or a weather forecast for the upcoming week. The results suggest that a text robot may be a good approach to reducing the cost of Wikipedia, since it produces over a thousand articles in several languages. Notably, all project group members are bachelor’s students with no previous knowledge of Grammatical Framework or linguistics, which shows that a text robot can be developed with limited prior knowledge.
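
The pipeline named in the abstract (query Wikidata with SPARQL, then turn each record into sentences in several languages) can be illustrated with a minimal Python sketch. It is not the essay's implementation. The function names `build_query` and `render`, the sentence templates, and the sample record are hypothetical. The Wikidata class identifier for "locality" is left as a caller-supplied parameter because the abstract does not state which class was used. Plain string templates stand in for Grammatical Framework here; a real GF grammar also handles inflection and agreement, which templates do not.

```python
import re
from string import Template

# Wikidata identifiers used in the query (these exist on Wikidata):
#   P31 = instance of, P17 = country, P1082 = population, Q34 = Sweden
QUERY = Template("""\
SELECT ?item ?itemLabel ?population WHERE {
  ?item wdt:P31 wd:$class_qid ;
        wdt:P17 wd:Q34 .
  OPTIONAL { ?item wdt:P1082 ?population . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "$lang" . }
}
LIMIT $limit
""")

# Hypothetical one-sentence templates, one per output language.
# Assumption: string templates replace GF linearization for illustration only.
SENTENCES = {
    "en": "{name} is a locality in {municipality}, Sweden, with {population} inhabitants.",
    "sv": "{name} är en tätort i {municipality}, Sverige, med {population} invånare.",
}


def build_query(class_qid, lang="en", limit=10):
    """Return a SPARQL query string for items of the given class located in Sweden.

    class_qid is supplied by the caller; the sketch does not assume which
    Wikidata class represents a Swedish locality.
    """
    if not re.fullmatch(r"Q\d+", class_qid):
        raise ValueError(f"not a Wikidata item id: {class_qid!r}")
    return QUERY.substitute(class_qid=class_qid, lang=lang, limit=int(limit))


def render(record, lang):
    """Fill the sentence template for one language with one record's fields."""
    try:
        template = SENTENCES[lang]
    except KeyError:
        raise ValueError(f"no template for language {lang!r}") from None
    return template.format(**record)


if __name__ == "__main__":
    # "Q0" is a placeholder, not a real Wikidata class.
    print(build_query("Q0", limit=5))
    # Made-up record, not real Wikidata data.
    sample = {"name": "Exempelby", "municipality": "Exempel kommun", "population": 1234}
    for language in SENTENCES:
        print(render(sample, language))
```

The query string could be sent to a SPARQL endpoint such as the Wikidata Query Service; the sketch stops short of any network call so that it stays self-contained.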
