Georeferencing for Societal Safety
Abstract
This thesis describes a project that aims to develop a geoparser for Norwegian-language text. A geoparser is a tool that reads a piece of text, extracts any potential location mentions, and then resolves these location mentions to their real-world toponyms. At the time this thesis was written, there were no known geoparsers available that specialize exclusively in Norwegian text. The solution produced here is therefore unique in this sense.

The task of geoparsing is non-trivial, as many geographical locations often share the same name. The geoparser must therefore be able to disambiguate a location mention using whatever clues are available to it. In this project, the geoparser tries to infer geographical regions of relevance and to identify potential geographical hierarchies between the different location mentions in the text. It also relies on common geoparsing heuristics, such as treating population size as a strong indicator of toponym importance. To find potential candidates for a location mention, it uses GeoNames, a geographical gazetteer containing entries for more than 11 million toponyms from all over the world. It also uses Stedsnavn, a Norwegian dataset containing over 1 million Norwegian toponym entries.

Basic testing is done to check the viability of the solution, but evaluating the geoparser in general is difficult, as there are no suitable datasets with which to test it.
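To make the disambiguation idea concrete, here is a minimal sketch of how candidates from a gazetteer might be ranked. It is not the thesis implementation. It assumes a hypothetical, simplified `Candidate` record (loosely modelled on GeoNames fields such as country code, first-level admin code and population), a log-scaled population prior, a bonus for Norwegian toponyms, and a bonus for candidates in a region already associated with other mentions in the text. That last bonus is a crude stand-in for the region and hierarchy inference described above. The bonus weights and the toy "Storvik" entries are illustrative values only, not real GeoNames data.

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional, Set


@dataclass(frozen=True)
class Candidate:
    # Hypothetical, simplified gazetteer entry (not the real GeoNames schema).
    geoname_id: int
    name: str
    country_code: str
    admin1_code: str
    population: int


def score(candidate: Candidate, context_admin1: Set[str],
          preferred_country: str = "NO",
          country_bonus: float = 2.0, region_bonus: float = 3.0) -> float:
    # Population prior, log-scaled so a place 10x larger is not 10x more likely.
    s = math.log10(candidate.population + 1)
    # Assumed heuristic: prefer toponyms in the country the text is about.
    if candidate.country_code == preferred_country:
        s += country_bonus
    # Assumed heuristic: prefer candidates in a region that other
    # mentions in the same text have already been linked to.
    if candidate.admin1_code in context_admin1:
        s += region_bonus
    return s


def resolve(candidates: Iterable[Candidate],
            context_admin1: Set[str]) -> Optional[Candidate]:
    # Pick the highest-scoring candidate, or None if the gazetteer had no match.
    candidates = list(candidates)
    if not candidates:
        return None
    return max(candidates, key=lambda c: score(c, context_admin1))


# Toy candidates for an ambiguous mention (illustrative values only).
candidates = [
    Candidate(1, "Storvik", "NO", "R1", 20_000),
    Candidate(2, "Storvik", "NO", "R2", 500),
    Candidate(3, "Storvik", "SE", "R3", 50_000),
]

if __name__ == "__main__":
    print(resolve(candidates, set()).geoname_id)   # no regional context
    print(resolve(candidates, {"R2"}).geoname_id)  # other mentions point to R2
```

With no regional context, the population and country bonuses favour the larger Norwegian entry. Once other mentions in the text point to region `R2`, the smaller local entry wins. This reflects the intuition that context should be able to override the population heuristic.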