
dc.contributor.advisor: Balog, Krisztian
dc.contributor.advisor: Engan, Kjersti
dc.contributor.advisor: Eftestøl, Trygve
dc.contributor.author: Linjordet, Trond
dc.date.accessioned: 2022-11-25T11:50:20Z
dc.date.available: 2022-11-25T11:50:20Z
dc.date.issued: 2022-11
dc.identifier.citation: A Critical Look at the Evaluation of Knowledge Graph Question Answering by Trond Linjordet, Stavanger: University of Stavanger, 2022 (PhD thesis UiS, no. 671)
dc.identifier.isbn: 978-82-8439-130-4
dc.identifier.issn: 1890-1387
dc.identifier.uri: https://hdl.handle.net/11250/3034091
dc.description: PhD thesis in Information technology
dc.description.abstract: The field of information retrieval (IR) is concerned with systems that “make a given stored collection of information items available to a user population” [111]. The way in which information is made available to the user depends on how this broad concern of IR is formulated into specific tasks by which a system should address a user’s information need [85]. The specific IR task also dictates how the user may express their information need. The classic IR task is ad hoc retrieval, where the user issues a query to the system and receives in return a list of documents ranked by each document’s estimated relevance to the query [85]. However, it has long been acknowledged that users are often looking for answers to questions, rather than an entire document or a ranked list of documents [17, 141]. Question answering (QA) is thus another IR task; it comes in many flavors, but overall consists of taking in a user’s natural language (NL) question and returning an answer.

This thesis describes work done within the scope of the QA task. The primary focus is the flavor of QA called knowledge graph question answering (KGQA), which enables QA with factual questions against structured data in the form of a knowledge graph (KG). This means the KGQA system addresses a structured representation of knowledge rather than, as in other QA flavors, an unstructured prose context. KGs have the benefit that, given some identified entities or predicates, all associated properties are available and relationships can be utilized. KGQA thus enables users to access structured data using only NL questions, without requiring formal query language expertise. Even so, the construction of satisfactory KGQA systems remains a challenge.

Machine learning with deep neural networks (DNNs) is a far more promising approach than manually engineering retrieval models [29, 56, 130]. The current era dominated by DNNs began with seminal work on computer vision, where the deep learning paradigm demonstrated its first cases of “superhuman” performance [32, 71]. Subsequent work in other applications has also demonstrated “superhuman” performance with DNNs [58, 87]. As a result of its early position, and hence longer history, as a leading application of deep learning, computer vision with DNNs has been bolstered by much work on augmenting [120] or synthesizing [94] additional training data. The difficulty with machine learning approaches to KGQA appears to rest in large part with the limited volume, quality, and variety of available datasets for this task. Compared to labeled image data for computer vision, the problems of data collection, augmentation, and synthesis are only solved to a limited extent for QA, and especially for KGQA. There are few datasets for KGQA overall, and little previous work has found unsupervised or semi-supervised learning approaches to address the sparsity of data. Instead, neural network approaches to KGQA rely on either fully or weakly supervised learning [29].

We are thus concerned with neural models trained in a supervised setting to perform QA tasks, especially of the KGQA flavor. Given a clear task to delegate to a computational system, we want the task performed as well as possible. However, what methodological elements are important to ensure good system performance within the chosen scope? How should the quality of system performance be assessed?
This thesis describes work done to address these overarching questions through a number of more specific research questions. Altogether, we designate the topic of this thesis as KGQA evaluation, which we address in a broad sense encompassing four subtopics: from (1) the impact of the volume of training data on performance and (2) the information leakage between training and test splits due to unhygienic data partitioning, through (3) the naturalness of the NL questions produced by a common approach to generating KGQA datasets, to (4) the axiomatic analysis and development of evaluation measures for a specific flavor of the KGQA task. Each of the four subtopics is informed by previous work, but in this thesis we aim to critically examine the assumptions of previous work to uncover, verify, or address weaknesses in current practices surrounding KGQA evaluation.
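To make the KGQA setting described in the abstract concrete, here is a minimal illustrative sketch in Python. It is not taken from the thesis; the toy triples and the answer helper are hypothetical. It shows a KG stored as subject-predicate-object triples, answered by triple lookup once a factual NL question has been mapped to an entity and a predicate.

    # Minimal illustrative sketch (hypothetical, not from the thesis):
    # a KG as a set of (subject, predicate, object) triples.
    KG = {
        ("Norway", "capital", "Oslo"),
        ("Norway", "currency", "Norwegian krone"),
        ("Oslo", "country", "Norway"),
    }

    def answer(subject: str, predicate: str) -> list[str]:
        """Return all objects linked to `subject` via `predicate`."""
        return [o for (s, p, o) in KG if s == subject and p == predicate]

    # A KGQA system must first derive the structured query from the NL
    # question, e.g. "What is the capital of Norway?" -> ("Norway", "capital").
    print(answer("Norway", "capital"))  # ['Oslo']

The lookup itself is trivial; the hard part, and the reason the volume and quality of training data matter, is learning the mapping from the NL question to the structured query.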
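Similarly, subtopic (2) above can be illustrated with a hedged sketch of group-aware data partitioning. The template-id grouping key below is an assumption made for illustration, not necessarily the thesis's method: questions that share a generation template stay on one side of the split, so near-duplicates cannot leak from training into test the way they can under a naive random split over individual questions.

    import random

    def group_split(items, key, test_frac=0.2, seed=42):
        """Split `items` so no group (by `key`) appears in both partitions."""
        groups = sorted({key(x) for x in items})
        random.Random(seed).shuffle(groups)
        n_test = max(1, int(len(groups) * test_frac))
        test_groups = set(groups[:n_test])
        train = [x for x in items if key(x) not in test_groups]
        test = [x for x in items if key(x) in test_groups]
        return train, test

    # Toy questions tagged with the (hypothetical) template that generated them.
    questions = [
        {"text": "What is the capital of Norway?",  "template": "capital-of-X"},
        {"text": "What is the capital of Hungary?", "template": "capital-of-X"},
        {"text": "Who directed Alien?",             "template": "director-of-X"},
        {"text": "Who directed Solaris?",           "template": "director-of-X"},
    ]
    train, test = group_split(questions, key=lambda q: q["template"])
    # All "capital-of-X" questions now land in the same partition, so a model
    # cannot be rewarded at test time for memorizing a template seen in training.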
dc.language.iso: eng
dc.publisher: University of Stavanger, Norway
dc.relation.ispartofseries: PhD thesis UiS;671
dc.rights: Copyright the author
dc.rights: Attribution 4.0 International (Navngivelse 4.0 Internasjonal)
dc.rights.uri: http://creativecommons.org/licenses/by/4.0/deed.no
dc.subject: information retrieval
dc.subject: informasjonsgjenfinning (information retrieval)
dc.subject: informasjonsteknologi (information technology)
dc.title: A Critical Look at the Evaluation of Knowledge Graph Question Answering
dc.type: Doctoral thesis
dc.subject.nsi: VDP::Mathematics and natural science: 400::Information and communication science: 420
dc.source.pagenumber: 131

