Fusion-based information retrieval

Zhang, Shuo

Master thesis

Open

Zhang_Shuo.pdf (2.122Mb)

Permanent link

http://hdl.handle.net/11250/2413906

Date issued

2016-06-14

Collections

Student theses (TN-IDE) [835]

Abstract

Information retrieval techniques help us extract information from a huge number of information sources. Web search engines are typical commercial systems that implement information retrieval techniques, and their popularity keeps growing as search demand increases.

Users’ requirements for web search vary widely. They may search for entities such as music, people, locations, or products, or for verticals such as “shopping”, “news”, or “images”. Each of these entities or verticals may be described in multiple documents and possibly in additional sources. As a result, when a retrieval system searches for objects associated with multiple documents, it needs to “fuse” information from those documents. There are generally two ways to do this. One strategy is “early” fusion, where a term-based representation is built for each object (e.g., entity or vertical). The other is “late” fusion, where relevant documents are retrieved first and their scores are then combined. In this project, two general fusion strategies, the object-centric model and the document-centric model, are introduced and implemented across federated search and expert search.
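
The two strategies can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function names, the toy documents, and the term-overlap scoring function are all hypothetical, and it assumes that the object-centric model corresponds to early fusion and the document-centric model to late fusion, which the abstract suggests but does not state outright.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplifying assumption)."""
    return text.lower().split()


def score(query, tokens):
    """Toy relevance score: share of tokens that match a query term.

    This stands in for a real retrieval model; it is an assumption,
    not the probability estimation used in the thesis.
    """
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[term] for term in tokenize(query)) / len(tokens)


def early_fusion(query, docs, doc_to_object):
    """Pool each object's documents into one term-based
    representation, then score that representation once."""
    pooled = defaultdict(list)
    for doc_id, text in docs.items():
        pooled[doc_to_object[doc_id]].extend(tokenize(text))
    return {obj: score(query, tokens) for obj, tokens in pooled.items()}


def late_fusion(query, docs, doc_to_object):
    """Score every document on its own, then sum the document
    scores of each object."""
    totals = defaultdict(float)
    for doc_id, text in docs.items():
        totals[doc_to_object[doc_id]] += score(query, tokenize(text))
    return dict(totals)


# Hypothetical toy data: three documents mapped to two verticals.
docs = {
    "d1": "jazz music concert review",
    "d2": "music festival news",
    "d3": "laptop shopping guide",
}
doc_to_object = {"d1": "music", "d2": "music", "d3": "shopping"}

early = early_fusion("music news", docs, doc_to_object)
late = late_fusion("music news", docs, doc_to_object)
```

Both functions return one score per object, so the objects can be ranked by sorting on that score. In this toy setup the two strategies agree on the ranking, but their raw scores are not comparable, because late fusion sums per-document scores while early fusion normalizes over the pooled text.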

Federated search is the task of searching multiple text collections simultaneously. Queries are submitted to the subset of collections most likely to return relevant answers. Fusion-based methods are used to rank these collections by the similarity between the query and each collection. Expert search is the task of locating experts through their associated documents, topics, etc. An expert’s knowledge can be modeled from the associated documents; alternatively, topics can be modeled in order to find the relevant documents. In this project, the literature on the federated search, expert search, and blog distillation tasks and their experimental data sets is introduced; the last of these is intended for further experiments.

To evaluate the performance of the two fusion-based methods on different tasks, comparisons and analyses are carried out both between fusion methods and between probability estimation methods. The effectiveness and efficiency of the search results are the main evaluation criteria. Finally, conclusions are drawn based on the performance of the object-centric and document-centric models.

Description

Master's thesis in Computer science

Publisher

University of Stavanger, Norway

Series

Masteroppgave/UIS-TN-IDE/2016;

Except where otherwise noted, this item is licensed under Attribution 3.0 Norway (Navngivelse 3.0 Norge)