Fusion-based information retrieval
Master thesis
Permanent link
http://hdl.handle.net/11250/2413906
Publication date
2016-06-14
Collections
- Studentoppgaver (TN-IDE) [823]
Abstract
Information retrieval techniques help us extract information from very large
collections of sources. A web search engine is a typical commercial system that
implements information retrieval techniques, and such systems are becoming more
popular as the demand for search grows.
Users’ requirements for web search vary widely. They may search for entities
such as music, people, locations, or products, or for verticals such as
“shopping”, “news”, or “images”. Each of these entities or verticals may be
described in multiple documents and possibly in additional sources. As a result,
when an information retrieval system searches for objects associated with
multiple documents, it needs to “fuse” information from those documents. There
are normally two ways to do this. One strategy is “early” fusion, where a
term-based representation is built for each object (e.g., an entity or vertical).
The other strategy is “late” fusion, where relevant documents are retrieved first
and their scores are then combined. In this project, two general fusion
strategies, the object-centric model and the document-centric model respectively,
are introduced and implemented for federated search and expert search.
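The difference between the two strategies can be illustrated with a minimal sketch. It is not the thesis's implementation: the relative term-frequency score, the function names, and the sample data are all assumptions chosen for illustration, and a real system would use a proper retrieval model.

```python
from collections import Counter, defaultdict


def tf_score(query_terms, tokens):
    """Toy relevance score (an assumption): summed relative frequency of the query terms."""
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    return sum(counts[t] for t in query_terms) / len(tokens)


def early_fusion(query_terms, docs, doc_to_object):
    """Object-centric: pool each object's documents into one term representation, then score it."""
    pooled = defaultdict(list)
    for doc_id, tokens in docs.items():
        pooled[doc_to_object[doc_id]].extend(tokens)
    return {obj: tf_score(query_terms, tokens) for obj, tokens in pooled.items()}


def late_fusion(query_terms, docs, doc_to_object):
    """Document-centric: score each document on its own, then sum the scores per object."""
    scores = defaultdict(float)
    for doc_id, tokens in docs.items():
        scores[doc_to_object[doc_id]] += tf_score(query_terms, tokens)
    return dict(scores)


# Hypothetical sample data: three tokenized documents linked to two objects.
docs = {
    "d1": ["fusion", "search", "ranking"],
    "d2": ["fusion", "retrieval"],
    "d3": ["music", "search"],
}
doc_to_object = {"d1": "alice", "d2": "alice", "d3": "bob"}
query = ["fusion"]

early = early_fusion(query, docs, doc_to_object)
late = late_fusion(query, docs, doc_to_object)
```

Both functions rank the same objects but aggregate at different points: the early variant combines text before scoring, while the late variant combines scores after retrieval. Summing is only one possible way to combine document scores.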
Federated search is the task of searching multiple text collections
simultaneously. Queries are submitted to the subset of collections that is most
likely to return relevant answers. Fusion-based methods are used to rank these
collections by the similarity between the query and each collection. Expert
search is the task of locating expertise through the associated documents,
topics, etc. An expert’s knowledge can be modeled based on the associated
documents; alternatively, topics can be modeled in order to find the documents.
In this project, the literature on the federated search, expert search, and blog
distillation tasks and their experimental data sets is introduced; the last of
these is intended for further experiments.
To evaluate the performance of the two fusion-based methods on different tasks,
comparisons and analyses are carried out both between fusion methods and between
probability estimation methods. The effectiveness and efficiency of the search
results are the main evaluation criteria. Finally, conclusions are drawn based on
the performance of the object-centric and document-centric models.
Description
Master's thesis in Computer science