Tweet-based event summarization
Master thesis
Date
2014-06-12
Collections
- Studentoppgaver (TN-IDE)
Abstract
In recent years, social media has become an ever-growing source of information. Facebook, Twitter, Instagram and other social media services have all grown to contain large amounts of data, written by anyone from everyday users to companies and institutions.
In this thesis, we explore the possibility of creating an event summarization system that summarizes events based on microblog posts published to Twitter. We design a website interface for displaying event-related data and store all tweets in a scalable solution using HBase. To determine a tweet’s relevance to an event, we introduce a two-step filtering technique: we first apply simple regular expression matching and then use a machine learning technique to predict the tweet’s relevance, based on feedback on previously accepted data.
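The two-step filter described above can be illustrated with a minimal sketch. It is not the implementation from the thesis: the keyword pattern, the example tweets and the names `EVENT_PATTERN`, `NaiveBayes` and `two_step_filter` are all assumptions made for illustration, and a small hand-written multinomial Naive Bayes stands in for the classifiers evaluated in the thesis.

```python
import math
import re
from collections import Counter

# Assumed event keywords; the thesis does not list its actual patterns.
EVENT_PATTERN = re.compile(r"\bworld\s?cup\b", re.IGNORECASE)


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def add_feedback(self, text, relevant):
        """Record one piece of user feedback on a previously seen tweet."""
        self.doc_counts[relevant] += 1
        self.word_counts[relevant].update(tokenize(text))

    def is_relevant(self, text):
        """Compare smoothed log scores for the relevant and irrelevant classes."""
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label in (True, False):
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            for token in tokenize(text):
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + len(vocab) + 1))
            scores[label] = score
        return scores[True] >= scores[False]


def two_step_filter(tweet, model):
    """Step 1: cheap regex match. Step 2: learned relevance prediction."""
    if not EVENT_PATTERN.search(tweet):
        return False
    return model.is_relevant(tweet)


# Made-up feedback on previously accepted tweets.
model = NaiveBayes()
model.add_feedback("World cup final tonight, Germany vs Argentina", True)
model.add_feedback("Great goal in the world cup match", True)
model.add_feedback("Buy world cup tickets cheap click here", False)
model.add_feedback("Win a free world cup phone case click now", False)

print(two_step_filter("What a goal in the World Cup final", model))
print(two_step_filter("Cheap world cup tickets click here", model))
print(two_step_filter("Lovely weather today", model))
```

The regular expression acts as an inexpensive first gate, so the classifier only has to judge tweets that already mention the event; the classifier then removes keyword matches that feedback has marked as irrelevant.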
We provide a viable solution for creating a tweet-based event summarization system. The system delivers a scalable and responsive end-user experience by storing all event-related data in a non-relational database, namely the row-key store HBase. By using machine learning algorithms to determine whether a tweet is event-relevant, we effectively reduce the number of false-positive tweets passing the filter. We evaluate three different classifiers, Random Forest, Naive Bayes and C4.5, and measure their precision over time as the system receives feedback. We also test three different model training strategies: a single model strategy, where we create a single model for all topics; a split model strategy, where we use two models, one for ambiguous topics and one for unambiguous topics; and an individual model strategy, where we create a model per topic. Our results show that a single training model with Random Forest performs best.
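The three training strategies differ only in how feedback is partitioned into training sets. The sketch below shows that partitioning, together with the precision measure, under stated assumptions: the topic names, the set of ambiguous topics, the feedback tuples and the function names `model_key`, `group_feedback` and `precision` are hypothetical and do not come from the thesis.

```python
from collections import defaultdict

# Assumed for illustration: which topics count as ambiguous.
AMBIGUOUS_TOPICS = {"apple", "jaguar"}


def model_key(topic, strategy, ambiguous_topics):
    """Return the identifier of the model that handles a given topic."""
    if strategy == "single":
        return "all"
    if strategy == "split":
        return "ambiguous" if topic in ambiguous_topics else "unambiguous"
    if strategy == "individual":
        return topic
    raise ValueError(f"unknown strategy: {strategy}")


def group_feedback(feedback, strategy, ambiguous_topics):
    """Partition (topic, text, relevant) feedback into per-model training sets."""
    groups = defaultdict(list)
    for topic, text, relevant in feedback:
        key = model_key(topic, strategy, ambiguous_topics)
        groups[key].append((text, relevant))
    return dict(groups)


def precision(predicted, actual):
    """Fraction of tweets accepted by the filter that were truly relevant."""
    accepted = [a for p, a in zip(predicted, actual) if p]
    return sum(accepted) / len(accepted) if accepted else 0.0


# Made-up feedback: (topic, tweet text, relevant?)
feedback = [
    ("apple", "Apple announces a new phone", True),
    ("apple", "Eating an apple for lunch", False),
    ("jaguar", "Saw a jaguar at the zoo", False),
    ("olympics", "Opening ceremony of the olympics tonight", True),
]

for strategy in ("single", "split", "individual"):
    groups = group_feedback(feedback, strategy, AMBIGUOUS_TOPICS)
    print(strategy, sorted(groups))
```

Each resulting group would be used to train one classifier. Precision can then be computed per feedback round to track how each strategy behaves over time.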
Description
Master's thesis in Computer science