Show simple item record

dc.contributor.author Håland, Chris
dc.date.accessioned 2014-09-24T12:37:34Z
dc.date.available 2014-09-24T12:37:34Z
dc.date.issued 2014-06-12
dc.identifier.uri http://hdl.handle.net/11250/221462
dc.description Master's thesis in Computer science nb_NO
dc.description.abstract Social media has become an ever-growing source of information in recent years. Facebook, Twitter, Instagram and other social media services have all grown to contain large amounts of data, written by anyone from everyday users to companies and institutions. In this thesis, we explore the possibility of creating an event summarization system that summarizes events based on microblog posts published to Twitter. We design a website interface for displaying event-related data and store all tweets in a scalable solution using HBase. To determine a tweet’s relevance to an event, we introduce a two-step filtering technique: we first apply simple regular expression matching and then use a machine learning technique to predict the tweet’s relevance, based on feedback on previously accepted data. We provide a viable solution for creating a tweet-based event summarization system. The system delivers a scalable and responsive end-user experience by storing all event-related data in a non-relational database, namely the row-key store HBase. By using machine learning algorithms to determine whether a tweet is event-relevant, we effectively reduce the number of false-positive tweets passing the filter. We evaluate three different classifiers, Random Forest, Naive Bayes and C4.5, and measure their precision over time as the system receives feedback. We also test three different model training strategies: a single model strategy, where we create a single model for all topics; a split model strategy, where we use two models, one for ambiguous topics and one for unambiguous topics; and an individual model strategy, where we create one model per topic. Our results show that a single training model with Random Forest performs best. nb_NO
dc.language.iso eng nb_NO
dc.publisher University of Stavanger, Norway nb_NO
dc.relation.ispartofseries Masteroppgave/UIS-TN-IDE/2014;
dc.rights Attribution 3.0 Norway *
dc.rights.uri http://creativecommons.org/licenses/by/3.0/no/ *
dc.subject data and information management nb_NO
dc.subject machine learning nb_NO
dc.subject Naïve Bayes nb_NO
dc.subject Random Forest nb_NO
dc.subject C4.5 nb_NO
dc.subject social media nb_NO
dc.subject Twitter nb_NO
dc.subject informasjonsteknologi nb_NO
dc.subject sosiale medier nb_NO
dc.subject datateknikk
dc.title Tweet-based event summarization nb_NO
dc.type Master thesis nb_NO
dc.subject.nsi VDP::Technology: 500::Information and communication technology: 550::Computer technology: 551 nb_NO
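
The two-step filtering described in the abstract (a regular-expression match followed by a relevance prediction learned from feedback) can be illustrated with a minimal Python sketch. This is not the thesis implementation: the names `NaiveBayesFilter`, `add_feedback`, `is_relevant` and `two_step_filter`, the tokenizer, the smoothing scheme and the example tweets are all assumptions made for illustration, and a small hand-written Naive Bayes stands in for the classifiers the thesis evaluates.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word-like tokens (assumed scheme)."""
    return re.findall(r"[a-z0-9#@']+", text.lower())


class NaiveBayesFilter:
    """Tiny multinomial Naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self):
        self.word_counts = {True: Counter(), False: Counter()}
        self.doc_counts = {True: 0, False: 0}

    def add_feedback(self, text, relevant):
        """Record one piece of feedback: was this tweet relevant to the event?"""
        self.doc_counts[relevant] += 1
        self.word_counts[relevant].update(tokenize(text))

    def is_relevant(self, text):
        total_docs = sum(self.doc_counts.values())
        if total_docs == 0:
            # No feedback yet: accept whatever the regex step let through.
            return True
        vocab = set(self.word_counts[True]) | set(self.word_counts[False])
        scores = {}
        for label in (True, False):
            # Smoothed log prior for the class.
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            total_words = sum(self.word_counts[label].values())
            # The extra +1 in the denominator reserves mass for unseen tokens.
            denom = total_words + len(vocab) + 1
            for tok in tokenize(text):
                score += math.log((self.word_counts[label][tok] + 1) / denom)
            scores[label] = score
        return scores[True] >= scores[False]


def two_step_filter(tweet, pattern, model):
    """Step 1: cheap regex match. Step 2: learned relevance prediction."""
    if not pattern.search(tweet):
        return False
    return model.is_relevant(tweet)
```

In this sketch, a hypothetical ambiguous topic such as "apple" would pass many unrelated tweets through the regex step; feedback given through `add_feedback` on previously accepted tweets is what lets the second step reject them.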



Except where otherwise noted, this item is licensed as Attribution 3.0 Norway