
dc.contributor.advisor: Wlodarczyk, Tomasz Wiktor
dc.contributor.author: Therese Jose, Dhanya
dc.date.accessioned: 2017-09-19T11:18:31Z
dc.date.available: 2017-09-19T11:18:31Z
dc.date.issued: 2017-06-15
dc.identifier.uri: http://hdl.handle.net/11250/2455437
dc.description: Master's thesis in Computer Science [nb_NO]
dc.description.abstract: In recent years, organizations have changed their work culture so that business and IT leaders work together with organizational data to make decisions and plans. Handling this big data has always been a challenging task for IT staff, because it involves large and complex information that conventional tools cannot handle. For the present study on big data analytics, the Yelp dataset is taken as a case study. Yelp is a website that publishes crowd-sourced reviews of local businesses; it gives business owners an opportunity to improve their services and helps users choose the best business among those available. However, business owners cannot go through every user review in order to make important decisions about improving their business. This is where big data analytics becomes important. Many researchers have worked with the Yelp dataset in the past and produced very good results, but many of those studies focused on prediction algorithms. In the present study, an attempt is made to interpret the Yelp review data using two different data processing techniques: change point analysis and sentiment analysis. Our approach aims to give owners a more realistic interpretation of the Yelp data so that they can make important decisions on improving the business. The businesses for the present study are selected according to certain criteria so that the analysis methods are more applicable: businesses with an adequate number of reviews and the highest fluctuation in star ratings are chosen. The change point algorithm is used to obtain the periods of fluctuation in the star rating over the past years. To ensure that an optimum number of change points is obtained, the parameters used in the change point algorithm are determined based on a sensitivity study.
The change points obtained indicate the times at which there is a noticeable deviation in the business star ratings. From the present study, it is observed that the number of change points obtained depends strongly on the penalty function used in the algorithm. Further in the study, sentiment analysis is performed on the review text corresponding to the same businesses and star rating data used in the change point analysis. Sentiment analysis is a text processing technique in which the overall polarity of a text is obtained from the positive and negative words and phrases it contains. In the present study, the polarity of the review text is obtained using TextBlob text processing in Python. An overall agreement was observed between the sentiment score of the review text and the business ratings. The correlation between the sentiment score and the change points obtained for the selected businesses was investigated further. There was a clear deviation in the sentiment score wherever a change point was obtained. Possible reasons for the deviation in the star ratings were identified by reviewing the positive and negative noun phrases in the business review text. [nb_NO]
dc.language.iso: eng [nb_NO]
dc.publisher: University of Stavanger, Norway [nb_NO]
dc.relation.ispartofseries: Masteroppgave/UIS-TN-IDE/2017;
dc.subject: information technology [nb_NO]
dc.subject: change point analysis [nb_NO]
dc.subject: sentiment analysis [nb_NO]
dc.subject: Yelp dataset [nb_NO]
dc.subject: big data analysis [nb_NO]
dc.title: Big Data Analytics- Case Study- Yelp Dataset [nb_NO]
dc.type: Master thesis [nb_NO]
dc.subject.nsi: VDP::Technology: 500::Information and communication technology: 550::Computer technology: 551 [nb_NO]

