Show simple item record

dc.contributor.author: Agrawal, Bikash
dc.date.accessioned: 2013-10-01T13:13:04Z
dc.date.available: 2013-10-01T13:13:04Z
dc.date.issued: 2013
dc.identifier.uri: http://hdl.handle.net/11250/181819
dc.description: Master's thesis in Computer Science [no_NO]
dc.description.abstract: In recent years, the quantity of time-series data generated in a wide variety of domains has grown consistently. Analyzing time-series datasets at a massive scale is one of the biggest challenges that data scientists face. This thesis focuses on the implementation of a tool for analyzing large time-series data. It describes a way to analyze the data stored by OpenTSDB, an open-source, distributed, and scalable time-series database. It has become a challenge for statisticians and data scientists to analyze such massive data sets with the same level of detail that is possible for smaller analyses. Currently available tools for time-series analysis are time and memory consuming. Moreover, no single tool specializes in providing an efficient implementation of time-series analysis through the MapReduce programming model at massive scale. For these reasons, we have designed an efficient, distributed computing framework, R2Time. R2Time integrates R, the open-source project for statistical computing and visualization, with OpenTSDB [1] and RHIPE [2], building on the MapReduce framework for the distributed processing of large data sets across a cluster. It gives data scientists a programming environment that integrates R and HBase. This thesis describes the architecture of the R2Time framework. The usefulness of the framework is verified by a performance analysis based on carefully chosen types of statistical analysis for time-series data. As the size of the time-series data and the complexity of the statistical functions increase, we observe supralinear behavior in the performance of the R2Time framework. The performance of the framework is also evaluated under different configuration settings. Configuration settings such as scan cache and batch size play a vital role in performance on time-series data. [no_NO]
dc.language.iso: eng [no_NO]
dc.publisher: University of Stavanger, Norway [no_NO]
dc.relation.ispartofseries: Masteroppgave/UIS-TN-IDE/2013;
dc.subject: RHIPE [no_NO]
dc.subject: time-series [no_NO]
dc.subject: R2Time [no_NO]
dc.subject: Hadoop [no_NO]
dc.subject: MapReduce [no_NO]
dc.subject: HBase [no_NO]
dc.subject: OpenTSDB [no_NO]
dc.subject: information technology [no_NO]
dc.subject: computer engineering [no_NO]
dc.title: Analysis of large time-series data in OpenTSDB [no_NO]
dc.type: Master thesis [no_NO]
dc.subject.nsi: VDP::Technology: 500::Information and communication technology: 550 [no_NO]

