Analysis of large time-series data in OpenTSDB

Agrawal, Bikash

dc.contributor.author	Agrawal, Bikash
dc.date.accessioned	2013-10-01T13:13:04Z
dc.date.available	2013-10-01T13:13:04Z
dc.date.issued	2013
dc.identifier.uri	http://hdl.handle.net/11250/181819
dc.description	Master's thesis in Computer Science	no_NO
dc.description.abstract	In recent years, the quantity of time-series data generated in a wide variety of domains has grown consistently. Analyzing time-series datasets at a massive scale is one of the biggest challenges that data scientists face. This thesis focuses on the implementation of a tool for analyzing large time-series data. It describes a way to analyze the data stored by OpenTSDB, an open-source, distributed, and scalable time-series database. It has become a challenge for statisticians and data scientists to analyze such massive data sets with the same level of comprehensive detail as is possible for smaller analyses. Currently available tools for time-series analysis are time- and memory-consuming. Moreover, no single tool specializes in providing efficient implementations of time-series analysis through the MapReduce programming model at massive scale. For these reasons, we have designed an efficient and distributed computing framework, R2Time. R2Time integrates the R open-source project for statistical computing and visualization with OpenTSDB [1] and RHIPE [2], building on the MapReduce framework for distributed processing of large data sets across a cluster. It gives data scientists a programming environment that integrates R and HBase. This thesis describes the architecture of the R2Time framework. The usefulness of the framework is verified by a performance analysis based on carefully chosen types of statistical analysis for time-series data. As the size of the time-series data and the complexity of the statistical functions increase, we observe supralinear behavior in the performance of the R2Time framework. The performance of the framework is also evaluated under different configuration settings. Configuration settings such as scan cache and batch size play a vital role in performance on time-series data.	no_NO
dc.language.iso	eng	no_NO
dc.publisher	University of Stavanger, Norway	no_NO
dc.relation.ispartofseries	Masteroppgave/UIS-TN-IDE/2013;
dc.subject	RHIPE	no_NO
dc.subject	time-series	no_NO
dc.subject	R2Time	no_NO
dc.subject	Hadoop	no_NO
dc.subject	MapReduce	no_NO
dc.subject	HBase	no_NO
dc.subject	OpenTSDB	no_NO
dc.subject	information technology	no_NO
dc.subject	computer engineering	no_NO
dc.title	Analysis of large time-series data in OpenTSDB	no_NO
dc.type	Master thesis	no_NO
dc.subject.nsi	VDP::Technology: 500::Information and communication technology: 550	no_NO

Associated file(s)

Filename: r2time_MasterThesis.pdf
Size: 898.4Kb
Format: PDF

This item appears in the following collection(s)

Student theses (TN-IDE) [850]
Student theses in information technology, computer engineering / cybernetics, signal processing
