Vis enkel innførsel

dc.contributor.authorChiliboyi, Erick
dc.date.accessioned2012-09-25T13:50:55Z
dc.date.available2012-09-25T13:50:55Z
dc.date.issued2012
dc.identifier.urihttp://hdl.handle.net/11250/181801
dc.descriptionMaster's thesis in Computer Scienceno_NO
dc.description.abstractK-means is the most commonly known partitioning algorithm used for data clustering. It was originally designed to run on a single processor. Therefore, this created a limitation on dealing with large amounts of data because of the requirement to have data resident in memory. However, the advent of distributed systems has led to the design of parallel versions of the algorithm. This is done with the intention to allow the algorithm to work with large sets of data that would otherwise be impossible to handle on a single machine, due to limited processing power and memory. Since there are currently many applications that are generating large sets of data, for example in oil exploration, social media and image processing, research in parallel clustering algorithms has received a great deal of attention so as to find efficient ways of data analysis. Clustering algorithms that can be formulated according to the MapReduce framework such as K-means provide a great opportunity for data analysis. This is because the actual data is used to get statistics. Otherwise, statistics would only be obtained by applying sampling methods on the original data. Running a parallel algorithm on a distributed platform to handle large data sets, requires alot of data transfer on the network. This exchange of data could choke the network if it is beyond the capacity of the network. Therefore, there is a way of reducing the amount of traffic on the network and that is by partial aggregation. Reducing traffic on the network has been known to increase efficiency. In this thesis, we look at two things. First, how partial aggregation impacts the K-means algorithm running on a distributed system. Second, we compare two implementations of the K-means algorithm, Java and R.no_NO
dc.language.isoengno_NO
dc.publisherUniversity of Stavanger, Norwayno_NO
dc.relation.ispartofseriesMasteroppgave/UIS-TN-IDE/2012;
dc.subjectdata miningno_NO
dc.subjectparallel algorithmsno_NO
dc.subjectcluster analysisno_NO
dc.subjectinformasjonsteknologino_NO
dc.subjectdatateknikkno_NO
dc.titleInvestigating how partial aggregation impacts K-meansno_NO
dc.typeMaster thesisno_NO
dc.subject.nsiVDP::Technology: 500::Information and communication technology: 550no_NO


Tilhørende fil(er)

Thumbnail

Denne innførselen finnes i følgende samling(er)

Vis enkel innførsel