
dc.contributor.author: Cui, Long
dc.date.accessioned: 2014-09-24T12:45:40Z
dc.date.available: 2014-09-24T12:45:40Z
dc.date.issued: 2014-06-16
dc.identifier.uri: http://hdl.handle.net/11250/221469
dc.description: Master's thesis in Computer science [nb_NO]
dc.description.abstract: It is now widely recognized that the world's energy consumption is high while its energy efficiency is comparatively low. Scientists and engineers are therefore investigating and developing new methodologies and applications that could improve the overall energy efficiency and ease the problem. Moreover, the quantity and diversity of data required to solve energy optimization problems is increasing dramatically every year, so it is urgent to adapt and create algorithms that can effectively and efficiently solve Big Data energy optimization problems in useful time. Particle Swarm Optimization (PSO) is an algorithm that can efficiently be used to solve energy optimization problems. However, most of today's energy optimization problems require huge computational power and therefore an efficient distributed implementation of PSO. This thesis contributes to that problem by efficiently distributing PSO using a new technology named Spark. The thesis describes how to adapt the classic Particle Swarm Optimization algorithm to the distributed Big Data platform Spark. The solution first defines the Spark data structure (Resilient Distributed Dataset) for the PSO algorithm and then creates the initialization algorithm for parallel PSO. The main processing algorithm consists of a Foreach Operation design and a Collect Operation design. We implemented the algorithm in Java and tested it with a real-world use case of energy optimization for buildings. The use case is part of the EU FP7 research project SEEDS (Self-Learning Energy Efficient Buildings and Open Spaces), in which the University of Stavanger participates. The experiments show that both Spark and Hadoop can carry out Big Data calculations that a normal serial PSO cannot handle; even given enough memory, serial PSO could take days or months to compute. We found that Spark could compute 32 times faster than the well-known platform Hadoop, and that this difference grows even larger when the number of PSO iterations exceeds 19. [nb_NO]
dc.language.iso: eng [nb_NO]
dc.publisher: University of Stavanger, Norway [nb_NO]
dc.relation.ispartofseries: Masteroppgave/UIS-TN-IDE/2014;
dc.subject: PSO [nb_NO]
dc.subject: Spark [nb_NO]
dc.subject: RDD [nb_NO]
dc.subject: parallel [nb_NO]
dc.subject: energy efficiency [nb_NO]
dc.subject: informasjonsteknologi [nb_NO]
dc.subject: datateknikk [nb_NO]
dc.subject: energy optimization [nb_NO]
dc.title: Parallel PSO in Spark [nb_NO]
dc.type: Master thesis [nb_NO]
dc.subject.nsi: VDP::Technology: 500::Information and communication technology: 550::Computer technology: 551 [nb_NO]
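Note: the following is a minimal, self-contained sketch of a Spark-parallelized PSO in Java, added only to illustrate the shape of the approach summarized in the abstract (initialize the swarm on the driver, distribute it as an RDD, update particles in parallel, collect results to refresh the global best). The class names, the sphere fitness function, the parameter values, and the map-plus-collect loop are illustrative assumptions; they are not the thesis's actual Foreach Operation and Collect Operation designs, nor the SEEDS building-energy use case.

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

import java.io.Serializable;
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ParallelPsoSketch {

    // A particle carries its position, velocity and personal best.
    static class Particle implements Serializable {
        double[] position, velocity, bestPosition;
        double bestFitness = Double.MAX_VALUE;

        Particle(int dim, Random rnd) {
            position = new double[dim];
            velocity = new double[dim];
            bestPosition = new double[dim];
            for (int d = 0; d < dim; d++) {
                position[d] = rnd.nextDouble() * 10.0 - 5.0;   // assumed search range [-5, 5]
                velocity[d] = rnd.nextDouble() - 0.5;
            }
        }
    }

    // Assumed objective: the sphere function (minimized); the thesis optimizes
    // a building-energy model from the SEEDS project instead.
    static double fitness(double[] x) {
        double sum = 0.0;
        for (double v : x) sum += v * v;
        return sum;
    }

    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("ParallelPSO").setMaster("local[*]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        int dim = 10, swarmSize = 1000, iterations = 50;   // illustrative sizes
        double w = 0.72, c1 = 1.49, c2 = 1.49;             // common PSO coefficients

        // Initialization: build the swarm on the driver and distribute it as an RDD.
        List<Particle> swarm = new ArrayList<>();
        Random rnd = new Random(42);
        for (int i = 0; i < swarmSize; i++) swarm.add(new Particle(dim, rnd));
        JavaRDD<Particle> particles = sc.parallelize(swarm);

        double[] globalBest = swarm.get(0).position.clone();
        double globalBestFitness = Double.MAX_VALUE;

        for (int it = 0; it < iterations; it++) {
            final double[] gBest = globalBest;   // effectively-final copy for the closure

            // Update velocity, position and personal best of every particle in
            // parallel, then collect the updated swarm back to the driver.
            List<Particle> updated = particles.map(p -> {
                Random r = new Random();
                for (int d = 0; d < p.position.length; d++) {
                    p.velocity[d] = w * p.velocity[d]
                            + c1 * r.nextDouble() * (p.bestPosition[d] - p.position[d])
                            + c2 * r.nextDouble() * (gBest[d] - p.position[d]);
                    p.position[d] += p.velocity[d];
                }
                double f = fitness(p.position);
                if (f < p.bestFitness) {
                    p.bestFitness = f;
                    p.bestPosition = p.position.clone();
                }
                return p;
            }).collect();

            // Refresh the global best on the driver.
            for (Particle p : updated) {
                if (p.bestFitness < globalBestFitness) {
                    globalBestFitness = p.bestFitness;
                    globalBest = p.bestPosition.clone();
                }
            }

            // Redistribute the updated swarm for the next iteration.
            particles = sc.parallelize(updated);
        }

        System.out.println("Best fitness found: " + globalBestFitness);
        sc.stop();
    }
}

Collecting the whole swarm to the driver each iteration keeps the sketch simple and loosely mirrors the Collect Operation mentioned in the abstract; for large swarms, broadcasting the global best and reducing only per-partition bests would cut driver traffic considerably.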

