Parallel PSO in Spark
Master thesis
Permanent link: http://hdl.handle.net/11250/221469
Publication date: 2014-06-16
Collections: Studentoppgaver (TN-IDE)
Abstract
It is now commonly realized that energy consumption worldwide is high while energy
efficiency is comparatively low. Scientists and engineers are therefore investigating and
developing new methodologies and applications that could improve the overall energy
efficiency ratio and ease the problem. Moreover, the quantity and diversity of data required to solve
energy optimization problems is increasing dramatically every year. It is therefore urgent to adapt
and create new algorithms that can effectively and efficiently solve Big Data energy optimization
problems in useful time. Particle Swarm Optimization (PSO) is an algorithm that can efficiently
solve energy optimization problems. However, most of today's energy optimization
problems require huge computational power and an efficient distributed implementation of
PSO.
This thesis focuses on contributing to the above-mentioned problem of efficiently distributing
PSO using a new technology named Spark. It describes how to adapt the classic
Particle Swarm Optimization algorithm to the distributed Big Data platform Spark.
The solution first defines the Spark data structure (Resilient Distributed
Dataset, RDD) for our PSO algorithm and then creates the initialization algorithm for parallel PSO. The
main processing algorithm consists of the design of a Foreach operation and a Collect operation.
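To make the Foreach/Collect split concrete, the following is a minimal serial sketch in Java (all names are illustrative, not taken from the thesis, and no Spark dependency is assumed): the `step` method is the per-particle work that a Spark Foreach operation would apply to each element of the particle RDD, and the global-best reduction at the end of each iteration corresponds to the Collect stage.

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch of the two-stage PSO loop described in the abstract.
// Serial stand-in: the inner per-particle loop models Spark's Foreach stage,
// the global-best scan models the Collect stage.
public class PsoSketch {
    static final double W = 0.7, C1 = 1.5, C2 = 1.5; // common PSO coefficients

    static class Particle {
        double pos, vel, bestPos, bestVal;
        Particle(double pos) {
            this.pos = pos;
            this.vel = 0;
            this.bestPos = pos;
            this.bestVal = Double.MAX_VALUE;
        }
    }

    // One PSO step for a single particle: the body a Foreach operation would run.
    static void step(Particle p, double globalBest, DoubleUnaryOperator f, Random rnd) {
        p.vel = W * p.vel
              + C1 * rnd.nextDouble() * (p.bestPos - p.pos)
              + C2 * rnd.nextDouble() * (globalBest - p.pos);
        p.pos += p.vel;
        double v = f.applyAsDouble(p.pos);
        if (v < p.bestVal) { p.bestVal = v; p.bestPos = p.pos; }
    }

    public static void main(String[] args) {
        DoubleUnaryOperator f = x -> (x - 3) * (x - 3); // toy objective, minimum at x = 3
        Random rnd = new Random(42);
        Particle[] swarm = new Particle[20];
        for (int i = 0; i < swarm.length; i++) swarm[i] = new Particle(rnd.nextDouble() * 10);

        double globalBestPos = swarm[0].pos;
        double globalBestVal = Double.MAX_VALUE;
        for (int iter = 0; iter < 100; iter++) {
            // Foreach stage: update every particle against the current global best.
            for (Particle p : swarm) step(p, globalBestPos, f, rnd);
            // Collect stage: gather personal bests and reduce to a new global best.
            for (Particle p : swarm) {
                if (p.bestVal < globalBestVal) {
                    globalBestVal = p.bestVal;
                    globalBestPos = p.bestPos;
                }
            }
        }
        System.out.println(Math.abs(globalBestPos - 3.0) < 0.1);
    }
}
```

In a real Spark port, the swarm array would be an RDD of particles, the global best would be broadcast to the workers each iteration, and the reduction would run on the driver after a collect.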
We implemented our algorithm in Java and tested it on a real-world use case of energy
optimization for buildings. The use case is part of the EU FP7 research project SEEDS
(Self-Learning Energy Efficient Buildings and Open Spaces), in which the
University of Stavanger participates. The experiments show that both Spark and Hadoop could carry out Big
Data calculations that a normal serial PSO could not handle; even given enough memory, serial PSO
could take days or months to compute. We found that Spark could compute 32 times faster than
the well-known platform Hadoop, and that this difference grows even larger when PSO runs for more than 19
iterations.
Description
Master's thesis in Computer science