Parallel PSO in Spark
Master thesis
Permanent link: http://hdl.handle.net/11250/221469
Publication date: 2014-06-16
Collections: Studentoppgaver (TN-IDE)
Abstract
It is now commonly realized that energy consumption worldwide is high while energy
efficiency is comparatively low. Scientists and engineers are therefore investigating and
developing new methodologies and applications that could improve the overall energy
efficiency ratio and ease the problem. Moreover, the quantity and diversity of data required to solve
energy optimization problems is increasing dramatically every year. It is therefore urgent to adapt
and create new algorithms that can effectively and efficiently solve Big Data energy optimization
problems in useful time. Particle Swarm Optimization (PSO) is an algorithm that can efficiently
solve energy optimization problems. However, most of today's energy optimization
problems require huge computational power and an efficient distributed implementation of
PSO.
This thesis focuses on contributing to the above-mentioned problem of efficiently distributing
PSO using a new technology named Spark. It describes how to adapt the classic
Particle Swarm Optimization algorithm to the distributed Big Data platform Spark.
The solution first defines the Spark data structure (Resilient Distributed
Dataset, RDD) for our PSO algorithm and then creates the initialization algorithm for parallel PSO. The
main processing algorithm consists of the design of a Foreach operation and a Collect operation.
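To make the Foreach/Collect split concrete, the following is a minimal serial sketch in Java (all names are illustrative, not taken from the thesis, and no Spark dependency is assumed): the `step` method is the per-particle work that a Spark Foreach operation would apply to each element of the particle RDD, and the global-best reduction at the end of each iteration corresponds to the Collect stage.

```java
import java.util.Random;
import java.util.function.DoubleUnaryOperator;

// Hypothetical sketch of the two-stage PSO loop described in the abstract.
// Serial stand-in: the inner per-particle loop models Spark's Foreach stage,
// the global-best scan models the Collect stage.
public class PsoSketch {
    static final double W = 0.7, C1 = 1.5, C2 = 1.5; // common PSO coefficients

    static class Particle {
        double pos, vel, bestPos, bestVal;
        Particle(double pos) {
            this.pos = pos;
            this.vel = 0;
            this.bestPos = pos;
            this.bestVal = Double.MAX_VALUE;
        }
    }

    // One PSO step for a single particle: the body a Foreach operation would run.
    static void step(Particle p, double globalBest, DoubleUnaryOperator f, Random rnd) {
        p.vel = W * p.vel
              + C1 * rnd.nextDouble() * (p.bestPos - p.pos)
              + C2 * rnd.nextDouble() * (globalBest - p.pos);
        p.pos += p.vel;
        double v = f.applyAsDouble(p.pos);
        if (v < p.bestVal) { p.bestVal = v; p.bestPos = p.pos; }
    }

    public static void main(String[] args) {
        DoubleUnaryOperator f = x -> (x - 3) * (x - 3); // toy objective, minimum at x = 3
        Random rnd = new Random(42);
        Particle[] swarm = new Particle[20];
        for (int i = 0; i < swarm.length; i++) swarm[i] = new Particle(rnd.nextDouble() * 10);

        double globalBestPos = swarm[0].pos;
        double globalBestVal = Double.MAX_VALUE;
        for (int iter = 0; iter < 100; iter++) {
            // Foreach stage: update every particle against the current global best.
            for (Particle p : swarm) step(p, globalBestPos, f, rnd);
            // Collect stage: gather personal bests and reduce to a new global best.
            for (Particle p : swarm) {
                if (p.bestVal < globalBestVal) {
                    globalBestVal = p.bestVal;
                    globalBestPos = p.bestPos;
                }
            }
        }
        System.out.println(Math.abs(globalBestPos - 3.0) < 0.1);
    }
}
```

In a real Spark port, the swarm array would be an RDD of particles, the global best would be broadcast to the workers each iteration, and the reduction would run on the driver after a collect.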
We implemented our algorithm in Java and tested it on a real-world use case of energy
optimization for buildings. The use case is part of the EU FP7 research project SEEDS
(Self-Learning Energy Efficient Buildings and Open Spaces), in which the
University of Stavanger participates. The experiments show that both Spark and Hadoop could carry out Big
Data calculations that a normal serial PSO could not handle; even given enough memory, serial PSO
could take days or months to compute. We found that Spark could compute 32 times faster than
the well-known platform Hadoop, and that this difference grows even larger when PSO runs for more than 19
iterations.
Description
Master's thesis in Computer science