Parallel PSO in Spark
It is now widely recognized that global energy consumption is high while energy efficiency remains comparatively low. Scientists and engineers are therefore investigating and developing new methodologies and applications that could improve overall energy efficiency. At the same time, the quantity and diversity of data required to solve energy optimization problems is increasing dramatically every year. There is therefore an urgent need to adapt and create algorithms that can solve Big Data energy optimization problems effectively, efficiently, and in useful time. Particle Swarm Optimization (PSO) is an algorithm that can be used efficiently to solve energy optimization problems. However, most of today's energy optimization problems require huge computational power and an efficient distributed implementation of PSO. This thesis contributes to the problem of efficiently distributing PSO by using a new technology named Spark, and describes how to adapt the classic PSO algorithm to this distributed Big Data platform. The solution first defines the Spark data structure (Resilient Distributed Dataset) for our PSO algorithm and then creates the initialization algorithm for parallel PSO. The main processing algorithm consists of a Foreach operation design and a Collect operation design. We implemented our algorithm in Java and tested it with a real-world use case of energy optimization for buildings. The use case is part of the EU FP7 research project SEEDS (Self-Learning Energy Efficient Buildings and Open Spaces), in which the University of Stavanger participates. The experiments show that both Spark and Hadoop can carry out big data calculations that a normal serial PSO cannot handle. Even given enough memory, serial PSO could take days or months to complete the calculation.
We found that Spark could compute 32 times faster than the well-known platform Hadoop, and that this difference is even bigger when the number of PSO iterations is more than 19.
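The two-phase structure described in the abstract (a per-particle update step followed by a gathering step) can be illustrated with a minimal sketch. This is not the thesis implementation: Java parallel streams stand in for Spark's RDD operations so that the example runs without a cluster, and the class name `ParallelPsoSketch`, the sphere objective, the search range, and the coefficient values are all assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.Random;

final class ParallelPsoSketch {
    // Commonly used PSO coefficients (an assumption, not taken from the thesis).
    static final double INERTIA = 0.729, COGNITIVE = 1.49445, SOCIAL = 1.49445;

    /** One particle: position, velocity, personal best, and its own random generator. */
    static final class Particle {
        final double[] position, velocity;
        double[] bestPosition;
        double bestFitness;
        final Random rng;

        Particle(int dims, long seed) {
            rng = new Random(seed);
            position = new double[dims];
            velocity = new double[dims];
            for (int d = 0; d < dims; d++) {
                position[d] = -5.0 + 10.0 * rng.nextDouble(); // uniform in [-5, 5)
            }
            bestPosition = position.clone();
            bestFitness = sphere(position);
        }

        /** Moves the particle once, reading a fixed snapshot of the global best. */
        void step(double[] globalBest) {
            for (int d = 0; d < position.length; d++) {
                velocity[d] = INERTIA * velocity[d]
                        + COGNITIVE * rng.nextDouble() * (bestPosition[d] - position[d])
                        + SOCIAL * rng.nextDouble() * (globalBest[d] - position[d]);
                position[d] += velocity[d];
            }
            double fitness = sphere(position);
            if (fitness < bestFitness) {
                bestFitness = fitness;
                bestPosition = position.clone();
            }
        }
    }

    /** Toy objective (assumption): sum of squares, with minimum 0 at the origin. */
    static double sphere(double[] x) {
        return Arrays.stream(x).map(v -> v * v).sum();
    }

    /** Gathering step: scan all personal bests and return a copy of the best one. */
    static double[] best(List<Particle> swarm) {
        return swarm.stream()
                .min(Comparator.comparingDouble((Particle p) -> p.bestFitness))
                .get().bestPosition.clone();
    }

    static double[] optimize(int particles, int dims, int iterations, long seed) {
        Random master = new Random(seed); // derives an independent seed per particle
        List<Particle> swarm = new ArrayList<>();
        for (int i = 0; i < particles; i++) {
            swarm.add(new Particle(dims, master.nextLong()));
        }
        double[] globalBest = best(swarm);
        for (int it = 0; it < iterations; it++) {
            final double[] snapshot = globalBest;
            // Update step: each particle is touched by exactly one worker thread.
            swarm.parallelStream().forEach(p -> p.step(snapshot));
            // Gathering step: runs after all updates of this iteration have finished.
            globalBest = best(swarm);
        }
        return globalBest;
    }

    public static void main(String[] args) {
        double[] result = optimize(30, 3, 200, 42L);
        System.out.println("best position: " + Arrays.toString(result));
        System.out.println("best fitness:  " + sphere(result));
    }
}
```

Each particle owns its random generator and reads only an immutable snapshot of the global best, so the parallel update needs no locking; the global best changes only between iterations. On Spark, the analogous steps would operate on a distributed dataset of particles rather than an in-memory list.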
Master's thesis in Computer science