Privacy preserving for Big Data Analysis
Master thesis
Permanent link
http://hdl.handle.net/11250/181827
Date issued
2013
Collections
- Studentoppgaver (TN-IDE) [823]
Abstract
The Safer@Home [6] project at the University of Stavanger aims to create a
smart home system that captures sensor data from homes into its data cluster.
To provide assistive services through data analytics, sensor data must be
collected centrally so that knowledge discovery algorithms can run effectively.
The information collected from such homes is often highly sensitive and needs
to be protected while it is processed or shared across the value chain. The
data has to be perturbed to protect against disclosure and misuse by
adversaries. Anonymization is the process of perturbing data by generalizing
and suppressing identifiers that could pose a threat if linked with publicly
available databases. The main challenge is to maintain privacy while still
retaining the utility of the data.
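As an illustration of generalization and suppression, the following minimal Python sketch anonymizes a single record. The field names, the 10-year age bands, and the two-digit postal code prefix are assumptions chosen for the example and are not taken from the thesis.

```python
# Hypothetical record layout; field names and granularity are illustrative only.

def generalize_age(age, width=10):
    """Replace an exact age with a band of the given width."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def generalize_zip(zip_code, keep=2):
    """Keep the first `keep` characters of a postal code and mask the rest."""
    return zip_code[:keep] + "*" * (len(zip_code) - keep)


def anonymize(record):
    """Suppress the direct identifier and generalize the quasi-identifiers."""
    return {
        "name": "*",                             # suppression
        "age": generalize_age(record["age"]),    # generalization
        "zip": generalize_zip(record["zip"]),    # generalization
        "activity": record["activity"],          # value kept for analysis
    }


record = {"name": "Kari Nordmann", "age": 37, "zip": "4036", "activity": "fall detected"}
anonymized = anonymize(record)
```

Coarser bands lower the risk of linking a record to an external database, but they also reduce the utility of the data for analysis.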
This thesis evaluates various anonymization methods that suit our
requirements. We present the software requirements specification of an
anonymization framework and provide a practical implementation of a
well-accepted privacy-preserving anonymization algorithm called Mondrian [7].
To quantify the information loss incurred during anonymization, a framework is
proposed to evaluate the anonymized dataset. Moreover, the thesis proposes a
distributed method for performing the anonymization with the Hadoop MapReduce
framework, making the system scalable for big data analysis.
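The idea behind Mondrian-style partitioning and an information-loss measure can be sketched in a few lines of Python. This is a simplified, single-machine sketch under stated assumptions, not the implementation from the thesis: it uses the relaxed variant that splits by position at the median, handles numeric quasi-identifiers only, and measures loss with a normalized certainty penalty, which is one common choice and may differ from the metric the thesis uses. The sample rows and field names are hypothetical.

```python
def span(rows, dim):
    """Width of the value range of one dimension within a set of rows."""
    values = [r[dim] for r in rows]
    return max(values) - min(values)


def mondrian(rows, k, dims, full=None):
    """Recursively split rows at the median, keeping at least k rows per partition."""
    if full is None:
        # Global spans used to normalize each dimension (1 avoids division by zero).
        full = {d: span(rows, d) or 1 for d in dims}
    # Try the dimension with the widest normalized span first.
    for dim in sorted(dims, key=lambda d: span(rows, d) / full[d], reverse=True):
        ordered = sorted(rows, key=lambda r: r[dim])
        mid = len(ordered) // 2
        left, right = ordered[:mid], ordered[mid:]
        if len(left) >= k and len(right) >= k:
            return mondrian(left, k, dims, full) + mondrian(right, k, dims, full)
    return [rows]  # no allowable split remains


def generalize(partitions, dims):
    """Replace each quasi-identifier with the (min, max) range of its partition."""
    out = []
    for part in partitions:
        ranges = {d: (min(r[d] for r in part), max(r[d] for r in part)) for d in dims}
        out.extend({**r, **ranges} for r in part)
    return out


def certainty_penalty(partitions, rows, dims):
    """Average normalized range width per cell: 0 means no loss, 1 means full loss."""
    total = 0.0
    for part in partitions:
        for d in dims:
            width = span(rows, d)
            if width > 0:
                total += len(part) * span(part, d) / width
    return total / (len(rows) * len(dims))


# Hypothetical sample data; "age" and "zip" act as quasi-identifiers.
rows = [
    {"age": 25, "zip": 4001, "activity": "sleep"},
    {"age": 27, "zip": 4003, "activity": "cooking"},
    {"age": 41, "zip": 4010, "activity": "sleep"},
    {"age": 45, "zip": 4012, "activity": "fall detected"},
    {"age": 62, "zip": 4020, "activity": "cooking"},
    {"age": 68, "zip": 4021, "activity": "sleep"},
]
dims = ["age", "zip"]
partitions = mondrian(rows, k=2, dims=dims)
released = generalize(partitions, dims)
loss = certainty_penalty(partitions, rows, dims)
```

Every partition holds at least k rows, so each released record shares its generalized quasi-identifier values with at least k - 1 others. A larger k strengthens privacy and raises the measured loss.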
Description
Master's thesis in Computer Science