dc.contributor.advisor: Esteves, Rui Paulo Maximo Pereira Mateus
dc.contributor.author: Jacobsen, Håvard Moe
dc.contributor.author: Flotve, Ola Andrè
dc.date.accessioned: 2023-09-16T15:51:22Z
dc.date.available: 2023-09-16T15:51:22Z
dc.date.issued: 2023
dc.identifier: no.uis:inspera:129718883:50646236
dc.identifier.uri: https://hdl.handle.net/11250/3089848
dc.description.abstract: In this thesis, we present a solution for the challenge of optimizing the retrieval of data in Spark. Our column recommendation system is based on Spark's event logs and finds influential columns for Z-ordering and partitioning. The column recommendation system consists of four methods, each looking for different query patterns and query characteristics. From the recommendation system experiment, we managed to improve the run time by 17% compared to the baseline. This improvement demonstrates our column recommendation system's potential for optimizing data retrieval in Spark. Our system was developed on an ETL platform and is a flexible solution for ETL platforms utilizing Spark.
dc.language: eng
dc.publisher: uis
dc.title: Spark Optimization: A Column Recommendation System for Data Partitioning and Z-Ordering on ETL Platforms
dc.type: Master thesis

