IMPACT OF DATA PRE-PROCESSING TECHNIQUES ON MACHINE LEARNING MODELS

Tahir, Ali

dc.contributor.advisor	Sui, Dan
dc.contributor.author	Tahir, Ali
dc.date.accessioned	2022-10-26T15:51:15Z
dc.date.available	2022-10-26T15:51:15Z
dc.date.issued	2022
dc.identifier	no.uis:inspera:107970678:81623551
dc.identifier.uri	https://hdl.handle.net/11250/3028476
dc.description.abstract	The Volve dataset, which contains time series readings from the sensors used at the Volve drilling site, has many flaws that make it hard for machine learning models to learn from it and to provide useful insights and predictions. Three flaws are highlighted: missing data, inconsistent sampling rates, and too many attributes (high-dimensional data). To address these issues, a data preprocessing pipeline is proposed. It first removes noise with a rolling mean, then applies gap analysis to remove the columns whose gaps cannot be filled by data imputation methods. The remaining gaps are filled with a KNN imputer. The data are then resampled to a consistent sampling rate, because the time series prediction models require a constant sampling rate. To tune the hyper-parameters of the resampling method, AIC and BIC values were computed over a grid of hyper-parameters. After resampling, the top parameters were selected on the basis of Pearson correlation, and AIC and BIC were then used to select the 3 most relevant parameters. These 3 parameters were used to train three models: RNN + MLP, LSTM + MLP, and LSTM + RNN + MLP. On the basis of mean absolute error (MAE), RNN + MLP was selected as the best model.
dc.language	eng
dc.publisher	uis
dc.title	IMPACT OF DATA PRE-PROCESSING TECHNIQUES ON MACHINE LEARNING MODELS
dc.type	Master thesis
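
The preprocessing steps named in the abstract (rolling mean, gap analysis, KNN imputation, resampling, Pearson-based ranking) can be sketched roughly as follows. This is an illustrative sketch only, not the thesis implementation: the function names `longest_gap` and `preprocess`, the parameter names, and all default values (window size, gap threshold, resampling rule) are assumptions made here, and the AIC/BIC steps described in the abstract are left out.

```python
import pandas as pd
from sklearn.impute import KNNImputer


def longest_gap(series: pd.Series) -> int:
    """Length of the longest run of consecutive missing values."""
    is_na = series.isna()
    # Every observed value starts a new group, so summing the NaN flags
    # per group gives the length of each run of missing values.
    groups = (~is_na).cumsum()
    return int(is_na.groupby(groups).sum().max())


def preprocess(df: pd.DataFrame, target: str, window: int = 5,
               max_gap: int = 10, rule: str = "10s",
               n_neighbors: int = 5, top_k: int = 3) -> pd.DataFrame:
    """Hypothetical pipeline; df must have a DatetimeIndex."""
    # 1. Noise removal with a rolling mean. The original missing-value
    #    pattern is restored so that smoothing does not hide gaps.
    smoothed = df.rolling(window=window, min_periods=1).mean()
    smoothed = smoothed.where(df.notna())

    # 2. Gap analysis: drop columns whose longest gap exceeds max_gap
    #    (assumed here to be the criterion for "cannot be imputed").
    keep = [c for c in smoothed.columns if longest_gap(smoothed[c]) <= max_gap]
    if target not in keep:
        raise ValueError("target column was removed by gap analysis")
    kept = smoothed[keep]

    # 3. Fill the remaining gaps with a KNN imputer.
    imputer = KNNImputer(n_neighbors=n_neighbors)
    filled = pd.DataFrame(imputer.fit_transform(kept),
                          index=kept.index, columns=kept.columns)

    # 4. Resample to a constant rate; empty bins are interpolated.
    resampled = filled.resample(rule).mean().interpolate(method="time")

    # 5. Rank features by absolute Pearson correlation with the target.
    corr = resampled.corr(method="pearson")[target].drop(target).abs()
    top = corr.sort_values(ascending=False).head(top_k).index.tolist()
    return resampled[top + [target]]
```

Restoring the missing-value mask after smoothing is a design choice of this sketch, so that the gap analysis sees the same gaps as the raw data; the abstract does not say how the thesis handles this.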

Associated file(s)

Filename: no.uis:inspera:107970678:81623 ...
Size: 1.148Mb
Format: PDF

This item appears in the following collection(s)

Student theses (TN-IER) [150]
Master's and bachelor's theses in energy resources