Uncertainty Analysis of Supervised Machine Learning Predictions Applied to Lithology Classification
Abstract
Geosteering is the technique of guiding directional drilling to remain within the pay zone. This process demands a thorough survey of the lithological properties of the surrounding geological strata. Because logging-while-drilling (LWD) tools are positioned a few meters above the bit, they introduce a depth lag and, thus, a time delay between what the LWD sensors report to the surface and the performance of the bit. Drill bit and drill string performance factors are the earliest markers for determining formation characteristics without this temporal delay.

Implementing automated lithology identification would enhance the quality of the geosteering operation. This thesis investigated the extent to which various supervised machine learning (ML) classification algorithms may be utilized to recognize the lithological features of drilled formations.

ML models were trained using preprocessed real-time drilling data from the Volve field. The data included nine wells with a total of 198 928 tagged observations and the accompanying measured parameters at various depths within the wells. The ML algorithms were tested on a selected well that accounts for a minority of the samples in the dataset.

The progress in the application of ML algorithms provides an incentive for further study of model trustworthiness, including uncertainty analysis, to improve the classification algorithms used in lithology identification. Most ML algorithms may be thought of as "black box" models, meaning that the process by which variables are integrated to form predictions cannot be seen or transparently understood. Hence, the uncertainties in model performance must be quantified and limited before ML can be applied successfully to real-life classification problems.

Within the scope of this research, Feature Sensitivity and Vulnerability Analysis, as well as Dataset Shift Measurement, were applied to investigate the reliability of ML models.
A novel Black Box Metamodel approach and Bayesian Neural Networks were employed to compute aleatoric and epistemic uncertainties.

After testing seven ML classification algorithms, the Random Forest and Adaptive Boosting algorithms demonstrated the most accurate results and were chosen for comparative reliability analysis.

In classification tasks, estimating the probability that an observation belongs to a specific class is more crucial than the predicted label itself. Consequently, Probability Calibration techniques were applied and improved the quality of the quantified uncertainties. Calculating and comparing the difference between the confidence and accuracy results obtained after Probability Calibration showed that the Adaptive Boosting algorithm, which had the better scoring results, is less confident and ambiguous regarding epistemic uncertainty than the Random Forest algorithm.
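
The comparison between confidence and accuracy described above can be illustrated with a small sketch. The abstract does not state how confidence is computed, so the code assumes that a model's confidence on a sample is its highest predicted class probability. The function name `confidence_accuracy_gap` and the numbers in the example are hypothetical and are not taken from the thesis or the Volve data.

```python
from typing import Dict, Sequence


def confidence_accuracy_gap(
    probabilities: Sequence[Sequence[float]],
    true_labels: Sequence[int],
) -> Dict[str, float]:
    """Compare mean predicted confidence with accuracy for one classifier.

    probabilities[i][k] is the predicted probability that sample i belongs
    to class k; true_labels[i] is the index of the correct class.
    Assumption: confidence is the maximum class probability of each sample.
    """
    if len(probabilities) != len(true_labels) or not true_labels:
        raise ValueError("need one probability row per label, and at least one label")

    confidences = []
    correct = 0
    for row, label in zip(probabilities, true_labels):
        # Predicted class is the index of the largest probability.
        predicted = max(range(len(row)), key=row.__getitem__)
        confidences.append(row[predicted])
        correct += int(predicted == label)

    mean_confidence = sum(confidences) / len(confidences)
    accuracy = correct / len(true_labels)
    return {
        "confidence": mean_confidence,
        "accuracy": accuracy,
        # Positive gap: overconfident; negative gap: underconfident.
        "gap": mean_confidence - accuracy,
    }


# Hypothetical calibrated probabilities: four samples, three lithology classes.
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.3, 0.3, 0.4],
    [0.6, 0.3, 0.1],
]
labels = [0, 1, 2, 1]
result = confidence_accuracy_gap(probs, labels)
print(result)
```

Under this assumption, a gap close to zero after calibration suggests that the predicted probabilities can be read as reliable class likelihoods, while a negative gap indicates a model that is less confident than its accuracy would justify.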