Deep learning for Alzheimer’s disease: a fairness perspective
Description
Full text not available
Abstract
Alzheimer’s disease (AD) is a progressive disease, and detecting it at an early stage allows patients to begin symptomatic treatment as soon as possible in order to slow its progression. Neuroimaging data such as brain magnetic resonance imaging (MRI) carry a great deal of information about the brain that can be used to detect AD early. A specialist can perform this task, but it is complex and the resulting assessments are not always accurate. Deep learning (DL) techniques have therefore become very popular for this task in recent years, because DL networks, and convolutional neural networks (CNNs) in particular, can extract hidden features from such complex data and offer strong performance.
Many recent studies have used DL to detect AD at an early stage, but few of them have considered differences in performance between genders. In this study, we trained three state-of-the-art three-dimensional (3D) CNN models on T1-weighted 3D MRI images obtained from the Alzheimer's Disease Neuroimaging Initiative (ADNI) to perform three-way classification among three states: Cognitively Normal (CN), Mild Cognitive Impairment (MCI), and AD. In addition, we applied conformal prediction to the outputs of the DL models to attach a level of certainty to their predictions. For the gender fairness analysis, the DL models and the conformal prediction were evaluated on the whole test dataset and on the test dataset split by gender. We then quantified the performance and calibration of the DL models and of the conformal prediction, and calculated the disparities between genders in both performance and calibration.
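As an illustration of how conformal prediction can be combined with a per-gender evaluation, the sketch below implements split conformal prediction for a three-class classifier and reports coverage per group. It is a minimal sketch under stated assumptions, not the procedure used in this study: the abstract does not say which conformal method or nonconformity score was used, so the score (one minus the predicted probability of the true class), the miscoverage level `alpha`, and all function names are hypothetical.

```python
import numpy as np


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal threshold from a held-out calibration set (assumed setup)."""
    n = len(cal_labels)
    # Nonconformity score: 1 - predicted probability of the true class.
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    # Rank of the finite-sample-corrected quantile.
    k = int(np.ceil((n + 1) * (1 - alpha)))
    if k > n:
        # Too few calibration samples for this alpha: include every class.
        return 1.0
    return float(np.sort(scores)[k - 1])


def prediction_sets(test_probs, qhat):
    """Boolean mask (n_samples, n_classes): True where a class is in the set."""
    return (1.0 - test_probs) <= qhat


def coverage_by_group(sets, labels, groups):
    """Fraction of samples whose true label lies in the prediction set, per group."""
    covered = sets[np.arange(len(labels)), labels]
    return {g.item(): float(covered[groups == g].mean()) for g in np.unique(groups)}
```

Comparing the per-group coverage values (and, if desired, the average set size per group) gives one possible view of whether the conformal predictor behaves similarly for females and males.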
When tested on the whole test dataset, the best of our DL models reached a balanced accuracy of 53% for the three-way classification and an AUC of 72% for the macro-average ROC over the three classes. When the DL models were tested on the test dataset per gender, the performance and calibration results were almost even between females and males. Applying conformal prediction to the predictions of the DL models, on both the whole test dataset and the test dataset per gender, did not improve the overall performance of the DL models by much. Finally, we calculated the disparities between females and males for the per-gender results of both the DL models and the conformal prediction. The disparity results did not show a dominant gender and did not indicate that the models perform better for one gender than for the other.
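For readers unfamiliar with the metrics named above, the following is a minimal sketch of balanced accuracy (the mean of per-class recall) and a simple per-gender disparity. The abstract does not define how disparity was computed, so taking the female score minus the male score is an assumption, as are the function names and the `"F"`/`"M"` group labels.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred, n_classes=3):
    """Mean per-class recall over the classes present in y_true."""
    recalls = []
    for c in range(n_classes):
        mask = y_true == c
        if mask.any():
            recalls.append((y_pred[mask] == c).mean())
    return float(np.mean(recalls))


def gender_disparity(y_true, y_pred, gender, metric=balanced_accuracy):
    """Per-group metric and its difference (assumed definition: F minus M)."""
    per_group = {
        g: metric(y_true[gender == g], y_pred[gender == g]) for g in ("F", "M")
    }
    return per_group, per_group["F"] - per_group["M"]
```

A disparity close to zero under this definition would be consistent with the finding that neither gender dominates; a different definition, such as a ratio, would need a different reference value.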