Early Dementia Detection in Speech Transcripts Using Machine Learning and Large Language Models
Abstract
Dementia, a prevalent neurodegenerative disorder, poses significant challenges for early detection and intervention. Timely diagnosis is crucial for providing affected individuals and their families with the necessary support and resources, ultimately improving their quality of life.

This thesis explores approaches for detecting dementia by analyzing linguistic patterns in speech transcripts with machine learning techniques. Using datasets from DementiaBank, specifically the Pitt Corpus and the ADReSS challenge datasets, we combine traditional machine learning models with Large Language Models (LLMs) to analyze cognitive and linguistic features.

Our methodology includes comprehensive data preprocessing, feature extraction from transcripts of the 'Cookie Theft' picture description task, and model fine-tuning. We investigate the effectiveness of classic machine learning models such as Logistic Regression, Random Forest, and Support Vector Machines, as well as transfer learning with pre-trained models such as BERT, DistilBERT, RoBERTa, Mistral, and Llama. The study highlights the significant potential of LLMs for accurately detecting dementia, especially when they are fine-tuned with Low-Rank Adaptation (LoRA) or its quantized variant, Quantized LoRA (QLoRA).