Early Dementia Detection in Speech Transcript Using Machine Learning and Large Language Models
Abstract
Dementia, a prevalent neurodegenerative disorder, poses significant challenges for early detection and intervention. Timely diagnosis is crucial for giving affected individuals and their families the support and resources they need, ultimately improving quality of life. This thesis explores approaches for detecting dementia by analyzing linguistic patterns in speech transcripts with machine learning techniques. Using datasets from DementiaBank, specifically the Pitt Corpus and the ADReSS challenge datasets, we combine traditional machine learning models with Large Language Models (LLMs) to analyze cognitive and linguistic features. Our methodology includes comprehensive data preprocessing, feature extraction from the ‘Cookie Theft’ picture description test, and model fine-tuning. We investigate the effectiveness of classic machine learning models such as Logistic Regression, Random Forest, and Support Vector Machines, as well as transfer learning with pre-trained models like BERT, DistilBERT, RoBERTa, Mistral, and Llama. The study highlights the significant potential of LLMs, especially when enhanced with Low-Rank Adaptation (LoRA) and its quantized variant (QLoRA), in accurately detecting dementia