Speech digital biomarker combined with fluid biomarkers predict cognitive impairment through machine learning

Publisher: MKT Dept

【Abstract】
【Background】
Current methods for the early detection of Alzheimer’s disease (AD) are constrained by high cost, invasiveness, and limited accessibility, underscoring the urgent need for alternative approaches that are accessible, affordable, and patient-friendly. Previous research has identified speech analysis as a promising tool for the early diagnosis of cognitive impairment (CI). However, the relationship between speech test results and the underlying pathology remains unclear, so the clinical utility of speech analysis still lacks pathological validation. This relationship needs to be explored in a large sample, and models that can diagnose CI need to be constructed.

【Methods】
A total of 1223 participants were recruited: probable AD or AD (n = 238), amnestic mild cognitive impairment (aMCI) (n = 461), and cognitively unimpaired (CU) (n = 524). All participants underwent neuropsychological tests, speech recording during the “cookie-theft” task, serum biomarker quantification, and APOE genotyping; a subset also underwent Aβ PET imaging. Partial correlation analysis and LOWESS were used to explore the correlation between speech digital biomarkers and other core AD biomarkers. Finally, machine learning methods, including XGBoost and logistic regression, were used to construct the most cost-effective models for CI and Aβ status, with SHAP values used to screen features.
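As an illustration of the partial correlation step named in the Methods, the following is a minimal Python sketch, not the study's code. It assumes a single control variable (age is a hypothetical choice here; the abstract does not state which covariates were controlled), and the variables psd, gfap, and age hold synthetic, centered toy values rather than study data.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def partial_corr(x, y, z):
    """First-order partial correlation of x and y, controlling for z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / math.sqrt((1 - rxz ** 2) * (1 - ryz ** 2))

# Synthetic toy values (assumption): both markers share a component driven by age.
age = [1, 1, -1, -1]
psd = [2, 0, 0, -2]    # hypothetical speech marker values
gfap = [2, 0, -2, 0]   # hypothetical serum marker values

print("raw correlation:    ", pearson(psd, gfap))
print("partial correlation:", partial_corr(psd, gfap, age))
```

In this toy data the raw correlation between the two markers is entirely explained by the shared covariate, so the partial correlation is close to zero. With several covariates, the usual generalization is to correlate the residuals of each marker after regressing it on all covariates.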

【Results】
Significant differences in AD biomarkers were observed among the groups. Notably, the speech digital biomarker percentage of silence duration (PSD) was correlated with cognitive level, serum glial fibrillary acidic protein (GFAP), neurofilament light chain (NfL), phosphorylated tau protein 217 (p-Tau217), and amyloid deposition in specific brain regions. Additionally, PSD, p-Tau217, and GFAP exhibited a two-stage pattern of change across the progressive stages of Aβ deposition. Based on these findings, a machine learning CI diagnostic model (AUC = 0.928, 95% CI 0.897 to 0.960) incorporating PSD, APOE, GFAP, and demographic information was developed. Furthermore, an Aβ status classification model (AUC = 0.845, 95% CI 0.783 to 0.907) incorporating PSD, APOE, p-Tau217, and demographic data was constructed.
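For readers unfamiliar with how an AUC and its 95% confidence interval can be obtained, the following is a minimal Python sketch on synthetic labels and scores. The abstract does not state how its intervals were computed; the percentile bootstrap used here is an assumption chosen for simplicity, and the function names and toy data are hypothetical.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random positive case scores higher
    than a random negative case (ties count as half)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = random.Random(seed)
    idx = range(len(labels))
    stats = []
    while len(stats) < n_boot:
        sample = [rng.choice(idx) for _ in idx]
        ys = [labels[i] for i in sample]
        if len(set(ys)) < 2:
            continue  # resample contains a single class; AUC is undefined
        stats.append(auc(ys, [scores[i] for i in sample]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Synthetic toy data (assumption): 1 = impaired, 0 = unimpaired.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
scores = [0.10, 0.30, 0.35, 0.60, 0.40, 0.70, 0.80, 0.90]

print("AUC:", auc(labels, scores))
print("95% CI:", bootstrap_ci(labels, scores))
```

With a sample this small the interval is very wide; intervals as narrow as those reported above require cohorts of the size used in the study.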

【Conclusion】
Combining speech digital markers with serum and other biomarkers helps identify CI, representing a promising advance in AD detection. This study serves as a preliminary yet encouraging step toward applying speech digital biomarkers in AD diagnostics.


DOI:10.1186/s13195-025-01877-6