An Explainable Machine Learning Framework for Parkinson’s Disease Classification Using High-Dimensional Speech Features
Abstract
Parkinson’s disease is a progressive neurodegenerative disorder for which early and reliable diagnosis remains clinically important. Speech impairment is one of the most common manifestations of the disease and offers a non-invasive basis for automated screening. This study proposes an explainable machine learning framework for Parkinson’s disease classification using high-dimensional speech features. The framework integrates data preprocessing, feature selection, multiple machine learning classifiers, probability-based evaluation, and explainability analysis to build an interpretable and robust diagnostic model. Four classifiers were applied: Logistic Regression, a Support Vector Machine (SVM) with a radial basis function (RBF) kernel, Random Forest, and XGBoost. Experimental evaluation on the speech-feature dataset showed that XGBoost achieved the best overall classification performance, while Random Forest produced the most reliable probability calibration. In addition, feature-importance analysis revealed that dynamic cepstral descriptors and descriptors based on the tunable Q-factor wavelet transform (TQWT) were among the most influential predictors for Parkinson’s disease detection. Oversampling strategies, including Random Oversampling, SMOTE, and ADASYN, did not improve performance over the original data distribution, indicating that model-level robustness was more beneficial than synthetic class balancing for this dataset. Overall, the findings demonstrate that explainable ensemble machine learning provides an effective and interpretable approach for Parkinson’s disease classification from high-dimensional speech biomarkers.
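The abstract separates classification performance from calibration quality, which is why one model can rank best on accuracy-style metrics while another yields more trustworthy probabilities. The sketch below illustrates that distinction with two common calibration measures, the Brier score and a binned expected calibration error. The abstract does not say which calibration metrics the study used, so both choices are assumptions, and the function names, bin count, and toy probabilities are hypothetical and are not results from the study.

```python
from typing import Sequence


def brier_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """Mean squared difference between predicted probability and the 0/1 label."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def expected_calibration_error(
    y_true: Sequence[int], y_prob: Sequence[float], n_bins: int = 5
) -> float:
    """Sample-weighted mean gap between predicted and observed positive rate per bin."""
    if len(y_true) != len(y_prob) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Group predictions into equal-width probability bins over [0, 1].
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_prob):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the last bin
        bins[idx].append((y, p))
    n = len(y_true)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_prob = sum(p for _, p in members) / len(members)
        positive_rate = sum(y for y, _ in members) / len(members)
        ece += (len(members) / n) * abs(mean_prob - positive_rate)
    return ece


# Toy illustration only (hypothetical values, not data from the study).
# Both models predict the same classes at a 0.5 threshold, so their
# accuracy is identical, but they differ in how confident they are.
labels = [1, 1, 1, 0, 1, 0, 1, 0]
overconfident = [0.99, 0.99, 0.99, 0.99, 0.99, 0.01, 0.99, 0.01]
moderate = [0.80, 0.80, 0.80, 0.80, 0.80, 0.20, 0.80, 0.20]

for name, probs in [("overconfident", overconfident), ("moderate", moderate)]:
    print(
        name,
        round(brier_score(labels, probs), 4),
        round(expected_calibration_error(labels, probs), 4),
    )
```

In this toy case the two models make identical class predictions, yet the moderate model scores lower (better) on both measures because its stated confidence is closer to the observed positive rate. This is the sense in which a classifier can be well calibrated without having the top classification performance.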
This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.