COMPARATIVE ANALYSIS OF MISSING DATA IMPUTATION METHODS IN BIOMEDICAL RESEARCH: IMPACT ON BIOLOGICAL AGE PREDICTION
DOI:
https://doi.org/10.31891/csit-2026-2-21
Keywords:
data imputation, missing data, biological age, machine learning, MCAR, MAR, MNAR
Abstract
Missing data remain a major challenge in biomedical research because they can bias statistical estimates, reduce predictive accuracy, and compromise the robustness of scientific conclusions. The present study provides a comparative evaluation of five imputation approaches: IterativeImputer with RandomForest, ExtraTrees, and BayesianRidge estimators, together with KNNImputer and median-based SimpleImputer. The methods were assessed on two biomedical datasets, Bones (3,285 records, 11 biomarkers, n/p = 299) and NHANES (11,016 records after reduction from 55,081, 85 biomarkers, n/p = 130), with an n/p gradient ranging from 19 to 299. The experimental design incorporated three missingness mechanisms, MCAR, MAR, and MNAR, and three missingness levels: 10%, 40%, and 80%. Imputation quality was quantified using RMSE, while downstream effects were examined through biological age prediction based on ElasticNet and PCA models. IterativeImputer with ExtraTrees achieved the lowest average RMSE (9.275), whereas BayesianRidge and RandomForest demonstrated the strongest average rank (2.19-2.20), indicating more stable overall performance across heterogeneous scenarios. Under MNAR conditions, RandomForest produced the best results (RMSE 10.896), while ExtraTrees was most effective for MAR (RMSE 8.704). Downstream analysis showed that PCA yielded lower prediction RMSE than ElasticNet (2.14 versus 5.86), although 34% of cases exhibited negative correlations. A paradoxical improvement in imputation quality with increasing missingness was observed in 55-75% of scenarios. Median imputation was the fastest method (0.0075 s), whereas RandomForest was the slowest (261 s). The findings support practical recommendations for selecting imputation strategies according to dataset structure, missingness mechanism, and computational constraints in biomedical applications.
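The three missingness mechanisms and the RMSE criterion named in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the two-column synthetic data, the rank-based masking rule, and the helper names `make_mask` and `median_impute_rmse` are assumptions introduced here for illustration, and only median imputation (the simplest of the five compared approaches) is shown, using the Python standard library.

```python
import math
import random
import statistics


def make_mask(rows, target, driver, rate, mechanism, rng):
    """Return booleans marking which rows lose their `target` value.

    MCAR: every row is equally likely to be masked.
    MAR:  masking probability depends on an observed column (`driver`).
    MNAR: masking probability depends on the masked value itself.
    """
    n = len(rows)
    if mechanism == "MCAR":
        return [rng.random() < rate for _ in range(n)]
    col = driver if mechanism == "MAR" else target
    # Rank rows by the controlling column; higher ranks are masked more often.
    order = sorted(range(n), key=lambda i: rows[i][col])
    rank = {i: r / (n - 1) for r, i in enumerate(order)}  # scaled to 0..1
    # Probability 2 * rate * rank averages to `rate` while rate <= 0.5;
    # above that the cap at 1.0 lowers the realised missingness level.
    return [rng.random() < min(1.0, 2 * rate * rank[i]) for i in range(n)]


def median_impute_rmse(rows, target, mask):
    """Fill masked values with the observed median and score them by RMSE."""
    observed = [r[target] for r, m in zip(rows, mask) if not m]
    hidden = [r[target] for r, m in zip(rows, mask) if m]
    if not observed or not hidden:
        return float("nan")
    fill = statistics.median(observed)
    return math.sqrt(sum((v - fill) ** 2 for v in hidden) / len(hidden))


# Synthetic stand-in for two correlated biomarkers (assumed, not study data).
rng = random.Random(0)
rows = []
for _ in range(2000):
    x = rng.gauss(50, 10)          # column 0: always observed
    y = 0.8 * x + rng.gauss(0, 5)  # column 1: the column that gets masked
    rows.append((x, y))

results = {}
for mechanism in ("MCAR", "MAR", "MNAR"):
    for rate in (0.1, 0.4):
        mask = make_mask(rows, target=1, driver=0, rate=rate,
                         mechanism=mechanism, rng=rng)
        results[(mechanism, rate)] = median_impute_rmse(rows, 1, mask)
        print(f"{mechanism} {rate:.0%}: RMSE = {results[(mechanism, rate)]:.3f}")
```

Because the true values are known before masking, the RMSE is computed only over the masked entries, which is the usual way to score imputation quality in simulation studies of this kind. A model-based imputer could be substituted for the median fill inside `median_impute_rmse` without changing the masking logic.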
License
Copyright (c) 2026 Володимир СЛІПЧЕНКО, Любов Полягушко, Олександр ВОЛКОВ

This work is licensed under a Creative Commons Attribution 4.0 International License.
