COMPARATIVE ANALYSIS OF MISSING DATA IMPUTATION METHODS IN BIOMEDICAL RESEARCH: IMPACT ON BIOLOGICAL AGE PREDICTION

Authors

V. Slipchenko, L. Poliahushko, O. Volkov

DOI:

https://doi.org/10.31891/csit-2026-2-21

Keywords:

data imputation, missing data, biological age, machine learning, MCAR, MAR, MNAR

Abstract

Missing data remain a major challenge in biomedical research because they can bias statistical estimates, reduce predictive accuracy, and compromise the robustness of scientific conclusions. This study compares five imputation approaches: IterativeImputer with RandomForest, ExtraTrees, and BayesianRidge estimators, KNNImputer, and median-based SimpleImputer. The methods were assessed on two biomedical datasets, Bones (3,285 records, 11 biomarkers, n/p = 299) and NHANES (11,016 records after reduction from 55,081; 85 biomarkers; n/p = 130), with an n/p gradient ranging from 19 to 299. The experimental design covered three missingness mechanisms (MCAR, MAR, and MNAR) and three missingness levels (10%, 40%, and 80%). Imputation quality was quantified using RMSE, while downstream effects were examined through biological age prediction based on ElasticNet and PCA models. IterativeImputer with ExtraTrees achieved the lowest average RMSE (9.275), whereas BayesianRidge and RandomForest achieved the best average rank (2.19–2.20), indicating more stable performance across heterogeneous scenarios. Under MNAR conditions, RandomForest produced the best results (RMSE 10.896), while ExtraTrees was most effective for MAR (RMSE 8.704). Downstream analysis showed that PCA yielded lower prediction RMSE than ElasticNet (2.14 versus 5.86), although 34% of cases exhibited negative correlations. A paradoxical improvement in imputation quality with increasing missingness was observed in 55–75% of scenarios. Median imputation was the fastest method (0.0075 s), whereas RandomForest was the slowest (261 s). The findings support practical recommendations for selecting imputation strategies according to dataset structure, missingness mechanism, and computational constraints in biomedical applications.
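
The evaluation protocol summarized in the abstract (hide known values under a given mechanism and level, impute them, then score RMSE on the hidden cells only) can be illustrated with a minimal NumPy sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the synthetic data, the helper names missing_mask, median_impute, and imputation_rmse, and the quantile-threshold rule used to simulate MAR and MNAR. The abstract does not state how the mechanisms were generated.

```python
import numpy as np


def missing_mask(X, mechanism, rate, rng, target=0, driver=1):
    """Boolean row mask: True where column `target` is hidden (hypothetical helper)."""
    n = X.shape[0]
    if mechanism == "MCAR":
        # Missingness is independent of every value in the data.
        return rng.random(n) < rate
    if mechanism == "MAR":
        # Missingness depends on another column that stays fully observed.
        score = X[:, driver]
    elif mechanism == "MNAR":
        # Missingness depends on the value that is being hidden.
        score = X[:, target]
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    # Assumed rule: hide the rows whose score lies in the top `rate` fraction.
    return score > np.quantile(score, 1.0 - rate)


def median_impute(column, mask):
    """Fill hidden entries with the median of the observed entries."""
    filled = column.copy()
    filled[mask] = np.median(column[~mask])
    return filled


def imputation_rmse(truth, filled, mask):
    """RMSE computed only over the entries that were hidden."""
    return float(np.sqrt(np.mean((truth[mask] - filled[mask]) ** 2)))


# Synthetic stand-in for a biomarker table (not the Bones or NHANES data).
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
X[:, 0] += 0.8 * X[:, 1]  # make the target column correlate with the driver

results = {}
for mechanism in ("MCAR", "MAR", "MNAR"):
    for rate in (0.1, 0.4, 0.8):
        mask = missing_mask(X, mechanism, rate, rng)
        filled = median_impute(X[:, 0], mask)
        results[(mechanism, rate)] = imputation_rmse(X[:, 0], filled, mask)

for key, value in results.items():
    print(key, round(value, 3))
```

In a fuller comparison, each of the imputers named in the abstract would take the place of median_impute, with the masks and the RMSE scoring left unchanged so that all methods are evaluated on the same hidden cells.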

Published

2026-05-31

How to Cite

Slipchenko, V., Poliahushko, L., & Volkov, O. (2026). COMPARATIVE ANALYSIS OF MISSING DATA IMPUTATION METHODS IN BIOMEDICAL RESEARCH: IMPACT ON BIOLOGICAL AGE PREDICTION. Computer Systems and Information Technologies, (2), 254–264. https://doi.org/10.31891/csit-2026-2-21