ENHANCED TWO-STEP AUGMENTATION METHOD FOR ANALYZING SMALL DATASETS IN MEDICAL APPLICATIONS
DOI: https://doi.org/10.31891/csit-2025-1-18
Keywords: generalized regression neural network, small data, data augmentation, high-dimensional data, regression
Abstract
Despite the enormous possibilities for data collection, situations in which data is scarce still arise often. Insufficient data can significantly complicate effective analysis, since most known approaches require a sufficiently large training sample to produce accurate predictions. In medicine, data scarcity is common for several reasons: confidentiality, fragmentation, and natural rarity. Accordingly, developing algorithms that can at least partially compensate for data scarcity while still performing satisfactorily is a relevant task. Existing augmentation-based techniques for analyzing small data can improve the efficiency of traditional methods. However, as the number of instances in the sample grows, the number of features also increases significantly, which can negatively affect the performance of machine learning methods.
This paper proposes an improved two-step method for the intelligent analysis of small high-dimensional data sets based on a generalized regression neural network (GRNN). A distinguishing feature of the approach is that it avoids multiplying the number of features in the augmented sample. The method was applied to two regression problems: predicting the value of a function and determining the compressive strength of the femur. Both data sets contained fewer than 100 instances. The optimal parameters were determined with the Dual Annealing optimization algorithm for five distance measures: Euclidean, Chebyshev, Manhattan, Canberra, and cosine. The proposed method significantly reduced errors (such as MAE and RMSE) compared with the traditional GRNN model. It was also more accurate than the input doubling method on both problems. Along with the gain in accuracy, however, the proposed model required a longer execution time. Therefore, the feasibility of applying it depends on the priorities of the problem being solved.
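The abstract does not give implementation details, so the following is only a minimal sketch of the traditional GRNN baseline it mentions, not of the proposed two-step augmentation method. It assumes a Gaussian kernel with a single smoothing parameter `sigma`, tunes `sigma` with SciPy's `dual_annealing`, and compares the five distance measures through `scipy.spatial.distance.cdist`. The leave-one-out MAE objective, the search bounds, the helper names (`grnn_predict`, `loo_mae`, `tune_sigma`), and the synthetic data are all assumptions made for illustration; none of them come from the paper.

```python
import numpy as np
from scipy.optimize import dual_annealing
from scipy.spatial.distance import cdist

# cdist metric names for the five distance measures listed in the abstract
METRICS = {
    "Euclidean": "euclidean",
    "Chebyshev": "chebyshev",
    "Manhattan": "cityblock",
    "Canberra": "canberra",
    "cosine": "cosine",
}


def _weights(d2, sigma):
    """Gaussian kernel weights from squared distances (one row per query)."""
    # Shifting by the row minimum leaves the weighted average unchanged
    # but keeps the largest weight at 1, which avoids division by zero.
    d2 = d2 - d2.min(axis=1, keepdims=True)
    return np.exp(-d2 / (2.0 * sigma ** 2))


def grnn_predict(X_train, y_train, X_query, sigma, metric="euclidean"):
    """GRNN output: kernel-weighted average of the training targets."""
    w = _weights(cdist(X_query, X_train, metric=metric) ** 2, sigma)
    return (w @ y_train) / w.sum(axis=1)


def loo_mae(sigma, X, y, metric):
    """Leave-one-out MAE (an assumed tuning objective, not the paper's)."""
    d2 = cdist(X, X, metric=metric) ** 2
    np.fill_diagonal(d2, np.inf)  # a point must not predict itself
    w = _weights(d2, sigma)
    return float(np.mean(np.abs((w @ y) / w.sum(axis=1) - y)))


def tune_sigma(X, y, metric, bounds=(0.01, 5.0), maxiter=100):
    """Search for sigma with Dual Annealing; bounds are an assumption."""
    res = dual_annealing(
        lambda s: loo_mae(s[0], X, y, metric), bounds=[bounds], maxiter=maxiter
    )
    return float(res.x[0]), float(res.fun)


# Synthetic stand-in for a small data set (fewer than 100 instances).
rng = np.random.default_rng(0)
X = rng.uniform(0.1, 1.0, size=(60, 4))
y = np.sin(X.sum(axis=1)) + rng.normal(0.0, 0.05, size=60)

results = {name: tune_sigma(X, y, metric) for name, metric in METRICS.items()}
for name, (sigma, mae) in results.items():
    print(f"{name:10s} sigma={sigma:.3f}  LOO MAE={mae:.4f}")
```

Because a GRNN has no iterative training phase, each candidate `sigma` costs only one pass over the pairwise distance matrix, which keeps a global search such as Dual Annealing affordable on samples of this size. Any augmentation step that enlarges the sample would increase that cost, in line with the longer execution time noted in the abstract.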
License
Copyright (c) 2025 Мирослав ГАВРИЛЮК

This work is licensed under a Creative Commons Attribution 4.0 International License.