ENHANCED TWO-STEP AUGMENTATION METHOD FOR ANALYZING SMALL DATASETS IN MEDICAL APPLICATIONS

Authors

M. Havryliuk

DOI:

https://doi.org/10.31891/csit-2025-1-18

Keywords:

generalized regression neural network, small data, data augmentation, high-dimensional data, regression

Abstract

Despite the vast opportunities for data collection available today, data are still often scarce. A shortage of data can significantly complicate analysis, since most known approaches need a sufficiently large training sample to produce accurate predictions. In medicine, data shortages are common for several reasons: confidentiality, fragmentation, and natural rarity. The development of algorithms that at least partially compensate for data scarcity while remaining acceptably accurate is therefore relevant. Existing techniques that analyze small data by augmenting them can improve the performance of traditional methods. However, as augmentation increases the number of instances in the sample, it also significantly increases the number of features, which can degrade the performance of machine learning methods.

This paper proposes an improved two-step method for the intelligent analysis of short, high-dimensional data sets based on a generalized regression neural network (GRNN). A distinctive feature of the approach is that it avoids a multiple-fold increase in the number of features in the augmented sample. The method was applied to two regression problems: predicting the value of a function and determining the compressive strength of the femur. Both data sets contained fewer than 100 instances. Optimal parameters were determined with the Dual Annealing optimization algorithm for five distance measures: Euclidean, Chebyshev, Manhattan, Canberra, and cosine. The proposed method significantly reduced errors (such as MAE and RMSE) compared with the traditional GRNN model. It was also more accurate than the input doubling method on both problems. The gain in accuracy came at the cost of longer execution time, so the feasibility of applying the method depends on the priorities of the problem being solved.
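For orientation, the baseline that the abstract compares against can be sketched as follows: a plain GRNN whose smoothing parameter is tuned with SciPy's `dual_annealing` for each of the five listed distance measures. This is a minimal illustration, not the paper's two-step augmentation method, which the abstract does not specify in detail. The Gaussian kernel form, the leave-one-out MAE objective, the search bounds, the synthetic data, and the helper names `kernel_average`, `grnn_predict`, and `tune_sigma` are all assumptions made for this sketch.

```python
import numpy as np
from scipy.optimize import dual_annealing
from scipy.spatial.distance import cdist

# SciPy metric names for the five distance measures named in the abstract.
METRICS = {
    "Euclidean": "euclidean",
    "Chebyshev": "chebyshev",
    "Manhattan": "cityblock",
    "Canberra": "canberra",
    "cosine": "cosine",
}


def kernel_average(d, y, sigma):
    """Gaussian-weighted average of targets y for a distance matrix d."""
    d2 = d ** 2
    # Subtracting the row-wise minimum keeps at least one weight equal to 1,
    # which avoids a 0/0 division when sigma is very small.
    w = np.exp(-(d2 - d2.min(axis=1, keepdims=True)) / (2.0 * sigma ** 2))
    return (w @ y) / w.sum(axis=1)


def grnn_predict(X_train, y_train, X_query, sigma, metric="euclidean"):
    """Plain GRNN prediction (assumed Gaussian kernel on the chosen distance)."""
    return kernel_average(cdist(X_query, X_train, metric=metric), y_train, sigma)


def tune_sigma(X, y, metric="euclidean", bounds=(1e-3, 10.0), maxiter=100):
    """Pick sigma by minimizing leave-one-out MAE with dual annealing.

    The objective and the bounds are assumptions of this sketch.
    """
    d = cdist(X, X, metric=metric)
    np.fill_diagonal(d, np.inf)  # a point must not vote for its own target

    def loo_mae(params):
        return float(np.mean(np.abs(kernel_average(d, y, params[0]) - y)))

    res = dual_annealing(loo_mae, bounds=[bounds], maxiter=maxiter)
    return float(res.x[0]), float(res.fun)


# Synthetic small data set (fewer than 100 instances), for illustration only.
rng = np.random.default_rng(0)
X = rng.uniform(0.1, 1.0, size=(40, 5))
y = np.sin(X.sum(axis=1))

results = {name: tune_sigma(X, y, metric) for name, metric in METRICS.items()}
for name, (sigma, mae) in results.items():
    print(f"{name:10s} sigma={sigma:.4f}  LOO MAE={mae:.4f}")
```

Because the search is run separately for each distance measure, the tuned errors can be compared directly across measures. Any augmentation scheme placed in front of this baseline would add to the run time, which is consistent with the accuracy versus execution time trade-off noted above.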

Published

2025-03-27

How to Cite

HAVRYLIUK, M. (2025). ENHANCED TWO-STEP AUGMENTATION METHOD FOR ANALYZING SMALL DATASETS IN MEDICAL APPLICATIONS. Computer Systems and Information Technologies, (1), 156–162. https://doi.org/10.31891/csit-2025-1-18