METHOD FOR EXTENDING IMAGE CLASSIFICATOR VIA TEXT METADATA STATISTICAL ANALYSIS

Authors

DOI:

https://doi.org/10.31891/csit-2025-1-20

Keywords:

neural networks, natural language processing, machine learning, finetuning, image classification, multimodal data, statistical data processing

Abstract

The goal of this article is to develop a new method for obtaining an image classification neural network that can handle classes for which no examples are available at training time. Image classification involves assigning one or more labels to an image based on the objects present in it. The current state-of-the-art approach to creating such networks is to fine-tune models on the necessary data. The research methodology is to take existing machine learning models and expand the set of classes they operate on by manipulating the weight coefficients of the existing classifier. The proposed method uses text metadata related to the images, together with descriptions of the object classes, to form assumptions about the relationships between image classes. Starting from the existing weights of the neural network classifier, it applies simple statistical calculations to the text data to generate additional weights for recognizing new object classes. The result of the research is an algorithm for obtaining a classifier model that works with one or more classes that were not available during training. The resulting model achieves classification accuracy higher than the random baseline. At the same time, the F-score for new classes is approximately 0.66, which is lower than the F-score of approximately 0.93 for classes that were present during training. The paper also shows the limitations of the statistics-based approach to fine-tuning, highlighting that it is not a full replacement for classical model training. The scientific novelty lies in the development of methods for expanding image classifier models using statistical analysis of text metadata. The practical significance of the research lies in two aspects. The first is a more stable baseline of classification quality for classes that are added to models after training by more sophisticated methods. The second is a method for expanding the classifier in cases where extra data for additional training is not available and training itself is not possible because of a lack of computational resources.
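
The idea of deriving weights for an unseen class from existing classifier weights and text statistics can be illustrated with a minimal sketch. The abstract does not specify which statistical calculations are used, so everything here is an assumption made for illustration: the bag-of-words cosine similarity, the similarity-weighted averaging of final-layer weight rows, and all names (`term_counts`, `cosine`, `new_class_weights`, the toy classes and descriptions). It should not be read as the authors' algorithm.

```python
from collections import Counter
import math


def term_counts(text):
    """Lowercase bag-of-words counts for one class description."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def new_class_weights(known_weights, known_texts, new_text):
    """Blend known classes' final-layer rows into a row for a new class.

    known_weights: dict of class name -> list of floats (one classifier row)
    known_texts:   dict of class name -> description text
    new_text:      description of the unseen class
    Assumption: the blend is a text-similarity-weighted average.
    """
    new_counts = term_counts(new_text)
    sims = {name: cosine(term_counts(text), new_counts)
            for name, text in known_texts.items()}
    total = sum(sims.values())
    if total == 0:
        raise ValueError("new class shares no terms with any known class")
    dim = len(next(iter(known_weights.values())))
    row = [0.0] * dim
    for name, sim in sims.items():
        for i, w in enumerate(known_weights[name]):
            row[i] += (sim / total) * w
    return row


# Hypothetical toy data: two known classes with 2-dimensional weight rows.
known_weights = {"cat": [1.0, 0.0], "truck": [0.0, 1.0]}
known_texts = {
    "cat": "small furry animal with whiskers",
    "truck": "large road vehicle with wheels",
}
tiger_row = new_class_weights(known_weights, known_texts,
                              "large furry animal with stripes")
```

In this toy setup the description of the new class shares more terms with "cat" than with "truck", so the generated row lies closer to the "cat" row. No training step or gradient computation is involved, which matches the low-resource setting the abstract describes.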

Published

2025-03-27

How to Cite

DASHENKOV, D., & SMELYAKOV, K. (2025). METHOD FOR EXTENDING IMAGE CLASSIFICATOR VIA TEXT METADATA STATISTICAL ANALYSIS. Computer Systems and Information Technologies, (1), 171–177. https://doi.org/10.31891/csit-2025-1-20