DEVELOPMENT AND RESEARCH OF MULTIMODAL NEURAL ARCHITECTURES FOR HETEROGENEOUS UNBALANCED DATA IN CLASSIFICATION TASKS

Serhii MINUKHIN; Valerii RUDOI

doi:10.31891/csit-2026-1-3

Authors

Serhii MINUKHIN, Kharkiv National University of Radio Electronics, https://orcid.org/0000-0002-9314-3750
Valerii RUDOI, Kharkiv National University of Radio Electronics, https://orcid.org/0009-0002-5285-7746

Keywords:

multimodal data, cross-modal attention, contrastive learning, knowledge distillation, pruning, quantization, emotion classification, autonomous navigation

Abstract

The article presents a comprehensive study of modern multimodal neural architectures for integrating heterogeneous and partially unbalanced data in classification tasks. It considers early and late fusion approaches, hybrid architectures with cross-modal attention, and transformers that form consistent latent spaces of visual, auditory, and textual features. Particular attention is paid to contrastive learning (CLIP-like approaches, multimodal InfoNCE), which ensures semantic consistency of representations and improves classification accuracy under uneven data distributions and rare classes. A model is proposed that combines early and late fusion with cross-modal attention and contrastive learning to form a coherent joint latent space. The features of each modality are processed by specialized encoders, and fusion is performed with adaptive weighting, which reduces the impact of imbalance across heterogeneous data and enables efficient processing of signals of different natures and intensities. Pruning, quantization, and knowledge distillation reduced computational costs without loss of accuracy, ensuring stable model performance in real-world streaming scenarios with limited resources. Applying the proposed model to the BDD100K and CMU-MOSEI datasets confirmed its high efficiency in processing heterogeneous and unbalanced data. On BDD100K, the model achieved an Accuracy of 0.953, an F1-score of 0.956, and a ROC-AUC of 0.947, with the integral indicators Micro F1, Macro F1, and Weighted F1 equal to 0.953, 0.949, and 0.955, respectively. On CMU-MOSEI, it achieved an Accuracy of 0.956, an F1-score of 0.969, and a ROC-AUC of 0.968, with Micro F1, Macro F1, and Weighted F1 equal to 0.956, 0.962, and 0.968, respectively.
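The contrastive objective named above (CLIP-like, multimodal InfoNCE) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the temperature of 0.07, the batch size, and the embedding dimension are all assumptions chosen for illustration, and the random vectors stand in for the outputs of the modality encoders.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    return x / np.maximum(np.linalg.norm(x, axis=1, keepdims=True), eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(za, zb, temperature=0.07):
    """Symmetric InfoNCE over a batch of paired embeddings.

    za[i] and zb[i] are embeddings of the same sample in two modalities
    (the positive pair); every other row in the batch acts as a negative.
    The temperature value is an assumed hyperparameter.
    """
    za, zb = l2_normalize(za), l2_normalize(zb)
    logits = za @ zb.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(len(za))                  # positives sit on the diagonal
    loss_ab = -log_softmax(logits, axis=1)[idx, idx].mean()  # modality A -> B
    loss_ba = -log_softmax(logits, axis=0)[idx, idx].mean()  # modality B -> A
    return 0.5 * (loss_ab + loss_ba)


# Hypothetical embeddings: 8 samples, 16 dimensions per modality.
rng = np.random.default_rng(0)
visual = rng.normal(size=(8, 16))
aligned_text = visual + 0.01 * rng.normal(size=(8, 16))  # nearly matching pairs
random_text = rng.normal(size=(8, 16))                   # unrelated pairs

loss_aligned = symmetric_info_nce(visual, aligned_text)
loss_random = symmetric_info_nce(visual, random_text)
print(f"aligned pairs: {loss_aligned:.4f}, unrelated pairs: {loss_random:.4f}")
```

The loss is lower when paired embeddings from the two modalities agree, which is the sense in which such an objective encourages a consistent joint latent space.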
A comparative analysis of metrics against classical methods, SOTA solutions, and AutoML (B-T4SA) showed that the developed model provides consistently higher accuracy and more consistent classification across all classes, including rare ones, confirming its ability to adapt effectively to the high variability and imbalance of heterogeneous data in real-world conditions.
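The three integral indicators reported in the abstract differ only in how per-class results are averaged, which matters when rare classes are present. The following sketch uses the standard definitions of Micro, Macro, and Weighted F1; the function name and the toy labels are hypothetical and are not taken from the article's experiments.

```python
import numpy as np


def f1_averages(y_true, y_pred, num_classes):
    """Return micro, macro, and weighted F1 for single-label predictions."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    tp = np.zeros(num_classes)
    fp = np.zeros(num_classes)
    fn = np.zeros(num_classes)
    for c in range(num_classes):
        tp[c] = np.sum((y_pred == c) & (y_true == c))
        fp[c] = np.sum((y_pred == c) & (y_true != c))
        fn[c] = np.sum((y_pred != c) & (y_true == c))

    # Per-class F1 = 2TP / (2TP + FP + FN); defined as 0 when the class never occurs.
    denom = 2 * tp + fp + fn
    per_class = np.divide(2 * tp, denom, out=np.zeros(num_classes), where=denom > 0)
    support = tp + fn  # number of true samples per class

    micro = 2 * tp.sum() / (2 * tp.sum() + fp.sum() + fn.sum())  # pools all counts
    macro = per_class.mean()                                      # equal weight per class
    weighted = (per_class * support).sum() / support.sum()        # weight by class size
    return {"micro": micro, "macro": macro, "weighted": weighted}


# Toy imbalanced example: 8 samples of class 0, 2 samples of rare class 1,
# with one rare-class sample misclassified as class 0.
y_true = [0] * 8 + [1] * 2
y_pred = [0] * 8 + [0, 1]
scores = f1_averages(y_true, y_pred, num_classes=2)
print(scores)
```

In this toy case Macro F1 is the lowest of the three because the rare class counts as much as the frequent one, so an error on a rare class is penalized most heavily there. For single-label predictions Micro F1 coincides with accuracy, which is consistent with the matching Accuracy and Micro F1 values reported for both datasets.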
