IMAGE ROTATION-INVARIANT REPRESENTATION VIA REMOVAL OF ORIENTATION FEATURES FROM THE ENCODER LATENT SPACE

Authors

A. Bedratiuk

DOI:

https://doi.org/10.31891/csit-2025-2-13

Keywords:

variational autoencoder, feature disentanglement, rotation invariance, semantic representation, convolutional architecture, image classification, algorithms, machine learning

Abstract

In many computer vision tasks, accurate object recognition is complicated by arbitrary object orientations. Ensuring rotation invariance is critical for improving classification accuracy and reducing errors related to the varying placement of objects. This issue is particularly important in real-world environments, where object orientation is rarely controlled.

The goal of this study is to develop a method that separates rotational features from the semantic essence of an object while preserving high classification accuracy after the orientation-related components are removed. This approach enables the construction of models that remain effective under a wide range of input orientations, thus improving robustness in practical applications.

The proposed method is based on a convolutional variational autoencoder trained on a dataset of images rotated by various angles. Linear regression is then used to identify the latent components that correlate most strongly with the rotation parameter. These components are removed, and the remaining features are used for classification. Additionally, image reconstruction is performed from the reduced latent vector to visually validate rotation invariance and evaluate the preservation of object shape.
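
The following is a minimal sketch of the described pipeline. The random arrays stand in for the VAE encoder outputs (latent vectors), the applied rotation angles, and the digit labels, which in the paper come from the trained convolutional variational autoencoder and the rotated dataset; the downstream logistic-regression classifier and the number of removed components k are illustrative assumptions, not details given in the abstract.

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder data: in the paper, `latents` are the encoder's latent vectors
# for rotated digit images, `angles` are the rotation angles applied to them,
# and `labels` are the digit classes.
rng = np.random.default_rng(0)
latents = rng.normal(size=(5000, 32))      # 32-dimensional latent space (assumed size)
angles = rng.uniform(0.0, 360.0, size=5000)
labels = rng.integers(0, 10, size=5000)

# 1. Regress the rotation angle on the latent components.
reg = LinearRegression().fit(latents, angles)

# 2. Mark the k components with the largest regression weights as "rotational".
k = 4
rot_idx = np.argsort(np.abs(reg.coef_))[-k:]

# 3. Remove (zero out) the rotational components before classification.
reduced = latents.copy()
reduced[:, rot_idx] = 0.0

# 4. Classify from the reduced latent vectors.
X_tr, X_te, y_tr, y_te = train_test_split(reduced, labels, random_state=0)
clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
print("accuracy without rotational components:", clf.score(X_te, y_te))

The same reduced latent vector can also be passed to the decoder to obtain the rotation-normalized reconstructions mentioned above.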

Experiments on a synthetically rotated binarized digit dataset show that removing the "rotational" components from the latent space does not lead to a critical drop in overall classification accuracy. Instead, the removed components primarily influence orientation, supporting the possibility of clearly disentangling geometric and semantic features. Images reconstructed without these components remain recognizable but appear rotation-normalized, indicating the suppression of orientation information. A quantitative assessment confirms that the loss in accuracy is proportional to the contribution of the removed components to the rotation regression.
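
One way to reproduce such a quantitative assessment is to sweep the number of removed components and record both the classification accuracy on the reduced latents and how much rotation information the remaining components still carry. The sketch below assumes the same latent vectors, angles, and labels as above; the cross-validation setup and the choice of classifier are illustrative, not the paper's exact protocol.

import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression
from sklearn.model_selection import cross_val_score

def removal_sweep(latents, angles, labels, ks=(0, 2, 4, 8, 16)):
    # Rank latent components by their weight in the rotation regression.
    reg = LinearRegression().fit(latents, angles)
    order = np.argsort(np.abs(reg.coef_))[::-1]   # most "rotational" first
    results = []
    for k in ks:
        reduced = latents.copy()
        reduced[:, order[:k]] = 0.0
        # Classification accuracy on the reduced latents (3-fold CV).
        acc = cross_val_score(LogisticRegression(max_iter=1000),
                              reduced, labels, cv=3).mean()
        # Rotation R^2 still recoverable from the remaining components.
        r2 = LinearRegression().fit(reduced, angles).score(reduced, angles)
        results.append((k, acc, r2))
    return results

Plotting accuracy against the rotation R^2 removed at each k makes the reported proportionality between accuracy loss and the removed components' contribution directly visible.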

The scientific novelty of this work lies in introducing a simple and reproducible method for removing orientation-related features from the latent space of an autoencoder without modifying the model architecture or introducing specialized regularizers. The practical significance of the method is in reducing the influence of arbitrary object orientation on recognition accuracy, thereby increasing the universality and reliability of vision systems in uncontrolled settings. The proposed approach may be useful for building classifiers capable of handling images with varying or unknown orientations during data collection.

Published

2025-06-26

How to Cite

BEDRATIUK, A. (2025). IMAGE ROTATION-INVARIANT REPRESENTATION VIA REMOVAL OF ORIENTATION FEATURES FROM THE ENCODER LATENT SPACE. Computer Systems and Information Technologies, (2), 112–122. https://doi.org/10.31891/csit-2025-2-13