Cross-modal representation learning for accurate harmonized system code classification in e-commerce systems
Abstract
Accurate classification of goods under the Harmonized System remains a critical challenge in international trade and e-commerce because product descriptions are complex, textual data are ambiguous, and product representation varies widely. The novelty of this study lies in a cross-modal representation learning approach to automated Harmonized System code classification that integrates textual and visual product information within a unified framework. By using both product descriptions and product images, the proposed system improves classification accuracy and robustness compared with traditional approaches that rely on text alone. The methodology is based on contrastive learning, which aligns semantic and visual representations in a shared feature space. This alignment lets the model capture deeper relationships between product attributes and Harmonized System codes and identify product characteristics more reliably when descriptions are incomplete or ambiguous, as is common in e-commerce environments. Transformer-based encoders extract textual features, while convolutional or vision transformer architectures produce the image representations. A joint embedding space is then constructed to support cross-modal interaction and classification. Experimental evaluation on a real-world e-commerce dataset shows that the proposed approach significantly outperforms baseline models in accuracy, precision, and recall. The results highlight the effectiveness of multimodal learning for the noisy, incomplete, and heterogeneous product data found in customs and trade environments.
The proposed framework contributes to the advancement of intelligent customs classification systems by enhancing automation, reducing human error, and improving compliance in international trade operations. Future work will focus on incorporating explainability mechanisms and extending the model to support multilingual and low-resource scenarios.
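
The contrastive alignment step described in the abstract can be illustrated with a minimal sketch. The abstract does not state which loss is used, so the symmetric InfoNCE objective below, the function names, and the temperature value of 0.07 are all assumptions chosen for illustration and should not be read as the authors' implementation. The sketch assumes that row i of the text embeddings and row i of the image embeddings describe the same product.

```python
import numpy as np


def l2_normalize(x, eps=1e-12):
    """Scale each row to unit length so dot products become cosine similarities."""
    norms = np.linalg.norm(x, axis=1, keepdims=True)
    return x / np.maximum(norms, eps)


def log_softmax(logits, axis):
    """Numerically stable log-softmax along the given axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_contrastive_loss(text_emb, image_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired text/image embeddings.

    Assumption: row i of text_emb and row i of image_emb belong to the same
    product; every other row in the batch serves as a negative example.
    """
    t = l2_normalize(np.asarray(text_emb, dtype=np.float64))
    v = l2_normalize(np.asarray(image_emb, dtype=np.float64))

    # Pairwise cosine similarities, sharpened by the (assumed) temperature.
    logits = t @ v.T / temperature
    idx = np.arange(logits.shape[0])

    # Text-to-image: each text should rank its own image highest (row-wise softmax).
    loss_t2i = -log_softmax(logits, axis=1)[idx, idx].mean()
    # Image-to-text: each image should rank its own text highest (column-wise softmax).
    loss_i2t = -log_softmax(logits, axis=0)[idx, idx].mean()

    return 0.5 * (loss_t2i + loss_i2t)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    text = rng.normal(size=(8, 16))           # stand-in for text encoder outputs
    image = text + 0.1 * rng.normal(size=(8, 16))  # stand-in for aligned image outputs
    print("aligned pairs:   ", symmetric_contrastive_loss(text, image))
    print("mismatched pairs:", symmetric_contrastive_loss(text, image[::-1]))
```

In this sketch, minimizing the loss pulls matching description and image embeddings together in the shared space and pushes non-matching pairs apart. A classifier over Harmonized System codes would then operate on these aligned embeddings; how the two are combined is not specified in the abstract.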

