Improved segmentation model to identify object instances based on textual prompts

Authors

  • Sergii V. Mashtalir, Kharkiv National University of Radio Electronics, 14, Nauky Ave., Kharkiv, 61166, Ukraine
  • Andrii R. Kovtunenko, Kharkiv National University of Radio Electronics, 14, Nauky Ave., Kharkiv, 61166, Ukraine

DOI:

https://doi.org/10.15276/hait.08.2025.4

Keywords:

deep learning, image segmentation, convolutional neural networks, transformers, contrastive language-image pretraining, open-set segmentation

Abstract

The rapidly growing volume of multimedia information demands substantial development of methods for processing it quickly. One direction of such processing is preliminary analysis that detects characteristic image features in order to reduce the information needed for subsequent tasks. Image segmentation is one form of this information reduction. The general task of image segmentation is often reduced to object segmentation, a fundamental task in computer vision that requires accurate pixel-by-pixel object delineation and scene understanding. With the development of natural language processing techniques, many approaches have been successfully adapted to computer vision tasks, allowing scenes to be described more intuitively in natural language. Unlike traditional models limited to a fixed set of classes, natural language processing-based approaches allow searching for objects by their attributes, which expands their applicability. Existing object segmentation methods are typically categorized as one-stage or two-stage, depending on their trade-off between speed and accuracy, yet there remains a gap in models that can effectively identify and segment objects based on textual prompts. To address this, we propose an open-set instance segmentation model capable of detecting and segmenting objects from prompts. Our approach builds upon CLIPSeg, integrating architectural modifications from Panoptic-DeepLab and PRN (Panoptic Refinement Network) to predict object centers and pixel-wise distances to boundaries. A post-processing phase refines the segmentation results to improve object separation. The proposed architecture is trained on the Large Vocabulary Instance Segmentation (LVIS) and PhraseCut datasets and evaluated using the mean Dice score against state-of-the-art open-set segmentation models.
Experimental results show that, although our model achieves the highest inference rate among open-set methods while maintaining FastSAM-level segmentation quality, post-processing remains a limiting factor. This suggests that future work should aim either to eliminate the post-processing step or to improve its algorithm, which could lead to more efficient segmentation.
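
As an illustration of the evaluation metric named in the abstract, the following is a minimal sketch of a mean Dice score over binary masks. It is not the authors' evaluation code: the function names `dice_score` and `mean_dice`, the list-of-rows mask representation, and the convention that two empty masks score 1.0 are all assumptions made for this example.

```python
from typing import Sequence

# Hypothetical mask type for this sketch: rows of 0/1 values.
Mask = Sequence[Sequence[int]]


def dice_score(pred: Mask, target: Mask) -> float:
    """Dice = 2 * |P ∩ T| / (|P| + |T|) for two binary masks of equal shape."""
    intersection = 0
    pred_area = 0
    target_area = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            p, t = bool(p), bool(t)
            pred_area += p
            target_area += t
            intersection += p and t
    total = pred_area + target_area
    if total == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return 2.0 * intersection / total


def mean_dice(preds: Sequence[Mask], targets: Sequence[Mask]) -> float:
    """Average the per-mask Dice score over paired predictions and targets."""
    scores = [dice_score(p, t) for p, t in zip(preds, targets)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    predicted = [[1, 1], [0, 0]]
    ground_truth = [[1, 0], [0, 0]]
    # One shared pixel, mask areas 2 and 1: Dice = 2 * 1 / 3.
    print(dice_score(predicted, ground_truth))
```

How the scores are aggregated in the paper (per image, per instance, or per prompt) is not stated in the abstract, so the simple per-mask average above is only one plausible choice.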


Author Biographies

Sergii V. Mashtalir, Kharkiv National University of Radio Electronics, 14, Nauky Ave., Kharkiv, 61166, Ukraine

Doctor of Engineering Science, Professor, Informatics Department

Scopus Author ID: 36183980100

Andrii R. Kovtunenko, Kharkiv National University of Radio Electronics, 14, Nauky Ave., Kharkiv, 61166, Ukraine

PhD student, Informatics Department

Scopus Author ID: 58362751200

Published

2025-04-04

How to Cite

Mashtalir, S. V., & Kovtunenko, A. R. (2025). Improved segmentation model to identify object instances based on textual prompts. Herald of Advanced Information Technology, 8(1), 54–66. https://doi.org/10.15276/hait.08.2025.4