Modeling and automation of the process for detecting duplicate objects in memory snapshots
DOI: https://doi.org/10.15276/hait.07.2024.10

Keywords: optimization, algorithm, performance, memory snapshot, duplication, string

Abstract
The paper is devoted to the problem of detecting increased memory usage by software applications. The modern software development cycle is focused on functionality and often overlooks optimal resource utilization. Limited physical scalability sets an upper limit on a system's capacity to handle requests. The presence of multiple immutable objects holding identical information indicates increased memory consumption. Avoiding duplicate objects in memory allows for more rational use of existing resources and increases the volume of information that can be processed. Existing scientific publications focus on investigating memory leaks and pay limited attention to excessive memory use, as there is no unified model for detecting it. Established programming patterns include the “object pool” pattern, but they leave the decision on its application to engineers without providing a mathematical grounding. This paper presents the development of a mathematical model of the process of detecting duplicates of immutable String objects in a memory snapshot. Industrial systems that require hundreds of gigabytes of random-access memory and hold millions of objects in memory have been analyzed; at such data scales, the duplicate-detection process itself must be optimized. The research method is the analysis of memory snapshots of high-load systems using software developed with .NET technology and the ClrMD library. A memory snapshot reflects the state of the process under investigation at a particular moment in time, containing all objects, threads, and operations in progress. The ClrMD library allows programmatic exploration of objects and their types, retrieval of field values, and construction of object-relationship graphs. The series of experiments was conducted on Windows machines, although similar results can be obtained on Linux because the object memory layout is the same across platforms. Based on the results of the study, an optimization is proposed that speeds up the duplicate-detection process several-fold. The scientific contribution of the research lies in the creation of a mathematically substantiated approach that significantly reduces memory resource use and optimizes computational processes. The practical utility of the model is confirmed by the optimization results achieved by following the obtained recommendations: reduced hosting costs (and hence greater economic efficiency in deploying and operating software systems in industrial conditions) and increased volumes of processed data.
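
As an illustration of the snapshot analysis described above, the sketch below uses ClrMD (Microsoft.Diagnostics.Runtime) to enumerate String instances in a dump and estimate the memory occupied by redundant copies. It is a minimal sketch under stated assumptions, not the paper's optimized algorithm: the dump file name app.dmp is a placeholder, strings are grouped by their full value, and values are read with ClrMD's default truncation limit.

```csharp
// Minimal sketch (not the paper's algorithm): enumerate System.String
// instances in a memory dump with ClrMD and group them by value to
// estimate the bytes occupied by redundant copies.
using System;
using System.Collections.Generic;
using Microsoft.Diagnostics.Runtime;

class DuplicateStringScan
{
    static void Main()
    {
        // "app.dmp" is a placeholder path for a full memory dump.
        using DataTarget target = DataTarget.LoadDump("app.dmp");
        using ClrRuntime runtime = target.ClrVersions[0].CreateRuntime();

        // string value -> (number of instances, size of one instance in bytes)
        var stats = new Dictionary<string, (int Count, ulong Size)>();

        foreach (ClrObject obj in runtime.Heap.EnumerateObjects())
        {
            if (obj.Type?.Name != "System.String")
                continue;

            // AsString() truncates very long strings by default, so this
            // sketch may conflate long strings that differ past the limit.
            string? value = obj.AsString();
            if (value is null)
                continue;

            stats[value] = stats.TryGetValue(value, out var s)
                ? (s.Count + 1, obj.Size)
                : (1, obj.Size);
        }

        // Every copy beyond the first is wasted memory.
        ulong wasted = 0;
        foreach (var (count, size) in stats.Values)
            if (count > 1)
                wasted += (ulong)(count - 1) * size;

        Console.WriteLine($"Estimated bytes held by duplicate strings: {wasted:N0}");
    }
}
```

In practice, the most frequently duplicated values identified this way are natural candidates for string interning or the “object pool” pattern mentioned above.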