Optimization of analysis and minimization of information losses in text mining
DOI:
https://doi.org/10.15276/hait.01.2020.4Keywords:
text analysis, annotation, text mining, software, algorithm, text data, natural languageAbstract
Information is one of the most important resources of today's business environment. It is difficult for any company to succeed without having sufficient information about its customers, employees and other key stakeholders. Every day, companies receive unstructured and structured text from a variety of sources, such as survey results, tweets, call center notes, phone emails, online customer reviews, recorded interactions, emails and other documents. These sources provide raw text that is difficult to understand without using the right text analysis tool. You can do text analytics manually, but the manual process is inefficient. Traditional systems use keywords and cannot read and understand language in emails, tweets, web pages, and text documents. For this reason, companies use text analysis software to analyze large amounts of text data. The software helps users retrieve textual information to act accordingly The most common manual annotation is currently the most common, which can be attributed to the high quality of annotation and its “meaningfulness”. Typical disadvantages of manual annotation systems, textual information analysis systems are the high material costs and the inherent low speed of work. Therefore, the topic of this article is to explore the methods by which you can effectively annotate reviews of various products from the largest marketplace in Ukraine. The following tasks should be solved: to analyze modern approaches to data analysis and processing; to study basic algorithms for data analysis and processing; build a program that will collect data, design the program architecture for more efficient use, based on the use of the latest technologies; clear data using minimize information loss techniques; analyze the data collected, using data analysis and processing approaches; to draw conclusions from the results of all the above works. There are quite a number of varieties of the listed tasks, as well as methods of solving them. This again confirms the importance and relevance of the topic we choose. The purpose of the study is the methods and means by which information losses can be minimized when analyzing and processing textual data. The object of the study is the process of minimizing information losses in the analysis and processing of textual data. In the course of the study, recent research on the analysis and processing of textual information was analyzed; methods of textual information processing and Data Mining algorithms are analyzed.