Browsing by Author "Demidovich, Inna M."

Authorship Determination of Natural Language Texts by Several Classes of Indicators with Customizable Weights
(CEUR-WS Team, Aachen, Germany, 2021) Shynkarenko, Viktor I.; Demidovich, Inna M.
ENG: This work aims to improve the attribution of texts and their fragments using minimum-distance classification in a Euclidean space of images, by selecting a weight for each image measure. The weights were determined with a genetic algorithm. Images are formed from statistical analysis, modified recurrence analysis, and text complexity indicators, and the effectiveness of each is assessed. The method was found to improve attribution efficiency: the reliability of authorship determination for texts from the control sample reaches 80-91%.
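The minimum-distance classification described in this abstract can be illustrated with a minimal Python sketch. It assumes each text and each author is already represented by a numeric feature vector (an "image"); the feature values, author names, and weights below are hypothetical and fixed by hand, whereas the paper selects the weights with a genetic algorithm.

```python
import math


def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two feature vectors ("images")."""
    return math.sqrt(sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights)))


def attribute(text_image, author_images, weights):
    """Return the author whose reference image is closest to text_image."""
    return min(
        author_images,
        key=lambda name: weighted_distance(text_image, author_images[name], weights),
    )


# Hypothetical reference images and weights, for illustration only.
author_images = {
    "author_a": [0.30, 5.1, 0.12],
    "author_b": [0.45, 4.2, 0.20],
}
weights = [1.0, 0.1, 2.0]

closest = attribute([0.33, 4.9, 0.13], author_images, weights)
```

Setting a weight to zero removes the corresponding measure from the comparison, which is one way to see why per-measure weights can change the attribution result.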
Constructive-Synthesizing Modeling of Natural Language Texts
(Khmelnytskyi National University, Khmelnytskyi, 2023) Shynkarenko, Viktor I.; Demidovich, Inna M.
ENG: Means for solving the problem of establishing the authorship of natural language texts were developed. The theoretical tools consist of a set of constructors developed on the basis of structural and production modeling; these constructors are presented in this work. Some experimental results based on this approach were published in the authors' previous works, and the main results are to be published in subsequent ones. The constructors developed are: a converter of natural language text into tagged text, a converter of tagged text into a formal stochastic grammar, and a constructor that establishes the degree of similarity between the authors' styles of two natural language works from the coincidence of the corresponding stochastic grammars (their substitution rules). The constructors model a natural language text as a stochastic grammar that reflects the structure of its sentences. This approach highlights the syntactic features of the author's phrase construction, which are characteristic of the author's speech. The presented work is a theoretical basis for solving the problems of establishing text authorship and identifying borrowings. Experimental studies have also been carried out: a statistical similarity between the solutions to the problems of establishing authorship and identifying borrowings was revealed experimentally and will be presented in the authors' next article.
Working with the sentence as the unit of text makes it possible to capture the author's style more accurately in terms of word use, word sequences, and characteristic language constructions. The approach is not tied to specific parts of speech but reveals the general logic of phrase construction, which can be more informative about the author's style for any text. The model is planned to be used to determine the authorship of natural language texts of various kinds: fiction and technical literature.
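The pipeline from tagged text to stochastic grammar to similarity can be sketched roughly as follows. This is an illustrative simplification, not the constructors from the paper: sentences are assumed to arrive already tagged, each substitution rule is reduced to a "tag follows tag" transition, and the similarity measure (a weighted Jaccard overlap of rule probabilities) is an assumption chosen for brevity.

```python
from collections import Counter, defaultdict

START = "<S>"  # hypothetical start symbol for every sentence


def build_grammar(tagged_sentences):
    """Map each rule (lhs, rhs) to its estimated probability given lhs."""
    counts = defaultdict(Counter)
    for tags in tagged_sentences:
        prev = START
        for tag in tags:
            counts[prev][tag] += 1
            prev = tag
    grammar = {}
    for lhs, rhs_counts in counts.items():
        total = sum(rhs_counts.values())
        for rhs, n in rhs_counts.items():
            grammar[(lhs, rhs)] = n / total
    return grammar


def similarity(g1, g2):
    """Weighted Jaccard overlap of two rule sets, in the range [0, 1]."""
    rules = set(g1) | set(g2)
    shared = sum(min(g1.get(r, 0.0), g2.get(r, 0.0)) for r in rules)
    total = sum(max(g1.get(r, 0.0), g2.get(r, 0.0)) for r in rules)
    return shared / total if total else 0.0


# Hypothetical tagged sentences (part-of-speech tags only).
text_a = [["DET", "NOUN", "VERB"], ["DET", "ADJ", "NOUN", "VERB"]]
text_b = [["PRON", "VERB", "ADV"]]

grammar_a = build_grammar(text_a)
grammar_b = build_grammar(text_b)
```

Two texts with identical rule probabilities score 1.0 under this measure, and texts that share no rules score 0.0.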
A Dual Approach to Establishing the Authority of Technical Natural Language Texts and Their Components
(Ukrainian State University of Science and Technologies, Dnipro, 2023) Shynkarenko, Viktor I.; Demidovich, Inna M.; Kuropiatnyk, Olena S.
ENG: Purpose. The study tests the hypothesis that plagiarism can be detected by methods of establishing text authorship, without a text bank and without direct text comparison. Methodology. Constructive-production models of the processes of establishing the authorship of technical texts have been developed for two methods. The first method forms a model of the text as a set of formal substitution rules with probabilistic weights (as in stochastic formal grammars), which reflects the syntactic features and patterns of the author's text formation. The degree of similarity between the text under study and another text is determined by comparing their models. The second method is the classical approach to detecting borrowings (plagiarism): the text under study is compared directly with an existing text bank, repeated text fragments are highlighted, and the degree of originality is determined. Experiments were conducted to establish the correlation between the results of the two methods. The experimental base consisted of 509 text sections from the theses of students majoring in "Software Engineering". Findings. The experiments established a high correlation between the results of the two methods. Correlation coefficients ranged from 0.75 to 1.0, with an average of 0.88, provided that borrowings are counted only for text fragments at least five words long. Originality. For the first time, the authors have identified the possibilities of, and proposed methods for, indirect plagiarism detection without a large text bank. The essence of the model is to formalize the representation of the author's sentence syntax as a set of substitution rules with probabilistic weights. Practical value. The results expand the possibilities for detecting borrowings and increase the effectiveness of the corresponding methods.
Recommendations on the parameters of classical borrowing-detection methods have been obtained; in particular, counting text fragments of at least five words is recommended as a rational setting for borrowing detection systems. The capabilities of authorship detection methods previously tested on fiction are extended to technical texts.
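The classical comparison and the correlation step can be illustrated with a short Python sketch. It is not the system used in the study: matching on exact lowercase word sequences, the `borrowed_share` and `pearson` helper names, and the sample strings are all assumptions made for illustration. Only the minimum fragment length of five words comes from the abstract.

```python
import math


def borrowed_share(text, bank, min_len=5):
    """Share of words in text covered by fragments of at least min_len
    consecutive words that also occur in some bank document."""
    words = text.lower().split()
    bank_fragments = set()
    for doc in bank:
        w = doc.lower().split()
        for i in range(len(w) - min_len + 1):
            bank_fragments.add(tuple(w[i:i + min_len]))
    covered = [False] * len(words)
    for i in range(len(words) - min_len + 1):
        if tuple(words[i:i + min_len]) in bank_fragments:
            for j in range(i, i + min_len):
                covered[j] = True
    return sum(covered) / len(words) if words else 0.0


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # correlation is undefined for a constant series
    return cov / (sx * sy)


# Hypothetical example: six of ten words fall inside shared 5-word fragments.
share = borrowed_share(
    "the quick brown fox jumps over the lazy dog today",
    ["a quick brown fox jumps over the fence"],
)
```

Originality in this sketch would be `1 - share`; a list of such values per text section could then be passed to `pearson` together with the grammar-based similarity scores.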
Methods and Software for Significant Indicators Determination of the Natural Language Texts Author Profile
(Institute of Software Systems of the NAS of Ukraine, Kyiv, 2023) Shynkarenko, Viktor I.; Demidovich, Inna M.
ENG: Methods for forming and optimizing author profiles are presented. An author profile is an image: a vector in a multidimensional space whose components are measurements of the author's texts obtained by several methods based on 4-grams, stemming, recurrence analysis, and formal stochastic grammar. The profile is a model of the author's language, including vocabulary and sentence syntax features. A comparative analysis of the effectiveness of each method is carried out. A reduced author profile is formed by means of a genetic algorithm: insignificant indicators are excluded, which reduces their number by 20%. The reduced profile contains the attributes that are significant for a given author and serves as an effective means of attributing texts to that author.
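As a rough illustration of one profile component and of profile reduction, the sketch below computes relative character 4-gram frequencies and then drops components with a boolean mask. The vocabulary, the mask, and the function names are hypothetical; in the paper the set of retained indicators is found by a genetic algorithm, which is not reproduced here.

```python
from collections import Counter


def four_gram_profile(text, vocabulary):
    """Relative frequency of each vocabulary 4-gram among all character
    4-grams of the whitespace-normalized, lowercased text."""
    chars = " ".join(text.lower().split())
    grams = Counter(chars[i:i + 4] for i in range(len(chars) - 3))
    total = sum(grams.values())
    return [grams[g] / total if total else 0.0 for g in vocabulary]


def reduce_profile(profile, mask):
    """Keep only the components whose mask entry is True."""
    return [value for value, keep in zip(profile, mask) if keep]


# Hypothetical vocabulary and mask, for illustration only.
vocabulary = ["abcd", "bcda", "zzzz"]
profile = four_gram_profile("abcdabcd", vocabulary)
reduced = reduce_profile(profile, [True, False, True])
```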
Processing Words Effectiveness Analysis in Solving the Natural Language Texts Authorship Determination Task
(IEEE, 2021) Demidovich, Inna M.; Shynkarenko, Viktor I.; Kuropiatnyk, Olena; Kirichenko, Oleksandr
ENG: The previously developed method establishes the authorship of natural language texts from frequency analysis, supplemented by text complexity indicators and recurrence analysis. The authorship identification problem is reduced to a classical pattern recognition problem. To account for the differing information content of individual indicators, each is assigned a weight. The weights are chosen with a genetic algorithm to maximize the number of texts from the training sample whose authorship is established correctly. The method is used to study the effectiveness of representing the author's style with different types of word processing: two types of word stems and 4-grams. The stems are obtained, respectively, with an adapted Porter stemmer and with an original method that builds a dictionary of Ukrainian word stems. With the calculated indicator weights taken into account, the reliability of establishing text authorship in the control sample reached 85-91%.
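The weight selection described here can be sketched as a small genetic algorithm whose fitness is the number of correctly attributed training texts. Everything specific in this sketch is an assumption: the population size, the one-point crossover, the Gaussian mutation, the fixed random seed, and the toy data are chosen for illustration and are not taken from the paper.

```python
import random


def nearest_author(image, centroids, weights):
    """Author whose centroid has the smallest weighted squared distance."""
    def dist2(a, b):
        return sum(w * (x - y) ** 2 for x, y, w in zip(a, b, weights))
    return min(centroids, key=lambda name: dist2(image, centroids[name]))


def fitness(weights, samples, centroids):
    """Number of training samples attributed to their true author."""
    return sum(
        nearest_author(image, centroids, weights) == author
        for image, author in samples
    )


def evolve_weights(samples, centroids, n_features,
                   generations=30, pop_size=20, seed=0):
    """Search for indicator weights that maximize the fitness above."""
    rng = random.Random(seed)
    # Start from uniform weights plus random candidates.
    population = [[1.0] * n_features]
    population += [[rng.random() for _ in range(n_features)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda w: fitness(w, samples, centroids),
                        reverse=True)
        parents = population[: pop_size // 2]  # the best half survives as is
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_features) if n_features > 1 else 0
            child = a[:cut] + b[cut:]          # one-point crossover
            i = rng.randrange(n_features)      # mutate a single weight
            child[i] = max(0.0, child[i] + rng.gauss(0, 0.2))
            children.append(child)
        population = parents + children
    return max(population, key=lambda w: fitness(w, samples, centroids))


# Hypothetical toy data: the second feature is misleading noise.
centroids = {"a": [0.0, 0.0], "b": [1.0, 1.0]}
samples = [([0.1, 0.9], "a"), ([0.2, 1.0], "a"),
           ([0.9, 0.1], "b"), ([0.8, 0.0], "b")]
best = evolve_weights(samples, centroids, n_features=2)
```

Because the best half of each generation is carried over unchanged, the returned weights never score worse on the training sample than the uniform starting weights.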
Program clones detection as natural language texts fragments based on constructive-synthesizing modeling
(CEUR Workshop Proceedings, 2025) Shynkarenko, Viktor I.; Kuropiatnyk, Olena S.; Demidovich, Inna M.
ENG: The previously developed and tested method for comparing the structure of natural language texts is adapted to the analysis of program texts. The method is based on stochastic grammars that include rules describing the algorithmic structure of programs. The probability that a given structure appears is calculated as the product of the probabilities of its constituent program elements. Constructive-production modeling tools were used to form the rules. An experiment was conducted to verify whether the method can detect clones in program source code written in C++ and C#. Different types of tasks and their software implementations were studied: both those that are equivalent in control flow but differ in calculations, and vice versa. The experiments showed that programs that solve different tasks but have almost identical algorithms have high similarity indicators. If the algorithms are similar but solve different tasks, the indicators are slightly lower. Low to medium similarity indicators are obtained when different tasks are solved with different algorithms, which is due to the shared syntax of a single programming language.
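The product rule mentioned in this abstract can be shown with a very small Python sketch. The element names and probabilities are hypothetical, and the elements are treated as independent purely for illustration; the paper's grammar rules are not reproduced here.

```python
import math


def structure_probability(elements, element_probs):
    """Probability of a structure as the product of its elements' probabilities."""
    return math.prod(element_probs[e] for e in elements)


# Hypothetical probabilities of program elements in one source file.
element_probs = {"for": 0.5, "if": 0.25, "assign": 0.5}
p = structure_probability(["for", "if", "assign"], element_probs)
```

For long structures the product becomes very small, so summing logarithms of the probabilities is a common alternative that avoids floating-point underflow.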