Processing Words Effectiveness Analysis in Solving the Natural Language Texts Authorship Determination Task

Demidovich, Inna M.; Shynkarenko, Viktor I.; Kuropiatnyk, Olena; Kirichenko, Oleksandr

Files

Demidovich.pdf (109.13 KB)

Date

2021

Authors

Demidovich, Inna M.

Shynkarenko, Viktor I.

Kuropiatnyk, Olena

Kirichenko, Oleksandr

Publisher

IEEE

Abstract

ENG: A previously developed method establishes the authorship of natural language texts using frequency analysis, supplemented by text complexity indicators and recurrent analysis. The authorship attribution problem is reduced to a problem of classical pattern recognition theory. Because individual indicators differ in information content, each is assigned a weight. The weights are chosen by a genetic algorithm to maximize the number of training-sample texts whose authorship is correctly established. The method is used here to study how effectively an author's style is represented under different types of word processing: two types of word stems and 4-grams. The stems are obtained, respectively, with an adapted Porter stemmer and with an original method for building a dictionary of Ukrainian word stems. With the calculated indicator weights taken into account, the reliability of establishing text authorship on the control sample reached 85-91%.
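To make the frequency-based idea in the abstract concrete, here is a minimal sketch, not the authors' implementation. It assumes character-level 4-grams taken inside words (the abstract does not say whether the 4-grams are character or word based), a weighted L1 distance between frequency profiles, and nearest-profile classification. The function names `four_gram_profile`, `weighted_distance`, and `attribute` are hypothetical, and the weights are passed in by hand rather than found by a genetic algorithm.

```python
from collections import Counter


def four_gram_profile(text):
    """Relative frequencies of character 4-grams inside each word.

    Assumption: 4-grams are character-level; the paper may define them differently.
    """
    words = text.lower().split()
    grams = Counter(w[i:i + 4] for w in words for i in range(len(w) - 3))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()} if total else {}


def weighted_distance(p, q, weights=None):
    """Weighted L1 distance between two profiles; a missing weight defaults to 1.0."""
    weights = weights or {}
    keys = set(p) | set(q)
    return sum(weights.get(k, 1.0) * abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


def attribute(text, author_profiles, weights=None):
    """Return the author whose stored profile is closest to the text's profile."""
    profile = four_gram_profile(text)
    return min(
        author_profiles,
        key=lambda author: weighted_distance(profile, author_profiles[author], weights),
    )


# Toy usage with made-up sample strings (not data from the paper).
profiles = {
    "A": four_gram_profile("stemming stemmer stems"),
    "B": four_gram_profile("genetic genetics generation"),
}
predicted = attribute("stemmed stemming", profiles)
```

In the method described above, the weights would instead be tuned on a training sample so that as many texts as possible are attributed correctly; that optimization step is omitted here.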

Description

V. Shynkarenko: ORCID 0000-0001-8738-7225; I. Demidovich: ORCID 0000-0002-3644-184X; O. Kuropiatnyk: ORCID 0000-0003-2286-884X

Keywords

natural language texts, authorship attribution, Porter stemmer, genetic algorithm, recurrent analysis, statistical analysis, text classification, dictionary, pattern recognition, КІТ

Citation

Demidovich I., Shynkarenko V., Kuropiatnyk O., Kirichenko O. Processing Words Effectiveness Analysis in Solving the Natural Language Texts Authorship Determination Task. Computer Sciences and Information Technologies (CSIT 2021). Proceedings of the 16th IEEE International Conference, Lviv, Ukraine, 22–25 September 2021. Lviv, 2021. Vol. 2. P. 48–51. DOI: 10.1109/CSIT52700.2021.9648829.

URI

https://ieeexplore.ieee.org/document/9648829/references#references
https://crust.ust.edu.ua/handle/123456789/14720

Collections

KIT Articles (Статті КІТ)
