Processing Words Effectiveness Analysis in Solving the Natural Language Texts Authorship Determination Task

dc.contributor.author: Demidovich, Inna [en]
dc.contributor.author: Shynkarenko, Viktor I. [en]
dc.contributor.author: Kuropiatnyk, Olena [en]
dc.contributor.author: Kirichenko, Oleksandr [en]
dc.date.accessioned: 2022-02-25T17:55:35Z
dc.date.available: 2022-02-25T17:55:35Z
dc.date.issued: 2021
dc.description: V. Shynkarenko: ORCID 0000-0001-8738-7225; I. Demidovich: ORCID 0000-0002-3644-184X; O. Kuropiatnyk: ORCID 0000-0003-2286-884x [en]
dc.description.abstract: ENG: The previously developed method determines the authorship of natural language texts on the basis of frequency analysis, supplemented by text complexity indicators and recurrent analysis. The authorship attribution problem is reduced to the classical pattern recognition problem. To account for the different information content of individual indicators, each indicator is assigned a weight. The weights are selected with a genetic algorithm so as to maximize the number of texts in the training sample whose authorship is determined correctly. The method is used to study how effectively the author's style is represented under different types of word processing: two types of word stems and 4-grams. The two types of stems are obtained with an adapted Porter stemmer and with an original method for creating a dictionary of Ukrainian word stems, respectively. With the calculated indicator weights taken into account, the reliability of authorship determination on the control sample reached 85-91%. [en]
dc.identifier: DOI: 10.1109/CSIT52700.2021.9648829
dc.identifier.citation: Demidovich I., Shynkarenko V., Kuropiatnyk O., Kirichenko O. Processing Words Effectiveness Analysis in Solving the Natural Language Texts Authorship Determination Task. Computer Sciences and Information Technologies (CSIT 2021). Proceedings of the 16th IEEE International Conference, Lviv, Ukraine, 22–25 September 2021. Lviv, 2021. Vol. 2. P. 48–51. DOI: 10.1109/CSIT52700.2021.9648829. [en]
dc.identifier.isbn: 978-1-6654-4258-9 (Print)
dc.identifier.isbn: 978-1-6654-4257-2 (Online)
dc.identifier.issn: 2766-3655 (Print)
dc.identifier.issn: 2766-3639 (Online)
dc.identifier.uri: https://ieeexplore.ieee.org/document/9648829/references#references [en]
dc.identifier.uri: http://eadnurt.diit.edu.ua/jspui/handle/123456789/14720 [en]
dc.language.iso: en
dc.publisher: IEEE [en]
dc.subject: natural language texts [en]
dc.subject: authorship attribution [en]
dc.subject: Porter stemmer [en]
dc.subject: genetic algorithm [en]
dc.subject: recurrent analysis [en]
dc.subject: statistical analysis [en]
dc.subject: text classification [en]
dc.subject: dictionary [en]
dc.subject: pattern recognition [en]
dc.subject: КІТ [uk_UA]
dc.title: Processing Words Effectiveness Analysis in Solving the Natural Language Texts Authorship Determination Task [en]
dc.type: Article [en]
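The abstract outlines a general scheme. Texts become indicator frequency vectors, each indicator receives a weight, and a genetic algorithm selects the weights that maximize correct attributions on the training sample. The Python sketch below illustrates that scheme under several assumptions that are not taken from the paper:

- The only features are character 4-grams inside words. Stems, text complexity indicators, and recurrent analysis are omitted.
- Each author is represented by the mean frequency vector of their training texts.
- A text is attributed to the author with the smallest weighted L1 distance.
- The genetic algorithm settings (population size, truncation selection, uniform crossover, Gaussian mutation) are illustrative choices.

All function names are hypothetical.

```python
import random
from collections import Counter


def word_4grams(text):
    """Character 4-grams taken inside each word; words shorter than 4 letters yield none."""
    grams = []
    for raw in text.lower().split():
        word = "".join(ch for ch in raw if ch.isalpha())
        grams.extend(word[i:i + 4] for i in range(len(word) - 3))
    return grams


def build_vocab(texts, size=50):
    """Hypothetical feature set: the `size` most frequent 4-grams in the training texts."""
    counts = Counter(g for t in texts for g in word_4grams(t))
    return [g for g, _ in counts.most_common(size)]


def profile(text, vocab):
    """Relative frequency of each vocabulary 4-gram in the text."""
    counts = Counter(word_4grams(text))
    total = sum(counts.values()) or 1  # avoid division by zero for texts without 4-grams
    return [counts[g] / total for g in vocab]


def build_profiles(train, vocab):
    """Return per-author mean frequency vectors and the (vector, author) training samples."""
    samples = [(profile(text, vocab), author) for text, author in train]
    grouped = {}
    for vec, author in samples:
        grouped.setdefault(author, []).append(vec)
    profiles = {a: [sum(col) / len(vecs) for col in zip(*vecs)] for a, vecs in grouped.items()}
    return profiles, samples


def classify(vec, profiles, weights):
    """Attribute a text to the author whose profile has the smallest weighted L1 distance."""
    def dist(p):
        return sum(w * abs(x - y) for w, x, y in zip(weights, vec, p))
    return min(profiles, key=lambda author: dist(profiles[author]))


def fitness(weights, samples, profiles):
    """Number of training texts whose author is determined correctly."""
    return sum(classify(vec, profiles, weights) == author for vec, author in samples)


def evolve_weights(samples, profiles, n_features, pop_size=20, generations=30, seed=0):
    """Simple genetic algorithm over weight vectors in [0, 1] (assumed settings)."""
    rng = random.Random(seed)
    pop = [[rng.random() for _ in range(n_features)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda w: fitness(w, samples, profiles), reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection; the best half survives unchanged
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = [x if rng.random() < 0.5 else y for x, y in zip(a, b)]  # uniform crossover
            child = [min(1.0, max(0.0, g + rng.gauss(0.0, 0.1))) for g in child]  # mutation
            children.append(child)
        pop = parents + children
    return max(pop, key=lambda w: fitness(w, samples, profiles))


if __name__ == "__main__":
    # Toy training sample with two hypothetical authors.
    train = [
        ("the river flows under the bridge", "A"),
        ("river water flows quietly", "A"),
        ("machine learning models compute gradients", "B"),
        ("models learning from data compute", "B"),
    ]
    vocab = build_vocab([text for text, _ in train])
    profiles, samples = build_profiles(train, vocab)
    weights = evolve_weights(samples, profiles, len(vocab))
    print(fitness(weights, samples, profiles), "of", len(samples), "training texts attributed correctly")
    print(classify(profile("the river flows", vocab), profiles, weights))
```

To compare word processing variants as the abstract describes, `word_4grams` would be replaced by a stem extractor, such as a Porter-style stemmer adapted to Ukrainian. Accuracy would then be measured on a separate control sample rather than on the training texts used to fit the weights.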
Files (original bundle): Demidovich.pdf, 109.13 KB, Adobe Portable Document Format