Statistical Text Analysis and Study of the Dynamics of Classification Accuracy

Ostrovska, Kateryna Yuriivna; Fenenko, Tetiana Mykhailivna; Hlushchenko, Oleksandr

dc.contributor.author	Ostrovska, Kateryna Yuriivna	uk_UA
dc.contributor.author	Fenenko, Tetiana Mykhailivna	uk_UA
dc.contributor.author	Hlushchenko, Oleksandr	uk_UA
dc.date.accessioned	2023-05-06T15:33:46Z
dc.date.available	2023-05-06T15:33:46Z
dc.date.issued	2022
dc.description	T. Fenenko: ORCID 0000-0002-7631-3148; K. Ostrovska: ORCID 0000-0002-9375-4121	uk_UA
dc.description.abstract	UKR: This work addresses statistical text analysis and the dynamics of classification accuracy. It covers the selection of statistical text features, the classification of texts written by different authors, and the study of how classification accuracy depends on the length of text fragments. The following were used to solve the problem: natural language processing methods; statistical characteristics of texts; machine learning methods; and dimensionality reduction methods to enable visualization. Based on the observed change in classification accuracy as a function of text fragment length, conclusions were drawn about the optimal length of texts used for training and testing models. The problem was solved in the Jupyter Notebook environment of the Anaconda distribution, which installs Python and the necessary libraries in one step.	uk_UA
dc.description.abstract	ENG: The work is devoted to statistical text analysis and to the study of the dynamics of classification accuracy. It carries out the selection of statistical text features, the classification of texts belonging to different authors, and the study of how classification accuracy changes with the length of text fragments. To solve the problem, the following were used: natural language processing methods; statistical characteristics of texts; machine learning methods; and dimensionality reduction methods to enable visualization. Based on the observed changes in classification accuracy as a function of text fragment length, conclusions were drawn regarding the optimal length of texts used for training and testing models. The task was solved in the Jupyter Notebook environment of the Anaconda distribution, which installs Python and the necessary libraries in one step.	en
dc.identifier	DOI: 10.34185/1562-9945-5-142-2022-06
dc.identifier.citation	Ostrovska K. Yu., Fenenko T. M., Hlushchenko O. O. Statistical Text Analysis and Study of the Dynamics of Classification Accuracy. System Technologies. Dnipro, 2022. Vol. 5, No. 142. P. 60–68. DOI: 10.34185/1562-9945-5-142-2022-06.	uk_UA
dc.identifier.issn	1562-9945 (Print)
dc.identifier.issn	2707-7977 (Online)
dc.identifier.uri	https://journals.nmetau.edu.ua/index.php/st/issue/view/126	en
dc.identifier.uri	https://journals.nmetau.edu.ua/index.php/st/issue/view/126/99	en
dc.identifier.uri	https://crust.ust.edu.ua/handle/123456789/16943	en
dc.language.iso	uk_UA
dc.publisher	Ukrainian State University of Science and Technologies, Educational and Scientific Institute "Institute of Industrial and Business Technologies", ІВК "System Technologies", Dnipro	uk_UA
dc.rights	Creative Commons Attribution 4.0 International License	en
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	en
dc.subject	machine learning	uk_UA
dc.subject	statistical text analysis	uk_UA
dc.subject	authorship determination	uk_UA
dc.subject	data analysis	uk_UA
dc.subject	natural language processing	uk_UA
dc.subject	machine learning	en
dc.subject	statistical text analysis	en
dc.subject	authorship determination	en
dc.subject	data analysis	en
dc.subject	natural language processing	en
dc.subject	КІТС	uk_UA
dc.title	Statistical Text Analysis and Study of the Dynamics of Classification Accuracy	uk_UA
dc.title.alternative	Statistical Text Analysis and Study of the Dynamics of Classification Accuracy	en
dc.type	Article	en

Files

Original bundle

Name: Ostrovska.pdf
Size: 309.33 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.71 KB
Format: Item-specific license agreed upon to submission

Collections

Vol. 5 No. 142 (СТ ІПБТ)
КІТС Articles (ДМетІ)