An Analysis of Modern Embedding Models for the Automated Identification of Sanctioned Individuals: Evidence from the OFAC SDN List

Павленко, Єгор Вікторович; Гнатушенко, Володимир Володимирович

doi:https://doi.org/10.32782/EIS/2025-108-9

dc.contributor.author	Павленко, Єгор Вікторович	uk_UA
dc.contributor.author	Гнатушенко, Володимир Володимирович	uk_UA
dc.date.accessioned	2025-12-11T12:15:37Z
dc.date.issued	2025
dc.description	Є. Павленко: ORCID 0009-0004-0600-3090; Вол. Гнатушенко: ORCID 0000-0003-3140-3788	uk_UA
dc.description.abstract	UKR: У статті досліджується ефективність використання сучасних текстових ембединґів і варіацій їх навчання для «наївного» автоматичного пошуку підсанкційних осіб у фінансових транзакціях на прикладі санкційного списку OFAC SDN. Зростання вимог до комплаєнс-процедур та недоліки традиційних методів скринінгу (низька точність, обмежена масштабованість, фрагментарність даних) підкреслюють актуальність дослідження. Авторами запропоновано архітектуру системи, яка інтегрує векторні бази даних з API для Google Embeddings та Gemini API, використовуючи «наївний» підхід до обробки даних без складних процедур попередньої підготовки даних. Проведено експериментальну валідацію із застосуванням чотирьох стратегій векторизації (Stringified JSON, Stringified Non-Empty, Flattened Key-Value, Flattened Non-Empty) та різних типів завдань для ембединґ-моделей. Було порівняно результати з існуючими системами скринінгу, включаючи власну реалізацію OFAC. Отримані дані свідчать, що хоча «наївний» підхід забезпечує впевнені результати для подальшої обробки людиною або LLM (у рамках RAG-систем), але для повністю автоматизованих транзакційних систем, що працюють за пороговим значенням, потрібна більш складна попередня підготовка даних. Показано, що традиційні fuzzy-matching-алгоритми (Soundex, Jaro-Winkler), які застосовані у пошуку на сайті OFAC, забезпечують високу точність для імен, що точно збігаються із записами у санкційному списку. Проте їх ефективність знижується за транслітерації та варіацій у транслітерації, при цьому діапазони показників для істинно позитивних і хибнопозитивних результатів перекриваються, що ускладнює визначення єдиного граничного значення. Дослідження підкреслює потенціал модернових ембединґів для підвищення точності та масштабованості санкційного скринінгу, але вказує на необхідність подальшої оптимізації.	uk_UA
dc.description.abstract	ENG: This article investigates the effectiveness of modern text embeddings and variations in their training for «naive» automatic detection of sanctioned individuals in financial transactions, using the OFAC SDN sanctions list as a case study. The increasing demands on compliance procedures, along with the limitations of traditional screening methods (low accuracy, limited scalability, fragmented data), highlight the relevance of this research. The authors propose a system architecture that integrates vector databases with the Google Embeddings API and the Gemini API, employing a «naive» approach to data processing that avoids complex preprocessing steps. An experimental validation was conducted using four vectorization strategies («Stringified JSON», «Stringified Non-Empty», «Flattened Key-Value», «Flattened Non-Empty») and different task types for embedding models. The results were compared with existing screening systems, including OFAC's own implementation. The findings indicate that, while the «naive» approach provides reliable results for further human or LLM-assisted processing (within RAG systems), fully automated transaction systems that decide by a threshold value require more sophisticated data preprocessing. It is shown that traditional fuzzy-matching algorithms (Soundex, Jaro-Winkler), as applied in the OFAC website search, achieve high accuracy for names that exactly match entries in the sanctions list. However, their effectiveness decreases with transliteration and variations thereof, and the score ranges for true positives and false positives overlap, complicating the selection of a single threshold value. The study highlights the potential of modern embeddings to improve the accuracy and scalability of sanctions screening, but also emphasizes the need for further optimization.	en
dc.description.sponsorship	НТУ «Дніпровська політехніка», Дніпро	uk_UA
dc.identifier.citation	Павленко Є. В., Гнатушенко Вол. В. Аналіз використання модернових embedding-моделей для автоматичного пошуку підсанкційних осіб на прикладі санкційного списку OFAC SDN. Електротехнічні та інформаційні системи. 2025. № 108. C. 67–77. DOI: https://doi.org/10.32782/EIS/2025-108-9.	uk_UA
dc.identifier.doi	https://doi.org/10.32782/EIS/2025-108-9	en
dc.identifier.issn	2786-9040 (Print)
dc.identifier.issn	2786-9059 (Online)
dc.identifier.uri	https://journals.politehnica.dp.ua/index.php/eis/article/view/888	en
dc.identifier.uri	https://crust.ust.edu.ua/handle/123456789/21371	en
dc.language.iso	en
dc.publisher	Видавничий дім «Гельветика»	uk_UA
dc.rights	Creative Commons Attribution 4.0 International License	en
dc.rights.uri	https://creativecommons.org/licenses/by/4.0/	en
dc.subject	санкційний скринінг	uk_UA
dc.subject	текстові ембединґи	uk_UA
dc.subject	штучний інтелект	uk_UA
dc.subject	семантичний пошук	uk_UA
dc.subject	фонетичний пошук	uk_UA
dc.subject	обробка природної мови	uk_UA
dc.subject	комплаєнс	uk_UA
dc.subject	sanctions screening	en
dc.subject	text embeddings	en
dc.subject	artificial intelligence	en
dc.subject	semantic search	en
dc.subject	phonetic search	en
dc.subject	natural language processing	en
dc.subject	compliance	en
dc.subject	КІТС	uk_UA
dc.subject.classification	TECHNOLOGY	en
dc.subject.classification	TECHNOLOGY::Information technology	en
dc.title	Аналіз використання модернових embedding-моделей для автоматичного пошуку підсанкційних осіб на прикладі санкційного списку OFAC SDN	uk_UA
dc.title.alternative	An Analysis of Modern Embedding Models for the Automated Identification of Sanctioned Individuals: Evidence from the OFAC SDN List	en
dc.type	Article	en

Files

Original bundle

Name: Pavlenko.pdf
Size: 421.69 KB
Format: Adobe Portable Document Format

Collections

Статті КІТС (ДМетІ)