A Method for Constructing a Crisis-Context Dataset for Verifying Adaptive IRM
Abstract
UKR: This work is devoted not to an experimental confirmation of the effectiveness of Adaptive IRM, but to the construction of a specialized crisis-context dataset that makes such verification possible under a correct problem setting. The paper proposes a method for transforming crisis messages from HumAID into pairs of the form “abstract query – crisis-dependent answer”, in which the question is deliberately stripped of direct disaster markers, so that a correct interpretation requires reconstructing the hidden context of the event. This design differs from the tasks that predominate in crisis informatics, namely tweet-level classification, informativeness detection, humanitarian categorization, and multimodal crisis annotation, for which HumAID, CrisisBench, AIDR, TREC-IS, and CrisisMMD were designed [1, 2, 3, 4, 5, 6]. As a result, a dataset of 41,152 records covering five categories of crisis events was constructed. Question generation followed a primary generation -> retry generation -> fallback scheme, and the fallback was triggered in 1,432 cases, i.e. 3.48% of the corpus. As the next stage, we propose formalized manual validation, automatic retrieval-style checking of semantic consistency, an event-disjoint split at the level of HumAID events, an implementation of Adaptive IRM, and a comparison of LLM-baseline, LLM+Adaptive IRM, RAG, and PEFT baselines using an extended set of automatic and manual metrics [7, 8, 9, 10, 11, 12, 13, 14, 15].
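An event-disjoint split keeps every record from a given HumAID event on the same side of the train/test boundary, so a model is never tested on an event it has already seen. The following is a minimal Python sketch, not the authors' implementation; it assumes each record is a dict with a hypothetical `event` key holding the HumAID event identifier, and the `test_fraction` and `seed` parameters are illustrative.

```python
import random
from collections import defaultdict
from typing import Dict, List, Tuple


def event_disjoint_split(records: List[dict], test_fraction: float = 0.2,
                         seed: int = 0) -> Tuple[List[dict], List[dict]]:
    """Hold out whole events so no event contributes to both train and test.

    Assumes each record carries a hypothetical "event" key with the
    HumAID event identifier (a string) it was drawn from.
    """
    by_event: Dict[str, List[dict]] = defaultdict(list)
    for rec in records:
        by_event[rec["event"]].append(rec)

    events = sorted(by_event)  # fixed order first, so the shuffle is reproducible
    random.Random(seed).shuffle(events)

    # The fraction is applied to the number of events, not records.
    n_test = max(1, round(len(events) * test_fraction))
    test_events = set(events[:n_test])

    train = [r for e in events if e not in test_events for r in by_event[e]]
    test = [r for e in events if e in test_events for r in by_event[e]]
    return train, test
```

Because whole events are held out, the share of records that lands in the test set depends on how large the held-out events happen to be.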
ENG: Recent research in the field of crisis informatics is largely focused on the automatic processing of social media messages during emergency situations. Existing crisis corpora, including HumAID, CrisisBench, AIDR, TREC-IS, and CrisisMMD, provide an important foundation for message classification, informativeness detection, humanitarian categorization, prioritization, and multimodal annotation tasks. At the same time, most of these resources are oriented toward the analysis of individual messages or the identification of their class, rather than toward verifying the ability of a large language model to reconstruct hidden crisis context from an abstract query. With the development of large language models, there is a growing need for specialized datasets that make it possible to evaluate not only the general linguistic competence of a model, but also its ability to adapt a response to a context that is not explicitly specified in the user’s question. The purpose of this work is to develop a method for constructing a crisis-context dataset for the subsequent verification of Adaptive IRM in tasks of hidden contextual adaptation of large language model responses. To achieve this purpose, it is proposed to transform crisis messages from the HumAID corpus into pairs of the form “abstract query — crisis-dependent answer”, where the question does not contain direct markers of the disaster type but preserves a semantic connection with the original message. The paper proposes a generative dataset construction pipeline that includes primary generation, retry generation, and a fallback mechanism. For each crisis message, a locally deployed large language model generates a short WH-question to which the original tweet should provide a direct answer. 
After generation, the question is automatically validated against formal criteria: it must have an interrogative structure, must not be a yes/no question, must end with a question mark, must respect the length limit, must avoid undesirable template-like formulations, and must not contain direct crisis markers such as earthquake, hurricane, flood, disaster, or emergency. If the initial question fails these requirements, a retry generation step is performed with a stricter instruction. If the retry also fails, a fallback question is used, which preserves the completeness of the corpus. The resulting dataset contains 41,152 records across five categories of crisis events: hurricanes, earthquakes, cyclones, wildfires, and floods. The fallback mechanism was used in 1,432 cases, i.e. 3.48% of the corpus. The main result of the study is a method for transforming crisis messages into “abstract query – crisis-dependent answer” pairs, together with the dataset built on it for the future verification of hidden contextual adaptation in LLMs. The proposed approach differs from classical crisis datasets in that it models a situation where the question does not indicate the type of disaster directly, while a correct answer requires taking the hidden context of the event into account. Future work includes formalized manual validation of the corpus, automatic retrieval-style verification of semantic consistency, construction of an event-disjoint split, implementation of Adaptive IRM, and comparison with LLM-baseline, RAG, and PEFT-baseline approaches.
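The formal validation criteria can be approximated with simple string checks. The sketch below is an illustration under assumptions, not the authors' validator: the word limit `MAX_WORDS`, the template phrases, and the extra markers `cyclone` and `wildfire` are hypothetical, because the abstract gives neither the exact length limit nor the full marker list. Requiring the first word to be a WH-word covers both the interrogative-structure and the no-yes/no criteria.

```python
import re

# Illustrative settings: the abstract does not state the real length limit,
# the template list, or the full marker list ("and others").
MAX_WORDS = 20
WH_WORDS = {"what", "where", "when", "who", "whom", "whose", "which", "why", "how"}
CRISIS_MARKERS = ("earthquake", "hurricane", "flood", "disaster", "emergency",
                  "cyclone", "wildfire")  # last two are assumed additions
TEMPLATE_PHRASES = ("according to the tweet", "what does the tweet say")


def is_valid_question(question: str) -> bool:
    q = question.strip().lower()
    if not q.endswith("?"):                  # must end with a question mark
        return False
    words = re.findall(r"[a-z]+", q)
    if not words or len(words) > MAX_WORDS:  # length limit
        return False
    if words[0] not in WH_WORDS:             # WH-question, hence not yes/no
        return False
    # Prefix match also rejects inflected forms such as "floods" or "flooding".
    if any(w.startswith(m) for w in words for m in CRISIS_MARKERS):
        return False
    if any(p in q for p in TEMPLATE_PHRASES):  # template-like wording
        return False
    return True
```

Prefix matching is deliberately crude: it catches "floods" and "hurricanes" but not, for example, "emergencies", which a lemmatizer would handle at the cost of an extra dependency.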
