On-off Spacecraft Relative Control in Sliding Mode Via Reinforcement Learning

Sorochinskii, V. V.; Khoroshylov, S. V.; Levchuk, Ihor L.; Dubovyk, Tetiana M.; Huz, Hanna M.; Romanchuk, Oleksandr O.

dc.contributor.author	Sorochinskii, V. V.	en
dc.contributor.author	Khoroshylov, S. V.	en
dc.contributor.author	Levchuk, Ihor L.	en
dc.contributor.author	Dubovyk, Tetiana M.	en
dc.contributor.author	Huz, Hanna M.	en
dc.contributor.author	Romanchuk, Oleksandr O.	en
dc.date.accessioned	2026-03-24T10:20:04Z
dc.date.issued	2025
dc.description	Ihor L. Levchuk: ORCID 0000-0002-8983-0558; Tetiana M. Dubovyk: ORCID 0000-0002-2359-2569; Hanna M. Huz: ORCID 0009-0002-2908-8985; Oleksandr O. Romanchuk: ORCID 0000-0003-2623-350X	en
dc.description.abstract	ENG: The paper addresses the problem of on-off spacecraft relative control in sliding mode for autonomous on-orbit servicing operations under actuator amplitude limits, action discreteness, and parametric uncertainties. The goal is to develop and assess an approach that combines sliding-mode control with modern reinforcement-learning methods tailored for resource-constrained onboard implementation. Relative motion dynamics is formulated in an orbital coordinate frame with normalized states and discretized in time. Binary actions with pulse-width modulation, subject to constraints on the thrust level, pulse duration, and duty cycle, represent the impulsive nature of actuation. We propose a combined synthesis in which the sliding-surface parameters and switching rules are tuned via proximal policy optimization within an actor-critic architecture. The actor and critic are implemented as neural networks that approximate the policy and the value function, respectively. The actor neural network takes the state vector as input information and outputs the mean and standard deviation of the parameters of the sliding mode control law. The value function penalizes both the state error and control effort, thus enabling a trade-off among the response speed, accuracy, and propellant consumption. Two uncoupled agents are designed to control spacecraft relative orbital motion in in-plane and out-of-plane directions independently. The proximal policy optimization hyperparameters are selected to ensure a trade-off among the learning time, stability, and control performance. The reinforcement-learning agents are trained and analyzed considering four cases that differ in the thrust levels and weighting matrices. The quality functional combines state deviation and thrust use penalties, thus enabling a trade-off among the response speed, accuracy, and propellant consumption.
The results confirm the potential of this approach for autonomous spacecraft control under constraints and uncertainty. Compared with reported baselines, the trained agent shows superior robustness to plant-parameter uncertainty, which we attribute to the inherent robust properties of sliding-mode control. These findings have the potential to improve the efficiency and autonomy of on-orbit servicing operations.	en
dc.description.abstract	UKR: Розглянуто задачу відносного імпульсного керування рухом космічного апарата у ковзному режимі для автономних орбітальних сервісних операцій за наявності обмежень на амплітуду керуючих впливів, дискретності дій та параметричних невизначеностей. Метою роботи є розробка й оцінювання підходу, що поєднує принципи ковзного керування з сучасними методами навчання з підкріпленням, орієнтованими на бортову реалізацію з обмеженими ресурсами. Динаміку відносного руху задано в орбітальній системі координат у нормалізованих змінних і дискретизовано. Імпульсний характер впливів виконавчих органів відображено через бінарні дії з широтно-імпульсною модуляцією та обмеженнями на рівень тяги, тривалість і період увімкнень. Запропоновано комбінований синтез, у якому параметри поверхні ковзання та правила перемикання налаштовуються методом проксимальної оптимізації політики з використанням архітектури актор-критик. Актор і критик реалізовані у вигляді нейронних мереж, які відповідно апроксимують політику та функцію цінності. Нейронна мережа актора приймає вектор стану як вхідну інформацію і видає середнє значення та стандартне відхилення параметрів закону керування у ковзному режимі. Функція цінності штрафує як за помилку стану, так і за витрати на керування, що дозволяє забезпечити компроміс між швидкістю реагування, точністю та витратою палива. Два незалежні агенти розроблені для керування відносним орбітальним рухом космічного апарата окремо в напрямку площини орбіти та у перпендикулярному напрямку. Гіперпараметри проксимальної оптимізації політики обрано для забезпечення компромісу між часом навчання, стабільністю та якістю керування. Агенти навчання з підкріпленням навчені та проаналізовані з урахуванням чотирьох випадків, що відрізняються рівнями тяги та ваговими матрицями. Функціонал якості об’єднує штрафи за відхилення стану та використання тяги, що дає змогу знаходити компроміс між швидкодією, точністю та витратами робочого тіла.
Отримані результати підтверджують потенціал такого підходу для задач автономного керування космічних апаратів в умовах обмежень та невизначеності. У порівнянні з відомими результатами навчений агент продемонстрував кращу робастність по відношенню до невизначеності параметрів моделі об’єкта керування, що пояснюється сильними робастними властивостями керування в ковзному режимі. Отримані результати мають потенціал підвищити ефективність та автономність орбітальних сервісних операцій.	uk_UA
dc.description.sponsorship	Institute of Technical Mechanics of the National Academy of Sciences of Ukraine and the State Space Agency of Ukraine; Ukrainian State University of Science and Technologies	en
dc.identifier.citation	On-off spacecraft relative control in sliding mode via reinforcement learning. V. V. Sorochinskii et al. Technical mechanics. 2025. Vol. 2025, № 4. P. 77–92. DOI: 10.15407/itm2025.04.077	en
dc.identifier.issn	1561-9184 (Print)	en
dc.identifier.issn	2616-6380 (Online)	en
dc.identifier.uri	https://journal-itm.dp.ua/ojs/index.php/ITM_j1/article/view/157/66	en
dc.identifier.uri	https://crust.ust.edu.ua/handle/123456789/21912	en
dc.language.iso	en
dc.publisher	Technical mechanics	en
dc.subject	reinforcement learning	en
dc.subject	proximal policy optimization	en
dc.subject	spacecraft control	en
dc.subject	on-orbit servicing	en
dc.subject	on–off control	en
dc.subject	autonomous control systems	en
dc.subject	навчання з підкріпленням	uk_UA
dc.subject	проксимальна оптимізація політики	uk_UA
dc.subject	керування космічним апаратом	uk_UA
dc.subject	орбітальні сервісні операції	uk_UA
dc.subject	on-off керування	uk_UA
dc.subject	автономні системи керування	uk_UA
dc.subject	ККІТтаА	uk_UA
dc.subject.classification	NATURAL SCIENCES	en
dc.subject.classification	Chemistry	en
dc.subject.classification	TECHNOLOGY	en
dc.subject.classification	Autonomous systems	en
dc.title	On-off Spacecraft Relative Control in Sliding Mode Via Reinforcement Learning	en
dc.type	Article	en
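
The abstract describes on-off control in sliding mode with pulse-width modulation. The sketch below illustrates that idea only. It is not the authors' model or code: the plant is a double integrator (x'' = u) swapped in for the relative-motion dynamics, the sliding-surface slope c is a fixed constant standing in for a parameter that the paper tunes with reinforcement learning, and every name and numeric value is a hypothetical choice made for illustration.

```python
def simulate_on_off_sliding(x0, v0, c=0.1, thrust_acc=0.01, dt=1.0,
                            min_pulse=0.05, steps=300):
    """Drive a double integrator (x'' = u) toward the origin with on-off pulses.

    All parameters are illustrative assumptions in normalized units.
    Returns the final position, final velocity, and total thruster on-time.
    """
    x, v, on_time = x0, v0, 0.0
    for _ in range(steps):
        s = c * x + v                       # sliding variable; s = 0 is the surface
        # Pulse width whose velocity change would cancel s, capped at one period.
        tau = min(abs(s) / thrust_acc, dt)
        if tau < min_pulse:
            tau = 0.0                       # pulse too short to command: coast
        # Thrust is full amplitude or off, directed against the sign of s.
        a = -thrust_acc if s > 0 else thrust_acc
        # Thrust phase of length tau, integrated exactly.
        x += v * tau + 0.5 * a * tau * tau
        v += a * tau
        # Coast phase for the remainder of the control period.
        x += v * (dt - tau)
        on_time += tau
    return x, v, on_time


x_end, v_end, on_time = simulate_on_off_sliding(1.0, 0.0)
print(x_end, v_end, on_time)
```

In this sketch the total on-time is a rough proxy for propellant use, so changing c shows the trade-off between response speed and consumption that the abstract mentions. The minimum pulse width leaves a small residual band around the origin rather than exact convergence.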

Files

Original bundle

Name: Ст_07.pdf
Size: 757.63 KB
Format: Adobe Portable Document Format

Collections

Articles of ККІТтаР УДХТУ (formerly ККІТтаА УДХТУ)