Domain Shift solved: Predicted Reward Fine-Tuning

Published: 30 July 2024
on channel: Discover AI

Latest AI research from Johns Hopkins University on image-based reinforcement learning (RL).

Sim-to-real transfer of "AI intelligence", with domain shift addressed by Predicted Reward Fine-Tuning (PRFT). Topics include imitation learning and behavior cloning between AI agents.

Predicted Reward Fine-Tuning (PRFT) is a novel approach in reinforcement learning (RL) designed to address the challenge of domain shift, where the visual appearance of the environment changes significantly between training and deployment. Traditional methods like data augmentation and domain randomization often fall short in mitigating the accumulated errors that arise in sequential decision-making under such conditions. PRFT leverages the observation that even imperfect predicted rewards can serve as a useful signal for fine-tuning policies in the target domain. By training a reward prediction model alongside the policy in the source domain and then using this model to predict rewards in the target domain, PRFT enables effective policy fine-tuning without the need for ground-truth rewards during deployment. The approach exploits the reward model's ability to generalize across visual shifts, significantly improving policy performance across a range of tasks.
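To make the idea concrete, here is a minimal sketch (not the authors' code) of a reward-prediction model that could be trained alongside the policy in the source domain. The convolutional encoder, the 3x84x84 input size, and the regression loss are illustrative assumptions:

```python
# Sketch only: a reward predictor trained on ground-truth source-domain rewards,
# later frozen and reused to relabel target-domain transitions (per PRFT).
import torch
import torch.nn as nn

class RewardPredictor(nn.Module):
    """Predicts the scalar reward from an image observation and an action."""
    def __init__(self, action_dim: int):
        super().__init__()
        # Small conv encoder for 3x84x84 image observations (assumed size).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        with torch.no_grad():
            feat_dim = self.encoder(torch.zeros(1, 3, 84, 84)).shape[1]
        self.head = nn.Sequential(
            nn.Linear(feat_dim + action_dim, 256), nn.ReLU(),
            nn.Linear(256, 1),
        )

    def forward(self, obs: torch.Tensor, action: torch.Tensor) -> torch.Tensor:
        feat = self.encoder(obs)
        return self.head(torch.cat([feat, action], dim=-1)).squeeze(-1)

def reward_prediction_loss(model, obs, action, true_reward):
    """Supervised regression against ground-truth rewards from the source domain."""
    return nn.functional.mse_loss(model(obs, action), true_reward)
```

In the source domain this model is simply regressed onto the rewards observed while the policy trains; the same model, kept frozen, later supplies the reward signal in the target domain.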

The core methodology involves training the policy and reward prediction model concurrently in the source environment using the Maximum Entropy (MaxEnt) RL algorithm, which optimizes both expected rewards and the entropy of the action distribution to ensure robustness. During fine-tuning, the reward prediction model is frozen, and the policy is updated using predicted rewards generated from interactions in the target environment. Extensive experiments in simulated and real-world settings demonstrate that PRFT substantially outperforms baseline methods, particularly under high-intensity visual distractions and in sim-to-real transfer scenarios. By effectively adapting policies to new domains with significant visual shifts, PRFT presents a robust solution to the generalization problem in image-based RL.
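Below is a minimal sketch of what a PRFT fine-tuning step in the target domain could look like with a SAC-style (MaxEnt RL) agent. The `policy.sample` interface, the critic objects, and the batch layout are hypothetical stand-ins, not the paper's implementation; the key point it illustrates is that the frozen reward predictor relabels every transition, so no ground-truth reward is needed during deployment:

```python
# Sketch only: PRFT-style fine-tuning with predicted rewards and a frozen reward model.
import torch

@torch.no_grad()
def relabel_with_predicted_rewards(reward_model, obs, action):
    """Replace unknown target-domain rewards with the frozen model's predictions."""
    reward_model.eval()
    return reward_model(obs, action)

def prft_update(policy, critic, critic_target, reward_model,
                batch, policy_opt, critic_opt, gamma=0.99, alpha=0.1):
    # Batch holds target-domain transitions; no reward is stored, it is predicted below.
    obs, action, next_obs, done = batch
    pred_reward = relabel_with_predicted_rewards(reward_model, obs, action)

    # Soft Bellman target using predicted rewards (MaxEnt / SAC-style).
    with torch.no_grad():
        next_action, next_logp = policy.sample(next_obs)
        target_q = critic_target(next_obs, next_action) - alpha * next_logp
        y = pred_reward + gamma * (1.0 - done) * target_q

    critic_loss = torch.nn.functional.mse_loss(critic(obs, action), y)
    critic_opt.zero_grad()
    critic_loss.backward()
    critic_opt.step()

    # Policy update: maximize soft Q-value (expected return plus entropy bonus).
    new_action, logp = policy.sample(obs)
    policy_loss = (alpha * logp - critic(obs, new_action)).mean()
    policy_opt.zero_grad()
    policy_loss.backward()
    policy_opt.step()

    return {"critic_loss": critic_loss.item(), "policy_loss": policy_loss.item()}
```

Only the policy (and critic) parameters are updated here; the reward predictor stays frozen, which is what lets the agent adapt to visually shifted environments without any new reward labels.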

All rights w/ authors:
Adapting Image-based RL Policies via Predicted Rewards
https://arxiv.org/pdf/2407.16842
by Computer Science Department, Johns Hopkins University

#airesearch
#science
#ai

