dc.description.abstract | This study presents a novel approach to enhancing the performance of artificial agents in
complex environments such as Minecraft, where traditional reward-based learning strategies
can be challenging to apply. To improve the efficacy and efficiency of fine-tuning a
foundation model for complex tasks, we propose the Human-Guided Phasic Policy
Gradient (HPPG) algorithm, which combines human preference learning with the Phasic
Policy Gradient technique. Our key contributions include validating the use of behavioral
cloning to improve agent performance and introducing the HPPG algorithm, which
employs a reward predictor network to estimate rewards based on human preferences.
We further explore the challenges associated with the HPPG algorithm and propose
strategies to mitigate its limitations. Through our experiments, we demonstrate significant
improvements in the agent’s performance when executing complex tasks in Minecraft,
laying the groundwork for future developments in reinforcement learning algorithms
for complex, real-world tasks that lack well-defined reward functions. Our findings contribute to the
broader goal of bridging the gap between artificial agents and human-like intelligence. | |