
dc.contributor.advisor    Setty, Vinay Jayarama
dc.contributor.author     Valvik, Dag Hermann
dc.date.accessioned       2023-09-19T15:51:18Z
dc.date.available         2023-09-19T15:51:18Z
dc.date.issued            2023
dc.identifier             no.uis:inspera:129718883:36625893
dc.identifier.uri         https://hdl.handle.net/11250/3090544
dc.description.abstract   This study presents a novel approach to enhancing the performance of artificial agents in complex environments like Minecraft, where traditional reward-based learning strategies can be challenging to apply. To improve the efficacy and efficiency of fine-tuning a foundation model for complex tasks, we propose the Human-Guided Phasic Policy Gradient (HPPG) algorithm, which combines human preference learning with the Phasic Policy Gradient technique. Our key contributions include validating the use of behavioral cloning to improve agent performance and introducing the HPPG algorithm, which employs a reward predictor network to estimate rewards based on human preferences. We further explore the challenges associated with the HPPG algorithm and propose strategies to mitigate its limitations. Through our experiments, we demonstrate significant improvements in the agent's performance when executing complex tasks in Minecraft, laying the groundwork for future developments in reinforcement learning algorithms for complex, real-world tasks without defined rewards. Our findings contribute to the broader goal of bridging the gap between artificial agents and human-like intelligence.
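The abstract's core mechanism is a reward predictor network fitted to human preference judgments. A minimal sketch of how such a predictor is commonly trained, using the pairwise-comparison formulation of Christiano et al. (2017), is shown below; the class name, network shape, and tensor layout are illustrative assumptions, not the thesis's actual implementation.

```python
# Sketch of preference-based reward learning, assuming flat observation
# features and pairwise segment comparisons; names here are hypothetical.
import torch
import torch.nn as nn

class RewardPredictor(nn.Module):
    """Maps an observation feature vector to a scalar reward estimate."""
    def __init__(self, obs_dim: int, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, obs: torch.Tensor) -> torch.Tensor:
        # (batch, T, obs_dim) -> (batch, T) per-step reward estimates
        return self.net(obs).squeeze(-1)

def preference_loss(model: RewardPredictor,
                    seg_a: torch.Tensor,
                    seg_b: torch.Tensor,
                    prefs: torch.Tensor) -> torch.Tensor:
    """Bradley-Terry style preference loss.

    seg_a, seg_b: (batch, T, obs_dim) trajectory segments shown to a human.
    prefs: (batch,) labels in [0, 1]; 1.0 means segment A was preferred.
    """
    # Sum predicted per-step rewards over each segment to get returns.
    ret_a = model(seg_a).sum(dim=1)
    ret_b = model(seg_b).sum(dim=1)
    # Under the Bradley-Terry model, P(A preferred) is a sigmoid of the
    # return difference; fit it to human labels with cross-entropy.
    logits = ret_a - ret_b
    return nn.functional.binary_cross_entropy_with_logits(logits, prefs)
```

In this formulation the learned reward can then stand in for the environment reward during policy optimization (here, the Phasic Policy Gradient phase of HPPG), which is what makes fine-tuning feasible in settings like Minecraft without a defined reward signal.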
dc.language               eng
dc.publisher              uis
dc.title                  Human-Guided Phasic Policy Gradient in Minecraft: Exploring Deep Reinforcement Learning with Human Preferences in Complex Environments
dc.type                   Master thesis

