Human-Guided Phasic Policy Gradient in Minecraft: Exploring Deep Reinforcement Learning with Human Preferences in Complex Environments

Valvik, Dag Hermann

dc.contributor.advisor	Setty, Vinay Jayarama
dc.contributor.author	Valvik, Dag Hermann
dc.date.accessioned	2023-09-19T15:51:18Z
dc.date.available	2023-09-19T15:51:18Z
dc.date.issued	2023
dc.identifier	no.uis:inspera:129718883:36625893
dc.identifier.uri	https://hdl.handle.net/11250/3090544
dc.description.abstract	This study presents a novel approach to enhancing the performance of artificial agents in complex environments like Minecraft, where traditional reward-based learning strategies can be challenging to apply. To improve the efficacy and efficiency of fine-tuning a foundation model for complex tasks, we propose the Human-Guided Phasic Policy Gradient (HPPG) algorithm, which combines human preference learning with the Phasic Policy Gradient technique. Our key contributions include validating the use of behavioral cloning to improve agent performance and introducing the HPPG algorithm, which employs a reward predictor network to estimate rewards based on human preferences. We further explore the challenges associated with the HPPG algorithm and propose strategies to mitigate its limitations. Through our experiments, we demonstrate significant improvements in the agent’s performance when executing complex tasks in Minecraft, laying the groundwork for future developments in reinforcement learning algorithms for complex, real-world tasks without defined rewards. Our findings contribute to the broader goal of bridging the gap between artificial agents and human-like intelligence.
dc.language	eng
dc.publisher	uis
dc.title	Human-Guided Phasic Policy Gradient in Minecraft: Exploring Deep Reinforcement Learning with Human Preferences in Complex Environments
dc.type	Master thesis

Files in this item

Name: no.uis:inspera:129718883:36625 ...
Size: 5.173 MB
Format: PDF

This item appears in the following Collection(s)

Studentoppgaver (TN-IDE) [823]
Student theses in information technology, computer engineering / cybernetics, signal processing