A Deep Learning Approach to Generating Human Faces in 3D Media
Master's thesis
Permanent link
https://hdl.handle.net/11250/3087986
Publication date
2023
Collections
- Studentoppgaver (TN-IDE) [823]
Description
Full text not available
Abstract
This thesis proposes new machine learning architectures for generating face meshes for use in 3D media. They are based on the architecture introduced in "pix2pix: Image-to-Image Translation with Conditional Adversarial Networks". Because generative networks struggle to work directly with 3D mesh representations, these architectures instead use the 2D position-map representation of 3D shapes introduced by Yao Feng et al. in "Joint 3D Face Reconstruction and Dense Alignment with Position Map Regression Network".
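The position-map idea is that a 3D face mesh can be stored as an ordinary 2D image whose pixels record 3D coordinates, so standard image-to-image networks can regress it. As a minimal sketch (not code from the thesis; the vertex and UV inputs here are hypothetical), assuming each vertex already carries UV texture coordinates:

```python
import numpy as np

def make_position_map(vertices, uv_coords, size=256):
    """Rasterize 3D vertices into a 2D position map.

    Each pixel (u, v) stores the (x, y, z) coordinates of the vertex
    mapped there, turning a mesh into an image a CNN can regress.
    `vertices` is (N, 3); `uv_coords` is (N, 2) with values in [0, 1].
    """
    pos_map = np.zeros((size, size, 3), dtype=np.float32)
    # Nearest-pixel scatter; a full pipeline would interpolate over triangles
    # so that every pixel inside the face region gets a coordinate.
    px = np.clip((uv_coords * (size - 1)).round().astype(int), 0, size - 1)
    pos_map[px[:, 1], px[:, 0]] = vertices
    return pos_map

# Toy usage: one vertex mapped to the top-left pixel.
pm = make_position_map(np.array([[0.1, 0.2, 0.3]]),
                       np.array([[0.0, 0.0]]), size=4)
```

Recovering the mesh is then just reading the pixel values back out at the same UV locations.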
To improve on their result, I implemented an architecture fine-tuned for image translation to learn the relation between a facial image and its position map, and sought to improve on this by:
1. Using the Wasserstein loss.
2. Changing the generative adversarial network to a VAE-GAN: first the variant introduced in "Energy-based Generative Adversarial Network", then the variant introduced in "Boundary Equilibrium Generative Adversarial Networks".
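The Wasserstein loss replaces the usual binary cross-entropy objective with unbounded critic scores: the critic is trained to separate real and fake scores, and the generator to raise the fake scores. A minimal sketch of the two objectives (not the thesis's code; framework details such as the Lipschitz constraint on the critic are omitted):

```python
import numpy as np

def wasserstein_d_loss(critic_real, critic_fake):
    # The critic maximizes E[D(real)] - E[D(fake)];
    # written as a loss to minimize, the signs flip.
    return np.mean(critic_fake) - np.mean(critic_real)

def wasserstein_g_loss(critic_fake):
    # The generator maximizes E[D(fake)], i.e. minimizes -E[D(fake)].
    return -np.mean(critic_fake)
```

With a PatchGAN-style critic, `critic_real` and `critic_fake` would be grids of per-patch scores rather than single scalars, and the means average over all patches.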
Unfortunately, the results were not as expected, owing to the interaction between the Wasserstein loss and the PatchGAN discriminator used in the original pix2pix architecture, and to the general difficulty of training GANs.
For future work, I suggest continuing to train the EBGAN variant and exploring a new Flow-GAN architecture.