Realistic Emotion-controllable Audio Driven Avatars

<p>One of the critical limitations of existing audio-driven deepfakes is their lack of control over stylistic attributes. Ideally, we would like to be able to change these attributes, for example by making a generated video happy rather than sad, or by using the speaking style of a particular actor.&nbsp;READ Avatars&nbsp;aims to do exactly this by modifying existing high-quality, person-specific models to allow direct control over style.</p>
<p>I have written several blog posts covering deepfake models in the past, but this one has special significance to me,&nbsp;<strong>as it is my own.&nbsp;</strong>The paper has just been accepted to this year&rsquo;s BMVC, and it is my first accepted paper! In this article, I will cover the motivation, intuition and methodology behind the work.</p>
<h1>What is Style?</h1>
<p>The first place to start when considering stylistic control is to ask exactly what is meant by style. The answer I usually give is a bit of a cop-out: style is anything in our data that is not considered content. This may seem to merely shift the definition from one word to another, but it does make the task easier, because content is usually simpler to pin down. In the context of audio-driven deepfakes, content is the speech itself, the lip movements that match the audio, and the face&rsquo;s appearance.</p>
<p>In my research, I usually look at two particular forms of style:&nbsp;<strong><em>emotional</em></strong>&nbsp;and&nbsp;<strong><em>idiosyncratic</em></strong>. Emotional style is simply the emotion expressed on the face, whereas idiosyncratic style refers to the differences in expression between individual people. For example, the way a smile looks on my face differs from the way it looks on yours, and that difference is idiosyncratic style.
These are not the only kinds of style, but they are among the easiest to demonstrate and work with.&nbsp;<strong>For this work, we used emotional style only, as we worked on person-specific models, each of which covers a single person.</strong></p> <p><a href="https://pub.towardsai.net/read-avatars-realistic-emotion-controllable-audio-driven-avatars-1351e1fdfee2">Click Here</a></p>