Generative Image Dynamics — A Summary
<p>Diffusion models have been around for a while now, and I've always wondered what a good use case for them could be. Should I just generate images of baby Yoda riding a bicycle, or maybe an astronaut riding a horse?</p>
<p>Well, a recent [<a href="https://generative-dynamics.github.io/static/pdfs/GenerativeImageDynamics.pdf" rel="noopener ugc nofollow" target="_blank">paper</a>] from the Google Research team offers a crazy good use case: generating looping videos with the kind of dynamics we see in motion caused by wind, water currents, respiration, and other natural factors.</p>
<p>The basic idea behind this paper is to bring natural object dynamics to images in response to an interactive user excitation. The model is trained on a large collection of motion trajectories automatically extracted from real video sequences, and it learns to predict a neural stochastic motion texture, i.e. a set of coefficients of a motion basis that characterize each pixel’s trajectory into the future.</p>
<h1>An Overview</h1>
<p>The goal is to take an input image I0 and generate a video of length T featuring oscillation dynamics. GID (Generative Image Dynamics) uses a frequency-coordinated diffusion sampling process to predict a per-pixel long-term motion representation in the Fourier domain, called a neural stochastic motion texture. This representation can be converted into dense motion trajectories that span an entire video.</p>
<p>The system comprises two modules:</p>
<ol>
<li>Motion prediction module</li>
<li>Image-based rendering module</li>
</ol>
<p><strong>Motion prediction module:</strong> It consists of a Latent Diffusion Model (LDM) that predicts a neural stochastic motion texture (basically a frequency representation of per-pixel motion trajectories) for an input image I0. The predicted neural stochastic motion texture is then transformed into a sequence of motion displacement fields F using an inverse discrete Fourier transform. These fields determine the position of each input pixel at each future time step.</p>
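<p>To make the Fourier-to-trajectory step concrete, here is a minimal sketch for a single pixel. It is not the paper's implementation: the function names, the 1/T normalization, and the one-sided (DC plus a few low frequencies) layout of the coefficients are all assumptions made for illustration. It only shows how a handful of complex coefficients can be turned back into a real-valued displacement per frame and then into pixel positions.</p>

```python
import cmath
import math


def trajectory_from_spectrum(coeffs, num_frames):
    """Inverse DFT of a truncated one-sided spectrum (assumed layout).

    coeffs[k] is the complex coefficient of frequency k, with k = 0 the
    DC term. Higher frequencies are treated as zero, and the signal is
    assumed real, so each k > 0 also stands for its conjugate partner.
    """
    if 2 * (len(coeffs) - 1) >= num_frames:
        raise ValueError("need len(coeffs) - 1 < num_frames / 2")
    displacements = []
    for t in range(num_frames):
        value = coeffs[0].real
        for k in range(1, len(coeffs)):
            phase = cmath.exp(2j * math.pi * k * t / num_frames)
            # Factor 2 accounts for the conjugate-symmetric negative frequency.
            value += 2.0 * (coeffs[k] * phase).real
        displacements.append(value / num_frames)
    return displacements


def pixel_positions(pixel, x_coeffs, y_coeffs, num_frames):
    """Position of one input pixel at every frame: p + F_t(p)."""
    dx = trajectory_from_spectrum(x_coeffs, num_frames)
    dy = trajectory_from_spectrum(y_coeffs, num_frames)
    return [(pixel[0] + dx[t], pixel[1] + dy[t]) for t in range(num_frames)]


if __name__ == "__main__":
    # Hypothetical pixel at (10, 20) that sways horizontally once per clip.
    for frame, pos in enumerate(
        pixel_positions((10.0, 20.0), [0j, 12 + 0j], [0j, 0j], num_frames=8)
    ):
        print(frame, pos)
```

<p>In this toy example the pixel oscillates only along x, which is the sort of low-frequency sway the motion texture is meant to capture. A real system would run the same transform for every pixel at once with an FFT library rather than Python loops.</p>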
<p><a href="https://aashi-dutt3.medium.com/generative-image-dynamics-a-summary-fd92edce560d"><strong>Read More</strong></a></p>