Generative Image Dynamics — A Summary

<p>Diffusion models have been around for a while now, and I&rsquo;ve always wondered what a good use case for them could be. Should I just generate images of baby Yoda riding a bicycle, or maybe an astronaut riding a horse?</p>
<p>Well, a recent release [<a href="https://generative-dynamics.github.io/static/pdfs/GenerativeImageDynamics.pdf" rel="noopener ugc nofollow" target="_blank">paper</a>] by the Google Research team has led to a crazy good use case: generating looping videos with the kind of dynamics we see in motion caused by wind, water currents, respiration, and other natural factors.</p>
<p>The basic idea behind this paper is to bring natural object dynamics to images in response to an interactive user excitation. The model is trained on a large collection of motion trajectories automatically extracted from real video sequences, and it learns to predict a neural stochastic motion texture, i.e., a set of coefficients of a motion basis that characterize each pixel&rsquo;s trajectory into the future.</p>
<h1>An Overview</h1>
<p>The goal is to take an input image I0 and generate a video of length T featuring oscillation dynamics. GID (Generative Image Dynamics) uses a frequency-coordinated diffusion sampling process to predict a per-pixel long-term motion representation in the Fourier domain, which is called a neural stochastic motion texture. This representation can be converted into dense motion trajectories that span an entire video.</p>
<p>The system comprises two modules:</p>
<ol> <li>Motion prediction module</li> <li>Image-based rendering module</li> </ol>
<p><strong>Motion prediction module:</strong>&nbsp;It consists of a Latent Diffusion Model (LDM) that predicts a neural stochastic motion texture (basically a frequency representation of per-pixel motion trajectories) for an input image I0. The predicted neural stochastic motion texture is then transformed into a sequence of motion displacement fields F using an inverse discrete Fourier transform.
These fields help to determine the position of each input pixel at each future time step.</p>
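<p>To make the last step of the motion prediction module concrete, here is a minimal NumPy sketch of turning a per-pixel frequency representation into displacement fields F and then into future pixel positions. It is not the paper&rsquo;s code: the array layout (K, H, W, 2), the function names <code>spectrum_to_displacements</code> and <code>future_positions</code>, and the use of NumPy&rsquo;s default inverse FFT normalization are all assumptions made for illustration, and the spectrum is a hand-built toy input rather than the output of a diffusion model.</p>

```python
import numpy as np


def spectrum_to_displacements(spectrum, num_frames):
    """Inverse DFT along time: frequency coefficients -> displacement fields F.

    spectrum: complex array of shape (K, H, W, 2) holding the K lowest temporal
        frequency coefficients of each pixel's (x, y) displacement.
        This layout is an assumption made for this sketch.
    Returns a real array of shape (num_frames, H, W, 2).
    """
    # irfft treats axis 0 as the non-negative frequencies of a real signal and
    # zero-pads the frequencies that are not provided.
    return np.fft.irfft(spectrum, n=num_frames, axis=0)


def future_positions(displacements):
    """Position of every input pixel at every time step: p + F_t(p)."""
    _, height, width, _ = displacements.shape
    ys, xs = np.mgrid[0:height, 0:width]
    grid = np.stack([xs, ys], axis=-1).astype(displacements.dtype)  # (H, W, 2)
    return grid[None, ...] + displacements


# Toy input: only the pixel at (x=2, y=1) moves, swinging horizontally
# through one full cycle over the clip.
T, K, H, W = 8, 3, 4, 5
amplitude = 3.0
spectrum = np.zeros((K, H, W, 2), dtype=np.complex128)
# With NumPy's normalization, a coefficient of amplitude * T / 2 at frequency 1
# yields the trajectory amplitude * cos(2 * pi * t / T).
spectrum[1, 1, 2, 0] = amplitude * T / 2

F = spectrum_to_displacements(spectrum, T)
positions = future_positions(F)
print(positions[:, 1, 2, 0])  # x-coordinate of the moving pixel over time
```

<p>Because only a handful of low-frequency coefficients are stored per pixel, the same small representation can describe a trajectory of any length T, which fits the idea of a long-term motion representation that spans an entire video.</p>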