Visualizing Stable Diffusion Project Proposal

Lakshyana Kc and Andrew Lemke

 

One main citation


High-Resolution Image Synthesis with Latent Diffusion Models

https://arxiv.org/abs/2112.10752

While the idea of diffusion models dates back to 2015, it is only recently that strong results have been realized. This paper builds on earlier diffusion work by speeding up both training and inference: instead of applying diffusion in the image (output) space, the authors advocate running the diffusion process in a learned latent space. The reduced cost allows more training while still producing high-quality results. The authors report the best performance on one metric and results among the best on another, showing that latent diffusion models are, at minimum, on par with the GAN models that had dominated image generation and other generative tasks up to that point.
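To make the latent-space idea concrete, here is a schematic PyTorch sketch (our own illustration, not the paper's architecture): the image is compressed by an autoencoder, the noising and denoising operate on the much smaller latent, and the result is decoded back to pixel space. The module definitions and the single simplified noise/denoise step are illustrative assumptions only.

import torch
import torch.nn as nn

class TinyAutoencoder(nn.Module):
    # Stand-in for the paper's pretrained autoencoder (not the real architecture).
    def __init__(self):
        super().__init__()
        self.enc = nn.Conv2d(3, 4, kernel_size=8, stride=8)    # 3x512x512 -> 4x64x64
        self.dec = nn.ConvTranspose2d(4, 3, kernel_size=8, stride=8)

    def encode(self, x):
        return self.enc(x)

    def decode(self, z):
        return self.dec(z)

ae = TinyAutoencoder()
denoiser = nn.Conv2d(4, 4, kernel_size=3, padding=1)  # stand-in for the U-Net denoiser

x = torch.randn(1, 3, 512, 512)            # image-space sample
z = ae.encode(x)                           # diffusion operates here: 1x4x64x64
z_noisy = z + torch.randn_like(z)          # (simplified) forward noising step
z_denoised = z_noisy - denoiser(z_noisy)   # (simplified) single reverse step
x_hat = ae.decode(z_denoised)              # decode back to image space
print(z.numel(), x.numel())                # roughly 48x fewer values in the latent

The point of the sketch is only the shapes: the tensor the denoiser sees is roughly 48x smaller than the image, which is where the training and inference savings come from.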


 

 

Preliminary literature review of the topic

 

Proposal of the main question you wish to answer, and how you aim to do it

 We seek to create a new visualization of the diffusion process in the context of text-to-image generation. The standard visualization shows the image being generated at various points in its denoising. Our current idea begins the same way, grabbing intermediate images throughout the denoising process, but we will then send each intermediate through a separate denoising pass that attempts to complete the remaining denoising in a single step (see the sketch below). We expect to see a progression of images in which visual inconsistencies diminish until the final image. Because stable latent diffusion builds on many concepts, which themselves build on further concepts, our visualization plan may change. For example, the U-Net blocks in the diffuser use self-attention among other advanced components taken from earlier breakthroughs such as ResNet, and self-attention in turn requires an understanding of attention, itself a breakthrough concept that pushed the capabilities of AI.
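As a rough illustration of how we might implement this, the sketch below assumes the Hugging Face diffusers library and a Stable Diffusion v1.x checkpoint; the model id, the callback signature, the latent scaling factor, and the omission of classifier-free guidance are all assumptions made for brevity, and the exact API differs between library versions.

import torch
from diffusers import StableDiffusionPipeline

device = "cuda" if torch.cuda.is_available() else "cpu"
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)

prompt = "a watercolor painting of a lighthouse at dawn"
captured = []  # (timestep, latents) pairs grabbed during denoising

def grab(step, timestep, latents):
    # Older diffusers releases pass (step, timestep, latents) to `callback`;
    # newer ones use `callback_on_step_end` with a different signature.
    captured.append((timestep, latents.detach().clone()))

pipe(prompt, num_inference_steps=50, callback=grab, callback_steps=5)

# Text conditioning reused for the one-step completion below.
tokens = pipe.tokenizer(prompt, padding="max_length",
                        max_length=pipe.tokenizer.model_max_length,
                        return_tensors="pt").input_ids.to(device)
with torch.no_grad():
    text_emb = pipe.text_encoder(tokens)[0]

def decode(latents):
    # Decode latents to pixel space; 0.18215 is the SD v1.x latent scaling factor.
    with torch.no_grad():
        img = pipe.vae.decode(latents / 0.18215).sample
    return (img / 2 + 0.5).clamp(0, 1)

def one_step_completion(latents, t):
    # Predict the noise once, then jump straight to an x0 estimate:
    #   x0_hat = (x_t - sqrt(1 - abar_t) * eps_hat) / sqrt(abar_t)
    # (no classifier-free guidance here, for brevity)
    with torch.no_grad():
        eps = pipe.unet(latents, t, encoder_hidden_states=text_emb).sample
    abar = pipe.scheduler.alphas_cumprod[int(t)].to(device)
    return (latents - (1 - abar).sqrt() * eps) / abar.sqrt()

standard_view = [decode(z) for _, z in captured]                          # usual partial images
one_step_view = [decode(one_step_completion(z, t)) for t, z in captured]  # our proposed view

Comparing standard_view and one_step_view side by side would give the progression we describe above: the one-step completions should look like full images whose inconsistencies shrink as the capture point moves later in the denoising schedule.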