By: Ali Alyaqoub and Adam Belfki
Original Paper Link: Erasing Concepts from Diffusion Models
Code Repository Link: https://github.com/aalyaqoub/nn_final
* The code for the ESD implementation is undisclosed *
Significant recent advancements in the field of GenAI have been achieved with diffusion models for image generation. Hence, many efforts have been deployed to ensure the generation of appropriate content by avoiding NSFW images and addressing copyright issues. A number of solutions have been proposed to tackle this problem, with different approaches centered around filtering, during either pre-training or post-generation.
In this project we look into a novel method that fine-tunes diffusion models using negative guidance as a teacher to erase concepts, which has proven to be more effective than the aforementioned techniques. More specifically, we are interested in answering the following questions: how does concept erasure behave on smaller diffusion models, and can an erased concept be relearned without access to training data for that concept?
The paper at hand proposes a method to remove concepts permanently from a diffusion model, making it very hard to circumvent even if the user has access to the model weights. Previous methods based on filtering datasets are fundamentally costly and inflexible to problems discovered after training, while filtering generated content can easily be circumvented by end users.
To address these issues, ESD fine-tunes a pre-trained diffusion model to minimize the probability of generating images linked to a targeted concept, using no additional data. The fine-tuning process uses two instances of the model: one with its parameters frozen, and one whose parameters are trained to erase the concept. Partially de-noised images, conditioned on the concept, are sampled from the training model, while inference is performed twice on the frozen model to predict the noise both conditioned and unconditioned on the prompt; the two predictions are combined to negate the predicted noise associated with the targeted concept. This technique modifies the scoring function to move the data distribution away from images that can be labeled with the targeted prompt.
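To make the objective concrete, here is a minimal PyTorch sketch of one ESD fine-tuning step, written against a diffusers-style UNet. The function and variable names are our own, and this is a sketch of the idea rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

# Minimal sketch of one ESD fine-tuning step (our own naming, not the
# authors' code). `unet` is the copy being trained, `frozen_unet` the
# frozen pre-trained copy, `x_t` a partially de-noised latent sampled
# from the training model, `c` the text embedding of the concept
# prompt, `uncond` the empty-prompt embedding, `eta` the guidance scale.
def esd_step(unet, frozen_unet, x_t, t, c, uncond, eta=1.0):
    with torch.no_grad():
        # Two inference passes on the frozen model: unconditioned and
        # conditioned on the concept prompt.
        eps_uncond = frozen_unet(x_t, t, encoder_hidden_states=uncond).sample
        eps_cond = frozen_unet(x_t, t, encoder_hidden_states=c).sample
        # Negative guidance: the target moves away from the concept's score.
        target = eps_uncond - eta * (eps_cond - eps_uncond)
    # Train the model so its concept-conditioned prediction matches the
    # negated target, lowering the probability of concept images.
    eps_pred = unet(x_t, t, encoder_hidden_states=c).sample
    return F.mse_loss(eps_pred, target)
```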
The degree of success in erasing a concept depends on the subset of parameters that is fine-tuned, with the main distinction being between cross-attention and non-cross-attention parameters. In the paper, ESD-x refers to fine-tuning only the cross-attention parameters, which performs controlled erasure specific to the prompt; this variant has demonstrated impressive results for removing specific artistic styles. ESD-u, on the other hand, fine-tunes the non-cross-attention parameters, which contribute to a broader range of visual concepts; it is used when the erasure must be independent of the text in the prompt, which is crucial for the generation of nudity-free content.
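In a diffusers-style Stable Diffusion UNet, the cross-attention layers are the modules named "attn2", which gives a simple way to split the two parameter subsets. A sketch, again with our own naming:

```python
# Sketch of choosing the ESD-x vs. ESD-u parameter subset, assuming a
# diffusers-style UNet where cross-attention layers are named "attn2".
def select_trainable_params(unet, method="esd-x"):
    trainable = []
    for name, param in unet.named_parameters():
        is_cross_attention = "attn2" in name
        # ESD-x trains only cross-attention; ESD-u trains everything else.
        train_it = is_cross_attention if method == "esd-x" else not is_cross_attention
        param.requires_grad_(train_it)  # freeze the rest
        if train_it:
            trainable.append(param)
    return trainable  # pass these to the optimizer
```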
The experimental study of the method's artistic style removal capabilities involved erasing 5 modern artists and conducting a user study to measure the human perception of the fine-tuned model's effectiveness compared to other approaches. The study showed that ESD-x had superior results: it erased the intended style better while keeping interference with other styles to a minimum. The performance of ESD-u for removing explicit content was measured against SLD and SD V2.0, and showed that the method proposed in this paper outperforms both inference-time methods and models trained on filtered datasets.
Erasing an entire object class still has limitations, represented by a trade-off between complete erasure of a visual concept and interference with other visual concepts.
To test the method's effect on different model sizes, we used small stable diffusion and mini stable diffusion models, which are smaller implementations of Stable Diffusion. Our method consists of erasing the same concept from each model, alongside the SD-V1.4 model originally employed by the paper, and then analyzing the impact of the erasure on each model through inference with multiple prompts related to the erased concept.
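A sketch of this evaluation loop is shown below; the small and mini checkpoint identifiers are placeholders, not the exact Hugging Face repos used in our experiments.

```python
import torch
from diffusers import StableDiffusionPipeline

# Evaluation sketch: generate the same prompts from each model size.
# The small/mini repo IDs below are placeholders, not the exact
# checkpoints used in our experiments.
MODELS = {
    "sd-v1.4": "CompVis/stable-diffusion-v1-4",
    "small-sd": "path/to/small-stable-diffusion",  # placeholder
    "mini-sd": "path/to/mini-stable-diffusion",    # placeholder
}

PROMPTS = [
    "A car zooming through space at the speed of light.",
    "A plane in the sky",
    "A fish in a lake",
]

for model_name, repo_id in MODELS.items():
    pipe = StableDiffusionPipeline.from_pretrained(
        repo_id, torch_dtype=torch.float16
    ).to("cuda")
    for i, prompt in enumerate(PROMPTS):
        image = pipe(prompt).images[0]
        image.save(f"{model_name}_prompt{i}.png")
```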
We theorize that the model does not completely unlearn a particular concept, but rather suppresses it from being expressed by making the probability of generating images with the concept very low. Since the erased concept is still present in the model, it could be possible to relearn it by fine-tuning the model with the same objective as the erasure process, but with the objective inverted to prioritize the generation of images with the concept we desire. The traditional way to relearn the concept would be to fine-tune the model on data of the concept; the method we are testing has the benefit of not requiring training data for the erased concept, since the model relies on the suppressed knowledge already present in its weights.
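Under this interpretation, the relearning objective is simply the erasure step with the guidance sign flipped, so the target moves toward the concept's score instead of away from it. A sketch, reusing the structure of the erasure step above and assuming the frozen reference is a copy of the erased model:

```python
import torch
import torch.nn.functional as F

# Relearning sketch: identical to the erasure step except for the sign
# of the guidance term (+eta instead of -eta). Here `frozen_unet` is
# assumed to be a frozen copy of the *erased* model, so the concept is
# recovered from its own suppressed knowledge, with no concept data.
def relearn_step(unet, frozen_unet, x_t, t, c, uncond, eta=1.0):
    with torch.no_grad():
        eps_uncond = frozen_unet(x_t, t, encoder_hidden_states=uncond).sample
        eps_cond = frozen_unet(x_t, t, encoder_hidden_states=c).sample
        # Positive guidance: pull the prediction toward the concept.
        target = eps_uncond + eta * (eps_cond - eps_uncond)
    eps_pred = unet(x_t, t, encoder_hidden_states=c).sample
    return F.mse_loss(eps_pred, target)
```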
Erasing concepts via ESD-u can have a severe effect on smaller models' ability to generate images. We tested the erasure of the object "car" on the normal, small, and mini diffusion models and found that the smaller models struggled with removing the object. For both the small and the mini model, removing the object by fine-tuning the non-cross-attention parameters caused the model to lose its ability to generate proper images and content. Post fine-tuning, the models' ability to generate any image, including those of other objects, was severely hampered, indicating that the model was effectively dead. We also tried removing global concepts, like nudity, using the ESD-u method and found that the models produced similarly nonsensical images. That said, the smaller models were able to remove styles using ESD-x and continue generating proper images. We believe the smaller models cannot handle erasure through ESD-u because they are trained with far fewer parameters and weights. In a smaller model, complex features are captured jointly by the same weights, so modifying certain weights to erase a concept inevitably affects the other concepts learned by the model. In this case, ESD-u has ramifications that extend beyond the scope of the concept being erased.
Prompt: "A car zooming through space at the speed of
light."
Erasure Method: "car" using ESD-u
Original SD
SD with car erased using ESD-u
Original Mini SD image
Mini SD with car erased using ESD-u
Showcasing model is dead
Erasure Method: "car" using ESD-u
A plane in the sky
A fish in a lake
Picnic by lake
Friends graduating
Prompt: "A cat by a windows Van Gogh style."
Erasure Method: "Van Gogh" using ESD-x
Original SD
SD with Van Gogh erased using ESD-x
Original Mini SD image
Mini SD with Van Gogh erased using ESD-x
Prompt: "A cat by a window {X} style."
Erasure Method: "Van Gogh" using ESD-x
Mini SD style Studio Ghibli
Mini SD Van Gogh erased style Studio Ghibli
Mini SD style Claude Monet
Mini SD Van Gogh erased style Claude Monet
When we attempted to relearn the concept of Van Gogh and the object car, we found that the models were not able to resurface the removed information. We believe this is because, after the original epochs spent erasing the concept, the predicted noise for the concept is so low that relearning from it does not affect the overall model much.
Prompt: "A car zooming through space at the speed of
light."
Relearn Method: "car" using ESD-u
Original SD
SD with car erased using ESD-u
Relearn car
Prompt: "A cat by a windows Van Gogh style."
Relearn Method: "Van Gogh" using ESD-x
Original SD
SD with Van Gogh erased using ESD-x
Relearn Van Gogh
In conclusion, we discovered that the ESD-u method for erasing global concepts or removing objects has severe effects on smaller models, causing them to simply die and stop generating acceptable content.
Additionally, we learned that relearning concepts, or making erased features resurface, is not an easy task; it might require serious retraining efforts, which indicates that the erasure successfully achieves its intended purpose.
Based on our findings, we believe the method for erasing concepts introduced in the paper is a viable solution for removing unwanted content from models post-training. That said, more work needs to be done to adapt it to smaller models and to fully understand the effects of the fine-tuning. Supporting smaller models is particularly important given the advancement of on-device machine learning, which relies on smaller models, so it will be crucial to develop a way to achieve results on these models similar to ESD's results on larger models. Additionally, we noticed that after the removal of a concept the model at times preferred to produce images with less color, and further research is needed to understand why this is the case.
[1] Rohit Gandikota, Joanna Materzyńska, Jaden Fiotto-Kaufman, David Bau. Erasing Concepts from Diffusion Models. Proceedings of the 2023 IEEE International Conference on Computer Vision (ICCV 2023).
Ali Alyaqoub | alyaqoub.a@northeastern.edu
Adam Belfki | belfki.a@northeastern.edu