Task: Text2Image
Stable Diffusion is a latent diffusion model conditioned on the text embeddings of a CLIP text encoder, which allows you to create images from text prompts. It builds upon the CVPR'22 work High-Resolution Image Synthesis with Latent Diffusion Models. The official code was released at stable-diffusion, and an implementation is also available in diffusers. We support this algorithm here so that the community can study it together and compare it with other text2image methods.
| Model | Dataset | Download |
| --- | --- | --- |
| stable_diffusion_v1.5 | - | model |
We use the Stable Diffusion v1.5 weights. The model bundles several components, including the VAE, the UNet, and the CLIP text encoder.
Download the weights from stable-diffusion-1.5 and set `pretrained_model_path` in the config to the directory containing them.
Download with git:

```shell
git lfs install
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5
```
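If you prefer not to use git-lfs, the same snapshot can be fetched with the `huggingface_hub` Python package; a minimal sketch, where the `local_dir` target path is just an example:

```python
# Alternative download via huggingface_hub (pip install huggingface_hub).
# Recent versions of the package support downloading into a local directory.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id='runwayml/stable-diffusion-v1-5',
    local_dir='./stable-diffusion-v1-5',  # hypothetical target directory
)
```

Either way, the resulting directory should contain subfolders such as `vae`, `unet`, `text_encoder`, and `tokenizer`; point `pretrained_model_path` at this directory.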
Running the following code, you can generate an image from a text prompt.
```python
from mmengine import MODELS, Config
from mmengine.registry import init_default_scope
from torchvision import utils

init_default_scope('mmedit')

# Load the Stable Diffusion config and point it at the downloaded weights.
config = 'configs/stable_diffusion/stable-diffusion_ddim_denoisingunet.py'
config = Config.fromfile(config).copy()
config.model.init_cfg.pretrained_model_path = '/path/to/your/stable-diffusion-v1-5'

# Build the model from the registry and move it to the GPU.
StableDiffuser = MODELS.build(config.model)
StableDiffuser = StableDiffuser.to('cuda')

# Generate an image from the prompt and save it to disk.
prompt = 'A mecha robot in a favela in expressionist style'
image = StableDiffuser.infer(prompt)['samples']
utils.save_image(image, 'robot.png')
```
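For reference, the same checkpoint can also be run directly through the diffusers library that this implementation builds on; a minimal sketch, assuming diffusers is installed and a CUDA device is available:

```python
# Equivalent generation with diffusers; the local path is the directory
# cloned or downloaded above.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    '/path/to/your/stable-diffusion-v1-5',
    torch_dtype=torch.float16,  # half precision to reduce GPU memory use
)
pipe = pipe.to('cuda')

image = pipe('A mecha robot in a favela in expressionist style').images[0]
image.save('robot.png')
```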
Our codebase for the Stable Diffusion models builds heavily on the diffusers codebase, and the model weights are taken from stable-diffusion-1.5.
Thanks for the efforts of the community!
```bibtex
@misc{rombach2021highresolution,
  title={High-Resolution Image Synthesis with Latent Diffusion Models},
  author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
  year={2021},
  eprint={2112.10752},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```