Skip to content

Latest commit

 

History

History
100 lines (73 loc) · 3.65 KB

README.md

File metadata and controls

100 lines (73 loc) · 3.65 KB

Stable Diffusion (2022)

Stable Diffusion

Task: Text2Image

Abstract

Stable Diffusion is a latent diffusion model conditioned on the text embeddings of a CLIP text encoder, which allows you to create images from text inputs. This model builds upon the CVPR'22 work High-Resolution Image Synthesis with Latent Diffusion Models. The official code was released at stable-diffusion and also implemented at diffusers. We support this algorithm here to facilitate the community to learn together and compare it with other text2image methods.


A mecha robot in a favela in expressionist style

A Chinese palace is beside a beautiful lake

A panda is having dinner in KFC

Pretrained models

Model Dataset Download
stable_diffusion_v1.5 - model

We use stable diffusion v1.5 weights. This model has several weights including vae, unet and clip.

You should download the weights from stable-diffusion-1.5 and change the 'pretrained_model_path' in config to the weights dir.

Download with git:

git lfs install
git clone https://huggingface.co/runwayml/stable-diffusion-v1-5

Quick Start

Running the following codes, you can get a text-generated image.

from mmengine import MODELS, Config
from torchvision import utils

from mmengine.registry import init_default_scope

init_default_scope('mmedit')

config = 'configs/stable_diffusion/stable-diffusion_ddim_denoisingunet.py'
config = Config.fromfile(config).copy()
config.model.init_cfg.pretrained_model_path = '/path/to/your/stable-diffusion-v1-5'

StableDiffuser = MODELS.build(config.model)
prompt = 'A mecha robot in a favela in expressionist style'
StableDiffuser = StableDiffuser.to('cuda')

image = StableDiffuser.infer(prompt)['samples']
utils.save_image(image, 'robot.png')

Comments

Our codebase for the stable diffusion models builds heavily on diffusers codebase and the model weights are from stable-diffusion-1.5.

Thanks for the efforts of the community!

Citation

@misc{rombach2021highresolution,
      title={High-Resolution Image Synthesis with Latent Diffusion Models},
      author={Robin Rombach and Andreas Blattmann and Dominik Lorenz and Patrick Esser and Björn Ommer},
      year={2021},
      eprint={2112.10752},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}