Introduction

Mystic AI is sunsetting its services. They were an early pioneer that pushed the industry forward. This guide covers migrating apps from Mystic to Cerebrium so they keep running. It walks through converting existing Mystic code and configuration to the Cerebrium platform, using a Stable Diffusion example, and optimizing the deployment for performance and cost efficiency.

Key Differences

Cerebrium helps teams deploy and run models efficiently. The infrastructure is designed for reliable performance:
  • The average model cold-starts in 2-5 seconds.
  • Updates to your code deploy quickly, taking only 8-14 seconds.
  • 99.9% uptime.
Cerebrium provides precise control over computing resources. Instead of managing entire instances, select the exact CPU, memory, and GPU power needed. Billing is per-second for actual resource usage. Use the pricing calculator for cost estimates.
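
To make per-second billing concrete, here is a minimal sketch of a cost estimate. The `Rates` values are hypothetical placeholders, not Cerebrium's actual prices, and `estimate_cost` is an illustrative helper rather than part of any Cerebrium tooling; take real rates from the pricing calculator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    """Hypothetical per-second prices in USD; NOT Cerebrium's real rates."""
    cpu_core: float = 0.00001   # per CPU core
    memory_gb: float = 0.000005  # per GB of memory
    gpu: float = 0.0003          # per GPU


def estimate_cost(seconds: float, cpu: int, memory_gb: float, gpu_count: int,
                  rates: Rates = Rates()) -> float:
    """Return the estimated cost of running one replica for `seconds`."""
    per_second = (cpu * rates.cpu_core
                  + memory_gb * rates.memory_gb
                  + gpu_count * rates.gpu)
    return seconds * per_second


# One active hour on 4 CPU cores, 16 GB of memory, and 1 GPU
hourly = estimate_cost(3600, cpu=4, memory_gb=16.0, gpu_count=1)
print(f"Estimated cost per active hour: ${hourly:.4f}")
```

Because billing follows actual usage, the `seconds` argument should reflect the time replicas are actually running, not wall-clock time since deployment.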

Migration Process

1. Project Setup and Configuration

Install Cerebrium’s command-line tool and create the project:
pip install cerebrium --upgrade
cerebrium login  # You'll be redirected to the dashboard for login
cerebrium init stable-diffusion
cd stable-diffusion

Convert the existing Mystic configuration to Cerebrium’s format. A typical Mystic configuration:
# Mystic's pipeline.yaml
runtime:
  container_commands:
    - apt-get update
    - apt-get install -y git
  python:
    version: "3.10"
    requirements:
      - pipeline-ai
      - diffusers==0.24.0
      - torch==2.1.1
      - transformers==4.35.2
      - accelerate==0.25.0
    cuda_version: "11.4"
accelerators:
  - "nvidia_a10"
accelerator_memory: null
pipeline_graph: sd_pipeline:pipeline_graph
pipeline_name: <YOUR_USERNAME>/stable-diffusion-v1.5
extras: {}
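The equivalent Cerebrium configuration in TOML is shown below. Note that this example moves to Python 3.11 and leaves most dependencies unpinned; pin versions if you need to match the Mystic environment exactly: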
# cerebrium.toml
[cerebrium.deployment]
name = "stable-diffusion"
python_version = "3.11"
docker_base_image_url = "debian:bookworm-slim"
include = ["./*", "main.py", "cerebrium.toml"]
exclude = [".*"]

[cerebrium.hardware]
compute = "AMPERE_A10"    # Choose your GPU type
cpu = 4                   # Number of CPU cores
memory = 16.0             # Memory in GB
gpu_count = 1             # Number of GPUs

[cerebrium.scaling]
min_replicas = 0         # Scale to zero when idle to save costs
max_replicas = 2         # Upper bound when scaling up under increased traffic
cooldown = 60            # Time window (in seconds) at reduced concurrency before scaling down
replica_concurrency = 1  # Number of concurrent requests a single container handles

[cerebrium.dependencies.pip]
torch = ">=2.0.0"
pydantic = "latest"
transformers = "latest"
accelerate = "latest"
diffusers = "latest"
safetensors = "latest"
xformers = "latest"
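
If you prefer to carry over the exact versions pinned in the Mystic `pipeline.yaml`, the conversion can be scripted. The sketch below is illustrative only: `to_pip_table` and `render_toml` are hypothetical helpers, and it assumes the `[cerebrium.dependencies.pip]` section accepts pip-style specifiers such as `"==0.24.0"` in the same way it accepts `">=2.0.0"`.

```python
import re

# Mystic-style requirement strings, copied from the pipeline.yaml above.
MYSTIC_REQUIREMENTS = [
    "pipeline-ai",
    "diffusers==0.24.0",
    "torch==2.1.1",
    "transformers==4.35.2",
    "accelerate==0.25.0",
]

# Mystic's own SDK is not needed on Cerebrium, so it is dropped.
MYSTIC_ONLY = {"pipeline-ai"}

# Package name, followed by an optional version specifier.
_REQ = re.compile(r"^\s*([A-Za-z0-9_.\-]+)\s*(.*)$")


def to_pip_table(requirements: list[str]) -> dict[str, str]:
    """Map 'name==1.2.3' to {name: '==1.2.3'}; bare names become 'latest'."""
    table = {}
    for req in requirements:
        match = _REQ.match(req)
        if match is None:
            raise ValueError(f"Unrecognized requirement: {req!r}")
        name, spec = match.group(1), match.group(2).strip()
        if name in MYSTIC_ONLY:
            continue
        table[name] = spec or "latest"
    return table


def render_toml(table: dict[str, str]) -> str:
    """Render the mapping as a [cerebrium.dependencies.pip] section."""
    lines = ["[cerebrium.dependencies.pip]"]
    lines += [f'{name} = "{spec}"' for name, spec in table.items()]
    return "\n".join(lines)


print(render_toml(to_pip_table(MYSTIC_REQUIREMENTS)))
```

Packages that the Cerebrium example adds on top of the Mystic list (such as `pydantic`, `safetensors`, and `xformers`) still need to be added by hand.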

2. Code Migration

Convert the model implementation. A typical Mystic pipeline:
import typing as t
from pathlib import Path

from PIL.Image import Image
from pipeline.cloud.pipelines import run_pipeline
from pipeline.objects.graph import InputField, InputSchema

from pipeline import File, Pipeline, Variable, entity, pipe

HF_MODEL_ID = "runwayml/stable-diffusion-v1-5"

class ModelKwargs(InputSchema):
    num_images_per_prompt: int | None = InputField(
        title="num_images_per_prompt",
        description="The number of images to generate per prompt.",
        default=1,
        optional=True,
    )
    height: int | None = InputField(
        title="height",
        description="The height in pixels of the generated image.",
        default=512,
        optional=True,
        multiple_of=64,
        ge=64,
    )
    width: int | None = InputField(
        title="width",
        description="The width in pixels of the generated image.",
        default=512,
        optional=True,
        multiple_of=64,
        ge=64,
    )
    num_inference_steps: int | None = InputField(
        title="num_inference_steps",
        description=(
            "The number of denoising steps. More denoising steps "
            "usually lead to a higher quality image at the expense "
            "of slower inference."
        ),
        default=50,
        optional=True,
    )

@entity
class StableDiffusionModel:
    def __init__(self) -> None:
        self.model = None
        self.device = None

    @pipe(run_once=True, on_startup=True)
    def load(self) -> None:
        """Load the HF model into memory."""
        import torch
        from diffusers import StableDiffusionPipeline

        # Keep the selected device on the instance so other methods can use it
        self.device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        self.model = StableDiffusionPipeline.from_pretrained(HF_MODEL_ID)
        self.model.to(self.device)

    @pipe
    def predict(self, prompt: str, model_kwargs: ModelKwargs) -> t.List[Image]:
        """
        Generates a list of PIL images.
        """
        return self.model(prompt=prompt, **model_kwargs.to_dict()).images

    @pipe
    def postprocess(self, images: t.List[Image]) -> t.List[File]:
        """
        Creates a list of Files from the `PIL` images.
        """
        output_images = []
        for i, image in enumerate(images):
            path = Path(f"/tmp/sd/image-{i}.jpg")
            path.parent.mkdir(parents=True, exist_ok=True)
            image.save(str(path))
            output_images.append(File(path=path, allow_out_of_context_creation=True))
        return output_images

with Pipeline() as builder:
    prompt = Variable(
        str,
        title="prompt",
        description="The prompt to guide image generation",
        max_length=512,
    )
    model_kwargs = Variable(ModelKwargs)

    model = StableDiffusionModel()
    model.load()

    images: t.List[Image] = model.predict(prompt, model_kwargs)

    output: t.List[File] = model.postprocess(images)

    builder.output(output)

pipeline_graph = builder.get_pipeline()

The Cerebrium equivalent in main.py:
import base64
import io

import torch
from diffusers import StableDiffusionPipeline, DPMSolverMultistepScheduler
from pydantic import BaseModel


# Define the structure of input parameters
class Item(BaseModel):
    prompt: str
    height: int
    width: int
    num_inference_steps: int
    num_images_per_prompt: int


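# Load the model once at import time so every request reuses it.
# Note: this example loads Stable Diffusion 2.1; to match the Mystic pipeline
# above, set model_id to "runwayml/stable-diffusion-v1-5".
model_id = "stabilityai/stable-diffusion-2-1"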
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
pipe.enable_xformers_memory_efficient_attention()
pipe = pipe.to("cuda")


# Exposed as the /predict endpoint; returns a list of base64-encoded PNG images
def predict(
        prompt: str,
        height: int = 512,
        width: int = 512,
        num_inference_steps: int = 25,
        num_images_per_prompt: int = 1,
):
    item = Item(
        prompt=prompt,
        height=height,
        width=width,
        num_inference_steps=num_inference_steps,
        num_images_per_prompt=num_images_per_prompt,
    )
    images = pipe(
        prompt=item.prompt,
        height=item.height,
        width=item.width,
        num_images_per_prompt=item.num_images_per_prompt,
        num_inference_steps=item.num_inference_steps,
    ).images
    finished_images = []
    for image in images:
        buffered = io.BytesIO()
        image.save(buffered, format="PNG")
        finished_images.append(base64.b64encode(buffered.getvalue()).decode("utf-8"))
    return finished_images
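
The Mystic schema constrained `height` and `width` (`ge=64`, `multiple_of=64`) and capped the prompt at 512 characters, while the `Item` model above accepts any integers. A minimal sketch of how those checks could be restored is shown below, using a standard-library dataclass so it runs without extra dependencies. `ImageRequest` is a hypothetical name, and the positivity check on the two counts is an added assumption that is not in the Mystic schema. With pydantic, the same size constraints can be expressed through `Field(ge=64, multiple_of=64)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImageRequest:
    """Hypothetical stand-in for `Item` that enforces the Mystic constraints."""
    prompt: str
    height: int = 512
    width: int = 512
    num_inference_steps: int = 25
    num_images_per_prompt: int = 1

    def __post_init__(self) -> None:
        # Mirrors max_length=512 on the Mystic `prompt` Variable
        if len(self.prompt) > 512:
            raise ValueError("prompt must be at most 512 characters")
        for name in ("height", "width"):
            value = getattr(self, name)
            # Mirrors ge=64 and multiple_of=64 from the Mystic InputField definitions
            if value < 64 or value % 64 != 0:
                raise ValueError(f"{name} must be a multiple of 64 and at least 64")
        # Assumption: not part of the Mystic schema, added as a sanity check
        if self.num_inference_steps < 1 or self.num_images_per_prompt < 1:
            raise ValueError("step and image counts must be positive")


# A valid request passes; an invalid height is rejected before reaching the GPU
request = ImageRequest(prompt="a photo of an astronaut riding a horse on mars")
try:
    ImageRequest(prompt="test", height=500)
except ValueError as err:
    print(f"Rejected: {err}")
```

Validating before calling the pipeline avoids spending GPU time on requests that the model would reject or handle poorly.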

3. Deployment

Deploy your model with a single command:
cerebrium deploy

4. Inference

Once your app is deployed, send requests to the predict endpoint. An example cURL request:
curl --location 'https://api.aws.us-east-1.cerebrium.ai/v4/p-<YOUR PROJECT ID>/stable-diffusion/predict' \
--header 'Content-Type: application/json' \
--header 'Authorization: Bearer <YOUR TOKEN HERE>' \
--data '{
    "prompt": "a photo of an astronaut riding a horse on mars"
}'
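
The same request can be made from Python. The sketch below uses only the standard library and reuses the placeholders from the cURL example. `build_request`, `save_images`, and `fetch_images` are hypothetical helper names, and the assumption that the list returned by `predict` arrives under a `"result"` key in the JSON response should be checked against an actual response from your deployment.

```python
import base64
import json
import urllib.request
from pathlib import Path

# Placeholders copied from the cURL example; replace them with real values.
ENDPOINT = "https://api.aws.us-east-1.cerebrium.ai/v4/p-<YOUR PROJECT ID>/stable-diffusion/predict"
TOKEN = "<YOUR TOKEN HERE>"


def build_request(prompt: str, **params) -> urllib.request.Request:
    """Build the POST request with the same headers as the cURL example."""
    body = json.dumps({"prompt": prompt, **params}).encode("utf-8")
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {TOKEN}",
        },
        method="POST",
    )


def save_images(encoded_images: list[str], out_dir: str = "outputs") -> list[Path]:
    """Decode base64 PNG strings (as returned by predict) and write them to disk."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, encoded in enumerate(encoded_images):
        path = target / f"image-{index}.png"
        path.write_bytes(base64.b64decode(encoded))
        paths.append(path)
    return paths


def fetch_images(prompt: str, **params) -> list[Path]:
    """Send the request and save the returned images."""
    with urllib.request.urlopen(build_request(prompt, **params)) as response:
        payload = json.load(response)
    # Assumption: the list returned by predict() sits under a "result" key.
    return save_images(payload["result"])
```

After filling in the project ID and token, `fetch_images("a photo of an astronaut riding a horse on mars", num_inference_steps=25)` would write the decoded PNG files to the `outputs` directory.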
The Cerebrium platform provides the tools and support needed for a smooth transition.

Join the Community

Connect with other developers and the Cerebrium team for faster responses and issue resolution:
  • Join the Discord server.
  • Join the Slack workspace.
These communities offer migration support, quick technical answers, best practices, and feature updates.