Deployment cost is based on the hardware selected and the execution time. Compute is billed whenever code runs or a machine is configured to stay running. GPU, CPU, and memory usage are charged per second; persistent storage is charged per GB per month. View compute pricing on the pricing page. Deploying a model incurs two billable processes:
  1. Build process — sets up the app environment: a Python environment with the specified parameters, required apt packages, Conda and Python packages, and any model files. A build is only charged when the environment needs rebuilding, i.e., a build or deploy command runs with changed requirements, parameters, or code. Each build step is cached, so subsequent builds cost substantially less than the first.
  2. App runtime — the time code runs from start to finish on each request. Three cost components apply:
  • Cold-start: The time to spin up server(s), load the environment, connect storage, etc. Cerebrium continuously optimizes cold-start latency. Cold-start time is not billed.
  • Model initialization: Code outside the request function that only runs on cold start (e.g., loading a model into GPU RAM, importing packages). This time is billed.
  • Function runtime: Code inside the request function, executed on every request. This time is billed.
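The billed components above can be sketched as a single formula: total billed seconds = build time + (initialization time × cold starts) + (runtime × requests). A minimal sketch (function and figures are illustrative, taken from the worked example below):

```python
def billed_seconds_per_month(build_seconds, init_seconds, cold_starts,
                             runtime_seconds, requests):
    """Total billed compute seconds in a month.

    Cold-start spin-up time is excluded: only environment builds,
    initialization code run on each cold start, and per-request
    function runtime are billed.
    """
    return (build_seconds                  # environment (re)builds
            + init_seconds * cold_starts   # model initialization
            + runtime_seconds * requests)  # function runtime

# One 2-minute build, 300 cold starts at 2 s, 100,000 requests at 2 s each.
total = billed_seconds_per_month(120, 2, 300, 2, 100_000)
print(total)  # 200720 billed seconds
```

Note how the per-request runtime term dominates at any meaningful request volume; builds and cold starts are comparatively cheap.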
Example cost calculation

A model deployment requires:
  • 24 GB VRAM (A10): $0.000306 per second
  • 2 CPU cores: 2 * $0.00000655 per second
  • 20 GB memory: 20 * $0.00000222 per second
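Summing these rates gives the combined per-second compute rate for this configuration (arithmetic only, using the rates listed above):

```python
gpu = 0.000306          # A10, per second
cpu = 2 * 0.00000655    # 2 CPU cores, per second
mem = 20 * 0.00000222   # 20 GB memory, per second

rate = gpu + cpu + mem
print(f"${rate:.7f} per second of compute")  # $0.0003635 per second
```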
Assume the app works on the first deployment, incurring a single 2-minute build. The app has 10 cold starts per day with an average initialization of 2 seconds and an average runtime (predict) of 2 seconds. The expected monthly volume is 100,000 inferences.
# Your variables
average_initialization_time = 2
cold_starts_per_month = 300  # 10 a day for 30 days
average_inference_time = 2  # seconds
number_of_inferences = 100000  # number of inferences per month

GPU_cost = 0.000306  # per second
CPU_cost = 0.00000655  # per second per core
memory_cost = 0.00000222  # per second per GB
num_of_cpu_cores = 2
gb_of_RAM = 20
build_seconds = 120  # 2 minutes
gb_of_persistent_storage = 10  # assumed example volume; not specified in the scenario above
persistent_storage_cost = 0.30  # assumed example rate per GB per month; check the pricing page

# cost calculation
compute_rate = GPU_cost + (CPU_cost * num_of_cpu_cores) + (memory_cost * gb_of_RAM)

total_build_compute_cost = build_seconds * compute_rate
total_initialization_time = average_initialization_time * cold_starts_per_month
total_inference_time = average_inference_time * number_of_inferences

initialization_compute_cost = total_initialization_time * compute_rate
inference_compute_cost = total_inference_time * compute_rate
storage_cost = gb_of_persistent_storage * persistent_storage_cost

total_cost = inference_compute_cost + storage_cost + total_build_compute_cost + initialization_compute_cost

print(f"Build compute cost: ${total_build_compute_cost:.2f}/month",
      f"Initialization compute cost: ${initialization_compute_cost:.2f}/month",
      f"Inference compute cost: ${inference_compute_cost:.2f}/month",
      f"Storage cost: ${storage_cost:.2f}/month",
      f"Total cost: ${total_cost:.2f}/month",
      sep="\n")
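At this volume it is often more useful to express the result per inference. A rough sketch using the same rates (storage excluded; all figures from the example above):

```python
# Combined compute rate from the example: A10 + 2 CPU cores + 20 GB memory.
rate = 0.000306 + 2 * 0.00000655 + 20 * 0.00000222  # dollars per second

# Per-request runtime cost, plus initialization amortized over the month.
per_request = 2 * rate                                # 2 s average runtime
amortized_init = (2 * 300 * rate) / 100_000           # 300 cold starts, 100k requests

print(f"${per_request:.6f} runtime + ${amortized_init:.8f} init per inference")
# Runtime dominates: roughly $0.00073 per inference at this volume.
```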