Cerebrium setup
Create a Cerebrium account by signing up here and follow the installation docs. Then run the following command to create the Cerebrium starter project: `cerebrium init 1-openai-compatible-endpoint`. This creates two files:
- `main.py`: The entrypoint file where application code lives
- `cerebrium.toml`: A configuration file that contains all build and environment settings
Edit `cerebrium.toml` to create your deployment environment.
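The original configuration isn't reproduced here; below is a hypothetical sketch of what a `cerebrium.toml` might contain. The section names, keys, and values are all assumptions — check the Cerebrium docs for the exact schema your CLI version expects.

```toml
# Hypothetical sketch -- section and key names may differ from the
# current cerebrium.toml schema; consult the Cerebrium docs.
[cerebrium.deployment]
name = "1-openai-compatible-endpoint"
python_version = "3.11"

[cerebrium.hardware]
cpu = 2
memory = 16.0
gpu = "AMPERE_A10"   # assumed GPU identifier

[cerebrium.dependencies.pip]
transformers = "latest"
torch = "latest"
```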
`main.py` does the following:
- Takes parameters through its signature, with optional and default values available
- Automatically receives a unique `run_id` for each request
- Processes the entire prompt through the model
- Streams results when `stream=True` using async functionality
- Returns the complete result at the end if streaming is disabled
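The original `main.py` isn't shown in this excerpt; the behaviors listed above can be sketched as follows. The function name, parameter list, and the stand-in token generator are assumptions for illustration, not Cerebrium's exact API — the real file would call your actual model.

```python
import asyncio


async def _generate_tokens(prompt: str):
    # Stand-in for the real model call (assumption: the model exposes an
    # async token stream). Here we simply echo the prompt word by word.
    for token in prompt.split():
        await asyncio.sleep(0)  # yield control, as a real model call would
        yield token + " "


async def run(prompt: str, temperature: float = 0.7,
              stream: bool = False, run_id: str = ""):
    # Parameters arrive through the signature, with optional defaults;
    # run_id is injected by Cerebrium for each request.
    if stream:
        # Streaming: yield partial results as they are produced.
        async def streamer():
            async for token in _generate_tokens(prompt):
                yield token
        return streamer()
    # Non-streaming: process the entire prompt, return the full result.
    result = ""
    async for token in _generate_tokens(prompt):
        result += token
    return {"run_id": run_id, "result": result.strip()}
```

With streaming disabled the complete result is returned at the end; with `stream=True` the caller consumes tokens from the async generator as they arrive.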
Deploy & Inference
To deploy the model, use the Cerebrium CLI's deploy command. The deployed endpoint's path ends with /run; while OpenAI-compatible endpoints typically end with /chat/completions, all Cerebrium endpoints are OpenAI-compatible. Call the endpoint as follows:
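The original request example isn't shown in this excerpt; here is a hedged sketch of building the HTTP request with the standard library. The endpoint URL and token are placeholders, and the request body fields (`prompt`, `stream`) are assumptions matching the `main.py` signature described above.

```python
import json
import urllib.request

# Placeholders: substitute your own endpoint URL (from the Cerebrium
# dashboard) and JWT token before sending a real request.
ENDPOINT = "https://api.cortex.cerebrium.ai/v4/<project-id>/<app-name>/run"
JWT_TOKEN = "<your-jwt-token>"


def build_request(prompt: str, stream: bool = False) -> urllib.request.Request:
    # POST a JSON body whose fields mirror the main.py parameters,
    # authenticated with the JWT as a Bearer token.
    body = json.dumps({"prompt": prompt, "stream": stream}).encode()
    return urllib.request.Request(
        ENDPOINT,
        data=body,
        headers={
            "Authorization": f"Bearer {JWT_TOKEN}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# To actually send the request (requires a live deployment):
# with urllib.request.urlopen(build_request("Hello!")) as resp:
#     print(resp.read().decode())
```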
Authenticate your requests with the JWT token from either the curl command or the Cerebrium dashboard's API Keys section.