Chat Completions
The chat completions endpoint generates responses from a language model. It follows the OpenAI Chat Completions API format.
Endpoint
POST /v1/chat/completions
Requires authentication. See Authentication.
Request body
| Field | Type | Required | Description |
|---|---|---|---|
| model | string | Yes | The model ID to use. Get available IDs from List Models. |
| messages | array | Yes | The conversation history. Each item has a role (system, user, assistant, or tool) and content. |
| stream | boolean | No | If true, the response is streamed as Server-Sent Events. Default: false. |
| temperature | number | No | Controls randomness. Range 0–2. Higher values produce more varied output. |
| top_p | number | No | Nucleus sampling threshold. The model samples only from the smallest set of tokens whose cumulative probability reaches this value. |
| max_completion_tokens | integer | No | Maximum number of tokens to generate in the response. |
| stop | string or array | No | One or more sequences at which generation stops. |
| tools | array | No | Tool definitions the model may call. See Tool Calling. |
| tool_choice | string or object | No | Controls whether the model calls tools. One of "auto", "none", "required", or an object naming a specific tool. |
| frequency_penalty | number | No | Positive values penalize tokens in proportion to how often they have already appeared, reducing verbatim repetition. Range -2 to 2. |
| presence_penalty | number | No | Positive values penalize any token that has already appeared at least once, encouraging new content. Range -2 to 2. |
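A client can catch obvious mistakes before sending a request. The following is a minimal sketch, not part of the API: `validate_request` is a hypothetical helper that checks only the required fields and the numeric ranges stated in the table above.

```python
from typing import Any

# Ranges copied from the request body table; fields without a stated
# range are not checked by this sketch.
RANGES = {
    "temperature": (0.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}
REQUIRED = ("model", "messages")


def validate_request(body: dict[str, Any]) -> list[str]:
    """Return a list of problems found in a request body (empty if none)."""
    problems = []
    for field in REQUIRED:
        if field not in body:
            problems.append(f"missing required field: {field}")
    for field, (low, high) in RANGES.items():
        if field in body and not low <= body[field] <= high:
            problems.append(f"{field} must be between {low} and {high}")
    return problems
```

The server remains the authority on what it accepts; a local check like this only saves a round trip for simple errors.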
Basic example
Request:
{
  "model": "llama-3.1-8b",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is the capital of France?" }
  ]
}
Response:
{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1718000000,
  "model": "llama-3.1-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 9,
    "total_tokens": 37
  }
}
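The basic exchange could be driven from Python roughly as follows, using only the standard library. This is a sketch under assumptions: `BASE_URL`, `API_KEY`, and the Bearer-token header are placeholders (the actual scheme is described under Authentication), and `build_request`, `extract_reply`, and `send` are illustrative names.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: replace with your server address
API_KEY = "YOUR_API_KEY"            # assumption: see Authentication for the real scheme


def build_request(model, user_text, system_text=None):
    """Build a request body with an optional system message."""
    messages = []
    if system_text is not None:
        messages.append({"role": "system", "content": system_text})
    messages.append({"role": "user", "content": user_text})
    return {"model": model, "messages": messages}


def extract_reply(response):
    """Return the assistant text from the first choice of a response."""
    return response["choices"][0]["message"]["content"]


def send(body):
    """POST a request body to the endpoint and return the parsed JSON response."""
    request = urllib.request.Request(
        BASE_URL + "/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # assumed header format
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as resp:
        return json.load(resp)
```

With a reachable server, `extract_reply(send(build_request("llama-3.1-8b", "What is the capital of France?")))` would return the assistant's text.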
Streaming
Set "stream": true to receive the response as a stream of Server-Sent Events. Each event contains a partial response delta.
Request:
{
  "model": "llama-3.1-8b",
  "messages": [{ "role": "user", "content": "Tell me a short joke." }],
  "stream": true
}
Response (stream of events):
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant","content":"Why"},"index":0}]}
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" don't"},"index":0}]}
...
data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop","index":0}]}
data: [DONE]
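A client reassembles the full reply by concatenating the `content` of each delta until it sees `[DONE]`. The sketch below shows one way to do that over already-decoded text lines; `iter_deltas` is an illustrative name, and the parser assumes each event fits on a single `data:` line, as in the example above.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield the text fragments carried by a stream of SSE lines."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream marker
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Joining the fragments with `"".join(iter_deltas(lines))` gives the complete message; printing each fragment as it arrives gives the usual typing effect.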
Tool calling
If tools are provided and the model supports tool calling, it may respond with tool call requests rather than a direct text answer.
Request with tools:
{
  "model": "llama-3.1-8b",
  "messages": [{ "role": "user", "content": "What is the weather in London?" }],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": { "type": "string", "description": "The city name" }
          },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
Response with tool call:
{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_xyz",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\":\"London\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}
Execute the function on your side, then append the assistant's tool call message and the function result (as a tool role message) to messages, and call the endpoint again to get the model's final answer.
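The round trip can be sketched as follows. This assumes the server mirrors the OpenAI convention in which each tool message carries a `tool_call_id` matching the `id` of the call it answers; `get_weather` here is a hypothetical local stub, and `LOCAL_TOOLS` and `tool_result_messages` are illustrative names.

```python
import json


def get_weather(city):
    """Hypothetical local implementation; a real one would query a weather service."""
    return {"city": city, "forecast": "placeholder"}


# Maps tool names from the request's tool definitions to local callables.
LOCAL_TOOLS = {"get_weather": get_weather}


def tool_result_messages(assistant_message):
    """Run each requested tool call and return the matching tool role messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        function = LOCAL_TOOLS[call["function"]["name"]]
        # The arguments arrive as a JSON-encoded string, not as an object.
        arguments = json.loads(call["function"]["arguments"])
        output = function(**arguments)
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # assumed field, per the OpenAI format
            "content": json.dumps(output),
        })
    return results
```

The follow-up request then uses `messages + [assistant_message] + tool_result_messages(assistant_message)` as its `messages` array, with the same `tools` list as before.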
Multi-turn conversations
Include the full conversation history in messages on each request. OptimaGPT is stateless: it does not store conversation state between API calls, so any turn omitted from messages is invisible to the model.
{
  "model": "llama-3.1-8b",
  "messages": [
    { "role": "user", "content": "My name is Alice." },
    { "role": "assistant", "content": "Hello Alice! How can I help you today?" },
    { "role": "user", "content": "What is my name?" }
  ]
}
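Because the server keeps no state, the client has to accumulate the history itself. One possible shape for that is sketched below; `Conversation` is an illustrative class, and `send` is any callable that takes a request body and returns the parsed response, so the transport is left as an assumption.

```python
from typing import Callable

Message = dict[str, str]


class Conversation:
    """Keeps the message history client-side and resends it on every turn."""

    def __init__(self, model: str, send: Callable[[dict], dict]):
        self.model = model
        self.send = send
        self.messages: list[Message] = []

    def ask(self, text: str) -> str:
        """Append a user turn, send the full history, and record the reply."""
        self.messages.append({"role": "user", "content": text})
        # Send a copy so later turns do not mutate an already-sent body.
        body = {"model": self.model, "messages": list(self.messages)}
        response = self.send(body)
        reply = response["choices"][0]["message"]["content"]
        self.messages.append({"role": "assistant", "content": reply})
        return reply
```

Each call to `ask` sends one more user turn and one more assistant turn than the previous call, which is why long conversations consume more prompt tokens over time.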