Chat Completions

The chat completions endpoint generates responses from a language model. It follows the OpenAI Chat Completions API format.

Endpoint

POST /v1/chat/completions

Requires authentication. See Authentication.

Request body

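| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | The model ID to use. Get available IDs from List Models. |
| messages | array | Yes | The conversation history. Each item has a role (system, user, assistant, or tool) and content. |
| stream | boolean | No | If true, the response is streamed as Server-Sent Events. Default: false. |
| temperature | number | No | Controls randomness. Range 0 to 2. Higher values produce more varied output. |
| top_p | number | No | Nucleus sampling threshold: sampling is restricted to the smallest set of tokens whose cumulative probability reaches this value. |
| max_completion_tokens | integer | No | Maximum number of tokens to generate in the response. |
| stop | string or array | No | One or more sequences at which generation stops. |
| tools | array | No | Tool definitions the model may call. See Tool Calling. |
| tool_choice | string or object | No | Controls whether the model uses tools. One of "auto", "none", "required", or an object naming a specific tool. |
| frequency_penalty | number | No | Positive values penalize tokens in proportion to how often they have already appeared, reducing verbatim repetition. Range -2 to 2. |
| presence_penalty | number | No | Positive values penalize any token that has already appeared at least once, encouraging new topics. Range -2 to 2. |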
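
The documented numeric ranges can be checked on the client before a request is sent. The following is a minimal sketch, not part of the API: the `check_sampling_options` helper and the `RANGES` table are hypothetical names, and only the three fields whose ranges are stated above are checked.

```python
# Ranges copied from the request-body table; fields without a stated range are ignored.
RANGES = {
    "temperature": (0.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}


def check_sampling_options(options: dict) -> list[str]:
    """Return a list of problems found in `options`; an empty list means none."""
    problems = []
    for name, (low, high) in RANGES.items():
        if name not in options:
            continue  # all of these fields are optional
        value = options[name]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            problems.append(f"{name} must be a number")
        elif not low <= value <= high:
            problems.append(f"{name} must be between {low} and {high}")
    return problems
```

Whether the server itself rejects out-of-range values, or clamps them, is not stated here, so a client-side check like this is only a convenience.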

Basic example

Request:

{
  "model": "llama-3.1-8b",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is the capital of France?" }
  ]
}

Response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1718000000,
  "model": "llama-3.1-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 9,
    "total_tokens": 37
  }
}
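
As an illustration, the request above could be sent from Python using only the standard library. This is a sketch under assumptions: `BASE_URL` is a hypothetical host, `API_KEY` is a placeholder, and the `Authorization: Bearer` header is an assumed scheme (see Authentication for the actual one). The helper names `build_request`, `extract_reply`, and `chat` are illustrative.

```python
import json
import urllib.request

BASE_URL = "https://optimagpt.example.com"  # hypothetical host, replace with your own
API_KEY = "YOUR_API_KEY"                    # placeholder credential


def build_request(model, messages, **options):
    """Build a POST request for /v1/chat/completions without sending it."""
    body = {"model": model, "messages": messages, **options}
    return urllib.request.Request(
        f"{BASE_URL}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",  # assumed auth scheme
        },
        method="POST",
    )


def extract_reply(response: dict) -> str:
    """Return the assistant text of the first choice in a non-streamed response."""
    return response["choices"][0]["message"]["content"]


def chat(model, messages, **options) -> str:
    """Send the request and return the assistant's reply text."""
    with urllib.request.urlopen(build_request(model, messages, **options)) as resp:
        return extract_reply(json.load(resp))
```

Calling `chat("llama-3.1-8b", [{"role": "user", "content": "What is the capital of France?"}])` would perform the network call. Optional fields such as `temperature` can be passed as keyword arguments.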

Streaming

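Set "stream": true to receive the response as a stream of Server-Sent Events. Each event carries a chat.completion.chunk object whose delta holds the next fragment of the response. The final chunk sets finish_reason, and the stream ends with a data: [DONE] event.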

Request:

{
  "model": "llama-3.1-8b",
  "messages": [{ "role": "user", "content": "Tell me a short joke." }],
  "stream": true
}

Response (stream of events):

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant","content":"Why"},"index":0}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" don't"},"index":0}]}

...

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop","index":0}]}

data: [DONE]
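
A client reassembles the full text by concatenating the `content` of each delta until it sees `[DONE]`. The sketch below assumes the stream has already been split into text lines (for example by iterating over the HTTP response). `iter_deltas` is an illustrative name, not part of any SDK.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield each text fragment from a sequence of SSE lines, stopping at [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and any non-data SSE fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # end-of-stream sentinel
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            # The first delta may carry only a role and the last may be empty.
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Joining the fragments, as in `"".join(iter_deltas(lines))`, gives the complete message. Printing each fragment as it arrives gives incremental output.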

Tool calling

If tools are provided and the model supports tool calling, it may respond with tool call requests rather than a direct text answer.

Request with tools:

{
  "model": "llama-3.1-8b",
  "messages": [{ "role": "user", "content": "What is the weather in London?" }],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": { "type": "string", "description": "The city name" }
          },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}

Response with tool call:

{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_xyz",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\":\"London\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}

Execute the function on your side, then send the result back as a tool role message and call the endpoint again to get the model's final answer.
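
The round trip described above can be sketched as follows. The shape of the tool message, in particular the `tool_call_id` field linking the result to the call, is an assumption taken from the OpenAI Chat Completions format that this endpoint follows. `get_weather` here is a stand-in implementation, and `TOOLS` and `run_tool_calls` are hypothetical names.

```python
import json


def get_weather(city: str) -> str:
    """Stand-in tool implementation; a real one would query a weather service."""
    return f"Sunny in {city}"


# Maps tool names from the request's "tools" array to local Python callables.
TOOLS = {"get_weather": get_weather}


def run_tool_calls(messages: list, response: dict) -> list:
    """Return a new message list with the assistant turn and tool results appended."""
    assistant = response["choices"][0]["message"]
    follow_up = messages + [assistant]  # copy, so the caller's list is not mutated
    for call in assistant.get("tool_calls") or []:
        func = TOOLS[call["function"]["name"]]
        # "arguments" arrives as a JSON-encoded string, not an object.
        args = json.loads(call["function"]["arguments"])
        follow_up.append({
            "role": "tool",
            "tool_call_id": call["id"],  # assumed field, per the OpenAI format
            "content": func(**args),
        })
    return follow_up
```

The returned list becomes the `messages` of the next request, sent with the same `tools` array, so the model can produce its final answer from the tool output.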

Multi-turn conversations

Include the full conversation history in messages on each request. OptimaGPT is stateless: it does not store conversation state between API calls, so the model sees only the messages sent in the current request.

{
  "model": "llama-3.1-8b",
  "messages": [
    { "role": "user", "content": "My name is Alice." },
    { "role": "assistant", "content": "Hello Alice! How can I help you today?" },
    { "role": "user", "content": "What is my name?" }
  ]
}