Chat Completions

The chat completions endpoint generates responses from a language model. It follows the OpenAI Chat Completions API format.

Endpoint

POST /v1/chat/completions

Requires authentication. See Authentication.
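
A minimal sketch of calling the endpoint from Python using only the standard library. The host `https://optimagpt.example.com` is a placeholder, and the `Authorization: Bearer <key>` header is an assumption based on the OpenAI convention; check Authentication for the scheme your deployment actually uses. `build_request` and `send_request` are hypothetical helper names.

```python
import json
import urllib.request
from typing import Any


def build_request(base_url: str, api_key: str, payload: dict[str, Any]) -> urllib.request.Request:
    """Build (but do not send) a POST to /v1/chat/completions."""
    url = base_url.rstrip("/") + "/v1/chat/completions"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",  # assumed OpenAI-style auth
        },
        method="POST",
    )


def send_request(req: urllib.request.Request, timeout: float = 60.0) -> dict[str, Any]:
    """Send a non-streaming request and decode the JSON response body."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


req = build_request(
    "https://optimagpt.example.com",  # placeholder host
    "YOUR_API_KEY",
    {"model": "llama-3.1-8b", "messages": [{"role": "user", "content": "What is the capital of France?"}]},
)
# reply = send_request(req)  # requires a reachable server and a real key
```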

Request body

Field	Type	Required	Description
`model`	string	Yes	The model ID to use. Get available IDs from List Models.
`messages`	array	Yes	The conversation history. Each item has a `role` (`system`, `user`, `assistant`, or `tool`) and `content`.
`stream`	boolean	No	If `true`, responses are streamed as Server-Sent Events. Default: `false`.
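`temperature`	number	No	Controls randomness. Range 0–2. Higher values produce more varied output; lower values make it more deterministic.
`top_p`	number	No	Nucleus sampling threshold: the model samples only from the smallest set of tokens whose cumulative probability reaches this value.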
`max_completion_tokens`	integer	No	Maximum tokens in the response.
`stop`	string or array	No	One or more sequences where generation stops.
`tools`	array	No	Tool definitions the model may call. See Tool Calling.
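`tool_choice`	string or object	No	Controls whether and how the model uses tools: `"auto"`, `"none"`, `"required"`, or an object naming a specific tool.
`frequency_penalty`	number	No	Penalizes tokens in proportion to how often they have already appeared. Range -2 to 2; positive values reduce repetition.
`presence_penalty`	number	No	Penalizes any token that has already appeared at least once, encouraging new content. Range -2 to 2; positive values reduce repetition.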
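
The constraints in this table can be checked on the client before a request is sent. The sketch below is hypothetical (`make_payload` is not part of any OptimaGPT SDK): it enforces only the required fields, the allowed roles, and the numeric ranges stated above, treats an empty `messages` list as missing (an assumption), and leaves every other check to the server.

```python
from typing import Any

# Ranges taken from the request-body table; other fields pass through unchecked.
RANGES = {
    "temperature": (0.0, 2.0),
    "frequency_penalty": (-2.0, 2.0),
    "presence_penalty": (-2.0, 2.0),
}
ROLES = {"system", "user", "assistant", "tool"}


def make_payload(model: str, messages: list[dict[str, Any]], **options: Any) -> dict[str, Any]:
    """Build a request body, rejecting values the table marks as invalid."""
    if not model:
        raise ValueError("model is required")
    if not messages:  # assumption: an empty history is treated as missing
        raise ValueError("messages is required")
    for message in messages:
        if message.get("role") not in ROLES:
            raise ValueError(f"unsupported role: {message.get('role')!r}")
    for name, (low, high) in RANGES.items():
        value = options.get(name)
        if value is not None and not low <= value <= high:
            raise ValueError(f"{name} must be in [{low}, {high}], got {value}")
    # Drop options left as None so the server applies its own defaults.
    extras = {key: value for key, value in options.items() if value is not None}
    return {"model": model, "messages": messages, **extras}


payload = make_payload(
    "llama-3.1-8b",
    [{"role": "user", "content": "What is the capital of France?"}],
    temperature=0.2,
    max_completion_tokens=64,
    stop=None,  # omitted from the body
)
```

Omitting unset fields, rather than sending `null`, keeps the request identical to one that simply leaves them out.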

Basic example

Request:

{
  "model": "llama-3.1-8b",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "What is the capital of France?" }
  ]
}

Response:

{
  "id": "chatcmpl-abc123",
  "object": "chat.completion",
  "created": 1718000000,
  "model": "llama-3.1-8b",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "The capital of France is Paris."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 28,
    "completion_tokens": 9,
    "total_tokens": 37
  }
}
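
Reading a non-streaming response comes down to taking the first entry in `choices` and its `message`. A minimal sketch using the example response above (`summarize` is a hypothetical helper name); note that `content` can be `null` when the model answers with tool calls instead of text.

```python
from typing import Any, Optional


def summarize(response: dict[str, Any]) -> tuple[Optional[str], str, int]:
    """Return (content, finish_reason, total_tokens) for the first choice."""
    choice = response["choices"][0]
    content = choice["message"].get("content")  # None when the model calls a tool
    return content, choice["finish_reason"], response["usage"]["total_tokens"]


example = {
    "choices": [
        {
            "index": 0,
            "message": {"role": "assistant", "content": "The capital of France is Paris."},
            "finish_reason": "stop",
        }
    ],
    "usage": {"prompt_tokens": 28, "completion_tokens": 9, "total_tokens": 37},
}

content, reason, total = summarize(example)
print(content, reason, total)  # The capital of France is Paris. stop 37
```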

Streaming

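Set `"stream": true` to receive the response as a stream of Server-Sent Events. Each `data:` event carries a `chat.completion.chunk` whose `delta` holds the next fragment of the message; the last chunk carries `finish_reason`, and the stream ends with `data: [DONE]`.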

Request:

{
  "model": "llama-3.1-8b",
  "messages": [{ "role": "user", "content": "Tell me a short joke." }],
  "stream": true
}

Response (stream of events):

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant","content":"Why"},"index":0}]}

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" don't"},"index":0}]}

...

data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop","index":0}]}

data: [DONE]
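
A client can rebuild the full message by concatenating the `delta.content` fragments until it sees `data: [DONE]`. A minimal sketch (`iter_deltas` is a hypothetical name) that assumes a single choice (`index` 0) and takes already-decoded text lines; with `urllib.request.urlopen`, decode each raw line first, for example `(raw.decode("utf-8") for raw in resp)`.

```python
import json
from typing import Iterable, Iterator


def iter_deltas(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from SSE lines until `data: [DONE]`."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data SSE fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            return
        chunk = json.loads(data)
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # this sketch follows only the first choice
            piece = choice.get("delta", {}).get("content")
            if piece:
                yield piece


sample = [
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"role":"assistant","content":"Why"},"index":0}]}',
    '',
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{"content":" don\'t"},"index":0}]}',
    '',
    'data: {"id":"chatcmpl-abc","object":"chat.completion.chunk","choices":[{"delta":{},"finish_reason":"stop","index":0}]}',
    '',
    'data: [DONE]',
]
print("".join(iter_deltas(sample)))  # Why don't
```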

Tool calling

If tools are provided and the model supports tool calling, it may respond with tool call requests rather than a direct text answer.

Request with tools:

{
  "model": "llama-3.1-8b",
  "messages": [{ "role": "user", "content": "What is the weather in London?" }],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": { "type": "string", "description": "The city name" }
          },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}

Response with tool call:

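{
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": null,
        "tool_calls": [
          {
            "id": "call_xyz",
            "type": "function",
            "function": {
              "name": "get_weather",
              "arguments": "{\"city\":\"London\"}"
            }
          }
        ]
      },
      "finish_reason": "tool_calls"
    }
  ]
}

Execute the function in your own code. Then append the assistant message containing `tool_calls` to `messages`, followed by one `tool` role message per call carrying its result (in the OpenAI format, the message's `tool_call_id` matches the call's `id`), and call the endpoint again to get the model's final answer. Note that `arguments` is a JSON-encoded string, so parse it before calling your function.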
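
The round trip can be sketched in Python as follows. This assumes the OpenAI message format the endpoint says it follows (a `tool` message's `tool_call_id` echoes the call's `id`). `get_weather` here is a hypothetical local stub returning placeholder data, and `TOOLS` and `tool_followup` are illustrative names, not part of the API.

```python
import json
from typing import Any, Callable


def get_weather(city: str) -> dict[str, Any]:
    """Hypothetical local implementation of the get_weather tool."""
    return {"city": city, "conditions": "placeholder"}  # placeholder, not real data


# Maps tool names the model may request to local Python functions.
TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def tool_followup(messages: list[dict[str, Any]], assistant_message: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the history plus the assistant's tool calls and one tool result per call."""
    followup = list(messages) + [assistant_message]
    for call in assistant_message.get("tool_calls", []):
        fn = call["function"]
        args = json.loads(fn["arguments"])  # arguments arrive as a JSON string
        result = TOOLS[fn["name"]](**args)
        followup.append({"role": "tool", "tool_call_id": call["id"], "content": json.dumps(result)})
    return followup


history = [{"role": "user", "content": "What is the weather in London?"}]
reply = {
    "role": "assistant",
    "content": None,
    "tool_calls": [
        {"id": "call_xyz", "type": "function",
         "function": {"name": "get_weather", "arguments": '{"city":"London"}'}},
    ],
}
next_messages = tool_followup(history, reply)
# Send next_messages (with the same tools) in a second request for the final answer.
```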

Multi-turn conversations

Include the full conversation history in `messages` on each request. OptimaGPT is stateless: it does not store conversation state between API calls.

{
  "model": "llama-3.1-8b",
  "messages": [
    { "role": "user", "content": "My name is Alice." },
    { "role": "assistant", "content": "Hello Alice! How can I help you today?" },
    { "role": "user", "content": "What is my name?" }
  ]
}
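
Because the server keeps no state, the client owns the history and resends all of it each turn. A minimal sketch with a hypothetical `Conversation` helper that reproduces the request above; it stores the assistant's whole message so any `tool_calls` are preserved, and leaves sending the payload to whatever HTTP client you use.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Conversation:
    """Client-side chat history; every request carries the full list."""
    model: str
    messages: list[dict[str, Any]] = field(default_factory=list)

    def next_payload(self, user_text: str) -> dict[str, Any]:
        """Append a user turn and return the complete request body."""
        self.messages.append({"role": "user", "content": user_text})
        return {"model": self.model, "messages": list(self.messages)}

    def record_reply(self, response: dict[str, Any]) -> None:
        """Append the assistant message from a response to the history."""
        self.messages.append(dict(response["choices"][0]["message"]))


conv = Conversation("llama-3.1-8b")
conv.next_payload("My name is Alice.")
conv.record_reply({"choices": [{"message": {"role": "assistant", "content": "Hello Alice! How can I help you today?"}}]})
payload = conv.next_payload("What is my name?")
```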