# Responses
The responses endpoint is a stateful alternative to /v1/chat/completions. The gateway stores each turn server-side, so clients only need to send the new user message on subsequent requests — previous conversation history is reconstructed automatically from previous_response_id.
## Endpoint

```
POST /v1/responses
```

Requires authentication. See Authentication.
## Request body

| Field | Type | Required | Description |
|---|---|---|---|
| `model` | string | Yes | The model ID to use. Get available IDs from List Models. |
| `input` | string or array | Yes | The new user message: either a plain string or a messages array (each item with `role` and `content`). |
| `previous_response_id` | string | No | The `id` of the previous response. When provided, the gateway prepends the full conversation history automatically. |
| `instructions` | string | No | A system prompt. On the first turn it is stored alongside the conversation; on subsequent turns it can be omitted and the original instructions are reused. |
| `tools` | array | No | Tool definitions the model may call. Same format as `/v1/chat/completions`. |
| `tool_choice` | string or object | No | `"auto"`, `"none"`, `"required"`, or a specific tool object. |
| `stream` | boolean | No | If `true`, the response is streamed using the event format described below. Default: `false`. |
| `store` | boolean | No | If `false`, the response is not persisted and `previous_response_id` chaining is disabled for this turn. Default: `true`. |
| `temperature` | number | No | Sampling temperature. |
| `max_output_tokens` | integer | No | Maximum number of tokens in the response. |
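Since optional fields should be omitted rather than sent as `null`, it can help to build the request body with a small helper. The sketch below is illustrative, not part of any official SDK; the function name and keyword arguments are our own:

```python
def build_responses_request(model, user_input, *, instructions=None,
                            previous_response_id=None, stream=None,
                            store=None, temperature=None,
                            max_output_tokens=None):
    """Build a JSON-serializable body for POST /v1/responses.

    `model` and `input` are required; every optional field is dropped
    entirely when not set, instead of being serialized as null.
    """
    body = {"model": model, "input": user_input}
    optional = {
        "instructions": instructions,
        "previous_response_id": previous_response_id,
        "stream": stream,
        "store": store,
        "temperature": temperature,
        "max_output_tokens": max_output_tokens,
    }
    body.update({k: v for k, v in optional.items() if v is not None})
    return body
```

For example, `build_responses_request("llama-3.1-8b", "What is the capital of France?", instructions="You are a helpful assistant.")` produces exactly the first-turn body shown below, ready to pass to any HTTP client as JSON.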
## Basic example

First turn (no `previous_response_id`):

```json
{
  "model": "llama-3.1-8b",
  "input": "What is the capital of France?",
  "instructions": "You are a helpful assistant."
}
```
Response:

```json
{
  "id": "resp_a1b2c3d4e5f6...",
  "object": "response",
  "created_at": 1718000000,
  "model": "llama-3.1-8b",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "id": "abc123",
      "role": "assistant",
      "content": [
        { "type": "output_text", "text": "The capital of France is Paris." }
      ]
    }
  ],
  "usage": {
    "input_tokens": 28,
    "output_tokens": 9,
    "total_tokens": 37
  }
}
```
Second turn (continuing the conversation):

```json
{
  "model": "llama-3.1-8b",
  "input": "And what about Germany?",
  "previous_response_id": "resp_a1b2c3d4e5f6..."
}
```
The gateway looks up the prior response, reconstructs the conversation history, and returns a response as if the full history had been sent.
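Conceptually, the reconstruction step is equivalent to assembling a chat-completions-style messages list from the stored turns. The sketch below illustrates those semantics only; the gateway's actual internal storage format is not documented here and this is not its real implementation:

```python
def reconstruct_history(stored_turns, instructions, new_input):
    """Illustrate the history the gateway rebuilds for a chained turn.

    stored_turns: prior (role, text) pairs saved with the conversation.
    instructions: the system prompt stored on the first turn, if any.
    new_input: the `input` string from the current request.
    Returns a chat-completions-style messages list.
    """
    messages = []
    if instructions:
        messages.append({"role": "system", "content": instructions})
    for role, text in stored_turns:
        messages.append({"role": role, "content": text})
    messages.append({"role": "user", "content": new_input})
    return messages
```

For the two-turn example above, the model would effectively see the system prompt, the Paris question and answer, and then "And what about Germany?" as the final user message.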
## Response object

| Field | Description |
|---|---|
| `id` | Unique response identifier (`resp_...`). Pass this as `previous_response_id` on the next turn. |
| `object` | Always `"response"`. |
| `created_at` | Unix timestamp. |
| `model` | The model that generated the response. |
| `status` | Always `"completed"` for non-streaming responses. |
| `output` | Array of output items. Each item has a `type` of `"message"` (text) or `"function_call"` (tool call). |
| `usage.input_tokens` | Prompt token count. |
| `usage.output_tokens` | Completion token count. |
| `usage.total_tokens` | Combined token count. |
## Tool calls

If the model requests a tool call, the `output` array contains a `function_call` item instead of a message:

```json
{
  "output": [
    {
      "type": "function_call",
      "call_id": "call_xyz",
      "name": "get_weather",
      "arguments": "{\"city\":\"London\"}"
    }
  ]
}
```
Execute the function and send the result back as the `input` on the next turn, setting `previous_response_id` to the `id` of the response that contained the tool call.
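The round trip can be sketched as follows. Note the `function_call_output` item shape is an assumption (mirroring the common Responses-style convention) and is not confirmed by this document; check what input format the gateway actually accepts for tool results:

```python
import json


def handle_function_calls(output_items, tool_registry):
    """Execute each function_call item and build the next-turn input.

    output_items: the `output` array from a response.
    tool_registry: maps tool names to local Python callables.
    Returns a list of result items to send back as `input`; the
    "function_call_output" shape here is an assumed convention.
    """
    results = []
    for item in output_items:
        if item.get("type") != "function_call":
            continue  # plain messages need no tool execution
        fn = tool_registry[item["name"]]
        args = json.loads(item["arguments"])  # arguments arrive as a JSON string
        results.append({
            "type": "function_call_output",
            "call_id": item["call_id"],
            "output": json.dumps(fn(**args)),
        })
    return results
```

Matching results to calls by `call_id` lets the model pair each output with the request it made, even when several tools are called in one turn.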
## Streaming

Set `"stream": true` to receive the response as Server-Sent Events:

```
event: response.created
data: {"type":"response.created","response":{"id":"resp_...","status":"in_progress",...}}

event: response.output_item.added
data: {"type":"response.output_item.added","output_index":0,"item":{"type":"message","role":"assistant","content":[]}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":"The capital"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":" of France is Paris."}

event: response.output_text.done
data: {"type":"response.output_text.done","output_index":0,"content_index":0,"text":"The capital of France is Paris."}

event: response.output_item.done
data: {"type":"response.output_item.done","output_index":0,"item":{...}}

event: response.completed
data: {"type":"response.completed","response":{...}}

data: [DONE]
```
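A client consuming this stream typically filters for `data:` lines, stops at the `[DONE]` sentinel, and concatenates the `response.output_text.delta` payloads. A minimal sketch, operating on raw lines such as those produced by an HTTP client's line iterator:

```python
import json


def collect_output_text(sse_lines):
    """Accumulate assistant text from a /v1/responses SSE stream.

    sse_lines: an iterable of decoded lines from the event stream.
    Only `data:` lines are parsed; `event:` lines and blanks are
    skipped, and the terminal `[DONE]` sentinel ends the loop.
    """
    pieces = []
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip `event:` framing lines and keep-alives
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        event = json.loads(payload)
        if event.get("type") == "response.output_text.delta":
            pieces.append(event["delta"])
    return "".join(pieces)
```

Applied to the example stream above, this yields the full sentence "The capital of France is Paris."; the `response.completed` event still carries the final response object (including `usage`) if you need it.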
## Opting out of persistence

Set `"store": false` to process the request without saving it. The response will not be retrievable via `previous_response_id`, and no conversation record will be created.
## Authentication note

This endpoint accepts both JWT bearer tokens and API keys. Conversations are linked to the authenticated identity: JWT tokens link to a user account, while API keys link to the key itself.