Responses

The responses endpoint is a stateful alternative to /v1/chat/completions. The gateway stores each turn server-side, so clients only need to send the new user message on subsequent requests — previous conversation history is reconstructed automatically from previous_response_id.

Endpoint

POST /v1/responses

Requires authentication. See Authentication.

Request body

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | string | Yes | The model ID to use. Get available IDs from List Models. |
| input | string or array | Yes | The new user message. Either a plain string or a messages array (each item with role and content). |
| previous_response_id | string | No | The id of the previous response. When provided, the gateway prepends the full conversation history automatically. |
| instructions | string | No | A system prompt. On the first turn this is stored alongside the conversation. On subsequent turns it can be omitted and the original instructions will be reused. |
| tools | array | No | Tool definitions the model may call. Same format as /v1/chat/completions. |
| tool_choice | string or object | No | "auto", "none", "required", or a specific tool object. |
| stream | boolean | No | If true, the response is streamed using the event format described below. Default: false. |
| store | boolean | No | If false, the response is not persisted and previous_response_id chaining is disabled for this turn. Default: true. |
| temperature | number | No | Sampling temperature. |
| max_output_tokens | integer | No | Maximum number of tokens in the response. |
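The fields above can be assembled with a small client-side helper. This is a minimal sketch: `build_request` is a hypothetical name, and it only builds the JSON body — sending it over HTTP is left to whatever client you use.

```python
import json

def build_request(model, user_input, *, instructions=None,
                  previous_response_id=None, stream=False, store=True):
    """Assemble a /v1/responses request body from the documented fields."""
    body = {"model": model, "input": user_input}
    if instructions is not None:
        body["instructions"] = instructions
    if previous_response_id is not None:
        body["previous_response_id"] = previous_response_id
    if stream:
        body["stream"] = True
    if not store:
        body["store"] = False
    return body

# First turn: no previous_response_id yet.
payload = build_request("llama-3.1-8b", "What is the capital of France?",
                        instructions="You are a helpful assistant.")
print(json.dumps(payload, indent=2))
```

Optional fields are omitted rather than sent as null, so the gateway's defaults (stream: false, store: true) apply.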

Basic example

First turn (no previous_response_id):

{
  "model": "llama-3.1-8b",
  "input": "What is the capital of France?",
  "instructions": "You are a helpful assistant."
}

Response:

{
  "id": "resp_a1b2c3d4e5f6...",
  "object": "response",
  "created_at": 1718000000,
  "model": "llama-3.1-8b",
  "status": "completed",
  "output": [
    {
      "type": "message",
      "id": "abc123",
      "role": "assistant",
      "content": [
        { "type": "output_text", "text": "The capital of France is Paris." }
      ]
    }
  ],
  "usage": {
    "input_tokens": 28,
    "output_tokens": 9,
    "total_tokens": 37
  }
}

Second turn (continuing the conversation):

{
  "model": "llama-3.1-8b",
  "input": "And what about Germany?",
  "previous_response_id": "resp_a1b2c3d4e5f6..."
}

The gateway looks up the prior response, reconstructs the conversation history, and returns a response as if the full history had been sent.
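The chaining pattern above reduces to a loop that threads each response id into the next request. A minimal sketch, with hypothetical names: `send` stands in for whatever HTTP client POSTs the body to /v1/responses and returns the decoded JSON, injected here so the chaining logic can be exercised without a live gateway.

```python
def chain_turns(send, model, messages):
    """Send each user message in order, threading previous_response_id
    so the gateway reconstructs the conversation server-side."""
    prev_id = None
    replies = []
    for msg in messages:
        body = {"model": model, "input": msg}
        if prev_id is not None:
            body["previous_response_id"] = prev_id
        resp = send(body)
        prev_id = resp["id"]      # chain the next turn onto this response
        replies.append(resp)
    return replies

# Stub transport standing in for a real HTTP client.
sent = []
def fake_send(body):
    sent.append(body)
    return {"id": f"resp_{len(sent)}", "object": "response"}

chain_turns(fake_send, "llama-3.1-8b",
            ["What is the capital of France?", "And what about Germany?"])
```

Note that only the new message goes in input each time; the gateway supplies the earlier turns.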

Response object

| Field | Description |
| --- | --- |
| id | Unique response identifier (resp_...). Pass this as previous_response_id on the next turn. |
| object | Always "response". |
| created_at | Unix timestamp. |
| model | The model that generated the response. |
| status | Always "completed" for non-streaming responses. |
| output | Array of output items. Each item has a type of "message" (text) or "function_call" (tool call). |
| usage.input_tokens | Prompt token count. |
| usage.output_tokens | Completion token count. |
| usage.total_tokens | Combined token count. |
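Given the shape above, the assistant's text can be pulled out of the output array like this (a minimal sketch; `output_text` is a hypothetical helper name):

```python
def output_text(response):
    """Concatenate every output_text part across the message items."""
    parts = []
    for item in response.get("output", []):
        if item.get("type") == "message":
            for part in item.get("content", []):
                if part.get("type") == "output_text":
                    parts.append(part["text"])
    return "".join(parts)

# The non-streaming example response from above, abbreviated.
example = {
    "id": "resp_a1b2c3d4e5f6",
    "object": "response",
    "output": [
        {"type": "message", "id": "abc123", "role": "assistant",
         "content": [{"type": "output_text",
                      "text": "The capital of France is Paris."}]}
    ],
}
```

Iterating over all items, rather than reading output[0] directly, also keeps the helper safe when a function_call item appears instead of a message.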

Tool calls

If the model requests a tool call, the output array contains a function_call item instead of a message:

{
  "output": [
    {
      "type": "function_call",
      "call_id": "call_xyz",
      "name": "get_weather",
      "arguments": "{\"city\":\"London\"}"
    }
  ]
}

Execute the function and send the result back as the input on the next turn, with previous_response_id set to the id of the response that contained the function call.
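A tool round-trip might look like the following sketch. The shape of the follow-up input item (type "function_call_output" with call_id and output) is an assumption borrowed from the OpenAI Responses convention — this page does not pin it down, so verify it against the input items the gateway accepts.

```python
import json

def handle_tool_calls(response, tools):
    """Run each function_call item locally and build the follow-up
    input items to send on the next turn.

    ASSUMPTION: the "function_call_output" item shape mirrors the
    OpenAI Responses convention; confirm against this gateway.
    """
    items = []
    for item in response.get("output", []):
        if item.get("type") != "function_call":
            continue
        fn = tools[item["name"]]              # look up the local handler
        args = json.loads(item["arguments"])  # arguments arrive JSON-encoded
        items.append({
            "type": "function_call_output",
            "call_id": item["call_id"],
            "output": json.dumps(fn(**args)),
        })
    return items

# Hypothetical local tool registry and the example response from above.
tools = {"get_weather": lambda city: {"city": city, "temp_c": 15}}
response = {"output": [{"type": "function_call", "call_id": "call_xyz",
                        "name": "get_weather",
                        "arguments": "{\"city\":\"London\"}"}]}
follow_up = handle_tool_calls(response, tools)
```

The returned list would then be sent as the input array on the next request, together with the appropriate previous_response_id.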

Streaming

Set "stream": true to receive the response as Server-Sent Events.

event: response.created
data: {"type":"response.created","response":{"id":"resp_...","status":"in_progress",...}}

event: response.output_item.added
data: {"type":"response.output_item.added","output_index":0,"item":{"type":"message","role":"assistant","content":[]}}

event: response.output_text.delta
data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":"The capital"}

event: response.output_text.delta
data: {"type":"response.output_text.delta","output_index":0,"content_index":0,"delta":" of France is Paris."}

event: response.output_text.done
data: {"type":"response.output_text.done","output_index":0,"content_index":0,"text":"The capital of France is Paris."}

event: response.output_item.done
data: {"type":"response.output_item.done","output_index":0,"item":{...}}

event: response.completed
data: {"type":"response.completed","response":{...}}

data: [DONE]
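The delta events above can be folded back into the full text with a small SSE line parser. A minimal sketch: it only inspects data: lines (event: labels are redundant here, since each payload carries its own type field) and stops at the [DONE] sentinel.

```python
import json

def accumulate_stream(lines):
    """Collect response.output_text.delta events from an SSE stream
    and join them into the complete assistant text."""
    chunks = []
    for line in lines:
        if not line.startswith("data: "):
            continue                      # skip event: labels and blank lines
        data = line[len("data: "):]
        if data == "[DONE]":
            break                         # end-of-stream sentinel
        event = json.loads(data)
        if event.get("type") == "response.output_text.delta":
            chunks.append(event["delta"])
    return "".join(chunks)
```

A real client would read these lines incrementally from the HTTP response body; the parsing logic is the same.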

Opting out of persistence

Set "store": false to process the request without saving it. The response will not be retrievable via previous_response_id and no conversation record will be created.

Authentication note

This endpoint accepts both JWT bearer tokens and API keys. Conversations are linked to the authenticated identity — JWT tokens link to a user account; API keys link to the key itself.