The response resource.

interface RealtimeResponse {
    conversation_id?: string;
    id?: string;
    max_output_tokens?: number | "inf";
    metadata?: null | Metadata;
    modalities?: ("text" | "audio")[];
    object?: "realtime.response";
    output?: ConversationItem[];
    output_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw";
    status?:
        | "completed"
        | "failed"
        | "incomplete"
        | "cancelled";
    status_details?: RealtimeResponseStatus;
    temperature?: number;
    usage?: RealtimeResponseUsage;
    voice?:
        | string & {}
        | "alloy"
        | "ash"
        | "ballad"
        | "coral"
        | "echo"
        | "fable"
        | "onyx"
        | "nova"
        | "sage"
        | "shimmer"
        | "verse";
}

Properties

conversation_id?: string

The conversation the response was added to, as determined by the conversation field of the response.create event. If that field is auto, the response is added to the default conversation and conversation_id is an ID such as conv_1234. If it is none, the response is not added to any conversation and conversation_id is null. Responses triggered by server VAD are added to the default conversation, so conversation_id is likewise an ID such as conv_1234.
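The rule above can be sketched as a small check on the client side. This is a minimal illustration, not part of the SDK: ResponseLike and isInConversation are hypothetical names, and null is added to the field type only because the description says the value is null for out-of-band responses.

```typescript
// Hypothetical local shape containing only the field discussed above.
// `null` is included because the description says conversation_id is null
// when the response was created with conversation set to none.
interface ResponseLike {
  conversation_id?: string | null;
}

// Returns true when the response was added to a conversation,
// i.e. conversation_id holds an ID such as "conv_1234".
function isInConversation(response: ResponseLike): boolean {
  return (
    typeof response.conversation_id === "string" &&
    response.conversation_id.length > 0
  );
}
```

A missing field is treated the same as null here, since every property on the resource is optional.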

id?: string

The unique ID of the response.

max_output_tokens?: number | "inf"

The maximum number of output tokens allowed for a single assistant response, inclusive of tool calls, that was in effect for this response.
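Because the field is a union of a number and the literal "inf", callers usually need to narrow it before doing arithmetic. The sketch below shows one way to do that; tokenLimit is a hypothetical helper, and mapping "inf" to JavaScript's Infinity is an assumption made for illustration.

```typescript
type MaxOutputTokens = number | "inf";

// Converts the union into a plain number so it can be compared or subtracted.
// Returns undefined when the field was not present on the response.
function tokenLimit(max: MaxOutputTokens | undefined): number | undefined {
  if (max === undefined) {
    return undefined;
  }
  // Assumption: "inf" is represented locally as positive infinity.
  return max === "inf" ? Number.POSITIVE_INFINITY : max;
}
```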

metadata?: null | Metadata

A set of up to 16 key-value pairs that can be attached to an object. This is useful for storing additional information about the object in a structured format and for querying objects via the API or the dashboard.

Keys are strings with a maximum length of 64 characters. Values are strings with a maximum length of 512 characters.
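The limits above are easy to check before sending metadata. The following is a rough sketch under stated assumptions: Metadata is a local stand-in type, metadataProblems is a hypothetical helper, 16 is treated as a maximum number of pairs, and string length is measured in UTF-16 code units, which may differ from how the server counts characters.

```typescript
// Local stand-in for the Metadata type: string keys mapped to string values.
type Metadata = Record<string, string>;

const MAX_PAIRS = 16;
const MAX_KEY_LENGTH = 64;
const MAX_VALUE_LENGTH = 512;

// Returns a list of human-readable problems; an empty list means the
// metadata satisfies the documented limits.
function metadataProblems(metadata: Metadata): string[] {
  const problems: string[] = [];
  const keys = Object.keys(metadata);
  if (keys.length > MAX_PAIRS) {
    problems.push(`too many pairs: ${keys.length} > ${MAX_PAIRS}`);
  }
  for (const key of keys) {
    if (key.length > MAX_KEY_LENGTH) {
      problems.push(`key too long: ${key.slice(0, 16)}...`);
    }
    if (metadata[key].length > MAX_VALUE_LENGTH) {
      problems.push(`value too long for key: ${key.slice(0, 16)}`);
    }
  }
  return problems;
}
```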

modalities?: ("text" | "audio")[]

The set of modalities the model used to respond. If multiple modalities are listed, the model picks one; for example, if modalities is ["text", "audio"], the model could respond in either text or audio.

object?: "realtime.response"

The object type. Always realtime.response.

output?: ConversationItem[]

The list of output items generated by the response.

output_audio_format?: "pcm16" | "g711_ulaw" | "g711_alaw"

The format of output audio. Options are pcm16, g711_ulaw, or g711_alaw.

status?:
    | "completed"
    | "failed"
    | "incomplete"
    | "cancelled"

The final status of the response (completed, cancelled, failed, or incomplete).
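Since status is a closed union of four string literals, a switch with a never check lets the compiler flag any value that is not handled. This is a minimal sketch; describeStatus and its messages are hypothetical and not part of the SDK.

```typescript
type ResponseStatus = "completed" | "failed" | "incomplete" | "cancelled";

// Maps each final status to a short log message. The default branch assigns
// to `never`, so adding a new status to the union becomes a compile error
// until a matching case is written.
function describeStatus(status: ResponseStatus): string {
  switch (status) {
    case "completed":
      return "response finished normally";
    case "failed":
      return "response failed; see status_details";
    case "incomplete":
      return "response stopped early; see status_details";
    case "cancelled":
      return "response was cancelled";
    default: {
      const unreachable: never = status;
      return unreachable;
    }
  }
}
```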

status_details?: RealtimeResponseStatus

Additional details about the status.

temperature?: number

Sampling temperature for the model, limited to [0.6, 1.2]. Defaults to 0.8.
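A client may want to normalize a requested temperature to the documented range before sending it. The sketch below does this by clamping; clampTemperature is a hypothetical helper, and clamping (rather than rejecting out-of-range values) is a design choice made for this example, not documented server behavior.

```typescript
const MIN_TEMPERATURE = 0.6;
const MAX_TEMPERATURE = 1.2;
const DEFAULT_TEMPERATURE = 0.8;

// Returns the documented default when no usable value is given, and
// otherwise forces the value into the range [0.6, 1.2].
function clampTemperature(temperature?: number): number {
  if (temperature === undefined || isNaN(temperature)) {
    return DEFAULT_TEMPERATURE;
  }
  return Math.min(MAX_TEMPERATURE, Math.max(MIN_TEMPERATURE, temperature));
}
```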

usage?: RealtimeResponseUsage

Usage statistics for the response, which correspond to billing. A Realtime API session maintains a conversation context and appends new items to the conversation, so output from previous turns (text and audio tokens) becomes input for later turns.
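To estimate what a session has cost so far, the per-response usage can be summed across turns. The sketch below is illustrative only: UsageLike is a hypothetical stand-in for RealtimeResponseUsage, and the field names input_tokens, output_tokens, and total_tokens are assumptions that should be checked against the actual type.

```typescript
// Hypothetical stand-in for RealtimeResponseUsage; field names are assumed.
interface UsageLike {
  input_tokens?: number;
  output_tokens?: number;
  total_tokens?: number;
}

// Adds up usage over several responses. Missing fields count as zero.
// Input tokens of later turns already include earlier output, so the
// summed input grows faster than the summed output.
function sumUsage(turns: UsageLike[]): Required<UsageLike> {
  const total = { input_tokens: 0, output_tokens: 0, total_tokens: 0 };
  for (const usage of turns) {
    total.input_tokens += usage.input_tokens ?? 0;
    total.output_tokens += usage.output_tokens ?? 0;
    total.total_tokens += usage.total_tokens ?? 0;
  }
  return total;
}
```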

voice?:
    | string & {}
    | "alloy"
    | "ash"
    | "ballad"
    | "coral"
    | "echo"
    | "fable"
    | "onyx"
    | "nova"
    | "sage"
    | "shimmer"
    | "verse"

The voice the model used to respond. Current voice options are alloy, ash, ballad, coral, echo, fable, onyx, nova, sage, shimmer, and verse.
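The string & {} member in the type means any string is accepted while editors still suggest the listed names. If code needs to know whether a voice is one of the listed options, a type guard along these lines could be used; KNOWN_VOICES and isKnownVoice are hypothetical names, and the list simply mirrors the options above.

```typescript
// The voice names listed above, kept as a readonly tuple of literals.
const KNOWN_VOICES = [
  "alloy", "ash", "ballad", "coral", "echo", "fable",
  "onyx", "nova", "sage", "shimmer", "verse",
] as const;

type KnownVoice = (typeof KNOWN_VOICES)[number];

// Same shape as the property type: any string, with the known names
// preserved for autocompletion.
type Voice = (string & {}) | KnownVoice;

// Narrows an arbitrary voice string to one of the listed names.
function isKnownVoice(voice: Voice): voice is KnownVoice {
  return (KNOWN_VOICES as readonly string[]).indexOf(voice) !== -1;
}
```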