This event is the output of audio transcription for user audio written to the user audio buffer. Transcription begins when the input audio buffer is committed by the client or server (in server_vad mode). Transcription runs asynchronously with Response creation, so this event may come before or after the Response events.

Realtime API models accept audio natively, so input transcription is a separate process run on a separate ASR (Automatic Speech Recognition) model, currently always whisper-1. As a result, the transcript may diverge somewhat from the model's own interpretation of the audio, and should be treated as a rough guide.

interface ConversationItemInputAudioTranscriptionCompletedEvent {
    content_index: number;
    event_id: string;
    item_id: string;
    logprobs?: null | OpenAIClient.Beta.Realtime.ConversationItemInputAudioTranscriptionCompletedEvent.Logprob[];
    transcript: string;
    type: "conversation.item.input_audio_transcription.completed";
}
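The shape above can be handled with a simple type guard on the `type` field. The sketch below assumes events arrive as JSON strings (for example, over a WebSocket); the helper names `isTranscriptionCompleted` and `extractTranscript` are illustrative, not part of the API, and the interface is a local re-declaration rather than the client library's own type.

```typescript
// Local re-declaration of the event shape for a self-contained example.
interface TranscriptionCompletedEvent {
  type: "conversation.item.input_audio_transcription.completed";
  event_id: string;
  item_id: string;
  content_index: number;
  transcript: string;
  logprobs?: { token: string; logprob: number }[] | null;
}

// Narrow a raw parsed server event to this event type via its `type` field.
function isTranscriptionCompleted(
  ev: { type: string }
): ev is TranscriptionCompletedEvent {
  return ev.type === "conversation.item.input_audio_transcription.completed";
}

// Parse a raw JSON event and, if it is a transcription-completed event,
// return the transcript together with the item it belongs to.
function extractTranscript(
  raw: string
): { itemId: string; text: string } | null {
  const ev = JSON.parse(raw) as { type: string };
  if (!isTranscriptionCompleted(ev)) return null;
  return { itemId: ev.item_id, text: ev.transcript };
}
```

Because transcription runs asynchronously with Response creation, a handler like this should correlate transcripts to conversation items by `item_id` rather than by arrival order.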

Properties

content_index: number

The index of the content part containing the audio.

event_id: string

The unique ID of the server event.

item_id: string

The ID of the user message item containing the audio.

logprobs?: null | OpenAIClient.Beta.Realtime.ConversationItemInputAudioTranscriptionCompletedEvent.Logprob[]

The log probabilities of the transcription.
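When present, the per-token log probabilities can be folded into a rough confidence score for the transcript. The sketch below assumes each `Logprob` entry exposes a numeric `logprob` field (an assumption for illustration); it takes the geometric mean of the token probabilities.

```typescript
// Hedged sketch: turn per-token log probabilities into a rough
// confidence in [0, 1]. The { token, logprob } shape is assumed.
function averageConfidence(
  logprobs: { token: string; logprob: number }[]
): number {
  if (logprobs.length === 0) return 0;
  const meanLogprob =
    logprobs.reduce((sum, lp) => sum + lp.logprob, 0) / logprobs.length;
  // exp of the mean log probability = geometric mean of probabilities.
  return Math.exp(meanLogprob);
}
```

A score near 1 indicates the ASR model was confident in every token; given the caveat above that the transcript is only a rough guide, low scores are a reasonable signal to fall back on other context.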

transcript: string

The transcribed text.

type: "conversation.item.input_audio_transcription.completed"

The event type, must be conversation.item.input_audio_transcription.completed.