Represents a verbose JSON transcription response returned by the model, based on the provided input.

interface TranscriptionVerbose {
    duration: string;
    language: string;
    segments?: TranscriptionSegment[];
    text: string;
    words?: TranscriptionWord[];
}

Properties

duration: string

The duration of the input audio.

language: string

The language of the input audio.

segments?: TranscriptionSegment[]

Segments of the transcribed text and their corresponding details.

text: string

The transcribed text.

words?: TranscriptionWord[]

Extracted words and their corresponding timestamps.
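
Example

Because segments and words are optional, code that consumes a TranscriptionVerbose value should handle their absence. The following is a minimal sketch, not part of the library: the summarize helper is hypothetical, the single fields shown on TranscriptionSegment and TranscriptionWord are assumptions (this page does not list their members), and treating duration as a number of seconds encoded in a string is also an assumption.

```typescript
// Local copies of the types so the sketch is self-contained.
// The members of TranscriptionSegment and TranscriptionWord are assumed;
// this page does not document them.
interface TranscriptionSegment {
    text: string; // assumed field
}

interface TranscriptionWord {
    word: string; // assumed field
}

interface TranscriptionVerbose {
    duration: string;
    language: string;
    segments?: TranscriptionSegment[];
    text: string;
    words?: TranscriptionWord[];
}

// Hypothetical helper: builds a one-line summary of a verbose transcription.
function summarize(t: TranscriptionVerbose): string {
    // Optional arrays may be undefined, so fall back to a count of 0.
    const segmentCount = t.segments?.length ?? 0;
    const wordCount = t.words?.length ?? 0;

    // Assumption: duration holds a number of seconds as a string.
    // If it does not parse as a number, the raw string is shown instead.
    const seconds = Number(t.duration);
    const durationLabel = Number.isFinite(seconds)
        ? `${seconds.toFixed(1)}s`
        : t.duration;

    return `[${t.language}] ${durationLabel}, ${segmentCount} segments, ${wordCount} words: ${t.text}`;
}
```

A caller might pass the response object straight to summarize(response) for logging. Falling back to the raw duration string keeps the helper usable even if the assumption about the string's format does not hold.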