Represents a verbose JSON transcription response returned by the model, based on the provided input.
The duration of the input audio.
The language of the input audio.
Optional
Segments of the transcribed text and their corresponding details.
The transcribed text.
Extracted words and their corresponding timestamps.
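As a rough illustration of how a client might hold this response, the sketch below maps the fields described above onto a Python dataclass. The field names (`duration`, `language`, `text`, `segments`, `words`), the types, the unit of `duration`, and the choice of which fields are optional are all assumptions inferred from the descriptions, not confirmed by this reference. The class name, the parser function, and the sample payload are hypothetical.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class VerboseTranscription:
    """Hypothetical container mirroring the fields described above."""

    duration: float  # duration of the input audio (unit assumed: seconds)
    language: str    # language of the input audio
    text: str        # the transcribed text
    # Assumed optional: segments of the transcribed text and their details.
    segments: Optional[list[dict[str, Any]]] = None
    # Assumed optional: extracted words and their timestamps.
    words: Optional[list[dict[str, Any]]] = None


def parse_verbose_transcription(raw: str) -> VerboseTranscription:
    """Parse a JSON string into the hypothetical container."""
    data = json.loads(raw)
    return VerboseTranscription(
        duration=float(data["duration"]),
        language=data["language"],
        text=data["text"],
        segments=data.get("segments"),  # None when the key is absent
        words=data.get("words"),        # None when the key is absent
    )


# Made-up payload for illustration only; key names inside "words" are assumed.
sample = (
    '{"duration": 2.5, "language": "english", "text": "Hello there.", '
    '"words": [{"word": "Hello", "start": 0.0, "end": 0.4}]}'
)
result = parse_verbose_transcription(sample)
```

Reading the list-valued fields with `dict.get` keeps the parser tolerant of responses that omit them, which matters if they are only returned on request.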