AudioResponseFormat:
    | "json"
    | "text"
    | "srt"
    | "verbose_json"
    | "vtt"

The format of the output. Must be one of json, text, srt, verbose_json, or vtt. For gpt-4o-transcribe and gpt-4o-mini-transcribe, the only supported format is json.
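
The model restriction described above can be checked before a request is sent. The following is a minimal sketch, not part of the API: the helper names `supportedFormats` and `isFormatSupported` are hypothetical, and it assumes that any model other than the two named above accepts all five formats.

```typescript
// The five response formats from the AudioResponseFormat union.
type AudioResponseFormat = "json" | "text" | "srt" | "verbose_json" | "vtt";

const ALL_FORMATS: readonly AudioResponseFormat[] = [
  "json",
  "text",
  "srt",
  "verbose_json",
  "vtt",
];

// Models documented as supporting only "json".
const JSON_ONLY_MODELS: ReadonlySet<string> = new Set([
  "gpt-4o-transcribe",
  "gpt-4o-mini-transcribe",
]);

// Hypothetical helper: lists the formats a model accepts.
// Assumption: models not in JSON_ONLY_MODELS accept every format.
function supportedFormats(model: string): readonly AudioResponseFormat[] {
  return JSON_ONLY_MODELS.has(model) ? ["json"] : ALL_FORMATS;
}

// Hypothetical helper: true when the model accepts the requested format.
function isFormatSupported(model: string, format: AudioResponseFormat): boolean {
  return supportedFormats(model).indexOf(format) !== -1;
}
```

A caller could use `isFormatSupported(model, format)` to fall back to json, or to reject the request early, when a subtitle format such as srt or vtt is requested from one of the json-only models.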