LocalAI version:
localai: stable 2.28.0 (bottled), HEAD (via brew)
Environment, CPU architecture, OS, and Version:
Darwin MacBook-Pro.local 23.3.0 Darwin Kernel Version 23.3.0: Wed Dec 20 21:30:59 PST 2023; root:xnu-10002.81.5~7/RELEASE_ARM64_T6030 arm64
Describe the bug
The whisper API endpoint (`/v1/audio/transcriptions`) isn't compatible with OpenAI's implementation.
While support for `response_format` (and other request parameters) would be amazing, I think at least the default response should be compatible with OpenAI's `verbose_json`.
To Reproduce
Spin up LocalAI, install the whisper-1 model, send any audio file to `/v1/audio/transcriptions`, and check the response.
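For a concrete reproduction, here is a minimal sketch using the official openai Python client pointed at LocalAI; the base URL/port, API key, and audio file name are placeholders for my local setup, not anything LocalAI-specific:

```python
from openai import OpenAI

# Point the official OpenAI client at the local LocalAI instance.
# base_url/port, api_key, and the file name are placeholders.
client = OpenAI(base_url="http://localhost:8080/v1", api_key="sk-local")

with open("sample.wav", "rb") as audio_file:
    # No response_format is passed, so this exercises the default response
    # that the issue is about.
    transcript = client.audio.transcriptions.create(
        model="whisper-1",
        file=audio_file,
    )

print(transcript)
```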
Expected behavior

Get a response compatible with https://platform.openai.com/docs/api-reference/audio/verbose-json-object

Additional context
I know that OpenAI defaults to `json` for `response_format`, but since the parameter isn't supported and the current answer is already close to the `verbose_json` format, changing the default would let one use the official libraries or use LocalAI as a drop-in replacement.
The main differences I spotted:

- `task`, `language`, and `duration` are missing from the top level
- `seek`, `temperature`, `avg_logprob`, `compression_ratio`, and `no_speech_prob` are missing from each segment
- the `start` and `stop` values seem to be (rounded?) nanoseconds instead of seconds (5000000000 instead of 5.119999999999998 for around 5 seconds)
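For reference, this is roughly the shape OpenAI documents for a `verbose_json` response; the field names come from the linked API reference, while the values here are purely illustrative, not from a real run:

```json
{
  "task": "transcribe",
  "language": "english",
  "duration": 5.12,
  "text": "Hello world.",
  "segments": [
    {
      "id": 0,
      "seek": 0,
      "start": 0.0,
      "end": 5.12,
      "text": " Hello world.",
      "tokens": [50364, 2425, 1002, 13, 50620],
      "temperature": 0.0,
      "avg_logprob": -0.28,
      "compression_ratio": 0.85,
      "no_speech_prob": 0.01
    }
  ]
}
```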