Passing audio files with get_bytes() to WhisperAPI

zielinski.mark · May 31, 2023, 12:26am

Hey there,

I have created some Media Objects from the client code using some code from this thread. This audio is captured from the user’s device’s microphone and stored as a media object.

However, when I try to pass the media object to Whisper’s Transcription API, I can’t provide it in an acceptable format. I first tried passing the Media Object directly, but the API seems to expect bytes. So I used get_bytes() to convert the media object into bytes (and printed the result to make sure that step works).
When I pass the audio file’s bytes to the API, though, I get:

AttributeError: 'bytes' object has no attribute 'name'
at /home/anvil/.env/lib/python3.10/site-packages/openai/api_resources/audio.py:57

Any ideas how I can get the data over there? I even tried passing it as a URL, but the URL generated by the Media Object is not accessible outside of the app.

jshaffstall · May 31, 2023, 2:48am

The transcription API wants a file object, not bytes. Use the media object’s ability to write a temporary file and pass that file to the transcription API. See Anvil Docs | Files on Disk.

zielinski.mark · May 31, 2023, 3:22am

Like this?

@anvil.server.callable
def transcribe(audio_data):
    openai.api_key = "mykey"
    with TempFile(audio_data) as temp_audio:
        transcript = openai.Audio.transcribe("whisper-1", temp_audio)

I tried this initially, but I get the following error:

TypeError: a bytes-like object is required, not 'StreamingMedia'
at /home/anvil/.env/lib/python3.10/site-packages/urllib3/filepost.py:90

jshaffstall · May 31, 2023, 3:28am

As far as I can tell from the docs, TempFile gives you a file name in temp_audio, not an open file. You’d still need to open that file (in binary mode, since it is audio data) and pass the resulting file object to the transcription API.
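
To make that concrete, here is a minimal sketch of the "get a file name, then open it" pattern. It uses only the standard library so it can run anywhere. Two things in it are assumptions, not Anvil or OpenAI code: `temp_file_for` is a hypothetical stand-in for Anvil's TempFile (assumed, per the docs, to yield a file name), and `transcribe_fn` is a placeholder for the real call, which in the code above would be `openai.Audio.transcribe("whisper-1", audio_file)`.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def temp_file_for(data: bytes, suffix: str = ".wav"):
    """Hypothetical stand-in for Anvil's TempFile.

    Writes the bytes to a temporary file and yields the file NAME
    (a string), not an open file object. The file is deleted on exit.
    """
    fd, path = tempfile.mkstemp(suffix=suffix)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
        yield path
    finally:
        os.remove(path)


def transcribe(audio_bytes: bytes, transcribe_fn):
    """Pass an open, binary-mode file object to transcribe_fn.

    transcribe_fn is a placeholder for the real API call; it receives
    a file object, which (unlike raw bytes) has a .name attribute.
    """
    with temp_file_for(audio_bytes) as file_name:
        # file_name is only a path, so open it before handing it over.
        with open(file_name, "rb") as audio_file:
            return transcribe_fn(audio_file)
```

In the server function from the thread, the same shape would presumably be `with TempFile(audio_data) as file_name:` followed by `with open(file_name, "rb") as audio_file:`, with the transcribe call inside the inner block and its result returned to the client. Opening the file yourself also explains the earlier error: a file object has a `.name` attribute, which raw bytes do not.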