Converting audio file to text

soadadfarhan · October 21, 2020, 9:51am

Hi everyone on the forum,
I hope you are all well and safe!

I need help converting an audio file to text. I ran the code below, but I get this error message:

AssertionError: audio_data must be audio data

def file_loader_1_change(self, file, **event_args):
    anvil.server.call('Audio2Text', file)

@anvil.server.callable
def Audio2Text(AudioFile):
    r = sr.Recognizer()
    try:
        response = r.recognize_google(AudioFile)
        print("You said '" + response + "'")
    except sr.UnknownValueError:
        print("Could not understand audio")
    except sr.RequestError as e:
        print("Error; {0}".format(e))

Many thanks in advance

Soadad

stucork · October 21, 2020, 12:27pm

Have a look at the library reference for speech_recognition:

speech_recognition/reference/library-reference.rst at master · Uberi/speech_recognition · GitHub

recognize_google needs an sr.AudioData instance rather than an Anvil Media object. The AudioData class is documented in that same reference.

I’m not sure exactly how you’d go about turning the Anvil Media object into sr.AudioData, but you can get its bytes by calling media_object.get_bytes().
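
If the upload happens to be an uncompressed, mono PCM WAV file, one way to go from those bytes to sr.AudioData might look like the sketch below. The helper names wav_bytes_to_pcm and wav_bytes_to_audio_data are made up for illustration, and the mono WAV input is an assumption; other formats would need converting first.

```python
import io
import wave


def wav_bytes_to_pcm(wav_bytes):
    """Split WAV bytes into (frame_data, sample_rate, sample_width)."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        frame_data = wav.readframes(wav.getnframes())
        return frame_data, wav.getframerate(), wav.getsampwidth()


def wav_bytes_to_audio_data(wav_bytes):
    """Wrap PCM WAV bytes in sr.AudioData (assumes a mono recording)."""
    # Imported lazily so the WAV parsing above works without the library.
    import speech_recognition as sr  # third-party: SpeechRecognition

    frame_data, sample_rate, sample_width = wav_bytes_to_pcm(wav_bytes)
    return sr.AudioData(frame_data, sample_rate, sample_width)
```

On the server this would be used roughly as r.recognize_google(wav_bytes_to_audio_data(media_object.get_bytes())).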

You might also be able to turn it into an sr.AudioFile, which is described in the same library reference.

I don’t think

audio_file = sr.AudioFile(media_object)

will work, but using TempFile might:

with anvil.media.TempFile(media_object) as file_name:
  audio_file = sr.AudioFile(file_name)

soadadfarhan · October 22, 2020, 9:02am

Thanks, stucork. Unfortunately, that didn’t work. Can you think of other ideas?

stucork · October 22, 2020, 9:05am

Did you get any useful logging information?

At this stage I’d suggest building a minimal example clone so that someone can dig around in your code.

soadadfarhan · October 22, 2020, 9:16am

I get “AssertionError: audio_data must be audio data”

Code:

import speech_recognition as sr

@anvil.server.callable
def Audio2Text(AudioFile):
    with anvil.media.TempFile(AudioFile) as file_name:
        audio_file = sr.AudioFile(file_name)
        r = sr.Recognizer()
        response = r.recognize_google(audio_file)
        print("You said '" + response + "'")

def file_loader_1_change(self, file, **event_args):
    anvil.server.call('Audio2Text', file)

brooke · October 23, 2020, 12:28pm

Hi @soadadfarhan,

The second example on the speech_recognition home page looks like it would do what you want. In particular, it seems that you need to ‘record’ the contents of the audio file using the Recognizer, which turns the sr.AudioFile source into the sr.AudioData that recognize_google expects:

    r = sr.Recognizer()
    with anvil.media.TempFile(AudioFile) as file_name:
      with sr.AudioFile(file_name) as source:
        audio = r.record(source)
    response = r.recognize_google(audio)

Hopefully that should do roughly what you want! If you still see errors, then @stucork is right - please post a clone link to a minimal example here.
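
For reference, the snippet above could be wrapped into a function that returns the text instead of printing it, roughly as sketched below. The name transcribe_audio_file and its sr_module parameter are hypothetical; the parameter is only there so the function can be tried out with a stand-in module when the real library or a network connection is not available.

```python
def transcribe_audio_file(path, sr_module=None):
    """Return the recognized text for the audio file at `path`, or None.

    `sr_module` is a hypothetical hook for passing in a stand-in module;
    by default the real speech_recognition library is imported.
    """
    if sr_module is None:
        import speech_recognition as sr_module  # third-party: SpeechRecognition

    recognizer = sr_module.Recognizer()

    # AudioFile is only a source; record() reads it into AudioData.
    with sr_module.AudioFile(path) as source:
        audio = recognizer.record(source)

    try:
        return recognizer.recognize_google(audio)
    except sr_module.UnknownValueError:
        # The speech could not be understood.
        return None
    # sr_module.RequestError (service unreachable) is left to propagate.
```

In Anvil, this would be called inside the with anvil.media.TempFile(AudioFile) as file_name: block, and the @anvil.server.callable function would return the result so the form can display it.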

soadadfarhan · October 23, 2020, 12:42pm

That worked, Brooke. Thanks a million