Sound interface

kalle.prorok · October 15, 2021, 7:59am

Has anyone tried or found a way to use the microphone and/or loudspeaker in an Anvil app?

I am creating an AI/NLP app for questions and answers. It works OK with a text interface (in Swedish, calling my server, which has a very powerful GPU, to run BERT faster), but I would love to add the option to ask questions and/or get responses by speech (ASR and TTS). That could also be very valuable for people with poor eyesight or while driving a vehicle (a train in my case).

Thank you Anvil for a very good and fun product!

kalle.prorok · October 15, 2021, 12:16pm

At least I found some JavaScript showing how to use it in a browser, but I don’t know how to make use of it…

aldo.ercolani · October 15, 2021, 12:23pm

Hello @kalle.prorok
Anvil allows you to create a custom component that can do that by leveraging JS and/or existing APIs out there, but first I think you should define your use case clearly.

I mean, are you trying to access the microphone and speakers directly, and then handle everything with your own code in your own app?

Or (as it seems to me, since you already have something working with text) are you looking for a speech-to-text API to get the input text, and then a text-to-speech API to speak the answer?

The answer would lead you down completely different search paths.

BR

kalle.prorok · October 15, 2021, 12:44pm

Hi Aldo, thank you. My idea was that the Anvil app + browser manages the microphone and start/stop of recording, and then sends the sound data (a file) to my Linux server, which has a GPU and the software installed to convert the sound data to text. The server then returns the text, which is put into a textbox in the Anvil app for further editing, searching, etc.
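The server side of that flow could be sketched roughly like this. It is only a minimal sketch under assumptions: the recording is assumed to arrive as WAV bytes (browsers often record in other container formats, so a conversion step may be needed first), and `describe_wav`, `handle_recording` and the `transcribe` callback are hypothetical names standing in for whatever speech-to-text software runs on the GPU server.

```python
import io
import wave


def describe_wav(audio_bytes):
    """Return basic properties of a WAV recording, or raise ValueError."""
    try:
        with wave.open(io.BytesIO(audio_bytes), "rb") as wav:
            frames = wav.getnframes()
            rate = wav.getframerate()
            return {
                "channels": wav.getnchannels(),
                "sample_rate": rate,
                # Guard against a zero sample rate in a malformed header.
                "seconds": frames / rate if rate else 0.0,
            }
    except (wave.Error, EOFError) as exc:
        raise ValueError("not a readable WAV recording") from exc


def handle_recording(audio_bytes, transcribe):
    """Validate the upload, then hand it to the speech-to-text backend.

    `transcribe` is a hypothetical callable (audio_bytes, sample_rate) -> str
    that wraps the real ASR model on the server.
    """
    info = describe_wav(audio_bytes)
    if info["seconds"] == 0:
        return ""  # nothing was recorded, so there is nothing to transcribe
    return transcribe(audio_bytes, info["sample_rate"])
```

In an Anvil setup, a function like `handle_recording` would presumably sit behind a server-callable function that receives the recording as a Media object and reads its bytes; the returned string is what ends up in the textbox.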

I plan to use

It’s nice because there seems to be no other option for Swedish except a Google API that charges per second.

As a bonus, it would be nice if the browser could read the search answers aloud. Maybe there is built-in browser functionality for this, like the Windows “screen reader”?

BR Kalle

aldo.ercolani · October 15, 2021, 1:00pm

Hi @kalle.prorok
Have you tried searching the forums for the keyword “recording”?
The first result is this post, where audio is recorded and then manipulated.

As for text-to-speech, the closest thing to the Windows screen reader is the (still experimental) standard Web Speech API.
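For the text-to-speech part, a minimal sketch of driving the Web Speech API from Python client code might look like the following. The `speechSynthesis` object and the `SpeechSynthesisUtterance` constructor are part of the Web Speech API; the `speak` helper is a hypothetical name, and passing in the browser's `window` object (for example from `anvil.js`) and calling the JavaScript constructor like a Python class are assumptions about how the JS bridge behaves.

```python
def speak(window, text, lang="sv-SE"):
    """Read `text` aloud via the browser's speech synthesis, if available.

    `window` is assumed to be the browser's window object as seen from
    Python. Returns True if speech was queued, False otherwise.
    """
    if not text:
        return False  # nothing to say
    synth = getattr(window, "speechSynthesis", None)
    if synth is None:
        return False  # this browser does not expose speech synthesis
    # Assumption: the JS constructor can be called like a Python class.
    utterance = window.SpeechSynthesisUtterance(text)
    utterance.lang = lang  # BCP 47 language tag, e.g. Swedish
    synth.cancel()  # drop anything still queued from an earlier answer
    synth.speak(utterance)
    return True
```

Whether a Swedish voice is actually available depends on the browser and operating system, so the result may differ from one device to another.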

kalle.prorok · October 15, 2021, 1:15pm

Ooh, thanks a lot for your findings, I will try them out. I did search for microphone, sound, and loudspeaker, but not recording ;). Have a nice weekend!