Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I am in need of general feedback from you monks

I am playing with the idea of giving a Tk Text field the ability to receive the transcription from a cloud speech-to-text service. So far I have looked at:

Watson: https://www.ibm.com/watson/services/speech-to-text/
Google: https://cloud.google.com/speech-to-text/
Microsoft: https://azure.microsoft.com/es-es/services/cognitive-services/speech-to-text/

Unfortunately there is no ready-to-use API in Perl, even if all services offer means which should be "easily" implemented in Perl (curl, etc.). Do you think it is a very big/difficult project to have a Perl solution that:

What I am thinking of is to mimic what the Microsoft service does in the demo you can see at the above link. You speak, the browser continuously sends the audio to the cloud, and the transcription appears (very quickly) in the browser text field.

However, I have several doubts that make me think this goal is quite hard to achieve (how do I access the mic audio stream, does it need to be chunked in some way before being sent, etc.). Does anyone have experience in this field and can share some suggestions?

Replies are listed 'Best First'.
Re: API continuous Speech-To-Text -UPDATED
by zentara (Cardinal) on Sep 01, 2018 at 18:27 UTC
    Unfortunately there is no ready-to-use API in Perl

    No lie there. There isn't even a Perl module for the alsalib.

    Just as some brainstorming, on linux anyways, you can easily access the microphone. Assuming you have the PulseAudio pavucontrol settings set correctly, you can get the microphone's audio with

    arecord - | aplay -
    This will pipe whatever is coming in on the microphone, or line in ( must be set properly in alsamixer and pavucontrol ), to the default sound output. So you probably can capture the microphone and pipe it to a streaming application like Gstreamer. You would then need to have gstreamer send it to the server, and somehow get the text back.
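
    As a rough sketch of the capture side in Perl (not a tested recipe): open arecord as a pipe and read its output in fixed-size chunks, handing each chunk to a callback. The sample format, the 3200-byte chunk size and the send_to_service() name are assumptions made for illustration; use whatever your chosen service expects.

```perl
use strict;
use warnings;

# Read a byte stream from an external command in fixed-size chunks and
# hand each chunk to a callback. Returns the number of chunks delivered.
sub stream_chunks {
    my ($cmd, $chunk_bytes, $on_chunk) = @_;
    open(my $fh, '-|', @$cmd) or die "cannot start $cmd->[0]: $!";
    binmode $fh;
    my $count = 0;
    while (1) {
        my $got = read($fh, my $buf, $chunk_bytes);
        die "read failed: $!" unless defined $got;
        last if $got == 0;              # end of stream
        $on_chunk->($buf);
        $count++;
    }
    close $fh;
    return $count;
}

# Assumed capture command: raw 16-bit mono PCM at 16 kHz.
my @MIC_CMD = qw(arecord -q -t raw -f S16_LE -r 16000 -c 1);

# 3200 bytes = 100 ms of audio (16000 samples/s * 2 bytes/sample * 0.1 s).
# send_to_service() is hypothetical; it stands for whatever upload code you write.
# stream_chunks(\@MIC_CMD, 3200, sub { send_to_service($_[0]) });
```

    The call at the bottom is commented out because it would block on the microphone until arecord is stopped.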

    I noticed the services seem to offer a choice between streaming the audio or uploading a file. A file upload would be a lot easier.
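
    To give an idea of the upload route, here is a minimal sketch that builds the JSON body for a one-shot recognition request using only core modules. The field names and the endpoint in the comments follow my reading of Google's v1 REST speech:recognize documentation and should be checked against the current docs; build_recognize_body(), $api_key and $pcm are names made up for this example.

```perl
use strict;
use warnings;
use JSON::PP qw(encode_json decode_json);
use MIME::Base64 qw(encode_base64);

# Wrap raw PCM bytes in a JSON request body: the audio is base64-encoded
# and accompanied by a description of its format.
sub build_recognize_body {
    my ($pcm_bytes, %opt) = @_;
    return encode_json({
        config => {
            encoding        => 'LINEAR16',
            sampleRateHertz => $opt{rate} // 16000,
            languageCode    => $opt{lang} // 'en-US',
        },
        audio => { content => encode_base64($pcm_bytes, '') },
    });
}

# Sending is left commented out: it needs an API key, and HTTP::Tiny
# needs IO::Socket::SSL installed for https URLs.
# use HTTP::Tiny;
# my $res = HTTP::Tiny->new->post(
#     "https://speech.googleapis.com/v1/speech:recognize?key=$api_key",
#     { headers => { 'Content-Type' => 'application/json' },
#       content => build_recognize_body($pcm) },
# );
# print $res->{content} if $res->{success};
```

    This only covers a short recording sent in one piece; it says nothing about the continuous streaming interfaces.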

    Check out this old app I uploaded way back when. ztk-v4l-video-bloger/recorder. It shows basically how to access the alsa settings, turn on/off the microphone, and record. It may not work with your current hardware, but it contains some clues which may get you pointed in the right direction.

    To be honest, you might be best served by using an HTML5 Canvas app, written in javascript. It will handle the microphone, the upload and the text display.

    UPDATE:

    Also, check out this: speech recognition for linux. There is an interesting link there about using Gstreamer for speech recognition; it may just give you the solution.


    I'm not really a human, but I play one on earth. ..... an animated JAPH

      Thank you for your insights. As I thought, it may be something beyond my reach. Using HTML5+javascript is of course a possibility, as it is documented and so on, but it would mean dropping Perl. And my second goal was to apply my legacy Perl scripts "live" to the transcribed text (regex, data visualization, etc.) and do computations on the incoming text. That would mean rewriting everything from scratch in javascript (a language I know only vaguely), which is, of course, not a nice thought.

        It may be within your reach if you dig hard enough. :-) If you notice, python has modules and scripts which will do all the hard work for you. You can easily run python from Perl, then use Perl to do your filtering and display. It might be time to learn a bit of python. I might be tempted to try it myself, but the TensorFlow libraries are huge and complex and I have other fish to fry.
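
        A minimal sketch of that split, assuming some external transcriber that prints one line of text per utterance on its standard output. The script name transcribe.py is hypothetical, and the number-extracting regex is just a stand-in for your existing Perl filters.

```perl
use strict;
use warnings;

# Run an external transcriber and pass each line of its output to a
# Perl callback as soon as the line arrives.
sub each_transcript_line {
    my ($cmd, $on_line) = @_;
    open(my $fh, '-|', @$cmd) or die "cannot start $cmd->[0]: $!";
    while (my $line = <$fh>) {
        chomp $line;
        $on_line->($line);
    }
    close $fh;
    return;
}

# Example filter: collect every number mentioned in the text.
sub extract_numbers { return $_[0] =~ /\b(\d+(?:\.\d+)?)\b/g }

# transcribe.py is a hypothetical script; -u keeps Python's stdout
# unbuffered so lines reach Perl without delay.
# each_transcript_line(['python3', '-u', 'transcribe.py'], sub {
#     print "numbers: @{[ extract_numbers($_[0]) ]}\n";
# });
```

        The same callback could insert each line into a Tk Text widget instead of printing it, but in a Tk program the read would have to be driven by the event loop rather than by a blocking while loop.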

        I'm not really a human, but I play one on earth. ..... an animated JAPH

        No need to rewrite your Perl programs into Javascript.

        One option: you could create a simple web frontend that would feed the text transcript to your Perl programs.

Re: API continuous Speech-To-Text
by marto (Cardinal) on Sep 03, 2018 at 12:08 UTC

    I'd probably start by researching projects like Jasper to see how they work under the hood.

      Are you saying he should use Python instead? I say this because you found my suggestion odd (of using Perl to integrate non-Perl resources as is done in dvd::rip).

      Update: Jasper is written in python, so if not, then for example is there a way to import python modules you can help the OP with.

        marto: I'd probably start by researching projects like Jasper to see how they work under the hood.

        TheloniusMonk: Are you saying he should use Python instead?

        There is no complimentary explanation for how that interpretation of a simple, and rather constructive, suggestion could have been reached.

        It took you one day after starting your account to dig up a dead thread started by sundialsvc4 to imply that he’s a misunderstood genius. If that’s not a hackle-raiser, I don’t know what is.

        You are also already picking fights with important contributors without showing code to demonstrate your points, cutting and pasting broken code instead of recommending modules, and throwing around a term like “ignore list” that most new monks would not know in just two weeks of being cloistered.…

        As stated, I suggested they look at how this project deals with the details, since it's a working example of what they actually want to achieve. The question OP asked was "Do you think it is a very big/difficult project to have a Perl solution that". With access to the working Python code, OP can then use their Perl knowledge to ascertain the difficulty/scale of a Perl-based solution.

        "Update: Jasper is written in python, so if not, then for example is there a way to import python modules you can help the OP with."

        As addressed already in my other reply to you, before this update, OP asked "Do you think it is a very big/difficult project to have a Perl solution that:", followed by some criteria. I suggested the project as a reasonable basis for comparison, allowing OP to make a more informed decision. While things like Inline::Python exist, I wouldn't recommend them because they aren't practical in this case and, more specifically, they don't address the question OP actually asked.

Re: API continuous Speech-To-Text
by TheloniusMonk (Sexton) on Sep 03, 2018 at 11:51 UTC
    If it's for a linux server, there's a very mature open source module dvd::rip which has to use non-Perl modules from various sources, not just ALSA. I mention this only for the technique it uses, not because it uses what you need. But where no existing Perl module can be found, it might be easier to find a module written in C that supplies what you need and follow the examples in dvd::rip for how to integrate them with Perl. (updated)

      How / where would a GUI for copying DVDs help when trying to convert speech audio to text?

        OK I'll try to rephrase

      Seems like an odd suggestion. Which of the problems raised would this approach address, exactly?