Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

This is a simplified speech recognition problem. The WAV files consist of recordings of English letters (A-Z), and each letter is always pronounced almost identically. Is there a Perl way to convert these recordings to letters? I'm thinking one could probably analyze the spectral pattern; there are only 26 of them, after all. But I have no idea how to get such information from a WAV file. Thanks.
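The spectrum idea can be made concrete by reducing a recording to a handful of numbers: the share of its energy that falls into each of several equal-width frequency bands. Below is a minimal sketch in Python (standard library only). It assumes the samples are already decoded to floats and uses a naive O(N^2) discrete Fourier transform for clarity. Real recordings would call for an FFT, which PDL provides on the Perl side. The function names and the default of 8 bands are hypothetical choices.

```python
import cmath
import math


def dft_magnitudes(samples):
    """Magnitude of each DFT bin from 0 up to the Nyquist bin (naive O(N^2))."""
    n = len(samples)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(s * cmath.exp(-2j * math.pi * k * t / n)
                  for t, s in enumerate(samples))
        mags.append(abs(acc))
    return mags


def band_energies(samples, n_bands=8):
    """Split the magnitude spectrum into n_bands equal-width bands and
    return the energy in each band, normalized so the values sum to 1."""
    mags = dft_magnitudes(samples)
    width = len(mags) / n_bands
    energies = [0.0] * n_bands
    for k, m in enumerate(mags):
        band = min(int(k / width), n_bands - 1)
        energies[band] += m * m
    total = sum(energies) or 1.0  # avoid dividing by zero on pure silence
    return [e / total for e in energies]


if __name__ == "__main__":
    rate = 8000
    # 1 kHz test tone, 256 samples at 8 kHz
    tone = [math.sin(2 * math.pi * 1000 * t / rate) for t in range(256)]
    print([round(e, 3) for e in band_energies(tone)])
```

For a spoken letter you would probably compute this per short frame rather than over the whole recording, because the spectrum changes over the course of the sound.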

Re: Convert wav file to letters
by Zaxo (Archbishop) on Jul 22, 2005 at 22:39 UTC

    Anon's advice on correlation and convolution is good. Check out PDL for fast mathematics; it supports that very well, particularly Fourier transforms. It even has some built-in support for audio.

    After Compline,
    Zaxo
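The correlation idea can be illustrated with a minimal sketch that slides one sequence past another and reports the best normalized match. It is written in plain Python (standard library only) for readability. In Perl, the same loops would typically be vectorized with PDL. The function name is hypothetical. Whether the inputs are raw samples or per-frame feature values is left open, because the thread does not say.

```python
import math


def normalized_xcorr_peak(a, b):
    """Peak normalized cross-correlation of a and b over all overlapping lags.

    Both sequences are mean-centred first. The result lies in [-1, 1];
    values near 1 mean b closely matches a (possibly shifted) in shape.
    """
    def centred(x):
        mean = sum(x) / len(x)
        return [v - mean for v in x]

    a, b = centred(a), centred(b)
    norm = math.sqrt(sum(v * v for v in a) * sum(v * v for v in b))
    if norm == 0:
        return 0.0  # a constant sequence carries no shape to compare
    best = -1.0
    # every relative shift at which the two sequences overlap
    for lag in range(-(len(b) - 1), len(a)):
        s = 0.0
        for i, bv in enumerate(b):
            j = i + lag
            if 0 <= j < len(a):
                s += a[j] * bv
        best = max(best, s / norm)
    return best
```

An unknown recording would be scored against each letter's template, and the highest peak would win. The nested loop is O(N*M), which is one reason the FFT-based route through PDL matters for real audio.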

Re: Convert wav file to letters
by BrowserUk (Patriarch) on Jul 22, 2005 at 22:59 UTC

    That's what I thought. Be warned, voice recognition is hard and to be honest, Perl wouldn't be my first choice of language for doing it in.

    Some of the questions you need to resolve are:

  • Are you trying to match one voice (that matches your training set) or many voices (with or without matching training sets)?
  • Individually spoken characters are not going to match occurrences of those letters in continuous speech.

    Think about the different sounds that the letter 'c' has in "concession", or the 't's in 'traction'. Every letter in the alphabet has multiple sounds depending upon the word it is in, where in the word it is, and the accent of the speaker (US ba-th -v- UK bar-th; US too-na -v- UK (ch)tu-na, etc.).

  • Most VRS (voice recognition systems) work with syllables or phonemes, because many words contain similar sounds spelt differently, e.g. the 'shun' sound at the end of 'comprehension', 'composition', etc.

    Unless you're in for the long haul of deep research, you should probably look at existing solutions and libraries rather than trying to start from scratch.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
    "Science is about questioning the status quo. Questioning authority".
    The "good enough" may be good enough for the now, and perfection may be unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.
      Thanks. My problem should be much easier. The "speech" is not words, just letters, i.e. someone is spelling out the words. There is also only one voice, and we can even assume the speaker is very consistent.
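If a recording spells out a whole word, a first step could be cutting it into one chunk per letter. The minimal sketch below assumes three things: the speaker leaves a short pause between letters, the samples are already decoded to floats in [-1, 1], and short-time energy is enough to tell sound from pause. The frame length (160 samples, i.e. 20 ms at 8 kHz), the energy threshold, the minimum gap and the function name are hypothetical values that would need tuning against the real recordings.

```python
def split_on_silence(samples, frame_len=160, threshold=0.01, min_gap_frames=3):
    """Return (start, end) sample ranges of loud regions, end exclusive.

    A frame is "loud" when its mean squared sample value exceeds threshold.
    Loud regions separated by fewer than min_gap_frames quiet frames are
    merged, so brief dips inside one letter do not split it in two.
    """
    n_frames = len(samples) // frame_len
    loud = []
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        energy = sum(s * s for s in frame) / frame_len
        loud.append(energy > threshold)

    segments = []
    start = None      # first loud frame of the current segment
    quiet_run = 0     # quiet frames seen since the last loud frame
    for f, is_loud in enumerate(loud):
        if is_loud:
            if start is None:
                start = f
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_gap_frames:
                end = f - quiet_run + 1  # one past the last loud frame
                segments.append((start * frame_len, end * frame_len))
                start = None
                quiet_run = 0
    if start is not None:  # recording ended while still inside a segment
        end = n_frames - quiet_run
        segments.append((start * frame_len, end * frame_len))
    return segments
```

Each returned range could then be fed, on its own, to whatever per-letter comparison is used.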
Re: Convert wav file to letters
by traveler (Parson) on Jul 22, 2005 at 22:33 UTC
Re: Convert wav file to letters
by BrowserUk (Patriarch) on Jul 22, 2005 at 22:20 UTC

    For you to associate a letter with each sound file, you must already know (or be able to hear) what that file contains, so there is no need to analyse them programmatically.

    That probably means that your problem description is not a good description of the real problem you are trying to solve?


      Well, I have a bunch of WAV files as a "training set", and I can play them manually to find out which letters they contain. I'd like to apply this training set to a large number of new WAV files, and for those I'd like an automated way of finding out the letters.
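With a labelled training set like this, the simplest automated approach is probably template matching. Each training file becomes a fixed-length feature vector, the vectors are averaged per letter, and a new file gets the letter whose average is nearest. The minimal sketch below assumes some feature extraction (band energies, for example) has already produced equal-length vectors. `train`, `classify` and the use of Euclidean distance are illustrative choices, not something proposed in the thread.

```python
import math


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train(labelled):
    """Average the feature vectors for each letter into one template.

    labelled is a list of (letter, feature_vector) pairs taken from the
    hand-labelled training recordings.
    """
    sums, counts = {}, {}
    for letter, vec in labelled:
        if letter not in sums:
            sums[letter] = [0.0] * len(vec)
            counts[letter] = 0
        sums[letter] = [s + v for s, v in zip(sums[letter], vec)]
        counts[letter] += 1
    return {k: [s / counts[k] for s in v] for k, v in sums.items()}


def classify(templates, vec):
    """Return the letter whose template is nearest to vec."""
    return min(templates, key=lambda letter: euclidean(templates[letter], vec))
```

Spoken letters vary in length, so comparing whole-recording vectors is crude. Per-frame features aligned with dynamic time warping are the classic refinement for isolated-word recognition.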
Re: Convert wav file to letters
by Anonymous Monk on Jul 22, 2005 at 22:29 UTC
Re: Convert wav file to letters
by true (Pilgrim) on Jul 24, 2005 at 17:50 UTC
    First off, read and parse the WAV file to understand its sound sample patterns. 8-bit and 16-bit sounds are stored differently. I had some great help from BrowserUK in this node. Although it is a Win32 node, WAV file data is stored the same way across the board.
    jtrue
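For the parsing step, here is a minimal sketch using Python's standard `wave` module. In Perl, `unpack` does the same byte-level job. The difference mentioned above comes down to this: 8-bit PCM WAV samples are unsigned bytes with 128 as silence, while 16-bit samples are signed little-endian integers. The function name, the restriction to 8/16-bit PCM and the choice to average stereo channels down to mono are assumptions of this sketch.

```python
import io
import struct
import wave


def read_wav_mono(source):
    """Read a PCM WAV (path, file object, or raw bytes) and return
    (sample_rate, samples), with samples as floats in [-1, 1).

    Only 8-bit and 16-bit PCM are handled; multi-channel audio is
    averaged down to a single channel.
    """
    if isinstance(source, (bytes, bytearray)):
        source = io.BytesIO(source)
    with wave.open(source, "rb") as w:
        n_channels = w.getnchannels()
        width = w.getsampwidth()
        rate = w.getframerate()
        raw = w.readframes(w.getnframes())

    if width == 1:
        # 8-bit WAV samples are unsigned: 0..255, with 128 as silence
        values = [(b - 128) / 128.0 for b in raw]
    elif width == 2:
        # 16-bit WAV samples are signed little-endian: -32768..32767
        count = len(raw) // 2
        values = [v / 32768.0 for v in struct.unpack("<%dh" % count, raw)]
    else:
        raise ValueError("unsupported sample width: %d bytes" % width)

    # samples are interleaved by channel; average each frame to mono
    samples = [sum(values[i:i + n_channels]) / n_channels
               for i in range(0, len(values), n_channels)]
    return rate, samples
```

Once every file is reduced to a list of floats like this, the spectral and correlation techniques discussed in the thread can be applied uniformly, regardless of the original bit depth.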