llancet has asked for the wisdom of the Perl Monks concerning the following question:

I want to write a program that lets the user:
1: select a segment of music
2: perform an FFT on that segment
3: show the frequency spectrum, and thus get the individual pitches of a harmony.

I know that PDL can do FFT, and the GUI can be done using Gtk2 and GooCanvas. The remaining problem is: how can I get the time-amplitude data to use as input to the FFT? I need a parser for some mainstream format, such as mp3 or ogg. But will such a parser provide output that can be treated as time-amplitude data?

Thanks!

Replies are listed 'Best First'.
Re: Music and FFT
by zentara (Cardinal) on Oct 18, 2010 at 13:01 UTC
    I know that PDL can do FFT ... The remaining problem is: how can I get the time-amplitude data to use as input to the FFT?

    You probably want the Audio::DSP module. Here is an example of pulling the audio bits right off of the capture device of the soundcard's playback device.

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Audio::DSP;

    # alsamixer must be set up right, just turn everything up :-)
    my ($buf, $chan, $fmt, $rate) = (4096, 1, 16, 22050);

    # $fmt=8 will work, but it's better to use the formats in soundcard.h
    # some soundcards won't like 8-bit sound, or may need to be re-initialized
    my $dsp = Audio::DSP->new(buffer => $buf, channels => $chan,
                              format => $fmt, rate     => $rate);
    $dsp->init() || die $dsp->errstr();

    # die rather than warn: printing to a handle that failed to open is pointless
    open(my $out, '>', 'out.raw') or die "Cannot open out.raw: $!";
    while ( my $buffer = $dsp->dread(4096) ) {
        print {$out} $buffer;
    }
    close $out;
    $dsp->close();

    # to convert to wave
    # (adjust the rate and format flags to match the settings above)
    # sox -r 8000 -u -b -c 1 out.raw out.wav

    You will then have the raw audio data, which you will have to split into individual samples according to the channel count, sample format and rate.
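
    A minimal sketch of that splitting step, assuming the raw data is signed 16-bit little-endian PCM with interleaved channels (check what your soundcard actually delivers). The sub name pcm16_to_channels is made up for this example; only core pack/unpack is used.

```perl
use strict;
use warnings;

# Split a raw PCM buffer into one list of samples per channel.
# Assumption: signed 16-bit little-endian samples, channels interleaved
# frame by frame (L R L R ... for stereo).
sub pcm16_to_channels {
    my ($raw, $channels) = @_;
    my @samples    = unpack 's<*', $raw;       # 's<' = signed 16-bit, little-endian
    my @by_channel = map { [] } 1 .. $channels;
    for my $i (0 .. $#samples) {
        push @{ $by_channel[ $i % $channels ] }, $samples[$i];
    }
    return @by_channel;
}

# Two stereo frames: (L=1, R=-1) and (L=2, R=-2)
my $raw = pack 's<*', 1, -1, 2, -2;
my ($left, $right) = pcm16_to_channels($raw, 2);
print "left: @$left\nright: @$right\n";
```

    Sample number i of a channel corresponds to time i / rate seconds, which is how a selection on the canvas maps back to positions in the data.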

    It is not simple code to put audio onto a canvas, have a cursor select a section, determine the actual seek positions for the beginning and end in the audio file, and then feed that to an FFT.

    To compound the problem, Perl is much slower than C at numerical computation, so you probably want a module that does the FFT through a C-based XS module.

    Even plotting the audio on a canvas requires some intensive calculation to read the audio file, sample by sample. See Zero sound detection with Tk graphics and Tk-applause-meter


    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku ................... flash japh
      It seems /dev/dsp is required. However, it seems that /dev/dsp is provided by a legacy kernel module named oss. What about ALSA?
      I don't know much about the Linux sound architecture. Are there any introductory materials?
        Google for "linux sound tutorial". Basically, OSS was the original sound system, which has now been replaced by ALSA (which has an OSS emulation layer for backward compatibility). ALSA is the low-level device driver. There are even higher-level sound daemons in use now. For instance, Ubuntu uses the Enlightenment sound daemon, which acts as a sound server on top of ALSA. ALSA is usually built into the kernel modules, and gives basic access to the DSP, speaker, microphone, etc. There is a set of basic ALSA utilities, like aplay, arecord, amixer, and alsactl, which you can use to control the /dev/dsp.

        It would seem to be simple, but sound can be complex with all the options available, like Jack, etc.

        If you want one good commandline tool for conversions on linux, see Sox


Re: Music and FFT
by Anonymous Monk on Oct 18, 2010 at 05:16 UTC
    Such software already exists (Audacity). How would yours be different?

      Great software! It provides almost all the functionality I need.

      However, there's still something I need that it doesn't have. I'll ask the developers.

      Thanks a lot!
Re: Music and FFT
by BrowserUk (Patriarch) on Oct 20, 2010 at 05:29 UTC

    In the past, I converted mp3s to .wav format using a free tool (one of dozens) found on the internet. The tool I used was for Windows, but I just did a search and found http://www.mpg123.de/, which can apparently do this using mpg123 -w <filename>.wav <filename>.mp3

    Wav files contain PCM data in an uncompressed (RIFF) format which makes them easy to read and manipulate. PDL has a module PDL::Audio that can read .wav files directly: $pdl = raudio "file.wav";
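
    If PDL::Audio is not available, the RIFF layout is simple enough to walk by hand. The following is only a sketch under stated assumptions: it handles plain uncompressed PCM (format tag 1) and ignores every chunk other than "fmt " and "data". The sub name parse_wav is made up for this example.

```perl
use strict;
use warnings;

# Return (sample_rate, channels, bits_per_sample, raw_pcm_bytes) from the
# full contents of a RIFF/WAVE file held in a string.
sub parse_wav {
    my ($bytes) = @_;
    die "not a RIFF/WAVE file\n"
        unless length($bytes) >= 12
            && substr($bytes, 0, 4) eq 'RIFF'
            && substr($bytes, 8, 4) eq 'WAVE';

    my ($rate, $channels, $bits, $data);
    my $pos = 12;                                  # first chunk follows the 12-byte header
    while ($pos + 8 <= length $bytes) {
        my ($id, $size) = unpack 'a4 V', substr($bytes, $pos, 8);
        my $body = substr($bytes, $pos + 8, $size);
        if ($id eq 'fmt ') {
            my $tag;
            # format tag, channels, sample rate, byte rate, block align, bits per sample
            ($tag, $channels, $rate, undef, undef, $bits) = unpack 'v v V V v v', $body;
            die "only uncompressed PCM is supported\n" unless $tag == 1;
        }
        elsif ($id eq 'data') {
            $data = $body;
        }
        $pos += 8 + $size + ($size % 2);           # chunks are padded to an even length
    }
    die "missing fmt or data chunk\n" unless defined $rate && defined $data;
    return ($rate, $channels, $bits, $data);
}

# Build a tiny mono 16-bit WAV in memory and read it back.
my $pcm = pack 's<*', 0, 1000, 0, -1000;
my $fmt = pack 'v v V V v v', 1, 1, 44100, 44100 * 2, 2, 16;
my $wav = 'RIFF' . pack('V', 4 + 8 + length($fmt) + 8 + length($pcm)) . 'WAVE'
        . 'fmt ' . pack('V', length $fmt) . $fmt
        . 'data' . pack('V', length $pcm) . $pcm;

my ($rate, $channels, $bits, $data) = parse_wav($wav);
my @samples = unpack 's<*', $data;
print "$rate Hz, $channels channel(s), $bits bits: @samples\n";
```

    For a real file, read it in binary mode (open with the '<:raw' layer and slurp it) and pass the contents to the sub.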

    One thing worth mentioning is that the mp3 and ogg formats are "lossy", so the fidelity of the sound is compromised. Human ears may not detect it, but an FFT certainly will.


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: Music and FFT
by bart (Canon) on Oct 20, 2010 at 12:53 UTC
    Note: the classic (radix-2) FFT only works with a sample count that is a power of 2, for example, 512.

    And it assumes the waveform is periodic, i.e. after sample 511 (in the case of 512 samples) comes sample 0. That most likely implies a level jump, which will cause inclusion of a lot of harmonics that are not actually in the original sound. Think: a sawtooth, for an original gentle slope, which includes all harmonics (even + odd) at an amplitude level of 1/n. There are ways to reduce that effect, for example doubling the sample count while mirroring the original samples backwards, turning the sawtooth into a triangle (odd harmonics only, amplitude 1/n², which is a lot better). Even more advanced systems can make use of windowing functions, adding original samples from the left and right, to the side, which gently fade in/out.
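
    To see the effect of the level jump, and how a window reduces it, here is a small plain-Perl sketch. It uses a textbook recursive radix-2 FFT and a Hann window (one common windowing function, chosen here only as an example). The sub names fft, hann and magnitudes are made up for this example, and pure Perl is far too slow for real audio work.

```perl
use strict;
use warnings;
use constant PI => 4 * atan2(1, 1);

# Recursive radix-2 FFT. Takes array refs of real and imaginary parts
# (length must be a power of 2) and returns two new array refs.
sub fft {
    my ($re, $im) = @_;
    my $n = @$re;
    return ([@$re], [@$im]) if $n == 1;
    die "length must be a power of 2\n" if $n % 2;

    my @even = grep { !($_ % 2) } 0 .. $n - 1;
    my @odd  = grep {   $_ % 2  } 0 .. $n - 1;
    my ($er, $ei) = fft([ @$re[@even] ], [ @$im[@even] ]);
    my ($or, $oi) = fft([ @$re[@odd]  ], [ @$im[@odd]  ]);

    my (@out_re, @out_im);
    for my $k (0 .. $n / 2 - 1) {
        my $angle = -2 * PI * $k / $n;             # twiddle factor e^(-2*pi*i*k/n)
        my ($c, $s) = (cos($angle), sin($angle));
        my $tr = $c * $or->[$k] - $s * $oi->[$k];
        my $ti = $c * $oi->[$k] + $s * $or->[$k];
        ($out_re[$k],          $out_im[$k])          = ($er->[$k] + $tr, $ei->[$k] + $ti);
        ($out_re[$k + $n / 2], $out_im[$k + $n / 2]) = ($er->[$k] - $tr, $ei->[$k] - $ti);
    }
    return (\@out_re, \@out_im);
}

# Multiply the samples by a Hann window, which fades both ends to zero.
sub hann {
    my ($samples) = @_;
    my $n = @$samples;
    return [ map { $samples->[$_] * 0.5 * (1 - cos(2 * PI * $_ / $n)) } 0 .. $n - 1 ];
}

# Magnitude of each bin from 0 up to (but not including) n/2.
sub magnitudes {
    my ($samples) = @_;
    my $n = @$samples;
    my ($re, $im) = fft($samples, [ (0) x $n ]);
    return [ map { sqrt($re->[$_] ** 2 + $im->[$_] ** 2) } 0 .. $n / 2 - 1 ];
}

# 10.5 cycles in 64 samples: the end does not join up with the start.
my $n    = 64;
my @tone = map { sin(2 * PI * 10.5 * $_ / $n) } 0 .. $n - 1;
my $plain    = magnitudes(\@tone);
my $windowed = magnitudes(hann(\@tone));
printf "bin 30 without window: %.4f, with Hann window: %.4f\n",
    $plain->[30], $windowed->[30];
```

    Bin 30 is far away from the tone (which sits between bins 10 and 11), so anything that shows up there is leakage caused by the jump. With the window applied, that value should drop by well over an order of magnitude.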

      There's also another tough issue: what I'm going to analyse is piano music. As a piano has rich harmonics even when you press only one key, I need to distinguish the fundamental frequency from the harmonics.
      Any clue how to do that? Maybe only through my ear and brain?
        Harmonics always have frequencies that are multiples of the base frequency. So, if in a result you have a lot of frequencies, you can simply ignore those that are multiples of another frequency which is also present, and strong. Note that after FFT, those frequencies will be approximated, and thus, not exact multiples... But if you have enough sample points, the resolution should become fine enough.
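
        That rule can be sketched as follows, assuming you have already picked peaks out of the spectrum as [frequency, amplitude] pairs. The sub name fundamentals, the tolerance and the "strong" threshold are all made up for this example and will need tuning against real data.

```perl
use strict;
use warnings;

# Given peaks as [frequency_hz, amplitude] pairs, drop every peak whose
# frequency is close to an integer multiple (2x, 3x, ...) of a lower peak
# that was kept and is at least $min_base_amp strong.
# $tolerance is the allowed relative error, since FFT bins are approximate.
sub fundamentals {
    my ($peaks, $tolerance, $min_base_amp) = @_;
    my @kept;
    PEAK: for my $peak (sort { $a->[0] <=> $b->[0] } @$peaks) {
        for my $base (@kept) {
            next unless $base->[1] >= $min_base_amp;   # only strong peaks count as a base
            my $ratio   = $peak->[0] / $base->[0];
            my $nearest = int($ratio + 0.5);
            next PEAK if $nearest >= 2
                      && abs($ratio - $nearest) / $nearest <= $tolerance;
        }
        push @kept, $peak;
    }
    return @kept;
}

# Illustrative numbers only: two notes near 220 Hz and 277 Hz plus overtones.
my @peaks = (
    [220.0, 1.0], [277.2, 0.8], [440.4, 0.5], [554.1, 0.4], [661.0, 0.3],
);
print join(', ', map { $_->[0] } fundamentals(\@peaks, 0.01, 0.5)), "\n";
# prints "220, 277.2"
```

        One known weakness: a note that really is played an octave (or any whole multiple) above another note is thrown away as well, so treat the result as a first guess.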

        Also, a piano has several strings per key, which are tuned to almost the same frequency... which produces the phasing effect you generally hear. The ribbles on strings, especially the lower strings, have a similar effect.

        Start experimenting with it. You'll probably get familiar with the results from the FFT, soon enough. Note that if you edit the values from the FFT data, you can convert it back to sound, and hear if your approximation is close enough.

        I need to distinguish the fundamental frequency from the harmonics. Any clue how to do that?

        If you are recording the music you are analysing yourself, record directly into a non-lossy format (like .wav) and use as high a sample rate as your software allows. Many recorders can create .wav files with much higher sample rates than the standard 44,100 Hz. Some will even allow you to specify any rate you type in, so FFT-friendly powers of 2 are possible.


Re: Music and FFT
by pemungkah (Priest) on Oct 19, 2010 at 19:47 UTC
    You might also want to look at ChucK, which has the advantage of actually being a programming language instead of a scriptable application.