Ashx has asked for the wisdom of the Perl Monks concerning the following question:

Hey all

I want to make a program which does the following:

- Input audio from the sound card line-in
- Get it in raw format (a FIFO of the data, xxxx Hz sampling rate x xxxx bit resolution)
- Do some processing suitable for driving a matching visualization (so it needs to be able to run a DTFT to obtain the spectrum)

This should be able to run in near real time (no "proper" real-time guarantees required; it just needs to keep up with the arriving analog input rather than buffering it all to revisit later).

This should run on a modern Linux system, where the main sound platform is ALSA, and support for older Linux sound technologies (OSS, etc.) might not be available or might not work correctly.

So what I am asking is: what libraries or example code would you recommend I look into? I found several mentions of modules on CPAN which use OSS, but I am not sure whether that is supported properly on modern Linux, or whether there are newer, better-suited libraries which are not widely known.

Best Wishes - Ash

Replies are listed 'Best First'.
Re: Audio input and processing - recommendations
by Corion (Patriarch) on Nov 22, 2023 at 07:24 UTC

    When I last wanted audio input in a portable manner, I cheated and simply read output from ffmpeg:

    # record from PulseAudio device 11
    open my $voice, 'ffmpeg -hide_banner -loglevel error -nostats -f pulse -i 11 -t 30 -ac 1 -ar 44100 -f s16le - |'
        or die "Could not launch ffmpeg: $!";
    binmode $voice, ':raw';
    while( ! eof($voice)) {
        read($voice, my $buf, 3200);
        ...
    }

    This also works on Windows (but with different parameters for the input format).

    Obviously, this has worse latency than getting the information from ALSA / Pulse / Jack directly, so I don't know if this is suitable for you.
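
    As an aside, once a chunk of raw s16le bytes has arrived in $buf, core Perl can turn it into sample values with no extra modules. A minimal sketch (the `s<` little-endian pack/unpack template needs Perl 5.10 or later; the packed $buf here just stands in for a chunk returned by read()):

```perl
# Unpack a chunk of raw little-endian signed 16-bit mono audio
# into integer sample values (-32768 .. 32767).
my $buf = pack 's<*', 0, 1000, -1000, 32767, -32768;   # stand-in for a read() chunk
my @samples = unpack 's<*', $buf;

# Simple peak level, e.g. for a quick level-meter display
my $peak = 0;
for my $s (@samples) {
    my $a = abs $s;
    $peak = $a if $a > $peak;
}
printf "%d samples, peak %d\n", scalar @samples, $peak;
```

    From there the samples can go straight into whatever spectrum code you prefer.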

Re: Audio input and processing - recommendations
by bliako (Abbot) on Nov 22, 2023 at 11:39 UTC

    After reading Corion's answer I thought that it might be faster to let the command line do the FFT and pass that data to Perl. I modified Corion's command line to pipe into sox, which can do a basic FFT. I have also modified the ffmpeg options to output in sox format (-f sox), reading my soundcard (-i hw:0) via ALSA, thusly:

    ffmpeg -hide_banner -loglevel error -nostats -f alsa -i hw:0 -t 30 -ac 1 -ar 44100 -f sox - | sox -t sox - -n stat -freq

    Also, see this: https://www.linuxquestions.org/questions/linux-software-2/why-does-sox-stat-freq-give-me-different-data-multiple-times-927589/ for example.

    Alternatively, you can use Corion's code to read raw audio bytes and plug them into PDL::FFTW3 to get FFT with a state-of-the-art C library (FFTW3).

    Following the latter route, you obviously have more control over what processing to do, albeit more slowly (hmm, benchmarks?).
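
    If it helps to see what that FFT step actually computes, here is the DFT written out naively in plain core Perl. This is purely illustrative (it is O(N^2), so PDL::FFTW3 or sox will be far faster in practice), and the frame size and test tone are arbitrary choices:

```perl
use constant PI => 4 * atan2(1, 1);

# Naive DFT: magnitude of frequency bin $k for real samples in @$samples.
sub dft_bin_mag {
    my ($samples, $k) = @_;
    my $n = @$samples;
    my ($re, $im) = (0, 0);
    for my $t (0 .. $n - 1) {
        my $angle = 2 * PI * $k * $t / $n;
        $re += $samples->[$t] * cos($angle);
        $im -= $samples->[$t] * sin($angle);
    }
    return sqrt($re**2 + $im**2);
}

# A pure sine at bin 5 of a 64-sample frame should peak at k == 5
# with magnitude N/2, and be ~0 at neighbouring bins.
my $n = 64;
my @frame = map { sin(2 * PI * 5 * $_ / $n) } 0 .. $n - 1;
printf "bin 5: %.1f  bin 6: %.1f\n",
    dft_bin_mag(\@frame, 5), dft_bin_mag(\@frame, 6);
```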

    The other route is to replace sox with custom C code (based on the FFTW3 library) to do the FFT and any other processing you want closer to the hardware.

    Finally, there is lots of other command-line-based software on Linux from which to build a processing pipeline for your needs.

    Last but not least: PureData (PD) https://puredata.info/

    bw, bliako

      I have not tried audio input (yet), but found the combination of PDL and SoX quite powerful to synthesize and play sound, including spectral analysis. Here's a screenshot of one of my experiments (sine waves and playing with overtones).

      I use PDL to create the raw audio data and SoX to pipe them to whatever sound system is available (meaning it also works on MS Windows); the waveform and spectrum display happen in real time.

      I have been planning to write an article about stuff like this for quite some time now, but there are so many distractions...

        A little later: haj did write at least one article about PDL and sound-processing - I'm listening as I write this to the very nice bit of music he has at the end.

        For audio input, I'd note that a physical limitation of doing a DFT on a finite, short window of discrete input is that the cut-off creates artifacts. This is mitigated in various ways; the easiest way in PDL-land is to use PDL::DSP::Windows. See jjatria's Advent article for more.
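
        For the curious, the tapering itself is simple: scale the frame so its ends fall to zero before transforming it. A plain-Perl sketch of the Hann window, the classic choice (PDL::DSP::Windows provides this and many others as ndarrays; the frame length here is an arbitrary example):

```perl
use constant PI => 4 * atan2(1, 1);

# Hann window: w[n] = 0.5 * (1 - cos(2*pi*n / (N-1))), zero at both ends.
sub hann {
    my ($n) = @_;
    return [ map { 0.5 * (1 - cos(2 * PI * $_ / ($n - 1))) } 0 .. $n - 1 ];
}

my $w = hann(8);
# Multiply each sample by its coefficient before the DFT/FFT, e.g.:
#   $frame[$_] *= $w->[$_] for 0 .. $#frame;
printf "%.3f ", $_ for @$w;
print "\n";
```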

        More generally for real-time-ish stuff, PDL doesn't yet have a very fully-tested real-time capability. I still intend to experiment more fully, but one approach might be to set up a "flowing" transformation (so you pay the setup cost only once), then keep updating the input sample then reading the processed output. If anyone does have a go at that, I'd love to hear your findings!

Re: Audio input and processing - recommendations
by InfiniteSilence (Curate) on Nov 22, 2023 at 03:32 UTC

    When I look for 'digital audio workstation' on metacpan.org I get Nama. I would install that, see how it works, and tear out the parts I needed as a starting point.

    Celebrate Intellectual Diversity

Re: Audio input and processing - recommendations
by cavac (Prior) on Nov 23, 2023 at 15:43 UTC

    On a modern Linux system, you have the option to run PulseAudio. On many desktop installations it's the default.

    Aside from the many strange and wonderful things PulseAudio can do, it also has protocol modules which allow external connections. Especially interesting for you might be the TCP or UDP modules, because some of those are completely agnostic to the underlying hardware/software implementation. They work on localhost and over the network.

    As an additional bonus, it should be quite easy to simulate a session by running a fake server with minimal Perl for regression testing.
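
    As an illustration of that route, PulseAudio's simple TCP protocol module streams raw PCM over a socket that a Perl client can read like a file. A sketch based on the documented module parameters (the port and format choices here are arbitrary):

```shell
# Load PulseAudio's "simple" TCP protocol module: raw PCM over a socket.
# record=true exposes the capture side; pick a source= explicitly if needed.
pactl load-module module-simple-protocol-tcp \
    rate=44100 format=s16le channels=1 record=true port=4711
```

    A Perl client then needs nothing beyond core modules, e.g. `IO::Socket::INET->new(PeerAddr => 'localhost:4711')` followed by the same `read`/`unpack` loop as with the ffmpeg pipe.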

    PerlMonks XP is useless? Not anymore: XPD - Do more with your PerlMonks XP
      On a modern Linux system, you have the option to run PulseAudio. On many desktop installations it's the default.

      On a really modern Linux system, you have the option to run PipeWire. That's what I would be looking at for any new audio projects today. It's now the default on Pop!OS, Fedora, Ubuntu and Debian at least.


      🦛

        In writing an audio application for Linux, you can target either ALSA or JACK. PulseAudio and PipeWire present an ALSA device to applications, and PW presents a JACK device as well.

        The choice of audio server really depends on your use case. The OP's application doesn't require any of the features of PulseAudio or PipeWire. ALSA alone would be sufficient. JACK provides the stability and predictable latency needed for professional music production while allowing easy patching among multiple applications. PA and PW are convenient for running multiple independent applications such as browsers and media players where central controls for volume, muting, etc. are desirable. PW does support JACK, but still lacks support for latency compensation among multiple signal paths that Ardour (a free, pro-quality DAW) provides.

Re: Audio input and processing - recommendations
by Ashx (Novice) on Nov 26, 2023 at 21:29 UTC
    Thanks for all the recommendations

    I will go the FFmpeg way, running it on bare ALSA for now. (And if not, it is just a matter of changing some parameters for the FFmpeg input file.)

    The latency of such a solution does not bother me too much at this stage, but I want to address another issue:

    I want the processing of the audio stream to be continuous and uninterrupted.

    Let's say I can do some processing within the file read loop, but after the time interval (30 sec) I have to re-run FFmpeg. I might miss part of the audio completely, or cut it in the middle of a note, which will then lead to an error in the data processing algorithm.

    Also, if I try to do a burst of intensive processing at some point within the interval, I might stall the file read loop and miss some data then too.
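
    One note on the 30-second limit: `-t 30` in the examples above is only a duration cap, and leaving it out should let ffmpeg capture until the pipe is closed, so no restart would be needed. As for processing bursts, decoupling reading from processing helps: accumulate bytes and hand complete fixed-size frames to the analysis code, so a slow burst delays the display but loses no data (as long as the OS pipe buffer doesn't fill in the meantime). A minimal pure-Perl sketch (the frame size is an arbitrary choice, and the strings stand in for read() chunks):

```perl
# Accumulate raw bytes; emit only complete fixed-size frames.
my $frame_bytes = 2048 * 2;       # 2048 s16le mono samples per frame
my $pending = '';

sub feed {
    my ($chunk) = @_;
    $pending .= $chunk;
    my @frames;
    while (length($pending) >= $frame_bytes) {
        # 4-arg substr removes and returns the leading frame
        push @frames, substr($pending, 0, $frame_bytes, '');
    }
    return @frames;               # complete frames ready for the DFT
}

# Three reads of uneven size still yield clean 4096-byte frames:
my @got = map { feed($_) } ("x" x 3000, "y" x 3000, "z" x 3000);
printf "%d frames, %d bytes left over\n", scalar @got, length($pending);
```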

    I have seen an FFmpeg module on CPAN: https://metacpan.org/pod/FFmpeg. I wonder if this could be of any use, or better in any way, vs. reading FFmpeg output through a pipe?

    For now I'm just experimenting with FFmpeg and getting output from it.
      I have seen an FFmpeg module on CPAN, https://metacpan.org/pod/FFmpeg

      Unfortunately, FFmpeg has no passing tests and a couple of open tickets from many years ago reporting that fact. That isn't to say you couldn't get it to work, but it might take some effort, and even then the module appears to be rather unsupported.


      🦛

      I guess I'll mention VideoLAN::LibVLC, though I only implemented callbacks to process video frames. The design is all set to be able to handle the audio frames, but that code isn't written yet. It would help with the real-time aspect though, if you wanted details like the timestamp of the audio frame at the time it was decoded/captured. If you go that route I'm happy to offer advice on the code, but I don't really have time to develop it myself.

      Also, I haven't tested it on new versions of libvlc since 2019, so there might be even more work to do, depending on which version of libvlc you want to use.

      I'd also recommend getting familiar with Inline::C if you aren't already.