Ashx has asked for the wisdom of the Perl Monks concerning the following question:

Hey all

I want to make a program which does the following:

- Input audio from the sound card line-in
- Get it in raw format (a FIFO of the data, xxxx Hz sampling rate x xxxx bit resolution)
- Do some processing suitable for driving a matching visualization (so it needs to be able to run a DTFT to obtain the spectrum)

This should be able to run in near real time (no "proper" real-time guarantees required; it just needs to keep up with the arriving analog input rather than buffering it all to revisit later).

This should run on a modern Linux system, where the main sound platform is ALSA, and support for older Linux sound technologies (OSS, etc.) might not be available or might not work correctly.

So what I am asking is: what libraries or example code would you recommend I look into? I found several mentions of modules on CPAN which use OSS, but I am not sure whether that is supported properly on modern Linux, or whether there are newer, better-suited libraries which are not widely known.

Best Wishes - Ash

Replies are listed 'Best First'.
Re: Audio input and processing - recommendations
by Corion (Patriarch) on Nov 22, 2023 at 07:24 UTC

    When I last wanted audio input in a portable manner, I cheated and simply read output from ffmpeg:

    # record from PulseAudio device 11
    open my $voice, 'ffmpeg -hide_banner -loglevel error -nostats -f pulse -i 11 -t 30 -ac 1 -ar 44100 -f s16le - |'
        or die "Could not launch ffmpeg: $!";
    binmode $voice, ':raw';
    while( ! eof($voice)) {
        read($voice, my $buf, 3200);
        ...
    }

    This also works on Windows (but with different parameters for the input format).

    Obviously, this has worse latency than getting the information from ALSA / Pulse / Jack directly, so I don't know if this is suitable for you.
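
    As an aside, once a chunk of raw s16le bytes has arrived in $buf, core Perl can turn it into sample values with no extra modules. A minimal sketch (the `s<` little-endian pack/unpack template needs Perl 5.10 or later; the packed $buf here just stands in for a chunk returned by read()):

```perl
# Unpack a chunk of raw little-endian signed 16-bit mono audio
# into integer sample values (-32768 .. 32767).
my $buf = pack 's<*', 0, 1000, -1000, 32767, -32768;   # stand-in for a read() chunk
my @samples = unpack 's<*', $buf;

# Simple peak level, e.g. for a quick level-meter display
my $peak = 0;
for my $s (@samples) {
    my $a = abs $s;
    $peak = $a if $a > $peak;
}
printf "%d samples, peak %d\n", scalar @samples, $peak;
```

    From there the samples can go straight into whatever spectrum code you prefer.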

Re: Audio input and processing - recommendations
by bliako (Abbot) on Nov 22, 2023 at 11:39 UTC

    After reading Corion's answer I thought that it might be faster to let the command line do the FFT and pass that data to Perl. I modified Corion's command line to pipe into sox, which can do a basic FFT. I have also modified the ffmpeg options to output in sox format (-f sox), reading my soundcard (-i hw:0) via ALSA, thusly:

    ffmpeg -hide_banner -loglevel error -nostats -f alsa -i hw:0 -t 30 -ac 1 -ar 44100 -f sox - | sox -t sox - -n stat -freq

    Also, see this: https://www.linuxquestions.org/questions/linux-software-2/why-does-sox-stat-freq-give-me-different-data-multiple-times-927589/ for example.

    Alternatively, you can use Corion's code to read raw audio bytes and plug them into PDL::FFTW3 to get FFT with a state-of-the-art C library (FFTW3).

    Following the latter route, you obviously have more control over what processing to do, albeit more slowly (hmm, benchmarks?).
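
    If it helps to see what that FFT step actually computes, here is the DFT written out naively in plain core Perl. This is purely illustrative (it is O(N^2), so PDL::FFTW3 or sox will be far faster in practice), and the frame size and test tone are arbitrary choices:

```perl
use constant PI => 4 * atan2(1, 1);

# Naive DFT: magnitude of frequency bin $k for real samples in @$samples.
sub dft_bin_mag {
    my ($samples, $k) = @_;
    my $n = @$samples;
    my ($re, $im) = (0, 0);
    for my $t (0 .. $n - 1) {
        my $angle = 2 * PI * $k * $t / $n;
        $re += $samples->[$t] * cos($angle);
        $im -= $samples->[$t] * sin($angle);
    }
    return sqrt($re**2 + $im**2);
}

# A pure sine at bin 5 of a 64-sample frame should peak at k == 5
# with magnitude N/2, and be ~0 at neighbouring bins.
my $n = 64;
my @frame = map { sin(2 * PI * 5 * $_ / $n) } 0 .. $n - 1;
printf "bin 5: %.1f  bin 6: %.1f\n",
    dft_bin_mag(\@frame, 5), dft_bin_mag(\@frame, 6);
```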

    The other route is to replace sox with custom C code (based on the FFTW3 library) to do the FFT and any other processing you want closer to the hardware.

    Finally, there is lots of other command-line-based software on Linux from which to build a processing pipeline for your needs.

    Last but not least: PureData (PD) https://puredata.info/

    bw, bliako

      I have not tried audio input (yet), but found the combination of PDL and SoX quite powerful to synthesize and play sound, including spectral analysis. Here's a screenshot of one of my experiments (sine waves and playing with overtones).

      I use PDL to create the raw audio data and SoX to pipe them to whatever sound system is available (meaning it also works on MS Windows); the waveform and spectrum display happen in real time.

      I have been planning to write an article about stuff like this for quite some time now, but there are so many distractions...

        A little later: haj did write at least one article about PDL and sound-processing - I'm listening as I write this to the very nice bit of music he has at the end.

        For audio input, I'd note that a physical limitation of doing a DFT on a finite, short window of discrete input is that the cut-off creates artifacts. This is mitigated in various ways; the easiest way in PDL-land is to use PDL::DSP::Windows. See jjatria's Advent article for more.
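
        For the curious, the tapering itself is simple: scale the frame so its ends fall to zero before transforming it. A plain-Perl sketch of the Hann window, the classic choice (PDL::DSP::Windows provides this and many others as ndarrays; the frame length here is an arbitrary example):

```perl
use constant PI => 4 * atan2(1, 1);

# Hann window: w[n] = 0.5 * (1 - cos(2*pi*n / (N-1))), zero at both ends.
sub hann {
    my ($n) = @_;
    return [ map { 0.5 * (1 - cos(2 * PI * $_ / ($n - 1))) } 0 .. $n - 1 ];
}

my $w = hann(8);
# Multiply each sample by its coefficient before the DFT/FFT, e.g.:
#   $frame[$_] *= $w->[$_] for 0 .. $#frame;
printf "%.3f ", $_ for @$w;
print "\n";
```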

        More generally for real-time-ish stuff, PDL doesn't yet have a very fully-tested real-time capability. I still intend to experiment more fully, but one approach might be to set up a "flowing" transformation (so you pay the setup cost only once), then keep updating the input sample then reading the processed output. If anyone does have a go at that, I'd love to hear your findings!

Re: Audio input and processing - recommendations
by InfiniteSilence (Curate) on Nov 22, 2023 at 03:32 UTC

    When I look for 'digital audio workstation' on metacpan.org I get Nama. I would install that, see how it works, and tear out the parts I needed as a starting point.

    Celebrate Intellectual Diversity

Re: Audio input and processing - recommendations
by cavac (Prior) on Nov 23, 2023 at 15:43 UTC

    On a modern Linux system, you have the option to run PulseAudio. On many desktop installations it's the default.

    Aside from the many strange and wonderful things PulseAudio can do, it also has protocol modules which allow external connections. Especially interesting for you might be the TCP or UDP modules, because some of those are completely agnostic to the underlying hardware/software implementation. They work on localhost and over the network.

    As an additional bonus, it should be quite easy to simulate a session by running a fake server with minimal Perl for regression testing.
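
    As an illustration of that route, PulseAudio's simple TCP protocol module streams raw PCM over a socket that a Perl client can read like a file. A sketch based on the documented module parameters (the port and format choices here are arbitrary):

```shell
# Load PulseAudio's "simple" TCP protocol module: raw PCM over a socket.
# record=true exposes the capture side; pick a source= explicitly if needed.
pactl load-module module-simple-protocol-tcp \
    rate=44100 format=s16le channels=1 record=true port=4711
```

    A Perl client then needs nothing beyond core modules, e.g. `IO::Socket::INET->new(PeerAddr => 'localhost:4711')` followed by the same `read`/`unpack` loop as with the ffmpeg pipe.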

    PerlMonks XP is useless? Not anymore: XPD - Do more with your PerlMonks XP
      On a modern Linux system, you have the option to run PulseAudio. On many desktop installations it's the default.

      On a really modern Linux system, you have the option to run PipeWire. That's what I would be looking at for any new audio projects today. It's now the default on Pop!OS, Fedora, Ubuntu and Debian at least.


      🦛

        In writing an audio application for Linux, you can target either ALSA or JACK. PulseAudio and PipeWire present an ALSA device to applications, and PW presents a JACK device as well.

        The choice of audio server really depends on your use case. The OP's application doesn't require any of the features of PulseAudio or PipeWire. ALSA alone would be sufficient. JACK provides the stability and predictable latency needed for professional music production while allowing easy patching among multiple applications. PA and PW are convenient for running multiple independent applications such as browsers and media players where central controls for volume, muting, etc. are desirable. PW does support JACK, but still lacks support for latency compensation among multiple signal paths that Ardour (a free, pro-quality DAW) provides.

Re: Audio input and processing - recommendations
by Ashx (Novice) on Nov 26, 2023 at 21:29 UTC
    Thanks for all the recommendations

    I will go the FFmpeg way, running it on bare ALSA for now. (And if not, it is just a matter of changing some parameters for the FFmpeg input file.)

    The latency of such a solution does not bother me too much at this stage, but I want to address another issue:

    I want the processing of the audio stream to be continuous and uninterrupted.

    Let's say I can do some processing within the file read loop, but after the time interval (30 sec) I have to re-run FFmpeg. I might miss part of the audio completely, or cut it in the middle of a note, which will then lead to an error in the data processing algorithm.

    Also, if I try to do a burst of intensive processing at some point within the interval, I might stall the file read loop and miss some data then too.
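
    One note on the 30-second limit: `-t 30` in the examples above is only a duration cap, and leaving it out should let ffmpeg capture until the pipe is closed, so no restart would be needed. As for processing bursts, decoupling reading from processing helps: accumulate bytes and hand complete fixed-size frames to the analysis code, so a slow burst delays the display but loses no data (as long as the OS pipe buffer doesn't fill in the meantime). A minimal pure-Perl sketch (the frame size is an arbitrary choice, and the strings stand in for read() chunks):

```perl
# Accumulate raw bytes; emit only complete fixed-size frames.
my $frame_bytes = 2048 * 2;       # 2048 s16le mono samples per frame
my $pending = '';

sub feed {
    my ($chunk) = @_;
    $pending .= $chunk;
    my @frames;
    while (length($pending) >= $frame_bytes) {
        # 4-arg substr removes and returns the leading frame
        push @frames, substr($pending, 0, $frame_bytes, '');
    }
    return @frames;               # complete frames ready for the DFT
}

# Three reads of uneven size still yield clean 4096-byte frames:
my @got = map { feed($_) } ("x" x 3000, "y" x 3000, "z" x 3000);
printf "%d frames, %d bytes left over\n", scalar @got, length($pending);
```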

    I have seen an FFmpeg module on CPAN: https://metacpan.org/pod/FFmpeg. I wonder if this could be of any use, or better in any way, vs. reading FFmpeg output through a pipe?

    For now I'm just experimenting with FFmpeg and getting output from it.
      I have seen an FFmpeg module on CPAN, https://metacpan.org/pod/FFmpeg

      Unfortunately, FFmpeg has no passing tests and a couple of open tickets from many years ago reporting that fact. That isn't to say you couldn't get it to work, but it might take some effort, and even then the module appears to be rather unsupported.


      🦛

      I guess I'll mention VideoLAN::LibVLC, though I only implemented callbacks to process video frames. The design is all set to be able to handle the audio frames, but that code isn't written yet. It would help with the real-time aspect though, if you wanted details like the timestamp of the audio frame at the time it was decoded/captured. If you go that route I'm happy to offer advice on the code, but I don't really have time to develop it myself.

      Also, I haven't tested it on new versions of libvlc since 2019, so there might be even more work to do, depending on which version of libvlc you want to use.

      I'd also recommend getting familiar with Inline::C if you aren't already.