Re: Audio input and processing - recommendations
by Corion (Patriarch) on Nov 22, 2023 at 07:24 UTC
When I last wanted audio input in a portable manner, I cheated and simply read output from ffmpeg:
# record 30 seconds from PulseAudio source 11 as mono 16-bit PCM at 44.1 kHz
open my $voice, '-|', 'ffmpeg -hide_banner -loglevel error -nostats'
                    . ' -f pulse -i 11 -t 30 -ac 1 -ar 44100 -f s16le -'
    or die "Could not launch ffmpeg: $!";
binmode $voice, ':raw';
while ( !eof($voice) ) {
    read($voice, my $buf, 3200);   # one chunk of raw samples
    ...                            # process $buf
}
This also works on Windows (but with different parameters for the input format).
Obviously, this has worse latency than getting the data directly from ALSA / PulseAudio / JACK, so I don't know whether it is suitable for you.
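In case it helps, the raw chunks produced by -f s16le decode into Perl numbers with unpack. A minimal sketch (the decode_chunk name is just for illustration; it assumes the little-endian, mono layout from the command line above):

```perl
use strict;
use warnings;

# Decode one chunk of raw s16le mono audio into signed sample values
# (this matches the -ac 1 -ar 44100 -f s16le ffmpeg options above).
sub decode_chunk {
    my ($buf) = @_;
    return unpack 's<*', $buf;    # 's<' = signed 16-bit little-endian
}

# demo: two samples, +1000 and -1000, packed the way ffmpeg emits them
my $buf     = pack 's<*', 1000, -1000;
my @samples = decode_chunk($buf);
print "@samples\n";    # prints "1000 -1000"
```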
Re: Audio input and processing - recommendations
by bliako (Abbot) on Nov 22, 2023 at 11:39 UTC
After reading Corion's answer, I thought it might be faster to let the command line do the FFT and pass that data to Perl. I modified Corion's command line to pipe into sox, which can do a basic FFT. I have also modified the ffmpeg options to output in SoX format (-f sox), reading my sound card 0 (-i hw:0) through ALSA:
ffmpeg -hide_banner -loglevel error -nostats -f alsa -i hw:0 -t 30 \
    -ac 1 -ar 44100 -f sox - | sox -t sox - -n stat -freq
See also, for example, this thread on interpreting the output: https://www.linuxquestions.org/questions/linux-software-2/why-does-sox-stat-freq-give-me-different-data-multiple-times-927589/
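To give an idea of how one might consume that output from Perl, here is a minimal, hypothetical sketch that picks the strongest bin from the frequency/magnitude pairs that stat -freq prints. Note that sox writes stat output to stderr, so you would redirect with 2>&1; the exact line format may vary between sox versions, so the regex here is an assumption:

```perl
use strict;
use warnings;

# Find the strongest frequency in `sox ... stat -freq` output lines.
# Each data line is assumed to be "<frequency> <magnitude>".
sub peak_frequency {
    my @lines = @_;
    my ( $peak_freq, $peak_mag ) = ( 0, -1 );
    for my $line (@lines) {
        next unless my ( $freq, $mag ) =
            $line =~ /^\s*([0-9.eE+-]+)\s+([0-9.eE+-]+)\s*$/;
        ( $peak_freq, $peak_mag ) = ( $freq, $mag ) if $mag > $peak_mag;
    }
    return $peak_freq;
}

# Example with fabricated numbers:
my @stat_output = (
    "430.66  12.3",
    "441.43  9871.0",    # strongest bin
    "452.20  15.7",
);
print peak_frequency(@stat_output), "\n";    # prints 441.43
```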
Alternatively, you can use Corion's code to read the raw audio bytes and feed them into PDL::FFTW3 to get the FFT from a state-of-the-art C library (FFTW3).
Following that route you obviously have more control over what processing to do, albeit more slowly (hmm, benchmarks?).
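Just to illustrate what the FFT stage computes, whichever tool ends up doing it, here is a naive pure-Perl DFT on a toy signal. It is far too slow for real work (that is exactly what FFTW3 is for), but it shows the math:

```perl
use strict;
use warnings;
use constant PI => 4 * atan2( 1, 1 );

# Naive O(N^2) DFT magnitude spectrum -- illustration only; a real FFT
# (sox, PDL::FFTW3, FFTW3 via C) computes the same thing in O(N log N).
sub dft_magnitudes {
    my @x = @_;
    my $n = @x;
    my @mag;
    for my $k ( 0 .. $n - 1 ) {
        my ( $re, $im ) = ( 0, 0 );
        for my $t ( 0 .. $n - 1 ) {
            my $phase = 2 * PI * $k * $t / $n;
            $re += $x[$t] * cos($phase);
            $im -= $x[$t] * sin($phase);
        }
        push @mag, sqrt( $re**2 + $im**2 );
    }
    return @mag;
}

# A pure sine wave that completes 3 cycles over 32 samples ...
my $n      = 32;
my @signal = map { sin( 2 * PI * 3 * $_ / $n ) } 0 .. $n - 1;
my @mag    = dft_magnitudes(@signal);

# ... shows up as a single peak in bin 3 (plus its mirror at bin $n - 3).
my ($peak) = sort { $mag[$b] <=> $mag[$a] } 0 .. $n / 2;
print "peak bin: $peak\n";    # prints "peak bin: 3"
```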
The other route is to replace sox with custom C code (based on the FFTW3 library) to do the FFT and any other processing you want closer to the hardware.
Finally, there is plenty of other command-line software on Linux from which to build a processing pipeline for your needs.
Last but not least: Pure Data (Pd), https://puredata.info/
bw, bliako
I have not tried audio input (yet), but I found the combination of PDL and SoX quite powerful for synthesizing and playing sound, including spectral analysis. Here's a screenshot of one of my experiments (sine waves and playing with overtones).
I use PDL to create the raw audio data and SoX to pipe it to whatever sound system is available (which means it also works on MS Windows); the waveform and spectrum displays happen in real time.
I have been planning to write an article about this sort of thing for quite some time now, but there are so many distractions...
A little later: haj did write at least one article about PDL and sound processing; I'm listening, as I write this, to the very nice bit of music he has at the end.
For audio input, I'd note that a fundamental limitation of taking the DFT of a finite, short window of discrete input is that the cut-off creates artifacts. This can be mitigated in various ways; the easiest way in PDL-land is to use PDL::DSP::Windows. See jjatria's Advent article for more.
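The underlying idea is simply to taper each chunk to zero at the edges before transforming it. Here is a plain-Perl sketch of the classic Hann window; PDL::DSP::Windows gives you this shape and many others as piddles, so the hann_window function below is only illustrative:

```perl
use strict;
use warnings;
use constant PI => 4 * atan2( 1, 1 );

# Hann window, a classic remedy for cut-off artifacts: taper the chunk
# to zero at both ends before taking the DFT.
sub hann_window {
    my ($n) = @_;
    return map { 0.5 * ( 1 - cos( 2 * PI * $_ / ( $n - 1 ) ) ) } 0 .. $n - 1;
}

my @window   = hann_window(1024);
my @samples  = (1) x 1024;    # stand-in for one chunk of real audio
my @windowed = map { $samples[$_] * $window[$_] } 0 .. $#samples;

# the ends are forced to zero; the middle passes through almost unchanged
printf "first=%.3f middle=%.3f last=%.3f\n",
    $windowed[0], $windowed[512], $windowed[-1];
# prints "first=0.000 middle=1.000 last=0.000"
```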
More generally, for real-time-ish work, PDL doesn't yet have well-tested real-time capabilities. I still intend to experiment more fully, but one approach might be to set up a "flowing" transformation (so you pay the setup cost only once), then keep updating the input samples and reading the processed output. If anyone does have a go at that, I'd love to hear your findings!
Re: Audio input and processing - recommendations
by InfiniteSilence (Curate) on Nov 22, 2023 at 03:32 UTC
When I search for 'digital audio workstation' on metacpan.org I get Nama. I would install that, see how it works, and tear out the parts I need as a starting point.
Celebrate Intellectual Diversity
Re: Audio input and processing - recommendations
by cavac (Prior) on Nov 23, 2023 at 15:43 UTC
On a modern Linux system you have the option to run PulseAudio; on many desktop installations it's the default.
Aside from the many strange and wonderful things PulseAudio can do, it also has protocol modules that allow external connections. Especially interesting for you might be the TCP or UDP modules, because they are completely agnostic to the underlying hardware/software implementation and work both on localhost and over the network.
As an additional bonus, it should be quite easy to simulate a session for regression testing by running a fake server written in minimal Perl.
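A minimal sketch of that fake-server idea, assuming a plain TCP stream of raw s16le samples (the payload and the ephemeral-port choice are made up for the demo; a real PulseAudio protocol module has its own framing unless configured for raw output):

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Minimal fake "audio server" for regression tests: it accepts one
# connection and streams a fixed pattern of raw s16le samples, so client
# code can be exercised without a real PulseAudio daemon running.
my $server = IO::Socket::INET->new(
    LocalAddr => '127.0.0.1',
    LocalPort => 0,            # let the OS pick a free port
    Listen    => 1,
    Proto     => 'tcp',
) or die "listen: $!";
my $port = $server->sockport;

my $pid = fork() // die "fork: $!";
if ( $pid == 0 ) {    # child: the fake server
    my $client = $server->accept or exit 1;
    print {$client} pack( 's<*', (0) x 1600 );    # 1600 samples of silence
    close $client;
    exit 0;
}

# parent: the code under test connects as it would to the real server
my $conn = IO::Socket::INET->new(
    PeerAddr => '127.0.0.1',
    PeerPort => $port,
    Proto    => 'tcp',
) or die "connect: $!";
binmode $conn;
my ( $buf, $chunk ) = ( '', '' );
$buf .= $chunk while read( $conn, $chunk, 4096 );
waitpid $pid, 0;
printf "received %d bytes\n", length $buf;    # prints "received 3200 bytes"
```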
On a modern Linux system you have the option to run PulseAudio; on many desktop installations it's the default.
On a really modern Linux system, you have the option to run PipeWire. That's what I would look at for any new audio project today. It's now the default on at least Pop!_OS, Fedora, Ubuntu and Debian.
In writing an audio application for Linux, you can target either ALSA or JACK. PulseAudio and PipeWire present an ALSA device to applications, and PW presents a JACK device as well.
The choice of audio server really depends on your use case. The OP's application doesn't require any of the features of PulseAudio or PipeWire; ALSA alone would be sufficient. JACK provides the stability and predictable latency needed for professional music production, while allowing easy patching among multiple applications. PA and PW are convenient for running multiple independent applications, such as browsers and media players, where central controls for volume, muting, etc. are desirable. PW does support JACK, but it still lacks the latency compensation among multiple signal paths that Ardour (a free, pro-quality DAW) provides.
Re: Audio input and processing - recommendations
by Ashx (Novice) on Nov 26, 2023 at 21:29 UTC
Thanks for all the recommendations.
I will go the FFmpeg way, running it on bare ALSA for now. (And if not, it is just a matter of changing some parameters for the FFmpeg input.)
The latency of such a solution does not bother me too much at this stage, but I want to address another issue:
I want the processing of the audio stream to be continuous and uninterrupted.
Let's say I can do some processing within the file-read loop; but after the time interval (30 seconds) I have to re-run FFmpeg. I might miss part of the audio completely, or cut it in the middle of a note, which would then lead to an error in the data-processing algorithm.
Also, if I try to do a burst of intensive processing at some point within the interval, I might stall the file-read loop and miss some data then too.
I have seen an FFmpeg module on CPAN, https://metacpan.org/pod/FFmpeg - I wonder if it could be of any use, or better in any way, than reading FFmpeg output through a pipe?
For now I'm just experimenting with FFmpeg and getting output from it.
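One way to avoid the 30-second gap is to drop the -t 30 option so ffmpeg runs until stopped, and to wrap the pipe in a re-open loop so the processing survives an unexpected producer exit. A sketch with a stand-in producer command (substitute your real ffmpeg invocation; read_chunks and the chunk count are made up for the demo):

```perl
use strict;
use warnings;

# Read $want_chunks chunks from a producer command, transparently
# re-starting the producer if it exits. With ffmpeg you would drop -t 30
# entirely and only reach the restart if the process dies unexpectedly.
sub read_chunks {
    my ( $want_chunks, $handle_chunk, @cmd ) = @_;
    my $seen = 0;
    while ( $seen < $want_chunks ) {
        open my $fh, '-|', @cmd or die "cannot run @cmd: $!";
        binmode $fh, ':raw';
        while ( $seen < $want_chunks and read( $fh, my $buf, 3200 ) ) {
            $handle_chunk->($buf);    # do the processing here
            $seen++;
        }
        close $fh;
    }
    return $seen;
}

# Demo with a stand-in producer that emits one chunk per run, which
# forces the restart path (replace with your real ffmpeg command line):
my @producer = ( $^X, '-e', 'binmode STDOUT; print "\0" x 3200' );
my $total = 0;
read_chunks( 3, sub { $total += length $_[0] }, @producer );
print "$total\n";    # prints 9600: three chunks across three producer runs
```

For the other worry (a burst of processing stalling the read loop), the usual remedies are to do the heavy work in a separate process or thread and hand chunks over via a queue, so the reader itself never blocks.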
I guess I'll mention VideoLAN::LibVLC, though I only implemented callbacks to process video frames. The design is all set to be able to handle the audio frames, but that code isn't written yet. It would help with the real-time aspect though, if you wanted details like the timestamp of the audio frame at the time it was decoded/captured. If you go that route I'm happy to offer advice on the code, but I don't really have time to develop it myself.
Also, I haven't tested it on new versions of libvlc since 2019, so there might be even more work to do, depending on which version of libvlc you want to use.
I'd also recommend getting familiar with Inline::C if you aren't already.