dimar has asked for the wisdom of the Perl Monks concerning the following question:

GK likes to record all of his university class lectures to MP3 files. The problem:

The Question

In the realm of 'audio munging' (as opposed to text munging), what is the correct terminology for the concepts of Clipping out Dead Air, Annotating, and Concatenating? Since GK knows this is easy with perl in the text world, is there an analogous simple perl solution in the audio world? What's a good place to start?

Re: MP3 Concatenation
by andyf (Pilgrim) on Jun 11, 2004 at 19:34 UTC
    Compression

    Sound files come in two distinct flavours, linear and compressed. Linear files are .wav and family, while compressed files include .mp3 and others. The main difference is that you can directly edit a linear sound file: each value in the simple time-ordered list is an amplitude value (a sample) for that sound. Compressed formats are composed of frames. Even if you can make an edit exactly on a frame boundary, the result is often terrible clicks, because the uncompressed (recovered) value of a frame depends on its predecessors; hence arbitrary frame sequences may be illegal. The preferred method in all cases is to convert to a simple time-domain form, then edit, then convert back to mp3.
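
    To illustrate why the linear form is the easy one to edit, here is a minimal Perl sketch. It assumes you already have raw 16-bit signed little-endian mono PCM in a string (sample data only, no WAV header); the helper name cut_span is made up for this example, and decoding from and re-encoding to mp3 is left to an external tool.

```perl
use strict;
use warnings;

# Hypothetical helper: remove the span [$from_s, $to_s) seconds from raw
# 16-bit signed little-endian mono PCM (sample data only, no WAV header).
sub cut_span {
    my ($pcm, $rate, $from_s, $to_s) = @_;
    my @samples = unpack 's<*', $pcm;      # one amplitude value per sample
    my $from = int($from_s * $rate);
    my $to   = int($to_s   * $rate);
    $from = @samples if $from > @samples;  # clamp both ends to the clip
    $to   = @samples if $to   > @samples;
    splice @samples, $from, $to - $from if $to > $from;
    return pack 's<*', @samples;
}

# Demo: 8 samples at a toy rate of 4 Hz; cut from 0.5 s to 1.5 s.
my $pcm  = pack 's<*', 0 .. 7;
my @kept = unpack 's<*', cut_span($pcm, 4, 0.5, 1.5);
print "@kept\n";                           # prints: 0 1 6 7
```

    Because every sample stands alone, a cut is just an array splice. Nothing comparable is safe on mp3 frames, for the reason given above.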

    Editing

    The practical name for your first goal, trimming out dead air, is auto-trimming. The Windows application Sound Forge (which also imports and exports .mp3 files) has a very nice built-in function to remove silence. You should also look at the open source Audacity, which is scriptable in Lisp (through its Nyquist plug-in language). Audio editing terminology is generally 'cutting' a piece of audio out and 'splicing' a piece of audio in. A 'butt join' (pure concatenation) is rare; usually you will use a crossfade. The parameters of each audio segment are therefore a start time, an end time, and a transfer function which says how the clip you have cut will fade into the next. Removing silence works by scanning the file to find the noise floor, that is, the lowest average signal level measured as rms (root mean square), and then chopping out all the bits where the signal falls below a threshold set relative to it. You can get special plugins that are very effective at doing this on voice signals (exactly your stated application). My gut instinct is _not_ to try signal processing in Perl, but I haven't tried it (yet) and don't speak from experience. For batch processing of offline files it might be effective but slow.
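
    As a rough illustration of a splice with a crossfade, here is a minimal Perl sketch operating on plain lists of samples. The helper name crossfade is made up, and a straight linear ramp is assumed as the transfer function; real editors offer other curves.

```perl
use strict;
use warnings;

# Hypothetical helper: join two clips (array refs of samples) with a linear
# crossfade over $n samples. $n == 0 gives a plain butt join.
sub crossfade {
    my ($first, $second, $n) = @_;
    $n = @$first  if $n > @$first;              # cannot overlap more than
    $n = @$second if $n > @$second;             # either clip contains
    my @out = @{$first}[0 .. $#$first - $n];    # untouched head of clip 1
    for my $i (0 .. $n - 1) {
        my $gain = ($i + 1) / ($n + 1);         # ramps up across the overlap
        push @out, (1 - $gain) * $first->[@$first - $n + $i]
                 +      $gain  * $second->[$i];
    }
    push @out, @{$second}[$n .. $#$second];     # untouched tail of clip 2
    return \@out;
}

# Demo: fade a constant clip into silence over 3 samples.
my $joined = crossfade([10, 10, 10, 10], [0, 0, 0, 0], 3);
print "@$joined\n";                             # prints: 10 7.5 5 2.5 0
```

    The output is shorter than the two clips laid end to end by exactly the overlap length, which is worth remembering if you keep a list of timestamps.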



    Note: using a very high threshold as a parameter to auto-trimming has an interesting side effect: you actually remove the silences between words and alter the timing of spoken phrases. Normally (for TV and radio) this would be totally unacceptable, but as a study aid the upshot is that you get much better retention, because the running time of the lecture's salient features is possibly 20% or less of the original.
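
    To make the threshold effect concrete, here is a minimal Perl sketch of rms gating on a list of samples. The helper name drop_quiet, the fixed window size, and the hand-picked thresholds are all assumptions for illustration; a real auto-trimmer would estimate the noise floor itself and smooth the cut points.

```perl
use strict;
use warnings;

# Hypothetical helper: split samples into windows of $win samples and keep
# only the windows whose RMS level reaches $threshold.
sub drop_quiet {
    my ($samples, $win, $threshold) = @_;
    my @kept;
    for (my $start = 0; $start < @$samples; $start += $win) {
        my $end = $start + $win - 1;
        $end = $#$samples if $end > $#$samples;   # last window may be short
        my @window = @{$samples}[$start .. $end];
        my $sum_sq = 0;
        $sum_sq += $_ ** 2 for @window;
        my $rms = sqrt($sum_sq / @window);        # root mean square level
        push @kept, @window if $rms >= $threshold;
    }
    return \@kept;
}

# Demo: loud speech, faint noise, quieter speech, digital silence.
my @samples = ( 100, -100, 100, -100,   2, -2, 2, -2,
                 30,  -30,  30,  -30,   0,  0, 0,  0 );
print scalar(@{ drop_quiet(\@samples, 4, 10) }), "\n";   # prints: 8
print scalar(@{ drop_quiet(\@samples, 4, 50) }), "\n";   # prints: 4
```

    Raising the threshold from 10 to 50 also throws away the quieter speech window, which is the timing-altering side effect described in the note.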
Re: MP3 Concatenation
by halley (Prior) on Jun 11, 2004 at 16:48 UTC
    I would say the term you're looking for might be "production" or "post-production." Some of the production work can be automated, and some cannot.

    MP3 format is a stream of chunks. I think for the most part, chunks are independent. There are modules that let you find the chunks and then you can order or redact them any way you like.

    However, even if you decode each MP3 chunk to examine it as a waveform, it's not necessarily a computationally trivial task to decide if the chunk is "useful" or "chaff" which you can skip.

    --
    [ e d @ h a l l e y . c c ]

      Hiya halley

      Some of the production work can be automated, and some cannot.
      OK, one assumption was that it would be necessary to first find the relevant 'useful' and 'chaff' sections by hand (e.g. by writing down the times, or by inserting a 'marker' inside the MP3 files). Of course this assumes there is a way to correlate the 'chunks' with time, or to add 'markers' to the chunks. Preliminary web searching seems to indicate that .wav files may support this, and that perl may not be a good way to go for this kind of production process. That would be a shame.
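
      On correlating chunks with time, here is a minimal Perl sketch. It relies on the fact that an MPEG-1 Layer III frame always carries 1152 samples, so at a known sample rate a frame index and a timestamp can be converted into each other. The helper names are made up, and the sketch assumes a single constant sample rate and no tags or other non-audio data ahead of the first frame.

```perl
use strict;
use warnings;

my $SAMPLES_PER_FRAME = 1152;    # fixed for MPEG-1 Layer III

# Hypothetical helper: index of the frame that contains a given time.
sub frame_for_time {
    my ($seconds, $rate) = @_;
    return int($seconds * $rate / $SAMPLES_PER_FRAME);
}

# Hypothetical helper: start time, in seconds, of a given frame.
sub time_for_frame {
    my ($frame, $rate) = @_;
    return $frame * $SAMPLES_PER_FRAME / $rate;
}

# Demo: where does the one-minute mark fall in a 44.1 kHz recording?
my $frame = frame_for_time(60, 44100);
print "frame $frame\n";                                  # prints: frame 2296
printf "starts at %.3f s\n", time_for_frame($frame, 44100);
```

      At 44.1 kHz one frame lasts roughly 26 ms, so hand-noted times can only be matched to the nearest frame, and cutting there still runs into the frame-dependency problem described earlier in the thread.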

      Any additional insights welcome!

        This is how commercial cutting is done with the various open-source PVRs. The closed-source (pay) Snapstream does it automatically with pretty good success.

        I wonder if the start of the static can be detected (much like song start detection in many of the mp3 rippers) and marked for cutting.

        No one has seen what you have seen, and until that happens, we're all going to think that you're nuts. - Jack O'Neil, Stargate SG-1

Re: MP3 Concatenation
by karlgoethebier (Abbot) on Jun 05, 2017 at 09:46 UTC
    "...is there an analogous simple perl solution in the audio world..."

    It isn't there. Use some audio editing software. On the Mac I use Amadeus. The manual is a good place to start even if you use another tool. Audacity is a pain in the ass IMHO. SoundForge was a solid solution for decades on the PC, but I don't use PCs anymore.

    Regards, Karl

    «The Crux of the Biscuit is the Apostrophe»

    Furthermore I consider that Donald Trump must be impeached as soon as possible

Re: MP3 Concatenation
by Anonymous Monk on Jun 05, 2017 at 13:38 UTC

    You can concatenate MP3 files simply by appending the raw bytes of one file to the next. Some MP3 players might not like the result, though, if the files were recorded at different bit rates. And you might want to strip off any ID3 headers/trailers (see id3.org for more information).
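
    A minimal Perl sketch of that approach follows. The helper names strip_id3 and concat_mp3 are made up for this example. It assumes each file has at most one ID3v2 tag at the very start (ten-byte header with a four-byte syncsafe size) and at most one ID3v1 tag as the last 128 bytes (beginning with "TAG"); other kinds of tags and any encoder info frames are left untouched.

```perl
use strict;
use warnings;

# Hypothetical helper: return MP3 data with a leading ID3v2 tag and a
# trailing ID3v1 tag removed, if present.
sub strip_id3 {
    my ($data) = @_;
    # ID3v2 header: "ID3", 2 version bytes, 1 flags byte, 4 syncsafe size bytes
    if (length($data) >= 10 && substr($data, 0, 3) eq 'ID3') {
        my ($flags, @size) = unpack 'x5 C C4', $data;
        my $len = 10 + (($size[0] << 21) | ($size[1] << 14)
                      | ($size[2] << 7)  |  $size[3]);
        $len += 10 if $flags & 0x10;           # ID3v2.4 footer present
        $data = length($data) > $len ? substr($data, $len) : '';
    }
    # ID3v1: fixed 128-byte trailer beginning with "TAG"
    if (length($data) >= 128 && substr($data, -128, 3) eq 'TAG') {
        substr($data, -128) = '';
    }
    return $data;
}

# Hypothetical helper: write the stripped contents of each input to one output.
sub concat_mp3 {
    my ($out_path, @in_paths) = @_;
    open my $out, '>:raw', $out_path or die "$out_path: $!";
    for my $path (@in_paths) {
        open my $in, '<:raw', $path or die "$path: $!";
        local $/;                              # slurp the whole file
        print {$out} strip_id3(scalar <$in>);
        close $in;
    }
    close $out or die "$out_path: $!";
}

# Demo on fake data: a 5-byte ID3v2 tag body, "AUDIO", then an ID3v1 trailer.
my $tagged = "ID3\x03\x00\x00\x00\x00\x00\x05" . "xxxxx"
           . "AUDIO" . "TAG" . ("\0" x 125);
print strip_id3($tagged), "\n";                # prints: AUDIO
```

    Usage would be along the lines of concat_mp3('week1.mp3', 'part1.mp3', 'part2.mp3'), with file names that are of course only placeholders.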

    For anything more complicated than that, there are two problems: A) MP3 is a lossy compression format, so if you edit it several times, you will gradually degrade the quality of the audio. B) MP3 is an "international standard", which means that it's a PITA to find any information about it without paying hundreds of dollars for a nigh-incomprehensible specification document.

Re: MP3 Concatenation
by andyf (Pilgrim) on Jun 16, 2004 at 01:10 UTC