Dirk80 has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks,

I have a lot of mp3 files like bere.mp3 with the corresponding text file like bere.txt.

My goal is to split each mp3 file at the silences. If this is done correctly, each line of the text file should get its own mp3 file. In my example the result should be 7 mp3 files.

I found the module Audio::FindChunks. According to its description it is exactly what I need.

I read the documentation of this module, but I did not fully understand each parameter, so I had to resort to trial and error.

I took the split_file function of this module and played around with the following parameters.

# Splitting into runs of signal/noise
max_tracks, min_signal_sec, min_silence_sec, ignore_signal_sec, min_silence_chunks_merge

But I did not get the correct result. Sometimes no split happened at all. Sometimes the file was split into only 2 or 3 mp3 files. But I never managed to split it into 7 mp3 files.

Thank you for your help.

Greetings,

Dirk

Replies are listed 'Best First'.
Re: Split MP3 file at silence
by educated_foo (Vicar) on Apr 23, 2011 at 22:24 UTC
    I haven't used this module, or tried to solve this problem, but I would try timing the gaps between verb clauses by hand; they all seem about the same. You can then use that to set min_silence_sec, then play around with the smoothing parameters.
Re: Split MP3 file at silence
by chrestomanci (Priest) on Apr 24, 2011 at 19:36 UTC

    I think the likely reason that your mp3 is only being split into 2 or 3 parts rather than the expected 7 is that the silence is not long enough to be detected. The first thing I would do would be to load the audio files into a waveform editor such as audacity, look at the waveform and get an idea of the loudness levels and lengths of each pause between words.

    I note from the docs for Audio::FindChunks that it does a lot of smoothing. Perhaps it does too much, to the point that consecutive words are smoothed together. Can you adjust the smoothing parameters so that only much shorter periods of audio are smoothed together?

    A few years ago, I was cutting mp3 files, but unlike you I did not have handy periods of silence to mark the cut points. Instead I had to listen to the audio and look at it in a wave editor until I could identify cut points by sight. I then typed the time offsets into a cut list and used a lossless mp3 cutter to do the cuts. After a while I became quite skilled at finding the cut points and typing them in, so if your number of mp3 files is reasonable, you might try that approach instead of spending more time programming.

      I'm trying and trying, but I don't get it.

      First, thank you for the hint about Audacity. This tool even offers silence/noise recognition, so I was able to see my mp3 file graphically and how Audacity would split it. Look here: bere.png. My problem with Audacity was that I could not find out how to save the relevant (signal) parts into separate mp3 files.

      Here is my interpretation of the parameters I used in Audio::FindChunks:

      • min_silence_sec: the number of seconds a quiet stretch must last before it is regarded as silence. I set this value to 0.4 because, according to the graphical display, the silences seem to be about 0.5 seconds long. I wanted a value long enough to catch only the desired silences but shorter than 0.5 seconds.
      • min_signal_sec: the number of seconds a loud stretch must last before it is regarded as signal. The shortest signal seems to be about 0.6 seconds and the longest about 1.2 seconds, so I set this value to 0.5 seconds to be sure that each signal is recognized as signal.
      • above_thres_window: one of the smoothing parameters which you recommended I change. The unit is chunks. If I understand it correctly, one chunk is 0.1 seconds. I do not really understand this parameter. I played around with it, and my experience was that a high value gives me a longer chunk and a very low value (e.g. 1) gives me a shorter chunk.
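      To make the interplay of min_silence_sec and min_signal_sec concrete, here is a minimal sketch of threshold-based run detection. It is not the algorithm Audio::FindChunks uses (the module does additional smoothing); the sub name find_signal_runs, the threshold, the 0.1 second chunk length, and the loudness values are all assumptions made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical loudness values, one per 0.1 s chunk.
# These numbers are invented for illustration, not measured from bere.mp3.
my @level = (0,0, 9,8,9,9,7,8, 0,1,0,0,0, 8,9,9,8,9,9,9, 0,0,0,0,0, 7,9,8,8,9,9);

# Return a list of [first_chunk, last_chunk] pairs, one per signal run.
# A quiet stretch only ends a run once it has lasted min_silence_sec;
# a run is only kept if it lasted at least min_signal_sec.
sub find_signal_runs {
    my (%arg) = @_;
    # Convert seconds to whole chunks, rounding to avoid float surprises.
    my $min_silence = int($arg{min_silence_sec} / $arg{chunk_sec} + 0.5);
    my $min_signal  = int($arg{min_signal_sec}  / $arg{chunk_sec} + 0.5);
    my @loud = map { $_ > $arg{threshold} ? 1 : 0 } @{ $arg{levels} };

    my (@runs, $start, $last_loud);
    for my $i (0 .. $#loud) {
        if ($loud[$i]) {
            $start = $i unless defined $start;   # a new run begins
            $last_loud = $i;
        }
        elsif (defined $start && $i - $last_loud >= $min_silence) {
            push @runs, [$start, $last_loud];    # pause was long enough
            undef $start;
        }
    }
    push @runs, [$start, $last_loud] if defined $start;   # run open at the end
    return grep { $_->[1] - $_->[0] + 1 >= $min_signal } @runs;
}

for my $min_silence_sec (0.4, 0.6) {
    my @runs = find_signal_runs(
        levels          => \@level,
        chunk_sec       => 0.1,
        threshold       => 2,
        min_silence_sec => $min_silence_sec,
        min_signal_sec  => 0.5,
    );
    printf "min_silence_sec=%.1f: %d run(s)\n", $min_silence_sec, scalar @runs;
    printf "  %.1f - %.1f s\n", $_->[0] * 0.1, ($_->[1] + 1) * 0.1 for @runs;
}
```

      In this toy data every pause lasts 0.5 seconds, so a min_silence_sec that is even slightly longer than the real pauses merges everything into a single run, which resembles the "one file" symptom.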

      So here is my code:

      use strict;
      use warnings;
      use Audio::FindChunks;

      Audio::FindChunks->new(
          filename           => 'bere.mp3',
          min_silence_sec    => 0.4,
          min_signal_sec     => 0.5,
          above_thres_window => 5,
      )->split_file({ verbose => 1 });

      But in this case I get only one file as a result, which runs from 0 to 4.6 seconds.

      Regarding your suggestion to do it manually: this is not really feasible for me. I want to write a conjugation/vocabulary trainer for myself to learn Italian. I bought software that can create mp3 files from a text file, but it is too much work to always feed it a text file with only one vocabulary item. Because this tool has no command line or other automation options, I decided to enter one huge text file, then split the resulting huge mp3 file at the silence points and name each piece after the corresponding line in the text file.
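      The naming step described above could be sketched as follows, assuming the split has already produced one chunk file per line. The sub name plan_renames, the chunk file names, and the sample lines are hypothetical; the real output names depend on the splitter. The sketch only builds a rename plan and prints it, so nothing on disk is touched.

```perl
use strict;
use warnings;

# Pair each chunk file with the corresponding line of the text file and
# return a list of [old_name, new_name] pairs. Dies if the counts differ,
# because that means the split did not yield one chunk per line.
sub plan_renames {
    my ($chunks, $lines) = @_;
    die sprintf("Got %d chunks but %d lines\n", scalar @$chunks, scalar @$lines)
        unless @$chunks == @$lines;

    my @plan;
    for my $i (0 .. $#$chunks) {
        (my $name = $lines->[$i]) =~ s/^\s+|\s+$//g;   # trim whitespace
        $name =~ s/[^\w\-]+/_/g;                        # keep file names safe
        push @plan, [ $chunks->[$i], "$name.mp3" ];
    }
    return @plan;
}

# Hypothetical inputs: chunk names and text lines are made up.
my @chunks = ('bere_01.mp3', 'bere_02.mp3', 'bere_03.mp3');
my @lines  = ('io bevo', 'tu bevi', 'lui beve');

for my $pair (plan_renames(\@chunks, \@lines)) {
    print "$pair->[0] -> $pair->[1]\n";
    # rename $pair->[0], $pair->[1] or warn "rename failed: $!";
}
```

      Refusing to continue on a count mismatch is deliberate: a wrong split would otherwise attach the wrong word to every following file. Note that without decoding the text file, accented letters would be replaced by underscores.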

      In the end my goal is to have a lot of vocabulary. So splitting correctly at the silence points is my biggest problem at the moment.

      Any other hint on how I could do this is welcome. My favourite solution is of course Perl, but I'd also be happy with any other way to solve it.

        I think you need to use much shorter time intervals for the min_silence_sec and min_signal_sec parameters. I don't know how the internals of that module work or how it does silence detection, but it is quite possible that it chops the audio up into chunks that are min_silence_sec long and tests each one to see whether it is silent. In practice most of those chunks will contain some signal, so the module will not detect much silence.

        If I were you, I would write a script that tries out lots of different values for those three parameters and tabulates the results according to the number of chunks found. You know that there ought to be 7 chunks in the sample you linked to, so just try out lots of values for min_signal_sec and min_silence_sec between 0.01 sec and 0.5 sec, and in each case count the number of chunks you get back. Ideally you should try all combinations and build a table, or perhaps use some sort of genetic algorithm to home in on the correct set of settings.

        As you have already found, if min_silence_sec and min_signal_sec are too long, then not enough chunks are found. If they are too small, then your audio will get split in too many places, but there will probably be a wide range of values that work correctly. From there you can feed your test script some more audio files until you know which values work reliably.
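        A sweep like the one suggested above could be sketched as follows. To stay self-contained it runs against a synthetic loudness envelope rather than a real mp3, and count_chunks is a hypothetical stand-in with a simple threshold detector, not the module's algorithm. In real use you would replace it with code that runs the splitter with the given parameters and counts the resulting pieces.

```perl
use strict;
use warnings;

my $CHUNK_SEC = 0.01;   # resolution of the synthetic envelope (assumption)
my $THRESHOLD = 2;      # levels above this count as signal (assumption)

# Synthetic envelope: 7 "words" of 0.85 s, each with a short 0.05 s dip in
# the middle, separated by 0.5 s pauses. All values are invented.
my @level;
for (1 .. 7) {
    push @level, (0) x 50, (9) x 40, (0) x 5, (9) x 40;
}
push @level, (0) x 50;

# Hypothetical stand-in for the real splitter: count the signal runs found
# with the given minimum silence and signal lengths (in seconds).
sub count_chunks {
    my ($levels, $min_silence_sec, $min_signal_sec) = @_;
    my $min_silence = int($min_silence_sec / $CHUNK_SEC + 0.5);
    my $min_signal  = int($min_signal_sec  / $CHUNK_SEC + 0.5);

    my ($count, $start, $last_loud) = (0);
    my $close = sub {   # finish the current run, keep it only if long enough
        $count++ if $last_loud - $start + 1 >= $min_signal;
        undef $start;
    };
    for my $i (0 .. $#$levels) {
        if ($levels->[$i] > $THRESHOLD) {
            $start = $i unless defined $start;
            $last_loud = $i;
        }
        elsif (defined $start && $i - $last_loud >= $min_silence) {
            $close->();
        }
    }
    $close->() if defined $start;
    return $count;
}

my @silence_values = (0.01, 0.05, 0.1, 0.2, 0.4, 0.6);
my @signal_values  = (0.01, 0.1, 0.5, 1.0);

# Table: rows are min_silence_sec, columns are min_signal_sec.
printf "%-16s", 'silence \\ signal';
printf "%6.2f", $_ for @signal_values;
print "\n";
for my $sil (@silence_values) {
    printf "%-16.2f", $sil;
    printf "%6d", count_chunks(\@level, $sil, $_) for @signal_values;
    print "\n";
}
```

        Cells that show the expected count (7 here) mark parameter combinations worth trying on further files; a wide band of such cells suggests the settings are robust rather than a lucky hit.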

        The mp3 chunk size is 1/25 of a second. It is in the docs for MP3::Split by the same author as Audio::FindChunks.