in reply to Split MP3 file at silence

I think the likely reason your mp3 is being split into only 2 or 3 parts rather than the expected 7 is that the silences are not long enough to be detected. The first thing I would do is load the audio files into a waveform editor such as Audacity, look at the waveform, and get an idea of the loudness level and the length of each pause between words.

I note from the docs for Audio::FindChunks that it does a lot of smoothing. Perhaps it does too much, to the point that consecutive words are smoothed together. Can you adjust the smoothing parameters so that only much shorter periods of audio are smoothed together?

A few years ago I was cutting mp3 files, but unlike you I did not have handy periods of silence to mark the cut points. Instead I had to listen to the audio and look at it in a wave editor until I could identify the cut points by sight. I then typed the time offsets into a cut list and used a lossless mp3 cutter to do the cuts. After a while I became quite skilled at finding the cut points and typing them in, so if your number of mp3 files is reasonable, you might try that approach instead of spending more time programming.

Re^2: Split MP3 file at silence
by Dirk80 (Pilgrim) on Apr 25, 2011 at 20:11 UTC

    I'm trying and trying, but I don't get it.

    First, thank you for the hint about Audacity. The tool even has a silence/noise detection feature, so I was able to view my mp3 file graphically and see how Audacity would split it. Look here: bere.png. My problem with Audacity was then that I could not find out how to save the relevant (signal) parts to separate mp3 files.

    Here is my interpretation of the parameters I used in Audio::FindChunks:

    • min_silence_sec: the number of seconds a quiet stretch must last before it is regarded as silence. I set this value to 0.4 because, according to the waveform display, the silences seem to be about 0.5 seconds long. I wanted a value long enough to match only the desired silences but shorter than 0.5 seconds.
    • min_signal_sec: the number of seconds a loud stretch must last before it is regarded as signal. The shortest signal seems to be about 0.6 seconds and the longest about 1.2 seconds, so I set this value to 0.5 seconds to be sure that each signal is recognized as signal.
    • above_thres_window: one of the smoothing parameters you recommended I change. The unit is chunks. If I understand it correctly, one chunk is 0.1 seconds. I do not really understand this parameter. I played around with it, and my experience was that a high value gives me a longer chunk, while a very low value (e.g. 1) gives me a shorter chunk.
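    To make my reading of the first two parameters concrete, here is a minimal sketch in plain Perl. It is only my mental model, not the algorithm Audio::FindChunks actually uses: the helper find_signals, the threshold, the 0.1-second chunk length, and the loudness values are all made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (NOT part of Audio::FindChunks): turn per-chunk
# loudness values into signal intervals, using two minimum durations.
sub find_signals {
    my (%arg) = @_;
    my $levels    = $arg{levels};       # array ref, one loudness value per chunk
    my $chunk_sec = $arg{chunk_sec};    # assumed chunk length in seconds
    my $eps       = 1e-9;               # guard against floating-point rounding

    # Step 1: group consecutive chunks into runs of "loud" or "quiet".
    my @runs;                           # each run: [is_loud, start_index, length]
    for my $i (0 .. $#$levels) {
        my $loud = $levels->[$i] > $arg{threshold} ? 1 : 0;
        if (@runs && $runs[-1][0] == $loud) { $runs[-1][2]++ }
        else                                { push @runs, [$loud, $i, 1] }
    }

    # Step 2: a quiet run counts as silence only if it lasts at least
    # min_silence_sec; shorter quiet runs are merged into the signal.
    my @signals;                        # each signal: [start_index, end_index)
    for my $run (@runs) {
        my ($loud, $start, $len) = @$run;
        next if !$loud && $len * $chunk_sec >= $arg{min_silence_sec} - $eps;
        if (@signals && $signals[-1][1] == $start) { $signals[-1][1] = $start + $len }
        else                                       { push @signals, [$start, $start + $len] }
    }

    # Step 3: drop signals shorter than min_signal_sec, convert to seconds.
    return map  { [$_->[0] * $chunk_sec, $_->[1] * $chunk_sec] }
           grep { ($_->[1] - $_->[0]) * $chunk_sec >= $arg{min_signal_sec} - $eps }
           @signals;
}

# Made-up loudness data: three words, the first two separated by only 0.2 s.
my @levels = ((0) x 5, (9) x 6, (0) x 2, (9) x 6, (0) x 5, (9) x 7, (0) x 5);

for my $min_silence (0.4, 0.2) {
    my @found = find_signals(
        levels          => \@levels,
        chunk_sec       => 0.1,
        threshold       => 1,
        min_silence_sec => $min_silence,
        min_signal_sec  => 0.5,
    );
    printf "min_silence_sec=%.1f: %d signals\n", $min_silence, scalar @found;
}
```

    With this made-up data, a min_silence_sec of 0.4 merges the first two words into one signal (2 signals in total), while 0.2 keeps all three apart.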

    So here is my code:

    use strict;
    use warnings;
    use Audio::FindChunks;

    # Detect the chunks in bere.mp3 and write each one to its own file
    Audio::FindChunks->new(
        filename           => 'bere.mp3',
        min_silence_sec    => 0.4,
        min_signal_sec     => 0.5,
        above_thres_window => 5,
    )->split_file({ verbose => 1 });

    But in this case I get only one file as a result, running from 0 to 4.6 seconds.

    As for your other suggestion, to do it manually: that is not really practical for me. I want to write a conjugation/vocabulary trainer for myself to learn Italian. I bought a piece of software that creates mp3 files from a text file, but it is too much work to feed it a text file containing only one word each time. Because this tool has no command line or other means of automation, I decided to enter one huge text file, then split the resulting huge mp3 file at the silence points and name each part after the corresponding line in the text file.

    In the end my goal is to have a large vocabulary, so correct splitting at the silence points is my biggest problem at the moment.

    Any other hint on how I could do this is welcome. My favourite solution is of course Perl, but I'd also be happy with any other way to solve it.

      I think you need to use much shorter time intervals for the min_silence_sec and min_signal_sec parameters. I don't know how the internals of that module work or how it does silence detection, but it is quite possible that it chops the audio up into chunks that are min_silence_sec long and tests each one to see whether it is silent. In practice most of those chunks will contain some signal, so the module will not detect much silence.
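      To illustrate that guess (and it is only a guess about the internals), here is a small plain-Perl model of fixed, grid-aligned windows. The function count_silent_windows, the 0.1-second chunk length, the threshold, and the loudness values are all hypothetical and have nothing to do with the module's real code.

```perl
use strict;
use warnings;

# Hypothetical model: cut the audio into fixed windows of $window_chunks
# chunks each and count the windows in which every chunk is quiet.
sub count_silent_windows {
    my ($levels, $window_chunks, $threshold) = @_;
    my $silent = 0;
    for (my $start = 0; $start + $window_chunks <= @$levels; $start += $window_chunks) {
        my @window = @{$levels}[$start .. $start + $window_chunks - 1];
        # A single loud chunk is enough to disqualify the whole window.
        $silent++ unless grep { $_ > $threshold } @window;
    }
    return $silent;
}

# Made-up data, 0.1 s per chunk: 1.3 s of speech, a 0.5 s pause, 1.2 s of speech.
my @levels = ((9) x 13, (0) x 5, (9) x 12);

for my $window_chunks (4, 2, 1) {
    printf "window of %.1f s: %d silent windows\n",
        $window_chunks * 0.1,
        count_silent_windows(\@levels, $window_chunks, 1);
}
```

      In this model the 0.5-second pause is never seen by a 0.4-second window, because neither window that overlaps the pause lies entirely inside it. Windows of 0.2 and 0.1 seconds find 2 and 5 silent windows respectively.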

      If I were you, I would write a script that tries out lots of different values for those three parameters and tabulates the results according to the number of chunks found. You know that there ought to be 7 chunks in the sample you linked to, so just try out lots of values for min_signal_sec and min_silence_sec between 0.01 sec and 0.5 sec, and in each case count the number of chunks you get back. Ideally you should try all combinations and build a table, or perhaps use some sort of genetic algorithm to home in on the correct set of settings.

      As you have already found, if min_silence_sec and min_signal_sec are too long, not enough chunks are found. If they are too small, your audio will be split in too many places. Between those extremes there will probably be a wide range of values that work correctly. From there you can feed your test script some more audio files until you know which values work reliably.

      The mp3 chunk size is 1/25 of a second. It is in the docs for MP3::Split by the same author as Audio::FindChunks.
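      If that chunk size also applies to Audio::FindChunks (an assumption on my part, I have not checked), then converting between seconds and chunks is simple arithmetic. The two helper names below are made up for this example:

```perl
use strict;
use warnings;

# Assumption: 25 chunks per second, i.e. one chunk = 1/25 s = 0.04 s.
my $CHUNKS_PER_SEC = 25;

# Hypothetical helpers for converting between seconds and whole chunks.
sub sec_to_chunks { my ($sec)    = @_; return int($sec * $CHUNKS_PER_SEC + 0.5) }
sub chunks_to_sec { my ($chunks) = @_; return $chunks / $CHUNKS_PER_SEC }

printf "min_silence_sec => 0.4  is about %d chunks\n", sec_to_chunks(0.4);
printf "above_thres_window => 5 is about %.2f seconds\n", chunks_to_sec(5);
```

      Under that assumption, a window of 5 chunks covers 0.2 seconds rather than the 0.5 seconds it would cover with 0.1-second chunks.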

        I have now tried to automatically find all parameter combinations that produce 7 mp3 files.

        Here is the code:

        use strict;
        use warnings;
        use Audio::FindChunks;

        # Candidate values: 0.00 to 0.50 seconds in steps of 0.01
        my @min_silence_sec_list    = map { $_ / 100 } (0 .. 50);
        my @min_signal_sec_list     = map { $_ / 100 } (0 .. 50);
        my @above_thres_window_list = (0, 1, 2, 3, 4);

        open(my $CSV_FH, ">", "split_results.csv") or die $!;

        for my $min_silence_sec (@min_silence_sec_list) {
            for my $min_signal_sec (@min_signal_sec_list) {
                for my $above_thres_window (@above_thres_window_list) {
                    Audio::FindChunks->new(
                        filename           => 'bere.mp3',
                        min_silence_sec    => $min_silence_sec,
                        min_signal_sec     => $min_signal_sec,
                        above_thres_window => $above_thres_window,
                    )->split_file();

                    # Count the files produced by this run, then clean up
                    my @split_mp3_files = glob("*_bere.mp3");
                    my $num_chunks      = scalar(@split_mp3_files);
                    unlink(@split_mp3_files);

                    # Record only the combinations that yield the expected 7 chunks
                    if ($num_chunks == 7) {
                        print $CSV_FH "$min_silence_sec;$min_signal_sec;$above_thres_window\n";
                    }
                }
            }
        }

        close($CSV_FH);

        Afterwards I tried some of these combinations, but the results are not good. Often the split happens in the middle of a word.

        It is difficult to see how to continue with this module.

        But I learned one obvious thing from you, and I don't know why I did not think of it myself. At first I was trying the possible combinations by hand. You are right that this should be done automatically with a script, as I have now done.

        The good news is that I now know how to solve the problem with Audacity. It has a "Sound Finder" plugin, and this works perfectly with all my mp3 files.

        I'm still interested in how to solve it with the Audio::FindChunks module, but I don't know how to continue. Perhaps I could take more parameters into account and try more combinations, or even compare the split times with the correct split times I got from Audacity.
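        The comparison idea could look roughly like the sketch below. It is plain Perl and independent of Audio::FindChunks; the function split_error is a name I made up, and all the times are invented numbers for illustration, not measurements from bere.mp3.

```perl
use strict;
use warnings;
use List::Util qw(sum);

# Mean absolute difference (in seconds) between found and reference split
# times. Returns undef when there are no reference times or when the
# number of split points differs.
sub split_error {
    my ($found, $reference) = @_;
    return undef unless @$reference && @$found == @$reference;
    my @f = sort { $a <=> $b } @$found;
    my @r = sort { $a <=> $b } @$reference;
    return sum(map { abs($f[$_] - $r[$_]) } 0 .. $#r) / @r;
}

# Invented example data: reference split times (as they might come from
# Audacity) and the split times produced by three parameter combinations.
my @reference  = (0.9, 2.1, 3.4);
my %candidates = (
    '0.10;0.20;2' => [0.8, 2.2, 3.4],
    '0.30;0.40;4' => [1.4, 2.1, 3.0],
    '0.40;0.50;5' => [2.1],
);

for my $params (sort keys %candidates) {
    my $err = split_error($candidates{$params}, \@reference);
    printf "%s -> %s\n", $params,
        defined $err ? sprintf('%.3f s mean error', $err) : 'wrong number of splits';
}
```

        Ranking the combinations by this error, instead of only checking for 7 files, would separate settings that split between words from settings that happen to produce the right count while cutting in the middle of a word.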