|Keep It Simple, Stupid|
Let's simplify this to a decision problem to get started.
Let's suppose we have n SRT-files with different time-stamps, and one is a perfect match to a given soundtrack.
Now we want to rank them by how well they fit. (That's actually a real-life scenario.)
With SRT-files I can easily identify the non-speech gaps, for example the 1.1 secs between 00:05:15,300 and 00:05:16,400.
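To make the gap idea concrete, here is a minimal Python sketch of how the gaps could be read out of an SRT file. It assumes the usual `HH:MM:SS,mmm --> HH:MM:SS,mmm` cue line. The names `speech_intervals`, `gaps` and the `min_gap` cut-off are mine, purely for illustration.

```python
import re

# One SRT cue line looks like: 00:05:13,000 --> 00:05:15,300
TIME = r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})"
CUE = re.compile(TIME + r"\s*-->\s*" + TIME)

def to_seconds(h, m, s, ms):
    """Convert the four timestamp fields to seconds as a float."""
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

def speech_intervals(srt_text):
    """Return sorted (start, end) pairs, one per subtitle cue."""
    out = []
    for match in CUE.finditer(srt_text):
        fields = match.groups()
        out.append((to_seconds(*fields[:4]), to_seconds(*fields[4:])))
    return sorted(out)

def gaps(intervals, min_gap=0.5):
    """Return (start, end) of pauses between cues lasting at least min_gap seconds.

    min_gap is a hypothetical cut-off to ignore very short pauses.
    """
    result = []
    prev_end = None
    for start, end in intervals:
        if prev_end is not None and start - prev_end >= min_gap:
            result.append((prev_end, start))
        # Track the latest end seen so far, so overlapping cues do not create fake gaps.
        prev_end = end if prev_end is None else max(prev_end, end)
    return result
```

Fed two cues that end at 00:05:15,300 and start at 00:05:16,400, `gaps` should report one gap of roughly 1.1 secs.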
I could check how the gaps of those n SRTs overlap with "silent" passages in the soundtrack (e.g. an XOR metric) and rank the SRTs by proximity.
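One possible reading of such an XOR metric, as a rough Python sketch: add up the time during which exactly one of the two (SRT gap, audio silence) is active, and rank the candidates by that total, smallest first. It assumes both inputs are sorted lists of `(start, end)` pairs in seconds that do not overlap within a list. The function names are hypothetical.

```python
def total_length(intervals):
    """Sum of the durations of all intervals."""
    return sum(end - start for start, end in intervals)

def overlap_length(a, b):
    """Total time covered by both a and b (each sorted, non-overlapping)."""
    i = j = 0
    total = 0.0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if hi > lo:
            total += hi - lo
        # Advance whichever interval finishes first.
        if a[i][1] < b[j][1]:
            i += 1
        else:
            j += 1
    return total

def xor_mismatch(srt_gaps, silences):
    """Time where exactly one of the two is 'silent': |A| + |B| - 2|A and B|."""
    return (total_length(srt_gaps) + total_length(silences)
            - 2 * overlap_length(srt_gaps, silences))

def rank(candidates, silences):
    """candidates maps an SRT name to its gap list; best fit comes first."""
    return sorted(candidates, key=lambda name: xor_mismatch(candidates[name], silences))
```

A perfectly matching SRT would score 0 here. Whether plain XOR duration is the best measure of proximity is an open question; this only illustrates the idea.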
Question: how can I technically get the timestamps of silent passages of a soundtrack?
Let's define silent as falling below a certain volume threshold after filtering frequencies.
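Just to pin down that definition, a naive Python sketch: cut the decoded audio into short windows, compute the RMS volume per window, and report the stretches that stay below the threshold. It assumes the samples are already decoded to floats in [-1, 1] and it leaves out the frequency filtering step entirely. The parameters `threshold`, `window` and `min_len` are guesses that would need tuning.

```python
import math

def silent_passages(samples, rate, threshold=0.02, window=0.05, min_len=0.3):
    """Return (start, end) times in seconds where the RMS stays below threshold.

    samples: sequence of floats in [-1, 1] (mono), rate: samples per second.
    threshold, window and min_len are illustrative defaults, not tuned values.
    """
    size = max(1, round(rate * window))      # samples per analysis window
    passages = []
    start = None                             # start time of the current quiet run
    n_windows = math.ceil(len(samples) / size)
    for w in range(n_windows):
        chunk = samples[w * size:(w + 1) * size]
        rms = math.sqrt(sum(x * x for x in chunk) / len(chunk))
        t = w * size / rate                  # start time of this window
        if rms < threshold:
            if start is None:
                start = t
        elif start is not None:
            if t - start >= min_len:
                passages.append((start, t))
            start = None
    # Close a quiet run that lasts until the end of the audio.
    end = len(samples) / rate
    if start is not None and end - start >= min_len:
        passages.append((start, end))
    return passages
```

The resulting list has the same `(start, end)` shape as the SRT gaps, so the two can be compared directly.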
In reply to Re^2: Adjust synchronizations of video and subtitles automatically by temporal distribution