http://qs1969.pair.com?node_id=11150042

I'm asking this as a meditation, because I don't expect any Perl (or other) code for that.

Questions:
I'm looking for a low-tech solution offering a handful of plausible adjustments to choose from, not a speech recognition bazooka (like YT's auto-subtitles).

Background:

I often download foreign-language movies and like to watch them with the original audio and subtitles to practice and learn vocabulary, but I'm often obliged to download subtitles and adjust their timing, because

there are already Perl modules to fix the first two cases for .srt files.

That is, if the parameters are known. But finding them can be tricky.

FWIW: VLC offers an option for such synchronization, but it tends to freeze for a minute if the shift is around 20 seconds. No fun when trying out the best settings.

Cheers Rolf
(addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
Wikisyntax for the Monastery

Re: Adjust synchronizations of video and subtitles automatically by temporal distribution
by afoken (Chancellor) on Jan 31, 2023 at 08:48 UTC
    Is it possible to filter speech frequencies in a video with significant accuracy to identify the passages where people talk?

    Telephony started with very bad microphones, transmitting barely anything outside the range 300 Hz to 3 kHz, but that was "good enough". Technical development improved the microphones, but analog telephony was and still is intentionally limited to that frequency range. Even when switching to ISDN, the sampling rate was only 8 kHz, which caps audio below 4 kHz (about 3.4 kHz in practice). Things changed only after migration to SIP, with "HD" audio codecs that allow higher frequencies, using more bandwidth and/or more available computing power.

    So I would expect that a filter with that frequency range could be a usable indicator for speech.

    Unfortunately, because the human ear is most sensitive in exactly this range, almost all audible warning signals also use that frequency range. So you will get some false positives. An FFT should be able to identify sharp peaks coming from all kinds of beepers and ignore those peaks.
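
A rough sketch of that idea in Python (standard library only). It measures how much of a short frame's spectral energy falls into the 300 Hz to 3 kHz band, and how much of that in-band energy sits in a single frequency bin, as it would for a beeper. The function names, the naive DFT, and the 0.5 and 0.8 thresholds are illustrative assumptions, not anything proposed in this thread; a real tool would use a proper FFT library and tuned thresholds.

```python
import cmath
import math

def power_spectrum(frame):
    """Naive O(n^2) DFT power spectrum for bins 0..n/2 (only for short frames)."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        spectrum.append(abs(acc) ** 2)
    return spectrum

def speech_band_stats(frame, sample_rate, lo_hz=300.0, hi_hz=3000.0):
    """Return (band_fraction, peak_fraction).

    band_fraction: share of the total (non-DC) energy inside [lo_hz, hi_hz].
    peak_fraction: share of the in-band energy held by the strongest bin.
    """
    spec = power_spectrum(frame)
    bin_hz = sample_rate / len(frame)
    band = [p for k, p in enumerate(spec) if lo_hz <= k * bin_hz <= hi_hz]
    total = sum(spec[1:])  # skip the DC bin
    band_energy = sum(band)
    if total == 0 or band_energy == 0:
        return 0.0, 0.0
    return band_energy / total, max(band) / band_energy

def looks_like_speech(frame, sample_rate, min_band=0.5, max_peak=0.8):
    """Heuristic only: energy in the band, but not one sharp peak."""
    band_fraction, peak_fraction = speech_band_stats(frame, sample_rate)
    return band_fraction >= min_band and peak_fraction <= max_peak
```

A pure 1 kHz tone puts nearly all in-band energy into one bin and is rejected, while a signal spread over several in-band frequencies passes. Music that imitates voices would still pass, so this is only a first filter.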

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      I suspect you will also get some false positives with big budget movie musical scoring. For example, some tracks of the "Titanic"(*) movie use instruments that are supposed to sound like voices. Unless you want a lot of subtitles saying "aaaaaahhhh", more advanced filtering or access to a soundtrack without the music would be required.

      (*) "Take Her to Sea, Mr. Murdoch" by James Horner

      PerlMonks XP is useless? Not anymore: XPD - Do more with your PerlMonks XP
      Thanks.

      Let's simplify this to a decision problem to have a start.

      Let's suppose we have n SRT-files with different time-stamps, and one is a perfect match to a given soundtrack.

      Now we want to rank which ones fit best. (That's actually a real life scenario)

      With SRT files I can easily identify the non-speech gaps, such as the 1.1 secs between 00:05:15,300 and 00:05:16,400 here:

      1
      00:05:00,400 --> 00:05:15,300
      This is an example of a subtitle.

      2
      00:05:16,400 --> 00:05:25,300
      This is an example of a subtitle - 2nd subtitle.
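
Extracting those gaps could look roughly like this in Python. This is a minimal sketch, not a full SRT parser: it only scans for timing lines, and the names `cue_intervals` and `gaps` are made up for illustration.

```python
import re

# Matches an SRT timing line such as "00:05:00,400 --> 00:05:15,300".
TIMING = re.compile(
    r"(\d+):(\d{2}):(\d{2})[,.](\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2})[,.](\d{3})"
)

def to_seconds(h, m, s, ms):
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000.0

def cue_intervals(srt_text):
    """Return sorted (start, end) pairs in seconds, one per subtitle cue."""
    cues = []
    for match in TIMING.finditer(srt_text):
        g = match.groups()
        cues.append((to_seconds(*g[:4]), to_seconds(*g[4:])))
    return sorted(cues)

def gaps(cues, min_gap=0.0):
    """Return (start, end) pairs during which no subtitle is displayed."""
    result = []
    if not cues:
        return result
    covered_until = cues[0][1]
    for start, end in cues[1:]:
        if start - covered_until > min_gap:
            result.append((covered_until, start))
        # Track the latest end seen so overlapping cues do not create fake gaps.
        covered_until = max(covered_until, end)
    return result
```

For the two cues above, `gaps(cue_intervals(text))` yields a single gap from about 315.3 s to 316.4 s. A `min_gap` of a second or so would drop the short pauses between consecutive lines of dialogue.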

      I could check how the gaps of those n SRTs overlap with "silent" passages in the soundtrack (e.g. an XOR metric) and rank the SRTs by proximity.
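
One possible reading of such an XOR metric, sketched in Python: sample the timeline in small steps and count the time during which exactly one of the two (SRT gap, silent passage) applies. The interval lists are assumed to be (start, end) pairs in seconds; the function names and the 0.1 s step are assumptions for illustration.

```python
def covered(intervals, t):
    """True if time t lies inside any (start, end) interval."""
    return any(start <= t < end for start, end in intervals)

def xor_mismatch(srt_gaps, silences, duration, step=0.1):
    """Approximate seconds where exactly one of the two sets marks 'no speech'."""
    steps = int(round(duration / step))
    mismatches = 0
    for i in range(steps):
        t = (i + 0.5) * step  # sample the middle of each step
        if covered(srt_gaps, t) != covered(silences, t):
            mismatches += 1
    return mismatches * step

def rank_candidates(candidates, silences, duration, step=0.1):
    """candidates: dict of name -> gap intervals. Lowest mismatch comes first."""
    scored = [(xor_mismatch(g, silences, duration, step), name)
              for name, g in candidates.items()]
    return [name for _, name in sorted(scored)]
```

The same score could presumably also be evaluated for one SRT under a handful of trial shifts, keeping the shift with the lowest mismatch.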

      Question: how can I technically get the timestamps of silent passages of a soundtrack?

      Let's define silent as falling under a certain volume threshold after filtering frequencies.
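
Under that definition, a minimal Python sketch might compute the RMS level of short frames and report runs that stay below the threshold. It assumes the soundtrack has already been decoded to mono samples in the range -1.0 to 1.0 (and band-filtered, if desired); the function name, the -40 dB threshold, the 50 ms frames, and the 0.5 s minimum length are all assumptions to be tuned.

```python
import math

def silent_passages(samples, sample_rate, threshold_db=-40.0,
                    frame_ms=50, min_silence=0.5):
    """Return (start, end) times in seconds where the frame RMS stays below
    threshold_db (relative to full scale, i.e. amplitude 1.0 = 0 dB)."""
    frame_len = max(1, int(sample_rate * frame_ms / 1000))
    threshold = 10 ** (threshold_db / 20.0)
    n_frames = len(samples) // frame_len
    passages = []
    run_start = None  # start time of the current quiet run, if any
    for f in range(n_frames):
        frame = samples[f * frame_len:(f + 1) * frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        t = f * frame_len / sample_rate
        if rms < threshold:
            if run_start is None:
                run_start = t
        elif run_start is not None:
            if t - run_start >= min_silence:
                passages.append((run_start, t))
            run_start = None
    # Close a quiet run that lasts until the end of the audio.
    end_time = n_frames * frame_len / sample_rate
    if run_start is not None and end_time - run_start >= min_silence:
        passages.append((run_start, end_time))
    return passages
```

The resulting list has the same (start, end) shape as gaps taken from an SRT file, so the two can be compared directly.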

      Cheers Rolf

Re: Adjust synchronizations of video and subtitles automatically by temporal distribution
by bliako (Monsignor) on Jan 31, 2023 at 08:49 UTC