Adjust synchronizations of video and subtitles automatically by temporal distribution

by LanX (Saint)
on Jan 30, 2023 at 21:58 UTC ( [id://11150042]=perlmeditation )

I'm asking this as a meditation, because I don't expect any Perl (or other) code for that.

Questions:
  • Is it possible to filter speech frequencies in a video with significant accuracy to identify the passages where people talk?
  • Can the resulting pattern be used to synchronize a subtitle file, to match the gaps?
I'm looking for a low-tech solution offering a handful of plausible adjustments to choose from, not a speech recognition bazooka (like YT's auto-subtitles).

Background:

I often download foreign-language movies and like to watch them with the original audio and subtitles to practice and learn vocabulary, but I'm often obliged to download the subtitles separately and adjust their timing, because

  • they are shifted, because of trailers or of "what happened last time" intros
  • they are stretched, because of different frame rates
  • they need readjustment in the middle because scenes were cut out
There are already Perl modules to fix the first two cases for .srt files.

That is, if the parameters are known. But finding them can be tricky.
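For illustration, the first two cases (shift and stretch) amount to one linear map on every timestamp, t' = scale * t + offset. The following is a minimal Python sketch under that assumption, not the API of any of the existing modules; the names `retime`, `scale` and `offset_ms` are made up for this example.

```python
import re

# Matches one SRT timestamp such as 00:05:15,300
TS = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")

def to_ms(h, m, s, ms):
    """Convert the four captured fields to milliseconds."""
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

def fmt(ms):
    """Format milliseconds as an SRT timestamp, clamping at zero."""
    ms = max(0, round(ms))
    s, ms = divmod(ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"

def retime(srt_text, scale=1.0, offset_ms=0):
    """Map every cue timestamp t to scale * t + offset_ms.

    Only lines containing '-->' are touched, so timestamps that
    happen to appear inside the subtitle text are left alone.
    """
    out = []
    for line in srt_text.splitlines():
        if "-->" in line:
            line = TS.sub(
                lambda m: fmt(scale * to_ms(*m.groups()) + offset_ms), line
            )
        out.append(line)
    return "\n".join(out)
```

A pure shift of 20 seconds would be `retime(text, offset_ms=20000)`. A frame-rate mismatch would be a pure stretch, for example `scale=25 / 23.976` when the subtitles were timed for 25 fps and the video runs at 23.976 fps. The third case (cut scenes) would need a different offset per segment.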

FWIW: VLC offers an option for such synchronization, but tends to freeze for a minute if the shift is around 20 secs. No fun when trying out the best settings.

Cheers Rolf
(addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
Wikisyntax for the Monastery

Replies are listed 'Best First'.
Re: Adjust synchronizations of video and subtitles automatically by temporal distribution
by afoken (Chancellor) on Jan 31, 2023 at 08:48 UTC
    Is it possible to filter speech frequencies in a video with significant accuracy to identify the passages where people talk?

    Telephony started with very bad microphones, transmitting barely anything outside the range 300 Hz to 3 kHz, but that was "good enough". Technical development improved the microphones, but analog telephony was and still is intentionally limited to that frequency range. Even when switching to ISDN, the sampling rate was only 8 kHz, limiting audio to about 3 kHz. Things changed only after migration to SIP, with "HD" audio codecs that allow higher frequencies, using more bandwidth and/or more available computing power.

    So I would expect that a filter with that frequency range could be a usable indicator for speech.

    Unfortunately, because the human ear is most sensitive in exactly this range, almost all audible warning signals also use that frequency range. So you will get some false positives. An FFT should be able to identify the sharp peaks coming from all kinds of beepers, so that those peaks can be ignored.
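A rough Python sketch of that idea, under stated assumptions: the audio is already decoded into short frames of mono samples, a naive DFT stands in for a real FFT library, and the thresholds `min_share` and `max_peak` are arbitrary illustration values that would need tuning. It measures how much of a frame's power lies between 300 Hz and 3 kHz, and how much of that band power sits in a single bin (a sharp peak, as from a beeper).

```python
import cmath
import math

def power_spectrum(frame, rate):
    """Naive DFT: list of (frequency_hz, power) for bins 0..Nyquist."""
    n = len(frame)
    spectrum = []
    for k in range(n // 2 + 1):
        acc = sum(
            x * cmath.exp(-2j * math.pi * k * i / n)
            for i, x in enumerate(frame)
        )
        spectrum.append((k * rate / n, abs(acc) ** 2))
    return spectrum

def speech_band_stats(frame, rate, lo=300.0, hi=3000.0):
    """Return (share of total power inside lo..hi,
               share of band power held by the strongest band bin)."""
    spectrum = power_spectrum(frame, rate)
    total = sum(p for _, p in spectrum)
    band = [p for f, p in spectrum if lo <= f <= hi]
    band_power = sum(band)
    if total == 0 or band_power == 0:
        return 0.0, 0.0
    return band_power / total, max(band) / band_power

def looks_like_speech(frame, rate, min_share=0.5, max_peak=0.5):
    """Crude heuristic: mostly in-band energy, not one sharp peak."""
    share, peak = speech_band_stats(frame, rate)
    return share >= min_share and peak <= max_peak
```

A pure 1 kHz tone puts nearly all band power into one bin and is rejected, while broadband in-band content passes. Music with voice-like instruments would still pass, so this only addresses the beeper case.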

    Alexander

    --
    Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)

      I suspect you will also get some false positives with big budget movie musical scoring. For example, some tracks of the "Titanic"(*) movie use instruments that are supposed to sound like voices. Unless you want a lot of subtitles saying "aaaaaahhhh", more advanced filtering or access to a soundtrack without the music would be required.

      (*) "Take Her to Sea, Mr. Murdoch" by James Horner

      PerlMonks XP is useless? Not anymore: XPD - Do more with your PerlMonks XP
      Thanks.

      Let's simplify this to a decision problem to have a start.

      Let's suppose we have n SRT-files with different time-stamps, and one is a perfect match to a given soundtrack.

      Now we want to rank which ones fit best. (That's actually a real life scenario)

      With SRT-files I can easily tell sequences of non-speech gaps, like here 1.1 secs between 00:05:15,300 and 00:05:16,400

      1
      00:05:00,400 --> 00:05:15,300
      This is an example of a subtitle.

      2
      00:05:16,400 --> 00:05:25,300
      This is an example of a subtitle - 2nd subtitle.
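Listing the gaps of an SRT file like the two-cue example above could look roughly like this in Python. It is a sketch that only reads the `-->` timing lines and ignores cue numbers and text; `cue_intervals` and `gaps` are hypothetical names, and all times are integer milliseconds to avoid float rounding.

```python
import re

# One SRT timing line: start --> end
CUE = re.compile(
    r"(\d+):(\d{2}):(\d{2}),(\d{3})\s*-->\s*(\d+):(\d{2}):(\d{2}),(\d{3})"
)

def to_ms(h, m, s, ms):
    return ((int(h) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)

def cue_intervals(srt_text):
    """Return sorted (start_ms, end_ms) pairs, one per cue."""
    cues = []
    for match in CUE.finditer(srt_text):
        g = match.groups()
        cues.append((to_ms(*g[:4]), to_ms(*g[4:])))
    return sorted(cues)

def gaps(cues, min_gap_ms=0):
    """Return (start_ms, end_ms) spans where no cue is on screen."""
    result = []
    covered_until = None  # latest end time seen so far
    for start, end in cues:
        if covered_until is not None and start - covered_until > min_gap_ms:
            result.append((covered_until, start))
        covered_until = end if covered_until is None else max(covered_until, end)
    return result
```

For the example, `gaps(cue_intervals(text))` yields one gap from 315300 ms to 316400 ms, i.e. 1.1 seconds.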

      I could check how the gaps of those n SRTs overlap with "silent" passages in the soundtrack (e.g. an XOR metric) and rank the SRTs by proximity.

      Question: how can I technically get the timestamps of silent passages of a soundtrack?

      Let's define silent as falling below a certain volume threshold after filtering frequencies.
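Under that definition, a minimal Python sketch could compute the RMS volume of short frames and report the runs that stay below the threshold. It assumes the soundtrack has already been decoded and frequency-filtered into a plain list of mono samples; getting the samples out of the video file is outside the sketch. `frame_ms`, `min_silence_ms` and the threshold value are illustration parameters, not recommendations.

```python
import math

def silent_passages(samples, rate, threshold, frame_ms=20, min_silence_ms=300):
    """Return (start_s, end_s) spans whose frame RMS stays below threshold.

    samples: sequence of mono sample values, rate: samples per second.
    Runs shorter than min_silence_ms are ignored.
    """
    frame_len = max(1, rate * frame_ms // 1000)
    n_frames = len(samples) // frame_len
    spans = []
    start = None  # index of the first frame of the current quiet run
    for i in range(n_frames + 1):
        if i < n_frames:
            frame = samples[i * frame_len:(i + 1) * frame_len]
            rms = math.sqrt(sum(x * x for x in frame) / frame_len)
            quiet = rms < threshold
        else:
            quiet = False  # sentinel frame closes a trailing quiet run
        if quiet and start is None:
            start = i
        elif not quiet and start is not None:
            if (i - start) * frame_ms >= min_silence_ms:
                spans.append((start * frame_len / rate, i * frame_len / rate))
            start = None
    return spans
```

The resulting spans are in seconds, so they would need converting to the same unit as the SRT gaps before comparing the two.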

      Cheers Rolf
      (addicted to the 𐍀𐌴𐍂𐌻 Programming Language :)
      Wikisyntax for the Monastery

Re: Adjust synchronizations of video and subtitles automatically by temporal distribution
by bliako (Monsignor) on Jan 31, 2023 at 08:49 UTC

Node Type: perlmeditation [id://11150042]
Approved by Discipulus