Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

OK, I'm trying to convert a text file of subtitles for a video clip from this format:
1
00:00:38,585 --> 00:00:40,519
What's wrong?

2
00:00:40,554 --> 00:00:43,148
I think I hit something.
to this one:
#1 00;00;38;59 00;00;40;52
What's wrong?
#2 00;00;40;55 00;00;43;15
I think I hit something.
#3 00;00;43;92 00;00;45;41
You think it's a rock?
So I wrote this script to do the job, but I'm having problems, and it's probably some blatantly obvious thing I can't see because I wrote it. Please help:
open(IN,"./$file");
@infile = <IN>;
close(IN);
open(OUT,">$file-FORMATTED.txt");
@sections = split(/\n/, @infile);
foreach $i (@sections) {
    ($l1,$l2,@l3) = split(/\n/,$i);
    $output = "#",chomp($l1);
    $l2 =~ s/(\d\d)\:(\d\d)\:(\d\d),(\d\d\d) --> (\d\d)\:(\d\d)\:(\d\d),(\d\d\d)/\1;\2;\3;\4 \5;\6;\7;\8/;
    $x = $4;
    $y = $8;
    if(substr($x,2,1) > 5) {
        $x = substr($x,0,2);
        $x++;
    } else {
        $x = substr($x,0,2);
    }
    if(substr($y,2,1) > 5) {
        $y = substr($y,0,2);
        $y++;
    } else {
        $y = substr($y,0,2);
    }
    $l2 =~ s/(\d\d;\d\d;\d\d;)\d\d\d (\d\d;\d\d;\d\d;)\d\d\d/\1$x \2$y/;
    $output .= "$l2\n", join("\n",@l3), "\n";
    print OUT $output;
    undef $l1;
    undef $l2;
    undef $x;
    undef $y;
    undef @l3;
}
close(OUT);

Replies are listed 'Best First'.
Re: text reformatting woes
by ysth (Canon) on Apr 19, 2004 at 07:32 UTC
    A couple of things I see right away: it looks like your first split should be on /\n\n/, and you are printing the return value of chomp($l1), where I think you want to do the chomp($l1) first and then print $l1 (chomp doesn't return what you think).

    Update: the chomp isn't even needed, since you've split on /\n/. Just replace chomp($l1) with $l1.
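
To illustrate the chomp point, here is a minimal sketch. The variable names and the sample value "12\n" are made up for the illustration and are not from the original script. chomp changes its argument in place and returns the number of characters it removed, so concatenating its return value gives a count rather than the cleaned-up line.

```perl
use strict;
use warnings;

# Hypothetical sample line; a two-digit index makes the difference visible.
my $line = "12\n";

# chomp modifies its argument in place and returns the number of
# characters removed, not the chomped string.
my $copy     = $line;
my $returned = chomp($copy);

my $wrong = "#" . $returned;   # built from the return value
my $right = "#" . $copy;       # built from the chomped variable

print "returned=$returned wrong=$wrong right=$right\n";
# prints: returned=1 wrong=#1 right=#12
```

Note also that in the original line the comma is the comma operator, so the assignment to $output only ever receives "#".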

    Update (more comments): you are using $4 and $8 without checking that the match succeeded. I'd at the very least say

    if ($l2 =~ s/..../..../) {
        # $x and $y munging code
        # and substitution back into $l2
    }
    and preferably also add
    else { warn "houston, we have a problem: $l2 "; }
    I see from your use of \1,\2, etc. that you aren't using warnings; stick a use warnings; and use strict; at the top, declare your variables (with real names instead of l1, l2, etc.), and see what other problems turn up for you.

    If $x or $y ends up getting incremented, I think it may lose its leading zero; if this is a problem, replace ++$x with $x = sprintf "%.2d", $x+1; or, better yet, replace your whole substitution/rounding code with

    s!(\d\d)\:(\d\d)\:(\d\d),(\d\d\d) --> (\d\d)\:(\d\d)\:(\d\d),(\d\d\d)!
      sprintf("%s;%s;%s;%02.0f %s;%s;%s;%02.0f",$1,$2,$3,$4/10,$5,$6,$7,$8/10)!e
    Update: even that has problems with 01:59:59,995 (which I'm guessing should become 02;00;00;00). Easiest way to avoid that is just add up total milliseconds and then divide it back out into hr,min,sec,centisecs.
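
The total-milliseconds idea can be sketched roughly as follows. This is only an illustration under stated assumptions: the function name srt_to_centis is hypothetical, and rounding half up to the nearest centisecond is assumed because it matches the sample output in the question (585 becomes 59).

```perl
use strict;
use warnings;

# Hypothetical helper: convert "HH:MM:SS,mmm" to "HH;MM;SS;cc" by way of
# total milliseconds, so rounding carries into seconds, minutes and hours.
sub srt_to_centis {
    my ($stamp) = @_;
    my ($h, $m, $s, $ms) = $stamp =~ /^(\d\d):(\d\d):(\d\d),(\d\d\d)$/
        or return;    # not a timestamp in the expected form

    my $total_ms = (($h * 60 + $m) * 60 + $s) * 1000 + $ms;
    my $cs       = int(($total_ms + 5) / 10);   # round half up (assumed)

    # Divide the centisecond total back out into its fields.
    my $hours = int($cs / 360_000);  $cs %= 360_000;
    my $mins  = int($cs / 6_000);    $cs %= 6_000;
    my $secs  = int($cs / 100);      $cs %= 100;

    return sprintf "%02d;%02d;%02d;%02d", $hours, $mins, $secs, $cs;
}

print srt_to_centis("00:00:38,585"), "\n";   # 00;00;38;59
print srt_to_centis("01:59:59,995"), "\n";   # 02;00;00;00
```

Because the carry happens in the arithmetic rather than in string munging, no special case is needed for values such as ,995.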
Re: text reformatting woes
by kvale (Monsignor) on Apr 19, 2004 at 09:11 UTC
    Here is a simpler approach to your problem:
    while (<DATA>) {
        next if /^\s*$/;
        print("#$1 "), next if /^(\d+)$/;
        print("$1;$2;$3;$4 $5;$6;$7;$8\n"), next
            if /^(\d+):(\d+):(\d+),(\d+) --> (\d+):(\d+):(\d+),(\d+)$/;
        print;
    }
    __DATA__
    1
    00:00:38,585 --> 00:00:40,519
    What's wrong?

    2
    00:00:40,554 --> 00:00:43,148
    I think I hit something.

    -Mark

Re: text reformatting woes
by ysth (Canon) on Apr 19, 2004 at 08:06 UTC
    You can avoid splitting and rejoining @l3 by saying ($l1,$l2,$remainder) = split /\n/, $i, 3; and then printing $remainder instead of join("\n",@l3).

    See split for more on how the limit parameter works.
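
As a small sketch of the limit parameter: the sample record below, including its second text line, is invented for the illustration, and the names $section, $index and $times are hypothetical. With a limit of 3, split stops after producing three fields, so any further newlines stay inside the last field.

```perl
use strict;
use warnings;

# Hypothetical subtitle record with two lines of text.
my $section = "1\n00:00:38,585 --> 00:00:40,519\nWhat's wrong?\nSecond text line";

# A limit of 3 yields at most three fields; the third keeps its newline.
my ($index, $times, $remainder) = split /\n/, $section, 3;

print "$remainder\n";
```

Here $remainder still contains the embedded newline, so it can be printed directly without a join.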

Re: text reformatting woes
by matija (Priest) on Apr 19, 2004 at 07:35 UTC
    You can't do output with regexp syntax - see perldoc sprintf for proper syntax.

    print sprintf "#%d %02d:%02d:%02d %02d:%02d:%02d\n",@l3;
Re: text reformatting woes
by Anonymous Monk on Apr 19, 2004 at 07:54 UTC
    Here's the really strange thing... I run it with the -d flag, and after the split (you were right, I changed it to /\n\n/), @sections is assigned "4881", and I don't know why. It's not returning a scalar count, because there are only about 800 lines in the text file I'm inputting. And it IS loading the file contents into @infile; I verified that.
      How are you printing @sections and getting 4881? What does this show:
      use Data::Dumper; print Dumper \@sections;
        I run the script with the -d flag, and after the split() line, I type "print @sections" in the debugger. Then it prints out "4811".
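
The thread does not settle where that number comes from. One possible explanation, offered here as an assumption rather than something the posters confirmed, is that split evaluates its second argument in scalar context, so passing an array hands it the element count instead of the file contents. A minimal sketch with made-up sample data:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the lines read from the subtitle file.
my @infile = (
    "1\n", "00:00:38,585 --> 00:00:40,519\n", "What's wrong?\n", "\n",
    "2\n", "00:00:40,554 --> 00:00:43,148\n", "I think I hit something.\n",
);

# split's second argument is evaluated in scalar context, so the array
# becomes its element count (7 here) before any splitting happens.
my @sections = split /\n\n/, @infile;
print "sections: @sections\n";   # prints: sections: 7

# Joining the lines into one string first gives split real text to work on.
my @fixed = split /\n\n/, join('', @infile);
print scalar(@fixed), " records\n";   # prints: 2 records
```

If this is what is happening, reading the file into a single string (or joining @infile) before the split should produce one element per subtitle record.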
Re: text reformatting woes
by Anonymous Monk on Apr 19, 2004 at 18:51 UTC
    Late to answer, but I think this code will do what OP wanted.
    #!/usr/bin/perl
    use strict;
    use warnings;

    $/ = "";
    while(<DATA>) {
        my @lines = split "\n";
        for ($lines[1]) {
            s/,(\d+)(\d)/":".($2 >= 5 ? $1 + 1 : $1)/eg;
            s/--> //;
            tr/:/;/;
        }
        print "#$lines[0] ", join "\n", @lines[1,2], "";
    }
    __DATA__
    1
    00:00:38,585 --> 00:00:40,519
    What's wrong?

    2
    00:00:40,554 --> 00:00:43,148
    I think I hit something.
    HTH

    Chris