HDTVJohn has asked for the wisdom of the Perl Monks concerning the following question:

First time I've posted here, and I would appreciate some hints. I would like to reformat some tricky text related to timecode (closed captions, actually), and it has me spinning in circles. Here's a snippet of the text:

\ TC: 16:12:48;28 ÷1426÷1426÷142D÷142D÷1470÷1470AND W
\ TC: 16:12:49;08 HI
\ TC: 16:12:49;11 LE 
\ TC: 16:12:49;14 OVER 
\ TC: 16:12:49;19 T
\ TC: 16:12:49;21 HE L
\ TC: 16:12:49;24 AS
\ TC: 16:12:49;27 T 10
\ TC: 16:12:50;07 ÷1426÷1426÷142D÷142D÷1470÷1470YEARS 
\ TC: 16:12:50;17 W
\ TC: 16:12:50;19 E'
\ TC: 16:12:50;21 VE M
\ TC: 16:12:50;25 AD
\ TC: 16:12:50;27 E 
\ TC: 16:12:51;09 MAJ
\ TC: 16:12:51;13 OR
\ TC: 16:12:52;22 ÷1426÷1426÷142D÷142D÷1470÷1470INVESTMENTS

A run of ÷1xxx codes marks the start of a text line, and that line ends just before the timecode that precedes the next ÷1xxx run. I need to keep the \ TC: value that comes right before each ÷1xxx run, but delete all of the in-between TC values and the ÷1xxx codes themselves. The end result would ideally be:

\ TC: 16:12:48;28 AND WHILE OVER THE LAST 10
\ TC: 16:12:50;07 YEARS WE'VE MADE MAJOR ...etc.

Matching patterns across lines, then going back to a previous line to collect and collate all of the numbers and text has me spinning. Any suggestions would be appreciated. Thx

Replies are listed 'Best First'.
Re: Search/format Across Multiple Lines
by GrandFather (Saint) on Jul 23, 2014 at 04:24 UTC

    Build up the current line by concatenating new bits onto it until you reach the start of a new line (a fragment that begins with ÷). When you reach a new line, spit out the previous line, then set the current line string to the time code prefix and text for the new line:

    use strict;
    use warnings;

    my $currLine;

    while (<DATA>) {
        # Skip anything that is not a "\ TC: <timecode> <text>" line
        next if ! /^\\ TC: +(\S+) (.+)/;
        my ($timecode, $tail) = ($1, $2);

        # No leading ÷ code: this fragment continues the current line
        if ($tail !~ /^÷/) {
            $currLine .= $tail;
            next;
        }

        # A ÷ code starts a new line: flush the previous one first
        print "$currLine\n" if defined $currLine;
        # Note: "\ " in a double-quoted string is just a space,
        # so the backslash is not part of the output
        $currLine = "\ TC: $timecode $tail";
    }

    print $currLine if defined $currLine;

    __DATA__
    \ TC: 16:12:48;28 ÷1426÷1426÷142D÷142D÷1470÷1470AND W
    \ TC: 16:12:49;08 HI
    \ TC: 16:12:49;11 LE 
    \ TC: 16:12:49;14 OVER 
    \ TC: 16:12:49;19 T
    \ TC: 16:12:49;21 HE L
    \ TC: 16:12:49;24 AS
    \ TC: 16:12:49;27 T 10
    \ TC: 16:12:50;07 ÷1426÷1426÷142D÷142D÷1470÷1470YEARS 
    \ TC: 16:12:50;17 W
    \ TC: 16:12:50;19 E'
    \ TC: 16:12:50;21 VE M
    \ TC: 16:12:50;25 AD
    \ TC: 16:12:50;27 E 
    \ TC: 16:12:51;09 MAJ
    \ TC: 16:12:51;13 OR
    \ TC: 16:12:52;22 ÷1426÷1426÷142D÷142D÷1470÷1470INVESTMENTS

    Prints:

    TC: 16:12:48;28 ÷1426÷1426÷142D÷142D÷1470÷1470AND WHILE OVER THE LAST 10
    TC: 16:12:50;07 ÷1426÷1426÷142D÷142D÷1470÷1470YEARS WE'VE MADE MAJOR
    TC: 16:12:52;22 ÷1426÷1426÷142D÷142D÷1470÷1470INVESTMENTS

    Perl is the programming world's equivalent of English
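
One thing to watch in the code above: inside a double-quoted Perl string, `"\ "` is just a space, which is why the printed lines start with `TC:` rather than `\ TC:`. Below is a minimal sketch of the same accumulate-and-flush idea that keeps the backslash and also strips the leading ÷ codes. The sub name `collate_captions` and the `[0-9A-Fa-f]{4}` pattern for a code are assumptions for illustration, not something from the thread, and the sample assumes (as the expected output implies) that fragments such as `LE ` and `OVER ` carry a trailing space.

```perl
use strict;
use warnings;
use utf8;    # this source file contains a literal ÷

# Hypothetical helper (not from the thread): takes caption lines,
# returns one string per collated caption line.
sub collate_captions {
    my @lines = @_;
    my @out;
    for my $line (@lines) {
        # Skip anything that is not a "\ TC: <timecode> <text>" line
        next unless $line =~ /^\\ TC: +(\S+) (.*)/;
        my ($timecode, $tail) = ($1, $2);

        if ($tail =~ s/^(?:÷[0-9A-Fa-f]{4})+//) {
            # A run of ÷ codes (now stripped) starts a new caption line;
            # "\\" keeps the literal backslash in the output
            push @out, "\\ TC: $timecode $tail";
        }
        elsif (@out) {
            # Otherwise the fragment continues the most recent caption line
            $out[-1] .= $tail;
        }
    }
    return @out;
}

my @sample = (
    '\ TC: 16:12:48;28 ÷1426÷1426÷142D÷142D÷1470÷1470AND W',
    '\ TC: 16:12:49;08 HI',
    '\ TC: 16:12:49;11 LE ',      # trailing space assumed
    '\ TC: 16:12:49;14 OVER ',    # trailing space assumed
    '\ TC: 16:12:49;19 T',
    '\ TC: 16:12:49;21 HE L',
    '\ TC: 16:12:49;24 AS',
    '\ TC: 16:12:49;27 T 10',
    '\ TC: 16:12:50;07 ÷1426÷1426÷142D÷142D÷1470÷1470YEARS',
);

print "$_\n" for collate_captions(@sample);
# \ TC: 16:12:48;28 AND WHILE OVER THE LAST 10
# \ TC: 16:12:50;07 YEARS
```

Returning a list instead of printing inside the loop is only a design choice to make the sketch easy to test. If the lines come from a file rather than from the source itself, the handle would need a matching encoding layer for the ÷ match to work.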
Re: Search/format Across Multiple Lines
by Athanasius (Archbishop) on Jul 23, 2014 at 04:51 UTC

    Hello HDTVJohn, and welcome to the Monastery!

    Here’s my take on your problem, a variation on GrandFather’s solution:

    #! perl
    use strict;
    use warnings;
    use utf8;

    while (<DATA>) {
        chomp;
        # Every line must look like "\ TC: hh:mm:ss;ff <text>"
        my ($timecode, $text) = m!^\\ TC: (\d{2}:\d{2}:\d{2};\d{2}) (.*)$!
            or die "Unexpected line: '$_'";

        if ($text =~ m!÷\w{4}!) {
            # Start of a new caption line: strip the ÷ codes, keep the timecode
            $text =~ s!^(?:÷\w{4})+(.*?)$!$1!;
            print "\n\\ TC: $timecode $text";
        }
        else {
            # Continuation fragment: append to the current output line
            print $text;
        }
    }

    print "\n";

    __DATA__
    \ TC: 16:12:48;28 ÷1426÷1426÷142D÷142D÷1470÷1470AND W
    \ TC: 16:12:49;08 HI
    \ TC: 16:12:49;11 LE 
    \ TC: 16:12:49;14 OVER 
    \ TC: 16:12:49;19 T
    \ TC: 16:12:49;21 HE L
    \ TC: 16:12:49;24 AS
    \ TC: 16:12:49;27 T 10
    \ TC: 16:12:50;07 ÷1426÷1426÷142D÷142D÷1470÷1470YEARS 
    \ TC: 16:12:50;17 W
    \ TC: 16:12:50;19 E'
    \ TC: 16:12:50;21 VE M
    \ TC: 16:12:50;25 AD
    \ TC: 16:12:50;27 E 
    \ TC: 16:12:51;09 MAJ
    \ TC: 16:12:51;13 OR
    \ TC: 16:12:52;22 ÷1426÷1426÷142D÷142D÷1470÷1470INVESTMENTS

    Output:

    14:49 >perl 945_SoPW.pl

    \ TC: 16:12:48;28 AND WHILE OVER THE LAST 10
    \ TC: 16:12:50;07 YEARS WE'VE MADE MAJOR
    \ TC: 16:12:52;22 INVESTMENTS
    14:49 >

    Hope that helps,

    Athanasius <°(((>< contra mundum
    Iustus alius egestas vitae, eros Piratica,
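
For comparison, here is a rough sketch of a whole-input approach. It sidesteps the "going back to a previous line" bookkeeping by treating the caption data as one string: first delete every timecode prefix that is not followed by a ÷ code (together with the newline before it), then strip the ÷ codes. The name `collate_slurped` is hypothetical, and the sketch assumes the input fits in memory and that fragments keep whatever trailing space they had.

```perl
use strict;
use warnings;
use utf8;    # this source file contains a literal ÷

# Hypothetical helper (not from the thread): collate captions held in one string.
sub collate_slurped {
    my ($text) = @_;

    # 1. Remove "newline + \ TC: <timecode> " wherever the text that follows
    #    does not start with ÷, which joins the fragment to the previous line.
    $text =~ s/\n\\ TC: +\S+ (?!÷)//g;

    # 2. Strip the runs of ÷xxxx codes that open each caption line.
    $text =~ s/(?:÷\w{4})+//g;

    return $text;
}

my $sample = join "\n",
    '\ TC: 16:12:48;28 ÷1426÷1426÷142D÷142D÷1470÷1470AND W',
    '\ TC: 16:12:49;08 HI',
    '\ TC: 16:12:49;11 LE ',      # trailing space assumed
    '\ TC: 16:12:49;14 OVER ',    # trailing space assumed
    '\ TC: 16:12:49;19 T',
    '\ TC: 16:12:49;21 HE L',
    '\ TC: 16:12:49;24 AS',
    '\ TC: 16:12:49;27 T 10',
    '\ TC: 16:12:50;07 ÷1426÷1426÷142D÷142D÷1470÷1470YEARS';

print collate_slurped($sample), "\n";
# \ TC: 16:12:48;28 AND WHILE OVER THE LAST 10
# \ TC: 16:12:50;07 YEARS
```

Unlike the line-by-line versions, this keeps any input line that does not match the `\ TC:` pattern instead of skipping or dying on it, so it is less strict about malformed data.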

      Wow! These responses greatly surpassed my expectations. Thank you Anonymous Monk for the hints and GrandFather and Athanasius for the clever and succinct coding for a solution.

Re: Search/format Across Multiple Lines
by Anonymous Monk on Jul 23, 2014 at 03:11 UTC