bowei_99 has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to figure out a multiline regex to:

1. Parse the following text:

#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=80000
80.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=400000
400.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=700000
700.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1500000
1500.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2500000
2500.m3u8

2. Move the third set of lines, i.e.:

#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=700000
700.m3u8

to the top, right under #EXTM3U, so that it starts on line 2. The other lines should stay the same, just shifted down.

I've written the following:

$text = <<'LIST';
#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=80000
80.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=400000
400.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=700000
700.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1500000
1500.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2500000
2500.m3u8
LIST

$text =~ m{
    (\#EXT-X-STREAM-INF:PROGRAM-ID=
     \d+,
     BANDWIDTH=\d+
     \s*
     \d+\.m3u8)
}xms;

print "1 - $1, 2 - $2, 3 - $3\n";

which yields:

# ./test.pl
1 - #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=80000
80.m3u8, 2 - , 3 -

However, I'm not sure how to write the regex so that it splits the text into three chunks: a) the first chunk being the top two entries, b) the second chunk being the entry I want to move, and c) the last chunk being everything after it. Once I have that, I figure it's a simple matter of putting $1, $2 and $3 in the right order in the replacement part of a substitution.

Note that I realize I could split this into an array, but I'm trying to find an efficient regex to do the job here. Thoughts?
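For reference: the regex above contains only one pair of capturing parentheses, so a single successful match can set $1 and nothing else. $2 and $3 stay undefined, which is why they print as empty strings (with use warnings enabled, Perl also reports them as uninitialized). As a side illustration only, the same pattern used in list context with the /g modifier returns every entry in one go:

```perl
use strict;
use warnings;

my $text = <<'LIST';
#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=80000
80.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=400000
400.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=700000
700.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1500000
1500.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2500000
2500.m3u8
LIST

# In list context, m//g repeats the match and returns the capture from
# every iteration, so each tag-plus-URI entry becomes one array element.
my @entries = $text =~ m{
    ( \#EXT-X-STREAM-INF:PROGRAM-ID=\d+,BANDWIDTH=\d+ \s* \d+\.m3u8 )
}xmsg;

printf "%d entries; the third is:\n%s\n", scalar @entries, $entries[2];
```

Each element keeps the newline between the tag and its URI, because \s* matches it.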

-- Burvil

Re: Multiline regex for moving lines?
by kcott (Archbishop) on Sep 05, 2013 at 02:42 UTC

    G'day bowei_99,

    Your text is made up of a series of elements, each consisting of a hash (#) followed by one or more characters that aren't hashes, i.e. "[#][^#]+".
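    To see that decomposition in action, here is a small sketch (not part of the original reply) that collects every "[#][^#]+" element from the playlist. It relies on the same assumption as the reply: no '#' ever appears inside a URI line.

```perl
use strict;
use warnings;

my $text = <<'LIST';
#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=80000
80.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=400000
400.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=700000
700.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1500000
1500.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2500000
2500.m3u8
LIST

# Each element is a '#' plus everything up to, but not including, the
# next '#'. Since the URI lines contain no '#', each EXT-X-STREAM-INF tag
# and the URI line that follows it end up in the same element.
my @elements = $text =~ /([#][^#]+)/g;

printf "element %d: %s", $_, $elements[$_] for 0 .. $#elements;
```

    Element 0 is the #EXTM3U header, so the entry to move is element 3; that is why the substitution skips the header plus two elements before capturing.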

    The "last chunk", as you called it, doesn't need to be part of the replacement. Just match the first "#...", then a pair of "#..." elements, then the "#..." you want to move; then put the 3rd capture before the 2nd:

    $ perl -Mstrict -Mwarnings -le '
    my $text = <<LIST;
    #EXTM3U
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=80000
    80.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=400000
    400.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=700000
    700.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1500000
    1500.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2500000
    2500.m3u8
    LIST
    $text =~ s{\A([#][^#]+)((?:[#][^#]+){2})([#][^#]+)}{$1$3$2}m;
    print $text;
    '
    #EXTM3U
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=700000
    700.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=80000
    80.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=400000
    400.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1500000
    1500.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2500000
    2500.m3u8
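    The [^#]+ trick depends on '#' appearing only at the start of a tag. If a URI line could ever contain a '#' (a URL fragment, say), a variant that works line by line is safer. A minimal sketch, assuming every entry is exactly one #EXT-X-STREAM-INF line followed by one URI line:

```perl
use strict;
use warnings;

my $text = <<'LIST';
#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=80000
80.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=400000
400.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=700000
700.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1500000
1500.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2500000
2500.m3u8
LIST

# One entry: a tag line starting with #EXT-X-STREAM-INF, then one URI line.
# [^\n]* never crosses a line break, so a '#' inside a URI is harmless.
my $entry = qr{ ^ \#EXT-X-STREAM-INF: [^\n]* \n [^\n]* \n }xm;

# Keep the header, skip two entries, then move the third one up.
$text =~ s{ \A ( \#EXTM3U \n ) ( (?: $entry ){2} ) ( $entry ) }{$1$3$2}x;

print $text;
```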

    -- Ken

Re: Multiline regex for moving lines?
by Athanasius (Archbishop) on Sep 05, 2013 at 03:01 UTC

    For flexibility, here’s an approach which accepts the number of the entry to be moved up as a command line parameter:

    #! perl
    use strict;
    use warnings;

    my $target = $ARGV[0] - 1;

    my $text = <<'LIST';
    #EXTM3U
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=80000
    80.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=400000
    400.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=700000
    700.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1500000
    1500.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2500000
    2500.m3u8
    LIST

    my $entry = qr{
        \#EXT-X-STREAM-INF:PROGRAM-ID= \d+ ,BANDWIDTH= \d+ \s+ \d+\.m3u8 \s+
    }xms;

    if ($text =~ m{
            ( \A \#EXTM3U \s+ )         #1
            ( (?: $entry){$target} )    #2
            ( (?: $entry) )             #3
            ( (?: $entry)* )            #4
        }xms)
    {
        print "$1$3$2$4";
    }
    else
    {
        print "No match found\n";
    }

    Output:

    12:57 >perl 710_SoPW.pl 3
    #EXTM3U
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=700000
    700.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=80000
    80.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=400000
    400.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1500000
    1500.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2500000
    2500.m3u8
    12:57 >
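    The same regex can also be wrapped in a subroutine that validates the entry number before using it, which makes it easier to reuse or test. This is only a sketch: the name move_entry_to_top is made up for illustration, entries are counted from 1 as on the command line above, and it returns undef (rather than printing a message) when there is no such entry.

```perl
use strict;
use warnings;

# Hypothetical helper: return a copy of $text with entry number $n
# (counting from 1) moved directly under the #EXTM3U header, or undef if
# $n is not a positive integer or there is no such entry.
sub move_entry_to_top {
    my ($text, $n) = @_;
    return undef unless defined $n && $n =~ /\A[1-9][0-9]*\z/;
    my $skip = $n - 1;    # how many entries stay in front of the moved one

    my $entry = qr{
        \#EXT-X-STREAM-INF:PROGRAM-ID= \d+ ,BANDWIDTH= \d+ \s+ \d+\.m3u8 \s+
    }xms;

    return undef unless $text =~ m{
        ( \A \#EXTM3U \s+ )       # group 1: header
        ( (?: $entry ){$skip} )   # group 2: entries before the target
        ( $entry )                # group 3: the entry to move
        ( .* )                    # group 4: everything after it
    }xms;

    return "$1$3$2$4";
}

my $text = <<'LIST';
#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=80000
80.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=400000
400.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=700000
700.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1500000
1500.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2500000
2500.m3u8
LIST

my $moved = move_entry_to_top($text, 3);
print defined $moved ? $moved : "No match found\n";
```

    Asking for entry 1 returns the text unchanged, and asking for an entry past the end (6 here) returns undef.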

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

Re: Multiline regex for moving lines?
by johngg (Canon) on Sep 05, 2013 at 08:15 UTC

    Rather than using a complicated regex, you could read your records into an array and then print them in the desired order using an array slice.

    use strict;
    use warnings;

    open my $inFH, q{<}, \ <<EOF or die $!;
    #EXTM3U
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=80000
    80.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=400000
    400.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=700000
    700.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1500000
    1500.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2500000
    2500.m3u8
    EOF

    my @records;
    push @records, scalar <$inFH>;
    push @records, join q{}, map scalar <$inFH>, 1 .. 2
        while not eof $inFH;

    print for @records[ 0, 3, 1, 2, 4, 5 ];

    The output:

    #EXTM3U
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=700000
    700.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=80000
    80.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=400000
    400.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1500000
    1500.m3u8
    #EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2500000
    2500.m3u8
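    The slice indices do not have to be typed by hand. A sketch of computing them, assuming the same record layout (record 0 is the #EXTM3U header, every later record is a tag line plus its URI line) and using a hypothetical $move variable for the entry to bring to the top:

```perl
use strict;
use warnings;

my $text = <<'LIST';
#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=80000
80.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=400000
400.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=700000
700.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1500000
1500.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2500000
2500.m3u8
LIST

# Read the header as one record, then every two lines as one record.
open my $inFH, q{<}, \$text or die $!;
my @records;
push @records, scalar <$inFH>;
push @records, join q{}, map scalar <$inFH>, 1 .. 2
    while not eof $inFH;
close $inFH;

# Hypothetical setting: the entry to move, counting the first entry as 1.
my $move = 3;

# Header first, then the chosen entry, then all other entries in order.
my @order = ( 0, $move, grep { $_ != $move } 1 .. $#records );

print for @records[@order];
```

    With $move set to 3, @order works out to the same 0, 3, 1, 2, 4, 5 used in the slice above.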

    I hope this is helpful.

    Cheers,

    JohnGG

Re: Multiline regex for moving lines?
by Anonymous Monk on Sep 05, 2013 at 02:33 UTC

    Note that I realize I could split this into an array, but I'm trying to find an efficient regex to do the job here. Thoughts?

    How would you do it by splitting into an array?

    The logic isn't too different using a regex (maybe); just the language is different (regex captures $1, $2, ... instead of array elements).

    I imagine something along the lines of this (you fill in the blanks):

    my $line = qr{}ms;    # fill in: a pattern that matches one line
    $input =~ s{
        (   # $1 is first 3 lines
            $line
            $line
            $line
        )
        (   # $2 is second 3 lines
            $line
            $line
            $line
        )
        (   # $3 is third 3 lines
            $line
            $line
            $line
        )
    }{$2$3$1}gx;

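    Filled in for this particular playlist, the skeleton might look like the sketch below. The one-line pattern and the grouping (one header line, then two lines per entry) are assumptions about the input, and the captures are reordered to bring the third entry up, as the question asks, rather than rotated as in the skeleton.

```perl
use strict;
use warnings;

my $input = <<'LIST';
#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=80000
80.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=400000
400.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=700000
700.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1500000
1500.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2500000
2500.m3u8
LIST

# One line: any run of non-newline characters followed by a newline.
my $line = qr{ [^\n]* \n }x;

# The header is a single line and each entry is two lines (tag + URI),
# so the entry to move occupies lines 6 and 7 of the file.
$input =~ s{
    \A
    ( $line )                     # group 1: the header line
    ( $line $line $line $line )   # group 2: the first two entries
    ( $line $line )               # group 3: the third entry, to move up
}{$1$3$2}x;

print $input;
```
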
Re: Multiline regex for moving lines?
by Laurent_R (Canon) on Sep 05, 2013 at 06:13 UTC

    You can do it with regexes, and you have been shown several ways of doing it, but regexes are not the best tool for this job. It's like driving a nail into a piece of wood by hammering it with the handle of a screwdriver: you can do it, but that isn't what the tool is for.

    Splitting your input into an array of lines seems more natural, is more flexible if your rules change, and would scale better if your input grows larger. So why not use split?
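    For comparison, a minimal sketch of the split approach, assuming (as the regex answers do) a one-line header followed by two-line entries. The $move variable is a hypothetical setting for which entry to bring up:

```perl
use strict;
use warnings;

my $text = <<'LIST';
#EXTM3U
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=80000
80.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=400000
400.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=700000
700.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=1500000
1500.m3u8
#EXT-X-STREAM-INF:PROGRAM-ID=1,BANDWIDTH=2500000
2500.m3u8
LIST

# split /^/ is treated as /^/m, so this yields one element per line,
# each keeping its trailing newline.
my @lines = split /^/, $text;

my $move  = 3;                      # hypothetical: entry number to move up
my $first = 1 + 2 * ($move - 1);    # index of that entry's tag line

# Take the two-line entry out and put it back right after the header.
my @entry = splice @lines, $first, 2;
splice @lines, 1, 0, @entry;

print @lines;
```

    Changing $move is the only edit needed to bring up a different entry; no pattern has to be rewritten.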