Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Dear Friends, I am new to Perl. I want to read a block of lines from a file, delimited by keywords. Example file contents:

    some another lines
    =begin
    some lines
    some lines
    =end
    some another lines...

I want to read the block that starts with "=begin" and ends with "=end" in a single stroke. I can read it like this:

    while (<file_handle>) {
        if (/=begin/) {
            read the next line.
            check whether the line is "=end" or not...
            If it is "=end", stop the read operation.
            Otherwise continue the read process.
        }
    }

But I need the best way to read such blocks with Perl regular expressions. Thanks in advance.

Replies are listed 'Best First'.
Re: Regular expression...
by jwkrahn (Abbot) on Oct 08, 2010 at 06:27 UTC

    You probably want to use the flip-flop operator:

    while ( <file_handle> ) {
        if ( /^=begin/ .. /^=end/ ) {
            # process each line between, and including, "=begin" and "=end"
        }
    }

      Thank you for introducing me to the flip-flop operator!
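
For reference, a minimal runnable sketch of the flip-flop approach. The sub name `lines_in_blocks` and the in-memory sample data are assumptions made for illustration and are not part of the reply above. In scalar context the range operator returns a sequence number, and the number on the line that closes the range has "E0" appended, which makes it possible to drop the delimiter lines if only the block contents are wanted.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the lines strictly between "=begin" and "=end".
sub lines_in_blocks {
    my ($fh) = @_;
    my @kept;
    while ( my $line = <$fh> ) {
        # In scalar context ".." acts as a flip-flop: false until the left
        # pattern matches, then true up to and including the line on which
        # the right pattern matches.
        my $seq = ( $line =~ /^=begin/ ) .. ( $line =~ /^=end/ );
        next unless $seq;          # outside any block
        next if $seq == 1;         # the "=begin" line itself
        next if $seq =~ /E0\z/;    # the "=end" line (sequence number ends in "E0")
        push @kept, $line;
    }
    return @kept;
}

# Sample data read through an in-memory filehandle (an assumption for the demo).
my $sample = "other\n=begin\nsome lines\nmore lines\n=end\nother\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
print lines_in_blocks($fh);
close $fh;
```

Removing the two `next if` lines gives the behaviour described in the reply, where the "=begin" and "=end" lines are included.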
Re: Regular expression...
by murugu (Curate) on Oct 08, 2010 at 07:05 UTC

    Not exactly a regular expression approach. Regexp::Common can be used for these.

    use strict;
    use warnings;
    use Regexp::Common;

    my $string = '
    =begin
    a
    bc
    d
    =end
    ';

    print $1 if $string =~ /$RE{balanced}{-begin => "=begin"}{-end => "end"}{-keep}/;

    Regards,
    Murugesan Kandasamy
    use perl for(;;);

Re: Regular expression...
by eyepopslikeamosquito (Archbishop) on Oct 08, 2010 at 07:33 UTC

    I would do it something like this:

    use strict;
    use warnings;

    sub slurp {
        my $file = shift;
        open( my $fh, '<', $file ) or die "error: open '$file': $!\n";
        local $/ = undef;
        my $s = <$fh>;
        close($fh);
        return $s;
    }

    my $file = shift or die "usage: $0 filename\n";
    my @m = slurp($file) =~ /^=begin[^\n]*\n(.*?)^=end/msg;
    print "---\n$_---\n\n" for @m;

Re: Regular expression...
by prasadbabu (Prior) on Oct 08, 2010 at 06:27 UTC

    Here is one way to do it.

    $string ='
    some another lines
    =begin
    some lines
    some lines
    =end
    some another lines...';

    while ($string =~ /(?:(?:^|\n)=begin)((?:(?!\n=end).)*)\n=end/gs){
        print "$1\n";
    }

    output:
    =======
    some lines
    some lines

    Updated: included '\n' before =end.

    Prasad

Re: Regular expression...
by johngg (Canon) on Oct 08, 2010 at 15:48 UTC

    A combination of the already recommended flip-flop inside a grep along with a join and a split breaks the file into records nicely.

    knoppix@Microknoppix:~$ perl -E '
    > open my $fh, q{<}, \ <<EOD or die $!;
    > dsfiu
    > =begin
    > line1
    > line2
    > =end
    > wwefgwf
    > werfwef
    > =begin
    > line3
    > line4
    > line5
    > =end
    > sadfwfe
    > dfbsdfbsfd
    > EOD
    >
    > @recs =
    >     split m{(?<==end\n)(?==begin)},
    >     join q{},
    >     grep { m{=begin} .. m{=end} }
    >     <$fh>;
    >
    > print do { local $" = qq{**********\n}; qq{@recs} };'
    =begin
    line1
    line2
    =end
    **********
    =begin
    line3
    line4
    line5
    =end
    knoppix@Microknoppix:~$

    I hope this is helpful.

    Cheers,

    JohnGG

Re: Regular expression...
by locked_user sundialsvc4 (Abbot) on Oct 08, 2010 at 16:11 UTC

    Personally, I would just use the loop.

    Yes, it is possible to use regular-expression tricks such as the /m (and /g) modifiers, but I also caution that “clarity, clarity usually wins the race.”

    You know that you are reading from a file buffer (so a loop is not really “slower”), and you also know that a line-by-line approach can easily handle an arbitrary number of lines. Plus, it is clear and easily changed. The programmers who come after you (or who work alongside you) can easily understand what old-fashioned, loop-based code is doing, and they can make changes to it with confidence. When, not if, the requirements change, they can change the code with much less fear of breaking it.
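
One way the plain-loop approach could look, as a sketch only. The sub name `read_blocks`, the in-memory sample data, and the choice to warn about an unterminated block are assumptions for illustration and are not taken from the post above. Each block is collected as its own record, so later requirement changes (for example, skipping empty blocks) would be one-line edits.

```perl
use strict;
use warnings;

# Hypothetical helper: returns one array reference per "=begin".."=end" block.
sub read_blocks {
    my ($fh) = @_;
    my ( @blocks, $current );
    while ( my $line = <$fh> ) {
        if ( $line =~ /^=begin/ ) {
            $current = [];                  # start a new block
        }
        elsif ( $line =~ /^=end/ ) {
            push @blocks, $current if $current;
            undef $current;                 # back outside any block
        }
        elsif ($current) {
            push @$current, $line;          # a line inside the current block
        }
    }
    warn "unterminated =begin block ignored\n" if $current;
    return @blocks;
}

# Sample data read through an in-memory filehandle (an assumption for the demo).
my $sample = "other\n=begin\nline1\nline2\n=end\nother\n=begin\nline3\n=end\n";
open my $fh, '<', \$sample or die "cannot open in-memory file: $!";
my @blocks = read_blocks($fh);
close $fh;

# Print each block under a numbered heading.
printf "block %d:\n%s", $_ + 1, join( '', @{ $blocks[$_] } ) for 0 .. $#blocks;
```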

Re: Regular expression...
by snape (Pilgrim) on Oct 10, 2010 at 01:12 UTC
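    #!/usr/bin/perl
    use strict;
    use warnings;

    # Copy the lines between "=begin" and "=end" (delimiters excluded) to the output file.
    open my $in,  '<', 'StartEnd.txt'     or die "StartEnd.txt: $!";
    open my $out, '>', 'StartEnd_out.txt' or die "StartEnd_out.txt: $!";

    my $in_block = 0;
    while (<$in>) {
        if (/^=begin/) {
            $in_block = 1;
        }
        elsif (/^=end/) {
            $in_block = 0;
        }
        elsif ($in_block) {
            print {$out} $_;    # $_ already ends with a newline
        }
    }

    close $in;
    close $out;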