in reply to how to get context between two flag

If your ending tag is a constant string, e.g. 'end', then the easiest way is to set the input record separator, the Perl variable $/, to that value. You can learn more about $/ in perlvar. That way each read pulls in the entire run from 'start' to 'end' in a single gulp, and newlines won't cause you any trouble if 'start' is on one line and 'end' is on another. For example:

use strict;
use warnings;

local $/ = 'end';
my @aFound;
while (my $line = <DATA>) {

    # normally we would chomp to get rid of 'end'
    # but this may not be a good idea if the file ends in
    # junk outside of start ... end.
    # chomp $line;

    # make sure we really have a match just in case there is
    # an 'end' without a preceding "start"!
    # also use the s modifier at the end of your regex so that
    # . matches "\n"
    # see http://perldoc.perl.org/perlre.html#Modifiers
    # for further information
    next unless ($line =~ /\s+start\s+(.*)end\z/s);

    # store extracted string for later use
    push @aFound, $1;
}

# do something with the text between start...end
# you'll want to change this:
# your post looks like you would like to further separate
# each word on a separate line, but for now, let's just
# print out the run of characters between start...end.
print join("\n", @aFound), "\n";

__DATA__
asdasd start asdasd asdasdasd
asdasdas end asdasdas adasdas
start as asdas dasdasdad
asdasddas end qweqwe asdasd
start asdsadsdasddasds sdasdas asdasdasdasd
asdasdsa asdasd asdasdasd end
here is some trailing garbage

which outputs

asdasd asdasdasd
asdasdas
as asdas dasdasdad
asdasddas
asdsadsdasddasds sdasdas asdasdasdasd
asdasdsa asdasd asdasdasd
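
If the goal is one word per line, as the original question seems to want, each extracted run can be split on whitespace. This is only a minimal sketch, assuming @aFound holds runs like the ones printed above; words_from_runs is a hypothetical helper name, not something from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): split every captured
# run on whitespace and return all the words as one flat list.
sub words_from_runs {
    my @runs = @_;
    # split ' ' is the special whitespace mode: leading whitespace is
    # skipped and any run of whitespace, "\n" included, is one separator.
    return map { split(' ', $_) } @runs;
}

# Sample runs shaped like the entries collected in @aFound.
my @aFound = ("asdasd asdasdasd\nasdasdas ", "as asdas dasdasdad\nasdasddas ");
print "$_\n" for words_from_runs(@aFound);
```

Splitting after the capture keeps the record-reading loop unchanged, so the two steps can be adjusted independently.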

Best, beth

Update: Fixed bug (missing s modifier on regex).

Re^2: how to get context between two flag
by Marshall (Canon) on Aug 07, 2009 at 07:47 UTC
    I liked this $/='end'; idea.

    I've been working on another approach using the range operator in its three-dot form, /start/.../end/. I will admit that I have not mastered this technique, but it appears to be designed for processing multi-line records.

    The code below produces the correct result, but my "gut feeling" is that it is overly complex. I hope some other Monk can show a better way with the "..." operator.

    This operator is weird in that it sometimes returns values in exponential format, like 3E0 instead of just 3. I haven't figured out how to use this info in the most efficient way yet. The code below does not actually use it; uncomment the print statement to see what it does. It is interesting.
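
For anyone who wants to see those return values in isolation, here is a minimal sketch. It relies on the behaviour described in perlop: in scalar context the range operator returns the empty string while it is off, a sequence number starting at 1 while it is on, and that number with E0 appended on the line that ends the range. flipflop_values is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: apply /start/.../end/ to each line in turn and
# collect whatever the operator returns for that line.
sub flipflop_values {
    my @values;
    for (@_) {
        # Scalar assignment puts the range operator in flip-flop mode.
        my $flag = /start/ ... /end/;
        push @values, $flag;
    }
    return @values;
}

my @lines  = ('junk', 'start here', 'middle', 'the end', 'more junk');
my @values = flipflop_values(@lines);
printf "%-10s => '%s'\n", $lines[$_], $values[$_] for 0 .. $#lines;
# prints:
# junk       => ''
# start here => '1'
# middle     => '2'
# the end    => '3E0'
# more junk  => ''
```

Because 3E0 is still numerically 3, a test such as $flag =~ /E0$/ is one way to spot the last line of a range without keeping a separate counter.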

    Anyway here is yet another approach for the OP to experiment with!

    Update: my brain is working slowly today, but Perl DBI folks will be familiar with 0E0. This is the Perl way to return a "TRUE" value for numeric zero. I'm not sure how this xE0 stuff can be used here...

    Update: I guess this is tangential to this discussion, but if you ever wondered "how can I return a true value meaning that the function worked, and at the same time say that zero results were produced?", returning the string '0E0' will do the trick.
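
A minimal sketch of that idiom, using a made-up function rather than real DBI calls; update_rows and its two arguments are hypothetical and exist only for illustration.

```perl
use strict;
use warnings;

# Hypothetical function: pretend to update rows and report how many changed.
# Returns undef on failure, otherwise the row count. A count of zero is
# reported as the string '0E0' so that a successful call is still true.
sub update_rows {
    my ($count, $ok) = @_;
    return unless $ok;                      # failure: undef in scalar context
    return $count == 0 ? '0E0' : $count;
}

my $rows = update_rows(0, 1);
if ($rows) {
    # '0E0' is a true string, yet it is 0 when used as a number.
    printf "succeeded, %d row(s) changed\n", $rows;
}
else {
    print "failed\n";
}
# prints: succeeded, 0 row(s) changed
```

The caller can test the result for truth first and then use it as a number, without needing a second return value for the status.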

    #!/usr/bin/perl -w
    use strict;

    my $line = '';
    while (<DATA>)
    {
        next if /^\s*$/;

        # $flag is not necessary here, it is there to
        # show the return value of this triple dot operator
        # for /start/.../end/
        if ( my $flag = ( /start /.../end/) )
        {
            s/end.*/end/s;
            s/.*?start/start/;
            s/\n//;
            $line .= "$_";
            # print "$flag\n"; #interesting 1, 2, 3E0 etc....
            if ( $_=~ m/end$/ )
            {
                print "OUT:$line\n";
                $line = '';
            }
        }
    }
    #Prints:
    #OUT:start 1asdasd asdasdasd asdasdas end
    #OUT:start 2as asdas dasdasdad asdasddas end
    #OUT:start 3asdsadsdasddasds sdasdas asdasdasdasd asdasdsa asdasd asdasdasd and this is an evenlonger way to stop a line with )&)9867 some end
    #OUT:start 4another line end
    __DATA__
    asdasd start 1asdasd asdasdasd asdasdas end asdasdas adasdas
    start 2as asdas dasdasdad asdasddas end qweqwe asdasd
    start 3asdsadsdasddasds sdasdas asdasdasdasd asdasdsa asdasd asdasdasd and this is an even
    longer way to stop a line with )&)9867 some end garbage
    start 4another line end abc