cxfcxf has asked for the wisdom of the Perl Monks concerning the following question:

My file looks like this:

asdasd start asdasd asdasdasd asdasdas end asdasdas
adasdas start as asdas dasdasdad asdasddas end qweqwe
asdasd start asdsadsdasddasds sdasdas asdasdasdasd asdasdsa
asdasd asdasdasd end

I want to get:

start asdasd asdasdasd asdasdas end
start as asdas dasdasdad asdasddas end
start asdsadsdasddasds sdasdas asdasdasdasd asdasdsa asdasd asdasdasd end
I tried this script:

open(TST, "test.log") or die $!;
while (<TST>) {
    chomp;
    if (/start/ .. /end/) {
        s/.*(start.*end).*/$1/m;
        print "$_\n";
    }
}
I know why it fails: the file is still read line by line even when you use the range operator, so for the last record $_ holds two separate lines on successive passes through the while loop:

1. asdasd start asdsadsdasddasds sdasdas asdasdasdasd asdasdsa
2. asdasd asdasdasd end

The range operator can only be used to print those lines as they are. What I want is each record on a single line, in this form:

start adasd asdasd asdasd asdas end

My question is how to get that output. Thank you!

Replies are listed 'Best First'.
Re: how to get context between two flag
by ELISHEVA (Prior) on Aug 07, 2009 at 01:18 UTC

    If your ending tag is a constant string, e.g. 'end', then the easiest way is to set the input record separator to that value using the Perl variable $/. You can learn more about $/ in perlvar. That way you are sure to read the entire run from 'start' to 'end' in a single gulp, and newlines won't cause any trouble if 'start' is on one line and 'end' is on another. For example,

    use strict;
    use warnings;

    local $/='end';
    my @aFound;
    while (my $line=<DATA>) {

      # normally we would chomp to get rid of 'end'
      # but this may not be a good idea if the file ends in
      # junk outside of start ... end.
      # chomp $line;

      # make sure we really have a match just in case there is
      # an 'end' without a preceding "start"!
      # also use the s modifier at the end of your regex so that
      # . matches "\n"
      # see http://perldoc.perl.org/perlre.html#Modifiers
      # for further information
      next unless ($line =~ /\s+start\s+(.*)end\z/s);

      # store extracted string for later use
      push @aFound, $1;
    }

    # do something with the text between start...end
    # you'll want to change this:
    # your post looks like you would like to further separate
    # each word on a separate line, but for now, let's just
    # print out the run of characters between start...end.
    print join("\n", @aFound), "\n";

    __DATA__
    asdasd start asdasd asdasdasd asdasdas end asdasdas
    adasdas start as asdas dasdasdad asdasddas end qweqwe
    asdasd start asdsadsdasddasds sdasdas asdasdasdasd asdasdsa
    asdasd asdasdasd end
    here is some trailing garbage

    which outputs

    asdasd asdasdasd asdasdas
    as asdas dasdasdad asdasddas
    asdsadsdasddasds sdasdas asdasdasdasd asdasdsa
    asdasd asdasdasd

    Best, beth

    Update: Fixed bug (missing s modifier on regex).

      I liked this $/='end'; idea.

      I've been working on another approach using the / /.../ / operator. I will admit that I have not mastered this technique, but it appears to be designed for processing multi-line records.

      The code below produces the correct result, but my "gut feeling" is that it is overly complex. I hope some other Monk can show a better way with the "..." operator.

      This operator is weird in that it sometimes returns values in exponential format, like 3E0 instead of just 3. I haven't figured out how to use this info in the most efficient way yet. The code below does not actually use it; uncomment the print statement to see what the operator returns. It is interesting.

      Anyway here is yet another approach for the OP to experiment with!

      Update: my brain is working slowly today, but Perl DBI folks will be familiar with 0E0. This is the Perl way to return a "TRUE" value for numeric zero. I'm not sure how this xE0 stuff can be used here...

      Update: I guess this is tangential to this discussion, but if you ever wondered "how can I return a true value meaning that the function worked and at the same time say that zero results were produced?", returning the string '0E0' will do that trick.
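The '0E0' trick can be shown with a short sketch. The helper name count_matches is made up for this illustration and does not come from DBI or any other module; the only point is that the string '0E0' is true in boolean context while still being numerically equal to zero.

```perl
use strict;
use warnings;

# Hypothetical helper: counts how often a pattern matches in a string.
# Returns the count, or the string '0E0' when nothing matched, so the
# result is always true yet still equals 0 numerically.
sub count_matches {
    my ($text, $pattern) = @_;
    my $count = () = $text =~ /$pattern/g;   # list assignment in scalar context yields the match count
    return $count || '0E0';
}

my $n = count_matches('start asdasd end', qr/xyz/);
print "call succeeded\n" if $n;          # '0E0' is a true string
print "matches: ", $n + 0, "\n";         # numifies to 0 without a warning
```

A caller can therefore write "if (my $n = count_matches(...))" and still treat $n as an ordinary number afterwards.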

      #!/usr/bin/perl -w
      use strict;

      my $line = '';

      while (<DATA>)
      {
         next if /^\s*$/;

         # $flag is not necessary here, it is there to
         # show the return value of this triple dot operator
         # for /start/.../end/
         if ( my $flag = ( /start /.../end/) )
         {
            s/end.*/end/s;
            s/.*?start/start/;
            s/\n/ /;    # join wrapped lines with a space
            $line .= "$_";
            # print "$flag\n";  #interesting 1, 2, 3E0 etc....
            if ( $_=~ m/end$/ )
            {
               print "OUT:$line\n";
               $line = '';
            }
         }
      }
      #Prints:
      #OUT:start 1asdasd asdasdasd asdasdas end
      #OUT:start 2as asdas dasdasdad asdasddas end
      #OUT:start 3asdsadsdasddasds sdasdas asdasdasdasd asdasdsa asdasd asdasdasd and this is an even longer way to stop a line with )&)9867 some end
      #OUT:start 4another line end
      __DATA__
      asdasd start 1asdasd asdasdasd asdasdas end asdasdas
      adasdas start 2as asdas dasdasdad asdasddas end qweqwe
      asdasd start 3asdsadsdasddasds sdasdas asdasdasdasd asdasdsa
      asdasd asdasdasd and this is an even
      longer way to stop a line with )&)9867 some
      end garbage
      start 4another line end abc
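On the xE0 values: perlop documents that in scalar context the range operator returns a sequence number starting at 1, and that the final sequence number of a range has the string "E0" appended. That suffix can serve as the "last line of this record" signal. The following is only a sketch under some assumptions: the helper name extract_records is invented here, the input lines are already chomped, and it uses the two-dot form (which tests the end pattern on the same line as the start pattern) so that single-line records close immediately.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): collects each start..end run,
# using the "E0" suffix on the last sequence number to spot the final line.
# Note: the flip-flop keeps its state between calls, so pass whole records.
sub extract_records {
    my @lines = @_;
    my (@records, @parts);
    for my $line (@lines) {
        my $flag = ($line =~ /\bstart\b/ .. $line =~ /\bend\b/);
        next unless $flag;                  # empty string: outside any range
        push @parts, $line;
        next unless $flag =~ /E0\z/;        # not the last line of the range yet
        my $record = join ' ', @parts;
        @parts = ();
        $record =~ s/^.*?\bstart\b/start/;  # drop junk before the start token
        $record =~ s/\bend\b.*\z/end/s;     # drop junk after the end token
        push @records, $record;
    }
    return @records;
}

my @lines = (
    'asdasd start asdasd asdasdasd asdasdas end asdasdas',
    'adasdas start as asdas dasdasdad asdasddas end qweqwe',
    'asdasd start asdsadsdasddasds sdasdas asdasdasdasd asdasdsa',
    'asdasd asdasdasd end',
);
print "$_\n" for extract_records(@lines);
```

With the OP's four input lines this prints three lines, one per start..end record. A line that has "end" before "start" would confuse it, so treat it as a starting point rather than a finished solution.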
Re: how to get context between two flag
by Marshall (Canon) on Aug 07, 2009 at 00:44 UTC
    Use of / /../ / operator is a good idea!

    But here I just show one simple approach that is easy to debug. There are lots of ways to do what you need.

    Update: I just assumed that the last line sequence was all one line as you had blank lines before other examples. Maybe a bad assumption. bichonfrise74's code looks good to me also. There are many roads to Rome here.

    #!/usr/bin/perl -w
    use strict;

    while (<DATA>)
    {
       next if /^\s*$/;      #skip blank lines
       chomp;                #optional as \n would get deleted anyway
       s/^.*?start\s+//;     #remove start and all before
       s/end.*//;            #remove end and all after
       print "$_\n";
    }
    #prints:
    # asdasd asdasdasd asdasdas
    # as asdas dasdasdad asdasddas
    # asdsadsdasddasds sdasdas asdasdasdasd asdasdsa asdasd asdasdasd
    __DATA__
    asdasd start asdasd asdasdasd asdasdas end asdasdas
    adasdas start as asdas dasdasdad asdasddas end qweqwe
    asdasd start asdsadsdasddasds sdasdas asdasdasdasd asdasdsa asdasd asdasdasd end
    Update:

    sorry for goof, brain isn't working full speed today!

    s/^.*?start/start/;
    s/end.*/end/;

    preserves start and end tokens.
Re: how to get context between two flag
by bichonfrise74 (Vicar) on Aug 07, 2009 at 00:23 UTC
    Are you looking for something like this?
    #!/usr/bin/perl
    use strict;

    while(<DATA>) {
        my ($line) = $_ =~ /\b(start\s.*end)\b/;
        print "$line\n" if ( $line );
    }

    __DATA__
    asdasd start asdasd asdasdasd asdasdas end asdasdas
    adasdas start as asdas dasdasdad asdasddas end qweqwe
    asdasd start asdsadsdasddasds sdasdas asdasdasdasd asdasdsa asdasd asdasdasd end
      Oops, I thought your 3rd line was just one big line. I didn't see that it was broken into a 3rd and a 4th line. Anyway, I modified the code, so this should do the trick.

      I'm not sure how to update my existing comment, that's why I had to create a new one.
      #!/usr/bin/perl
      use strict;

      local $/ = "\n\n";
      while( <DATA>) {
          my ($line) = $_ =~ /\b(start\s.*\n?.*end)\b/;
          $line =~ s/\n/ /g if ( $line );
          print "$line\n" if ( $line );
      }

      __DATA__
      asdasd start asdasd asdasdasd asdasdas end asdasdas

      adasdas start as asdas dasdasdad asdasddas end qweqwe

      asdasd start asdsadsdasddasds sdasdas asdasdasdasd asdasdsa
      asdasd asdasdasd end

      ds start asda end
      The result is the same as mine. The last record in the file is

      asdasd start asdsadsdasddasds sdasdas asdasdasdasd asdasdsa
      asdasd asdasdasd end

      There is a \n at the end of its first line.
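One way around the embedded \n, offered only as a sketch, is to stop reading line by line: treat the whole input as a single string, pull out every start..end run with a global non-greedy match under the s modifier, and then turn the newlines inside each run into spaces. The helper name extract_runs is invented for this example, and the text is built from a list here so the snippet runs on its own; in a real script it would presumably come from slurping test.log with $/ undefined.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the whole file contents as one string and
# returns every start..end run with internal newlines turned into spaces.
sub extract_runs {
    my ($text) = @_;
    # /s lets . match "\n"; .*? stops at the nearest following 'end'
    my @runs = $text =~ /\b(start\b.*?\bend)\b/gs;
    s/\s*\n\s*/ /g for @runs;
    return @runs;
}

# Stand-in for the slurped file contents (the OP's sample data).
my $text = join "\n",
    'asdasd start asdasd asdasdasd asdasdas end asdasdas',
    'adasdas start as asdas dasdasdad asdasddas end qweqwe',
    'asdasd start asdsadsdasddasds sdasdas asdasdasdasd asdasdsa',
    'asdasd asdasdasd end',
    '';
print "$_\n" for extract_runs($text);
```

This reads the whole file into memory, which is fine for a small log but may not suit very large files; the $/ = 'end' approach above reads one record at a time instead.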