Re: how to get context between two flag

If your ending tag is a constant string, e.g. 'end', the easiest way is to set the input record separator, the Perl variable $/, to that value. You can learn more about $/ in perlvar. Each read then returns everything up to and including the next 'end' in a single gulp, so newlines won't cause you any trouble if 'start' is on one line and 'end' is on another. For example,

use strict;
use warnings;

local $/='end';
my @aFound;
while (my $line=<DATA>) {
  # normally we would chomp to get rid of 'end'
  # but this may not be a good idea if the file ends in
  # junk outside of start ... end.
  # chomp $line;

  # make sure we really have a match just in case there is
  # an 'end' without a preceding "start"!
  # also use the s modifier at the end of your regex so that
  # . matches "\n"
  # see http://perldoc.perl.org/perlre.html#Modifiers
  # for further information
  
  next unless ($line =~ /\s+start\s+(.*)end\z/s);

  # store extracted string for later use
  push @aFound, $1;
}

# do something with the text between start...end
# you'll want to change this:
# your post looks like you would like to further separate
# each word on a separate line, but for now, let's just
# print out the run of characters between start...end.

print join("\n", @aFound), "\n";

__DATA__
asdasd start asdasd asdasdasd asdasdas end asdasdas

adasdas start as asdas dasdasdad asdasddas end qweqwe

asdasd start asdsadsdasddasds sdasdas asdasdasdasd asdasdsa
asdasd asdasdasd end
here is some trailing garbage

which outputs

asdasd asdasdasd asdasdas 
as asdas dasdasdad asdasddas 
asdsadsdasddasds sdasdas asdasdasdasd asdasdsa
asdasd asdasdasd

Best, beth

Update: Fixed bug (missing s modifier on regex).
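The comments in the code above leave the "one word per line" step open. A minimal sketch of that step follows. It assumes the input is already in a string and reads it through an in-memory filehandle; the helper name extract_runs and the sample text are made up for illustration, and the regex is a slight variant of the one above (non-greedy, with the whitespace before 'end' trimmed off).

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): returns the text
# between each 'start' ... 'end' pair found in the given string.
sub extract_runs {
    my ($text) = @_;
    my @found;

    # Read the string through an in-memory filehandle so the same $/
    # technique applies; 'local' limits the change to this sub.
    open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
    local $/ = 'end';
    while (my $record = <$fh>) {
        # Variant of the regex above: \b instead of leading whitespace,
        # non-greedy capture, trailing whitespace left out of $1.
        next unless $record =~ /\bstart\s+(.*?)\s*end\z/s;
        push @found, $1;
    }
    close $fh;
    return @found;
}

# Made-up sample data, not the OP's file.
my $sample = "junk start one two end more junk\nstart three\nfour end tail";

for my $run (extract_runs($sample)) {
    # split ' ' splits on any run of whitespace, including newlines
    print "$_\n" for split ' ', $run;
    print "--\n";
}
```

One limitation carries over from the $/ approach itself: a record also ends at 'end' inside a longer word (such as "sending"), so this only suits data where the tag cannot occur inside the text.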


Replies are listed 'Best First'.
Re^2: how to get context between two flag
by Marshall (Canon) on Aug 07, 2009 at 07:47 UTC

I liked this $/='end'; idea. I've been working on another approach using the /start/.../end/ range operator. I will admit that I have not mastered this technique, but it appears to be designed for processing multi-line records.

The code below produces the correct result, but my "gut feeling" is that it is overly complex. I hope some other Monk can show a better way with the "..." operator.

This operator is weird in that it sometimes returns values in exponential format, like 3E0 instead of just 3. I haven't figured out how to use this info in the most efficient way yet. In the code below this value is not used; enable the print statement to see what it does. It is interesting.

Anyway here is yet another approach for the OP to experiment with!

Update: my brain is working slowly today, but Perl DBI folks will be familiar with 0E0. This is the Perl way to return a "TRUE" value for numeric zero. I'm not sure how this xE0 stuff can be used here...

Update: I guess this is tangential to this discussion, but if you ever wondered how to return a "true" value meaning that the function worked while also saying that zero results were produced, returning the string '0E0' will do that trick.
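As a small illustration of the '0E0' idiom from the update, here is a sketch under stated assumptions: count_matches is a hypothetical function invented for this example, not part of DBI or of the script in this post. It returns a false value on failure and a true value on success, even when the count is zero.

```perl
use strict;
use warnings;

# Hypothetical function: counts the items matching a pattern.
# Returns undef on failure (no pattern given) and a true value on
# success, using '0E0' to mean "worked, but found zero items".
sub count_matches {
    my ($pattern, @items) = @_;
    return unless defined $pattern;          # failure: false
    my $n = grep { /$pattern/ } @items;      # grep in scalar context counts
    return $n || '0E0';                      # '0E0' is true, yet numerically 0
}

my $n = count_matches(qr/zzz/, qw(apple banana));
if ($n) {
    printf "succeeded with %d matches\n", $n;
}
else {
    print "failed\n";
}
```

The string '0E0' is true because only undef, the empty string, 0 and '0' are false in Perl, while in numeric context it evaluates to zero without a warning.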

#!/usr/bin/perl -w
use strict;

my $line = '';
while (<DATA>)
{
   next if /^\s*$/;
   
   # $flag is not necessary here, it is there to
   # show the return value of this triple dot operator
   # for /start/.../end/
   
   if ( my $flag = ( /start /.../end/) )
   {
      s/end.*/end/s;
      s/.*?start/start/;
      s/\n//;
      $line .= $_;
      # print "$flag\n"; #interesting 1, 2, 3E0 etc....

      if ( $_=~ m/end$/ )
      {
         print "OUT:$line\n";
         $line = '';
      }
   }
}

#Prints:
#OUT:start 1asdasd asdasdasd asdasdas end
#OUT:start 2as asdas dasdasdad asdasddas end
#OUT:start 3asdsadsdasddasds sdasdas asdasdasdasd asdasdsa asdasd asdasdasd and this is an evenlonger way to stop a line with )&)9867 some end
#OUT:start 4another line end

__DATA__
asdasd start 1asdasd asdasdasd asdasdas end asdasdas

adasdas start 2as asdas dasdasdad asdasddas end qweqwe

asdasd start 3asdsadsdasddasds sdasdas asdasdasdasd asdasdsa 
asdasd asdasdasd and this is an even
longer way to stop a line with )&)9867 some end garbage

start 4another line end abc
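On the 3E0 values: perlop documents that in scalar context the range operator returns a sequence number starting at 1 while inside the range, the empty string outside it, and appends "E0" to the number on the final line of the range. That gives a way to detect the closing line without a second regex. The sketch below shows this; the helper label_range and the sample lines are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: for each line inside a start...end range,
# returns [ sequence value, is-last flag, line ].
# Note: the flip-flop keeps its state between calls to this sub, so
# the sketch assumes every range in the input is properly closed.
sub label_range {
    my @lines = @_;
    my @labels;
    for (@lines) {
        # Scalar assignment gives scalar context, so ... acts as the
        # flip-flop: sequence number inside the range, '' outside it.
        my $seq = ( /start/ ... /end/ );
        next unless $seq;

        # The final line of a range carries the "E0" suffix.
        my $is_last = $seq =~ /E0\z/ ? 1 : 0;
        push @labels, [ $seq, $is_last, $_ ];
    }
    return @labels;
}

# Made-up sample lines, not the OP's data.
for my $entry ( label_range('before', 'start a', 'middle', 'end b', 'after') ) {
    my ($seq, $is_last, $line) = @$entry;
    printf "%-4s %s%s\n", $seq, $line, $is_last ? '  <- last line of range' : '';
}
```

With three dots the right-hand test is not tried on the line that opened the range, so a record that starts and ends on one line is only closed by the next line that matches /end/. The two-dot form tests both on the same line, which may suit single-line records better.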