ldbpm has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to parse a file similar to the following:

========================================================
Net Stuff
========================================================
slfjs
sfafdds
aslfkjsdl
sfsdf
========================================================
Services
========================================================
blah
glah
sfsd
asfsdf
afsafdf
========================================================
Users
========================================================
Services
sam
bill
frank

I am attempting to match between the first match of "Services", and not after, but I am getting everything using the following code:

while(<>) {
    if ( /^Services$/ .. /^Users$/ ) {
        print "$_";
    }
}

Even when I use a multi-line match regular expression, I get the same output:

while(<>) {
    if ( /^=*Services$/m .. /^=*Users$/m ) {
        print "$_";
    }
}

Once again, I just want what is in between first occurrence of "Services" and "Users".
Any ideas?

Replies are listed 'Best First'.
Re: Matching ranges
by BrowserUk (Patriarch) on Sep 02, 2010 at 02:27 UTC

    Sometimes $/ is just easier:

    #! perl -slw
    use strict;

    $/ = 'Services';
    scalar <>;        ## discard everything up to the start of the bit you want
    $/ = 'Users';
    print scalar <>;  ## print the bit you want

    __END__
    c:\test>junk31 junk31.dat
    ========================================================
    blah
    glah
    sfsd
    asfsdf
    afsafdf
    ========================================================
    Users

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      Much thanks ...
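    A minimal sketch of the same $/ idea wrapped in a reusable form, for readers who want the section without the "====" rules or the trailing "Users". The helper name slurp_between and the shortened sample data are assumptions made for illustration; they are not from the thread. It relies on chomp removing whatever $/ currently holds, which here is the end marker.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text between the first $start marker
# and the next $end marker, read from an already-open filehandle.
sub slurp_between {
    my ($fh, $start, $end) = @_;

    local $/ = $start;       # a "line" now ends at the start marker
    my $skipped = <$fh>;     # discard everything up to and including it
    return unless defined $skipped && $skipped =~ /\Q$start\E\z/;

    $/ = $end;               # the next "line" ends at the end marker
    my $chunk = <$fh>;
    return unless defined $chunk;

    chomp $chunk;            # chomp strips the current $/, i.e. the end marker
    return $chunk;
}

# Shortened sample modelled on the question; the real file may differ.
my $sample = <<'END';
====
Net Stuff
====
slfjs
====
Services
====
blah
glah
====
Users
====
Services
sam
END

# Read from an in-memory filehandle so the sketch is self-contained.
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my $section = slurp_between($fh, 'Services', 'Users');
close $fh;

# Keep only the payload lines: drop the "====" rules and blank lines.
my @wanted = grep { !/^=+$/ && /\S/ } split /\n/, $section;
print "$_\n" for @wanted;
```

    Because $/ is localized inside the helper, the caller's record separator is restored on return. Note that the markers are matched as plain substrings, so a line such as "Net Services" before the wanted header would end the first read early.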
Re: Matching ranges
by dasgar (Priest) on Sep 02, 2010 at 04:55 UTC

        Once again, I just want what is in between first occurrence of "Services" and "Users".

    I'm assuming that you aren't interested in the lines with the "======" stuff. Based on that, here's how I would approach the problem.

    use strict;

    my $file;

    open(DATA,"<data.txt") || die "Unable to open file 'data.txt': $!\n";
    {
        local $/;
        $file = <DATA>;
    }
    close(DATA);

    my ($selection) = ($file =~ m/^.+?Services.+?[=]+(.+?)[=]+/is);
    my (@lines) = split /\n/,$selection;

    for (my $i=0;$i<=$#lines;$i++) {
        $lines[$i] =~ s/\n//;
        print "Line $i: $lines[$i]\n";
    }

    Which produces the following output:

    Line 0:
    Line 1: blah
    Line 2: glah
    Line 3: sfsd
    Line 4:
    Line 5:
    Line 6: asfsdf
    Line 7: afsafdf

    Although BrowserUk's code is probably better, the code above may be easier to follow for some folks (such as myself). Also, I should note that I lose a blank line at the end of the captured section when I use the split command. Again, I'm assuming that won't create problems for what you're trying to do.

    Anyways, this gives you an example of another approach to solving the problem.
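    On the lost blank line: split drops trailing empty fields by default, and a negative LIMIT argument keeps them. A small sketch, using a made-up string in place of the real capture:

```perl
use strict;
use warnings;

# Made-up stand-in for the captured section, ending in a blank line.
my $selection = "\nblah\nglah\n\n";

my @default = split /\n/, $selection;        # trailing empty fields are dropped
my @keep    = split /\n/, $selection, -1;    # negative LIMIT keeps them

printf "default: %d fields, with LIMIT -1: %d fields\n",
    scalar @default, scalar @keep;
# prints "default: 3 fields, with LIMIT -1: 5 fields"
```

    With the negative LIMIT, the last element is the empty string that follows the final newline, so a loop over @keep may want to skip it.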

      I like your output, but there is a lot of code to produce that output, and I am trying to produce concise code. This particular code is definitely an option to consider, though.

      Thanks

Re: Matching ranges
by JavaFan (Canon) on Sep 02, 2010 at 10:08 UTC
    I am attempting to match between the first match of "Services", and not after, but I am getting everything using the following code:
    I cannot reproduce that. I do not get the lines between 'Users' and the next 'Services' (as expected).
    Once again, I just want what is in between first occurrence of "Services" and "Users".
    use 5.010;
    while(<>) {
        state $done;
        if ( /^Services$/ .. /^Users$/ ) {
            print "$_";
            $done = 1;
            next;
        }
        last if $done;
    }
    Or:
    while(<>) {
        if (my $r = /^Services$/ .. /^Users$/ ) {
            print $_;
            last if $r =~ /E0$/;
        }
    }

      This is the weirdest parsing problem I have ever encountered.

      while(<>) {
          if (my $r = /^Services/ .. /^Users/ ) {
              next if $_ =~ /^Services/;
              print $_;
              last if $r =~ /E0$/;
          }
      }

      produces output without the "Services" line, but the following does not (at least for me):

      while(<>) {
          if (my $r = /^Services/ .. /^Users/ ) {
              next if $r =~ /^Services/;
              print $_;
              last if $r =~ /E0$/;
          }
      }

      Even when I try to exit early, $r is seemingly not respected:

      while(<>) {
          if (my $r = /^Services/ .. /^Users/ ) {
              #next if $_ =~ /^Services/;
              exit if $r =~ /^Services/;
              print $_;
              last if $r =~ /E0$/;
          }
      }
        A slight bit more "regex kung-fu" is needed.

        #!/usr/bin/perl -w
        use strict;

        my $num =1;
        while(<DATA>) {
            if ( (/^Services\s*$/../^Users\s*$/) =~ /^(\d+)(?<!^1)$/ ) {
                last if $1 <= $num++;        #only first Services section
                next if /^====/ || /^\s*$/;  #if you don't want these
                print "$_";
            }
        }
        =prints
        blah
        glah
        sfsd
        asfsdf
        afsafdf
        =cut
        I highly recommend reading not only Flipin good, or a total flop?, but also some of the responses in that node that talk about eliminating start and end conditions. They show how to do "regex-fu" like the above!
        Your $r =~ /^Services/ is the problem. Try running this and look at the printed values of $r.
        while(<>) {
            if (my $r = /^Services/ .. /^Users/ ) {
                #next if $r =~ /^Services/;
                next if /^Services/;
                print "$r:$_";
                last if $r =~ /E0$/;
            }
        }
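        To make the point about $r concrete: in scalar context the range operator returns a sequence number (1, 2, 3, ...), with "E0" appended on the line that ends the range. It never returns the line text, which is why $r =~ /^Services/ cannot match. A self-contained sketch, using a made-up array of lines in place of the input file:

```perl
use strict;
use warnings;

# Made-up input modelled on the question's file.
my @lines = (
    "Net Stuff\n", "Services\n", "blah\n", "glah\n",
    "Users\n", "Services\n", "sam\n",
);

my @seen;
for (@lines) {
    # $r is the flip-flop's sequence number, not the matched text.
    if (my $r = /^Services/ .. /^Users/) {
        chomp(my $text = $_);
        push @seen, "$r:$text";
        last if $r =~ /E0$/;    # "E0" marks the last line of the range
    }
}
print "$_\n" for @seen;
# prints:
# 1:Services
# 2:blah
# 3:glah
# 4E0:Users
```

        Given those values, testing the sequence number itself, for example next if $r == 1, is one way to skip the starting line without repeating the regex.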

      Your second piece of code works well. Thanks, because I like to use ranges in this particular case.

Re: Matching ranges
by aquarium (Curate) on Sep 02, 2010 at 03:44 UTC
    sometimes i'm not too confident about properly handling all the cases in multiline (and especially nested) records. if you know or can learn awk (very easy to learn), it's much better at this sort of stuff. with awk you can nest matches to particular levels, and each level's loop keeps its place properly. it's pretty neat. there's probably a perl module or two to parse multiline records. crafting it on your own with a regex can sometimes end up inadvertently not processing some data that was meant to be processed. anyway, there should still be the built-in automatic converter from awk scripts into perl in your perl distribution.
    that's just alternative to what you may be already doing, so my pick of the best horse may not be the same as yours.
    the hardest line to type correctly is: stty erase ^H
      i would really appreciate if the person that looks like they're stalking my replies with a minus one vote each time...would take the trouble of explaining why my post reply needs improvement. that way i can learn too. thanks.
      the hardest line to type correctly is: stty erase ^H