Matching ranges

ldbpm has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Matching ranges by BrowserUk (Patriarch) on Sep 02, 2010 at 02:27 UTC
Sometimes $/ is just easier:

```perl
#! perl -slw
use strict;

$/ = 'Services';
scalar <>;        ## discard everything up to the start of the bit you want

$/ = 'Users';
print scalar <>;  ## print the bit you want

__END__
c:\test>junk31 junk31.dat
========================================================
blah
glah
sfsd
asfsdf
afsafdf
========================================================
Users
```

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy
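The `$/` approach above can be tried without a data file. The following is only a minimal sketch: the sample data, the in-memory filehandle, and the `first_section` helper are assumptions made for illustration and are not part of the original post.

```perl
use strict;
use warnings;

# Hypothetical sample input; the poster's real file is not shown in the thread.
my $data = join "\n",
    'Header', 'Services', '=====', 'blah', 'glah', '=====', 'Users',
    'bob', 'Services', '=====', 'other', '=====', 'Users', '';

# Return the text between the first "Services" and the first "Users" after it.
sub first_section {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory file: $!";

    local $/ = 'Services';      # a "record" now ends at the string Services
    my $skipped = <$fh>;        # read and discard everything up to that point

    $/ = 'Users';               # the next record ends at the string Users
    my $section = <$fh>;
    close $fh;

    return unless defined $section;
    chomp $section;             # chomp strips the current $/, here "Users"
    return $section;
}

print first_section($data), "\n";
```

Because `$/` is a plain string and not a regex, it ends a record wherever that string occurs, including in the middle of a line. That is fine for data like this, but it is worth checking against the real input.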
Re^2: Matching ranges by ldbpm (Initiate) on Sep 02, 2010 at 02:54 UTC
Much thanks ...
Re: Matching ranges by dasgar (Priest) on Sep 02, 2010 at 04:55 UTC
> Once again, I just want what is in between first occurrence of "Services" and "Users".

I'm assuming that you aren't interested in the lines with the "======" stuff. Based on that, here's how I would approach the problem.

```perl
use strict;

my $file;
open(DATA,"<data.txt") || die "Unable to open file 'data.txt': $!\n";
{
    local $/;
    $file = <DATA>;
}
close(DATA);

my ($selection) = ($file =~ m/^.+?Services.+?[=]+(.+?)[=]+/is);
my (@lines) = split /\n/,$selection;

for (my $i=0;$i<=$#lines;$i++) {
    $lines[$i] =~ s/\n//;
    print "Line $i: $lines[$i]\n";
}
```

Which produces the following output:

```
Line 0: 
Line 1: blah
Line 2: glah
Line 3: sfsd
Line 4: 
Line 5: 
Line 6: asfsdf
Line 7: afsafdf
```

Although BrowserUk's code is probably better, the code above may be easier to follow for some folks (such as myself). Also, I should note that I lose a blank line at the end of the captured section when I use the split command. Again, I'm assuming that won't create problems for what you're trying to do. Anyway, this gives you an example of another approach to solving the problem.
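A shorter variant of the slurp-and-match idea above, offered only as a sketch. It uses a different regex from dasgar's: it anchors on whole `Services` and `Users` lines and then filters out the ruler and blank lines. The sample text and the `section_lines` helper are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample input; the real data.txt is not shown in the thread.
my $text = <<'END';
Header
Services
========
blah
glah

asfsdf
========
Users
bob
Services
========
other
========
Users
END

# Return the non-blank, non-ruler lines between the first "Services" line
# and the first "Users" line that follows it.
sub section_lines {
    my ($content) = @_;
    # /m lets ^ and $ match at line boundaries, /s lets . match newlines,
    # and the non-greedy .*? stops at the first "Users" line.
    my ($body) = $content =~ /^Services\n(.*?)^Users$/ms
        or return;
    return grep { !/^=+$/ && /\S/ } split /\n/, $body;
}

my @lines = section_lines($text);
print "Line $_: $lines[$_]\n" for 0 .. $#lines;
```

Anchoring with `^` and `$` keeps the match from starting at a "Services" that appears inside a longer line, which the unanchored pattern would accept.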
Re^2: Matching ranges by ldbpm (Initiate) on Sep 02, 2010 at 12:15 UTC
I like your output, but there is a lot of code to produce that output, and I am trying to produce concise code. This particular code is definitely an option to consider, though. Thanks
Re: Matching ranges by JavaFan (Canon) on Sep 02, 2010 at 10:08 UTC
> I am attempting to match between the first match of "Services", and not after, but I am getting everything using the following code:

I cannot reproduce that. I do not get the lines between 'Users' and the next 'Services' (as expected).

> Once again, I just want what is in between first occurrence of "Services" and "Users".

```perl
use 5.010;
while(<>) {
    state $done;
    if ( /^Services$/ .. /^Users$/ ) {
        print "$_";
        $done = 1;
        next;
    }
    last if $done;
}
```

Or:

```perl
while(<>) {
    if (my $r = /^Services$/ .. /^Users$/ ) {
        print $_;
        last if $r =~ /E0$/;
    }
}
```
Re^2: Matching ranges by ldbpm (Initiate) on Sep 02, 2010 at 12:42 UTC
This is the weirdest parsing problem I have ever encountered.

```perl
while(<>) {
    if (my $r = /^Services/ .. /^Users/ ) {
        next if $_ =~ /^Services/;
        print $_;
        last if $r =~ /E0$/;
    }
}
```

produces output without the "Services" line, but the following does not (at least for me):

```perl
while(<>) {
    if (my $r = /^Services/ .. /^Users/ ) {
        next if $r =~ /^Services/;
        print $_;
        last if $r =~ /E0$/;
    }
}
```

Even when I try to exit early, $r is seemingly not respected:

```perl
while(<>) {
    if (my $r = /^Services/ .. /^Users/ ) {
        #next if $_ =~ /^Services/;
        exit if $r =~ /^Services/;
        print $_;
        last if $r =~ /E0$/;
    }
}
```
Re^3: Matching ranges by Marshall (Canon) on Sep 02, 2010 at 14:24 UTC
A slight bit more "regex kung-fu" is needed.

```perl
#!/usr/bin/perl -w
use strict;

my $num =1;
while(<DATA>) {
    if ( (/^Services\s$/../^Users\s$/) =~ /^(\d+)(?<!^1)$/ ) {
        last if $1 <= $num++;          #only first Services section
        next if /^====/ || /^\s*$/;    #if you don't want these
        print "$_";
    }
}
=prints
blah
glah
sfsd
asfsdf
afsafdf
=cut
```

(The data used was attached to the post in a collapsed section, 741 bytes.)

I highly recommend reading not only "Flipin good, or a total flop?", but also some of the responses in that node that talk about eliminating start and end conditions. They show how to do "regex-fu" like the above!
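To see why the filter regex in the post above drops both boundary lines, it may help to print the value the range operator returns for each line. In scalar context `..` returns an empty string while outside the range, a sequence number starting at 1 while inside it, and that number with `E0` appended on the line that ends the range. The sketch below assumes a small made-up data set, and `trace_range` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical sample input; the data Marshall used is not reproduced here.
my @sample = ("Header\n", "Services\n", "====\n", "blah\n", "glah\n",
              "====\n", "Users\n", "bob\n");

# For each line, record the flip-flop's return value and whether the
# filter regex from the post above accepts that value.
sub trace_range {
    my @rows;
    for (@_) {
        my $r = /^Services\s$/ .. /^Users\s$/;
        # All digits and not exactly "1": rejects "" (outside the range),
        # "1" (the Services line) and "NE0" (the Users line).
        my $kept = $r =~ /^(\d+)(?<!^1)$/ ? 'keep' : 'skip';
        chomp(my $line = $_);
        push @rows, sprintf '%-4s %-4s %s', ($r eq '' ? '-' : $r), $kept, $line;
    }
    return @rows;
}

print "$_\n" for trace_range(@sample);
```

The `last if $1 <= $num++` test in the original relies on the same numbers: the sequence restarts at 1 for a second "Services" section, so its first accepted value, 2, is no longer greater than the running counter and the loop ends.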
Re^4: Matching ranges by ldbpm (Initiate) on Sep 02, 2010 at 15:12 UTC
Re^3: Matching ranges by Marshall (Canon) on Sep 02, 2010 at 14:34 UTC
Your $r =~ /^Services/ is the problem. Try running this and look at the printout of the values of $r.

```perl
while(<>) {
    if (my $r = /^Services/ .. /^Users/ ) {
        #next if $r =~ /^Services/;
        next if /^Services/;
        print "$r:$_";
        last if $r =~ /E0$/;
    }
}
```
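The point of the printout is that `$r` holds the range operator's sequence number (1, 2, 3, ... and finally something like `5E0`), never the text of the line, so `$r =~ /^Services/` can never match. The boundary lines can instead be skipped by testing `$r` itself. This is a sketch under assumed sample data; `lines_between` is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical sample input with two Services sections.
my @sample = ("Header\n", "Services\n", "blah\n", "glah\n", "Users\n",
              "bob\n", "Services\n", "other\n", "Users\n");

# Return the lines strictly between the first Services line and the
# first Users line after it.
sub lines_between {
    my @out;
    for (@_) {
        my $r = /^Services/ .. /^Users/;
        next unless $r;          # outside the range, $r is the empty string
        last if $r =~ /E0$/;     # the line that closes the range (Users)
        next if $r == 1;         # the line that opens the range (Services)
        push @out, $_;
    }
    return @out;
}

print lines_between(@sample);
```

Stopping at the first `E0` value is what limits the output to the first section. The numeric test `$r == 1` is safe here because the `E0` suffix still leaves the value looking like a number to Perl.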
Re^4: Matching ranges by ldbpm (Initiate) on Sep 02, 2010 at 15:21 UTC
Re^2: Matching ranges by ldbpm (Initiate) on Sep 02, 2010 at 12:02 UTC
Your second piece of code works well. Thanks, because I like to use ranges in this particular case.
Re: Matching ranges by aquarium (Curate) on Sep 02, 2010 at 03:44 UTC
sometimes i'm not too confident about properly handling all the cases in multiline (and especially nested) records. if you know or can learn awk (very easy to learn), it's much better at this sort of stuff. with awk you can nest matches to particular levels and each level's loop keeps its place properly. it's pretty neat. there's probably a perl module or two to parse multiline records. crafting it on your own with a regex can sometimes end up inadvertently not processing some data that was meant to be processed. anyway, there should still be the built-in automatic converter from awk scripts into perl in your perl distribution. that's just an alternative to what you may already be doing, so my pick of the best horse may not be the same as yours.

the hardest line to type correctly is: stty erase ^H
Re^2: Matching ranges by aquarium (Curate) on Sep 02, 2010 at 04:07 UTC
i would really appreciate if the person that looks like they're stalking my replies with a minus one vote each time...would take the trouble of explaining why my post reply needs improvement. that way i can learn too. thanks.