Perl pattern finding

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Perl pattern finding by kennethk (Abbot) on Jul 06, 2011 at 20:06 UTC
Yes, though this sounds a lot like homework. We're here to help you improve your coding skills, not code for you. Assuming you've never really worked with regular expressions before, there's a handy index of regular expression tutorials as Re^3: My Favourite Regex Tools (Was: Parsing a Variable Format String). The pattern you really want will likely use character classes to specify 'digits other than zero' and `+` and `{n,}` for Matching repetitions. Since you'll want to cluster your digits with your hyphens, you'll need Non capturing groupings. Give it the old college try, and we'll coach your through it.	[reply] [d/l] [select]
Re^2: Perl pattern finding by Anonymous Monk on Jul 06, 2011 at 20:14 UTC
:-) Haha, nope I am not a college student ditching homework. More like a frustrated grad student. So here is what I am trying. `$string=~/((-\d{1,5}){4,20}(-0){4,20})/ print pos($1),"\n";` [download] In my head, this should find a pattern of up to four to twenty -(number)'s followed by four to twenty -0's. The print pos($1) should return the position in the string of the start of the match.	[reply] [d/l]
Re^3: Perl pattern finding by kennethk (Abbot) on Jul 06, 2011 at 20:35 UTC
One issue you are encountering is that you are using pos incorrectly. `pos` should be called on the variable you matched against, `$string` in your example. You also probably want to use a `m//g` (see Modifiers) wrapped in a `while` loop. Perhaps something like: `#!/usr/bin/perl -w use strict; my $string = '0-0-0-23-34-2345-345-21-0-0-0-256-78-0-0-0-0-0-0-0-56-45 +-3-34-0-2-3-4-5-6-0-0-0-0-0-0'; while ($string=~/((-\d{1,5}){4,20}(-0){4,20})/g) { print pos($string),"\n"; }` [download] This will return the end positions where your regular expression matched. I'm pretty sure this result does not meet your actual spec. If I were going to write that regex, however, it would look more like: `#!/usr/bin/perl -w use strict; my $string = '0-0-0-23-34-2345-345-21-0-0-0-256-78-0-0-0-0-0-0-0-56-45 +-3-34-0-2-3-4-5-6-0-0-0-0-0-0'; for my $match ($string =~/(?<!\d)(?:[^0]\d-)+(?:0-){3,}0/g) { print "$match\n"; }` [download] where the regular expression matches as follows: NODE EXPLANATION ---------------------------------------------------------------------- (?-imsx: group, but do not capture (case-sensitive) (with ^ and $ matching normally) (with . not matching \n) (matching whitespace and # normally): ---------------------------------------------------------------------- (?<! look behind to see if there is not: ---------------------------------------------------------------------- \d digits (0-9) ---------------------------------------------------------------------- ) end of look-behind ---------------------------------------------------------------------- (?: group, but do not capture (1 or more times (matching the most amount possible)): ---------------------------------------------------------------------- [1-9]+ any character of: '1' to '9' (1 or more times (matching the most amount possible)) ---------------------------------------------------------------------- - '-' ---------------------------------------------------------------------- )+ end of grouping ---------------------------------------------------------------------- (?: group, but do not capture (at least 3 times (matching the most amount possible)): ---------------------------------------------------------------------- 0- '0-' ---------------------------------------------------------------------- ){3,} end of grouping ---------------------------------------------------------------------- 0 '0' ---------------------------------------------------------------------- ) end of grouping ---------------------------------------------------------------------- [download] I've also used the magic that a Global matching modified regular expression in list context returns the list of all matches. Note that my negative lookbehind (see Looking ahead and looking behind) means that it will match at the start of the string, not just in the middle. Update:Changed `[1-9]+` to `[^0]\d` since we need "doesn't start with 0" not "no zeroes".	[reply] [d/l] [select]
Re^4: Perl pattern finding by Anonymous Monk on Jul 06, 2011 at 20:52 UTC
Re: Perl pattern finding by AnomalousMonk (Archbishop) on Jul 06, 2011 at 23:13 UTC
Here's my take. These assumptions are all easily changed in the code: A 'non-zero' number may have leading zero(s) (i.e., is numerically != 0), e.g., '01', '0023'; A 'zero' may have multiple 0 digits (again, numerically == 0), e.g., '00', '000'; A single non-zero number is 'consecutive'; 'Consecutive' means consecutive increasing, with differences == 1. >perl -wMstrict -le "my $s = '0-0-0-0-23-34-2345-345-21-0-0-0-256-78-' . '0-0-0-0-0-0-0-56-45-3-34-0-' . '2-3-4-5-6-0-0-0-0-0-0-3-2-1-0-0-0-0-0-' . '21-22-0-0-0-0-1-0-0-0-0-' . '24-25-0-0-0-33-34-0-00-000-0000-' . '01-02-0-0-0-0-3-4-01-01-01-01' ; ;; my $nz = qr{ (?<! \d) \d* [1-9] \d* (?! \d) }xms; my $z = qr{ (?<! \d) 0+ (?! \d) }xms; ;; my $nzs_4zs = qr{ $nz (?: - $nz)* (?: - $z){4} }xms; ;; print qq{'$_'} for grep { consecutive_nzs($_, $nz) } $s =~ m{$nzs_4zs}xmsg; ;; sub consecutive_nzs { my ($s, $nz_rx) = @_; my @nzs = $s =~ m{ $nz_rx }xmsg; $nzs[$_]+1 == $nzs[$_+1] \|\| return for 0 .. $#nzs - 1; return 1; } " '2-3-4-5-6-0-0-0-0' '21-22-0-0-0-0' '1-0-0-0-0' '33-34-0-00-000-0000' '01-02-0-0-0-0' [download]	[reply] [d/l]
Re: Perl pattern finding by ikegami (Patriarch) on Jul 06, 2011 at 20:20 UTC
`my @matches = "-$str-" =~ /(?:-[^0-]+)*(?:-0){4,}(?=-)/g;` [download] Useful test cases: Pattern is at the beginning of the string: `1-0-0-0-0-2-3-4` (That's why I prepended a "`-`" to the string.) Pattern is at the end of the string: `0-1-0-0-0-0` (That's why I appended a "`-`" to the string.) Two patterns in a row: `1-2-3-0-0-0-0-4-5-6-0-0-0-0` (That's why I used /`(?=-)`/ instead of /`-`/ at the end.) No leading non-zeroes: `0-0-0-0-1-2-3` Note: My code doesn't find overlapping matches. It's not an issue with the pattern you describe, but it might be for other patterns.	[reply] [d/l] [select]
Re: Perl pattern finding by locked_user sundialsvc4 (Abbot) on Jul 07, 2011 at 02:40 UTC
Generally speaking, this sort of problem is often handled by using the features that allow you to match an item in a string and then remain in the position where the match occurred, so that subsequent regexes pick up where the others left off in the same string. So, this helps by allowing you to break down the problem into two smaller, but well-defined patterns: (a) a series of consecutive non-zero numbers; and (b) a series of four-or-more zeroes. Your basic approach, then, now uses `if` statements and other standard procedural-programming constructs in conjunction with these two regexes. `(pseudo-code) ... while (the first pattern matches) { $p = the ending position if (second pattern matches) { if (the first pattern matches again) { success! $p = the ending position now ... } } resume at $p }` [download] Finally, if you find yourself doing a lot of this sort of thing, it is possible that you are starting to wander into the lands where Parse::RecDescent might be of service.