in reply to Perl pattern finding

Yes, though this sounds a lot like homework. We're here to help you improve your coding skills, not code for you.

Assuming you've never really worked with regular expressions before, there's a handy index of regular expression tutorials at Re^3: My Favourite Regex Tools (Was: Parsing a Variable Format String). The pattern you want will likely use character classes to specify 'digits other than zero', plus + and {n,} for Matching repetitions. Since you'll want to cluster your digits with your hyphens, you'll also need Non capturing groupings.
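
To make those three building blocks concrete without touching the actual problem, here is a minimal sketch on an unrelated toy task: recognising dotted version strings. The pattern, the sub name looks_like_version and the sample inputs are all made up for illustration; none of them come from the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical toy pattern, unrelated to the original question:
#   [1-9]          character class: one digit other than zero
#   [0-9]*         zero or more further digits
#   (?:\.[0-9]+)   non-capturing group: a dot followed by one or more digits
#   {2,}           repeat that group at least twice
my $version_re = qr/^[1-9][0-9]*(?:\.[0-9]+){2,}$/;

# Returns 1 if $text looks like a dotted version string, 0 otherwise.
sub looks_like_version {
    my ($text) = @_;
    return $text =~ $version_re ? 1 : 0;
}

for my $candidate ('1.20.3', '10.0.0.1', '0.1.2', '1.2') {
    printf "%-10s %s\n", $candidate,
        looks_like_version($candidate) ? 'matches' : 'no match';
}
```

The same ingredients (a class that excludes zero, a quantified group, and grouping without capturing) should carry over to the hyphen-separated numbers.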

Give it the old college try, and we'll coach you through it.

Re^2: Perl pattern finding
by Anonymous Monk on Jul 06, 2011 at 20:14 UTC

    :-) Haha, nope I am not a college student ditching homework. More like a frustrated grad student. So here is what I am trying.

    $string=~/((-\d{1,5}){4,20}(-0){4,20})/;
    print pos($1),"\n";

    In my head, this should find a run of four to twenty -(number)'s followed by four to twenty -0's. The print pos($1) should then return the position in the string where the match starts.

      One issue you are encountering is that you are using pos incorrectly. pos should be called on the variable you matched against, $string in your example. You also probably want to use a m//g (see Modifiers) wrapped in a while loop. Perhaps something like:

      #!/usr/bin/perl -w
      use strict;

      my $string = '0-0-0-23-34-2345-345-21-0-0-0-256-78-0-0-0-0-0-0-0-56-45-3-34-0-2-3-4-5-6-0-0-0-0-0-0';

      while ($string=~/((-\d{1,5}){4,20}(-0){4,20})/g) {
          print pos($string),"\n";
      }

      This will print the position at which each match of your regular expression ends. I'm pretty sure this result does not meet your actual spec.
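
Since the stated goal was the start of each match rather than its end, here is a minimal sketch of one way to get both offsets. It relies on Perl's built-in @- and @+ arrays ($-[0] is where the last successful match began, $+[0] is just past its end). The helper name match_spans is hypothetical, and the pattern is the one from the question with its groups made non-capturing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collect [start, end] offsets for every match of $re in $text.
# $-[0] is the offset where the last successful match began;
# $+[0] is the offset just past its end (the value pos($text) holds under /g).
sub match_spans {
    my ($text, $re) = @_;
    my @spans;
    while ($text =~ /$re/g) {
        push @spans, [ $-[0], $+[0] ];
    }
    return @spans;
}

my $string = '0-0-0-23-34-2345-345-21-0-0-0-256-78-0-0-0-0-0-0-0-56-45-3-34-0-2-3-4-5-6-0-0-0-0-0-0';

# Pattern taken from the question, with non-capturing groups.
for my $span (match_spans($string, qr/(?:-\d{1,5}){4,20}(?:-0){4,20}/)) {
    my ($start, $end) = @$span;
    print "start=$start end=$end\n";
}
```

Both offsets are zero-based, so substr($string, $start, $end - $start) recovers the matched text.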

      If I were going to write that regex, however, it would look more like:

      #!/usr/bin/perl -w
      use strict;

      my $string = '0-0-0-23-34-2345-345-21-0-0-0-256-78-0-0-0-0-0-0-0-56-45-3-34-0-2-3-4-5-6-0-0-0-0-0-0';

      for my $match ($string =~/(?<!\d)(?:[^0]\d*-)+(?:0-){3,}0/g) {
          print "$match\n";
      }

      where the regular expression matches as follows:

      NODE                     EXPLANATION
      ----------------------------------------------------------------------
      (?-imsx:                 group, but do not capture (case-sensitive)
                               (with ^ and $ matching normally) (with . not
                               matching \n) (matching whitespace and #
                               normally):
      ----------------------------------------------------------------------
        (?<!                     look behind to see if there is not:
      ----------------------------------------------------------------------
          \d                       digits (0-9)
      ----------------------------------------------------------------------
        )                        end of look-behind
      ----------------------------------------------------------------------
        (?:                      group, but do not capture (1 or more times
                                 (matching the most amount possible)):
      ----------------------------------------------------------------------
          [1-9]+                   any character of: '1' to '9' (1 or more
                                   times (matching the most amount
                                   possible))
      ----------------------------------------------------------------------
          -                        '-'
      ----------------------------------------------------------------------
        )+                       end of grouping
      ----------------------------------------------------------------------
        (?:                      group, but do not capture (at least 3
                                 times (matching the most amount
                                 possible)):
      ----------------------------------------------------------------------
          0-                       '0-'
      ----------------------------------------------------------------------
        ){3,}                    end of grouping
      ----------------------------------------------------------------------
        0                        '0'
      ----------------------------------------------------------------------
      )                        end of grouping
      ----------------------------------------------------------------------

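      I've also used the magic that a regular expression with the /g modifier (see Global matching), evaluated in list context, returns the list of all matches; with no capturing groups, as here, each element is the full matched text. Note that using a negative lookbehind (see Looking ahead and looking behind) rather than requiring a leading hyphen means the pattern can also match at the very start of the string, not just in the middle.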
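
Both points can be checked on a short input. The sketch below wraps the same regular expression in a hypothetical helper, zero_runs, and feeds it a made-up string whose first qualifying run begins at offset 0. The sample string is an assumption for illustration, not data from the thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every full match of the pattern from the post above.
# With /g in list context and no capturing groups, the match operator
# yields the complete text of each match.
sub zero_runs {
    my ($text) = @_;
    my @matches = $text =~ /(?<!\d)(?:[^0]\d*-)+(?:0-){3,}0/g;
    return @matches;
}

# Made-up sample: the first run starts at the very beginning of the string,
# where the negative lookbehind succeeds because nothing precedes it.
my $sample = '12-5-0-0-0-0-7-0-0-0-0';

print "$_\n" for zero_runs($sample);
```

On this sample the first element returned is 12-5-0-0-0-0, which starts at offset 0; a pattern that demanded a leading hyphen would have missed it.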

      Update: Changed [1-9]+ to [^0]\d* since we need "doesn't start with 0", not "no zeroes". The explanation table above still shows the earlier [1-9]+ form.
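
The difference is easiest to see on single tokens. The sketch below anchors each sub-pattern and tries it against a few made-up numbers. The third variant, [1-9]\d*, is not from the thread; it is included as an assumption-labelled alternative because [^0] accepts any character other than 0, including non-digits such as a hyphen. The helper name verdicts is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Report which sub-pattern accepts a whole token.
#   no_zeroes        : [1-9]+    the form before the update
#   not_leading_zero : [^0]\d*   the form after the update
#   strict_variant   : [1-9]\d*  assumption: a stricter alternative, not from the thread
sub verdicts {
    my ($token) = @_;
    return {
        no_zeroes        => ($token =~ /^[1-9]+$/   ? 1 : 0),
        not_leading_zero => ($token =~ /^[^0]\d*$/  ? 1 : 0),
        strict_variant   => ($token =~ /^[1-9]\d*$/ ? 1 : 0),
    };
}

for my $token ('256', '105', '20', '0', '-5') {
    my $v = verdicts($token);
    printf "%-4s no_zeroes=%d not_leading_zero=%d strict_variant=%d\n",
        $token, @{$v}{qw(no_zeroes not_leading_zero strict_variant)};
}
```

Tokens such as 105 and 20 are rejected by the pre-update form but accepted by the other two, which is the reason for the change. On the data in this thread the updated and stricter forms should behave the same, since every hyphen there directly follows a digit.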

        Thanks, it will take me a while to digest that, but upon initial survey, I can probably figure it out from what you said.

        If all else fails, I can bang my head against the wall for a few more hours and post again.