in reply to Re: Perl pattern finding
in thread Perl pattern finding

:-) Haha, nope I am not a college student ditching homework. More like a frustrated grad student. So here is what I am trying.

$string=~/((-\d{1,5}){4,20}(-0){4,20})/;
print pos($1),"\n";

In my head, this should find a run of four to twenty -(number)'s followed by four to twenty -0's. The print pos($1) should return the position in the string of the start of the match.

Re^3: Perl pattern finding
by kennethk (Abbot) on Jul 06, 2011 at 20:35 UTC
    One issue you are encountering is that you are using pos incorrectly. pos should be called on the variable you matched against, $string in your example. You also probably want to use a m//g (see Modifiers) wrapped in a while loop. Perhaps something like:

    #!/usr/bin/perl -w
    use strict;

    my $string = '0-0-0-23-34-2345-345-21-0-0-0-256-78-0-0-0-0-0-0-0-56-45-3-34-0-2-3-4-5-6-0-0-0-0-0-0';

    while ($string=~/((-\d{1,5}){4,20}(-0){4,20})/g) {
        print pos($string),"\n";
    }

    This will print the end position of each match, since pos reports the offset just past the end of the most recent successful m//g match. I'm pretty sure this result does not meet your actual spec.
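    If the start of each match is what you are after, the @- array is one way to get it. The following is only a minimal sketch: the sample string is a shortened, made-up one rather than your real data, and the @spans array is just a name chosen for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Shortened sample string, made up for illustration.
my $string = '0-0-7-8-9-10-0-0-0-0-5';

my @spans;
while ($string =~ /((-\d{1,5}){4,20}(-0){4,20})/g) {
    # $-[0] is the offset where the whole match starts;
    # pos($string) (the same value as $+[0]) is the offset just past its end.
    push @spans, [ $-[0], pos($string) ];
}

print "start=$_->[0] end=$_->[1]\n" for @spans;
```

    Here the match begins at the first hyphen, so the start offset counts that leading '-' as part of the match.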

    If I were going to write that regex, however, it would look more like:

    #!/usr/bin/perl -w
    use strict;

    my $string = '0-0-0-23-34-2345-345-21-0-0-0-256-78-0-0-0-0-0-0-0-56-45-3-34-0-2-3-4-5-6-0-0-0-0-0-0';

    for my $match ($string =~/(?<!\d)(?:[^0]\d*-)+(?:0-){3,}0/g) {
        print "$match\n";
    }

    where the regular expression matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      (?<!                     look behind to see if there is not:
    ----------------------------------------------------------------------
        \d                       digits (0-9)
    ----------------------------------------------------------------------
      )                        end of look-behind
    ----------------------------------------------------------------------
      (?:                      group, but do not capture (1 or more times
                               (matching the most amount possible)):
    ----------------------------------------------------------------------
        [1-9]+                   any character of: '1' to '9' (1 or more
                                 times (matching the most amount
                                 possible))
    ----------------------------------------------------------------------
        -                        '-'
    ----------------------------------------------------------------------
      )+                       end of grouping
    ----------------------------------------------------------------------
      (?:                      group, but do not capture (at least 3
                               times (matching the most amount
                               possible)):
    ----------------------------------------------------------------------
        0-                       '0-'
    ----------------------------------------------------------------------
      ){3,}                    end of grouping
    ----------------------------------------------------------------------
      0                        '0'
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------

    I've also relied on the fact that a regular expression with the /g modifier, evaluated in list context, returns the list of all matches (or, if the pattern contains capturing groups, the list of all captured substrings). Note that my negative lookbehind (see Looking ahead and looking behind) means that it will match at the start of the string, not just in the middle.
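    To make the list-context behaviour concrete, here is a small sketch on a made-up string. The variable names are only for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $text = '12-0-345-6';   # made-up sample

# No capturing groups: m//g in list context returns every whole match.
my @whole = $text =~ /\d+/g;          # ('12', '0', '345', '6')

# One capturing group: only the captured part of each match is returned.
my @firsts = $text =~ /(\d+)-\d+/g;   # ('12', '345')

print "@whole\n";
print "@firsts\n";
```

    This is why the pattern above uses only non-capturing (?:...) groups: with plain parentheses, the for loop would iterate over the captures instead of the whole matches.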

    Update: Changed [1-9]+ to [^0]\d* since we need "doesn't start with 0" not "no zeroes". The explanation above still shows the original [1-9]+.
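    A quick way to see the difference between the two sub-patterns is to test them anchored against single fields. This is only a sketch with made-up inputs. The [1-9]\d* variant at the end is not from the code above. It is a possible stricter alternative, included because [^0] accepts any character other than '0', not only the digits 1 to 9.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# "No zeroes anywhere" versus "does not start with 0".
my $no_zeroes  = ('105' =~ /^[1-9]+$/)  ? 1 : 0;   # 0: '105' contains a zero
my $no_leading = ('105' =~ /^[^0]\d*$/) ? 1 : 0;   # 1: '105' starts with '1'

# [^0] is "anything but 0", so a hyphen also satisfies it.
my $hyphen_ok  = ('-5' =~ /^[^0]\d*$/)  ? 1 : 0;   # 1

# Assumed stricter alternative (not in the original post).
my $strict_num = ('105' =~ /^[1-9]\d*$/) ? 1 : 0;  # 1
my $strict_hyp = ('-5'  =~ /^[1-9]\d*$/) ? 1 : 0;  # 0

print "$no_zeroes $no_leading $hyphen_ok $strict_num $strict_hyp\n";
```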

      Thanks, it will take me a while to digest that, but from a first look I can probably figure it out from what you said.

      If all else fails, I can bang my head against the wall for a few more hours and post again.