perlpal has asked for the wisdom of the Perl Monks concerning the following question:

The following 3 strings need to be matched by the common regexp mentioned below:

my $out1 = "Line 5 f3050_233_10_vserver_nfs_1  f3050_233_13_simple_aggr_raid4_1";

my $out2 = "Line 3 f3050_233_10_vserver_cifs_1 f3050_233_13_simple_aggr_raid4_1";

my $out3 = " Line 4 f3050_233_10_vserver_nfs_1  f3050_233_13_simple_aggr_raid4_1";

The regular expression is -  /.*Line\s[3-30]\s+([\w-]+)\s+.*/i

This regular expression matches $out2 but not $out1 and $out3.

I'm completely befuddled.
Advice is greatly appreciated!
Thanks.

Replies are listed 'Best First'.
Re: RegExp Pattern Matching Behavior
by kennethk (Abbot) on Jun 25, 2009 at 14:24 UTC
    First, please wrap code in code tags (<code>, </code>) to maintain proper formatting - note how your square brackets got converted to links. Please read Writeup Formatting Tips.

    Your issue is that you expect [3-30] to match the numbers 3-30, when a character class only performs what comes down to character-by-character string comparisons. That particular class matches the character 3 (actually the range 3-3) or the character 0. If you substituted \d{1,2}, you would match any 1- or 2-digit sequence. If you really mean 3-30, you could use the more complex (?:[3-9]|[1-2]\d|30). See perlretut for more instruction on using regular expressions.
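To see the character-class behavior concretely, here is a minimal sketch that runs the original pattern against the three strings from the question. The variable names (@lines, $class, @single, @matched) are made up for illustration.

```perl
use strict;
use warnings;

# The three sample strings from the original question.
my @lines = (
    "Line 5 f3050_233_10_vserver_nfs_1  f3050_233_13_simple_aggr_raid4_1",
    "Line 3 f3050_233_10_vserver_cifs_1 f3050_233_13_simple_aggr_raid4_1",
    " Line 4 f3050_233_10_vserver_nfs_1  f3050_233_13_simple_aggr_raid4_1",
);

# [3-30] is a character class: the range 3-3 plus the character 0.
my $class = qr/\A[3-30]\z/;

# Which of the numbers 0..30 does the class accept as a whole string?
my @single = grep { $_ =~ $class } 0 .. 30;

# The original pattern from the question, unchanged.
my @matched = grep { /.*Line\s[3-30]\s+([\w-]+)\s+.*/i } @lines;

print "class accepts: @single\n";
print "lines matched: ", scalar(@matched), "\n";
```

Only the "Line 3" string should get through, which agrees with what the OP observed.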

      With respect to 3-30, I meant matching numbers between 3 and 30. In effect, I would be matching lines with line numbers from 3 to 30.
        You are still not using <code> tags. You've been around long enough you should know better.

        As I stated above, the expression (?:[3-9]|[1-2]\d|30) will match the numbers 3 through 30 inclusive. I used a non-capturing group to isolate the ors from the surrounding expressions and include three terms: [3-9] matches single digits 3-9; [1-2]\d matches single digits 1 or 2 followed by any digit, meaning 10-29; 30 matches 30.
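A quick way to check the alternation is to run it over a range of integers. This is only a sketch: the names $range and @accepted are hypothetical, and the \A and \z anchors are added here so that each number is tested as a whole string (in the full pattern, the surrounding \s and \s+ play that role).

```perl
use strict;
use warnings;

# Alternation from the reply above: 3-9, 10-29, or 30.
my $range = qr/(?:[3-9]|[1-2]\d|30)/;

# Anchor both ends so that, e.g., "31" cannot match via its leading "3".
my @accepted = grep { /\A$range\z/ } 0 .. 40;

print "first: $accepted[0], last: $accepted[-1], count: ",
      scalar(@accepted), "\n";
```

Without the anchors (or some other boundary), the [3-9] branch alone would happily match the first digit of 31 through 39.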

Re: RegExp Pattern Matching Behavior
by ELISHEVA (Prior) on Jun 25, 2009 at 17:14 UTC

    My apologies to kennethk: it appears I did not read the thread carefully enough, and he has already stated what is written below.

    [3-30] does not match line numbers 3 through 30. It matches a single character, either 3 or 0. Only one line, $out2, has either a 3 or a 0 as its line number, hence that is the one that is matching. To match the numbers 3 through 30, you need a regular expression like this: (?:[3-9]|[12]\d|30). To explain a bit:
    • (?:...) is a "non-capturing" group. That is, it groups part of the pattern without stuffing the match into a capture variable such as $1.
    • | (the pipe) marks alternative regular expressions. Only one of the alternatives needs to match.
    • [...] matches one among a list of alternative characters. Ranges of characters may be indicated using char dash char, e.g. 3-9.
    • \d matches any digit, i.e. 0 through 9
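Putting the pieces together, here is a minimal sketch of the corrected pattern applied to the three strings from the question. It is written with the /x modifier so that each part can carry a comment. The leading and trailing .* from the original are left out as an assumption that an unanchored match is all that is needed, and the names @lines, $re and @names are made up for the example.

```perl
use strict;
use warnings;

my @lines = (
    "Line 5 f3050_233_10_vserver_nfs_1  f3050_233_13_simple_aggr_raid4_1",
    "Line 3 f3050_233_10_vserver_cifs_1 f3050_233_13_simple_aggr_raid4_1",
    " Line 4 f3050_233_10_vserver_nfs_1  f3050_233_13_simple_aggr_raid4_1",
);

# /x lets the pattern contain whitespace and comments.
my $re = qr/
    Line \s
    (?: [3-9] | [12]\d | 30 )   # non-capturing group: line numbers 3 through 30
    \s+
    ( [\w-]+ )                  # capture group 1: first name after the number
    \s+
/xi;

# Collect the captured name from every line that matches.
my @names;
for my $line (@lines) {
    push @names, $1 if $line =~ $re;
}

print "$_\n" for @names;
```

All three strings should now match, since 5, 3 and 4 each fall in the [3-9] branch.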

    Please see perlretut for a fuller explanation of each part of the regular expression.

    Best, beth


      To kennethk and beth,

      Thank you for the regexp and the explanation. It solved a really big logic error in my code!

      Cheers!

Re: RegExp Pattern Matching Behavior
by Anonymous Monk on Jun 25, 2009 at 14:22 UTC
    Missing brackets in ..3-30..; this way it will only match 3 followed by -, etc. The right one is: /.*Line\s[3-30]\s+(\w-+)\s+.*/i
      Note that the OP has [3-30] (and not 3-30) in the regular expression. The brackets were turned into a link because the poster included square brackets but forgot code tags.
Re: RegExp Pattern Matching Behavior
by Gyatso (Novice) on Jun 25, 2009 at 18:07 UTC
    Regular expression:
    ^\s?Line\s{1}\d{1,}\s+[\w_\d]+\s+.*

    ^\s? ==> Starting with zero or one whitespace character
    \s{1} ==> Single space after "Line"
    \d{1,} ==> Followed by at least one digit (with no upper limit on the number of digits)
    \s+ ==> Followed by one or more spaces
    [\w_\d]+ ==> Followed by any combination of word characters, underscores and digits
    \s+ ==> Followed by one or more spaces
    .* ==> Followed by the rest of the string
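The breakdown above can be tried out as follows. This is a sketch only: the fourth test string ("Line 99 ...") is made up here to show that \d{1,} accepts any line number, so this pattern does not by itself restrict matches to 3 through 30. The names @all, $re and @hits are likewise hypothetical.

```perl
use strict;
use warnings;

my @all = (
    "Line 5 f3050_233_10_vserver_nfs_1  f3050_233_13_simple_aggr_raid4_1",
    "Line 3 f3050_233_10_vserver_cifs_1 f3050_233_13_simple_aggr_raid4_1",
    " Line 4 f3050_233_10_vserver_nfs_1  f3050_233_13_simple_aggr_raid4_1",
    "Line 99 name_a name_b",    # made-up string, outside the 3-30 range
);

# Same pattern as in the post, spread out with the x modifier.
my $re = qr/
    ^ \s?        # zero or one leading whitespace character
    Line \s{1}   # the word Line and exactly one whitespace character
    \d{1,}       # one or more digits, so any line number
    \s+
    [\w_\d]+     # word characters, which already include underscore and digits
    \s+
    .*
/x;

my @hits = grep { $_ =~ $re } @all;
print scalar(@hits), " of ", scalar(@all), " strings matched\n";
```

If the 3 to 30 restriction matters, \d{1,} could be swapped for the (?:[3-9]|[12]\d|30) alternation suggested earlier in the thread.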

    Regards,
    Gytaso