genghis has asked for the wisdom of the Perl Monks concerning the following question:

I'm confused about assigning the results of regex matches to variables. An example from the Perl Cookbook (works, of course):

my $string1 = '123456789';
my @nonlap = $string1 =~ /(\d\d\d)/g;   # @nonlap now contains (123, 456, 789)

But this time, when I try to pull out 81 from the string A81, it doesn't do what I expected (Regex Coach shows that it's matching both numbers):

my $string2 = 'A81';
my $digits = $string2 =~ /(\d)+/;   # $digits now contains '1'

Thanks for your help...

Re: assigning regex matches to variables
by CountZero (Bishop) on May 28, 2011 at 20:22 UTC
    From the docs:
    m/PATTERN/ : Searches a string for a pattern match, and in scalar context returns true if it succeeds, false if it fails.
    The 1 you see is simply the value of truth.

    Matching in list context

    If the /g option is not used, m// in list context returns a list consisting of the subexpressions matched by the parentheses in the pattern

    (...)

    The /g modifier specifies global pattern matching -- that is, matching as many times as possible within the string. How it behaves depends on the context. In list context, it returns a list of the substrings matched by any capturing parentheses in the regular expression. If there are no parentheses, it returns a list of all the matched strings, as if there were parentheses around the whole pattern.

    In scalar context, each execution of m//g finds the next match, returning true if it matches, and false if there is no further match.
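
    The four cases described in the excerpt can be sketched as follows. This is only an illustration; the variable names ($matched, @triples, @seen and so on) are made up for the example, and the strings are the ones from the original question.

```perl
use strict;
use warnings;

my $string = 'A81';

# Scalar context, no /g: the match returns true (1) or false.
my $matched = $string =~ /(\d+)/;

# List context, no /g (note the parentheses on the left-hand side):
# the match returns the list of captured substrings.
my ($digits) = $string =~ /(\d+)/;

# List context with /g: returns the captures from every match.
my @triples = '123456789' =~ /(\d\d\d)/g;

# Scalar context with /g: each evaluation finds the next match,
# so a while loop walks through the string one match at a time.
my $text = '123456789';
my @seen;
while ($text =~ /(\d\d\d)/g) {
    push @seen, $1;
}

print "matched: $matched\n";    # matched: 1
print "digits:  $digits\n";     # digits:  81
print "triples: @triples\n";    # triples: 123 456 789
print "seen:    @seen\n";       # seen:    123 456 789
```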

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re: assigning regex matches to variables
by luis.roca (Deacon) on May 28, 2011 at 21:20 UTC

    "But this time, when I try to pull out 81 from the string A81, it doesn't do what I expected (Regex Coach shows that it's matching both numbers):"

    Your pattern (\d)+ does match both digits, but the capture group holds only one digit at a time, so after the match it contains just the last one ('1'). This is why others have said to change (\d)+ to (\d+). Also keep in mind that if you know your data will ALWAYS be formatted LETTER_DIGIT_DIGIT, as in A81, with no other characters after, either (\d+) or (\d{2}) (which matches only if two consecutive digits exist) will work.

    However, if you have something like 'A81342545342', those regexes become much less precise: (\d+) captures every digit after 'A' ('81342545342'), while (\d{2}) with /g loosely captures every pair after 'A' ('81 34 25 45 34'). This is why knowing the context of the data you're searching within is very important (as has been mentioned). Regular expressions give you varying levels of precision. It's up to you to weigh how precise you need to be and proceed accordingly.
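
    To make the difference concrete, here is a small sketch run against the longer string. The anchored pattern /^[A-Z](\d{2})$/ is my own assumption about what a "letter followed by exactly two digits" format might look like; adjust it to your real data.

```perl
use strict;
use warnings;

my $long = 'A81342545342';

my ($run)   = $long =~ /(\d+)/;       # the whole run of digits
my ($first) = $long =~ /(\d{2})/;     # only the first two digits
my @pairs   = $long =~ /(\d{2})/g;    # every non-overlapping pair

# Assumed stricter format: one capital letter, exactly two digits, nothing else.
my ($exact) = $long =~ /^[A-Z](\d{2})$/;   # no match, so $exact is undef
my ($ok)    = 'A81' =~ /^[A-Z](\d{2})$/;   # matches

print "run:   $run\n";                                      # run:   81342545342
print "first: $first\n";                                    # first: 81
print "pairs: @pairs\n";                                    # pairs: 81 34 25 45 34
print "exact: ", (defined $exact ? $exact : 'no match'), "\n";  # exact: no match
print "ok:    $ok\n";                                       # ok:    81
```

    The anchors (^ and $) are what turn a loose "find two digits somewhere" into a check of the whole string.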


    "...the adversities born of well-placed thoughts should be considered mercies rather than misfortunes." — Don Quixote
Re: assigning regex matches to variables
by jwkrahn (Abbot) on May 28, 2011 at 20:28 UTC

    when I try to pull out 81 from the string A81, it doesn't do what I expected

        my $digits = $string2 =~ /(\d)+/;

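    The quantifier + is outside the parentheses, so the group captures a single \d character and the group as a whole is repeated; after the match the capture holds only the last digit matched ('1'). To capture multiple \d characters the quantifier has to be applied directly to the pattern inside the parentheses: /(\d+)/.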
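
    A quick way to see the difference, as a sketch (the variable names $last, $all, $whole and $inner are just for illustration):

```perl
use strict;
use warnings;

my $string2 = 'A81';

# Quantifier outside the group: the group matches one digit and is repeated,
# so the capture ends up holding only the last repetition.
my ($last) = $string2 =~ /(\d)+/;

# Quantifier inside the group: the group captures the whole run of digits.
my ($all) = $string2 =~ /(\d+)/;

# Wrapping the repeated group in an outer group shows both at once:
# the overall match is '81', while the inner capture is still just '1'.
my ($whole, $inner) = $string2 =~ /((\d)+)/;

print "outside: $last\n";            # outside: 1
print "inside:  $all\n";             # inside:  81
print "nested:  $whole, $inner\n";   # nested:  81, 1
```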

Re: assigning regex matches to variables
by lidden (Curate) on May 28, 2011 at 20:10 UTC
    Change my $digits = $string2 =~ /(\d)+/; to my ($digits) = $string2 =~ /(\d)+/; and it will work. The parens change the assignment from scalar to list context.
      Context is one thing, but the regexp needs to be fixed as well. It ought to be /(\d+)/, with the plus inside the parentheses.

      Couldn't get that to work. Also tried

      my @digits = $string2 =~ /(\d)+/;

      which didn't work either. I guess I'm generally confused about whether the =~ operator actually assigns the result of the search; i.e. whether

      $var =~ /some_regex/;

      changes the value of $var.

        $var =~ /some_regex/; does not change the value of $var, it just searches through $var for the pattern /some_regex/.
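
        A minimal sketch of that point, using the string from the original question: the match leaves $var alone, and anything you want out of it has to be assigned explicitly.

```perl
use strict;
use warnings;

my $var = 'A81';

# A plain match only tests $var; its return value here is true or false.
my $found = $var =~ /\d+/;
print "found: $found, var is still: $var\n";    # found: 1, var is still: A81

# To get the matched text, capture it and assign it in list context.
my ($digits) = $var =~ /(\d+)/;

# $var changes only if you assign to it yourself.
$var = $digits if defined $digits;
print "var is now: $var\n";                     # var is now: 81
```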

Re: assigning regex matches to variables
by genghis (Novice) on May 28, 2011 at 21:08 UTC

    Hope I haven't double-posted this... Problem solved -- thank you all VERY MUCH for your help. Having two errors at the same time made it hard to figure out. I didn't know that adding parentheses would force the statement to be evaluated in list context. I'm surprised that the distinction between list and scalar context isn't emphasized more in any of the regex documents I have read and I don't recall seeing the trick of forcing list context by using parentheses. However, it must be a common thing to want to change the value of a variable using a regex -- is there some other common syntax for that?

      Context plays such a central role in Perl that the ways of getting it aren't repeated at every opportunity. Usually every tutorial or reference text on Perl will tell you about context in one of the first chapters. Note that CountZero's excerpt of the regex documentation specifically mentions context.

      Changing the value of a variable with a regex is done with the s/// operator. For example,

      $f=~ s/must/can/g;

      would substitute every occurrence of "must" in the variable $f with "can".
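
      As a short sketch of s/// in use (the sample sentence and the names $count, $copy and $s are invented for the example):

```perl
use strict;
use warnings;

my $f = 'You must test, and you must document.';

# s///g modifies $f in place and returns the number of substitutions made.
my $count = ($f =~ s/must/can/g);
print "$f\n";                 # You can test, and you can document.
print "replaced: $count\n";   # replaced: 2

# To keep the original intact, copy first and substitute on the copy.
(my $copy = $f) =~ s/can/should/g;
print "$copy\n";              # You should test, and you should document.

# Applied to the original question: strip everything that is not a digit.
my $s = 'A81';
$s =~ s/\D//g;
print "$s\n";                 # 81
```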