Alex has asked for the wisdom of the Perl Monks concerning the following question:

Hello monks

I'm a newbie, and feel this question isn't 'worthy' of your attention, but I'm curious as to why this behavior is as such. I do plan to check the perldocs tomorrow, but I'd like to know your thoughts as well. Here's the problem:

my $ua = $ENV{'HTTP_USER_AGENT'};

Using browser Netscape Navigator 6.2 the string returned is this:

Mozilla/5.0 (Windows; U; Win98; en-US; rv:0.9.4) Gecko/20011019 Netscape6/6.2

The problem is if I use this code:

my $ua_name = $1 if ($ua=~ m/(opera|netscape|gecko|msie)/i);

I get this:

$ua_name: Gecko

Even though netscape is declared before gecko in my regular expression. But if I use this:

my $ua_name = $1 if ( ($ua=~ m/(opera)/i) || ($ua=~ m/(netscape)/i) || ($ua=~ m/(gecko)/i) || ($ua=~ m/(msie)/i) );

I will get this:

$ua_name: Netscape

Obviously in my first example it didn't matter what order I put the search patterns in around the or operator; I thought it would matter. Can anyone confirm this, or did I already do that?

Alex

2003-06-07 edit ybiC: retitle from "Simple Question for monks, but I'm curious..."

Re: String order in regex match - left to right, or right to left?
by Enlil (Parson) on Jun 07, 2003 at 01:27 UTC
    The difference is that you have to remember that in:
    my $ua_name = $1 if ($ua=~ m/(opera|netscape|gecko|msie)/i);
    The whole regex is checked at the current position in the string before moving on to the next position. So at every position in the string it first checks whether it can match opera; failing that, it THEN checks whether it can match netscape; failing that, it THEN checks whether it can match gecko; and finally, failing everything else, it checks whether it can match msie. If all of this fails, it moves to the next position in the string and checks again, and again and again, until the regex matches. (Remember: the leftmost match wins.)

    In the second piece of code:

    my $ua_name = $1 if ( ($ua=~ m/(opera)/i) || ($ua=~ m/(netscape)/i) || ($ua=~ m/(gecko)/i) || ($ua=~ m/(msie)/i) );
    What happens here is that each regex checks the whole string for one thing (i.e. opera, then netscape, then gecko, etc.), and only if it fails does the next regex get tried. So the first name in the list that occurs anywhere in the string wins. I can elaborate if need be.
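A minimal sketch of the two behaviours side by side, using the user-agent string from the question. The sub names first_by_position and first_by_priority are made up for illustration and are not part of the original code:

```perl
use strict;
use warnings;

my $ua = 'Mozilla/5.0 (Windows; U; Win98; en-US; rv:0.9.4) Gecko/20011019 Netscape6/6.2';

# One regex with alternation: the leftmost position in the string at
# which ANY alternative matches wins, whatever order the alternatives
# are written in.
sub first_by_position {
    my ($str) = @_;
    return $str =~ m/(opera|netscape|gecko|msie)/i ? $1 : undef;
}

# Separate regexes tried one after another: each one scans the whole
# string, so the order of the names decides the result.
sub first_by_priority {
    my ($str) = @_;
    for my $name (qw(opera netscape gecko msie)) {
        return $1 if $str =~ m/($name)/i;
    }
    return;
}

print "alternation:   ", first_by_position($ua), "\n";   # Gecko
print "one at a time: ", first_by_priority($ua), "\n";   # Netscape
```

If the priority order matters more than the position in the string, the second form (or a loop like the one above) is the one to reach for.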

    update: perhaps an example would help better explain how the regex engine is working, at the command line try this one-liner:

    perl -Mre=debug -le "$s = 'a man a plan a camel';print $1 if $s =~ /(camel|plan|monkey)/;"
    Also I struck out the first sentence which was causing confusion, and replaced it with a new first sentence. Thanks Zero Flop for pointing out my poor wording. -enlil
      Enlil stated "The difference is that you have to remember that in:
      my $ua_name = $1 if ($ua=~ m/(opera|netscape|gecko|msie)/i);

      The whole regex is checked at every position in your string before moving on to the next position."

      I would also follow this train of thought but this actually does not explain what is happening. If this was happening Netscape would be found first, going left to right.

      Does it instead go right to left or is the comparison Alex posted different than what he is running, or is it something completely different?

      Thanks

        Gecko comes first in the string that is being searched. When the regex engine is trying to match a regex in a string, it starts at the first character of the string and tries to match from the beginning of the regex. If it can't match at that position, it moves to the next character of the string and starts trying to match again (unless you anchor the regex). In this case, the match keeps failing until it gets to the point in the string where 'Gecko' appears. At that point, the regex engine says to itself: "Can this match 'opera'? No. Can this match 'netscape'? No. Can this match 'gecko'? Yes. Return captured string 'gecko'. Done." Which happens before it gets to the point in the string where 'Netscape' appears.
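To check where each name actually sits in the string, something like the following sketch can help. It assumes the user-agent string from the question; offset_of is a hypothetical helper, and it relies on Perl's built-in @- array, whose first element holds the start offset of the last successful match:

```perl
use strict;
use warnings;

my $ua = 'Mozilla/5.0 (Windows; U; Win98; en-US; rv:0.9.4) Gecko/20011019 Netscape6/6.2';

# Return the offset at which $name first occurs in $str
# (case-insensitively), or -1 if it does not occur at all.
sub offset_of {
    my ($str, $name) = @_;
    return $str =~ m/\Q$name\E/i ? $-[0] : -1;
}

# Print one line per name; the name with the smallest non-negative
# offset is the one the alternation regex will capture.
for my $name (qw(opera netscape gecko msie)) {
    printf "%-8s %d\n", $name, offset_of($ua, $name);
}
```

For this string, gecko has a smaller offset than netscape, while opera and msie are not found at all, which is why the single alternation regex returns Gecko.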

        kelan


        Perl6 Grammar Student

        I also have problems with this left/right thing :)

        left           right
        |              |
        Gecko/20011019 Netscape6/6.2