in reply to Difficult? regex

return 0 if $uri->path =~ /\?(C=N;O=D|C=M;O=A)?$/;

Update: There are two issues on that regexp line. In my reply I considered only one. You said you wanted to have 0 returned in case of a match, but you are returning nothing. What you will get is an undefined value.

The second is addressed in the replies below.
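To make the first issue concrete, here is a minimal sketch of the difference between a bare return and return 0. The two subs (bare_return, zero_return) and the test string are hypothetical and are not taken from the original post.

```perl
use strict;
use warnings;

# Hypothetical helpers for illustration only.
sub bare_return { return   if $_[0] =~ /x/; return 1 }   # bare return
sub zero_return { return 0 if $_[0] =~ /x/; return 1 }   # explicit 0

# Both calls are in scalar context because of the scalar assignment.
my $bare = bare_return('xyz');   # bare return gives undef here
my $zero = zero_return('xyz');   # explicit return gives 0

print defined $bare ? "bare: '$bare'\n" : "bare: undef\n";
print defined $zero ? "zero: '$zero'\n" : "zero: undef\n";
```

Both values are false, so a plain boolean test cannot tell them apart. The difference only matters if the caller checks definedness or prints the value.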

Re^2: Difficult? regex
by Anonymous Monk on Feb 22, 2008 at 12:14 UTC
    Thanks to everyone for the suggestions. My code looks like this now and still doesn't work as expected (see original posting). It does return zero for URLs ending in .jpg so that regex works. The final regex also works.
    sub test_url {
        my ( $uri, $server ) = @_;
        # return 1;  # Ok to index/spider
        # return 0;  # No, don't index or spider;

        # ignore any common image files
        return 0 if $uri->path =~ /\.(gif|jpg|jpeg|png)?$/;

        # ignore directory listing sorting links
        return 0 if $uri->path =~ /\?(C=N;O=D|C=M;O=A)$/;

        # make sure that the path is limited to the docs path
        return $uri->path =~ m[^/starteam_area/];
    }

      Regarding your original post, the replies given so far do solve the problem you mentioned.

      If the behavior is still not what you expected, then you will need to tell us more, because we cannot guess what that expected behavior is.

      You say the first and third regexps work. Let me show you that the second also works, and the returned value is '0', just like you want (unless you really mean 'zero' and not '0').

      sub test_url {
          my ( $s, $server ) = @_;
          # return 1;  # Ok to index/spider
          # return 0;  # No, don't index or spider;

          # ignore any common image files
          return 0 if $s =~ /\.(gif|jpg|jpeg|png)?$/;

          # ignore directory listing sorting links
          return 0 if $s =~ /\?(C=N;O=D|C=M;O=A)$/;

          # make sure that the path is limited to the docs path
          return $s =~ m[^/starteam_area/];
      }

      my $res;
      $res = test_url('http://someurl.com/?C=N;O=D');
      print "returned value was - ".$res."\n";
      $res = test_url('http://someurl.com/?C=M;O=A');
      print "returned value was - ".$res."\n";
      $res = test_url('http://someurl.com/');
      print "returned value was - ".$res."\n";
      $res = test_url('http://someurl.com/?C=X;O=A');
      print "returned value was - ".$res."\n";

      ----
      #output
      returned value was - 0
      returned value was - 0
      returned value was - 
      returned value was - 

      Note that I replaced $uri with $s because I don't know what kind of structure $uri is.

        Ah, your last sentence is what led me to the bug. $uri->path from my example returns the URL without the URL parameters (i.e. without everything after the question mark). I didn't discover this until I created some tests similar to the one you posted. Good regex, bad input. Anyway, I ended up using the following regex (from one of the answers) because it is what I was eventually aiming for:
        /\?(C=[NMSD];O=[AD])$/
        Thank you to everyone for your help.
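For anyone who hits the same "good regex, bad input" problem, here is a rough sketch of why the match fails once the query string is gone. It uses plain strings rather than a real $uri object, and the example path /starteam_area/docs/ is made up. The split only mimics an accessor that drops everything after the question mark. If $uri is a URI object from CPAN (the thread never confirms this), path_query or query would be the methods that still include the part after the question mark.

```perl
use strict;
use warnings;

# The regex the poster ended up using.
my $rx_dirsort = qr{\?(C=[NMSD];O=[AD])$};

# Hypothetical request target, including the query string.
my $full = '/starteam_area/docs/?C=N;O=D';

# Mimic a path accessor that drops the query string (assumption, see above).
my ($path_only) = split /\?/, $full, 2;

my $hit_full = $full      =~ $rx_dirsort ? 1 : 0;
my $hit_path = $path_only =~ $rx_dirsort ? 1 : 0;

print "full: $hit_full, path only: $hit_path\n";
```

The regex matches the full string but can never match the path alone, because the literal question mark it requires has already been stripped.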
      Try:

      my $uripath = 'http://www.somewhere.com/~s/reports/?C=M;O=D';

      # easier to maintain as a partial expression
      # added support for more sortorders
      # which is probably where you erred
      my $rx_dirsort = qr{\?(C=[NMSD];O=[AD])$};

      print "hit!\n" if $uripath =~ /$rx_dirsort/;

      hth

      Edit: Actually, the most important error you made was using the final '?', which has been pointed out by many here.
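A small sketch of what the final '?' does, using the image regex from the posted code. The sample paths are hypothetical. With the '?' the whole group becomes optional, so the pattern also accepts a path that merely ends in a dot.

```perl
use strict;
use warnings;

my $optional = qr/\.(gif|jpg|jpeg|png)?$/;   # group optional, as posted
my $required = qr/\.(gif|jpg|jpeg|png)$/;    # group required

# Hypothetical sample paths.
my @paths = ( '/img/logo.png', '/docs/notes.', '/docs/index.html' );

my %result;
for my $p (@paths) {
    # Store [optional-match, required-match] as 1 or 0 for each path.
    $result{$p} = [ ( $p =~ $optional ? 1 : 0 ), ( $p =~ $required ? 1 : 0 ) ];
    printf "%-18s optional=%d required=%d\n", $p, @{ $result{$p} };
}
```

Only '/docs/notes.' differs between the two patterns: the optional form matches it, the required form does not. The same reasoning applies to the sorting-link regex, where the optional group would also accept a path ending in a bare question mark.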