in reply to Re: Difficult? regex
in thread Difficult? regex

Thanks to everyone for the suggestions. My code looks like this now and still doesn't work as expected (see original posting). It does return zero for URLs ending in .jpg so that regex works. The final regex also works.
sub test_url {
    my ( $uri, $server ) = @_;
    # return 1;  # Ok to index/spider
    # return 0;  # No, don't index or spider;

    # ignore any common image files
    return 0 if $uri->path =~ /\.(gif|jpg|jpeg|png)?$/;

    # ignore directory listing sorting links
    return 0 if $uri->path =~ /\?(C=N;O=D|C=M;O=A)$/;

    # make sure that the path is limited to the docs path
    return $uri->path =~ m[^/starteam_area/];
}

Re^3: Difficult? regex
by olus (Curate) on Feb 22, 2008 at 12:49 UTC

    Regarding your original post, the replies given so far do solve the problem you mentioned.

    If the behavior is still not what you expect, you will need to tell us more, because we cannot guess what the expected behavior is.

    You say the first and third regexps work. Let me show you that the second also works, and that the returned value is '0', just like you want (unless you really mean 'zero' and not '0').

    sub test_url {
        my ( $s, $server ) = @_;
        # return 1;  # Ok to index/spider
        # return 0;  # No, don't index or spider;

        # ignore any common image files
        return 0 if $s =~ /\.(gif|jpg|jpeg|png)?$/;

        # ignore directory listing sorting links
        return 0 if $s =~ /\?(C=N;O=D|C=M;O=A)$/;

        # make sure that the path is limited to the docs path
        return $s =~ m[^/starteam_area/];
    }

    my $res;
    $res = test_url('http://someurl.com/?C=N;O=D');
    print "returned value was - ".$res."\n";
    $res = test_url('http://someurl.com/?C=M;O=A');
    print "returned value was - ".$res."\n";
    $res = test_url('http://someurl.com/');
    print "returned value was - ".$res."\n";
    $res = test_url('http://someurl.com/?C=X;O=A');
    print "returned value was - ".$res."\n";

    ----
    #output
    returned value was - 0
    returned value was - 0
    returned value was - 
    returned value was - 

    Note that I replaced $uri with $s because I don't know what kind of structure $uri is.

      Ah, your last sentence is what led me to the bug. $uri->path from my example returns the URL without the URL parameters (i.e. without everything after the question mark). I didn't discover this until I created some tests similar to the one you posted. Good regex, bad input. Anyway, I ended up using the following regex (from one of the answers) because it is what I was eventually aiming for:
      /\?(C=[NMSD];O=[AD])$/
      Thank you to everyone for your help.
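
For anyone landing here later, a minimal sketch of the "good regex, bad input" problem described above. It assumes $uri is an object from the CPAN URI module (the thread never says so explicitly), and the URL is made up for illustration. As far as I know, path() returns only the part before the question mark, while path_query() keeps the query string attached:

```perl
use strict;
use warnings;
use URI;    # CPAN module; assumed to be the class behind $uri

# Hypothetical URL, chosen only for illustration
my $uri = URI->new('http://someurl.com/starteam_area/?C=N;O=D');

# The regex the poster ended up using
my $rx_dirsort = qr/\?(C=[NMSD];O=[AD])$/;

# path() stops at the question mark, so the regex has nothing to match
my $path_hit = $uri->path =~ $rx_dirsort ? 1 : 0;

# path_query() keeps the query string, so the same regex can match
my $path_query_hit = $uri->path_query =~ $rx_dirsort ? 1 : 0;

print "path:       ", $uri->path,       " -> $path_hit\n";
print "path_query: ", $uri->path_query, " -> $path_query_hit\n";
```

So inside test_url, matching the sort-link regex against $uri->path_query (or testing $uri->query on its own) is one way to give the regex the input it needs. Whether that fits the rest of the spider config is a separate question.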
Re^3: Difficult? regex
by stiller (Friar) on Feb 22, 2008 at 12:44 UTC
    Try:

    my $uripath = 'http://www.somewhere.com/~s/reports/?C=M;O=D';

    # easier to maintain as a partial expression
    # added support for more sort orders
    # which is probably where you erred
    my $rx_dirsort = qr{\?(C=[NMSD];O=[AD])$};

    print "hit!\n" if $uripath =~ /$rx_dirsort/;

    hth

    Edit: Actually, the most important error you made was using the final '?', which has been pointed out by many here.
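
A small sketch of what a trailing '?' after a group does, assuming the remark above refers to that kind of quantifier. The image regex in the posted code still has one, so it is used as the example here. The paths are hypothetical:

```perl
use strict;
use warnings;

# Posted form: the group is optional, so a bare dot at the end is enough
my $rx_optional = qr/\.(gif|jpg|jpeg|png)?$/;

# Without the quantifier: one of the listed extensions is required
my $rx_required = qr/\.(gif|jpg|jpeg|png)$/;

# Hypothetical paths, chosen only for illustration
my @paths = (
    '/starteam_area/logo.jpg',
    '/starteam_area/notes.',
    '/starteam_area/index.html',
);

my %result;
for my $p (@paths) {
    # Store [optional-group result, required-group result] as 1/0 flags
    $result{$p} = [ ( $p =~ $rx_optional ? 1 : 0 ), ( $p =~ $rx_required ? 1 : 0 ) ];
    printf "%-28s optional=%d required=%d\n", $p, @{ $result{$p} };
}
```

The two patterns only disagree on the path that ends in a bare dot: the optional group lets the match succeed with no extension at all, which is rarely what an image filter wants.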