in reply to Difficult? regex
I would argue against doing this with a regex. When your input is known to conform to a standard, as URIs and HTML are, it is better to use a parser. A parser is easier to adapt, easier to extend to new use cases, immune to argument order, and generally more likely to be bomb-proof.
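To illustrate the argument-order point, here is a minimal sketch. The example.com URLs and the `sort_params` helper are made up for illustration; only `URI->new` and `query_param` come from the URI distribution.

```perl
use strict;
use warnings;
use URI;
use URI::QueryParam;    # adds query_param() to URI objects

# Hypothetical URLs: same parameters, opposite order.
my @urls = (
    'http://example.com/starteam_area/index.html?C=N&O=D',
    'http://example.com/starteam_area/index.html?O=D&C=N',
);

# Hypothetical helper: reads each parameter by name and
# returns a string of the form "C=<value> O=<value>".
sub sort_params {
    my ($url) = @_;
    my $uri = URI->new($url);
    my $c   = $uri->query_param('C');
    my $o   = $uri->query_param('O');
    return "C=$c O=$o";
}

# Both lines print "C=N O=D", whatever order the query string used.
print sort_params($_), "\n" for @urls;
```

A regex that expects `C=N` before `O=D` would miss the second URL unless you wrote out both orderings by hand; looking the parameters up by name sidesteps that.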
How about something like this for your case?
    use URI::QueryParam; # introduces a new method to URI

    sub test_url {
        my ( $uri, $server ) = @_;
        # returns true, ok to index/spider
        # return false, don't index or spider

        # A white list is always better than
        # a black list if you can make one
        return unless $uri->path =~ /\.html$/;

        # Note about what this condition really means
        return if $uri->query_param("C") eq "N"
            and $uri->query_param("O") eq "D";

        # Note about what this condition really means
        return if $uri->query_param("C") eq "M"
            and $uri->query_param("O") eq "A";

        # make sure that the path is limited to the docs path
        return $uri->path =~ m[^/starteam_area/];
    }
Re^2: Difficult? regex
by Anonymous Monk on Feb 26, 2008 at 09:14 UTC