in reply to Checking external links for inappropriate content

joealba asked:

Does anyone else have some good ideas on how to make this program a little more robust in its search, without returning too many misses?

Yes. Forward all of the images to me and I'll let you know if they are inappropriate.

Now, surprising as it may seem, I don't know a lot about the online porn industry (yet another area of future research, I suppose), but I suspect that they will often redirect from innocuous names to the suspect ones. Thus, you'll probably want to check for redirects. If you're using LWP::Simple, be careful. For example:

perl -MLWP::Simple -e "getprint(q|http://www.ovidinexile.com/|)"

The above code will print out HTML for a frameset. However, if you use Rex Swain's HTTP viewer, you discover that you are redirected to my real home page. I think a redirect should definitely be something you want to flag, even if the Russian words on the new site don't trigger your regexes :)
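A minimal sketch of one way to flag redirects, assuming you switch from LWP::Simple to LWP::UserAgent and let it follow the redirects itself. The helper name redirect_hops is made up for illustration. It relies on HTTP::Response's redirects method, which walks the chain of earlier responses, and it only sees HTTP-level redirects, not meta refreshes or framesets:

```perl
use strict;
use warnings;
use LWP::UserAgent;

# Hypothetical helper (not part of LWP): return the Location header of
# every redirect response that was followed to reach the final response.
sub redirect_hops {
    my ($res) = @_;
    return map  { scalar $_->header('Location') }
           grep { $_->is_redirect }
           $res->redirects;
}

# When run as a script, check each URL given on the command line.
unless (caller) {
    my $ua = LWP::UserAgent->new(timeout => 10);
    for my $url (@ARGV) {
        my $res  = $ua->get($url);
        my @hops = redirect_hops($res);
        print "$url redirects via: @hops\n" if @hops;
    }
}
```

Any URL that prints a line here was redirected at least once, so it is a candidate for a closer look even if the final page passes your regexes.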

Cheers,
Ovid

Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

Re: (Ovid) Re: Checking external links for inappropriate content
by joealba (Hermit) on Feb 14, 2002 at 23:35 UTC
    NICE! I like the idea of returning every link that results in a redirect. Thanks, Ovid!

    So I should set up an HTTP::Request, pass it to an LWP::UserAgent, and check the $response->{_rc} response code? Or is there an easier way?
      Something like this may help...
      # Note: LWP::UserAgent follows redirects itself by default, so to see
      # a 301/302 here, fetch with $ua->simple_request($req).
      if ($res->is_success) {
          # Normal retrieval of content stuff
      }
      else {
          # Check for redirects
          if ($res->code =~ /^30[12]$/) {    # redirect codes (temp/perm)
              # Grab the location
              my $remote_cgi = $res->header('Location');
              {
                  # Some servers erroneously return a relative URL for redirects,
                  # so make it absolute if it is not already.
                  local $URI::ABS_ALLOW_RELATIVE_SCHEME = 1;
                  my $base = $res->base;
                  $remote_cgi = $HTTP::URI_CLASS->new($remote_cgi, $base)->abs($base);
              }
          }
          else {
              # Request failed normally, broken link
          }
      }
      Where $res is your result object

      ---If it doesn't fit use a bigger hammer