in reply to Bot vs human User Agent strings

Alternation?

#!/usr/bin/perl -w
use strict;

my @bad = ();
my ($f, $fh, $baddies, $sql, $badregex) = ('./badactors.txt', undef, undef, '', undef);

# Read one pattern per line, skipping blank lines.
open($fh, '<', $f) or die $!;
while (<$fh>) {
    chomp;
    next if m/^$/;
    push @bad, $_;
}
close($fh);

# Join the patterns into one alternation and compile it once.
$baddies  = join qq{\|}, @bad;
$badregex = qr~$baddies~;

# easy test
my $junk = 'doodle bot';
if ($junk =~ m/$badregex/) {
    print qq~\nSee....?\n~;
}
1;
__DATA__
$dbh->do("INSERT INTO Site_Visit SET firstVisit = NOW(), lastPage = ?, firstPage = ?, IP = ?, userAgent = ?, orsa = ?, orta = ?, Person_idPerson = ?",
    undef, $ENV{'REQUEST_URI'}, $ENV{'REQUEST_URI'}, $ENV{'REMOTE_ADDR'}, $ENV{'HTTP_USER_AGENT'}, $cookie{'orsa'}, $data{'orta'}, $user)
    unless $ENV{'HTTP_USER_AGENT'} =~ /bot/i or $ENV{'HTTP_USER_AGENT'} =~ /facebook/i or $ENV{'HTTP_USER_AGENT'} =~ /dataprovider/i;
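One caveat with joining raw lines into an alternation: any regex metacharacter in the file (a `.` or `+` in an agent name, say) is read as regex syntax rather than as a literal character. If the entries are meant as plain strings, a minimal sketch of the same idea with each entry escaped via `quotemeta` and the match made case-insensitive might look like this. The sub name `build_bad_regex` and the sample entries are hypothetical, not taken from the post above.

```perl
use strict;
use warnings;

# Hypothetical helper: build one case-insensitive regex from literal strings.
sub build_bad_regex {
    my @entries = grep { length($_) } @_;      # drop empty entries
    # Guard against an empty list: (?!) is a pattern that never matches.
    return qr/(?!)/ unless @entries;
    my $alternation = join '|', map { quotemeta($_) } @entries;
    return qr/$alternation/i;
}

# Sample entries standing in for the lines of badactors.txt
my $badregex = build_bad_regex('bot', 'facebook', 'dataprovider.com');

for my $ua ('Doodle Bot/1.0', 'Mozilla/5.0 (X11; Linux x86_64)') {
    printf "%-35s %s\n", $ua, ($ua =~ $badregex ? 'blocked' : 'allowed');
}
```

If the file is meant to hold real regexes rather than literal strings, leave out the `quotemeta` step and keep the entries as they are.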

Celebrate Intellectual Diversity

Re^2: Bot vs human User Agent strings
by Bod (Parson) on Feb 10, 2024 at 22:09 UTC

    Thanks... based on this suggestion, and after sleeping on it, I've implemented the following:

    open my $fh, '<', "....data/UserAgents/block.dat";
    my @agent = <$fh>;
    close $fh;
    chomp @agent;
    my $invalid = grep { $ENV{'HTTP_USER_AGENT'} =~ /$_/i } @agent;
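For anyone adapting this, here is a self-contained sketch of the same `grep` approach. In scalar context `grep` returns the number of elements for which the block is true, so `$invalid` ends up as a count that is true when at least one pattern matches. The sub name `is_blocked_agent` and the sample patterns are assumptions for illustration. The `//= ''` guard covers requests that send no User-Agent header, and empty patterns are skipped because a blank line in the file would otherwise become an empty regex.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many patterns match the user agent string.
sub is_blocked_agent {
    my ($user_agent, @patterns) = @_;
    $user_agent //= '';                        # the header may be absent
    # grep in scalar context returns the number of matching elements
    my $hits = grep { length($_) && $user_agent =~ /$_/i } @patterns;
    return $hits;
}

# Sample patterns standing in for the contents of block.dat
my @agent = ('bot', 'facebook', 'dataprovider');

print is_blocked_agent('facebookexternalhit/1.1', @agent)       ? "skip\n" : "record\n";
print is_blocked_agent('Mozilla/5.0 (Windows NT 10.0)', @agent) ? "skip\n" : "record\n";
```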

    The solution from hippo has made me think, recall and investigate what Apache can do in this situation. However, I decided to do it this way instead of using Apache because it keeps all the logic in the method that processes the page headers. This is where the session cookie is set, so it makes sense (to me) to keep the code there as well. I feel this should be easy to maintain and easy to find.

    I am a bit confused by a line in the above code:

    if ($junk=~m/$badregex/) { print qq~\nSee....?\n~; }
    Isn't the m operator redundant here or is it doing something subtle that I have overlooked?

      Isn't the m operator redundant here

      Yes, you are correct. However, it does no harm. It's not really an operator - rather it can serve to disambiguate the regex (for the compiler) in circumstances where it might not be clear. This is not one of those circumstances AFAICT.
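A short sketch of where the `m` does matter: with `/` as the delimiter it can be dropped, but with any other delimiter it is needed for Perl to recognize the construct as a match. The sample strings and variable names here are made up for illustration.

```perl
use strict;
use warnings;

my $badregex = qr/bot|facebook/i;
my $junk     = 'doodle bot';

# With slash delimiters the leading m is optional: these two are equivalent.
my $with_m    = $junk =~ m/$badregex/ ? 1 : 0;
my $without_m = $junk =~ /$badregex/  ? 1 : 0;

# A qr// object can also be used directly on the right-hand side of =~.
my $direct = $junk =~ $badregex ? 1 : 0;

# With other delimiters the m is required; handy when the pattern contains slashes.
my $braces = '/robots.txt' =~ m{^/robots\.txt\z} ? 1 : 0;

print "with m: $with_m, without m: $without_m, direct: $direct, braces: $braces\n";
```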


      🦛