PerlMonks  

Blocking based on words in a list

by matts156 (Novice)
on Dec 28, 2005 at 00:26 UTC

matts156 has asked for the wisdom of the Perl Monks concerning the following question:

Ok, I've been playing with this for hours. I'm not a trained programmer, so what I've learned so far is by trial and error. Please bear with me.

I have a guestbook that someone has been spamming recently. I simply want to create a list of words in a txt file that can be assigned to a variable. Then when the user clicks submit, it checks to see if the comments field contains any of the blocked words in the list. If so, it simply posts a message saying BLOCKED!!. If not, it allows the user to post.

Here's the code I've been playing with. Someone please have a look at it and tell me where I'm going wrong. I can get it to work if I specify the exact word in quotes where you see "@blocked" below in the IF statement. I've tried various combinations of quotes, using $ instead of @, using "eq" instead of =~, and God knows what else. It either blocks nothing, or blocks everything, regardless of the value of $comments. Here's my flawed code.

    # Check for blocked text
    open (BLOCK,"/home/cowpensv/public_html/cgi-bin/block.txt")|| die "Can't Open block.txt";
    @blocked = <BLOCK>;
    close (BLOCK);
    if ( $FORM{'comments'} =~ @blocked ) {
        print "Content-type: text/html\n\n";
        print "<html><head><title>Blocked</title></head>\n";
        print "<body><h1>BLOCKED!!</h1>\n";
        print "\n</body></html>\n";
        exit;
    }

Re: Blocking based on words in a list
by esskar (Deacon) on Dec 28, 2005 at 00:43 UTC
    I see this construct ($scalar =~ @array) very often, but it's totally wrong: you have to check every entry in the blocked array.
    Reading all entries into an array can be slow and memory-consuming, and there is actually no need to use regexes here, so I would change the code like this:
        if (open(my $filehandle, '<', '/home/cowpensv/public_html/cgi-bin/block.txt')) {
            my $comments   = $FORM{'comments'};
            my $is_blocked = 0;
            # stop reading as soon as one blocked word has been found
            while ($is_blocked == 0 && defined(my $word = <$filehandle>)) {
                chomp $word;
                next unless length $word;   # skip blank lines: index() finds "" in any string
                $is_blocked = 1 if index($comments, $word) > -1;
            }
            close $filehandle;
            if ($is_blocked) {
                print "Content-type: text/html\n\n";
                print "<html><head><title>Blocked</title></head>\n";
                print "<body><h1>BLOCKED!!</h1>\n";
                print "\n</body></html>\n";
                exit;
            }
        }
        else {
            die "Can't open block.txt";   # or any other error handling;
                                          # I don't like die in a CGI environment
        }
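    The same early-exit, index()-based check can be wrapped in a subroutine so it can be tried outside the CGI script. This is a minimal sketch, not part of the reply above: the name is_blocked is hypothetical, and the case-insensitive comparison via lc is an added assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if $text contains any entry of @$words as
# a plain substring (no regex), ignoring case; stops at the first hit.
sub is_blocked {
    my ($text, $words) = @_;
    my $haystack = lc $text;
    for my $word (@$words) {
        next unless defined $word && length $word;   # "" would match anything
        return 1 if index($haystack, lc $word) > -1;
    }
    return 0;
}

my @blocked = ('casino', 'cheap pills');    # sample list
for my $comment ('Nice site!', 'Best CASINO deals') {
    my $verdict = is_blocked($comment, \@blocked) ? 'blocked' : 'ok';
    print "$comment: $verdict\n";
}
```

    Because index() does plain substring matching, a blocked word such as "pill" would also catch "spill"; the \b-based approach in the next reply avoids that.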
      Hey, you're awesome. That worked just great. One of these days, I'm going to take a class on this stuff.
Re: Blocking based on words in a list
by moot (Chaplain) on Dec 28, 2005 at 00:50 UTC
        chomp for @blocked;   # don't expect to match newlines
        if ( grep { $FORM{'comments'} =~ /\b$_\b/ } @blocked ) {
            ...
        }
    This is somewhat simplistic, not to mention inefficient: grep keeps testing every word in the list even after the first one matches. A better approach would use a for loop:
        chomp for @blocked;
        for my $m (@blocked) {
            if ($FORM{'comments'} =~ /\b$m\b/) {
                # print your message here.
                exit;   # could also use 'last' to leave the loop instead
            }
        }
    The \b markers match only at a word boundary, so BLOCK will not be matched inside the UNBLOCKED of THIS SHOULD BE UNBLOCKED, for example.
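    The same early exit is also available in grep-like form through first() from List::Util, a core module. A sketch combining it with the \b anchors: the helper name first_blocked_word and the sample word list are made up for illustration, and \Q...\E is added to escape any regex metacharacters in the words.

```perl
use strict;
use warnings;
use List::Util qw(first);

my @blocked = ('BLOCK', 'casino');    # sample list, already chomped

# Hypothetical helper: return the first blocked word that appears in $text
# as a whole word, or undef if none does. first() stops at the first true
# result, whereas grep always walks the entire list.
sub first_blocked_word {
    my ($text) = @_;
    return first { $text =~ /\b\Q$_\E\b/ } @blocked;
}

for my $comment ('THIS SHOULD BE UNBLOCKED', 'PLEASE BLOCK THIS', 'casinos galore') {
    my $hit = first_blocked_word($comment);
    print "$comment => ", (defined $hit ? "BLOCKED ($hit)" : 'ok'), "\n";
}
```

    Note that \b also lets "casinos" through when only "casino" is listed; whether that is desirable depends on your word list.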

    However, you may want to rethink your approach. Simplistic blockers like this rarely achieve the desired results and often produce too many false positives to be truly useful. YMMV, of course.

      $FORM{'comments'} =~ /\b$_\b/ }

      That may give errors or unexpected behaviour if @blocked contains regex metacharacters (e.g. opening parentheses, dots, ...). Either apply quotemeta to each entry beforehand, or use \Q$_\E in the pattern, e.g.:

          @blocked = map { quotemeta($_) } @blocked;
          ...
          $FORM{'comments'} =~ /\b$_\b/

      or

          $FORM{'comments'} =~ /\b\Q$_\E\b/
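      To see why the escaping matters, here is a small sketch with two made-up blocked words: an unescaped dot silently matches any character, and an unbalanced parenthesis makes the interpolated pattern fail to compile at run time.

```perl
use strict;
use warnings;

my $comment = 'upgrade to v1x0 today';
my $word    = 'v1.0';    # made-up blocked word; "." is a regex metacharacter

# Unescaped, "." matches any character, so v1.0 also matches "v1x0".
my $raw     = $comment =~ /\b$word\b/     ? 1 : 0;
# \Q...\E (the same escaping quotemeta does) makes the dot literal.
my $escaped = $comment =~ /\b\Q$word\E\b/ ? 1 : 0;
print "raw: $raw, escaped: $escaped\n";    # prints "raw: 1, escaped: 0"

# An unbalanced "(" is worse: the interpolated pattern cannot compile,
# so the match dies at run time unless the error is trapped with eval.
my $bad = 'free(';
my $ok  = eval { my $matched = $comment =~ /$bad/; 1 };
if ($ok) { print "compiled\n" }
else     { print "died: $@" }
```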

      Best regards,
      perl -e "s>>*F>e=>y)\*martinF)stronat)=>print,print v8.8.8.32.11.32"

Re: Blocking based on words in a list
by myuji (Acolyte) on Dec 28, 2005 at 02:50 UTC
    By joining the words with "|", you can check the string against all of them in a single match. I use "quotemeta" to escape regular-expression metacharacters.
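        chomp (@blocked);
        # skip blank lines: an empty alternative would match every comment
        my $regex = join "|", map { quotemeta($_) } grep { length } @blocked;
        if ( length $regex && $FORM{'comments'} =~ /$regex/ ) {
            ...
        }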
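    A variation on this idea, sketched under a few assumptions that are not in the reply: the alternation is compiled once with qr//, matching is case-insensitive (/i), words must stand alone (\b), and blank entries are skipped, since an empty alternative would match every comment. The helper name build_block_regex is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: compile one case-insensitive pattern that matches
# any of the given words as a whole word.
sub build_block_regex {
    my @words = grep { defined $_ && length $_ } @_;   # drop blank entries
    return qr/(?!)/ unless @words;                     # (?!) never matches
    my $alternation = join '|', map { quotemeta($_) } @words;
    return qr/\b(?:$alternation)\b/i;
}

my $block_re = build_block_regex('casino', 'cheap pills', '');

for my $comment ('Great site, thanks!', 'Buy Cheap Pills here') {
    my $verdict = $comment =~ $block_re ? 'BLOCKED' : 'ok';
    print "$comment => $verdict\n";
}
```

    The \b anchors only behave as expected when each word starts and ends with a letter, digit or underscore.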
    Hope this helps,
Re: Blocking based on words in a list
by idsfa (Vicar) on Dec 28, 2005 at 17:19 UTC

    Of course, you don't want to be using a single, hardcoded file name. If someone else submits a comment at the same time, your block.txt will get overwritten, bypassing your scan. Look into File::Temp.

    Update (repeating a clarification I sent by message, for the benefit of others): The problem isn't with the code you've shown. Suppose I click submit and your code dumps my comment into block.txt. Before your scanner reads the file, someone else submits a comment. Now your scanner checks block.txt, but not MY block.txt. Depending on how the rest of your code works, my comment is either dropped (bad) or not scanned (also bad).
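    If a script really did write each submission to disk before scanning it, File::Temp (a core module) can give every request its own uniquely named file. This is only a sketch of that hypothetical workflow: the code in the question only reads block.txt, and the sample string stands in for $FORM{'comments'}.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $comment = "Hello from one visitor\n";   # stand-in for $FORM{'comments'}

# tempfile() picks a unique name and opens the file; UNLINK => 1 removes
# it automatically when the program exits.
my ($fh, $filename) = tempfile(UNLINK => 1);
print {$fh} $comment;
close $fh or die "Cannot close $filename: $!";

# Each request reads back only its own file.
open my $in, '<', $filename or die "Cannot open $filename: $!";
my $saved = do { local $/; <$in> };
close $in;

print "scanned ", length($saved), " bytes from $filename\n";
```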


    The intelligent reader will judge for himself. Without examining the facts fully and fairly, there is no way of knowing whether vox populi is really vox dei, or merely vox asinorum. — Cyrus H. Gordon
      Why is that? block.txt is opened for reading, not for writing!?!

Node Type: perlquestion [id://519465]
Approved by astaines
Front-paged by tye