Katanya has asked for the wisdom of the Perl Monks concerning the following question:

Is there a way to use perl to search through a designated SQL table, looking for keywords, and delete any records found with those keywords? I would like to deploy this on my website as a scraper, to go through comments after they have been made in order to remove spam. I guess that adds the necessity of the keyword search needing to be done in a specific field in the table.

Replies are listed 'Best First'.
Re: Search a SQL Table and Delete Records
by roboticus (Chancellor) on Aug 29, 2014 at 00:33 UTC

    Katanya:

    Sure, just use DBI to connect to your database and use whatever SQL statement you use for spam removal:

    use strict;
    use warnings;
    use DBI;

    my $DB = DBI->connect(....);
    $DB->do(q{ delete from comments where message like '%SPAM%' });

    As you can see, you don't really even need perl, since you can do it in the database.

    Note: Untested.
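A slightly fuller sketch of the same idea, for anyone who wants something to run: it assumes DBD::SQLite is installed and uses an in-memory database with a made-up `comments` table and sample rows, so nothing real gets touched. The keywords are passed as bind values rather than pasted into the SQL string.

```perl
use strict;
use warnings;
use DBI;

# Assumption: DBD::SQLite is available; the in-memory database stands in
# for the real site database so this is safe to experiment with.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
                       { RaiseError => 1, AutoCommit => 1 });

# Hypothetical table and sample comments, for illustration only
$dbh->do('CREATE TABLE comments (id INTEGER PRIMARY KEY, message TEXT)');
my $ins = $dbh->prepare('INSERT INTO comments (message) VALUES (?)');
$ins->execute($_) for (
    'Nice article, thanks!',
    'Cheap pills at http://spam.example',
    'Visit www.spam.example today',
);

# One LIKE clause per keyword; the keywords travel as bind values
my @keywords = ('http', 'www');
my $where    = join ' OR ', ('message LIKE ?') x @keywords;
my $deleted  = $dbh->do("DELETE FROM comments WHERE $where",
                        undef, map { "%$_%" } @keywords);

my ($remaining) = $dbh->selectrow_array('SELECT COUNT(*) FROM comments');
print "Deleted $deleted, $remaining left\n";
```

For a real site you would swap the connect string for your own DSN and credentials, and probably run a SELECT with the same WHERE clause first to see what would be removed.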

    As many will tell you, the hard part isn't doing the deletions, it's coming up with a reliable spam recognizer....

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      Right now I just want to test something simple, by just searching one field in the record for keywords like http, www, .com, etc. I know I could do this in the database, but since I'm learning a bit of perl scripting, and I can run it on my site, I thought this would be a good way to learn how I can apply things together. Thank you for replying =)

        Learning Perl? Excellent!

        Learning Perl by applying it when not needed and in a way that makes the specific problem more complex to solve? Not usually advised.

        But for this quest: use DBI as recommended above; capture each row's id and the relevant field's contents to an array; use a regular expression to search the elements for whatever words you consider earmarks of spam (and NB roboticus' comment thereon) and when found, use the row id to tell your db engine to delete those rows.

        Adopting this scheme will be a SMOP... but one that is actually fairly 'simple' and one which provides a really relevant problem case for your effort to learn Perl.
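The matching step of that scheme can be tried without a database at all. The rows below are hypothetical stand-ins for what `fetchrow_array` would return, and the keyword list is only an example; treat this as a sketch of the idea, not a spam filter.

```perl
use strict;
use warnings;

# Hypothetical [id, content] pairs, as they might come back from the database
my @rows = (
    [ 1, 'Great post, thanks!' ],
    [ 2, 'Buy now at http://spam.example' ],
    [ 3, 'I work in telecommunications' ],
    [ 4, 'Visit cheap-pills.com' ],
);

# quotemeta escapes regex metacharacters, so ".com" matches a literal dot
# instead of "any character followed by com"
my @keywords = qw(http www .com);
my $pattern  = join '|', map { quotemeta($_) } @keywords;
my $spam_re  = qr/$pattern/i;

# Keep the ids of the rows whose content matches any keyword
my @spam_ids = map { $_->[0] } grep { $_->[1] =~ $spam_re } @rows;
print "Would delete: @spam_ids\n";
```

Row 3 is there on purpose: without `quotemeta`, the pattern `.com` would match the "ecom" in "telecommunications" and flag an innocent comment.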


        check Ln42!

      Will this work for retrieving the two columns I want?

      # Search the needed table and pull out needed columns
      my $search = "SELECT comment_ID,comment_content FROM wp_comments";
      my $sql = $db->prepare($search);
      $sql->execute();

      # Input fetched data into an array
      my @retrievedData;
      while (my @row = $sql->fetchrow_array()) {
          push @retrievedData, [@row];
      }

      If that works, I'm still trying to figure out how to search through the results. I only need to search through comment_content, and if a keyword is found, use the comment_ID to delete that row.

        See Regular Expressions perlre
        #!perl
        use DBI;
        use strict;

        # get_dbh() is a placeholder for your own DBI->connect(...) wrapper
        my $dbh = get_dbh(); # connect to db

        # build regex; quotemeta escapes the dot in .com so it matches literally
        my @keyword = qw(www http .com);
        my $words = join '|', map { quotemeta($_) } @keyword;
        my $regexp = qr/$words/i;
        print "Regular Expression is $regexp\n";

        # search for key words
        my $sql = 'SELECT comment_ID,comment_content FROM wp_comments';
        my $sth = $dbh->prepare($sql);
        $sth->execute();
        my @spam = ();
        while (my ($id,$content) = $sth->fetchrow_array()) {
            if ($content =~ /$regexp/ ){
                print "ID : $id\ncomment : $content\n\n";
                push @spam,$id;
            }
        }

        # delete records
        my $sql_del = 'DELETE FROM wp_comments WHERE comment_ID = ?';
        my $sth_del = $dbh->prepare($sql_del);
        for my $id (@spam){
            # enable this when you are sure it's working !!
            #$sth_del->execute($id);
            print "Deleted $id\n";
        }
        poj
Re: Search a SQL Table and Delete Records
by trippledubs (Deacon) on Aug 29, 2014 at 13:48 UTC
    Hi Katanya,

    Good luck in your endeavors. SQL::Abstract and other SQL abstraction modules reduce the risk of SQL injection by generating queries that use bind placeholders, and they help make your code easier to maintain when working with databases. You could also implement a CAPTCHA using GD::SecurityImage to avoid getting spam in the first place, which is another great use of Perl and a good way to learn it!
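To give a rough idea of what SQL::Abstract does, here is a minimal sketch that only builds the statement and does not touch a database. It assumes SQL::Abstract is installed from CPAN; the table and column names are the ones from the question earlier in this thread, and the keyword list is just an example.

```perl
use strict;
use warnings;
use SQL::Abstract;

my $sqla = SQL::Abstract->new;

# Build a DELETE with one LIKE test per keyword; the values come back
# separately in @bind instead of being interpolated into the SQL text
my ($stmt, @bind) = $sqla->delete(
    'wp_comments',
    { comment_content => { -like => [ '%http%', '%www%' ] } },
);

print "$stmt\n";
print "bind: @bind\n";

# With a connected DBI handle in $dbh (not created here), the pair would be
# passed along as: $dbh->do($stmt, undef, @bind);
```

The point is that the generated SQL contains only `?` placeholders, so the keywords never become part of the statement text.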

      The weird thing is I already have a CAPTCHA installed since I am using Wordpress. I just wanted to try a Perl solution instead. Still trying to work out the code.