coldfingertips has asked for the wisdom of the Perl Monks concerning the following question:

I am writing my first real MySQL script and just started learning it yesterday from Programming The Perl DBI book.

It was suggested by people in the CB that I post my MYSQL code to see what you monks think about it (as quoted from the cb "your data layout must be crappy").

So below is my setup.cgi script which sets up my tables and everything.

SCRIPT INFORMATION: just to make a fun (simple) project with MySQL I want to create a search engine bot notifier. When you include it on your index page it'll check the UserAgent of everyone/everything that visits the site. It logs the UserAgent, IP, and Time (along with having an id increment which maybe serves no real purpose as of yet-- I just thought it'd be fun).

Each search engine has their own TABLE. Google, Altavista, etc. each gets their own table. And when the script tracks the UserAgent, it inserts the data in the related table (to say if Googlebot is on your site, it'll insert the id, useragent, time and ip into the Google table).

Now what I want to do is create a LAST 10 USERAGENTS table. The last 10 known UserAgents from the list we have will be placed there. Just 10, no more. I don't want to read the last 10, I only want their to be a max of 10 in this table. That's what I need help with, also.

But maybe you wise monks know of a better way to do this. I'm up for suggestions but please don't throw advanced MySQL terms or shortcuts my way since I just started doing this yesterday.

#!/usr/bin/perl use CGI::Carp qw(fatalsToBrowser); use warnings; use strict; use CGI qw/:standard/; use DBI; ################### # configurations ################### my $dbase = "spyders_test"; my $mysql_user = "spyders_admin"; my $mysql_pass = "pass"; my $table1 = "table1"; ################### # end configurations ################### print header; my $dbh = DBI->connect("DBI:mysql:$dbase", $mysql_user, $mysql_pass) o +r print DBI->errstr; print "Connection was made<p>"; ########## # Altavista ########## my $sth = $dbh->prepare( "CREATE TABLE IF NOT EXISTS altavista ( id int auto_increment not null, useragent char(32) NOT NULL, ip char(32) NOT NULL, date char(32) NOT NULL, primary key (id) )"); $sth->execute(); if ($sth->errstr) { print $sth->errstr; } else { print "Altavista table created.<br>"; } ########## # Google ########## my $sth = $dbh->prepare( "CREATE TABLE IF NOT EXISTS google ( id int auto_increment not null, useragent char(32) NOT NULL, ip char(32) NOT NULL, date char(32) NOT NULL, primary key (id) )"); $sth->execute(); if ($sth->errstr) { print $sth->errstr; } else { print "Google table created.<br>"; } ########## # Alexa ########## my $sth = $dbh->prepare( "CREATE TABLE IF NOT EXISTS alexa ( id int auto_increment not null, useragent char(32) NOT NULL, ip char(32) NOT NULL, date char(32) NOT NULL, primary key (id) )"); $sth->execute(); if ($sth->errstr) { print $sth->errstr; } else { print "Alexa table created.<br>"; } ########## # AllTheWeb ########## my $sth = $dbh->prepare( "CREATE TABLE IF NOT EXISTS alltheweb ( id int auto_increment not null, useragent char(32) NOT NULL, ip char(32) NOT NULL, date char(32) NOT NULL, primary key (id) )"); $sth->execute(); if ($sth->errstr) { print $sth->errstr; } else { print "AllTheWeb table created.<br>"; } ########## # Entireweb ########## my $sth = $dbh->prepare( "CREATE TABLE IF NOT EXISTS entireweb ( id int auto_increment not null, useragent char(32) NOT NULL, ip char(32) NOT NULL, date char(32) NOT NULL, primary key (id) )"); $sth->execute(); if ($sth->errstr) { print $sth->errstr; } else { print "Entireweb table created.<br>"; } ########## # Excite ########## my $sth = $dbh->prepare( "CREATE TABLE IF NOT EXISTS excite ( id int auto_increment not null, useragent char(32) NOT NULL, ip char(32) NOT NULL, date char(32) NOT NULL, primary key (id) )"); $sth->execute(); if ($sth->errstr) { print $sth->errstr; } else { print "Excite table created.<br>"; } ########## # Galaxy ########## my $sth = $dbh->prepare( "CREATE TABLE IF NOT EXISTS altavista ( id int auto_increment not null, useragent char(32) NOT NULL, ip char(32) NOT NULL, date char(32) NOT NULL, primary key (id) )"); $sth->execute(); if ($sth->errstr) { print $sth->errstr; } else { print "Altavista table created.<br>"; } ########## # Euroseek ########## my $sth = $dbh->prepare( "CREATE TABLE IF NOT EXISTS euroseek ( id int auto_increment not null, useragent char(32) NOT NULL, ip char(32) NOT NULL, date char(32) NOT NULL, primary key (id) )"); $sth->execute(); if ($sth->errstr) { print $sth->errstr; } else { print "Euroseek table created.<br>"; } ########## # Infoseek ########## my $sth = $dbh->prepare( "CREATE TABLE IF NOT EXISTS infoseek ( id int auto_increment not null, useragent char(32) NOT NULL, ip char(32) NOT NULL, date char(32) NOT NULL, primary key (id) )"); $sth->execute(); if ($sth->errstr) { print $sth->errstr; } else { print "InfoSeek table created.<br>"; } ########## # Lycos ########## my $sth = $dbh->prepare( "CREATE TABLE IF NOT EXISTS lycos ( id int auto_increment not null, useragent char(32) NOT NULL, ip char(32) NOT NULL, date char(32) NOT NULL, primary key (id) )"); $sth->execute(); if ($sth->errstr) { print $sth->errstr; } else { print "Lycos table created.<br>"; } ########## # MSN ########## my $sth = $dbh->prepare( "CREATE TABLE IF NOT EXISTS msn ( id int auto_increment not null, useragent char(32) NOT NULL, ip char(32) NOT NULL, date char(32) NOT NULL, primary key (id) )"); $sth->execute(); if ($sth->errstr) { print $sth->errstr; } else { print "MSN table created.<br>"; } ########## # NothernLight ########## my $sth = $dbh->prepare( "CREATE TABLE IF NOT EXISTS northernlight ( id int auto_increment not null, useragent char(32) NOT NULL, ip char(32) NOT NULL, date char(32) NOT NULL, primary key (id) )"); $sth->execute(); if ($sth->errstr) { print $sth->errstr; } else { print "NorthernLight table created.<br>"; } ########## # Teoma ########## my $sth = $dbh->prepare( "CREATE TABLE IF NOT EXISTS teoma ( id int auto_increment not null, useragent char(32) NOT NULL, ip char(32) NOT NULL, date char(32) NOT NULL, primary key (id) )"); $sth->execute(); if ($sth->errstr) { print $sth->errstr; } else { print "Teoma table created.<br>"; } ########## # JUNK TEST ########## my $sth = $dbh->prepare( "CREATE TABLE IF NOT EXISTS junk ( id int auto_increment not null, useragent char(32) NOT NULL, ip char(32) NOT NULL, date char(32) NOT NULL, primary key (id) )"); $sth->execute(); if ($sth->errstr) { print $sth->errstr; } else { print "Junk table created.<br>"; } $dbh->disconnect or warn "Couldn't disconnect from database: $DBI::errstr"; exit;

2005-01-29 Janitored by Arunbear - added readmore tags, as per Monastery guidelines

Replies are listed 'Best First'.
Re: newbie table structure
by Corion (Patriarch) on Jan 29, 2005 at 17:50 UTC

    Instead of having ten tables like that, which will be ugly for your program once you want to add another search engine, have two tables, one for the "visit", and one for the "search_engine", like that:

    -- SQL code: CREATE TABLE IF NOT EXISTS visit ( id int auto_increment not null, engine_id int, ip int, date date, primary key (id) ); CREATE TABLE IF NOT EXISTS search_engine ( id int auto_increment not null, useragent int, name char(256), homepage char(256) );

    You then add, for every visit, a new row into the visit table, and put in the number of the search engine from the search_engine table. Having all the data in one place makes then selecting the last 10 engines very easy:

    select distinct search_engine.id from search_engine inner join visit on search_engine.id = visit.engine_id order by visit.date desc limit 10;

    I'm not exactly sure if you want to show the last 10 visiting search engines, or if you want to show the search engines, if any, that made the ten last hits. Getting the search engine(s) for the ten last hits is harder:

    select distinct search_engine.id from search_engine inner join (select visit.id from visit order by visit.date desc limi +t 10);
      Thanks.

      I think I'll try your code but I do have a few questions.

      What is "engine_id", "name" and "homepage"? I don't know what information you're expecting to be in these fields. And why is useragent and integer in SEARCH_ENGINE when it's not an integer?

      Thanks.

        I added "name" and "homepage" as ideas for additional information you might want to store for each search engine - I should have said that. Having a central location to store that data allows you to easily update the information without having several places where you need to change it.

        The engine_id in the table visits points to the search engine that visited you. That useragent is an int in the search engine table is an error - it should be char(250) or something like that.

Re: newbie table structure
by merlyn (Sage) on Jan 29, 2005 at 17:42 UTC
      Randal's method is better. There is less to management. I would change the date to a MySQL date field which will store information as YYYY-MM-DD.
      "CREATE TABLE IF NOT EXISTS visits ( id int auto_increment not null, searchengine char(48) NOT NULL, useragent char(32) NOT NULL, ip char(32) NOT NULL, date date NOT NULL, primary key (id) )");
      You may have to use some perl magic to format the date appropriately. Although this probably not the most efficient way to write this, it demonstrates my point :)
      (undef, undef, undef, $Day, $Month, $Year, undef, undef, undef) = loca +ltime(time); $Month++; $Year += 1900; $MySQLDate = "${Year}-${Month}-${Day}";

      By including the searchengine field and a little extra work, you can also create some great reports such as:

      - how many searchengine entries per day
      select searchengine, date, count(*) from visits group by searchengine, + date order by date desc
      - how many searchengine entries for a specific day
      select searchengine, date, count(*) from visits where date = '2004-01- +31' group by searchengine, date order by date desc
      - total number of searchengine entries
      select searchengine, count(*) from visits group by searchengine
      And other reports based on IP addresses, specific useragents, and so on... You could also pass that information to GD::Graph::Bar and create some nice looking bar graphs that management will like. Even get more crafty and email them routinely to people using MIME::Lite and a cron job. These search engine can be found CPAN.

      Marc Bilodeau
      www.marcbilodeau.com

      Oh. In that case I appologize for getting the terms all messed up but I hope people know what I meant.
      It is not only a pleasure, but an honor to be posting near Randal L Schwartz!! I happen to agree with him. You need to read a bit more on working with mySQL and the Perl DBI. Too much to explain here in this forum. Best of luck, Joe
      It is a pleasure and an honor to be posting next to Randal L Schwartz!! I have to agree with him. Sounds like you may need to spend some more time with the Perl DBI and mySQL books. Best of luck! Joe
Re: newbie table structure
by Solo (Deacon) on Jan 29, 2005 at 17:55 UTC
    Ignoring whether this is the best way to log spider accesses, let's look at the database portion. You really don't want to have a table creation be part of the cgi logging access--you don't want to create the tables at every request! Create a separate 'installation' script that creates the tables once. Next, notice how all your tables look the same? This is a sure sign that there's a problem! Whether you want to call it refactoring or normalization, we need to avoid 'cut-n-paste' code. Consider that which agent/robot hits the cgi is just another piece of data, like so:

    id int NOT NULL auto_increment, useragent char(32) NOT NULL, ip char(32) NOT NULL, date char(32) NOT NULL, useragent char(32), -- added field for ua primary key (id)

    Now we have 1 table. In order to hold only the last 10 hits from each different user agent, we'll need to get a little clever (or stupid depending on your viewpoint) with our updates. First, I think we should add another column to track the order of the requests for each ua. (We could achieve the result using the date column, but I think it's clearer for the SQL uninitiated to add a column and avoid the date handling.)

    ... countdown int default(0), ...

    Now when we add a new row, it will get a countdown value of 0. Every insert, we increment the countdown for that ua. Then, delete any entry with a countdown > 10.

    UPDATE Table set countdown = countdown + 1 where useragent = ?; DELETE FROM Table WHERE countdown > 10;

    That's a enough of a start for you!

    --Solo

    --
    You said you wanted to be around when I made a mistake; well, this could be it, sweetheart.