Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

I'm logging some HTML form data to MySQL, and need to either INSERT the row with that data if it is unique, or UPDATE the existing row if some of the data already exists in the database (these are URLs).

I can do the INSERT for every form submit, but I'm unclear how to check if the URL exists already, and if so, do an UPDATE, if not, do an INSERT.

Bonus points for those who can help me figure out how to make sure http://www.example.com/foo/bar.html and http://example.com/foo/bar.html are the same data, and only to INSERT/UPDATE it once.

Replies are listed 'Best First'.
Re: INSERT or UPDATE, but only when unique
by jZed (Prior) on Mar 11, 2005 at 22:11 UTC
    Check out MySQL's REPLACE command and newer ON DUPLICATE KEY UPDATE syntax.
      The replace into is a nice thing, but remind that you need an id to check against and it is not cross-database capable.
      So when making an database driven app, that might run on more databases, i'd avoid the use of a not standard sql statement like this one.
      Why don't you just tell the database to create the field so that it only should contain unique values? then just use insert and update (catch the message of the database to see if insert went ok, else use update)
Re: INSERT or UPDATE, but only when unique
by jhourcle (Prior) on Mar 12, 2005 at 00:27 UTC

    You've not said what the criteria are for determining that the two host names are the same. If they're all systems in your control, and you know they map to the same pages, that's much different than just assuming they are because they resolve to the same IP address (which may be virtual hosts, and not the same page). The only way I would think to do it reliably, without any outside knowledge would be to getting the two pages, and comparing them, if the filepath of the URL is the same. (and even then, you might have someone who has virtual hosts set up before HTTP/1.0, where they might have http://host.domain.tld/index.html be the same as http://www.domain.tld/host/index.html.)

    If I had a requirement to do something like this, I'd probably just do all of my writes to a temporary table, and have a seperate process that reads from that table, processes the values, and moves them to the real table. ...or do it in a database trigger, but I've never used mySQL 5, so I have no idea what sorts of things you can do in them. (I've never tried doing HTTP requests in PL/SQL but I know it exists

    I won't bother with the initial question, as it's already been answered

Re: INSERT or UPDATE, but only when unique
by TedPride (Priest) on Mar 12, 2005 at 04:04 UTC
    You have a number of problems here. First of all, what if the submitted URL is the same as an existing one, but the new one has / at the end and the other has /index.html? or .php, or .htm, or whatever. If it has /index.*, you need to compare to the / version and see if they match. Second, what if the URL starts with a subdomain other than www, or no subdomain? You need to check against the regular www. URL. The best way to do this is by storing a MD5 hash of the contents of each page, and if you don't get a hash match, download the page already in the database again to make sure it wasn't just updated. Third, what if there are pages with duplicate content but different URLs? How do you handle those?

    The big question of course is what exactly you're trying to do with your data once you have it in your database. It's hard to determine the how without knowing the why.

Re: INSERT or UPDATE, but only when unique
by saintmike (Vicar) on Mar 11, 2005 at 22:11 UTC
    jZed's got the right anwer, see below.

    You could define this column as UNIQUE in MySQL, and if the INSERT fails, check if it failed because of a duplicate key. In this case, update the record.

Re: INSERT or UPDATE, but only when unique
by jdalbec (Deacon) on Mar 12, 2005 at 01:12 UTC
    YAWTDI is to UPDATE first and then if no rows were updated do the INSERT.
Re: INSERT or UPDATE, but only when unique
by Ben Win Lue (Friar) on Mar 12, 2005 at 07:37 UTC
    If you use Oracle, the  MERGE statement might be a good idea