UPDATE ON DUPLICATE and sorting

coldfingertips has asked for the wisdom of the Perl Monks concerning the following question:

This is a two part problem

I need to UPDATE ON DUPLICATE for this MySQL query so I can perform the needed actions based on whether or not a particular field was found. If it wasn't found, insert it, otherwise update a few of the fields

  my $data = qq(INSERT INTO temp (url, altavista, yahoo, msn, teoma, g
+oogle, alltheweb, Total) values("$url", "$altavista_results", "$yahoo
+_results", "$msn_results", "$teoma_results", "$google_results", "$all
+theweb_results", "$total") ON DUPLICATE KEY UPDATE altavista = "$alta
+vista_results", yahoo = "$yahoo_results", msn = "$msn_results", teoma
+ = "$teoma_results", google = "$google_results", alltheweb = "$allthe
+web_results", total = "$total");
  my $sth = $dbh->prepare($data);
  $sth->execute() or die $dbh->errstr;
[download]

I have an id auto_increment field as a primary key. I can't seem to get this to hold when UPDATE ON DUPLICATE works on the primary key. And since id is new with each row, of course all data will then be inserted because it's new data.

The id column is important because, with it, I can track down the last 10 rows entered into the table.

My question is, what do I have to do in order to perform the insert/or update and the last 10 rows in order to work? It was mentioned in the CB that I could use a time column with $time. It sounds like a great idea but not sure how I'd find the last 10 newest entries based on time.

The second question is, how can I bring back the latest 10 rows AND THEN sort them by a particular field "Total"?

   my $data = qq[SELECT url, altavista, yahoo, msn, teoma, google, all
+theweb, Total FROM temp WHERE id > (SELECT MAX(id) FROM temp) - 10
ORDER BY Total DESC];
[download]

This doesn't sort by Total like it should.

Comment on UPDATE ON DUPLICATE and sorting Select or Download Code

Replies are listed 'Best First'.
Re: UPDATE ON DUPLICATE and sorting by olus (Curate) on Feb 06, 2006 at 02:53 UTC
Haven't worked with update on duplicate yet, so I'll refer only to part 2 I've worked with subqueries in Oracle but not in mysql, so you can make a first query to get the max(id) and then pass it to second query as id > $max -10. The order by will work. You can get the 10 most recent records using the time column by ORDER BY timefield desc LIMIT 10.	[reply]
Re: UPDATE ON DUPLICATE and sorting by graff (Chancellor) on Feb 06, 2006 at 04:32 UTC
Um, if an auto-increment field is the primary key, and you are only inserting/updating non-key fields, how would you know whether or not a given set of fields is a "duplicate" of an existing row in the table? I see that your insert fields and your update fields are the same set, except for "url". Perhaps "url" should be defined the primary key, instead of an auto-increment field. Without this, the "ON DUPLICATE KEY UPDATE" operation makes no sense and cannot be applied to the table. You can make "url" the primary key while still having an auto-increment "row_num" field (or a "created_at" date/time field) that is not the primary key, and this can still be used to do the "order by" and "limit 10" that you want. BTW, you really should be using placeholders for the values in the insert/update statement, and passing those variables as params to the "execute" call; if you use the same statement for more than one set of fields, and especially if the values are coming from outside your script (i.e. any "untrusted" source, like cgi params, environment variables, files, or other user input), placeholders are The Best Way.	[reply]
Re^2: UPDATE ON DUPLICATE and sorting by coldfingertips (Pilgrim) on Feb 06, 2006 at 04:52 UTC
I know I can't do an update on duplicate with an auto increment, that was the main question. I am using the auto id to get the last 10. I would change the primary key to "url" because that's what I need to check duplicates for. The thing is, I believe I tried to change the primary key to "url" already today with the auto increment. It errored out saying I can't have that and a primary key (or some error like that). So I think an auto_increment has to be the primary key.	[reply]
Re^3: UPDATE ON DUPLICATE and sorting by graff (Chancellor) on Feb 06, 2006 at 05:14 UTC
Okay, it turns out that mysql does not allow you to declare an "auto_increment" field unless it's going to be the primary key. So don't use an auto_increment field. Use a date_time field instead (call it something like "create_date"), and whenever a new row is inserted, set its value with the "NOW()" function. Then you'll be able to "order by create_date desc limit 10". You don't need an auto_increment field -- you just need "url" to be the primary key, and you need something chronological to sort on. Update: About the 2nd part of your initial question: you could try to work out a two-stage query: find the "create_date" value for the row that is currently 10 rows "back" from the most recent row, then select all rows where create_date is greater than that value, and order by "Total". But I think it would be a lot easier to do a single query ordered chronologically, store those into an AoA or AoH and let the perl script sort them based on "Total".	[reply]