in reply to Database Record Order

Unless you include an "ORDER BY" clause in your SQL statement, the order of the results is undefined. It does not even need to be stable between calls.

For example, you may want to do something like:

SELECT * FROM [Production Words] ... ORDER BY ID
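
Translated to DBI, that might look like this (a minimal sketch; the DSN, credentials, and table name below are made-up placeholders):

use strict;
use warnings;
use DBI;

# placeholder connection details -- substitute your own DSN
my $dbh = DBI->connect( 'dbi:mysql:mydb', 'user', 'pass',
    { RaiseError => 1 } );

# without the ORDER BY, the rows could come back in any order,
# and not necessarily the same order twice
my $sth = $dbh->prepare('SELECT * FROM words ORDER BY id');
$sth->execute;
while ( my @row = $sth->fetchrow_array ) {
    print join( "\t", @row ), "\n";
}
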
--MidLifeXis

Replies are listed 'Best First'.
Re: Re: Database Record Order
by vbrtrmn (Pilgrim) on Dec 30, 2003 at 22:29 UTC
    The problem with using ORDER BY is that the IDs are somewhat random: for example, rows 7 and 9 share the same ID, but row 8 uses a slightly different one.

    --
    paul

      | dcz013   | dc  restaurants    | 0    | dcrestaurants    |
      | dcz0013  | dc  restaurants    | 0    | dcrestaurants    |
      | dcz013   | dc  resturants     | 1    |                  |
      | dcz0013  | dc  resturants     | 1    |                  |
      | dcz013   | dc american dining | 0    | dcamericandining |
      
      Yeah ... that's going to cause problems alright. Since each ID is bogus, you will need to create a new one, but keep the old just in case. I suggest using an unsigned integer that is auto-incremented by the RDBMS, but PHBs tend to like IDs with letters in them (don't listen to 'em!).
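
      Something along these lines, perhaps (a MySQL-flavored sketch -- the table and column names are invented to roughly match your sample rows, so adjust to taste):

      CREATE TABLE keyword (
          id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
          old_id  VARCHAR(16),   -- keep the bogus ID around, just in case
          phrase  VARCHAR(255) NOT NULL,
          flag    TINYINT NOT NULL DEFAULT 0,
          slug    VARCHAR(255)
      );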

      If I were in your shoes, I would create a new table and figure out a way to convert the rows in the old table into the new one. Prune as you go ... some of those rows have to be redundant and incorrect. You will no doubt not get it right the first few attempts, so prepare for that by having your script first DROP the new table and CREATE it from scratch. Best of luck, this doesn't sound too fun ... :/
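
      The skeleton of such a script might look like this (an untested sketch -- the table and column names are placeholders, and the pruning rule is just an example):

      use strict;
      use warnings;
      use DBI;

      my $dbh = DBI->connect( 'dbi:mysql:mydb', 'user', 'pass',
          { RaiseError => 1 } );

      # start from scratch on every run
      $dbh->do('DROP TABLE IF EXISTS keyword_new');
      $dbh->do(q{
          CREATE TABLE keyword_new (
              id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
              old_id  VARCHAR(16),
              phrase  VARCHAR(255) NOT NULL
          )
      });

      my $ins = $dbh->prepare(
          'INSERT INTO keyword_new (old_id, phrase) VALUES (?, ?)' );

      my $sth = $dbh->prepare('SELECT old_id, phrase FROM keyword_old');
      $sth->execute;
      while ( my ($old_id, $phrase) = $sth->fetchrow_array ) {
          # prune as you go -- your cleanup rules belong here
          next unless defined $phrase and length $phrase;
          $ins->execute( $old_id, $phrase );
      }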

      UPDATE:
      OK, I think I might have a viable game plan.

      jeffa

      L-LL-L--L-LL-L--L-LL-L--
      -R--R-RR-R--R-RR-R--R-RR
      B--B--B--B--B--B--B--B--
      H---H---H---H---H---H---
      (the triplet paradiddle with high-hat)
      
        1.4 million records is a lot, but you might just have enough memory to pull this off by using a hash to keep track of unique IDs (buying more RAM might be the thing to do)
        When faced with this dilemma, I reach for the AnyDBM_File module, which ships with Perl. This way, your memory requirements for the hash turn into disk space requirements, which are usually a lot more lax. YMMV.
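
        Something like this, say (a minimal untested sketch -- it pretends each input line holds one ID, which is obviously not your real record format):

        use strict;
        use warnings;
        use Fcntl;         # for O_RDWR and O_CREAT
        use AnyDBM_File;

        # tie the hash to a DBM file, so it lives on disk instead of in RAM
        tie my %seen, 'AnyDBM_File', 'seen_ids', O_RDWR|O_CREAT, 0644
            or die "Cannot tie DBM file: $!";

        while ( my $id = <> ) {
            chomp $id;
            next if $seen{$id}++;   # skip IDs we have already met
            print "$id\n";          # first occurrence only
        }

        untie %seen;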

        thor

        Actually, you'd want to select only the duplicate ones, since you don't want to deal with the ones that are fine already, right?
        SELECT DISTINCT * FROM TABLE_A AS A, TABLE_B AS B WHERE A.ID = B.ID AND A.SECONDCOLUMN != B.SECONDCOLUMN
        Of course, doing a ...
        SELECT COUNT(*) FROM ( SELECT ID, COUNT(*) AS CNT FROM TABLE_A GROUP BY ID ) AS COUNTS WHERE CNT = 1
        will get you an idea of how many truly unique records there are. ++jeffa
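
        And from the Perl side, fetching that count is a single selectrow_array call with DBI (again, the connection details are placeholders):

        use strict;
        use warnings;
        use DBI;

        my $dbh = DBI->connect( 'dbi:mysql:mydb', 'user', 'pass',
            { RaiseError => 1 } );

        # how many IDs occur exactly once?
        my ($unique) = $dbh->selectrow_array(q{
            SELECT COUNT(*)
            FROM ( SELECT ID, COUNT(*) AS CNT
                   FROM TABLE_A GROUP BY ID ) AS COUNTS
            WHERE CNT = 1
        });
        print "$unique truly unique IDs\n";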

        Play that funky music white boy..