Re: Re: Database Record Order

by vbrtrmn (Pilgrim)
on Dec 30, 2003 at 22:29 UTC ( [id://317797] )


in reply to Re: Database Record Order
in thread Database Record Order

The problem with using ORDER BY is that each ID is somewhat random: rows 7, 8, and 9 should share the same ID, but rows 7 and 9 use one ID while row 8 uses a slightly different one.

--
paul

Replies are listed 'Best First'.
Re: Re: Re: Database Record Order
by jeffa (Bishop) on Dec 30, 2003 at 22:48 UTC

    | dcz013   | dc  restaurants    | 0    | dcrestaurants
    | dcz0013  | dc  restaurants    | 0    | dcrestaurants
    | dcz013   | dc  resturants     | 1    |
    | dcz0013  | dc  resturants     | 1    |
    | dcz013   | dc american dining | 0    | dcamericandining
    
    Yeah ... that's going to cause problems alright. Since each ID is bogus, you will need to create a new one, but keep the old just in case. I suggest using an unsigned integer that is auto-incremented by the RDBMS, but PHBs tend to like IDs with letters in them (don't listen to 'em!).

    If i were in your shoes, i would create a new table and figure out a way to convert the rows in the old table into the new. Prune as you go ... some of those rows have to be redundant and incorrect. You will no doubt not get it right the first few attempts, so prepare for that by having your script first DROP the new table and CREATE it from scratch. Best of luck, this doesn't sound too fun ... :/
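
    For what it's worth, a rough sketch of that kind of rebuild script might look something like this (assuming MySQL and DBI; the table and column names below are made up for illustration, not taken from your data):

    use strict;
    use warnings;
    use DBI;

    # connection details are placeholders
    my $dbh = DBI->connect( 'dbi:mysql:mydb', 'user', 'pass', { RaiseError => 1 } );

    # start from scratch on every run, as described above
    $dbh->do('DROP TABLE IF EXISTS keywords_new');
    $dbh->do(q{
        CREATE TABLE keywords_new (
            id      INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
            old_id  VARCHAR(32),   -- keep the bogus ID around, just in case
            keyword VARCHAR(255),
            flag    TINYINT,
            slug    VARCHAR(255)
        )
    });

    # copy the old rows over, pruning obvious junk as you go
    my $sel = $dbh->prepare('SELECT id, keyword, flag, slug FROM keywords_old');
    my $ins = $dbh->prepare(
        'INSERT INTO keywords_new (old_id, keyword, flag, slug) VALUES (?,?,?,?)'
    );
    $sel->execute;
    while ( my @row = $sel->fetchrow_array ) {
        next unless $row[1];    # example prune: skip rows with an empty keyword
        $ins->execute(@row);
    }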

    UPDATE:
    OK, i think i might have a viable gameplan.

    jeffa

    L-LL-L--L-LL-L--L-LL-L--
    -R--R-RR-R--R-RR-R--R-RR
    B--B--B--B--B--B--B--B--
    H---H---H---H---H---H---
    (the triplet paradiddle with high-hat)
    
      1.4 million records is a lot, but you might just have enough memory to pull this off by using a hash to keep track of unique ID's (buying more RAM might be the thing to do)
      When faced with this dilemma, I reach for the AnyDBM_File module, which comes with Perl. This way, your memory requirements for the hash turn into disk space requirements, which are usually a lot more lax. YMMV.
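
      Something along these lines (a minimal, untested sketch; the DBM file name and the get_next_row() helper are made up for illustration):

      use strict;
      use warnings;
      use Fcntl;         # O_RDWR, O_CREAT
      use AnyDBM_File;

      # tie the hash to a file on disk instead of holding it all in RAM
      tie my %seen, 'AnyDBM_File', 'seen_ids', O_RDWR|O_CREAT, 0640
          or die "Cannot tie DBM file: $!";

      while ( my $row = get_next_row() ) {    # get_next_row() is hand-waving
          if ( $seen{ $row->{id} }++ ) {
              # this ID has been seen before - deal with the duplicate here
          }
      }

      untie %seen;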

      thor

      Actually, you'd want to select only the duplicate ones, since you don't want to deal with the ones that are already fine, right?
      SELECT DISTINCT * FROM TABLE_A as A, TABLE_B as B WHERE A.ID = B.ID and A.SECONDCOLUMN != B.SECONDCOLUMN
      Of course, doing a ...
      SELECT COUNT(*) FROM ( SELECT ID, COUNT(*) AS CNT FROM TABLE_A GROUP BY ID ) as COUNTS WHERE CNT = 1
      will get you an idea of how many truly unique records there are. ++jeffa
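
      For the record, wiring that count up through DBI might look roughly like this (the connection details and the TABLE_A name are placeholders):

      use strict;
      use warnings;
      use DBI;

      my $dbh = DBI->connect( 'dbi:mysql:mydb', 'user', 'pass', { RaiseError => 1 } );

      # how many IDs appear exactly once?
      my ($unique) = $dbh->selectrow_array(q{
          SELECT COUNT(*) FROM (
              SELECT ID, COUNT(*) AS CNT FROM TABLE_A GROUP BY ID
          ) AS COUNTS WHERE CNT = 1
      });
      print "$unique truly unique IDs\n";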

      Play that funky music white boy..
