| dcz013 | dc restaurants | 0 | dcrestaurants
| dcz0013 | dc restaurants | 0 | dcrestaurants
| dcz013 | dc resturants | 1 |
| dcz0013 | dc resturants | 1 |
| dcz013 | dc american dining | 0 | dcamericandining
Yeah ... that's going to cause problems alright. Since each ID is bogus, you will need to create a
new one, but keep the old just in case. I suggest using an unsigned integer that is
auto-incremented by the RDBMS, but PHBs tend to like IDs with letters in them (don't listen to
'em!).
If I were in your shoes, I would create a new table and figure out a way to convert the
rows from the old table into the new one. Prune as you go ... some of those rows are bound to be
redundant or incorrect. You will no doubt not get it right in the first few attempts, so
prepare for that by having your script first DROP the new table and CREATE it from scratch.
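Untested sketch of what I mean, here in Python with the stdlib sqlite3 module (the table names searches/new_searches and the GROUP BY pruning rule are just made up for illustration; adapt to your schema):

```python
# Drop-and-rebuild migration sketch. Hypothetical tables: "searches" is
# the old mess, "new_searches" is the clean rebuild with a fresh
# auto-incremented ID (old ID kept around, just in case).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE searches (old_id TEXT, term TEXT, flag INTEGER);
    INSERT INTO searches VALUES
        ('dcz013',  'dc restaurants', 0),
        ('dcz0013', 'dc restaurants', 0),
        ('dcz013',  'dc resturants',  1);
""")

# Re-runnable on purpose: every run drops the new table and rebuilds it
# from scratch, so a botched attempt costs nothing.
conn.executescript("""
    DROP TABLE IF EXISTS new_searches;
    CREATE TABLE new_searches (
        id     INTEGER PRIMARY KEY AUTOINCREMENT,  -- the new, sane ID
        old_id TEXT,                               -- keep the old, just in case
        term   TEXT,
        flag   INTEGER
    );
    -- Prune as you go: collapse rows that are identical on (term, flag),
    -- keeping one representative old_id per group.
    INSERT INTO new_searches (old_id, term, flag)
        SELECT MIN(old_id), term, flag
        FROM searches
        GROUP BY term, flag;
""")

rows = conn.execute(
    "SELECT id, old_id, term, flag FROM new_searches ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

The one-duplicate-collapsing rule above is the simplest possible prune; in real life you'd probably fold the misspelled terms in too.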
Best of luck, this doesn't sound too fun ... :/
UPDATE:
OK, I think I might have a viable game plan.
1.4 million records is a lot, but you might just have enough memory to pull this off by using a hash to keep track of unique IDs (buying more RAM might be the thing to do).
When faced with this dilemma, I reach for the AnyDBM_File module, which comes with perl. This way, your memory requirements for the hash turn into disk space requirements, which are usually a lot more lax. YMMV.
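For anyone not on Perl, here's the same trick sketched with Python's stdlib dbm module (its rough analogue of AnyDBM_File); the records and the scratch file name are invented for the example:

```python
# The "seen" hash lives in an on-disk DBM file instead of RAM, so memory
# use stays flat no matter how many records stream past.
import dbm
import os

# Toy stand-in for the 1.4 million rows (invented sample data).
records = [
    ("dcz013",  "dc restaurants"),
    ("dcz0013", "dc restaurants"),
    ("dcz013",  "dc resturants"),
]

unique = []
with dbm.open("seen_ids.db", "n") as seen:   # "n": always start a fresh file
    for rec_id, term in records:
        key = rec_id.encode()
        if key not in seen:                  # first time we've seen this ID
            seen[key] = b"1"
            unique.append((rec_id, term))

# Clean up the scratch file(s); dbm backends differ in the suffixes used.
for name in os.listdir("."):
    if name.startswith("seen_ids.db"):
        os.remove(name)

print(unique)
```

Same idea, same trade-off: disk space instead of RAM, at the cost of slower lookups.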
thor
Actually, you'd wanna select only the duplicate ones, since you don't want to deal with the ones that are fine already, right?
SELECT DISTINCT *
FROM   TABLE_A AS A,
       TABLE_B AS B
WHERE  A.ID = B.ID
  AND  A.SECONDCOLUMN != B.SECONDCOLUMN
of course, doing a ..
SELECT COUNT(*)
FROM (
    SELECT ID, COUNT(*) AS CNT
    FROM TABLE_A
    GROUP BY ID
) AS COUNTS
WHERE CNT = 1
will get you an idea of how many truly unique records there are.
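You can sanity-check that counting query against throwaway data before pointing it at the real table; a quick sqlite3 run (TABLE_A and its rows are made up here):

```python
# Demo of the count-of-truly-unique-IDs query: IDs appearing exactly
# once in TABLE_A. Sample data is invented for illustration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE TABLE_A (ID TEXT, TERM TEXT);
    INSERT INTO TABLE_A VALUES
        ('dcz013',  'dc restaurants'),
        ('dcz013',  'dc resturants'),
        ('dcz0013', 'dc restaurants'),
        ('dcz014',  'dc diners');
""")

(truly_unique,) = conn.execute("""
    SELECT COUNT(*)
    FROM (
        SELECT ID, COUNT(*) AS CNT
        FROM TABLE_A
        GROUP BY ID
    ) AS COUNTS
    WHERE CNT = 1
""").fetchone()

# dcz013 appears twice; dcz0013 and dcz014 appear once each.
print(truly_unique)
```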
++jeffa
Play that funky music white boy..