| dcz013   | dc  restaurants    | 0    | dcrestaurants
| dcz0013  | dc  restaurants    | 0    | dcrestaurants
| dcz013   | dc  resturants     | 1    |
| dcz0013  | dc  resturants     | 1    |
| dcz013   | dc american dining | 0    | dcamericandining
Yeah ... that's going to cause problems alright. Since each ID is bogus, you will need to create a new one, but keep the old just in case. I suggest using an unsigned integer that is auto incremented by the RDMS, but PHB's tend to like ID's with letters in them (don't listen to 'em!).

If i were in your shoes, i would create a new table and figure out a way to convert the rows in the old table into the new. Prune as you go ... some of those rows have to be redundant and incorrect. You will no doubt not get it right the first few attempts, so prepare for that by having your script first DROP the new table and CREATE it from scratch. Best of luck, this doesn't sound too fun ... :/

UPDATE:
OK, i think i might have a viable gameplan.

The monkey wrench in my gears were these rows:

| wq12351y059 | hawkinschemical | 1    |
| wq12366y059 | healtheon corp. | 1    |
| wq12367y059 | healthgatedata. | 1    |
and the big caveat is that i have no idea what FLAG is for ... but, here goes.

1.4 million records is a lot, but you might just have enough memory to pull this off by using a hash to keep track of unique ID's (buying more RAM might be the thing to do). Read each row one at a time:

my $sth = $dbh->prepare('SELECT * FROM [Production Words]'); $sth->execute; # finish the rest of this line after you design the new table my $new_sth = $dbh->prepare('INSERT INTO new_table ...'); my %hash; while (my $row = $sth->fetchrow_hashref) {
Then, inside the while loop, pull apart each ID like so:
my ($str,$num) = $row->{ID} =~ /(\D+)(\d+)/; my $new_id = lc($str) . int($num);
This will turn wq12351y059 into wq12351 and dcz0013 into dcz13.

Now we check to see if we have encountered this record before:

unless ($hash{$new_id}++) { # INSERT $row into the new table $new_sth->execute( ... ); } else { # possibly fetch existing row and check to see # if this redundant old row has better data } }
That's about the best i can think of right now. Again, best of luck. :)

jeffa

L-LL-L--L-LL-L--L-LL-L--
-R--R-RR-R--R-RR-R--R-RR
B--B--B--B--B--B--B--B--
H---H---H---H---H---H---
(the triplet paradiddle with high-hat)

In reply to 3Re: Database Record Order by jeffa
in thread Database Record Order by vbrtrmn

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.