in reply to Re: Duplicate data? is it worth de-duping?
in thread Duplicate data? is it worth de-duping?

Thanks for the thought; however, the problem is more along the lines of restructuring the data (which is pretty easy). The real question is whether I should then de-dupe it.

I'm beginning to think it's a trade-off between purity and efficiency. I personally like the purity option, but I also like to try to squeeze cycles out of my code (to the best of my limited knowledge).

I'm torn! What do I do? ;-)

Re: Re: Re: Duplicate data? is it worth de-duping?
by tjh (Curate) on Mar 14, 2002 at 15:25 UTC
    I vote for:

    1.  De-duping the existing data,
    2.  Discovering how dupe data got in there in the first place, and,
    3.  Debugging discoveries from #2 so it never happens again.
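
    For what it's worth, steps 1 and 2 can be sketched in a few lines. This is only a rough illustration in Python, assuming the records fit in memory and that a single key identifies a duplicate; the names `dedupe`, `records` and `key` are made up for the example and are not from the original poster's code. It keeps the first record seen for each key and reports how often each duplicated key occurred, which is a starting point for working out where the dupes came from.

```python
def dedupe(records, key):
    """Return (unique_records, duplicate_counts).

    Keeps the first record seen for each key (an arbitrary choice for this
    sketch) and counts total occurrences of every key that appeared more
    than once.
    """
    seen = {}    # key -> first record with that key (dicts keep insertion order)
    dupes = {}   # key -> total number of occurrences, for duplicated keys only
    for rec in records:
        k = key(rec)
        if k in seen:
            dupes[k] = dupes.get(k, 1) + 1
        else:
            seen[k] = rec
    return list(seen.values()), dupes


# Hypothetical sample data, purely for illustration.
records = [
    {"id": 1, "value": "a"},
    {"id": 2, "value": "b"},
    {"id": 1, "value": "c"},
]
unique, dupes = dedupe(records, key=lambda r: r["id"])
```

    Whether "first one wins" is the right rule depends entirely on the data; if the duplicates disagree with each other, that is usually a clue for step 2.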

    Depending on your application now, and in the future, duplicate records could become a real nightmare. It sounds like you have reasoned it through and the dupes don't represent a problem, so maybe I'm off base. But after having a problem with 100,000+ dupe transactions getting dumped into a multi-client ecommerce database by renegade code from a know-it-all coder after an unscripted production install... I guess I have issues....

      Ouch!

      Removing duplicates from a SQL database is always a fun exercise...

      I had the dubious pleasure last year, when I discovered that a table on the production system didn't have a unique index on one of its keys. That led to duplicates in the table and ultimately, because it was a mapping table for userId values between two systems, to duplicate user records.
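
      The cleanup for that kind of thing usually has two halves: delete the extra rows, then add the unique index that should have been there all along. Here is a minimal sketch using Python's built-in `sqlite3` module. The table `user_map`, its columns, and the rule "keep the row with the lowest rowid" are all assumptions made up for the example, not the actual production schema, and other databases need a different way to pick the surviving row.

```python
import sqlite3


def make_demo_db():
    """Build an in-memory mapping table with no unique index (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE user_map (src_user_id INTEGER, dst_user_id INTEGER)")
    conn.executemany(
        "INSERT INTO user_map (src_user_id, dst_user_id) VALUES (?, ?)",
        [(1, 100), (2, 200), (1, 101), (3, 300), (2, 200)],
    )
    conn.commit()
    return conn


def dedupe_and_lock(conn):
    """Delete duplicate mappings, then add a unique index to prevent new ones."""
    # Keep the earliest row per src_user_id; this rule is an assumption of the sketch.
    conn.execute(
        """
        DELETE FROM user_map
        WHERE rowid NOT IN (
            SELECT MIN(rowid) FROM user_map GROUP BY src_user_id
        )
        """
    )
    # From here on the database itself rejects a second mapping for the same key.
    conn.execute("CREATE UNIQUE INDEX idx_user_map_src ON user_map (src_user_id)")
    conn.commit()


conn = make_demo_db()
dedupe_and_lock(conn)
```

      After this, a second insert for an existing `src_user_id` raises `sqlite3.IntegrityError` instead of silently creating another duplicate. The hard part in practice is deciding which of the conflicting rows is the correct one, which no query can answer for you.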

      Michael

        Oops. So, hmm.

        Lol, I started to say 'ok, yours is bigger than mine' because I could easily see that one becoming hell...

        But I'll refrain :) In any case, these were "opportunities for excellence", eh?

        Where's the laugh track when you need it?