in reply to Duplicate data? is it worth de-duping?

You mention that you're calling fetchrow_hashref, which suggests you're using DBI. If so, you should consider de-duping in your SQL SELECT statement rather than in your Perl code: virtually every SQL dialect supports a DISTINCT keyword that makes SELECT return only unique rows.

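For example, a minimal sketch with DBI (the DSN, table, and column names here are hypothetical placeholders; substitute your own):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Connect with RaiseError so failed calls die instead of failing silently.
    my $dbh = DBI->connect( 'dbi:mysql:mydb', 'user', 'password',
                            { RaiseError => 1 } );

    # DISTINCT collapses identical rows inside the database,
    # so no de-duping is needed on the Perl side.
    my $sth = $dbh->prepare('SELECT DISTINCT host, message FROM log');
    $sth->execute;

    while ( my $row = $sth->fetchrow_hashref ) {
        print "$row->{host}: $row->{message}\n";
    }

    $dbh->disconnect;
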
If you find yourself writing Perl code to filter the results of a SQL query, it's usually a sign that the SQL query could be improved.

Re: Re: Duplicate data? is it worth de-duping?
by Ryszard (Priest) on Mar 14, 2002 at 12:40 UTC
    Thanks for the thought; however, the problem is more along the lines of restructuring the data (which is pretty easy), but then should I de-dupe it?

    I'm beginning to think it's a trade-off between purity and efficiency. I personally like the purity option, but I also like to try to squeeze cycles out of my code (to the best of my limited knowledge).

    I'm torn! What do I do? ;-)

      I vote for:

      1.  De-duping the existing data (a sketch follows this list),
      2.  Discovering how dupe data got in there in the first place, and,
      3.  Debugging discoveries from #2 so it never happens again.
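      For #1, one common approach is to rebuild the table from a DISTINCT copy of itself. A sketch only, assuming PostgreSQL (where DDL statements are transactional), a hypothetical table named log, and a connected DBI handle in $dbh:

          # Rebuild the table from a de-duplicated copy of itself.
          # The transaction means a failure leaves the original table intact.
          $dbh->begin_work;
          $dbh->do('CREATE TABLE log_dedup AS SELECT DISTINCT * FROM log');
          $dbh->do('DROP TABLE log');
          $dbh->do('ALTER TABLE log_dedup RENAME TO log');
          $dbh->commit;
          # Note: CREATE TABLE AS does not copy indexes or constraints;
          # recreate those afterwards.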

      Depending on your application, now and in the future, duplicate records could become a real nightmare. It sounds like you have reasoned it through and the dupes don't represent a problem, so maybe I'm off base. But after having a problem with 100,000+ dupe transactions getting dumped into a multi-client e-commerce database by renegade code from a know-it-all coder after an unscripted production install... I guess I have issues....

        Ouch!

        Removing duplicates from a SQL database is always a fun exercise...

        I had the dubious pleasure last year, when I discovered that a table on the production system didn't have a unique index on one of its keys. That led to duplicates there, and ultimately (because it was a mapping table for userId values between two systems) to duplicate user records being generated.
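        A unique index is the usual guard against that; a hedged sketch (the table and column names are hypothetical, and $dbh is a connected DBI handle):

            # Once the table is clean, a unique index makes the database
            # reject duplicate keys at insert time rather than accept them silently.
            $dbh->do('CREATE UNIQUE INDEX user_map_userid_idx ON user_map (userId)');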

        Michael