in reply to Duplicate data? is it worth de-duping?

You mention that you're calling fetchrow_hashref, which suggests you're using DBI. If so, you should consider de-duping in your SQL SELECT statement rather than in your Perl code: virtually every SQL dialect supports a DISTINCT keyword that makes SELECT return only unique rows.

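For example, a minimal sketch with DBI (the DSN, table, and column names here are hypothetical placeholders; substitute your own):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use DBI;

    # Connect with RaiseError so failed calls die instead of failing silently.
    my $dbh = DBI->connect( 'dbi:mysql:mydb', 'user', 'password',
                            { RaiseError => 1 } );

    # DISTINCT collapses identical rows inside the database,
    # so no de-duping is needed on the Perl side.
    my $sth = $dbh->prepare('SELECT DISTINCT host, message FROM log');
    $sth->execute;

    while ( my $row = $sth->fetchrow_hashref ) {
        print "$row->{host}: $row->{message}\n";
    }

    $dbh->disconnect;
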
If you find yourself writing Perl code to filter the results of a SQL query, it's usually a sign that the SQL query could be improved.

Re: Re: Duplicate data? is it worth de-duping?
by Ryszard (Priest) on Mar 14, 2002 at 12:40 UTC
    Thanks for the thought; however, the problem is more along the lines of restructuring the data (which is pretty easy), but then should I de-dupe it?

    I'm beginning to think it's a trade-off between purity and efficiency. I personally like the purity option, but I also like to try to squeeze cycles out of my code (to the best of my limited knowledge).

    I'm torn! What do I do? ;-)

      I vote for:

      1.  De-duping the existing data (a sketch follows this list),
      2.  Discovering how dupe data got in there in the first place, and,
      3.  Debugging discoveries from #2 so it never happens again.
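      For #1, one common approach is to rebuild the table from a DISTINCT copy of itself. A sketch only, assuming PostgreSQL (where DDL statements are transactional), a hypothetical table named log, and a connected DBI handle in $dbh:

          # Rebuild the table from a de-duplicated copy of itself.
          # The transaction means a failure leaves the original table intact.
          $dbh->begin_work;
          $dbh->do('CREATE TABLE log_dedup AS SELECT DISTINCT * FROM log');
          $dbh->do('DROP TABLE log');
          $dbh->do('ALTER TABLE log_dedup RENAME TO log');
          $dbh->commit;
          # Note: CREATE TABLE AS does not copy indexes or constraints;
          # recreate those afterwards.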

      Depending on your application, now and in the future, duplicate records could become a real nightmare. It sounds like you have reasoned it through and the dupes don't represent a problem, so maybe I'm off base. But after having a problem with 100,000+ dupe transactions getting dumped into a multi-client e-commerce database by renegade code from a know-it-all coder after an unscripted production install... I guess I have issues....

        Ouch!

        Removing duplicates from a SQL database is always a fun exercise...

        I had the dubious pleasure last year, when I discovered that a table on the production system didn't have a unique index on one of its keys. That led to duplicates there, and ultimately (because it was a mapping table for userId values between two systems) to duplicate user records being generated.
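        A unique index is the usual guard against that; a hedged sketch (the table and column names are hypothetical, and $dbh is a connected DBI handle):

            # Once the table is clean, a unique index makes the database
            # reject duplicate keys at insert time rather than accept them silently.
            $dbh->do('CREATE UNIQUE INDEX user_map_userid_idx ON user_map (userId)');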

        Michael