Ryszard has asked for the wisdom of the Perl Monks concerning the following question:

I'm doing a fetchrow_hashref and want to restructure the data slightly.

I want to go from this (Data::Dumper output):

{ 'KEY1' => 'VALUE1', 'KEY2' => 'VALUE2', 'KEY3' => 'VALUE3' };
to this:
$VAR1 = 'VALUE1'; $VAR2 = { 'KEY1' => 'VALUE1', 'KEY2' => 'VALUE2', 'KEY3' => 'VALUE3' };

Which is not a problem. What I am wondering is whether it would then be worthwhile to delete the duplicate data, i.e. KEY1. (Are there any performance issues?)

The extra data will pose no problem (other than being redundant); however, I'm looking at creating a "best practice" app.
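Roughly, the restructuring looks something like this (just a sketch, assuming a DBI statement handle $sth that has already been prepared and executed; KEY1 is a placeholder column name):

    use strict;
    use warnings;

    # $sth is a hypothetical DBI statement handle, already prepared and executed.
    sub restructure_rows {
        my ($sth) = @_;
        my @rows;

        while ( my $row = $sth->fetchrow_hashref ) {

            # Promote the value we care about into its own variable...
            my $id = $row->{KEY1};

            # ...and optionally drop it from the hash so the same value
            # isn't stored twice.
            delete $row->{KEY1};

            push @rows, [ $id, $row ];
        }

        return @rows;
    }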

Replies are listed 'Best First'.
Re: Duplicate data? Is it worth de-duping?
by rdfield (Priest) on Mar 14, 2002 at 11:03 UTC
    From a purist point of view, duplicate data is a no-no, as there is a lot of room for inconsistencies and other nasties to creep in. To save deleting from the hash, why not assign $VAR1 as a ref to $VAR2{KEY1}?

    rdfield
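
    A minimal sketch of that suggestion (not rdfield's actual code), using the placeholder names from the original post:

        use strict;
        use warnings;

        # Placeholder data standing in for one fetched row (a hashref,
        # as returned by fetchrow_hashref).
        my $var2 = { KEY1 => 'VALUE1', KEY2 => 'VALUE2', KEY3 => 'VALUE3' };

        # Take a reference into the hash instead of copying the value,
        # so there is only one copy of the data to keep consistent.
        my $var1 = \$var2->{KEY1};

        print $$var1, "\n";         # prints VALUE1
        $var2->{KEY1} = 'CHANGED';
        print $$var1, "\n";         # prints CHANGED -- both names see the same data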

Re: Duplicate data? Is it worth de-duping?
by tomhukins (Curate) on Mar 14, 2002 at 12:24 UTC

    You mention that you're calling fetchrow_hashref, which suggests you're using DBI. If this is the case, you should consider de-duping in your SQL SELECT statement rather than in your Perl code. The SQL dialect you're using might support the DISTINCT keyword on SELECT.

    If you find yourself writing Perl code to filter the results of a SQL query, it's usually a sign that the SQL query could be improved.
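
    A rough sketch of that idea (the DSN, credentials, table and column names below are all made up):

        use strict;
        use warnings;
        use DBI;

        # Placeholder connection details -- adjust for your own database.
        my $dbh = DBI->connect( 'dbi:Pg:dbname=mydb', 'user', 'password',
                                { RaiseError => 1 } );

        # Let the database throw away duplicate rows before Perl sees them.
        my $sth = $dbh->prepare('SELECT DISTINCT key1, key2, key3 FROM my_table');
        $sth->execute;

        while ( my $row = $sth->fetchrow_hashref ) {
            # Each $row here is already unique; no Perl-side filtering needed.
        }

        $dbh->disconnect;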

      Thanks for the thought, however the problem is more along the lines of restructuring the data (which is pretty easy), but then should I de-dupe it?

      I'm beginning to think it's a trade-off between purity and efficiency. I personally like the purity option, however I also like to try to squeeze cycles out of my code (to the best of my limited knowledge).

      I'm torn! What do I do? ;-)

        I vote for:

        1.  De-duping the existing data,
        2.  Discovering how dupe data got in there in the first place, and,
        3.  Debugging discoveries from #2 so it never happens again.

        Depending on your application now, and in the future, duplicate records could become a real nightmare. It sounds like you have reasoned it through and the dupes don't represent a problem, so maybe I'm off base. But after having a problem with 100,000+ dupe transactions getting dumped into a multi-client ecommerce database by renegade code from a know-it-all coder after an unscripted production install... I guess I have issues....
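
        As a rough illustration of steps 1 and 2 above (the connection details, table and column names are all hypothetical), a GROUP BY/HAVING query can show which key values are duplicated:

            use strict;
            use warnings;
            use DBI;

            # Placeholder connection details and table/column names.
            my $dbh = DBI->connect( 'dbi:Pg:dbname=mydb', 'user', 'password',
                                    { RaiseError => 1 } );

            # Ask the database which key values appear more than once.
            my $sth = $dbh->prepare(
                'SELECT key1, COUNT(*) AS dupes
                   FROM my_table
                  GROUP BY key1
                 HAVING COUNT(*) > 1'
            );
            $sth->execute;

            while ( my ( $key, $count ) = $sth->fetchrow_array ) {
                print "$key appears $count times\n";
            }

            $dbh->disconnect;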