Ryszard has asked for the wisdom of the Perl Monks concerning the following question:

I'm doing a fetchrow_hashref and want to restructure the data slightly.

I want to go from this (Data::Dumper output):

{ 'KEY1' => 'VALUE1', 'KEY2' => 'VALUE2', 'KEY3' => 'VALUE3' };
to this:
$VAR1 = 'VALUE1'; $VAR2 = { 'KEY1' => 'VALUE1', 'KEY2' => 'VALUE2', 'KEY3' => 'VALUE3' };

Which is not a problem. What I am wondering is whether it would then be worthwhile to delete the duplicate data, i.e. KEY1. (Are there any performance issues?)

The extra data will pose no problem (other than being redundant); however, I'm looking at creating a "best practice" app.
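Roughly, the restructuring looks something like this (just a sketch, assuming a DBI statement handle $sth that has already been prepared and executed; KEY1 is a placeholder column name):

    use strict;
    use warnings;

    # $sth is a hypothetical DBI statement handle, already prepared and executed.
    sub restructure_rows {
        my ($sth) = @_;
        my @rows;

        while ( my $row = $sth->fetchrow_hashref ) {

            # Promote the value we care about into its own variable...
            my $id = $row->{KEY1};

            # ...and optionally drop it from the hash so the same value
            # isn't stored twice.
            delete $row->{KEY1};

            push @rows, [ $id, $row ];
        }

        return @rows;
    }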

Replies are listed 'Best First'.
Re: Duplicate data? Is it worth de-duping?
by rdfield (Priest) on Mar 14, 2002 at 11:03 UTC
    From a purist point of view, duplicate data is a no-no, as there is a lot of room for inconsistencies and other nasties to creep in. To save deleting from the hash, why not assign $VAR1 as a ref to $VAR2{KEY1}?

    rdfield
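
    A minimal sketch of that suggestion (not rdfield's actual code), using the placeholder names from the original post:

        use strict;
        use warnings;

        # Placeholder data standing in for one fetched row (a hashref,
        # as returned by fetchrow_hashref).
        my $var2 = { KEY1 => 'VALUE1', KEY2 => 'VALUE2', KEY3 => 'VALUE3' };

        # Take a reference into the hash instead of copying the value,
        # so there is only one copy of the data to keep consistent.
        my $var1 = \$var2->{KEY1};

        print $$var1, "\n";         # prints VALUE1
        $var2->{KEY1} = 'CHANGED';
        print $$var1, "\n";         # prints CHANGED -- both names see the same data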

Re: Duplicate data? Is it worth de-duping?
by tomhukins (Curate) on Mar 14, 2002 at 12:24 UTC

    You mention that you're calling fetchrow_hashref, which suggests you're using DBI. If this is the case, you should consider de-duping in your SQL SELECT statement rather than in your Perl code. The SQL dialect you're using might support the DISTINCT keyword on SELECT.

    If you find yourself writing Perl code to filter the results of a SQL query, it's usually a sign that the SQL query could be improved.
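
    A rough sketch of that idea (the DSN, credentials, table and column names below are all made up):

        use strict;
        use warnings;
        use DBI;

        # Placeholder connection details -- adjust for your own database.
        my $dbh = DBI->connect( 'dbi:Pg:dbname=mydb', 'user', 'password',
                                { RaiseError => 1 } );

        # Let the database throw away duplicate rows before Perl sees them.
        my $sth = $dbh->prepare('SELECT DISTINCT key1, key2, key3 FROM my_table');
        $sth->execute;

        while ( my $row = $sth->fetchrow_hashref ) {
            # Each $row here is already unique; no Perl-side filtering needed.
        }

        $dbh->disconnect;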

      Thanks for the thought, however the problem is more along the lines of restructuring the data (which is pretty easy), but then should I de-dupe it?

      I'm beginning to think it's a trade-off between purity and efficiency. I personally like the purity option, however I also like to try to squeeze cycles out of my code (to the best of my limited knowledge).

      I'm torn! What do I do? ;-)

        I vote for:

        1.  De-duping the existing data,
        2.  Discovering how dupe data got in there in the first place, and,
        3.  Debugging discoveries from #2 so it never happens again.

        Depending on your application now, and in the future, duplicate records could become a real nightmare. It sounds like you have reasoned it through and the dupes don't represent a problem, so maybe I'm off base. But after having a problem with 100,000+ dupe transactions getting dumped into a multi-client ecommerce database by renegade code from a know-it-all coder after an unscripted production install... I guess I have issues....
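
        As a rough illustration of steps 1 and 2 above (the connection details, table and column names are all hypothetical), a GROUP BY/HAVING query can show which key values are duplicated:

            use strict;
            use warnings;
            use DBI;

            # Placeholder connection details and table/column names.
            my $dbh = DBI->connect( 'dbi:Pg:dbname=mydb', 'user', 'password',
                                    { RaiseError => 1 } );

            # Ask the database which key values appear more than once.
            my $sth = $dbh->prepare(
                'SELECT key1, COUNT(*) AS dupes
                   FROM my_table
                  GROUP BY key1
                 HAVING COUNT(*) > 1'
            );
            $sth->execute;

            while ( my ( $key, $count ) = $sth->fetchrow_array ) {
                print "$key appears $count times\n";
            }

            $dbh->disconnect;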