in reply to hashes performance issue

It's not surprising that your loop is taking a long time. You're duplicating every hash in your HoHs, for every token in your token table. And it is completely unnecessary.

This will probably run an order (or two) of magnitude faster:

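    for $a (sort keys %mstrToken) {
        $df = 1;    # running count of documents containing this token
        foreach $doc ( @docNames ) {
            if( exists $hash2{ $doc }{ $a } ) {
                $mh{$a}->{ docf } = $df++;
                # copy only the one value needed, not the whole hash
                $mh{$a}->{ $doc } = $hash2{ $doc }{ $a };
            }
        }
    }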

That said, you really ought to consider using more descriptive names for your variables. If $a was $token, things would be much clearer. And %hash2? Is that the second hash in this program? Or the second one this week; decade; century?

While you're at it, move to use strict;. It's not obligatory, but you'll be glad you did in the long term.
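
To make both suggestions concrete, here is a rough sketch of the same loop under use strict; with more descriptive names. The sample data and every name in it (%count_by_doc, %master_tokens, %token_stats, @doc_names) are made up for illustration; they only stand in for %hash2, %mstrToken, %mh and @docNames, whose real contents aren't shown in the thread.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real structures are not shown in the thread.
my @doc_names = ( 'doc1', 'doc2', 'doc3' );

# Per-document token counts: $count_by_doc{ $doc }{ $token }
my %count_by_doc = (
    doc1 => { apple => 3, pear => 1 },
    doc2 => { apple => 2 },
    doc3 => { pear  => 5, plum => 4 },
);

# Master token table; only the keys are used here.
my %master_tokens = map { $_ => 1 } qw( apple pear plum );

my %token_stats;
for my $token ( sort keys %master_tokens ) {
    my $doc_freq = 0;    # number of documents seen so far that contain $token
    for my $doc ( @doc_names ) {
        next unless exists $count_by_doc{$doc}{$token};

        # Copy the single count needed rather than duplicating a whole hash.
        $token_stats{$token}{$doc} = $count_by_doc{$doc}{$token};
        $token_stats{$token}{docf} = ++$doc_freq;
    }
}

for my $token ( sort keys %token_stats ) {
    printf "%s: docf=%d\n", $token, $token_stats{$token}{docf};
}
```

With every variable declared via my, a mistyped name becomes a compile-time error instead of a silently created new global, which is most of what strict buys you here.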


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."

Re^2: hashes performance issue
by stan131 (Acolyte) on Mar 29, 2009 at 04:50 UTC
    Thanks BrowserUk. This reduced the run time by a huge margin. Now it's taking only about 22 seconds. This was the first time I used array references, but yes, I did realize the mistake.
    I agree with your naming convention advice and will start using strict;
    I am still a newbie with Perl and am learning all the best practices.
    Thanks,
    Stan