Beefy Boxes and Bandwidth Generously Provided by pair Networks
Keep It Simple, Stupid

Re: To Hash or to Array--Uniqueness is the question.

by Ryszard (Priest)
on Dec 02, 2005 at 08:28 UTC ( #513532=note: print w/replies, xml ) Need Help??

in reply to To Hash or to Array--Uniqueness is the question.

Warning, untested code:
my %stathash; while (<FH>) { $stathash{$_}++; }
Has the extra advantage of counting the number of hits for each unique value.

You can then do some grooy stuff, like pulling out records which occur n times, records which appear in one set and not another (if you use two hashes, two datasets), or records which appear in both sets, (again,if you use two hashes, two datasets)

I regularly do this with sets of about 500k records to determine where my data integrity issues lie, its pretty damn fast.

Log In?

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: note [id://513532]
and the web crawler heard nothing...

How do I use this? | Other CB clients
Other Users?
Others about the Monastery: (4)
As of 2022-12-10 09:47 GMT
Find Nodes?
    Voting Booth?

    No recent polls found