PerlMonks  

storing a huge text file into a hash

by Angharad (Pilgrim)
on Dec 07, 2010 at 18:17 UTC (#875853)

Angharad has asked for the wisdom of the Perl Monks concerning the following question:

Hi there
I want to store the contents of a particularly large text file in a hash so I can use it as a lookup table later on in the Perl script. I've written the code to create the hash, but storing the data is taking an impossibly long time. Here's the code:
    use strict;
    use Data::Dumper;

    my $file = shift;
    my %hash;

    open(IN, "$file") || die "ERROR: can't open $file: $!\n";
    while(<IN>) {
        chomp;
        my @info = split(/\,/, $_);
        my $md5 = $info[0];    # I only want to store the data
                               # in columns 1 and 2
        my $uni = $info[1];
        $hash{$uni} = $md5;
    }
    close IN;
Can anyone tell me a better way of doing this so that the contents of the file are stored in the hash more quickly? Any thoughts much appreciated!!

Replies are listed 'Best First'.
Re: storing a huge text file into a hash
by roboticus (Chancellor) on Dec 07, 2010 at 18:44 UTC

    Angharad:

    With that much data, I'd suggest putting it in a DBM file (AnyDBM_File or similar) or a database (DBI with DBD::SQLite). That would give you a few advantages:

    • You wouldn't have to keep all that data in memory
    • You'd only have to load data into the database when it changes
    • You'd still get quick lookup on your data
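The DBM route could be sketched roughly as below. This is only a sketch under assumptions: it uses SDBM_File because it ships with Perl, the sub names `load_dbm` and `lookup_md5` are made up for illustration, and the input is assumed to be the comma-separated "md5,uni" layout from the question. SDBM caps each key/value pair at roughly 1 KB and may not be the best back end for 11 million records, so another DBM module may suit better.

```perl
use strict;
use warnings;
use Fcntl;        # O_RDWR, O_CREAT, O_RDONLY
use SDBM_File;    # bundled with Perl; other DBM back ends tie the same way

# Load "md5,uni" lines from $csv into the DBM file $dbm (hypothetical names).
# This only needs to be run when the text file changes.
sub load_dbm {
    my ($csv, $dbm) = @_;
    tie my %lookup, 'SDBM_File', $dbm, O_RDWR | O_CREAT, 0666
        or die "ERROR: can't tie $dbm: $!\n";
    open my $in, '<', $csv or die "ERROR: can't open $csv: $!\n";
    while (my $line = <$in>) {
        chomp $line;
        my ($md5, $uni) = split /,/, $line;
        next unless defined $uni;    # skip lines without a second column
        $lookup{$uni} = $md5;
    }
    close $in;
    untie %lookup;
    return;
}

# Look up one key; the data stays on disk rather than in memory.
sub lookup_md5 {
    my ($dbm, $uni) = @_;
    tie my %lookup, 'SDBM_File', $dbm, O_RDONLY, 0666
        or die "ERROR: can't tie $dbm: $!\n";
    my $md5 = $lookup{$uni};
    untie %lookup;
    return $md5;
}
```

A script that does many lookups would tie the hash once and keep it tied, rather than re-tying per lookup as `lookup_md5` does here for brevity.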

    ...roboticus

    When your only tool is a hammer, all problems look like your thumb.

      Ok, thanks for the advice. I've decided to store the data in a database and query it as required :)
Re: storing a huge text file into a hash
by raybies (Chaplain) on Dec 07, 2010 at 20:08 UTC
    If you want the whole hash in a file quickly, and don't mind that the output file isn't in a readable text format, I'm currently in love with the Storable module. Just call "store" with the hash and the filename you want to put it in, and "retrieve" to get it back.
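A minimal sketch of the store/retrieve round trip with Storable follows. The wrapper names `save_lookup` and `load_lookup` are hypothetical, as is any file name passed to them. Note that the retrieved hash still lives entirely in memory; what this saves is re-parsing the text file on every run.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Save a hash (passed by reference) to $file in Storable's binary format.
sub save_lookup {
    my ($hash_ref, $file) = @_;
    store($hash_ref, $file) or die "ERROR: can't store to $file\n";
    return;
}

# Read the hash back; retrieve() returns a hash reference.
sub load_lookup {
    my ($file) = @_;
    return retrieve($file);
}
```

Usage would look like `save_lookup(\%hash, 'lookup.storable')` once after building the hash, then `my $lookup = load_lookup('lookup.storable')` in later runs, with values read as `$lookup->{$uni}`.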
Re: storing a huge text file into a hash
by sundialsvc4 (Abbot) on Dec 07, 2010 at 21:55 UTC

    Another clever trick is a tied hash. This looks like a hash, but it is backed by some other storage mechanism, such as Berkeley DB.

    But, lately, I think that this has paled in favor of SQLite (http://www.sqlite.org) ... a public domain(!) flat-file database that works extremely well.   You could prepare an SQL query ahead of time, then execute it repeatedly to retrieve “the ’droid you’re looking for.”   It is blazingly fast.
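The prepare-once, execute-many idea might look something like this. It is a sketch under assumptions: the CPAN modules DBI and DBD::SQLite must be installed, and the table name `lookup`, its columns, and the sub names are all invented for illustration.

```perl
use strict;
use warnings;
use DBI;    # assumes DBI and DBD::SQLite are installed from CPAN

# Open (or create) the database file; table and column names are hypothetical.
sub open_db {
    my ($file) = @_;
    my $dbh = DBI->connect("dbi:SQLite:dbname=$file", '', '',
                           { RaiseError => 1, AutoCommit => 1 });
    $dbh->do('CREATE TABLE IF NOT EXISTS lookup (uni TEXT PRIMARY KEY, md5 TEXT)');
    return $dbh;
}

# Insert all pairs inside a single transaction instead of committing per row.
sub load_pairs {
    my ($dbh, $pairs) = @_;    # $pairs: array ref of [uni, md5]
    my $ins = $dbh->prepare('INSERT OR REPLACE INTO lookup (uni, md5) VALUES (?, ?)');
    $dbh->begin_work;
    $ins->execute(@$_) for @$pairs;
    $dbh->commit;
    return;
}

# Prepare the SELECT once and return a closure that executes it per lookup.
sub make_finder {
    my ($dbh) = @_;
    my $sth = $dbh->prepare('SELECT md5 FROM lookup WHERE uni = ?');
    return sub {
        my ($uni) = @_;
        $sth->execute($uni);
        my ($md5) = $sth->fetchrow_array;    # undef when no row matches
        $sth->finish;
        return $md5;
    };
}
```

With this shape, `my $find = make_finder($dbh);` is called once and `$find->($uni)` is then called for each lookup, so the SQL is parsed only once.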

Re: storing a huge text file into a hash
by BrowserUk (Patriarch) on Dec 07, 2010 at 18:23 UTC
    a particularly large text file ... but the act of storing the data is taking an impossibly long time

    How big (in lines) and how long?

      It's 574MB in size, with over 11000000 lines

        You didn't say how long it is taking on your system? On mine, this one-liner loads 11e6 lines into a hash in < 100 seconds:

        >perl -e"BEGIN{keys %h=2**23}" -nE"$h{$_->[0]}=$_->[1] for [split];print qq[\r$.]" junk.dat
        11000000

        But it does use over 2GB of RAM.
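Spelled out as a script, the one-liner's approach might look roughly like this. The sub name `load_presized` is made up for illustration. As in the one-liner, lines are split on whitespace; for the comma-separated file in the question the split pattern would need to be `/,/` instead. Assigning to `keys(%h)` pre-allocates hash buckets so the hash is not repeatedly resized while it fills.

```perl
use strict;
use warnings;

# Read whitespace-separated "key value" lines from a filehandle into a hash,
# pre-allocating buckets first to avoid repeated resizing during the load.
sub load_presized {
    my ($fh, $buckets) = @_;
    my %h;
    keys(%h) = $buckets;    # lvalue keys: a sizing hint, e.g. 2**23
    while (my $line = <$fh>) {
        my ($k, $v) = split ' ', $line;
        next unless defined $v;    # skip blank or one-column lines
        $h{$k} = $v;
    }
    return \%h;
}
```

It would be called with an open filehandle, for example after `open my $fh, '<', 'junk.dat' or die ...`, as `my $h = load_presized($fh, 2**23);`.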

        If your system is taking substantially longer than that, it could be that you are starting to swap, which would slow things down a lot.

        If you are loading this hash frequently, then you'd probably be better to stick your data into a tied DB like SQLite.


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
