Just in case you want to stick with the hash approach (the DB approach and the other suggestions would be fine, too), you could also speed things up by not reloading the hash every time. That's essentially the client/server approach I was hinting at in Re: reading dictionary file -> morphological analyser: you split the program into a server and a client. The server loads the hash once and keeps running, providing a 'dict' service via a socket. The client just connects to the socket, sends the request (the word to be looked up), and reads the server's reply.

Here's a minimal example, just to get you started. This is not production-quality code and could be improved in many ways (e.g. by making a proper daemon out of it, adding better error handling, etc.), but I wanted to keep it simple...

The server:

use strict;
use warnings;
use IO::Socket;
use Storable;

my $dict = retrieve("hash.txt");

my $sock = IO::Socket::INET->new(
               LocalAddr => "localhost:8888",  # any available port you like
               ReuseAddr => 1,
               Listen    => 2,
           ) or die "$0: can't create listening socket: $!\n";

while (1) {
    my $conn = $sock->accept();  # wait for connection

    next unless ref $conn;       # (just in case...)

    my $query = <$conn>;
    next unless defined $query;  # client disconnected without sending anything
    chomp $query;

    print STDERR "query: $query\n";   # just for debugging

    my $reply = (exists $dict->{$query}) ? "FOUND\n" : "NOT FOUND\n";
    print $conn $reply;

    close $conn;
}

The client:

use strict;
use warnings;
use IO::Socket;

sub connect_server {
    my $sock = IO::Socket::INET->new(
                   PeerAddr  => "localhost:8888",  
               ) or die "$0: can't connect: $!\n";
    return $sock;
}

my @inputs = qw(
foo
fooed
fooen
);

for my $input (@inputs) {
    
    my $conn = connect_server();
    
    print $conn "$input\n";       # send the query
    my $reply = <$conn>;          # read the response

    close $conn;

    # $reply is undef if the server closed the connection without answering
    print "found '$input' in lexicon\n" if defined $reply && $reply eq "FOUND\n";
}

Even though this code opens a new connection for every word being looked up, it's still pretty fast (around 2 ms per lookup on my machine).
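If the per-lookup connection ever becomes the bottleneck, one option is to keep a single connection open and send several queries over it. The following is only a sketch of that idea, not part of the server above: serve_queries is a hypothetical helper name, and the demo uses in-memory filehandles in place of a real socket so it can be run on its own.

```perl
use strict;
use warnings;
use IO::Handle;

# Hypothetical helper (not part of the original server): answer every
# query line arriving on $in until the peer closes its end, writing one
# reply line per query to $out. For a socket, pass the same handle twice.
sub serve_queries {
    my ($in, $out, $dict) = @_;

    while (defined(my $query = <$in>)) {
        chomp $query;
        my $reply = (exists $dict->{$query}) ? "FOUND\n" : "NOT FOUND\n";
        print {$out} $reply;
        $out->flush;    # push the reply out before waiting for the next query
    }
    return;
}

# Demo with in-memory filehandles standing in for the socket.
my %dict   = (foo => 1, fooen => 1);    # made-up sample lexicon
my $input  = "foo\nfooed\nfooen\n";
my $output = '';

open my $in,  '<', \$input  or die "$0: can't open in-memory input: $!\n";
open my $out, '>', \$output or die "$0: can't open in-memory output: $!\n";

serve_queries($in, $out, \%dict);
close $out;

print $output;    # one reply line per query, in order
```

In the server, the body of the accept loop would then become something like serve_queries($conn, $conn, $dict), and the client would connect once before its loop instead of once per word. Keep in mind that the server handles one connection at a time, so a client that holds its connection open blocks everybody else until it disconnects.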

To play with the server and client above, start the server in one terminal (as I mentioned, it's not a daemon, so it will stay in the foreground), and then run the client from another terminal.
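The server expects a file "hash.txt" that was written with Storable. If you don't have your real dictionary at hand, something along these lines should produce a small test file. This is just a sketch: build_dict_file is a hypothetical helper name and the word list is made up.

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Hypothetical helper: turn a list of words into a lookup hash and
# save it with Storable, so that the server's retrieve() can load it.
# Returns the number of distinct words stored.
sub build_dict_file {
    my ($words, $path) = @_;

    my %dict = map { $_ => 1 } @$words;
    store(\%dict, $path) or die "$0: can't store to $path: $!\n";

    return scalar keys %dict;
}

# Made-up sample words; 'fooed' is left out on purpose, so the client
# gets one "NOT FOUND" reply.
my $count = build_dict_file([qw(foo fooen)], "hash.txt");
print "stored $count entries\n";

# Read the file back the same way the server does.
my $dict = retrieve("hash.txt");
print "foo is ", (exists $dict->{foo} ? "" : "not "), "in the file\n";
```

With that file in place, the client should report 'foo' and 'fooen' as found.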

In reply to Re: storable: too slow retrieval of big file. by almut
in thread storable: too slow retrieval of big file. by pc2
