in reply to storable: too slow retrieval of big file.
Just in case you want to stick with the hash approach (the DB approach and the other suggestions would be fine, too), you could also speed things up by not reloading the hash every time. That's essentially the client/server approach I was hinting at in Re: reading dictionary file -> morphological analyser: you split the program into a server and a client. The server loads the hash once and keeps running, providing a 'dict' service via a socket. The client just connects to the socket, sends the request (the word to be looked up), and reads the server's reply.
Here's a minimal example, just to get you started. (This is not production-quality code and could be improved in many ways, e.g. by making a proper daemon out of it or adding better error handling, but I wanted to keep it simple...)
The server:
    use IO::Socket;
    use Storable;

    my $dict = retrieve("hash.txt");

    my $sock = IO::Socket::INET->new(
        LocalAddr => "localhost:8888",  # any available port you like
        ReuseAddr => 1,
        Listen    => 2,
    ) or die "$0: can't create listening socket: $!\n";

    while (1) {
        my $conn = $sock->accept();      # wait for connection
        next unless ref $conn;           # (just in case...)
        my $query = <$conn>;
        chomp $query;
        print STDERR "query: $query\n";  # just for debugging
        my $reply = (exists $dict->{$query}) ? "FOUND\n" : "NOT FOUND\n";
        print $conn $reply;
        close $conn;
    }
The client:
    use IO::Socket;

    sub connect_server {
        my $sock = IO::Socket::INET->new(
            PeerAddr => "localhost:8888",
        ) or die "$0: can't connect: $!\n";
        return $sock;
    }

    my @inputs = qw( foo fooed fooen );

    for my $input (@inputs) {
        my $conn = connect_server();
        print $conn "$input\n";  # send the query
        my $reply = <$conn>;     # read the response
        close $conn;
        print "found '$input' in lexicon\n" if $reply eq "FOUND\n";
    }
Even though this code is opening a new connection for every word being looked up, it's still pretty fast (around 2 ms per lookup, on my machine).
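If the per-lookup connection ever does become a bottleneck, one possible variation is to keep a single connection open and send several queries over it. The following is only a sketch under a few assumptions that are not part of the example above: it uses a small in-memory hash instead of the retrieved one, forks the server as a child process so that it runs as a single script, lets the OS pick a free port, and serves exactly one connection before exiting.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# Stand-in for the hash loaded with retrieve() (assumption for this demo)
my %dict = map { $_ => 1 } qw( foo bar baz );

# LocalPort 0 asks the OS for any free port
my $listener = IO::Socket::INET->new(
    LocalAddr => "127.0.0.1",
    LocalPort => 0,
    ReuseAddr => 1,
    Listen    => 2,
) or die "$0: can't create listening socket: $!\n";
my $port = $listener->sockport();

my $pid = fork();
die "$0: fork failed: $!\n" unless defined $pid;

if ($pid == 0) {
    # Child = server: answer queries until the client closes the connection
    my $conn = $listener->accept() or exit 1;
    $conn->autoflush(1);
    while (defined(my $query = <$conn>)) {
        chomp $query;
        my $reply = (exists $dict{$query}) ? "FOUND\n" : "NOT FOUND\n";
        print $conn $reply;
    }
    close $conn;
    exit 0;
}

# Parent = client: one connection, many lookups
close $listener;
my $conn = IO::Socket::INET->new(
    PeerAddr => "127.0.0.1",
    PeerPort => $port,
) or die "$0: can't connect: $!\n";
$conn->autoflush(1);

my %found;
for my $input (qw( foo fooed fooen )) {
    print $conn "$input\n";          # send the query
    my $reply = <$conn>;             # read the response
    die "$0: server closed the connection\n" unless defined $reply;
    $found{$input} = ($reply eq "FOUND\n") ? 1 : 0;
    print "found '$input' in lexicon\n" if $found{$input};
}

close $conn;        # EOF ends the server's read loop
waitpid($pid, 0);   # reap the child
```

The only change on the server side is the inner while loop, which keeps reading lines until the client closes its end. Note that a server written this way handles one client at a time; a second client would have to wait until the first one disconnects.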
(To play with the server and client above, start the server in one terminal (as I mentioned, it's not a daemon, so it will stay in the foreground), and then run the client from another terminal.)
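If you don't have the real dictionary at hand, a small test file in Storable format is enough to try things out. This is just a sketch: write_dict is a hypothetical helper, and the word list and the word => 1 layout are assumptions (the server only checks whether a key exists, so any value would do). The file name "hash.txt" is the one the server passes to retrieve().

```perl
use strict;
use warnings;
use Storable qw(store retrieve);

# Hypothetical helper: build a word => 1 lookup hash and save it with Storable
sub write_dict {
    my ($path, @words) = @_;
    my %dict = map { $_ => 1 } @words;
    store(\%dict, $path) or die "$0: can't store to $path: $!\n";
    return scalar keys %dict;
}

# Made-up sample words; of the client's three inputs, only 'foo' is in here
my $count = write_dict("hash.txt", qw( foo bar baz ));
print "stored $count entries in hash.txt\n";
```

Run this once in the directory where you start the server, so that the server finds the file.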