in reply to improving the efficiency of a script

If this is going to be used a lot, I'd either stuff the dictionary file into a database, or else construct an index to the offsets and sizes of initial letter sections of the dictionary file.
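For instance, a minimal DBD::SQLite sketch of the database route (the file, table, and column names here are illustrative, not from the original script):

    use strict;
    use warnings;
    use DBI;

    # One-time load: one row per word, keyed by its initial letter
    my $dbh = DBI->connect('dbi:SQLite:dbname=words.db', '', '',
                           { RaiseError => 1, AutoCommit => 0 });
    $dbh->do('CREATE TABLE IF NOT EXISTS words (initial TEXT, word TEXT)');

    open my $fh, '<', 'words.raw' or die "Can't open words.raw: $!";
    my $ins = $dbh->prepare('INSERT INTO words (initial, word) VALUES (?, ?)');
    while (my $word = <$fh>) {
        chomp $word;
        $ins->execute(lc substr($word, 0, 1), $word);
    }
    $dbh->commit;

    # A hundred random a-words, without touching the rest of the file
    my $picks = $dbh->selectcol_arrayref(
        'SELECT word FROM words WHERE initial = ? ORDER BY RANDOM() LIMIT 100',
        undef, 'a');
    print "$_\n" for @$picks;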

Assuming the dictionary file is alphabetically sorted, you don't need to slurp the whole file into an array. A million words is a large chunk of memory, and an allocation that size will slow you painfully if it drives the machine into swap.

Try just building an array with the a's, shuffling, and taking the first hundred elements. Then discard the a's and replace them with the b's, and so on, all in a while loop that reads only one line at a time.
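A minimal sketch of that loop, assuming a sorted, one-word-per-line file and List::Util's shuffle (the filename is illustrative):

    use strict;
    use warnings;
    use List::Util 'shuffle';

    open my $dict, '<', 'words.txt' or die "Can't open words.txt: $!";

    my ($current, @batch) = ('');
    while (my $word = <$dict>) {
        my $initial = lc substr $word, 0, 1;
        if ($initial ne $current) {
            emit(@batch) if @batch;           # flush the previous letter
            ($current, @batch) = ($initial);  # start the next one empty
        }
        push @batch, $word;
    }
    emit(@batch) if @batch;                   # flush the final letter

    sub emit {
        my $last = $#_ < 99 ? $#_ : 99;       # guard letters with < 100 words
        print +(shuffle @_)[0 .. $last];      # shuffle, then slice
    }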

You don't need a loop to pick the first hundred elements of an array. A slice will do,

    @array[0..99]

and is much faster.

After Compline,
Zaxo

Re^2: improving the efficiency of a script
by Limbic~Region (Chancellor) on Jun 19, 2006 at 00:03 UTC
    Zaxo,
    This is very similar to the idea I had. I found that building the offsets database with DBM::Deep was extremely slow on the first run, but subsequent runs were fast. This approach also has the advantage of not requiring the dictionary file to be sorted.
        #!/usr/bin/perl
        use strict;
        use warnings;
        use DBM::Deep;

        open(my $dict, '<', 'words.raw')
            or die "Unable to open 'words.raw' for reading: $!";

        # The offsets file persists between runs; build it only once
        my $db = DBM::Deep->new("offsets.db");
        build_db($db, $dict) if ! scalar keys %$db;

        for my $char ('a' .. 'z') {
            for (1 .. 100) {
                print get_rand_word($db, $char, $dict);
            }
        }

        # Record the byte offset of every line, keyed by its first letter
        sub build_db {
            my ($db, $dict) = @_;
            my $pos = tell $dict;
            while ( <$dict> ) {
                my $char = substr($_, 0, 1);
                push @{$db->{$char}}, $pos;
                $pos = tell $dict;
            }
        }

        # Seek straight to a random recorded offset and read one line
        sub get_rand_word {
            my ($db, $char, $dict) = @_;
            my $offset = $db->{$char}[rand @{$db->{$char}}];
            seek $dict, $offset, 0;
            my $word = <$dict>;
            return $word;
        }
    Other options include Storable and DBD::SQLite if a real RDBMS isn't available.
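    For instance, a minimal sketch of the Storable variant of the same offsets index (filenames are illustrative):

        use strict;
        use warnings;
        use Storable qw(store retrieve);

        my $offsets;
        if (-e 'offsets.sto') {
            # Subsequent runs: pull the whole index back in one read
            $offsets = retrieve('offsets.sto');
        }
        else {
            # First run: the same offset-recording pass as above
            open my $dict, '<', 'words.raw'
                or die "Unable to open 'words.raw' for reading: $!";
            my $pos = tell $dict;
            while (<$dict>) {
                push @{ $offsets->{ substr $_, 0, 1 } }, $pos;
                $pos = tell $dict;
            }
            store($offsets, 'offsets.sto');
        }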

    Cheers - L~R