in reply to Re^2: Turning A Problem Upside Down
in thread Turning A Problem Upside Down
Using the word list you linked to earlier (TWL06.txt) with 178,590 eligible words, I use just under 80MB with the following solution:
You might be able to reduce that a bit and gain a little speed by using arrays instead of hashes. I tested this by replacing your trie builder with:
if (defined $opt{w}) { my @data; open(my $fh, '<', $opt{w}) or die "Unable to open '$opt{w}' for re +ading: $!"; while (<$fh>) { chomp; next if length($_) < 3 || /[^a-zA-Z]/; $_ = lc($_); my $code = join('', 'push @{$data', (map { $_-=97; "[$_]"} sor +t{$a<=>$b} unpack 'C*', $_), "}, '$_';"); eval $code; } store(\@data, $opt{d}) or die "Can't store '%data' in '$opt{d}'\n" +; }
And the resultant file is just 8MB rather than 80MB. I started to modify the rest of teh code to use it, but then got lost and gave up, but arrays should be quicker than hashes.
Note: I rewrote it in my own style as a mechanism for understanding it.
I do it myself all the time. (I hate what you did with it! But that's a (probably pointless) argument for another time. :)
Assuming I understood it correctly
You're spot on.
there isn't a lot of room for optimizations.
It only takes 5 seconds to build the index, so that doesn't seem to warrant the effort.
And if I feed it the top 7 characters ordered by frequency in the dictionary: esiarnt--which shoudl be close to worst case--it only takes 0.7 of a second to find the 243 complient words, so there's not much reason to optimise there either.
it is quite beautiful
I think that Perl's bitwise logical operators are one of it's best kept secrets.
They lend themselves to performing a huge variety of set-type operations, very quickly, in a single operation. Effectively 'in parallel'. They can perform a million SELECT-style operations on sets of 8000 items in 1.3 seconds:
$a = ~( $b = chr(0)x 1e3 ); $t=time; my $x = $a ^ $b for 1 .. 1e6; printf "Took %.6f seconds\n", time()-$t;; Took 1.297000 seconds
Or a thousand such operations upon sets of 8 million items in 1.7 seconds:
$a = ~( $b = chr(0)x 1e6 ); $t=time; my $x = $a ^ $b for 1 .. 1e3; printf "Took %.6f seconds\n", time()-$t;; Took 1.645000 seconds
And storage usage doesn't get much better than 1-bit per item. I've never actually looked to see how tightly coded the underlying opcodes are. I've never found the need to.
Maybe I should one day though, as these tend to be the dark corners of the codebase that never get revisited. It's quite possible that they are currently written to operate byte-by-byte which is generally far slower than dealing with data in register-sized chunks. With the advent of 64-bit registers, it is quite possible that they could be speed up significantly.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re^4: Turning A Problem Upside Down
by Limbic~Region (Chancellor) on Oct 17, 2010 at 04:29 UTC |