Using the word list you linked to earlier (TWL06.txt) with 178,590 eligible words, I use just under 80MB with the following solution:

You might be able to reduce that a bit and gain a little speed by using arrays instead of hashes. I tested this by replacing your trie builder with:

if (defined $opt{w}) { my @data; open(my $fh, '<', $opt{w}) or die "Unable to open '$opt{w}' for re +ading: $!"; while (<$fh>) { chomp; next if length($_) < 3 || /[^a-zA-Z]/; $_ = lc($_); my $code = join('', 'push @{$data', (map { $_-=97; "[$_]"} sor +t{$a<=>$b} unpack 'C*', $_), "}, '$_';"); eval $code; } store(\@data, $opt{d}) or die "Can't store '%data' in '$opt{d}'\n" +; }

And the resultant file is just 8MB rather than 80MB. I started to modify the rest of teh code to use it, but then got lost and gave up, but arrays should be quicker than hashes.

Note: I rewrote it in my own style as a mechanism for understanding it.

I do it myself all the time. (I hate what you did with it! But that's a (probably pointless) argument for another time. :)

Assuming I understood it correctly

You're spot on.

there isn't a lot of room for optimizations.

It only takes 5 seconds to build the index, so that doesn't seem to warrant the effort.

And if I feed it the top 7 characters ordered by frequency in the dictionary: esiarnt--which shoudl be close to worst case--it only takes 0.7 of a second to find the 243 complient words, so there's not much reason to optimise there either.

it is quite beautiful

I think that Perl's bitwise logical operators are one of it's best kept secrets.

They lend themselves to performing a huge variety of set-type operations, very quickly, in a single operation. Effectively 'in parallel'. They can perform a million SELECT-style operations on sets of 8000 items in 1.3 seconds:

$a = ~( $b = chr(0)x 1e3 ); $t=time; my $x = $a ^ $b for 1 .. 1e6; printf "Took %.6f seconds\n", time()-$t;; Took 1.297000 seconds

Or a thousand such operations upon sets of 8 million items in 1.7 seconds:

$a = ~( $b = chr(0)x 1e6 ); $t=time; my $x = $a ^ $b for 1 .. 1e3; printf "Took %.6f seconds\n", time()-$t;; Took 1.645000 seconds

And storage usage doesn't get much better than 1-bit per item. I've never actually looked to see how tightly coded the underlying opcodes are. I've never found the need to.

Maybe I should one day though, as these tend to be the dark corners of the codebase that never get revisited. It's quite possible that they are currently written to operate byte-by-byte which is generally far slower than dealing with data in register-sized chunks. With the advent of 64-bit registers, it is quite possible that they could be speed up significantly.


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy

In reply to Re^3: Turning A Problem Upside Down by BrowserUk
in thread Turning A Problem Upside Down by Limbic~Region

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.