PerlMonks  

Re: Finding dictionary words in a string.

by kvale (Monsignor)
on Mar 13, 2004 at 04:35 UTC


in reply to Finding dictionary words in a string.

Instead of testing the string against every dictionary word, it would be better to search along the string for words that fit the current pattern. Suppose that the shortest dictionary word has length n. Then at each position in the string, take the substring of length n and binary-search your sorted dictionary for words beginning with that pattern; for a dictionary of N words this costs O(log(N)) string comparisons, or roughly O(n*log(N)) character comparisons. Then check each of the candidate words against the rest of the string. For a reasonable n, this will cut down search time drastically.
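
Here is a minimal sketch of that sorted-dictionary search in core Perl. The toy word list, the test string, and the helper names lower_bound and find_words are all made up for illustration; a real dictionary would be read from a word file and sorted once up front.

```perl
use strict;
use warnings;

# Hypothetical toy dictionary; a real one would be loaded from a word list.
my @dict = sort qw(cat cater dog door tend tender);
my $n    = 3;    # length of the shortest word in @dict

# Binary search: index of the first entry that is not lt $prefix.
sub lower_bound {
    my ($words, $prefix) = @_;
    my ($lo, $hi) = (0, scalar @$words);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($words->[$mid] lt $prefix) { $lo = $mid + 1 }
        else                           { $hi = $mid }
    }
    return $lo;
}

# Return [position, word] pairs for every dictionary word found in $string.
sub find_words {
    my ($words, $n, $string) = @_;
    my @found;
    for my $pos (0 .. length($string) - $n) {
        my $prefix = substr($string, $pos, $n);
        # Words sharing the prefix sit next to each other in sorted order.
        for (my $i = lower_bound($words, $prefix); $i < @$words; $i++) {
            my $word = $words->[$i];
            last if substr($word, 0, $n) ne $prefix;
            push @found, [$pos, $word]
                if substr($string, $pos, length $word) eq $word;
        }
    }
    return @found;
}

print "$_->[0]: $_->[1]\n" for find_words(\@dict, $n, "xcatendoorz");
```

The inner loop only visits words that share the n-letter prefix, which is where the savings over testing every dictionary word come from.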

This basic idea, carried to its logical conclusion, forms the basis for using a trie data structure to find words fast. The module Tree::Trie implements this:

    use Tree::Trie;
    use strict;

    my ($trie) = Tree::Trie->new;
    $trie->add(qw[aeode calliope clio erato euterpe melete melpomene mneme
                  polymnia terpsichore thalia urania]);
    my (@all) = $trie->lookup("");
    my (@ms)  = $trie->lookup("m");
    $" = "--";
    print "All muses: @all\nMuses beginning with 'm': @ms\n";
    my (@deleted) = $trie->remove(qw[calliope thalia doc]);
    print "Deleted muses: @deleted\n";
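
To show how a trie applies to the original problem of scanning a string, here is a rough sketch using plain nested hashes instead of Tree::Trie, so it runs with core Perl only. The names build_trie and scan, the empty-string end-of-word marker, and the sample string are my own choices for the example, not part of any module.

```perl
use strict;
use warnings;

# Build a trie as nested hashes; the '' key marks the end of a word.
sub build_trie {
    my @words = @_;
    my %trie;
    for my $word (@words) {
        my $node = \%trie;
        for my $ch (split //, $word) {
            $node = $node->{$ch} //= {};
        }
        $node->{''} = 1;
    }
    return \%trie;
}

# Walk the trie from every start position; collect [position, word] pairs.
sub scan {
    my ($trie, $string) = @_;
    my @found;
    for my $start (0 .. length($string) - 1) {
        my $node = $trie;
        for my $end ($start .. length($string) - 1) {
            # Stop as soon as no dictionary word continues with this letter.
            $node = $node->{ substr($string, $end, 1) } or last;
            push @found, [$start, substr($string, $start, $end - $start + 1)]
                if $node->{''};
        }
    }
    return @found;
}

my $trie = build_trie(qw(clio erato mneme melete));
print "$_->[0]: $_->[1]\n" for scan($trie, "xcliomnemeratoz");
```

Each start position costs at most as many steps as the longest dictionary word, regardless of how many words the dictionary holds.
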
A trie consisting of 250,000 words will take up a good deal of space, but even a truncated trie will speed things up.
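
One way to read "truncated" is a trie that only branches on the first few letters and keeps a plain list of words at the cutoff. The sketch below assumes that reading; the depth of 3, the names build_truncated and scan_truncated, and the word list are hypothetical.

```perl
use strict;
use warnings;

# Nested hashes down to $depth letters. The '' key holds the words stored
# at a node: short words ending there, or all words sharing the full prefix.
sub build_truncated {
    my ($depth, @words) = @_;
    my %trie;
    for my $word (@words) {
        my $node = \%trie;
        for my $ch (split //, substr($word, 0, $depth)) {
            $node = $node->{$ch} //= {};
        }
        push @{ $node->{''} }, $word;
    }
    return \%trie;
}

# Follow at most $depth letters, then compare the stored candidates directly.
sub scan_truncated {
    my ($trie, $depth, $string) = @_;
    my @found;
    for my $start (0 .. length($string) - 1) {
        my $node = $trie;
        for my $pos ($start .. $start + $depth - 1) {
            last if $pos >= length $string;
            $node = $node->{ substr($string, $pos, 1) } or last;
            for my $word (@{ $node->{''} || [] }) {
                push @found, [$start, $word]
                    if substr($string, $start, length $word) eq $word;
            }
        }
    }
    return @found;
}

my $trie = build_truncated(3, qw(cat cater do dog door tend tender));
print "$_->[0]: $_->[1]\n" for scan_truncated($trie, 3, "xcatendoorz");
```

The trade-off is memory against work per position: a shallower cutoff means fewer hash nodes but longer candidate lists to compare at each leaf.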

-Mark
