in reply to Re: Blazingly FAST (... yet oh so wrong)
in thread Fast sublist generation

"Your first code would return ('ab', 'abc'). The code I'm replying to will return ('a', 'ab')."

Lucky me, that is more correct than the "first code". If you
replace $key with $word, youŽll see the reason why:

Every entry in the hash is a potentional prefix (or suffix
for that matter) of $word. Thus yes, the blazingly fast(tm)
code does the right thing.(tm)

I encountered a second problem, that is, that simple prefix
suffix consideration is far away from being enough (no worry
- it alway was meant as first approximation). But there are
in various languages all kinds of morphemes, suffixes, prefixes,
infixes, reduplication, elipses etc. In fact, if you look
at a given word of n chars, the FULL segmentation would
require you to produce 2^(n-1) alternatives.

Try that with a 39char word (Eisenbahnerwohnungsnutzungsverordnungen)

So I thought we reduce it to already present entries of a lexicon.

Ciao

  • Comment on Re: Re: Blazingly FAST (and very right)

Replies are listed 'Best First'.
Re: {3} Blazingly FAST (and very right)
by dragonchild (Archbishop) on Jul 30, 2001 at 18:17 UTC
    Fair enough. Your original problem statement presented the situation as you had a list of words and were searching to determine which words had the search string as a prefix or suffix. That is what your original code did, and what your second code did not do.

    I agree that storing a list of prefixes (in one hash) and suffixes (in another hash) is definitely preferable (if possible) and will result in a huge savings in time (and probably memory, too!). Restating the problemspace is a very good thing to do, when you're stuck like that.