in reply to Blazingly FAST
in thread Fast sublist generation

This doesn't do what you want it to. Let's take a small example:

%hash = (a => 1, ab => 1, abc => 1, ac => 1); $key = 'ab';
Your first code would return ('ab', 'abc'). The code I'm replying to will return ('a', 'ab').
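The two behaviors differ in which string is the needle and which is the haystack. A minimal sketch of the distinction, using the sample hash above (variable names are mine, for illustration):

```perl
use strict;
use warnings;

my %hash = (a => 1, ab => 1, abc => 1, ac => 1);
my $key  = 'ab';

# First code's behavior: keys that *start with* $key.
my @starts_with = sort grep { index($_, $key) == 0 } keys %hash;
# -> ('ab', 'abc')

# Replied-to code's behavior: keys that are *prefixes of* $key.
my @prefixes_of = sort grep { index($key, $_) == 0 } keys %hash;
# -> ('a', 'ab')
```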

Let's examine the problem. You need to find all the keys that start or end with a given string. To do this, you need to get a list of the keys.

Now, once you have that list of keys, treat it as if it's just a list of strings with no duplicates. The shortest Perl code to find this would be:

my @list_of_keys = sort grep /(^\Q$search\E|\Q$search\E$)/, keys %hash;
There are a few optimizations you can do (I'd think of more, but I haven't had my first pot of coffee yet); for one, the regex can be replaced with two substr comparisons, one against the start of each key and one against the end. Putting both together yields:

my @list_of_keys = sort grep { substr($_, 0, length($search)) eq $search || substr($_, -length($search)) eq $search } keys %hash;
Please note that this code is completely untested.
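Since the node admits the code is untested, here is a quick sanity check against the sample hash from above (a sketch; I've added a length guard, which the one-liner lacks, so that substr doesn't warn on keys shorter than the search string):

```perl
use strict;
use warnings;

my %hash   = (a => 1, ab => 1, abc => 1, ac => 1);
my $search = 'ab';

my @list_of_keys = sort grep {
    # guard: substr(-length) warns if the key is shorter than $search
    length($_) >= length($search)
        && (   substr($_, 0, length($search)) eq $search
            || substr($_, -length($search)) eq $search )
} keys %hash;
# -> ('ab', 'abc')
```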

Replies are listed 'Best First'.
Re: Re: Blazingly FAST (and very right)
by PetaMem (Priest) on Jul 30, 2001 at 18:01 UTC
    "Your first code would return ('ab', 'abc'). The code I'm replying to will return ('a', 'ab')."

    Lucky me, that is more correct than the "first code". If you
    replace $key with $word, you'll see the reason why:

    Every entry in the hash is a potential prefix (or suffix,
    for that matter) of $word. Thus yes, the blazingly fast(tm)
    code does the right thing.(tm)

    I encountered a second problem: simple prefix/suffix
    consideration is far from enough (no worry - it was always
    meant as a first approximation). Various languages have all
    kinds of morphemes: suffixes, prefixes, infixes, reduplication,
    ellipses, etc. In fact, if you look at a given word of n chars,
    the FULL segmentation would require you to produce 2^(n-1)
    alternatives.

    Try that with a 39-char word (Eisenbahnerwohnungsnutzungsverordnungen).
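    The 2^(n-1) count comes from the n-1 boundaries between characters, each of which is independently cut or not cut. A sketch that enumerates them (my own illustration, not code from the thread):

```perl
use strict;
use warnings;

# Enumerate every segmentation of a word: each of the n-1 boundaries
# between characters is either a cut or not, giving 2^(n-1) results.
sub segmentations {
    my ($word) = @_;
    my @chars = split //, $word;
    return ($word) if @chars <= 1;
    my @results;
    for my $mask (0 .. 2 ** (@chars - 1) - 1) {
        my $seg = $chars[0];
        for my $i (1 .. $#chars) {
            # bit i-1 of $mask decides whether boundary i is a cut
            $seg .= (($mask >> ($i - 1)) & 1) ? " $chars[$i]" : $chars[$i];
        }
        push @results, $seg;
    }
    return @results;
}

my @segs = segmentations('abcd');   # 2^3 = 8 alternatives
```

    For the 39-char word that would be 2^38, roughly 275 billion alternatives, which is exactly why restricting to lexicon entries is necessary.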

    So I thought we'd reduce it to entries already present in a lexicon.

    Ciao

      Fair enough. Your original problem statement presented the situation as: you had a list of words and were searching to determine which words had the search string as a prefix or suffix. That is what your original code did, and what your second code did not do.

      I agree that storing a list of prefixes (in one hash) and suffixes (in another hash) is definitely preferable (if possible) and will result in a huge savings in time (and probably memory, too!). Restating the problem space is a very good thing to do when you're stuck like that.
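      One way the two-hash idea could look (a hypothetical sketch with names of my own choosing, not code from either poster): precompute every prefix and suffix of the lexicon entries, so each candidate segment is a single O(1) hash lookup instead of a grep over all keys.

```perl
use strict;
use warnings;

# Sketch: build one hash of all prefixes and one of all suffixes
# of the lexicon entries, trading some memory for O(1) lookups.
my @lexicon = qw(a ab abc ac);
my (%is_prefix, %is_suffix);
for my $word (@lexicon) {
    for my $len (1 .. length $word) {
        $is_prefix{ substr($word, 0, $len) } = 1;
        $is_suffix{ substr($word, -$len) }   = 1;
    }
}

my $opens  = exists $is_prefix{'ab'};   # true: 'ab' begins 'ab' and 'abc'
my $closes = exists $is_suffix{'bc'};   # true: 'bc' ends 'abc'
```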