in reply to Partial Searches Against Multiple Wildcards

seems like your customer wants to define ranges.

> (A) the users and the maintainers are not computer people. To them, the asterisk does not mean one or more characters, it just means any single letter or number in that position.

> (B) the database will only contain a couple thousand rows so loading the entire database into memory is feasible.

> (C) the asterisk will only be the character class [A-Z0-9]

If that's the case, I can see two additional approaches.

  1. you search the DB for all 2^n combinations of an n-word search-term with positions replaced by *. In your example D7*D would lead to your expected solution. ² °
  2. you reverse the task by loading all old-keys into memory and create one (or multiple) long or-ed regexes out of it, which you try to match against the search term. When cleverly made optimization should lead to quick results.

I'd start with the first approach b/c it's way easier to implement. If performance becomes a hindrance go for the second one.

Cheers Rolf
(addicted to the Perl Programming Language :)
see Wikisyntax for the Monastery

²) I really doubt that your customer will try to use many wildcards in the same ID, because this would turn the old->new mapping ambiguous. Anyway you can analyze upfront the max number of * allowed and their minimal distance. This will help you cut down the number of possible combinations considerably from the previous 2^n.

°) in the case of 5 asterix 2^n = 32 is far better than your previous 36^n = 60466176.