Hello Monks. I'm building a cross reference utility in Perl and I've run into something that has me stumped. The utility is very basic, users search for an old item number and the utility returns a list of matches. The database is simple:
+-----+-----------+-----+ |rowid| old | new | +-----+-----------+-----+ | 1 | ABFD-1234 | AAA | | 2 | ABFD-178G | BBB | | 3 | F2HB-9401 | AAA | | 4 | ZDDR-00W5 | DDD | +-----+-----------+-----+
I've implemented the ability to search anywhere in the old item number (searching for 940 returns row #3) and the ability to not enter the hyphen (searching for ABFD-1 or ABFD1 returns rows #1 and #2.) The users maintaining the utility have asked to use wildcards in the old item number. They'd like to insert an entry like this:
+-----+-----------+-----+ |rowid| old | new | +-----+-----------+-----+ | 5 | D7*D-48*6 | EEE | +-----+-----------+-----+
The expectation being that searching for D7RD or DD482 would return row #5. This wildcard request is what has me stumped. I haven't figured out a sensible way to implement it and I'm thinking I need new sets of eyes to lead me down the right path.
A few things that are potentially relevant: (A) the users and the maintainers are not computer people. To them, the asterisk does not mean one or more characters, it just means any single letter or number in that position. (B) the database will only contain a couple thousand rows so loading the entire database into memory is feasible. (C) the asterisk will only be the character class [A-Z0-9]
One option I've tested is to take any old item number with an asterisk and pre-generate all possible combinations. The challenge with this option is that there are old item numbers that have 5 asterisks which means creating 60 million additional entries for that one item. I've also tried building the 60 million rows in memory and searching against those but the application performance degrades as more wildcard entries are added into the database.
Do you have any suggestions for perl-ish ways to tackle this? I don't really need code examples, just ideas that would be performant.
In reply to Partial Searches Against Multiple Wildcards by p_jacobson
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |