Hello Monks. I'm building a cross reference utility in Perl and I've run into something that has me stumped. The utility is very basic, users search for an old item number and the utility returns a list of matches. The database is simple:

+-----+-----------+-----+ |rowid| old | new | +-----+-----------+-----+ | 1 | ABFD-1234 | AAA | | 2 | ABFD-178G | BBB | | 3 | F2HB-9401 | AAA | | 4 | ZDDR-00W5 | DDD | +-----+-----------+-----+

I've implemented the ability to search anywhere in the old item number (searching for 940 returns row #3) and the ability to not enter the hyphen (searching for ABFD-1 or ABFD1 returns rows #1 and #2.) The users maintaining the utility have asked to use wildcards in the old item number. They'd like to insert an entry like this:

+-----+-----------+-----+ |rowid| old | new | +-----+-----------+-----+ | 5 | D7*D-48*6 | EEE | +-----+-----------+-----+

The expectation being that searching for D7RD or DD482 would return row #5. This wildcard request is what has me stumped. I haven't figured out a sensible way to implement it and I'm thinking I need new sets of eyes to lead me down the right path.

A few things that are potentially relevant: (A) the users and the maintainers are not computer people. To them, the asterisk does not mean one or more characters, it just means any single letter or number in that position. (B) the database will only contain a couple thousand rows so loading the entire database into memory is feasible. (C) the asterisk will only be the character class [A-Z0-9]

One option I've tested is to take any old item number with an asterisk and pre-generate all possible combinations. The challenge with this option is that there are old item numbers that have 5 asterisks which means creating 60 million additional entries for that one item. I've also tried building the 60 million rows in memory and searching against those but the application performance degrades as more wildcard entries are added into the database.

Do you have any suggestions for perl-ish ways to tackle this? I don't really need code examples, just ideas that would be performant.


In reply to Partial Searches Against Multiple Wildcards by p_jacobson

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.