If I understand correctly, you want to build a searchable index of your documents, with 10,000 distinct indexable terms, and 100,000 synonyms for those terms. The global substitution seems like a bad way to go, and it's not clear to me how you're going to only do 10,000 subs. It seems like you should have 100,000 keys and 10,000 values in your %termhash.

I think what you'd want to do is, given a synonym hash, make a reverse mapping that keys the unique values to a regex that alternates among all the keys.

my %synhash = ( meat => 'meat', ham => 'meat', beef => 'meat'); my %revsynhash; # Collect the synonyms as an array while (my ($k,$v) = each %synhash) { push @{$revsynhash{$v}}, $k; } # Turn the arrays into alternative-lists, longest first while (my $k = each %revsynhash) { $revsynhash{$k} = join '|', sort {length $b <=> length $a} @{$revsyn +hash{$k}}; } # Now you have # %revsynhash = ( meat => 'beef|meat|ham')
So you run your indexing scheme using the 10,000 items in %revsynhash, the keys being the indexed term, while the values are the regexes to count as hits for those terms. When someone wants to look up a term, you use %synhash to translate it, and hit your index for what it translated to.

Caution: Contents may have been coded under pressure.

In reply to Re^3: Hash versus substitutation efficiency? by Roy Johnson
in thread Hash versus substitutation efficiency? by bwelch

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.