dear monks, I have an odd need. I want to do an incremental search on words that sit in many files. so, first I build a hash that maps each word to the files it appears in, such as
    michael => "file1.txt"
    mike => "file1.txt,file30.txt"
    ...
now, I would like to pull out all entries whose keys match a pattern (here, a prefix), in the spirit of this pseudocode:
my %mi_results = $myhashlike{ "mi*" };
this is not hard if the hash is small. first, put all the keys into an array, then do a grep-match on the keys, and then extract the results from %myhashlike.
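
for concreteness, here is a minimal sketch of that small-hash approach. the sample data and the helper name prefix_lookup are made up for illustration, and the "mi*" pattern is treated as a plain prefix match. it scans every key on each lookup, which is exactly why it does not scale.

```perl
use strict;
use warnings;

# Hypothetical sample index: word => comma-separated list of files.
my %myhashlike = (
    michael => "file1.txt",
    mike    => "file1.txt,file30.txt",
    mary    => "file2.txt",
);

# Return the entries whose key starts with the given prefix.
# (prefix_lookup is an illustrative name, not an existing module function.)
sub prefix_lookup {
    my ($index, $prefix) = @_;

    # grep over all keys: a linear scan on every call
    my @hits = grep { /^\Q$prefix\E/ } keys %$index;

    # a hash slice copies the matching key/value pairs
    my %results;
    @results{@hits} = @{$index}{@hits};
    return %results;
}

my %mi_results = prefix_lookup(\%myhashlike, "mi");
print "$_ => $mi_results{$_}\n" for sort keys %mi_results;
```

with the sample data above, %mi_results holds the entries for michael and mike but not mary.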

unfortunately, I may have up to 300 million words (keys) from 30,000 files in my hash.

what's a good solution for this sort of problem? are there databases that allow regex key searches and would be suitable (esp. if they can cache intelligently)? any perl solutions? is there such a thing as a memory-efficient (say, read-only, squeezed) hash?

advice appreciated.

/iaw


In reply to memory-efficient hash kind for incremental sort by iaw4
