I have a home-brew archive-to-DVD app that chunks data into directories and builds DVDs from that data. It also takes all of the fully qualified directory info and stores it in a text file along with the number of the archive DVD it was written to. End users can search this index with a search string such as "this really blows !stupid fun", which will return:
"/data2/studio/projects/clientname/print/this really/funermaker/santa blows/thefile.tif"
exists on Archive-dvd-000002122
but not this:
"/data2/studio/projects/clientname/print/this really/funermaker/santa blows stupid/thefile.tif"
because it has "stupid" in the fully qualified file name.
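To make the matching rule concrete, here is a minimal sketch of it. This is not the actual application code. It assumes each word is a plain, case-sensitive substring test (which is why "fun" matches "funermaker"), that a leading "!" marks a word that must not appear, and that the names parse_query and path_matches are made up for illustration.

```perl
use strict;
use warnings;

# Split a query into required words and excluded words (those prefixed with "!").
sub parse_query {
    my ($query) = @_;
    my (@required, @excluded);
    for my $word (split ' ', $query) {
        if ($word =~ /^!(.+)/) { push @excluded, $1 }
        else                   { push @required, $word }
    }
    return (\@required, \@excluded);
}

# True when the path contains every required word and none of the excluded ones.
# Assumption: plain substring matching, case-sensitive.
sub path_matches {
    my ($path, $required, $excluded) = @_;
    for my $word (@$required) {
        return 0 if index($path, $word) < 0;
    }
    for my $word (@$excluded) {
        return 0 if index($path, $word) >= 0;
    }
    return 1;
}

my ($required, $excluded) = parse_query('this really blows !stupid fun');

for my $path (
    '/data2/studio/projects/clientname/print/this really/funermaker/santa blows/thefile.tif',
    '/data2/studio/projects/clientname/print/this really/funermaker/santa blows stupid/thefile.tif',
) {
    print path_matches($path, $required, $excluded) ? "match:    " : "no match: ", $path, "\n";
}
```

The first path is reported as a match and the second is not, because it contains the excluded word.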
That is all fine and dandy. My issue is that I have about 420 million items so far (and it grows weekly), and my search is starting to take a long time to show all of the results. I have optimized the search to use qr// regexes for the words (in order of word length) and to short-circuit out of the loop as soon as one of the qr// patterns fails, to save time.
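The current approach looks roughly like the sketch below. It is a simplified illustration rather than the real code, and several details are assumptions: the words are sorted longest first (on the guess that longer words reject more lines), the index holds one "path<TAB>disc" record per line, the second record's disc label is made up, and the names compile_query and search_index are hypothetical. The demo reads from an in-memory filehandle so it runs without the real index file.

```perl
use strict;
use warnings;

# Compile each query word once with qr//; required words are ordered
# longest first (assumption: longer words are the most selective).
sub compile_query {
    my ($query) = @_;
    my (@must, @must_not);
    for my $word (split ' ', $query) {
        if ($word =~ /^!(.+)/) { push @must_not, qr/\Q$1\E/ }
        else                   { push @must, [ length $word, qr/\Q$word\E/ ] }
    }
    @must = map { $_->[1] } sort { $b->[0] <=> $a->[0] } @must;
    return (\@must, \@must_not);
}

# Scan index records from a filehandle and skip to the next record
# as soon as any pattern rules the current one out.
# Assumed record format: "path<TAB>disc", one per line.
sub search_index {
    my ($fh, $must, $must_not) = @_;
    my @hits;
    LINE: while (my $line = <$fh>) {
        chomp $line;
        my ($path, $disc) = split /\t/, $line, 2;
        for my $re (@$must)     { next LINE if $path !~ $re }
        for my $re (@$must_not) { next LINE if $path =~ $re }
        push @hits, [ $path, $disc ];
    }
    return \@hits;
}

# Two sample records; the second disc label is invented for the demo.
my $index = join '', map { "$_\n" }
    "/data2/studio/projects/clientname/print/this really/funermaker/santa blows/thefile.tif\tArchive-dvd-000002122",
    "/data2/studio/projects/clientname/print/this really/funermaker/santa blows stupid/thefile.tif\tArchive-dvd-EXAMPLE";

open my $fh, '<', \$index or die "cannot open in-memory index: $!";
my ($must, $must_not) = compile_query('this really blows !stupid fun');
my $hits = search_index($fh, $must, $must_not);
close $fh;

print "\"$_->[0]\"\nexists on $_->[1]\n" for @$hits;
```

Every record is still read and tested, so the cost grows linearly with the size of the index no matter how early each individual record is rejected.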
I am now jumping back on this to see if there is a "better way" to do it. Currently the search takes about 1 minute to return all of the matches; my goal is to get it down to less than 10 seconds if possible. I think I could use one of the many btree searches out there, but I think the index size would be way too huge for this (it's already very large).
If you have any suggestions, please let me know. I want to see how the rest of you would tackle this.
-Waswas