Hi,

Well, the fellow monks have posted some interesting solutions, and that is why this forum is so much better than any other, but I think the real answer, the one that meets your requirements, is: "You cannot!"
What do I mean by this? I have been in the string business for a while now, dealing with the exact problem you have stated and with some that are a bit more complicated. First, you will not meet your RAM requirements in Perl. Try C, and even then you will have some high constants associated with your data structure. Second, the best searches, running in time linear in the size of the pattern, can be achieved with suffix trees (Ukkonen).
There is a Perl module (module) that is extremely good for this, and I would suggest buying some extra RAM and using it instead of writing the tree yourself in C. It offers O(P) search time, and as for the "additional constants" associated with Perl, you can in fact neglect those, since the tree itself will use much more space (RAM).
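
To make the index-then-search idea concrete, here is a minimal sketch in Python. Note that it is not a suffix tree and not the module mentioned above: it swaps in a naively built suffix array, which answers the same kind of exact-match query with a binary search (roughly O(P log N) per query rather than O(P)). The names build_suffix_array and find_all are made up for this illustration, and the construction shown is far too slow and memory-hungry for GB-sized input.

```python
def build_suffix_array(text):
    """Return suffix start positions sorted by the suffix they begin.

    Naive construction for illustration only; real tools use
    linear-time or near-linear-time algorithms.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_all(text, sa, pattern):
    """Return the sorted start positions of every exact match of pattern."""
    m = len(pattern)

    # Lower bound: first suffix whose m-character prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # Upper bound: first suffix whose m-character prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    # Every suffix in sa[start:lo] begins with pattern.
    return sorted(sa[start:lo])


if __name__ == "__main__":
    dna = "GATTACAGATTACA"
    sa = build_suffix_array(dna)
    print(find_all(dna, sa, "ATTA"))
```

The index is built once and then reused for every pattern, which is the same trade the suffix tree makes: more RAM up front in exchange for search time that depends mostly on the pattern length.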

This was a comment on your statement: "...this is not efficient as I have to deal with file sized over GB's..."

Now as GrandFather always keeps reminding me:

"Note that "efficient" probably isn't the same as "fast"..."

and he is almost always right, maybe this isn't what you need. If I were you I would look into bit-vectors. With them you can compress your DNA by 75% (2 bits per base instead of an 8-bit character), meaning 75% less memory required and a 75% smaller search space. On 2GB of data that will outperform (in terms of effective efficiency) the suffix tree and its O(P) time complexity, even though the search doesn't run in linear time. However, if you are thinking of using it on larger data sets, well, welcome to the club and good luck!
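
A rough sketch of the bit-vector idea, in Python for brevity: each base is stored in 2 bits, so four bases fit in one byte. Everything here is an assumption made for illustration, including the A/C/G/T to 0..3 mapping, the function names, and the fact that the sequence length is kept separately from the packed bytes. It also assumes the input contains only A, C, G and T (no N or other ambiguity codes), and the search shown is a plain naive scan over the packed data, not a tuned implementation.

```python
# Hypothetical 2-bit encoding; any fixed assignment of 0..3 would do.
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def pack(seq):
    """Pack an A/C/G/T string into bytes, four bases per byte."""
    out = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        out[i // 4] |= CODE[base] << (2 * (i % 4))
    return bytes(out)


def base_at(packed, i):
    """Return the 2-bit code of the base at position i."""
    return (packed[i // 4] >> (2 * (i % 4))) & 3


def unpack(packed, n):
    """Recover the first n bases as a string."""
    return "".join(BASES[base_at(packed, i)] for i in range(n))


def find_all_packed(packed, n, pattern):
    """Naive exact search that compares 2-bit codes in the packed data."""
    codes = [CODE[b] for b in pattern]
    m = len(codes)
    return [
        s
        for s in range(n - m + 1)
        if all(base_at(packed, s + j) == codes[j] for j in range(m))
    ]


if __name__ == "__main__":
    dna = "GATTACAGATTACA"
    packed = pack(dna)
    print(len(dna), "bases stored in", len(packed), "bytes")
    print(find_all_packed(packed, len(dna), "ATTA"))
```

The length has to travel with the packed bytes because the last byte may be only partly used, and unused slots would otherwise decode as extra A's.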

Cheers

Baxy


In reply to Re: Exact string matching by baxy77bax
in thread Exact string matching by Anonymous Monk
