It could be that you are making this far more complicated than your application needs!

This is the first chunk. // This is the second chunk. // This is the third chunk. //
Some strategies:
1. Build a memory-resident structure holding the data that you need, all in one pass through the data file. Fancy hash table structures, etc.
2. Search the file again and again and let the O/S do the "dirty work". Use a regex and just do something that appears "stupid".
3. Create a DB (which is expensive) and then query that DB.
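For comparison, strategy 1 might look roughly like the sketch below. It assumes, purely for illustration, that each line is a key followed by whitespace and a value; the real file format is not given in this thread, and build_index is a made-up name. An in-memory filehandle stands in for the data file so the example runs on its own.

```perl
use strict;
use warnings;

# Hypothetical helper: make ONE pass over the input and build a hash
# mapping each key to an array of its values.
# Assumed line format (not from the original post): "key value".
sub build_index {
    my ($fh) = @_;
    my %index;
    while (my $line = <$fh>) {
        chomp $line;
        my ($key, $value) = split ' ', $line, 2;
        next unless defined $key && defined $value;   # skip blank/short lines
        push @{ $index{$key} }, $value;
    }
    return \%index;
}

# Stand-in for the real data file: an in-memory filehandle.
my $data = "alpha 1\nbeta 2\nalpha 3\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my $index = build_index($fh);
close $fh;

# Every later lookup is a hash access; the file is never read again.
print "alpha => @{ $index->{alpha} }\n";
```

The trade-off is that the whole index has to fit in memory and you have to decide up front what to key on.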

I mean, how big is this file? If it is "small", like 250 MB, then after the first search it all winds up memory resident anyway (in the O/S file cache). Subsequent searches (even linear ones) can be 10x or more faster.

I recommend option 2: do something stupid and let the O/S do the work. If that is not "fast enough", then start thinking about option 1 or 3.
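A minimal sketch of what "something stupid" could look like: a plain line-by-line regex scan that is simply repeated for each query. The helper name grep_file, the demo file name, and the sample lines are all made up for illustration; substitute your own file and patterns.

```perl
use strict;
use warnings;

# Hypothetical helper: scan the file at $path line by line and return
# every line that matches the compiled regex $re.
sub grep_file {
    my ($path, $re) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    my @hits;
    while (my $line = <$fh>) {
        chomp $line;
        push @hits, $line if $line =~ $re;
    }
    close $fh;
    return @hits;
}

# Small throwaway file so the sketch runs as-is (made-up sample data).
my $path = "demo_data_$$.txt";
open my $out, '>', $path or die "Cannot write $path: $!";
print {$out} "alpha 1\n", "beta 2\n", "alpha 3\n";
close $out;

# Each query is just another full pass over the file. After the first
# pass the O/S will usually be serving the reads from its file cache.
my @alpha = grep_file($path, qr/^alpha/);
my @beta  = grep_file($path, qr/^beta/);
print scalar(@alpha), " alpha lines, ", scalar(@beta), " beta lines\n";

unlink $path;
```

Time it on your real file before reaching for anything fancier.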

If, say, there are only 5,000 files and the total DB size is 500 MB, do something easy. This is actually considered "small"! Don't get complex until you need to!

Update: Anyway, you will be amazed at how quickly even a linear regex search works on a huge file once you have done it once before. On Win XP, this holds for file sizes < 1 GB.


In reply to Re: split file into smaller chunks by Marshall
in thread split file into smaller chunks by Anonymous Monk
