I would encourage you to question your motives in this task. Why do you think you need to have just 12 records per output file? And what's wrong, really, with having just a single file with 5000+ records in it?

If you're bothered now by the extra time it takes to create 400+ new files, there's a good chance you'll be bothered again whenever you need to do a global scan of those 400+ files later on (e.g. to search for some particular string).

In your later reply, you say that reading/parsing the one big file seems to take very little time, and most of the time during your split routine is spent handling the open/write/close on all the little files. This is a normal outcome, which you will also observe when reading data back from all those little files. So, what benefit do you get from the little files that will offset the price you pay in extra run time?

If you're trying to improve access time to any chosen record in the set by reducing the size of the file that must be read to fetch that record, there are better ways to do it that don't involve writing tons of little files.

For instance, create an index table of byte offsets for each record within the one big file; if each record is uniquely identified by some sort of "id" field in the XML structure, store that id together with the record's byte offset. Then to read a record back, look up its offset in the index, use the "seek()" function to go directly to that record, read to the end of that record, and parse it. That's a simple technique, and it would be hard to come up with a faster access method than that.
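Here is a minimal sketch of that index-and-seek idea. It is written in Python only for illustration; Perl's seek/tell/read work the same way. The record layout is an assumption, not something taken from your data: each record is taken to be a non-nested <record id="..."> ... </record> element, and the names build_index and fetch_record are made up for this example. As a small variation on "read to the end of that record", the index also stores each record's byte length, so the fetch is a single read.

```python
import re
import xml.etree.ElementTree as ET

# Assumed (hypothetical) layout: non-nested <record id="..."> ... </record> elements.
RECORD_RE = re.compile(rb'<record\s+id="([^"]+)".*?</record>', re.DOTALL)


def build_index(path):
    """Scan the big file once; map each record id to (byte offset, byte length)."""
    with open(path, "rb") as fh:
        data = fh.read()
    return {
        m.group(1).decode(): (m.start(), m.end() - m.start())
        for m in RECORD_RE.finditer(data)
    }


def fetch_record(path, index, record_id):
    """Seek straight to one record, read only its bytes, and parse it."""
    offset, length = index[record_id]
    with open(path, "rb") as fh:
        fh.seek(offset)
        chunk = fh.read(length)
    return ET.fromstring(chunk)
```

Once the index has been built (and, if you like, saved to disk so it doesn't have to be rebuilt on every run), each lookup costs one open, one seek and one short read, no matter how large the big file is.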


In reply to Re: Quickest way to write multiple files by graff
in thread Quickest way to write multiple files by Anonymous Monk
