I noticed a couple of things about your code that might help to speed things up a little.

The first is that you are stripping trailing spaces from your fields with a regex. 80+ calls into the regex engine per line is going to be quite expensive, and is probably unnecessary. You don't show us what unpack template you are using, but if you can use the template char 'A' to unpack your fields, then there is no need to take additional steps to trim trailing spaces, as this will be done for you. E.g.

print "'$_' " for unpack '(A5)5', 'abcdeabcd abc  ab   a    ';
'abcde' 'abcd' 'abc' 'ab' 'a'

You need perl 5.8 to use the '(Ann)*' group syntax, but with earlier versions of perl you can achieve the same effect using

my $template = 'A5' x 80;
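
As a quick check of the repeated-template form, here is a minimal sketch. The sample record and the field width of 5 are made up for illustration; they are not taken from your data.

```perl
use strict;
use warnings;

# Hypothetical sample record: five 5-char fields, padded with spaces.
my $line = 'abcdeabcd abc  ab   a    ';

# Pre-5.8 form: build the template by repeating the field spec.
my $template = 'A5' x 5;    # 'A5A5A5A5A5'

# 'A' strips trailing spaces from each field as it is unpacked.
my @fields = unpack $template, $line;

print join( '|', @fields ), "\n";    # abcde|abcd|abc|ab|a
```

On 5.8 or later, '(A5)5' should give the same list from the same input.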

Also, the way you are building your $record var is less efficient than it could be. Once you have removed the need for the regex, you can more simply CSVify the fields using join, reducing the body of the while loop to

print '"' . join( '","', unpack '(A15)86', $_ ), "\"\n";

This removes the need for the intermediates @values, $record and $field, and for the chop, which should further improve things.

This assumes that your fields don't contain any "s that would need escaping, which is what your code seems to indicate.
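
If that assumption turns out not to hold, one common CSV convention is to double any embedded quote. A sketch, assuming that convention suits whatever reads the file; csv_line is a hypothetical helper, not something from your code:

```perl
use strict;
use warnings;

# Hypothetical helper: quote every field, doubling any embedded double quote.
sub csv_line {
    my @fields = @_;                  # work on copies, leave the caller's data alone
    s/"/""/g for @fields;
    return '"' . join( '","', @fields ) . qq{"\n};
}

print csv_line( 'abc', 'say "hi"', '' );
```

Note that this puts a substitution back on every field, so it is only worth paying for if quotes can actually occur in your data.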

You seem to be running without strict and without using my. It is worth noting that lexical vars are generally faster than globals, although if the above changes are possible, it pretty much removes the need for either.

Your idea of accumulating 100 or so lines of output together before printing them is likely to backfire. Given the length of your lines, building up a buffer of 130k in several thousand (80 fields x 100 lines) steps is likely to cause lots of reallocating and copying of memory. NOTE: This is speculation. It may be that perl is clever enough to re-use the largest lump of memory for the second and subsequent batches, but given the stepwise manner in which it would be accumulated, it probably isn't.

It would probably be worthwhile ensuring that you have buffering turned on for STDOUT. Perl is probably already quite adept at buffering the output in a fairly optimal fashion, given the chance.
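
For reference, buffering on the currently selected output handle is controlled by $|. A sketch, assuming STDOUT is the selected handle; the sample lines and the '(A5)3' template are made up for illustration:

```perl
use strict;
use warnings;

# $| is false by default, which leaves output buffered. Setting it
# explicitly guards against an earlier "$| = 1" elsewhere in the script.
$| = 0;

# Hypothetical fixed-width records: three 5-char fields each.
my @sample = ( 'abcdeabcd abc  ', 'x    y    z    ' );

for my $line (@sample) {
    # One print per record; perl's own buffering batches the actual writes.
    print '"' . join( '","', unpack '(A5)3', $line ), "\"\n";
}
```

With buffering left on, each print should just append to perl's output buffer rather than forcing a write per line, so there is little to gain from batching by hand.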


Examine what is said, not who speaks.
"Efficiency is intelligent laziness." -David Dunham
"When I'm working on a problem, I never think about beauty. I think only how to solve the problem. But when I have finished, if the solution is not beautiful, I know it is wrong." -Richard Buckminster Fuller



In reply to Re: Re: Re: What's the most efficient way to write out many lines of data? by BrowserUk
in thread What's the most efficient way to write out many lines of data? by Anonymous Monk