I'm assuming the change only let the process work through the first 65KB of the file?

No. It did process the whole file, but in 64k chunks.

It ran more quickly because, if the file contains no newlines, -p reads the whole file in as one huge single line.
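A small illustration of the mechanism, using an in-memory filehandle in place of the real file (the `records` helper and the 20-byte sample data are made up for the example). With the default `$/` of `"\n"`, data that contains no newline comes back as one record; setting `$/` to a reference to an integer makes readline return fixed-size records instead:

```perl
use strict;
use warnings;

# Hypothetical helper: read every record from $data using the given $/ value.
sub records {
    my ($data, $sep) = @_;
    open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
    binmode $fh;         # treat the data as raw bytes
    local $/ = $sep;     # "\n" = line mode; \N = fixed N-byte records
    my @recs = <$fh>;
    close $fh;
    return @recs;
}

my $data = "\x00\x42\x00\x11" x 5;   # 20 bytes, no newline anywhere

my @lines  = records($data, "\n");   # default line mode
my @chunks = records($data, \8);     # 8-byte records, like $/ = \65536 in miniature

printf "line mode: %d record(s)\n", scalar @lines;             # line mode: 1 record(s)
printf "record mode: %s\n", join ',', map { length } @chunks;  # record mode: 8,8,4
```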

As pointed out above, the problem with processing a file in chunks is that if the search term straddles a 64k chunk boundary (say, 2 bytes at the end of one chunk and 2 bytes at the beginning of the next), then it won't match and the substitution won't be made.
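The failure is easy to reproduce in miniature. This sketch substitutes within each fixed-size record independently, as `-p` does when `$/` is set to a record size, and places the 4-byte search term across the first record boundary. The helper name `chunked_subst` and the 8-byte record size are illustrative stand-ins for the real 64k records:

```perl
use strict;
use warnings;

my $search  = "\x00\x42\x00\x11";
my $replace = "\x00\x42\x00\xf0";

# Hypothetical helper: apply the substitution to each $size-byte record on its own.
sub chunked_subst {
    my ($data, $size) = @_;
    open my $fh, '<', \$data or die "cannot open in-memory handle: $!";
    binmode $fh;
    local $/ = \$size;
    my $out = '';
    while (defined(my $rec = <$fh>)) {
        $rec =~ s/\Q$search\E/$replace/g;
        $out .= $rec;
    }
    close $fh;
    return $out;
}

# 6 filler bytes, then the term: 2 bytes end record 1, 2 bytes start record 2.
my $data = ('.' x 6) . $search . ('.' x 6);
(my $whole = $data) =~ s/\Q$search\E/$replace/g;   # whole-buffer result
my $chunked = chunked_subst($data, 8);

print $chunked eq $data ? "chunked: match missed\n" : "chunked: replaced\n";  # chunked: match missed
print $whole   ne $data ? "whole:   replaced\n"     : "whole:   missed\n";    # whole:   replaced
```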

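The really simple solution to that is to process the file twice, with different buffer sizes chosen to be relatively prime. You might use 1MB for the first pass and 1MB - 3 bytes for the second. This ensures that a 4-byte match missed at a boundary in the first pass will not fall on a boundary in the second pass, for files up to a few hundred GB anyway. The two sets of boundaries only coincide exactly at about 1024GB (the product of the two sizes), but a 4-byte match can be split by both passes wherever the boundaries come within 2 bytes of each other, which first happens at roughly 341GB.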
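The size limit can be checked numerically. This sketch walks the pass-1 boundaries and returns the first offset at which a match of the given length would be split by a boundary in both passes. `first_double_miss` is a hypothetical helper, and it assumes a 64-bit perl so the offsets fit in native integers:

```perl
use strict;
use warnings;

# Hypothetical helper. A boundary at offset $b splits the match that starts at
# $p when $p < $b < $p + $len. Pass 1 reads $size1-byte records, pass 2 reads
# $size2-byte records. Returns undef if no match is split by both passes
# within the first $max_records pass-1 boundaries.
sub first_double_miss {
    my ($size1, $size2, $len, $max_records) = @_;
    for my $k (1 .. $max_records) {
        my $b1 = $k * $size1;                       # k-th pass-1 boundary
        for my $i (1 .. $len - 1) {
            my $p  = $b1 - $len + $i;               # match start split by $b1
            my $b2 = $p - ($p % $size2) + $size2;   # first pass-2 boundary after $p
            return $p if $b2 < $p + $len;           # split by pass 2 as well
        }
    }
    return;
}

my $p = first_double_miss(2**20, 2**20 - 3, 4, 400_000);
printf "first double miss at byte %d (about %.0f GB)\n", $p, $p / 2**30;
```

With 1MB and 1MB - 3 this reports byte 366502477822, roughly 341GB. With two equal sizes, e.g. `first_double_miss(8, 8, 4, 10)`, it returns 5 straight away, which is why the sizes must differ.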

So,

perl -e"BEGIN{$/=\(1024**2) }" -pe "s/\x00\x42\x00\x11/\x00\x42\x00\xf0/sg" infile >outfile1
perl -e"BEGIN{$/=\(1024**2-3)}" -pe "s/\x00\x42\x00\x11/\x00\x42\x00\xf0/sg" outfile1 >outfile2
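For reference, here are the same two passes as a script. It sidesteps shell quoting and opens every file with the `:raw` layer so that no newline translation can touch the binary data (on Windows, Perl's default layers include `:crlf`). The subroutine names and the use of File::Temp for the intermediate file are illustrative choices, not part of the one-liners:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my $search  = "\x00\x42\x00\x11";
my $replace = "\x00\x42\x00\xf0";

# One pass: read $in_path in fixed $size-byte records and write the
# substituted records to $out_path, byte for byte (:raw, no CRLF translation).
sub subst_pass {
    my ($in_path, $out_path, $size) = @_;
    open my $in,  '<:raw', $in_path  or die "open $in_path: $!";
    open my $out, '>:raw', $out_path or die "open $out_path: $!";
    local $/ = \$size;
    while (defined(my $rec = <$in>)) {
        $rec =~ s/\Q$search\E/$replace/g;
        print {$out} $rec or die "write $out_path: $!";
    }
    close $out or die "close $out_path: $!";
    close $in;
}

# Two passes with coprime record sizes: 1MB, then 1MB - 3.
sub two_pass {
    my ($in_path, $out_path) = @_;
    my (undef, $tmp) = tempfile(UNLINK => 1);   # intermediate file, removed at exit
    subst_pass($in_path, $tmp,      1024**2);
    subst_pass($tmp,     $out_path, 1024**2 - 3);
}

two_pass(@ARGV) if @ARGV == 2;   # usage: perl two_pass.pl infile outfile
```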

Two passes are obviously slower than one, but much faster than loading the whole damn file into RAM on a constrained machine.

I assume this last point is the cause of the performance difference between your Linux and Windows setups. If the former has enough free RAM to load the whole file in one pass, and the latter does not and starts swapping, that explains the difference.

Another alternative would be to use a sliding buffer, but that is too complicated for a one-liner, and it often doesn't perform well enough to beat the two-pass approach.
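For comparison, a minimal sketch of the sliding-buffer idea, limited to a literal byte string (it uses `index` rather than a regex). `replace_stream` is a hypothetical helper, and the caller is assumed to have opened both handles with `:raw`. After each read it writes out everything that can no longer take part in a match and carries the rest, at most length($search) - 1 bytes, into the next read:

```perl
use strict;
use warnings;

# Hypothetical helper: stream $in to $out, replacing every non-overlapping
# occurrence of the literal $search with $replace, reading $chunk_size bytes
# at a time. Gives the same result as s/\Q$search\E/$replace/g on the whole input.
sub replace_stream {
    my ($in, $out, $search, $replace, $chunk_size) = @_;
    die "search string must not be empty\n" unless length $search;
    my $keep  = length($search) - 1;   # longest possible partial match
    my $carry = '';                    # unmatched tail from the previous read
    while (1) {
        my $n = read($in, my $chunk, $chunk_size);
        die "read failed: $!" unless defined $n;
        last if $n == 0;
        my $buf = $carry . $chunk;
        my $pos = 0;                   # bytes before $pos are already written
        while ((my $i = index($buf, $search, $pos)) >= 0) {
            print {$out} substr($buf, $pos, $i - $pos), $replace;
            $pos = $i + length $search;
        }
        # index() found nothing at or after $pos, so no match can ever start in
        # [$pos, length - $keep): write those bytes. Only the last $keep bytes
        # might begin a match that the next read completes.
        my $safe = length($buf) - $keep;
        if ($safe > $pos) {
            print {$out} substr($buf, $pos, $safe - $pos);
            $pos = $safe;
        }
        $carry = substr($buf, $pos);
    }
    print {$out} $carry;               # at EOF the tail cannot complete a match
}
```

For the file in this thread it would be called as `replace_stream($in, $out, "\x00\x42\x00\x11", "\x00\x42\x00\xf0", 1024**2)`. Whether it actually beats the two-pass version on a given machine would need measuring.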


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy

In reply to Re^5: Slow find/replace hex in Perl win32 by BrowserUk
in thread Slow find/replace hex in Perl win32 by rickyboone
