Re^2: Slow find/replace hex in Perl win32

Just to clarify, there are no line-endings in this file (at least not in ASCII).

I do think I found the problem, though. I didn't realize that perl was trying to find the end of line. Searching for that, I found "slurp mode", -0777 (undefined record separator). And using a few other recommendations, I also reduced the s///sgx options to just s///g, since my example didn't seem to need s and x. It seems to allow the file to be processed in a matter of seconds, and compares properly to other files processed "manually" with hex editors.

perl -0777 -pe "s/\x00\x42\x00\x11/\x00\x42\x00\xf0/g" input > output
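
For anyone who prefers a script to a one-liner, here is a minimal sketch of the same slurp-and-replace idea. The names `replace_bytes` and `patch_file` are made up for illustration, and the `:raw` layer is my own addition (an assumption, not part of the one-liner above) so that Perl on Windows does not translate line endings in binary data.

```perl
use strict;
use warnings;

# Replace every occurrence of the 4-byte sequence in a buffer held in
# memory and return the modified copy. (Hypothetical helper name.)
sub replace_bytes {
    my ($data) = @_;
    $data =~ s/\x00\x42\x00\x11/\x00\x42\x00\xf0/g;
    return $data;
}

# Slurp a file in binary mode, apply the replacement, write the result.
# (Hypothetical helper name; the whole file must fit in memory.)
sub patch_file {
    my ($in_path, $out_path) = @_;
    open my $in, '<:raw', $in_path or die "Cannot open $in_path: $!";
    local $/;    # undefined record separator: one read returns the whole file
    my $data = <$in>;
    close $in;
    open my $out, '>:raw', $out_path or die "Cannot open $out_path: $!";
    print {$out} replace_bytes($data);
    close $out or die "Cannot close $out_path: $!";
}

# Usage: perl patch.pl input output
patch_file(@ARGV) if @ARGV == 2;
```

Like `-0777`, this holds the entire file in memory at once, so it only suits files that fit in the process's address space.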

I'm waiting on the availability of another file to test another hex string against, but it won't be available until Oct 1. I think the issue is resolved, but I'd like to wait until then to be sure, unless anyone else has any recommendations or considerations I should be aware of.

Re^3: Slow find/replace hex in Perl win32 by rickyboone (Novice) on Oct 07, 2010 at 14:41 UTC
Okay, well, I think the code is doing what I want it to do. However, I've run into a new problem: "Out of memory!" errors. The file is greater than 2GB, which is more than the available memory space for applications in 32-bit Windows. I'm going to try booting the server with /3GB or /PAE to work around the issue.
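
One way to sidestep the memory limit, sketched here under my own assumptions rather than taken from the thread, is to read the file in fixed-size chunks and hold back the last few bytes of each chunk so that a match straddling a chunk boundary is still seen. `replace_stream` and the 1MB chunk size are hypothetical choices for illustration.

```perl
use strict;
use warnings;

# Copy $in to $out, replacing every occurrence of $from with $to, while
# holding roughly one chunk in memory at a time. (Hypothetical helper.)
sub replace_stream {
    my ($in, $out, $from, $to, $chunk_size) = @_;
    die "Search string must not be empty" unless length $from;
    my $keep = length($from) - 1;    # bytes that could start a straddling match
    my $buf  = '';
    while (1) {
        my $n = read($in, my $chunk, $chunk_size);
        die "Read failed: $!" unless defined $n;
        last if $n == 0;             # end of input
        $buf .= $chunk;
        my $pos = 0;
        # Emit text before each match followed by the replacement.
        while ((my $hit = index($buf, $from, $pos)) >= 0) {
            print {$out} substr($buf, $pos, $hit - $pos), $to;
            $pos = $hit + length $from;
        }
        # Emit the rest, except a tail that might begin a match which
        # completes in the next chunk.
        my $safe = length($buf) - $keep;
        $safe = $pos if $safe < $pos;
        print {$out} substr($buf, $pos, $safe - $pos);
        $buf = substr($buf, $safe);
    }
    print {$out} $buf;               # whatever was held back at end of input
}

# Usage: perl stream_patch.pl input output
if (@ARGV == 2) {
    my ($in_path, $out_path) = @ARGV;
    open my $in,  '<:raw', $in_path  or die "Cannot open $in_path: $!";
    open my $out, '>:raw', $out_path or die "Cannot open $out_path: $!";
    replace_stream($in, $out, "\x00\x42\x00\x11", "\x00\x42\x00\xf0", 1024**2);
    close $out or die "Cannot close $out_path: $!";
}
```

Because replacement text is written out immediately and never rescanned, the result should match what a slurped `s///g` produces, at the cost of a little bookkeeping.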
Re^4: Slow find/replace hex in Perl win32 by BrowserUk (Patriarch) on Oct 07, 2010 at 15:12 UTC
"I'm going to try booting the server with /3GB or /PAE to work around the issue."

If that works, it'll only be a matter of time before the file grows bigger than memory again.

Did you try the two-pass solution? A tad slower, but it'll never run out of memory. It can handle files up to 1024GB as posted, using a 1MB buffer. And if 1 terabyte becomes limiting, increasing the buffer size to 2MB means it can handle 4TB. A 4MB buffer takes you to 16TB; and so on.

You can even avoid the need to make two (disk) passes. Simply pipe the output of the first pass to the input of the second:

`perl -e"BEGIN{$/=\(1024**2) }" -pe "s/\x00\x42\x00\x11/\x00\x42\x00\xf0/sg" infile | perl -e"BEGIN{$/=\(1024**2-3)}" -pe "s/\x00\x42\x00\x11/\x00\x42\x00\xf0/sg" >outfile2`

It still makes two passes of the data, but only reads and writes the disk once for each block.

To demonstrate that it works, given the input file fred:

`c:\test>type fred`
`1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890`

Using one pass, with a search term that straddles the buffer boundaries, no changes are made:

`c:\test>perl -e"BEGIN{$/=\10}" -pe" s[8901][abcd]" fred > joe`
`c:\test>type joe`
`1234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890123456789012345678901234567890`

But after two piped passes:

`c:\test>perl -e"BEGIN{$/=\10}" -pe" s[8901][abcd]g" fred | perl -e"BEGIN{$/=\7}" -pe"s[8901][abcd]g" >joe`

The changes are made:

`c:\test>type joe`
`1234567abcd234567abcd2345678901234567abcd2345678901234567abcd2345678901234567abcd234567abcd2345678901234567abcd2345678901234567890`

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy
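
The fred/joe demonstration can also be reproduced inside a single script, which makes it easier to see what each pass does. This is only a sketch: `replace_per_record` is a hypothetical helper, and it uses an in-memory filehandle in place of the real files. Setting `$/` to a reference to an integer makes each read return a fixed-size record, as in the one-liners.

```perl
use strict;
use warnings;

# Apply s/$from/$to/g to each fixed-size record of $data separately,
# the way -p does when $/ is a reference to an integer.
# (Hypothetical helper name.)
sub replace_per_record {
    my ($data, $record_size, $from, $to) = @_;
    open my $fh, '<', \$data or die "Cannot open in-memory handle: $!";
    local $/ = \$record_size;    # each <$fh> returns $record_size bytes
    my $out = '';
    while (my $record = <$fh>) {
        $record =~ s/\Q$from\E/$to/g;
        $out .= $record;
    }
    close $fh;
    return $out;
}

my $fred  = '1234567890' x 13;                                 # same 130 digits as the file fred
my $pass1 = replace_per_record($fred,  10, '8901', 'abcd');    # 10-byte records
my $pass2 = replace_per_record($pass1,  7, '8901', 'abcd');    # 7-byte records
print "$pass2\n";
```

With these tiny buffers the first pass changes nothing, and the second pass catches only the occurrences that fit inside a 7-byte record. A few instances of 8901 are still visible in the joe output, because those occurrences straddle a boundary in both passes. With buffers in the megabyte range such coincidences are far rarer, but it seems worth comparing the result against a known-good copy before relying on it.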