rickyboone has asked for the wisdom of the Perl Monks concerning the following question:

My apologies if this comes across as a newbie question. I'm not a Perl developer, but am trying to use it within an automation process, and I've hit a snag.

The following command runs quickly (a few seconds) on my Linux system (Ubuntu 9.10 x64, Perl 5.10), but is extremely slow on a Windows system (Windows 2003 x86, Strawberry Perl 5.12.1.0).

perl -pe 's/\x00\x42\x00\x11/\x00\x42\x00\xf0/sgx' inputfile > outputfile

The pattern find/replace is intended to fix EBCDIC carriage control characters in a file that is between 500MB and 2GB in size. I'm not sure this is the most efficient way to do it, but it seems to do the trick... if only it would run quickly on the Windows system it needs to run on.
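
For reference, the one-liner expands to roughly the following script. The :raw layers are an addition here (an assumption that the data must be handled as raw bytes; on Windows the default text mode can translate line endings in binary data), and the /s and /x modifiers are dropped because they have no effect on this pattern:

#!/usr/bin/perl
use strict;
use warnings;

# Roughly equivalent to the one-liner above, with binary-safe I/O made
# explicit. The :raw layers are an assumption, not part of the original.
open my $in,  '<:raw', 'inputfile'  or die "Can't read inputfile: $!";
open my $out, '>:raw', 'outputfile' or die "Can't write outputfile: $!";

while (my $chunk = <$in>) {          # reads up to each "\n", as -p does
    $chunk =~ s/\x00\x42\x00\x11/\x00\x42\x00\xf0/g;
    print {$out} $chunk;
}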

Any thoughts?

Re: Slow find/replace hex in Perl win32
by BrowserUk (Patriarch) on Sep 29, 2010 at 19:22 UTC

    On a 2GB/5.7 million line file, it consistently runs in 30-35 seconds on my AS1007 perl:

    [20:14:52.61] C:\test>dir 834245.masks
     Volume in drive C has no label.
     Volume Serial Number is 8C78-4B42

     Directory of C:\test

    18/04/2010  01:02     2,412,431,484 834245.masks
                   1 File(s)  2,412,431,484 bytes
                   0 Dir(s)  296,257,802,240 bytes free

    [20:16:19.67] C:\test>perl -ne "s/\x00\x42\x00\x11/\x00\x42\x00\xf0/sg" 834245.masks >junk.dat
    [20:16:55.62] C:\test>

    Which is only a few seconds longer than wc -l takes just to count the lines.

    How long does your SP 5.12 take?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      You're using perl -ne without a print in the executed chunk. Does junk.dat have any size?

      Not that that is likely to affect reading the source file ...

      As Occam said: Entia non sunt multiplicanda praeter necessitatem ("entities must not be multiplied beyond necessity").

        You're right. That's a typo. It takes five minutes when writing the data back to the disk.

        [22:01:55.40] C:\test>perl -pe "s/\x00\x42\x00\x11/\x00\x42\x00\xf0/sg" 834245.masks >junk.dat
        [22:06:46.76] C:\test>perl -pe "s/\x00\x42\x00\x11/\x00\x42\x00\xf0/sg" 834245.masks >junk.dat
        [22:09:31.99] C:\test>

        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
      I haven't let it finish; I've let it run for over 30 minutes before having to kill it. Watching performance stats for the perl.exe process, CPU usage hovers around 50%, there is very little I/O (60KB/s or less), and the private memory footprint just keeps increasing.
        the private memory footprint just keeps increasing.

        On the example I gave, the process memory didn't get above the 3.2MB start-up footprint.

        It sounds like the file has no (Windows-recognisable) newlines, so -pe is trying to load the entire file into memory as a single line?

        If so, you may have to resort to processing the file in blocks. Try using:

        perl -e"BEGIN{ $/ = \65536 }" -pe "s/\x00\x42\x00\x11/\x00\x42\x00\xf0 +/sgx"

        And see what difference, if any, that makes.
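
        One caveat (an addition here, not something raised in the thread): with fixed-size records, a 4-byte match can straddle a block boundary and be missed. A sketch that holds back the last three bytes of each block so boundary-spanning matches are still found:

        #!/usr/bin/perl
        use strict;
        use warnings;

        # Process the file in 64KB blocks; hold back the last 3 bytes of
        # each block so a 4-byte match spanning two blocks is not missed.
        open my $in,  '<:raw', $ARGV[0] or die "Can't read $ARGV[0]: $!";
        open my $out, '>:raw', $ARGV[1] or die "Can't write $ARGV[1]: $!";

        my $carry = '';
        while (read $in, my $block, 65536) {
            $block = $carry . $block;
            $block =~ s/\x00\x42\x00\x11/\x00\x42\x00\xf0/g;
            if (length($block) > 3) {
                $carry = substr $block, -3, 3, '';    # keep tail for next read
            }
            else {
                ($carry, $block) = ($block, '');
            }
            print {$out} $block;
        }
        print {$out} $carry;                          # flush the held-back tail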


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.

        Try disabling your anti-virus's real-time protection temporarily.

        You could also be running out of memory if you have long sequences without any 0x0A.

Re: Slow find/replace hex in Perl win32
by TomDLux (Vicar) on Sep 29, 2010 at 20:54 UTC

    To figure out what is happening, I would start by adding some print statements: one just before and one just after opening the file, another just after reading a line, and so on.

    That should help narrow down where your processing is hanging up.

    Once you've got it running through the loop and it seems to be working, comment out the prints and time the program processing a 1-line file to completion, then 10-, 100-, 1,000-, 10,000-, and 100,000-line data files. What's the trend? What's the expected processing time for 42 million lines?
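
    A minimal harness for that kind of measurement might look like this (the t*.dat file names are hypothetical; Time::HiRes ships with core Perl):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Time::HiRes qw(gettimeofday tv_interval);

    # Time the substitution over progressively larger test files to see
    # whether the cost grows linearly or worse.
    for my $file (qw(t1.dat t10.dat t100.dat t1000.dat)) {
        my $t0 = [gettimeofday];
        system(qq{perl -pe "s/\\x00\\x42\\x00\\x11/\\x00\\x42\\x00\\xf0/g" $file > $file.out}) == 0
            or die "perl failed on $file: $?";
        printf "%-12s %.2f seconds\n", $file, tv_interval($t0);
    }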

    As Occam said: Entia non sunt multiplicanda praeter necessitatem ("entities must not be multiplied beyond necessity").

      Just to clarify, there are no line-endings in this file (at least not in ASCII).

      I do think I found the problem, though. I didn't realize that perl was trying to find the end of each line. Searching on that, I found "slurp mode": -0777 (undefined record separator). Following a few other recommendations, I also reduced the s///sgx options to just s///g, since my pattern doesn't need /s or /x. The file is now processed in a matter of seconds, and the output compares properly to files processed "manually" with hex editors.

      perl -0777 -pe "s/\x00\x42\x00\x11/\x00\x42\x00\xf0/g" input > output

      I'm waiting on the availability of another file to test another hex string against, but it won't be available until Oct 1. I think the issue is resolved, but I'd like to wait until then to be sure, unless anyone else has any recommendations or considerations I should be aware of.
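
      One caveat worth noting (an addition; the clean hex-editor comparison above suggests it didn't bite here): on Windows, the default text-mode I/O can translate \x0D\x0A sequences even in slurp mode, so forcing binary mode may be safer for data like this (input/output stand in for the real file names):

      perl -0777 -pe "BEGIN{ binmode STDIN; binmode STDOUT } s/\x00\x42\x00\x11/\x00\x42\x00\xf0/g" < input > output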

        Okay, well I think the code is doing what I want it to do; however, I've run into a new problem: "Out of memory!" errors. The file is greater than 2GB, which is more than the 2GB of address space a 32-bit Windows process gets by default. I'm going to try booting the server with /3GB or /PAE to work around the issue.
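
        If the boot switches don't pan out, a streaming variant keeps memory flat regardless of file size; a sketch combining the fixed-record-size idea suggested above with raw-mode handles (the block-boundary caveat and the overlap fix sketched earlier still apply; input/output are placeholders):

        perl -pe "BEGIN{ $/ = \65536; binmode STDIN; binmode STDOUT } s/\x00\x42\x00\x11/\x00\x42\x00\xf0/g" < input > output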