Re: Slow find/replace hex in Perl win32
by BrowserUk (Patriarch) on Sep 29, 2010 at 19:22 UTC
On a 2GB/5.7 million line file, it consistently runs in 30-35 seconds on my AS1007 perl:
[20:14:52.61] C:\test>dir 834245.masks
Volume in drive C has no label.
Volume Serial Number is 8C78-4B42
Directory of C:\test
18/04/2010 01:02 2,412,431,484 834245.masks
1 File(s) 2,412,431,484 bytes
0 Dir(s) 296,257,802,240 bytes free
[20:16:19.67] C:\test>perl -ne "s/\x00\x42\x00\x11/\x00\x42\x00\xf0/sg" 834245.masks >junk.dat
[20:16:55.62] C:\test>
Which is only a few seconds longer than wc -l takes just to count the lines.
How long does your SP 5.12 take?
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
[22:01:55.40] C:\test>perl -pe "s/\x00\x42\x00\x11/\x00\x42\x00\xf0/sg" 834245.masks >junk.dat
[22:06:46.76] C:\test>perl -pe "s/\x00\x42\x00\x11/\x00\x42\x00\xf0/sg" 834245.masks >junk.dat
[22:09:31.99] C:\test>
I haven't let it finish, but I have let it run for over 30 minutes before having to kill it. Watching performance stats for the perl.exe process, the CPU hangs around 50%, there is very little I/O (60KB/s or less), and the private memory footprint just keeps increasing.
"the private memory footprint just keeps increasing."
On the example I gave, the process memory didn't get above the 3.2MB start-up footprint.
It sounds like the file has no (Windows-recognisable) newlines, so -pe is trying to load the entire file into memory as a single line?
If so, you may have to resort to processing the file in blocks. Try using:
perl -e"BEGIN{ $/ = \65536 }" -pe "s/\x00\x42\x00\x11/\x00\x42\x00\xf0/sgx"
And see what, if any, difference that makes.
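One caveat with fixed-size records: a 4-byte pattern that happens to straddle a block boundary is never seen whole by a per-block s///, so the occasional occurrence could be missed. Below is a minimal sketch of one way around that, holding back the last few bytes of each block until the next read arrives. The sub name replace_stream, the 64KB default block size and the two-argument command line are illustrative assumptions, not anything from the posts above. The binmode calls are there because Win32 perl otherwise opens files in text mode.

```perl
use strict;
use warnings;

# Copy $in to $out, replacing every occurrence of the byte string $find
# with $repl. Data is read in $block-byte chunks; the last
# length($find)-1 unmatched bytes of each chunk are held back so that a
# match spanning two chunks is still found.
sub replace_stream {
    my ($in, $out, $find, $repl, $block) = @_;
    die "empty search string\n" unless length $find;
    $block ||= 65536;                    # assumed default block size
    my $keep  = length($find) - 1;       # bytes to carry into the next read
    my $buf   = '';
    my $count = 0;

    while (1) {
        # Append the next chunk after whatever was carried over.
        my $got = read($in, $buf, $block, length $buf);
        die "read failed: $!\n" unless defined $got;

        # Replace every complete match currently in the buffer.
        my ($pos, $done) = (0, '');
        while ((my $i = index($buf, $find, $pos)) >= 0) {
            $done .= substr($buf, $pos, $i - $pos) . $repl;
            $pos = $i + length $find;
            $count++;
        }
        my $rest = substr($buf, $pos);   # unmatched data after the last hit

        if ($got == 0) {                 # end of file: flush everything
            print {$out} $done, $rest;
            last;
        }

        # Write all but a short tail, which might be the start of a match.
        my $hold = length($rest) < $keep ? length($rest) : $keep;
        print {$out} $done, substr($rest, 0, length($rest) - $hold);
        $buf = $hold ? substr($rest, -$hold) : '';
    }
    return $count;
}

# Hypothetical command line: perl script.pl input output
if (@ARGV == 2) {
    my ($src, $dst) = @ARGV;
    open my $in,  '<', $src or die "open $src: $!\n";
    open my $out, '>', $dst or die "open $dst: $!\n";
    binmode $_ for $in, $out;            # raw bytes, no CRLF translation
    my $n = replace_stream($in, $out, "\x00\x42\x00\x11", "\x00\x42\x00\xf0");
    close $out or die "close $dst: $!\n";
    print STDERR "$n replacements\n";
}
```

Because only one block plus a few carried bytes is held at a time, memory use should stay roughly constant regardless of file size.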
Re: Slow find/replace hex in Perl win32
by TomDLux (Vicar) on Sep 29, 2010 at 20:54 UTC
To figure out what is happening, I would start by adding some print statements. Start with something just before and just after opening the file, just after reading a line, ...
That should help narrow down where your processing is hanging up.
Once you've got it running through the loop, and it seems to be working, comment out the prints and time the program processing a 1-line file to completion, then 10-, 100-, 1000-, 10000- and 100000-line data files. What's the trend? What's the expected processing time for 42 million lines?
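A rough sketch of that kind of scaling test, using in-memory buffers instead of real files so that it runs anywhere. The 64-byte record layout, the \xAA filler and the sub name time_substitution are made up for illustration; the search and replacement strings are the ones from this thread. Treat the timings as a trend indicator only, since they leave out disk I/O entirely.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);

# Build a synthetic buffer of $records 64-byte records, each starting
# with the 4-byte search pattern, then time one global substitution.
# Returns (elapsed seconds, number of replacements made).
sub time_substitution {
    my ($records) = @_;
    my $data = ("\x00\x42\x00\x11" . ("\xAA" x 60)) x $records;

    my $start   = time;
    my $hits    = ($data =~ s/\x00\x42\x00\x11/\x00\x42\x00\xf0/g);
    my $elapsed = time - $start;

    return ($elapsed, $hits || 0);
}

# Grow the input by a factor of ten each time and watch the trend.
for my $records (1, 10, 100, 1_000, 10_000, 100_000) {
    my ($elapsed, $hits) = time_substitution($records);
    printf "%7d records: %d replacements in %.6f s\n",
        $records, $hits, $elapsed;
}
```

If the time grows roughly in step with the record count, the substitution itself is unlikely to be the bottleneck, and attention can shift to how the file is being read.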
As Occam said: Entia non sunt multiplicanda praeter necessitatem.
Just to clarify, there are no line-endings in this file (at least not in ASCII).
I do think I found the problem, though. I didn't realize that perl was trying to find the end of line. Searching for that, I found "slurp mode", -0777 (undefined record separator). And using a few other recommendations, I also reduced the s///sgx options to just s///g, since my example didn't seem to need s and x. It seems to allow the file to be processed in a matter of seconds, and compares properly to other files processed "manually" with hex editors.
perl -0777 -pe "s/\x00\x42\x00\x11/\x00\x42\x00\xf0/g" input > output
I'm waiting on the availability of another file to test another hex string against, but it won't be available until Oct 1. I think the issue is resolved, but I'd like to wait until then to be sure, unless anyone else has any recommendations or considerations I should be aware of.
Okay, well I think the code is doing what I want it to do, however I've run into a new problem... "Out of memory!" errors. The file is greater than 2GB, which is more than the available memory space for applications in 32-bit Windows. I'm going to try booting the server with /3GB or /PAE to work around the issue.