in reply to Slow find/replace hex in Perl win32

On a 2GB/5.7 million line file, it consistently runs in 30-35 seconds on my AS1007 perl:

[20:14:52.61] C:\test>dir 834245.masks
 Volume in drive C has no label.
 Volume Serial Number is 8C78-4B42

 Directory of C:\test

18/04/2010  01:02     2,412,431,484 834245.masks
               1 File(s)  2,412,431,484 bytes
               0 Dir(s)  296,257,802,240 bytes free

[20:16:19.67] C:\test>perl -ne "s/\x00\x42\x00\x11/\x00\x42\x00\xf0/sg" 834245.masks >junk.dat

[20:16:55.62] C:\test>

Which is only a few seconds longer than wc -l takes just to count the lines.

How long does your SP 5.12 take?


Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
RIP an inspiration; A true Folk's Guy

Re^2: Slow find/replace hex in Perl win32
by TomDLux (Vicar) on Sep 29, 2010 at 20:40 UTC

    You're using perl -ne without a print in the executed chunk. Does junk.dat have any size?

    Not that that is likely to affect reading the source file ...

    As Occam said: Entia non sunt multiplicanda praeter necessitatem.

      You're right. That's a typo. It takes five minutes when writing the data back to the disk.

      [22:01:55.40] C:\test>perl -pe "s/\x00\x42\x00\x11/\x00\x42\x00\xf0/sg" 834245.masks >junk.dat

      [22:06:46.76] C:\test>perl -pe "s/\x00\x42\x00\x11/\x00\x42\x00\xf0/sg" 834245.masks >junk.dat

      [22:09:31.99] C:\test>
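For anyone unsure why the switch matters: as documented in perlrun, -n wraps the code in a read loop that prints nothing, while -p runs the same loop and prints $_ after every iteration. The following is only a rough emulation of that behaviour; run_like_n and run_like_p are hypothetical helper names made up for this illustration, not anything Perl provides.

```perl
use strict;
use warnings;

# Emulates -n: read each record into $_, run the code, print nothing.
sub run_like_n {
    my ($in, $out, $code) = @_;
    local $_;
    while (<$in>) {
        $code->();
    }
    return;
}

# Emulates -p: same loop, but $_ is printed after every iteration.
sub run_like_p {
    my ($in, $out, $code) = @_;
    local $_;
    while (<$in>) {
        $code->();
    }
    continue {
        print {$out} $_ or die "print failed: $!";
    }
    return;
}
```

With the same substitution as the body, the -n style loop leaves the output empty, which matches the question about whether junk.dat had any size.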

Re^2: Slow find/replace hex in Perl win32
by rickyboone (Novice) on Sep 29, 2010 at 19:54 UTC
    I haven't let it finish, but I have let it run for over 30 minutes before having to kill it. Watching performance stats for the perl.exe process, the CPU hangs around 50%, there is very little I/O (60KB/s or less), and the private memory footprint just keeps increasing.
      the private memory footprint just keeps increasing.

      On the example I gave, the process memory didn't get above the 3.2MB start-up footprint.

      It sounds like the file has no (windows recognisable) newlines, so -pe is trying to load the entire file into memory as a single line?

      If so, you may have to resort to processing the file in blocks. Try using:

      perl -e"BEGIN{ $/ = \65536 }" -pe "s/\x00\x42\x00\x11/\x00\x42\x00\xf0/sgx"

      And see what if any difference that makes?
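To illustrate what the $/ setting above does: when $/ holds a reference to an integer, readline returns fixed-size records of at most that many bytes instead of newline-terminated lines (see perlvar). A small sketch, where read_record_lengths is a hypothetical helper name used only for this demonstration:

```perl
use strict;
use warnings;

# Return the length of each record read from $fh when the input record
# separator is a reference to an integer (fixed-size record mode).
sub read_record_lengths {
    my ($fh, $size) = @_;
    local $/ = \$size;    # read up to $size bytes per record, ignoring newlines
    binmode $fh;          # raw bytes: no CRLF translation on win32
    my @lengths;
    while (defined(my $record = <$fh>)) {
        push @lengths, length $record;
    }
    return @lengths;
}
```

Every record except possibly the last has exactly the requested size, so memory use stays bounded no matter how few newlines the file contains.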



        Sorry about the delay... meetings.

        The file processed quickly, but didn't seem to get through the whole file. I'm assuming the change only let the process work through the first 64KB (65,536 bytes) of the file?

        I'm trying to have the file processed as a continuous binary stream. I don't need Perl or Windows to perform any EOL conversions or to work line by line, for example. The intent is for the script to just find the hex string and replace it with another, leaving the rest of the file intact.

        perl -e"BEGIN{ $/ = \65536 }" -pe "s/\x00\x42\x00\x11/\x00\x42\x00\xf0/sgx"

        And what if the four-byte pattern crosses a 65536-byte boundary? Oops. :-)
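One possible way around the boundary problem, offered as a sketch rather than a tested fix for the original file: read in blocks, but hold back the last length(pattern) - 1 bytes of each block so that a match split across two reads is seen whole on the next pass. The name replace_stream and its parameters are hypothetical; the code assumes the pattern and replacement have the same length, as they do in this thread.

```perl
use strict;
use warnings;

# Stream $in to $out in blocks of $block_size bytes, replacing every
# occurrence of $from with $to. Assumes length($from) == length($to).
sub replace_stream {
    my ($in, $out, $from, $to, $block_size) = @_;
    die "pattern and replacement must have equal length\n"
        unless length($from) == length($to);

    my $keep = length($from) - 1;    # bytes that could begin a split match
    binmode $in;                     # raw bytes, no EOL conversion
    binmode $out;

    my $buf = '';
    while (1) {
        # Append the next block after any bytes carried over.
        my $n = read($in, $buf, $block_size, length $buf);
        die "read failed: $!" unless defined $n;
        last if $n == 0;

        # Replace complete matches, remembering where the last one ended.
        my $pos = 0;
        while ((my $i = index($buf, $from, $pos)) >= 0) {
            substr($buf, $i, length $from) = $to;
            $pos = $i + length $from;
        }

        # Write everything except the tail that might start a split match,
        # but never hold back bytes that were already replaced.
        my $emit = length($buf) - $keep;
        $emit = $pos if $pos > $emit;
        print {$out} substr($buf, 0, $emit, '') if $emit > 0;
    }
    print {$out} $buf;               # flush the final held-back bytes
    return;
}
```

In real use the two handles would be opened on the input and output files, with something like 65536 as the block size; the binmode calls are there because the stated goal is an untouched binary stream on win32.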

        Maybe the file contains \n but not \r\n, so the auto-splitting works on unix but not windows?

      Try disabling your anti-virus's real-time protection temporarily.

      You could also be running out of memory if you have long sequences without any 0x0A.