But neither regexps nor unpack feels like the fastest way to do it. Is there any other way?

Without benchmarking, I don't know. But one thing you might want to try is to use the index function to determine how many characters to grab in your substr. This may save on the cost of the regexp engine, which might just be doing a Boyer-Moore search anyway.

Here are a couple of snippets to try. If you can guarantee that there is always a null character in your four bytes, the problem is pretty trivial: just look for the next one starting at the current index point in the string:

$value = substr($string, $index, index($string, "\x00", $index) - $index)

Which literally means "take the substring of the string starting at the index point for n characters, where n is the difference between the index and the position of the next NUL."
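To make that concrete, here is a minimal sketch of the one-liner wrapped in a sub and run over a few fields. The sample data and the name field_at are made up for illustration; it assumes 4-byte fields that each contain at least one NUL.

```perl
use strict;
use warnings;

# Made-up sample data: three 4-byte fields, each NUL-padded,
# so every field is guaranteed to contain at least one NUL.
my $string = "ab\x00\x00" . "xyz\x00" . "q\x00\x00\x00";

# field_at is a hypothetical helper name, not part of any module.
sub field_at {
    my ($str, $index) = @_;
    # Length to grab = position of the next NUL minus the start offset.
    return substr($str, $index, index($str, "\x00", $index) - $index);
}

for my $index (0, 4, 8) {
    printf "%d: %s\n", $index, field_at($string, $index);
}
```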

If there are values like 9876 (that is, no NULs), then this simple-minded approach won't work, since you may run arbitrarily far down the string before encountering the next NUL, or there may be no more at all, in which case index will return -1, and things will really go askew.

In this case, you'll have to save the result in a temporary length variable, and clamp it to 4 if it's not in range:

my $len = index($string, "\x00", $index) - $index;
$len = 4 if $len < 0 or $len > 4;
$value = substr($string, $index, $len);
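Here is the same clamping logic as a small self-contained sketch, exercised on the awkward cases: a full field with no NUL, and a final field with no NUL anywhere after it. The sample data and the name field_at_clamped are invented for the example.

```perl
use strict;
use warnings;

# field_at_clamped is a hypothetical helper name.
sub field_at_clamped {
    my ($str, $index) = @_;
    my $len = index($str, "\x00", $index) - $index;
    # index returned -1 (no NUL left), or the next NUL belongs to a
    # later field: either way the field uses all four bytes.
    $len = 4 if $len < 0 or $len > 4;
    return substr($str, $index, $len);
}

# Made-up data: padded field, full field, padded field, full final field.
my $string = "ab\x00\x00" . "9876" . "q\x00\x00\x00" . "1234";

for my $index (0, 4, 8, 12) {
    printf "%2d: %s\n", $index, field_at_clamped($string, $index);
}
```

A field that starts with a NUL gives a length of 0, so you get the empty string back, which is probably what you want.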

That's probably about the best you can do (in terms of other alternatives). If this approach is worse, or only marginally better, the only remaining card to play would be to code it in C and use Inline::C.

A bit later: regarding unpack, in my experience, when I'm really strapped for speed it rarely makes the grade with respect to other alternatives. I think the meme that says "unpack is the fastest" needs to be taken with a grain of salt, and my hunch is that it takes a certain amount of time to decode the format string argument. A single substr with offsets determined by index is usually faster (but at the cost of more make-work code). YMM undoubtedly V.
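Since all of this is hunch until measured, here is a rough sketch of how you might compare the two with the core Benchmark module. The test data and the sub names are invented, and the unpack template '(Z4)*' is just one way of writing the unpack version, not necessarily what the original code used. Results will depend on your perl and your data, so treat whatever it prints as a starting point only.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Made-up test data: 4000 four-byte fields, NUL-padded by pack 'a4'.
my @fields = ("ab", "9876", "q", "xyz");
my $string = '';
$string .= pack('a4', $_) for (@fields) x 1000;

# Walk the string four bytes at a time using index and substr.
sub via_index {
    my @out;
    for (my $i = 0; $i < length($string); $i += 4) {
        my $len = index($string, "\x00", $i) - $i;
        $len = 4 if $len < 0 or $len > 4;
        push @out, substr($string, $i, $len);
    }
    return \@out;
}

# Let unpack do it: Z4 reads four bytes and stops at the first NUL.
sub via_unpack {
    my @out = unpack '(Z4)*', $string;
    return \@out;
}

# Run each candidate for at least one CPU second and print a comparison table.
cmpthese(-1, {
    index_substr => \&via_index,
    unpack_Z4    => \&via_unpack,
});
```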

• another intruder with the mooring in the heart of the Perl


In reply to Re: Null-stripping performance by grinder
in thread Null-stripping performance by qiau
