Eradicatore has asked for the wisdom of the Perl Monks concerning the following question:

Does someone know if you can use regex to grep through a scalar value you just read out from a file in binmode?

I want to read, say, 1000 bytes of a file and then scan those 1000 bytes for a string. But I do NOT want to unpack it first, because I'm hoping that avoiding the unpack will save some time.

Or am I fooling myself into thinking that dealing with the scalar in binary form will be any faster than unpacking and then using normal regex?

If I do unpack, what's the best way to do that?

open BIN_IN, "<test.elf";
binmode BIN_IN;
read(BIN_IN, $stuff, 1000, 0);
if ($stuff =~ /7F/) { print "got it!\n"; }
@tmp = unpack("(H2)*", $stuff);
foreach $h (@tmp) {
    print "$h ";
}
close BIN_IN;

Justin Eltoft

"If at all god's gaze upon us falls, its with a mischievous grin, look at him" -- Dave Matthews

Replies are listed 'Best First'.
Re: Way to grep binary scalar without unpacking
by ikegami (Patriarch) on Oct 04, 2007 at 18:37 UTC
    $stuff =~ /\x7F/

    or

    index($stuff, "\x7F") >= 0

    I think the latter is faster.

    These are better than checking the unpacked string because they won't match 07 followed by F3 (whose hex dump, 07F3, contains 7F across the byte boundary).
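
    A minimal sketch of that false positive, using a made-up two-byte buffer rather than the OP's file: neither byte is 0x7F, yet the hex dump contains the text "7f".

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample data: two bytes, neither of which is 0x7F.
    my $stuff = "\x07\xF3";

    # Searching the raw bytes: neither form finds a 0x7F byte.
    my $regex_hit = $stuff =~ /\x7F/ ? 1 : 0;
    my $index_hit = index($stuff, "\x7F") >= 0 ? 1 : 0;

    # Searching the hex dump: "07f3" contains "7f" across the byte boundary.
    my $hex     = unpack "H*", $stuff;
    my $hex_hit = $hex =~ /7f/ ? 1 : 0;

    print "regex on bytes: $regex_hit\n";
    print "index on bytes: $index_hit\n";
    print "regex on hex:   $hex_hit\n";
    ```

    Only the last line reports a hit, which is the spurious match described above.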

      "You believe?" You oughta know that the regexp engine uses the same code as index() for cases like this. Also, there's more ops to dispatch for the index() way. It isn't obvious at all which is faster.

      ⠤⠤ ⠙⠊⠕⠞⠁⠇⠑⠧⠊

        I never said it was obvious. I never said I was guessing. I believe it's faster because I did some benchmarks to test this, but that was some time ago. I could redo them, but so can the OP, and he has the benefit of having representative data.
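
        For anyone who wants to rerun the comparison, here is a rough sketch using the core Benchmark module. The 1000-byte buffer is invented for illustration; results will depend on the perl version and on how representative the data is, so substitute real data before drawing conclusions.

        ```perl
        use strict;
        use warnings;
        use Benchmark qw(cmpthese);

        # Assumed test data: 1000 bytes with a single 0x7F near the end.
        my $stuff = ("\x00" x 990) . "\x7F" . ("\x00" x 9);

        my %tests = (
            regex => sub { return $stuff =~ /\x7F/ ? 1 : 0 },
            index => sub { return index($stuff, "\x7F") >= 0 ? 1 : 0 },
        );

        # A negative count means: run each sub for at least that many CPU seconds.
        cmpthese(-1, \%tests);
        ```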
Re: Way to grep binary scalar without unpacking
by mwah (Hermit) on Oct 04, 2007 at 19:32 UTC
    Eradicatore: I want to read out say 1000 bytes of a file,
    and then scan that 1000 bytes for a string.
    But I do NOT want to first unpack it because
    I'm hoping that avoiding the unpack may save some time.


    OK, ikegami has already answered that. I'd like to add that
    index() would be the fastest pure-Perl solution. If it's a
    very big binary chunk, write a small "Inline => C {}" wrapper
    around C's standard library memchr() function (declared in
    <string.h>). This might be, depending on the architecture, up to
    three times faster than index() (on large chunks).

    If I do unpack, what's the best way to do that?

    Your solution would be OK. You might consider doing
    a pseudo-Schwartzian transform to map the indices into your
    target array @tmp, something like:
    open my $fh, '<', 'test.elf' or die "can't do anything: $!";
    binmode $fh;
    read $fh, my $stuff, 1000 or die "read error: $!";
    close $fh;
    print length $stuff, " bytes in\n";

    my $offs = 0;
    my @tmp = map  $_->[1],
              grep $_->[0] eq '7f',
              map  [$_, $offs++],
              unpack "(H2)*", $stuff;

    # prints "7f" offsets in binary file
    print join':', @tmp;
    Regards

    mwa
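
    The same offsets can also be collected without unpacking, by calling index() repeatedly on the raw scalar. This is only a sketch under assumptions: the sample bytes below are invented and stand in for whatever was read from the file.

    ```perl
    use strict;
    use warnings;

    # Hypothetical sample data standing in for the bytes read from the file.
    my $stuff = "\x7FELF\x01\x07\xF3\x7F";

    # Collect every offset of the byte 0x7F by searching the raw scalar.
    my @by_index;
    my $pos = 0;
    while (($pos = index($stuff, "\x7F", $pos)) >= 0) {
        push @by_index, $pos;
        $pos++;    # resume the search one byte past the hit
    }

    # The same offsets via unpack, for comparison.
    my @hex       = unpack "(H2)*", $stuff;
    my @by_unpack = grep { $hex[$_] eq '7f' } 0 .. $#hex;

    print "index:  @by_index\n";
    print "unpack: @by_unpack\n";
    ```

    Both lists come out the same here; the index() loop simply skips building the intermediate list of hex strings.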