iKnowNothing has asked for the wisdom of the Perl Monks concerning the following question:

I am writing a script to read and decode binary files which contain numeric data. The file contains some 32-bit words which hold "discrete" data (data fields narrower than 8 bits).

I have come up with a solution (see below), but it runs very slowly because I must call this subroutine about 100 times per record. It takes about 1 minute to read a 4MB file, which contains about 4000 records. When I remove the calls to this subroutine, the time drops to about 17 seconds.

I am looking for suggestions on more efficient ways to do this. Any input would be appreciated. (There is a readmore section to describe the code displayed)

sub ReadBits {
    my $BitStringIn = @_[0];
    my $OffsetRef = @_[1];
    my $Length = @_[2];
    my $ItemListRef = @_[3];
    my $BitStringOut = 0;

    #capture the relevant bits
    $BitStringOut = substr($BitStringIn,$$OffsetRef,$Length);

    #update Offset
    $$OffsetRef += $Length;

    #determine the number of zero's I need to add to make a 32-bit number
    $NumZeros = 32-$Length;

    #Create string of zeros with a length of $NumZeros
    $Zeros = unpack("B$NumZeros",pack("N",0));

    #Insert string of zeros to $BitString to create a 32-bit string
    $BitStringOut = $Zeros.$BitStringOut;

    #Convert 32-bit string into a number
    $NumOut = unpack("N",pack("B32",$BitStringOut));

    #Update ItemList
    @$ItemListRef = (@$ItemListRef,$NumOut);

    return($NumOut);
}

$BitStringIn is the 32-bit number which contains the data I'm interested in, represented as a string of 1's and 0's.

$OffsetRef is a reference to the offset (in bits) of the data I'm interested in.

$Length is the length (in bits) of the data I'm interested in.

$ItemListRef is a reference to a list where I'm collecting data from the file.

I think the comments in the code describe what I am trying to do.

Replies are listed 'Best First'.
Re: Reading individual bits
by BrowserUk (Patriarch) on Jul 29, 2004 at 22:45 UTC

    If I'm reading this correctly, the bitstring you're passing in is an "ascii-ized binary" string? If so, then the conversion of your strings from real binary to ascii-ized binary is the source of one performance hit. You should not need to do that.

    Have you looked at vec? It only deals with powers-of-two numbers of bits, but it may fit your needs. If not, it is easily adapted to do so.
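
    As a rough illustration of what vec offers, here is a minimal sketch. It assumes the word is held as four raw bytes; the value 0x12345678 is made up for the example. Only widths of 1, 2, 4, 8, 16 or 32 bits are handled, so any other field width would still need shifting and masking.

```perl
use strict;
use warnings;

# Made-up test word; a real one would come from the file.
my $word = pack 'N', 0x12345678;    # four raw bytes: 0x12 0x34 0x56 0x78

# 8-, 16- and 32-bit elements are read big-endian, in string order.
my $byte1 = vec( $word, 1, 8 );     # second byte: 0x34
my $half0 = vec( $word, 0, 16 );    # first 16 bits: 0x1234
my $all   = vec( $word, 0, 32 );    # whole word: 0x12345678

# Elements narrower than a byte are numbered from the LOW end of each byte,
# so 4-bit element 0 is the low nibble of the first byte.
my $nibble0 = vec( $word, 0, 4 );   # 0x2
my $nibble1 = vec( $word, 1, 4 );   # 0x1

printf "%#x %#x %#x %#x %#x\n", $byte1, $half0, $all, $nibble0, $nibble1;
```

    Note that sub-byte elements are numbered from the low end of each byte, which may not match a layout that counts bits from the left of the word.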

    You can almost certainly get away with not padding your numbers with trailing zeros before conversion; pack and unpack are pretty clever about such things.

    I'd have had a go at providing an alternative subroutine, but its structure (the way it maintains values in the caller's scope through references), plus the confusion over the terms 'binary' and 'bitstring', would mean making (possibly wrong) guesses about this subroutine's use.

    Perhaps you could post a short program that sets up a piece of test data, and then calls this subroutine to extract a couple of values from it?


    Examine what is said, not who speaks.
    "Efficiency is intelligent laziness." -David Dunham
    "Think for yourself!" - Abigail
    "Memory, processor, disk in that order on the hardware side. Algorithm, algoritm, algorithm on the code side." - tachyon
      Sure thing, here's a program that shows basically how I use the function.

      @ItemList = (); #this would be filled with some data already, and passed in as a reference.
      $ItemListRef = \@ItemList;

      $LCSStatWd1 = 524288; #this would normally come from the file
      $LCSStatWd1Bits = unpack("B32",pack("N",$LCSStatWd1)); #creating the bit string
      print "$LCSStatWd1Bits\n";

      #Discretes
      $Offset = 0;
      $LidarPwrState = ReadBits($LCSStatWd1Bits,\$Offset,1,$ItemListRef);
      $LidarPwrStateToggle = ReadBits($LCSStatWd1Bits,\$Offset,1,$ItemListRef);
      $LCSMode = ReadBits($LCSStatWd1Bits,\$Offset,6,$ItemListRef);
      $LidarCondition = ReadBits($LCSStatWd1Bits,\$Offset,3,$ItemListRef);
      $NewCentAvail = ReadBits($LCSStatWd1Bits,\$Offset,1,$ItemListRef);
      $ObjOutOfScnWin = ReadBits($LCSStatWd1Bits,\$Offset,1,$ItemListRef);
      $ReAcqFailed = ReadBits($LCSStatWd1Bits,\$Offset,1,$ItemListRef);
      $LaserState = ReadBits($LCSStatWd1Bits,\$Offset,1,$ItemListRef);
      $GrCommEnable = ReadBits($LCSStatWd1Bits,\$Offset,1,$ItemListRef);
      $LidarMode = ReadBits($LCSStatWd1Bits,\$Offset,2,$ItemListRef);
      $LCSSubMode = ReadBits($LCSStatWd1Bits,\$Offset,3,$ItemListRef);
      $GNCpredit = ReadBits($LCSStatWd1Bits,\$Offset,1,$ItemListRef);
      $FOVedgemAz = ReadBits($LCSStatWd1Bits,\$Offset,1,$ItemListRef);
      $FOVedgepAz = ReadBits($LCSStatWd1Bits,\$Offset,1,$ItemListRef);
      $FOVedgemEl = ReadBits($LCSStatWd1Bits,\$Offset,1,$ItemListRef);
      $FOVedgepEl = ReadBits($LCSStatWd1Bits,\$Offset,1,$ItemListRef);
      $Spare2 = ReadBits($LCSStatWd1Bits,\$Offset,6,$ItemListRef);

      sub ReadBits {
          my $BitStringIn = @_[0];
          my $OffsetRef = @_[1];
          my $Length = @_[2];
          my $ItemListRef = @_[3];
          my $BitStringOut = 0;

          $BitStringOut = substr($BitStringIn,$$OffsetRef,$Length);
          $$OffsetRef += $Length;
          $NumZeros = 32-$Length;
          $Zeros = unpack("B$NumZeros",pack("N",0));
          $BitStringOut = $Zeros.$BitStringOut;
          $NumOut = unpack("N",pack("B32",$BitStringOut));
          @$ItemListRef = (@$ItemListRef,$NumOut, "\t");
          print "\n$NumOut";
          return($NumOut);
      }

      I did look at vec, but it didn't look like it would work for me since I sometimes read 3 or 6 bits. I'll try taking out the padding; I thought that each number would need to be the same length to do the conversion right.

        This performs bit mask operations upon 32-bit values thereby saving the need to ascii-ize the binary.

        #! perl -slw
        use strict;

        sub getBits {
            my( $N32, $offset, $bits ) = @_;
            my $mask = 1 << $bits;
            $mask--;
            $mask <<= $offset;
            return ( $N32 & $mask ) >> $offset;
        }

        ## Encode some test data.
        my $data = unpack 'N', pack 'b*',
            #0123456789 123456789 123456789 1
            '11101111101111111011111111101111';
            # 7  31   127     511      15

        print $data; ## 4160617975

        print getBits( $data, 0, 3 );  ## Should be 7
        print getBits( $data, 4, 5 );  ## should be 31
        print getBits( $data, 10, 7 ); ## Should be 127
        print getBits( $data, 18, 9 ); ## Should be 511
        print getBits( $data, 28, 4 ); ## Should be 15

        __END__
        P:\test>test
        4160617975
        7
        31
        127
        511
        15
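
        The same masking idea can be adapted to the calling style in the question, where offsets count from the left (most significant) end of the word and a running offset is advanced after each field. The following is only a sketch under those assumptions: read_field is a hypothetical name, and the field widths are copied from the test program posted above.

```perl
use strict;
use warnings;

# Extract $length bits starting $$offset_ref bits from the most significant
# end of a 32-bit value, then advance the caller's offset.
# (read_field is a hypothetical name used only for this illustration.)
sub read_field {
    my ( $n32, $offset_ref, $length ) = @_;
    my $shift = 32 - $$offset_ref - $length;    # distance from the low end
    my $value = ( $n32 >> $shift ) & ( ( 1 << $length ) - 1 );
    $$offset_ref += $length;
    return $value;
}

my $word   = 524288;    # test value from the question
my $offset = 0;
my @widths = ( 1, 1, 6, 3, 1, 1, 1, 1, 1, 2, 3, 1, 1, 1, 1, 1, 6 );
my @fields = map { read_field( $word, \$offset, $_ ) } @widths;

print join( "\t", @fields ), "\n";
```

        With the test value 524288 only the sixth field (the one stored in $ObjOutOfScnWin in the question) comes out as 1. The word is never converted to a string of 1's and 0's.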


        Whilst I'm looking, this bit of code is very strange--and slow.

        @$ItemListRef = (@$ItemListRef,$NumOut, "\t");

        You

        1. pass in a reference to an array,
        2. dereference that array to produce a list,
        3. tack the latest value and a tab onto the end of the list
        4. and then assign that back to the original array.

        This action is the same as doing:

        push @$ItemListRef, $NumOut, "\t";

        Except that your version uses gob-loads (tech. term) of memory building lots of intermediate lists in the process. No wonder your subroutine is slow.

        Also, why are you interspersing your values with tabs?

        If the intention is to print them as a tab-delimited list to a file or the screen, just build the array without the interspersed tabs and then use join to add the tabs when you print.

        print join "\t", @ItemList;

        Not only will you save half the memory of your list, by not allocating it you'll save a bit more time.
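
        To make the two points concrete, here is a small sketch with made-up sample values. It shows that the copy-and-reassign form and push leave the array with the same contents, and that the tabs can be added once at output time.

```perl
use strict;
use warnings;

my @values = ( 7, 31, 127 );    # made-up sample values

# Slow form: copies the whole array on every iteration.
my @rebuilt;
@rebuilt = ( @rebuilt, $_ ) for @values;

# Equivalent form: appends in place.
my @pushed;
push @pushed, $_ for @values;

# Add the delimiters once, when the line is written out.
my $line = join "\t", @pushed;
print "$line\n";
```

        The core Benchmark module could be used to measure how much difference this makes for a given list size.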


Re: Reading individual bits
by jmcnamara (Monsignor) on Jul 29, 2004 at 23:59 UTC

    Here are two variations. The first uses sprintf to do the zero padding. The second uses oct to convert the binary string to decimal (for perl >= 5.6).

    For clarity I've omitted the $$OffsetRef part. Using proper bitmasks and the bitwise operators would probably be faster; see perlop.

    sub ReadBits2 {
        my $bitstring = @_[0];
        my $offset = @_[1];
        my $length = @_[2];

        return unpack "N", pack "B*", sprintf "%032s", substr $bitstring, $offset, $length;
    }

    sub ReadBits3 {
        return oct b . substr @_[0], @_[1], @_[2];
    }
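
    If the bit-string approach is kept, one further variation is to split a whole word in a single unpack call and convert every field with oct, rather than making one subroutine call per field. This is a sketch only: decode_word is a hypothetical name, the widths are taken from the test program earlier in the thread, and no claim is made about its speed relative to bitmasks.

```perl
use strict;
use warnings;

# Field widths for one status word, taken from the calls in the question.
my @widths   = ( 1, 1, 6, 3, 1, 1, 1, 1, 1, 2, 3, 1, 1, 1, 1, 1, 6 );
my $template = join ' ', map( "a$_", @widths );    # "a1 a1 a6 a3 ..."

# decode_word is a hypothetical helper, not part of the original posts.
sub decode_word {
    my ($n32) = @_;
    my $bits = unpack 'B32', pack 'N', $n32;       # 32-character 0/1 string
    # Split into fixed-width substrings, then convert each from binary.
    return map { oct "0b$_" } unpack $template, $bits;
}

my @fields = decode_word(524288);
print "@fields\n";
```

    The template is built once, outside the subroutine, so the per-record work is one pack, two unpacks and one oct per field.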

    --
    John.

Re: Reading individual bits
by Anonymous Monk on Aug 01, 2004 at 09:14 UTC
    Have a look at Bit::Vector. Its main routines are written in C and execute about 5 times faster than native perl for me. And it contains quite high-level subroutines, which makes programming more fun.