in reply to perl in DOS woes

I probably could have been a little clearer. The data is coming from an A/D test set. The A/D has so many bits per sample, let's say 10 bits/sample. The setup takes the data and stores it in LabVIEW. To save, you hit a save button and it presents you with a menu to add header information, which is simply placed at the beginning of the file. Then it takes the samples, which are two's complement, sign-extends each one to fill up the MSByte (in this case resulting in two bytes per sample) and saves all of the samples as one long string of binary data:

MSbyte0,LSbyte0,MSbyte1,LSbyte1,MSbyte2,LSbyte2,... ...,MSbyten,LSbyten

Do you still recommend unpack as opposed to ord? I suppose it would be more compact with unpack.

Re: Re: perl in DOS woes
by BrowserUk (Patriarch) on Nov 06, 2002 at 00:31 UTC

    Do you still recommend unpack as opposed to ord.

    The answer is a qualified "yes". I'll get to the qualification in a bit.

    Ostensibly, once you have stripped the header from your binary data, extracting your 16-bit, big-endian values from the string would be as simple as

    my @ADsamples = unpack 'n*', $binaryData;

    This will 'do the right thing' with big-endian 16-bit data regardless of the architecture that it runs on.
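
    As a minimal sketch of the step above: the 4-byte header, its length, and the sample bytes below are made up for illustration, since the real header length depends on what was entered when the file was saved.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the file contents: a 4-byte header followed by
# three big-endian 16-bit samples (0x0001, 0x01FF, 0xFFFE).
my $header_len = 4;    # assumption: the real header length will differ
my $raw        = "HDR:" . "\x00\x01\x01\xFF\xFF\xFE";

# Drop the header, then decode each byte pair as an unsigned big-endian value.
my $binaryData = substr $raw, $header_len;
my @ADsamples  = unpack 'n*', $binaryData;

print "@ADsamples\n";    # prints "1 511 65534"
```

    When the data comes from a real file on DOS or Windows, calling binmode on the filehandle before reading keeps line-ending translation from altering the bytes.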

    However, there is a caveat, as I mentioned at the top. The 'n' unpack format specifier is for unsigned 16-bit values. There is no equivalent for big-endian, signed 16-bit data.

    The upshot of this is that if the msb of the original sample is set, then once it is sign-extended the resulting 16-bit value is negative. Treating it as an unsigned value will give you a large positive value instead!

    This may not be a problem, as you may be masking out the bits that you need and ignoring the machinations of the sign extension. Otherwise, getting the negative values back is fairly trivial.

    @signed = map{ $_ > 32767 ? $_ - 65536 : $_ } @unsigned;
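
    A small worked example of that conversion, using made-up sample bytes (sign-extended 10-bit values +1, -1 and -512). The last step relies on the '>' modifier, which only exists in Perl 5.10 and later, so treat it as an alternative for newer Perls rather than something available everywhere.

```perl
use strict;
use warnings;

# Three sign-extended 10-bit samples stored as big-endian 16-bit words:
# +1 (0x0001), -1 (0xFFFF) and -512 (0xFE00, the most negative 10-bit value).
my $binaryData = "\x00\x01\xFF\xFF\xFE\x00";

# Decode as unsigned, then fold anything above 32767 back into the negatives.
my @unsigned = unpack 'n*', $binaryData;
my @signed   = map { $_ > 32767 ? $_ - 65536 : $_ } @unsigned;

print "@unsigned\n";    # prints "1 65535 65024"
print "@signed\n";      # prints "1 -1 -512"

# Requires Perl 5.10 or later: 's>' reads signed big-endian 16-bit directly.
my @direct = unpack 's>*', $binaryData;
print "@direct\n";      # prints "1 -1 -512"
```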

    So long as you're aware of it, no problem.

    That does bring up one other matter though, related to your use of ord. Depending on how you're breaking out the values from your string, and which version of Perl you are using, ord has a trap waiting for the unwary who use it to manipulate non-character data, namely UTF-8.

    In recent versions of Perl (later than 5.6.1, I think, but I'm not certain), ord can return values greater than 255. Additionally, depending on which function and which regex (if any) you use, attempting to break a scalar into chars will not necessarily yield bytes. split //, $binaryData; is going to attempt to treat your data as variable-length chars if it discovers anything in the data that looks like a UCS char. This also holds true for my @bytes = $binaryData =~ /./gs; for instance, unless you take precautions to prevent it.
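
    One way to sidestep the char-versus-byte question, sketched here with made-up sample bytes, is to let unpack's 'C' format hand back the individual byte values and combine each MSbyte/LSbyte pair by hand. This assumes the data was read as raw bytes (for example from a binmode'd filehandle) and contains an even number of them.

```perl
use strict;
use warnings;

# Same hypothetical samples as before: +1, -1 and -512 as big-endian pairs.
my $binaryData = "\x00\x01\xFF\xFF\xFE\x00";

# 'C' extracts unsigned 8-bit values, one per byte of the binary string.
my @bytes = unpack 'C*', $binaryData;

my @samples;
while ( my ( $msb, $lsb ) = splice @bytes, 0, 2 ) {
    # Rebuild the 16-bit word, then restore the sign as shown earlier.
    my $value = ( $msb << 8 ) | $lsb;
    push @samples, $value > 32767 ? $value - 65536 : $value;
}

print "@samples\n";    # prints "1 -1 -512"
```

    This does the same work as unpack 'n*' plus the sign fix, only more verbosely; it is mainly useful if you want to look at the individual bytes along the way.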

    You may already be aware of this and taking the appropriate steps, but as I fell into the trap myself very recently, I thought it worth mentioning.


    Nah! You're thinking of Simon Templar, originally played (on UKTV) by Roger Moore and later by Ian Ogilvy