Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hello all. I am reading in a binary file with unpack (32-bit unsigned integer), shuffling the order of the individual numbers, then writing it out again in binary using pack (big-endian 32-bit unsigned integer). This all works well until I try to do something to the data in between, such as a simple arithmetic operation like scaling the numbers by a factor. I know this is probably trivial to you all, but the operation seems to transform the data into internal floating point that I can no longer repack correctly. Is there a way around this? Desperate and Dataless

Replies are listed 'Best First'.
Re: Data munging - in, out and something in between
by GrandFather (Saint) on Apr 23, 2007 at 22:24 UTC

    Show us the code. In particular, show us a short program that demonstrates the problem, along with what you are getting and what you expected to get.


    DWIM is Perl's answer to Gödel
Re: Data munging - in, out and something in between
by ikegami (Patriarch) on Apr 24, 2007 at 02:56 UTC

    Whether Perl converts the number to a floating point number internally doesn't matter. The number will automatically get converted back into an integer when it is packed. The fractional part will be lost (truncated), of course.

    $p = 'ABCD';
    $n = unpack('N', $p);
    print("$n\n");          # 1094861636
    $n *= 1/3;
    print("$n\n");          # 364953878.666667
    $p = pack('N', $n);
    $n = unpack('N', $p);
    print("$n\n");          # 364953878

    The two most likely causes in my mind are 1) that you're trying to store a number too big for the field into which it's being packed, or 2) that you didn't binmode the file from which you are reading or to which you are writing.
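    To rule out the first cause, one option is to check the range before packing. The following is only a minimal sketch: the helper name pack_u32_le is made up for illustration, and it assumes non-negative values that should be rounded to the nearest integer rather than truncated.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): round to the nearest
# integer and refuse anything that does not fit in an unsigned 32-bit field.
sub pack_u32_le {
    my ($value) = @_;
    my $int = int($value + 0.5);    # round half up; assumes non-negative input
    die "value $value does not fit in 32 bits\n"
        if $int < 0 || $int > 0xFFFFFFFF;
    return pack('V', $int);         # 'V' = little-endian unsigned 32-bit
}

my $bytes = pack_u32_le(1094861636 / 3);
print unpack('V', $bytes), "\n";    # 364953879 (rounded, not truncated)
print length($bytes), "\n";         # 4

# A value above 2**32 - 1 is rejected instead of being packed.
my $ok = eval { pack_u32_le(2**32 + 5); 1 };
print $ok ? "packed\n" : "rejected: $@";
```

    If the die fires on real data, the sums really are too large for the field, and the scaling factor or the field width has to change.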

    We can best help you if you show us a bit of code that reproduces the problem. Be sure to specify what output you are getting and what output you are expecting.

Re: Data munging - in, out and something in between
by rodion (Chaplain) on Apr 24, 2007 at 02:58 UTC
    You may find your problem by just putting together a snippet of code for us to look at, as GrandFather suggests, but if you don't find it, this forum will then have something to work with. As an additional way to characterize the problem, you might try the following command lines:
    perl -e "print(pack('II',0x085c,0x5c080000));" > temp.pk
    perl -e "print(pack('NN',map($_*2,unpack('II',<>))));" temp.pk > temp2.pk
    perl -e "printf('%8x:%8x',unpack('NN',<>));" temp2.pk
    produces --> 10b8:b8100000 (as expected)
    They look like they do what you describe, but the numbers come out as expected, at least on my Windows box. If the numbers come out right on your system, then take a look at what you're doing that's different from the examples, or show us and someone here will certainly take a look at it.
Re: Data munging - in, out and something in between
by Anonymous Monk on Apr 24, 2007 at 08:05 UTC
    Sorry for the lack of code in my original question; it is not so easy to demonstrate in a code snippet, and impossible to show the data, but here goes...
    open(IN, " < $infile" );
    my $si=8192; # the number of entries is known
    my %matrix;
    while(my $c<$si) {
        read(IN, my $bin, 4);
        $data = unpack('N', $bin);
        my $d = Unshuffle($c);
        $matrix{$d} += $data;
        $c++;
    }
    close( IN );
    open(OUT, " >$outfile" );
    foreach my $d (sort {$a <=> $b} (keys %matrix)) {
        print (OUT pack('V', $matrix{$d}));
    }
    close( OUT );
    The subroutine Unshuffle() simply calculates a new position in the output matrix for each entry. The above works until I try to scale the data, e.g.  $matrix{$d} += $data/3;  in which case the output is scrambled. BTW, null operations like  $matrix{$d} += $data/1;  and  $matrix{$d} += $data+0;  still work. This, I suspect, explains the fact that the script also fails if more than one of the data points added to make $matrix{$d} is non-zero. I guess that my number is now too long for the "V" template in pack. Yes, the template must be 'V', otherwise a downstream program spits the dummy. Thanks again.
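    One property of the snippet above is worth seeing in isolation: it unpacks with 'N' (big-endian) and packs with 'V' (little-endian), so a value that passes through untouched is written back with its four bytes reversed. Whether that is intended is not stated in the thread. A minimal sketch with made-up sample bytes:

```perl
use strict;
use warnings;

my $in  = "\x00\x00\x08\x5c";    # made-up sample: four input bytes
my $n   = unpack('N', $in);      # read as big-endian unsigned 32-bit
my $out = pack('V', $n);         # write as little-endian unsigned 32-bit

my $hex = join(' ', map { sprintf('%02x', ord($_)) } split(//, $out));
my $msg = ($out eq scalar reverse($in)) ? "byte order reversed" : "unchanged";

print "$n\n";      # 2140
print "$hex\n";    # 5c 08 00 00
print "$msg\n";    # byte order reversed
```

    If the input file is actually little-endian, the values seen between unpack and pack are byte-swapped versions of the real ones, which would go unnoticed until arithmetic is done on them.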
      I tried your code on my machine, with a few additions/modifications to get it running, all of them marked with "# **" below. I didn't see any problem with the output. The only really significant modifications were to take the "my" away from the "$c" in the while loop, since it was re-initializing $c with each pass through the loop, and to add binmode(IN), which actually doesn't make any difference with the data I'm using.

      Try this modified code on your machine and see if you get appropriate output. It only processes the first two numbers, so it should be easy to see what's going on and play with it.

      use warnings;                 # **
      use strict;                   # **
      my $data;                     # **
      my $c = 0;                    # **
      my $outfile = 'tempo.pk';     # **
      my $infile = shift;           # **

      open(IN, " < $infile" );
      binmode(IN);                  # ** prevents newline translation
      my $si=2; # the number of entries is known
      my %matrix;
      while($c<$si) {               # ** removed my
          read(IN, my $bin, 4);
          $data = unpack('N', $bin);
          my $d = Unshuffle($c);
          print "$c -> $d ($data)\n";    # **
          $matrix{$d} += $data/3;        # **
          $c++;
      }
      close( IN );

      open(OUT, " >$outfile" );
      foreach my $d (sort {$a <=> $b} (keys %matrix)) {
          print "$outfile $d ($matrix{$d})\n";    # **
          print (OUT pack('V', $matrix{$d}));
      }
      close( OUT );

      open(IN, " <$outfile" );               # **
      while( read(IN, my $bin, 4) ) {        # **
          $data = unpack('V', $bin);         # **
          print "$data:";                    # **
      }                                      # **

      sub Unshuffle {    # ** interchange the first two positions
          my $val = shift;
          return 1 if ($val == 0);
          return 0 if ($val == 1);
          return $val;
      }

      # output was
      # 0 -> 1 (2140)
      # 1 -> 0 (1544028160)
      # tempo.pk 0 (514676053.333333)
      # tempo.pk 1 (713.333333333333)
      # 514676053:713:
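      On the binmode point: it made no difference with my test data, but it can matter when the packed bytes happen to include values such as 0x0A, 0x0D or 0x1A on a platform that translates text-mode I/O. Here is a rough sketch of a round trip with binmode on both handles; the temporary file and the three sample values are made up for illustration.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write three little-endian 32-bit values whose low bytes are 0x0A, 0x0D
# and 0x1A, which text-mode I/O may translate on some platforms.
my ($fh, $path) = tempfile(UNLINK => 1);
binmode($fh);
print {$fh} pack('V*', 10, 13, 26);
close($fh) or die "close: $!";

# Read the file back four bytes at a time, again in binary mode.
open(my $in, '<', $path) or die "open: $!";
binmode($in);
my @values;
while (read($in, my $bin, 4)) {
    last if length($bin) < 4;    # ignore a truncated trailing record
    push @values, unpack('V', $bin);
}
close($in);

print "@values\n";    # 10 13 26
```

      Dropping either binmode call on a system that does newline translation is a quick way to see whether it is responsible for scrambled output.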