andri85 has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys, I'm a Perl newbie (but I love it already!) and I'm struggling with one of my first programs. I have to read a binary file which contains records like this:
1 5.7 -12.3 2.1 0 1 225.3 0 0 1 3.24 2.98 -10.02 1 2 334.7 ...
which has to be interpreted as follows:
1) read the first value (1 byte, integer)
2) if it's larger than 0, read the following 6 values, which are 3 doubles (8 bytes each), 2 long integers (4 bytes each) and another double (8 bytes again). Store the values and restart from 1)
3) if it's 0, then go back to 1)

In MATLAB this is rather easy, but extremely slow for files of this size (it takes more than one hour to read one of them). I implemented it as follows:
[photon1, c] = fread(fidin, [1], 'int8');
if (photon1>0)
    [readout(1,1:3), c] = fread(fidin, [3], 'double');
    [readout(1,4:5), c] = fread(fidin, [2], 'int32');
    [readout(1,6), c] = fread(fidin, [1], 'double');
end

I'm using read() and unpack() to read the file, but I'm getting the wrong numbers. I'm surely making mistakes in the interpretation of the data types. Here is my Perl code:
read(BIN_DATA, $bin_data, $short_int_size);
# Convert from binary to a numeric value
$photon1 = unpack('C', $bin_data);
if ($photon1>0) {
    read(BIN_DATA, $bin_data, $double_size);
    $x1 = unpack('f', $bin_data);
    read(BIN_DATA, $bin_data, $double_size);
    $y1 = unpack('f', $bin_data);
    read(BIN_DATA, $bin_data, $double_size);
    $z1 = unpack('f', $bin_data);
    read(BIN_DATA, $bin_data, $int_size);
    $obj_scatt1 = unpack('L', $bin_data);
    read(BIN_DATA, $bin_data, $int_size);
    $coll_scatt1 = unpack('L', $bin_data);
    read(BIN_DATA, $bin_data, $double_size);
    $en1 = unpack('f', $bin_data);
}

I'm sure I didn't hike all the way up to the mountain top in vain, and that your wisdom will enlighten me and help me solve this problem. I thank you in advance for that.

Andri

Replies are listed 'Best First'.
Re: read + unpack
by Anonymous Monk on May 06, 2009 at 01:34 UTC
    Your data doesn't look binary; it looks like text. If your data really is binary, please provide a short hexdump we can use to test your code, like this:
    C:\>hexdump datafile | head
    00000000: 45 43 48 4F 20 69 73 20 - 6F 6E 2E 0D 0A |ECHO is on. |
    0000000d;
    C:\>od -tx1 datafile | head
    0000000 45 43 48 4f 20 69 73 20 6f 6e 2e 0d 0a
    0000015
    C:\>
      The data should be binary; indeed, with MATLAB I'm not reading it as text and it works fine. Another hint: a friend of mine wrote some C++ code that outputs the correct numbers:
      // Read the pink photon data
      arg_file.read(num_photons_bin, 1);
      pink_photons = (unsigned int) *num_photons_bin;
      // Increment the counter based on the bytes read
      status_counter += 1;
      // Read pink photon data
      for (int photon = 0; photon < pink_photons; photon++) {
          // Read the binary data into the variable
          arg_file.read((char *)&pink_x_pos, sizeof(double));
          arg_file.read((char *)&pink_y_pos, sizeof(double));
          arg_file.read((char *)&pink_z_pos, sizeof(double));
          arg_file.read((char *)&pink_num_scatters, sizeof(int));
          arg_file.read((char *)&pink_col_scatters, sizeof(int));
          arg_file.read((char *)&pink_energy_val, sizeof(double));
      }

      So to my understanding this is a binary file. Regarding the hexdump, unfortunately the file has a 32768-byte header, and I don't know how to skip it for the hexdump (what do I have to put instead of "head"?).
      Thank you very much for your attention.
      Andri
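One way to look at the bytes that follow the header without an external tool is a few lines of Perl: seek past the header, read a handful of bytes and print them in hex. This is only a minimal sketch, assuming the header is exactly 32768 bytes as stated above; the sub name hexdump_after_header and the file name datafile.bin are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return $count bytes found after a $skip-byte header,
# formatted as space-separated hex pairs.
sub hexdump_after_header {
    my ($fh, $skip, $count) = @_;
    binmode($fh);                                  # raw bytes, no translation
    seek($fh, $skip, 0) or die "seek failed: $!";  # 0 = offset from file start
    my $got = read($fh, my $buf, $count);
    die "read failed: $!" unless defined $got;
    return join ' ', map { sprintf '%02x', $_ } unpack('C*', $buf);
}

# Usage (the file name is an assumption):
# open(my $fh, '<', 'datafile.bin') or die "open failed: $!";
# print hexdump_after_header($fh, 32768, 64), "\n";
```

With od, the -j (bytes to skip) and -N (bytes to dump) options should give the same view where the local build supports them, for example od -tx1 -j 32768 -N 64 datafile.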
Re: read + unpack
by citromatik (Curate) on May 06, 2009 at 07:14 UTC
    read the following 6 values which are 3 doubles (8bytes), 2 long integer (4bytes) and another double (8bytes again)

    For double-precision floats, use the d template in unpack; you are using f, which is for single-precision floats. See pack for details.

    citromatik
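To make the f versus d point concrete, here is a minimal sketch that decodes one record (the part after the leading count byte) using the layout given in the question: 3 doubles, 2 32-bit integers and 1 double, 40 bytes in total. It assumes the file uses the same byte order as the machine reading it, and that the integers are signed (as int32 in the MATLAB code suggests). The names $RECORD_TEMPLATE and unpack_record are made up for illustration.

```perl
use strict;
use warnings;

# Record layout from the question: 3 doubles, 2 32-bit ints, 1 double.
# 'd' = native double (8 bytes), 'l' = signed 32-bit integer (4 bytes).
my $RECORD_TEMPLATE = 'd3 l2 d';
my $RECORD_SIZE     = 3 * 8 + 2 * 4 + 8;    # 40 bytes

# Hypothetical helper: decode one 40-byte record into a list of 6 numbers.
sub unpack_record {
    my ($bytes) = @_;
    die "short record" unless length($bytes) == $RECORD_SIZE;
    return unpack($RECORD_TEMPLATE, $bytes);
}

# Round trip on made-up values to check that the template matches the layout.
my $bytes  = pack($RECORD_TEMPLATE, 5.7, -12.3, 2.1, 0, 1, 225.3);
my @fields = unpack_record($bytes);
print join(' ', @fields), "\n";
```

Decoding the same bytes with f would read only 4 of the 8 bytes of each double and interpret them as a single-precision float, which is consistent with getting wrong numbers.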

      I did: in fact, I tried to "guess" the right combination of data types with many attempts, but I never got a good result. I was also wondering how a little-endian or big-endian system changes the way I have to read the file, and how I can tell whether my system is little-endian or big-endian.
      Thanks for the suggestions.
      Andri
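On the endianness question, a minimal sketch of two ways to check the byte order of the machine running Perl, plus the explicit byte-order modifiers that pack and unpack accept from Perl 5.10 on. The sub name native_is_little_endian is made up for illustration, and whether the data file itself is little-endian or big-endian still has to be established from whoever wrote it.

```perl
use strict;
use warnings;
use Config;

# Pack the 32-bit value 1 in native order and look at the first byte:
# 1 means the least significant byte comes first (little-endian).
sub native_is_little_endian {
    return unpack('C', pack('L', 1)) == 1;
}

# $Config{byteorder} is e.g. 1234 or 12345678 on little-endian builds.
print "byteorder from Config: $Config{byteorder}\n";
print 'native order: ',
    (native_is_little_endian() ? 'little' : 'big'), "-endian\n";

# Perl 5.10+ accepts '<' (little-endian) and '>' (big-endian) modifiers,
# so a template can name the file's byte order instead of relying on the host.
my $le = pack('d<', 5.7);
my $be = pack('d>', 5.7);
```

If the file was written on a little-endian machine, a template such as 'd<3 l<2 d<' would decode it the same way on any host.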
Re: read + unpack
by Anonymous Monk on May 06, 2009 at 12:52 UTC
    Depending on the system you're on, you may also need to do binmode(BIN_DATA) after you open() it. It's good practice to do this regardless when working with binary files for portability's sake. -Greg
      It was not in the portion of the code I pasted, but I did use binmode(BIN_DATA).
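For reference, a minimal sketch of a complete read loop that combines binmode, skipping the header, and unpacking with d and l. It follows the C++ snippet quoted earlier in the thread, where the count byte says how many records follow, and it assumes native byte order and signed 32-bit integers. The name read_photons is made up for illustration; this is not a tested reader for the actual file format.

```perl
use strict;
use warnings;

# Assumed layout: 1 count byte (unsigned), then per record 3 doubles,
# 2 signed 32-bit ints and 1 double, all in the reader's native byte order.
my $RECORD_TEMPLATE = 'd3 l2 d';
my $RECORD_SIZE     = 40;

# Hypothetical helper: read all records from $fh, starting $header_size
# bytes into the file. Returns a reference to an array of 6-element arrays.
sub read_photons {
    my ($fh, $header_size) = @_;
    binmode($fh);                                  # no newline translation
    seek($fh, $header_size, 0) or die "seek failed: $!";
    my @records;
    while (read($fh, my $count_byte, 1)) {         # stops at end of file
        my $count = unpack('C', $count_byte);
        for (1 .. $count) {
            my $got = read($fh, my $buf, $RECORD_SIZE);
            die "truncated record"
                unless defined $got && $got == $RECORD_SIZE;
            push @records, [ unpack($RECORD_TEMPLATE, $buf) ];
        }
    }
    return \@records;
}

# Usage (file name and header size are assumptions):
# open(my $fh, '<', 'datafile.bin') or die "open failed: $!";
# my $records = read_photons($fh, 32768);
```

Reading each 40-byte record with a single read and a single unpack also avoids six separate read calls per record. If the count byte is only ever 0 or 1, this loop behaves the same as the "if larger than 0" description in the original question.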