G'day, I'm having a problem with YET ANOTHER BINARY CONVERSION... I have read the answers to previous questions on this forum as well as perlpacktut, and it is just *not* gelling. I hope I'm not stupid, but that could very well be the case: my problem seems simple enough, yet I just can't find a solution. I feel silly having to come here for help on this, but I'm close to hanging myself over it...

My problem: convert binary data to text (really to a piddle).

I program on Mac OS X 10.6, and I come from the land of MATLAB/Octave.

Below is my attempt at extracting just *some* of the header data from this binary file. Even this was a stab in the dark. It's woefully pathetic, but even this much mystifies me. The most frustrating aspect is that I can convert this file using MATLAB (a language I know), but I can't use MATLAB any longer because (1) it is terribly slow when dealing with hundreds of these files, and (2) because of (1), I want to learn Perl for processing my data. I have dabbled with Perl for scripting over the last decade (mainly for file management) and really like the language. I am eager to learn here, not just to get this working or to be handed code, so if you could review the code below and offer any general or specific insight, I'm sure I'd find it most helpful. Thank you.

My attempt:

#!/opt/local/bin/perl
#
# cs2txt.pl will convert a cross-spectra binary file to a text file.
# This will quickly become outdated by the Perl module under development;
# however, there is good cause for constructing this script: it is educational
# (learning Perl). The Perl module will need to read data in from these files
# and store them in a matrix, using PDL as the backbone for computations, so
# understanding how to extract the data (without MATLAB or Octave) is crucial.
#
####################################
# BINARY FILE TEMPLATE:
#
# * They have a variable size header section followed by the cross spectra
#   products.
# * The data uses Big-Endian byte ordering (Most Significant Byte first). This
#   means that on Intel platforms you need to swap the byte order of each
#   variable read; in Perl 5.10+ the '>' unpack modifier does that for you.
# * IEEE floating point values: single (4 bytes) and double (8 bytes) precision.
# * Two's complement integer values.
#
# Data Type Definitions (with a big-endian unpack template for each):
# * Uint8   : Unsigned 8bit integer                            'C'
# * Sint8   : Signed 8bit integer                              'c'
# * Uint16  : Unsigned 16bit integer                           'n' or 'S>'
# * Sint16  : Signed 16bit integer                             's>'
# * Uint32  : Unsigned 32bit integer                           'N' or 'L>'
# * Sint32  : Signed 32bit integer                             'l>'
# * Uint64  : Unsigned 64bit integer                           'Q>' (64-bit perl)
# * Sint64  : Signed 64bit integer                             'q>' (64-bit perl)
# * Float   : IEEE single precision floating point (4 bytes)   'f>'
# * Double  : IEEE double precision floating point (8 bytes)   'd>'
# * Size4   : Unsigned 32bit integer giving the size of the following data  'N'
# * Char4   : Four character code (the next four bytes make a four character
#             string)                                          'a4'
# * Char8   : 8-byte zero-terminated string (zero-filled to 8 bytes max; must
#             have at least one zero)                          'Z8'
# * Char32  : 32-byte zero-terminated string (zero-filled to max; must have
#             at least one zero)                               'Z32'
# * Char64  : 64-byte zero-terminated string (zero-filled to max; must have
#             at least one zero)                               'Z64'
# * Char256 : 256-byte zero-terminated string (zero-filled to max; must have
#             at least one zero)                               'Z256'
# * Complex : 2 IEEE single precision floats, a real and imag pair (8 bytes,
#             4 bytes each float)                              'f>2'
#
# HEADER:
# Each file has two major sections: a Header section and a Data section.
# The Header section is as follows:
# - The header is expandable. Each newer version also contains the
#   information used by the older versions.
# - When reading a CrossSpectra file that is a newer version than you expect,
#   use the Extent field to skip to the beginning of the cross spectra data.
# - The following Header description is a set of data fields in order, where
#   each field description is a value type with implied size, followed by the
#   field name, followed by the field's description.
# * Note: if version is 3 or less, then nRangeCells=31, nDopplerCells=512,
#   nFirstRangeCell=1
#
# Version 1:
# * SInt16 -> nCsaFileVersion -> File Version 1 to latest. (If greater than
#   32, it's probably not a spectra file.)
# * UInt32 -> nDateTime -> TimeStamp. Seconds from Jan 1, 1904, local computer
#   time at site. The timestamp for CSQ files represents the start time of the
#   data (nCsaKind = 1). The timestamp for CSS and CSA files is the center
#   time of the data (nCsaKind = 2).
# * SInt32 -> nV1Extent -> Header Bytes extension (Version 4 is +62 Bytes Till
#   Data)
#
# Version 2:
# * SInt16 -> nCsKind -> Type of CrossSpectra Data. 1 is self spectra for all
#   used channels, followed by cross spectra; timestamp is start time of data.
#   2 is self spectra for all used channels, followed by cross spectra,
#   followed by quality data; timestamp is center time of data.
# * SInt32 -> nV2Extent -> Header Bytes extension (Version 4 is +56 Bytes Till
#   Data)
#
# Version 3:
# * Char4  -> nSiteCodeName -> Four character site code 'site'
# * SInt32 -> nV3Extent -> Header Bytes extension (Version 4 is +48 Bytes Till
#   Data)
#
# Version 4:
# * SInt32 -> nCoverageMinutes -> Coverage Time in minutes for the data.
#   'CSQ' is normally 5 minutes (4.5 rounded). 'CSS' is normally a 15 minute
#   average. 'CSA' is normally a 60 minute average.
# * SInt32 -> bDeletedSource -> Was the 'CSQ' deleted by CSPro after reading.
# * SInt32 -> bOverrideSourceInfo -> If not zero, CSPro used its own
#   preferences to override the source 'CSQ' spectra sweep settings.
# * Float  -> fStartFreqMHz -> Transmit Start Freq in MHz
# * Float  -> fRepFreqHz -> Transmit Sweep Rate in Hz
# * Float  -> fBandwidthKHz -> Transmit Sweep bandwidth in kHz
# * SInt32 -> bSweepUp -> Transmit Sweep Freq direction is up if non zero,
#   else down. NOTE: CenterFreq is
#   fStartFreqMHz + fBandwidthKHz/2 * -2^(bSweepUp==0)
# * SInt32 -> nDopplerCells -> Number of Doppler Cells (nominally 512)
# * SInt32 -> nRangeCells -> Number of RangeCells (nominally 32 for 'CSQ',
#   31 for 'CSS' & 'CSA')
# * SInt32 -> nFirstRangeCell -> Index of First Range Cell in data from zero
#   at the receiver. 'CSQ' files nominally use zero. 'CSS' or 'CSA' files
#   nominally use one because CSPro cuts off the first range cell as
#   meaningless.
# * Float  -> fRangeCellDistKm -> Distance between range cells in kilometers.
# * SInt32 -> nV4Extent -> Header Bytes extension (Version 4 is +0 Bytes Till
#   Data)
#
# Version 5:
# * SInt32 -> nOutputInterval -> The Output Interval in Minutes.
# * Char4  -> nCreatorTypeCode -> The creator application type code.
# * Char4  -> nCreatorVersion -> The creator application version.
# * SInt32 -> nActiveChannels -> Number of active antennas
# * SInt32 -> nSpectraChannels -> Number of antennas used in cross spectra
# * UInt32 -> nActiveChannelBits -> Bit indicator of which antennas are in
#   use (msb is ant #1, lsb is ant #32)
# * SInt32 -> nV5Extent -> Header Bytes extension (Version 5 is +0 Bytes Till
#   Data). If zero, cross spectra data follows; if this file were version 6
#   or greater, nV5Extent would tell you how many more bytes version 6 and
#   greater use before the data.
#
# DATA:
# The data section is a multi-dimensional array of self and cross spectra data.
# Repeat For 1 to nRangeCells:
#   * Float[nDopplerCells] Antenna1 voltage squared amplitude self spectra.
#   * Float[nDopplerCells] Antenna2 voltage squared amplitude self spectra.
#   * Float[nDopplerCells] Antenna3 voltage squared amplitude self spectra.
#     (Warning: some Antenna3 amplitude values may be negative to indicate
#     noise or interference at those Doppler bins. Take the absolute value
#     of these negative values before use.)
#   * Complex[nDopplerCells] Antenna 1 to Antenna 2 cross spectra.
#   * Complex[nDopplerCells] Antenna 1 to Antenna 3 cross spectra.
#   * Complex[nDopplerCells] Antenna 2 to Antenna 3 cross spectra.
#   if nCsaKind is 2 then also read or skip
#   * Float[nDopplerCells] Quality array from zero to one in value.
# End Repeat
#
# Note: To convert self spectra to dBm use:
#   10*log10(abs(voltagesquared)) - (-40. + 5.8)
# The -40. is conversion loss in the receiver and +5.8 is processing
# computational gain.
#
##############################################
#
# Author:
# Revision:
# Date:
#
use strict;
use warnings;
use PDL;

# for now hard-wire the file; eventually this should be passed in with @ARGV or Getopt::Long
my $file = '/users/dpath2o/downloads/CSS_SBRD_10_05_01_0045.cs4';

# various headers depending on which version of software wrote the file
my @hdrnames1 = qw( version timestamp v1_bytes );
my @hdrnames2 = qw( version timestamp v1_bytes cs_type v2_bytes );
my @hdrnames3 = qw( version timestamp v1_bytes cs_type v2_bytes sitename v3_bytes );
my @hdrnames4 = qw( version timestamp v1_bytes cs_type v2_bytes sitename v3_bytes
                    t_coverage csq_delete csq_pref f_start f_sweep f_bandwidth f_up
                    n_Doppler n_range range_i delta_range v4_bytes );
my @hdrnames5 = qw( version timestamp v1_bytes cs_type v2_bytes sitename v3_bytes
                    t_coverage csq_delete csq_pref f_start f_sweep f_bandwidth f_up
                    n_Doppler n_range range_i delta_range v4_bytes
                    t_output creator creator_version n_active_chs n_channels
                    bit_indicator v5_bytes );

# open the file in raw (binary) mode
open(my $fh, '<:raw', $file) or die "could not open $file: $!\n";

# read in the file version, then decide which header array (above) to use.
# The version is an SInt16, i.e. 2 bytes: read() counts bytes, not bits.
my $buffer;
my $got = read($fh, $buffer, 2);
die "could not read version from $file: $!\n" unless defined $got && $got == 2;

# 's>' = signed 16-bit integer, big-endian ('>' needs Perl 5.10+).
# '+' is not a pack/unpack modifier, and printing $buffer itself only shows raw bytes.
my $vers = unpack('s>', $buffer);
print "file version (signed 16-bit big-endian integer): $vers\n";
die "$file: version $vers, probably not a spectra file\n" if $vers < 1 || $vers > 32;

# condition-based header read/unpack to hash variable %header
# read/unpack data to piddles (matrices)
# print to text file

close($fh) or die "could not close $file: $!\n";


In reply to binary to piddle by dpath2o
