PerlMonks  


So I have been working on reverse engineering, byte by byte in hex, a file that some genius thought it would be a great idea to write in both little-endian and big-endian byte order. Along the way I came across what I believe to be a bug in Perl, but I wanted to make sure; so far my coworkers agree with me.

I have these files that contain three sorts of records. There is an initial header that occurs only once, is 4096 bytes in total, and is made up of four-byte floats and four-byte int32s.

The second record type is 36 bytes long and is also made up of four-byte floats and four-byte int32s. It appears to be precursor data for the third record type, and it contains a field stating how many instances of the third record follow it. After those, another instance of this second record can follow, along with another set of third-type records.
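The layout described so far can be walked with a loop like the one below. This is only a sketch under stated assumptions: the post does not say which of the nine 4-byte fields in a type-2 record holds the count, so the code ASSUMES it is the last field and stores it as a little-endian int32 (matching how the other int32s are described). The constant names and `walk_records` are hypothetical; records are treated as opaque blobs here and only counted.

```perl
use strict;
use warnings;

# Sizes taken from the post.
use constant HEADER_LEN => 4096;
use constant REC2_LEN   => 36;
use constant REC3_LEN   => 28;

# Walk the file: skip the header, then alternate one type-2 record with
# however many type-3 records it announces. Returns the number of type-2
# records and the total number of type-3 records read.
sub walk_records {
    my ($fh) = @_;
    binmode $fh;    # raw bytes: no newline translation by an I/O layer
    read($fh, my $header, HEADER_LEN) == HEADER_LEN
        or die "short header\n";
    my ($n2, $n3) = (0, 0);
    while (read($fh, my $rec2, REC2_LEN)) {
        die "truncated type-2 record\n" unless length($rec2) == REC2_LEN;
        # ASSUMPTION: count is the last 4-byte field, little-endian int32.
        my $count = unpack 'l<', substr($rec2, 32, 4);
        for (1 .. $count) {
            read($fh, my $rec3, REC3_LEN) == REC3_LEN
                or die "truncated type-3 record\n";
            $n3++;
        }
        $n2++;
    }
    return ($n2, $n3);
}
```

Checking every `read` against the requested length means a truncated or misaligned file fails loudly instead of quietly producing shifted values.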

The third record type is 28 bytes in total and contains (in order) five four-byte floats, one four-byte int32 and two two-byte shorts. The number of instances of this record is stored in record 2.
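Since a type-3 record has a fixed 28-byte layout, it can also be decoded in one `unpack` call instead of eight separate reads. A minimal sketch, assuming the byte orders implied by the `grabFloat`/`grabInt16` subs in this post (floats and shorts big-endian, int32 little-endian); it uses the `<` and `>` endianness modifiers, which need Perl 5.10 or later. `decode_rec3` is a hypothetical name.

```perl
use strict;
use warnings;

# Five big-endian floats, one little-endian int32, two big-endian shorts:
# 5*4 + 4 + 2*2 = 28 bytes.
use constant REC3_TEMPLATE => 'f>5 l< s>2';

# Decode a 28-byte string into (float1..float5, int32, short1, short2).
sub decode_rec3 {
    my ($bytes) = @_;
    die "expected 28 bytes, got " . length($bytes) . "\n"
        unless length($bytes) == 28;
    return unpack(REC3_TEMPLATE, $bytes);
}
```

With an explicit-endian template there is no `reverse` step, and the result no longer depends on the byte order of the machine running the script.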

I have all of my unpacking/packing, big-to-little-endian swapping, templates, etc. taken care of. Reading a float, for example, is done by a subroutine...

sub grabFloat {
    my $FourBytes = 4;
    my $floatTemp = 'f';
    read(IN, my $record, $FourBytes);        # 4 raw bytes from handle IN
    $record = reverse $record;               # big-endian -> host (little-endian) order
    my $value = unpack($floatTemp, $record);
    return $value;
}

My code for grabbing a short is...

sub grabInt16 {
    my $TwoBytes = 2;
    my $Int16 = 's';
    read(IN, my $record, $TwoBytes);         # 2 raw bytes from handle IN
    $record = reverse $record;               # big-endian -> host (little-endian) order
    my $value = unpack($Int16, $record);
    return $value;
}

Int32s are pretty much the same as floats, except they are stored in the other byte order and don't need to be reversed.
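The three readers differ only in template and length, so one way to tighten them is a single hypothetical `grab` helper that takes a filehandle, uses explicit-endian templates instead of `reverse`, and dies on a short read rather than returning garbage. This is a sketch, not the poster's code: it takes a lexical filehandle instead of the bareword `IN`, and the `<`/`>` modifiers need Perl 5.10+.

```perl
use strict;
use warnings;

# Read exactly $len bytes from $fh and unpack them with $template.
sub grab {
    my ($fh, $template, $len) = @_;
    my $got = read($fh, my $buf, $len);
    die "read error: $!\n" unless defined $got;
    die sprintf("short read at offset %d: wanted %d, got %d\n",
                tell($fh), $len, $got)
        unless $got == $len;
    return unpack($template, $buf);
}

sub grabFloat { my ($fh) = @_; return grab($fh, 'f>', 4) }   # big-endian float
sub grabInt16 { my ($fh) = @_; return grab($fh, 's>', 2) }   # big-endian short
sub grabInt32 { my ($fh) = @_; return grab($fh, 'l<', 4) }   # little-endian int32
```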

So here is the potential bug. I am reading thousands upon thousands of these records without error. Then, for some reason, Perl decides to skip a single byte. Let's look at record type three.

Let's say I am reading the 305th record of the third type, so I should get five floats, one int32 and two int16s. Let's also say that this record starts at offset 0x1000.
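Field offsets within a record can be computed rather than added by hand in hex. In this small sketch (the field names and `field_offsets` are hypothetical), the sizes are accumulated as plain integers and only formatted as hex for display, so carries come out right (0x1008 + 4 = 0x100C).

```perl
use strict;
use warnings;

# Layout of one type-3 record: [name, size in bytes].
my @fields = (
    [float1 => 4], [float2 => 4], [float3 => 4], [float4 => 4], [float5 => 4],
    [int32  => 4], [short1 => 2], [short2 => 2],
);

# Hex offset of each field for a record starting at $start,
# plus the offset where the next record begins.
sub field_offsets {
    my ($start) = @_;
    my @out;
    my $off = $start;
    for my $f (@fields) {
        push @out, sprintf('0x%04X %s', $off, $f->[0]);
        $off += $f->[1];
    }
    push @out, sprintf('0x%04X next record', $off);
    return @out;
}

print "$_\n" for field_offsets(0x1000);
```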

So what should we expect?

@ 0x1000 we should read 4 bytes for the first float

@ 0x1004 we should read 4 bytes for the second float

@ 0x1008 we should read 4 bytes for the third float

@ 0x100C we should read 4 bytes for the fourth float

@ 0x1010 we should read 4 bytes for the fifth float

@ 0x1014 we should read 4 bytes for the only int32

@ 0x1018 we should read 2 bytes for the first int16

@ 0x101A we should read 2 bytes for the second int16

That brings us to 0x101C (28 bytes later), where the next record starts.

However, this is not what happens, and I am losing my mind. Here is what actually goes down...

@ 0x1000 we read 4 bytes for the first float

@ 0x1004 we read 4 bytes for the second float

@ 0x1008 we read 4 bytes for the third float

@ 0x100C we read 4 bytes for the fourth float

Now my next (final) float should be stored in bytes 0x1010 through 0x1013. Instead, byte 0x1010 is discarded/skipped, so the float is read from 0x1011 through 0x1014! From this point forward everything is shifted by one byte. As you can see from the code above, I only ever read an even number of bytes, 2 or 4. If I were off by two, I would assume I had made a mistake and read an extra int16 somewhere, but it is only a single byte. I have run this script on multiple versions of these files with exactly the same behavior every time: it occurs at a different place in each file, but at a consistent location within any individual file.
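One cause worth ruling out before blaming Perl: if the file is read through a handle that was not put in binmode (text mode is the default on Windows, via the `:crlf` PerlIO layer), every 0x0D 0x0A byte pair in the data is delivered as a single 0x0A. That drops exactly one byte wherever the pair happens to occur, which would fit a shift that lands at a different place in each file but always at the same place for a given file. The sketch below forces `:crlf` explicitly so the effect can be reproduced on any platform; the five test bytes are made up and `first4_hex` is a hypothetical helper.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Made-up test data: a CR LF pair followed by three more bytes.
my $bytes = "\x0D\x0A\x41\x42\x43";

# Write it out as raw bytes.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} $bytes;
close $out or die "close: $!";

# Read 4 bytes back through the given PerlIO layer and return them as hex.
sub first4_hex {
    my ($layer) = @_;
    open(my $in, "<$layer", $path) or die "open: $!";
    read($in, my $buf, 4);
    close $in;
    return unpack('H*', $buf);
}

my $text = first4_hex(':crlf');   # text-mode behavior (Windows default)
my $raw  = first4_hex(':raw');    # what binmode gives you
print "crlf: $text\nraw:  $raw\n";
```

Through `:crlf` the first four bytes come back as `0a414243`, with the 0x0D gone; through `:raw` they are `0d0a4142`. If this is the cause, calling `binmode IN;` right after the `open` (or opening with `'<:raw'`) should make the shift disappear.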

All the files I am working with are classified military files, so I can't share them. I hope I was descriptive enough for someone to point me in the right direction.

I have verified this behavior many times and in many ways. Printing the records leading up to the bug gives essentially this:

print: 1.0 2.0 3.0 4.0 5.0 25 1 0

print: 1.1 2.2 3.3 4.15 5.35 26 2 0

print: 1.2 2.4 3.6 4.25 5.53 25 2 0

print: 1.3 2.6 3.9 4.0 2.58e-044 -7923652397.....
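To take I/O layers out of the picture entirely while hunting for the shift, one option is to slurp the file as raw bytes and decode each record by explicit offset. Decoding the record that prints garbage at its expected offset, and dumping the bytes around it, shows whether the data itself or the reading is at fault. The helper names are hypothetical, and the byte orders are the ones implied by the subs in this post.

```perl
use strict;
use warnings;

# Read the whole file as raw bytes: no layer can add or drop anything.
sub slurp_raw {
    my ($path) = @_;
    open(my $fh, '<:raw', $path) or die "open $path: $!";
    local $/;             # undef record separator: read everything at once
    my $data = <$fh>;
    close $fh;
    return $data;
}

# Decode one 28-byte type-3 record starting at byte offset $off.
sub rec3_at {
    my ($data, $off) = @_;
    return unpack('f>5 l< s>2', substr($data, $off, 28));
}

# Hex dump of $len bytes at $off, e.g. "0d 0a 41".
sub hex_at {
    my ($data, $off, $len) = @_;
    return join ' ', unpack('(H2)*', substr($data, $off, $len));
}
```

For example, `print hex_at(slurp_raw($file), 0x1008, 16), "\n";` (with `$file` being whatever path the script already opens) would show whether a 0d 0a pair sits right where the shift begins.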


In reply to Missing byte using unpack, pack, read(in terms of bytes) by joemaniaci
