in reply to Re^6: Geo Package files
in thread Geo Package files

Dealing with floats is messy. One of those issues is: what the heck does a 64 bit float mean?

We know that this format is in "little endian". Is your processor also "little endian", e.g. an Intel processor? If it is then maybe we don't have to write completely platform independent code?
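
One way to answer the endianness question from Perl itself is a quick check like the one below. It is a minimal sketch. `is_little_endian` is a hypothetical helper name. The check relies on `L` packing a 32 bit unsigned integer in native byte order, while `V` always packs it little-endian.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: true if this machine stores integers little-endian.
# 'L' packs a 32 bit unsigned integer in native byte order; 'V' always
# packs it little-endian, so the two agree only on a little-endian machine.
sub is_little_endian {
    return pack('L', 1) eq pack('V', 1);
}

# $Config{byteorder} records the same thing for the perl build: it starts
# with '1234' on little-endian builds and is '4321' or '87654321' on big-endian ones.
printf "byteorder=%s little_endian=%s\n",
    $Config{byteorder}, is_little_endian() ? 'yes' : 'no';
```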

I think also relevant to this would be: Is your Perl version 32 or 64 bit? I am running 64 bit Perl on a 64 bit Windows platform, and I have a 64 bit GCC compiler. On my 64 bit machine, in C, a double is 64 bits (a plain float is 32 bits). I don't know for sure, but I would suspect that 64 bit Perl's representation of a simple float is also 64 bits? Added: I see trouble if you are using 32 bit Perl.
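
For the 32 vs 64 bit question, Perl's build configuration can be queried directly. This is a minimal sketch. `$Config{ptrsize}` is 8 on a 64 bit perl and 4 on a 32 bit one. `pack 'd'` produces a native C double, which takes 8 bytes on IEEE 754 platforms whether the perl itself is 32 or 64 bit.

```perl
use strict;
use warnings;
use Config;

# Pointer size tells you whether this perl is a 32 bit or 64 bit build.
my $bits = $Config{ptrsize} * 8;

# 'd' is a native C double; its packed length is the float width pack/unpack uses.
my $double_bytes = length pack('d', 0);

print "perl is $bits bit ($Config{archname}); pack 'd' uses $double_bytes bytes\n";
```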

I don't know where the 6 comes from in 'nCCV(V)6'? Each float is 8 bytes, not 6 bytes.

From what I understand so far,
index 8, length 8 bytes => minx
index 16, length 8 bytes => maxx
index 24, length 8 bytes => miny
index 32, length 8 bytes => maxy
index 40, length 8 bytes => minz
index 48, length 8 bytes => maxz

so, my $minx = unpack ("d8", substr($geo,8,8)); may work??
I am not sure if f8 would work also?
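
One quick way to see what the two template letters do is to feed them the 8 envelope bytes quoted later in the thread (50 b8 1e c5 2c 75 23 41). In this sketch `d<` reads all 8 bytes as one little-endian double. `f<` reads only the first 4 bytes as a single precision float. The explicit `<` pins the byte order, so the result does not depend on the machine.

```perl
use strict;
use warnings;

# The 8 bytes at offset 8 of $geo, as quoted later in the thread.
my $bytes = pack('H*', '50b81ec52c752341');

# 'd<' : one 8 byte IEEE double, little-endian.
my $as_double = unpack('d<', $bytes);

# 'f<' : one 4 byte IEEE float built from the first 4 bytes only (50 b8 1e c5);
# the sign bit in c5 is set, so this comes out negative.
my $as_float = unpack('f<', $bytes);

printf "d< => %.3f\nf< => %g\n", $as_double, $as_float;
```

`d8` asks for eight doubles, and `f8` asks for eight 4 byte floats. Neither count matches a single 8 byte field, so plain `d<` (one double) states the intent more clearly.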

Update: BTW, what is length ($geo)?

Also, again, I request the binary data dump of $geo, NOT what you think that binary unpacks as.
Dump should be 2 hex digits per byte, with a space between bytes. 0 should be 00 so that columns line up nicely. 16 bytes per line would be fine.
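
A dump in exactly the requested format can be produced with unpack's `H2` template. The format is two hex digits per byte, a space between bytes, and 16 bytes per line. `hex_dump` is a hypothetical helper name, and `$geo` is assumed to hold the raw blob fetched from the geometry column.

```perl
use strict;
use warnings;

# Hypothetical helper: format a byte string as "xx xx xx ..." with 16 bytes per line.
sub hex_dump {
    my ($bytes) = @_;
    my @lines;
    for (my $off = 0; $off < length $bytes; $off += 16) {
        # '(H2)*' turns each byte into a 2 digit hex string, e.g. "\0" -> "00".
        push @lines, join ' ', unpack('(H2)*', substr($bytes, $off, 16));
    }
    return join "\n", @lines;
}

# Usage, assuming $geo holds the blob:  print hex_dump($geo), "\n";
```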

Re^8: Geo Package files
by Bod (Parson) on Mar 11, 2022 at 20:20 UTC
    I don't know where the 6 comes from in 'nCCV(V)6'? Each float is 8 bytes, not 6 bytes.

    There are 6 values to extract: minx, maxx, miny, maxy, minz, maxz
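
The 6 works as a repeat count, but it repeats whatever letter sits inside the group. `V` is a 4 byte unsigned integer, so `(V)6` covers only 24 of the 48 bytes that six doubles occupy. Measuring the packed length of each template makes the difference visible. This is a minimal sketch, and the `d<` choice assumes a little-endian header (flags bit 0 set).

```perl
use strict;
use warnings;

# Bytes each template consumes, measured by packing dummy zeros.
my $fixed   = length pack('n C C V', 0, 0, 0, 0);  # magic, version, flags, srs_id
my $as_ints = length pack('(V)6', (0) x 6);        # six 4 byte unsigned ints
my $as_dbls = length pack('(d<)6', (0) x 6);       # six 8 byte little-endian doubles

print "fixed header: $fixed, (V)6: $as_ints, (d<)6: $as_dbls bytes\n";

# So a template for the fixed header plus an XYZ envelope would be something like:
# my ($magic, $version, $flags, $srs_id, @envelope) = unpack('n C C V (d<)6', $geo);
```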

    so, my $minx = unpack ("d8", substr($geo,8,8)); may work??
    I am not sure if f8 would work also?

    Both d8 and f8 give fractions that seem nonsensical in this situation (because of the SRSID we are using). f8 gives a negative fraction which is doubly nonsensical.

    However, it could be that the decoded data is right and I am not understanding it correctly. My plan is to ignore the envelope for the moment and press on to the geometry data. That way I can feed the USRN into a USRN finder (the link is for the first entry in the SQLite DB) and sanity check the values against a known entity instead of trying to work out the envelope first.

Re^8: Geo Package files
by Bod (Parson) on Mar 11, 2022 at 22:43 UTC
      I am also very happy that you got this figured out! I very seldom work with floats.
      In my investigations, I found this:
      use Config;
      # nvsize: size in bytes of Perl's floating-point type (NV); 8 means a 64 bit double
      print $Config{nvsize}, " float size in bytes\n";
      Might consider checking that as a sanity check on Perl's float size. Note that it reflects how Perl was built, not just the machine: ordinary 32 bit builds also use 64 bit doubles, so it should print 8 there too.
Re^8: Geo Package files
by Bod (Parson) on Mar 11, 2022 at 19:52 UTC
    Is your processor also "little endian"

    Yes!
    We use entirely refurbished Dell PCs built on Intel processors. This one is:
    Intel(R) Core(TM) i5-4590 CPU @ 3.30GHz 3.30 GHz

    I would suspect that 64 bit Perl's representation of a simple float is also 64 bits? Added: I see trouble if you are using 32 bit Perl.

    It hadn't occurred to me that the Perl build or the processor would affect this decoding but that does make sense. I am running Strawberry Perl:
    This is perl 5, version 32, subversion 1 (v5.32.1) built for MSWin32-x64-multi-thread

    So, that looks to me like we should be OK there.

      Rather than hardcoding assumptions or conclusions about what the machine is now, why not future-proof it? All you have to do is add < to the unpack template to force the d (double) to be read as little-endian, no matter what architecture you happen to be running on. So my $minx = unpack ("d<", substr($geo,8,8)); will always use the little-endian interpretation. Or better, since your code already has access to the byte-order bit (bit 0 of the flags byte, the fourth byte of the header), use that bit to decide whether your envelope's unpacks use "d<" or "d>", so that you can handle input files that made either choice for the envelope:

      my $unpack_string = (($flag & 0x1) == 0x1) ? "d<" : "d>";
      ...
      my $minx = unpack($unpack_string, substr($geo,8,8));
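
Taking that idea further, the whole fixed part of the header can be decoded from the flags byte. This sketch is based on my reading of the GeoPackage binary header layout: magic "GP", a version byte, a flags byte, an int32 srs_id, then the envelope. Please check it against the spec. Bit 0 of the flags selects the byte order. Bits 1-3 give the envelope contents indicator, which determines how many doubles follow (0, 4, 6, 6 or 8). `parse_gpkg_header` and the returned hash keys are hypothetical names.

```perl
use strict;
use warnings;

# Doubles in the envelope for each envelope contents indicator (flags bits 1-3):
# 0 none, 1 xy, 2 xyz, 3 xym, 4 xyzm.
my @ENVELOPE_DOUBLES = (0, 4, 6, 6, 8);

# Hypothetical helper: decode the GeoPackage binary header at the start of $geo.
sub parse_gpkg_header {
    my ($geo) = @_;
    die "blob too short\n" if length($geo) < 8;

    my ($magic, $version, $flags) = unpack('a2 C C', $geo);
    die "missing 'GP' magic\n" unless $magic eq 'GP';

    my $indicator = ($flags >> 1) & 0x7;
    die "invalid envelope indicator $indicator\n" if $indicator > 4;
    my $n = $ENVELOPE_DOUBLES[$indicator];

    # Bit 0: 1 = little-endian, 0 = big-endian; applies to srs_id and the envelope.
    my $order = ($flags & 0x1) ? '<' : '>';

    my $header_len = 8 + 8 * $n;
    die "blob too short for envelope\n" if length($geo) < $header_len;

    # Skip the 4 bytes already read, then a signed 32 bit srs_id and $n doubles.
    my ($srs_id, @envelope) = unpack("x4 l$order d$order$n", $geo);

    return {
        version    => $version,
        srs_id     => $srs_id,
        envelope   => \@envelope,
        wkb_offset => $header_len,   # the WKB geometry starts here
    };
}
```

Usage, assuming `$geo` holds the blob: `my $h = parse_gpkg_header($geo); my $minx = $h->{envelope}[0];`. The geometry after `wkb_offset` is standard WKB, which begins with its own byte order byte, so its endianness has to be read separately from the header's.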

      But given that your CPU is little endian, your unpack from Re^4: Geo Package files already showed the right value: 637590.385 is the correct interpretation of the 8 bytes 50 b8 1e c5 2c 75 23 41, which are the little-endian double for 637590.385. The problem is that you don't believe that number is correct. But since you apparently have not tried sanity-checking your interpretation with another tool, as swl suggested in Re^9: Geo Package files, you don't know whether your belief is founded in reality. Maybe the "min" and "max" in the bounding box don't match the "min" and "max" of the coordinate values as stored; the portion of the spec I read wasn't clear on how min and max are interpreted. Or maybe the data set you were given just mixed up "min" and "max". A sanity check in another tool is really going to help you out.
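
Before reaching for another tool, one cheap internal check is whether each min/max pair in the envelope is actually ordered. It won't prove the numbers are right, but it would catch a data set that swapped "min" and "max". This is a minimal sketch. `envelope_problems` is a hypothetical helper that takes the envelope values in file order.

```perl
use strict;
use warnings;

# Hypothetical helper: given envelope values in file order
# (minx, maxx, miny, maxy, ...), report every pair whose min exceeds its max.
sub envelope_problems {
    my @env = @_;
    die "envelope needs an even number of values\n" if @env % 2;
    my @problems;
    for (my $i = 0; $i < @env; $i += 2) {
        my ($min, $max) = @env[$i, $i + 1];
        push @problems, sprintf('pair %d: min %s > max %s', $i / 2, $min, $max)
            if $min > $max;
    }
    return @problems;
}

# Usage, assuming @envelope was unpacked from the header:
# print "$_\n" for envelope_problems(@envelope);
```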

      But based on the spec you have shared, and the 16 bytes you have shared, I am confident that 637590.385 is the valid interpretation of those bytes.

      (I had wanted to say most of this last week, but I kept only thinking about it when I was in a crunched-for-time or on-my-phone-browser-so-read-only situation.... and then with the week-long pause, I forgot to come back.)

        Rather than hardcoding assumptions or conclusions about what the machine is now, why not future-proof it?

        Yes - that is certainly the plan.

        It seems to me that if I need this functionality, someone else will as well. Or, at the very least, would benefit from a head-start in developing code for the next part of dealing with a GeoPackage. So some of the code will probably form a module for CPAN.

        since you have not apparently tried sanity-checking your interpretation with another tool

        I haven't yet, as I have been away on training much of this week and haven't had the opportunity! I also feel that an external tool won't be very helpful for checking the envelope data. It will be the next step once I've decoded some of the actual geometry data.