Beefy Boxes and Bandwidth Generously Provided by pair Networks
Think about Loose Coupling
 
PerlMonks  

Geo Package files

by Bod (Parson)
on Mar 03, 2022 at 13:45 UTC ( [id://11141799]=perlquestion: print w/replies, xml ) Need Help??

Bod has asked for the wisdom of the Perl Monks concerning the following question:

I suspect this is a bit of a longshot...but...

Has anyone had any dealings with reading a Geo Package .gpkg file using Perl. I cannot find anything in CPAN that might help.

A .gpkg file is a SQLite3 DB file. Of course, Perl can handle this. I have used DBD::SQLite to access the Geo Package. The file I am trying to use only has one layer so it only has one row in the gpkg_contents table. This defines the content I need as being in the openUSRN table.

I've used this code to peek at this table....

use DBD::SQLite; use Data::Dumper; use strict; use warnings; my $dbh = DBI->connect("dbi:SQLite:uri=file:osopenusrn_202203.gpkg?mod +e=rwc"); my $tab = $dbh->prepare("SELECT * FROM openUSRN"); $tab->execute; while (my $n = $tab->fetchrow_hashref) { print Dumper $n; }

There are four fields in the table. Three are human-readable text. However, the one I am most interested in, geometry, is gobbledegook. A quick look at the specification seems to show that it is not going to be trivial to work out how to decode this from scratch. So I am hopeful that someone has trodden this path before to give me a head start or even a solution.

I suspect that the geometry is actually a linestring. I am only interested in getting the co-ordinates of the start and the end of the linestring.

Here is a sample of the output...

$VAR1 = { 'geometry' => 'GP ♣4l ►Îú░ü~▬A ]m&# +9827;↓⌂▬A└¯╔├☼$→A&#94 +92;╚v■[$→A ☺Ý♥ ☻ + ☺Û♥ ☻ ]m♣↓⌂▬A└& +#9562;v■[$→A ░¡Ï▀¤~▬A└¯&#9 +556;├☼$→A ☺Û♥ ☻ ► +Îú░ü~▬A`©▲E+$→A ░¡Ï▀¤~&#96 +44;A└¯╔├☼$→A ', 'usrn' => 2900534, 'id' => 1497, 'street_type' => 'Designated Street Name' };

Replies are listed 'Best First'.
Re: Geo Package files
by soonix (Canon) on Mar 03, 2022 at 15:33 UTC

      Thanks soonix - that has got me much further :)

      But - I am not understanding how to map the GeoPackage spec to Perl's unpack. I've read perlpacktut although I cannot say I understand much of it. So I've made a start based on your information and after a lot of pondering I think I understand why C is the second and third parts of the template

      I don't understand why we start with n
      The spec says "byte2 magic = ox4750" - so it's 2 bytes or 16 bits. Why is it n and not v or even s?

      Spec 2 says it is an "8-bit unsigned integer" so C fits the bill.

      I'm guessing we know 3 is unsigned from looking at the bits laid out in the spec - is that about right?

      Now I get completely lost...
      There is an int32 between 3 and 4 so that's N or V - but how can we tell? I've tried unpacking with those templates options and get 879493120 and 27700 respectively. I cannot tell from those values which is right.

      Next we have 4 double[] envelope
      As spec 3 returns 00000101 (5), the envelope consists of 6 values 2: envelope is [minx, maxx, miny, maxy, minz, maxz], 48 bytes so I am using d6 in the template. The returned values don't seem to make sense!

      Next is spec 5, the GeoPackageBinaryHeader - I'm took an educated guess this is a but the result didn't seem right so I've used N as it seems to be another flag byte.

      Finally we have the geometry which I expect to be text of unknown length. So I tried a100 and A100 but they take me back to the gobbledegook and I cannot work out which one I might need to use.

      This is what I have tried...

      use DBD::SQLite; use Data::Dumper; use strict; use warnings; my $dbh = DBI->connect("dbi:SQLite:uri=file:osopenusrn_202203.gpkg?mod +e=rwc"); my $tab = $dbh->prepare("SELECT * FROM openUSRN"); $tab->execute; my $n = $tab->fetchrow_hashref; my $geo = $n->{'geometry'}; my @test = unpack "nCCVd6C100", $geo; foreach my $t(@test) { print "$t\n"; }

      Am I heading in the right direction here or am I going down a blind alley with my reasoning behind the mapping from the specs to the unpack template?
      Is there a simpler way to understand this mapping and work out the correct template?

        n versus v is determined by the byte order. In the https://libgeos.org/specifications/wkb/ specification mentioned by swl, there is a section on Byte Order which mentions a flag stating the order used, and http://www.geopackage.org/spec120/#gpb_format says this flag is the lowest bit in the "flags" field (second C in the unpack template), so this determines whether to use big-endian (n and N) or big-endian (v and V). Luckily, this flag is a single byte, so that it itself isn't affected by byte-order ;-)

        Looking further at the flag byte, I see the the next three bits tell you how many doubles (probably f format, from Perl 5.10 upwards you can distinguish between big-endian "f>" and little-endian "f<") are in the double[] envelope array at the end of the GeoPackageBinaryHeader.

        Sorry I'm not really into this, so I can't dive very deep into it. Depending on what information comes up in the discussion, it might trigger me to go looking deeper, but don't bet on it :-)
        I don't understand exactly what your binary BLOB (Binary Large OBject) contains. A simple "print $geo" doesn't mean much to me. I would start by making a hex dump of $geo. Something along the lines of: join ' ', unpack '(H2)*', $geo; ... I would put either 10 or 16 bytes per line in the dump. It could be that some option on Data::Dump would work?

        You should wind up with something like: 47 50 XX YY... 0x47 means "G" and 0x50 means "P". The next byte looks like a version number - probably doesn't mean much to you. The very next flag byte means a lot (emphasis added) and we need to know what that is in order to understand what "double[] envelope;" means.

        My code for dealing with binary headers like this usually has a substr() to select an range of bytes then it applies the appropriate unpack template upon that subset of bytes. I suspect that some relatively straightforward, special purpose code will be able to decode your specific blobs. Decoding this binary geo format in a general sense appears to be a "non-trivial" task. Your code will probably wind up depending upon the flag byte being a particular value - your code being specific to decoding segments with that particular set of flags.

        The code to decode the bulk of the BLOB depends critically upon knowing what the flag byte is. Lets see the first 16 bytes of this $geo blob in hex dump format...

        Also don't forget the ord() function. print "Got G!\n" if ( ord(substr($geo,0,1)) == 0x47 ); I think will work.

        I have no idea of how many BLOB's you need to decode or what the performance implications are. I would try to get something functionally working and then worry about performance tweaks later. It could very well be that such tweaks are not necessary.

Re: Geo Package files
by swl (Parson) on Mar 04, 2022 at 00:30 UTC

    You can use Geo::GDAL::FFI to read the contents of a geopackage (untested example code below). The geometry field is stored as Well Known Binary (WKB), or perhaps a variant of it. This can be converted to Well Known Text (WKT) to be human readable.

    A definition of WKB is given in the libgeos docs at https://libgeos.org/specifications/wkb, and WKT at https://libgeos.org/specifications/wkt/.

    The GDAL stack can be a beast to install if the aliens have to compile everything from source (GDAL, Proj, GEOS, libtiff, libsqlite3, optionally also spatialite, freexl and curl). If you are on a unix type machine then install the gdal-dev package using your system package manager, then the GDAL aliens will run system installs.

    Example code:

    # adapted from the Geo::GDAL::FFI docs, untested my $layer_name = 'some_layer'; my $db = Open('test.gpkg'); my $layer = $dataset->GetLayer($layer_name); $layer->ResetReading; while (my $feature = $layer->GetNextFeature) { my $value = $feature->GetField('name'); my $geom = $feature->GetGeomField; say $value, ' ', $geom->AsText; }

    Update: More details of the geopackage geometry format are at http://www.geopackage.org/spec131/index.html#gpb_format. This should be of use if you decide to write your own parser to extract the first and last coordinates of each linestring.

      Alas I am on Windows and the way to install GDAL seems to be via Anaconda.

      If I can suss out the correct template for unpack then I shall do it that way. But if that proves impossible, or at least too tricky for me, then I'll come back to GDAL.

      The WKB link is certainly helpful, thanks.

        You could have a look at the GIS Internals stable releases. They provide precompiled binaries for windows with a broad range of dependencies (including almost all those I listed in my previous post). If the GIS Internals paths can be found by the various gdal-stack aliens then they will run system installs.

        Edit: Be sure to also get the download flagged as "compiled libraries and headers"

        Another edit: I just tested with the GIS Internals files and the alienfile probe for geos does not work (the probe currently uses geos-config, which is not included).

        Rather than try and get things to work on the inferior OS you might find it worth the effort (to say nothing of being less headaches) to install a docker host and see if you can't get a dedicated environment running a decent real OS instead. With that available then you could install the more off-the-shelf tool chain into it more easily (theoretically . . .). Instead of trying to jump through all these hoops getting this (as you say elsewhere in the thread) occasional procedure running "native" you can just spin up a container to process the results and dump the output out as you need it into a shared volume that you then work with from your "normal" platform.

        Edit: Looks like there's an osgeo/gdal available which you probably could build upon.

        The cake is a lie.
        The cake is a lie.
        The cake is a lie.

Log In?
Username:
Password:

What's my password?
Create A New User
Domain Nodelet?
Node Status?
node history
Node Type: perlquestion [id://11141799]
Front-paged by Corion
help
Chatterbox?
and the web crawler heard nothing...

How do I use this?Last hourOther CB clients
Other Users?
Others admiring the Monastery: (4)
As of 2024-03-29 00:31 GMT
Sections?
Information?
Find Nodes?
Leftovers?
    Voting Booth?

    No recent polls found