in reply to Re: Geo::ShapeFile memory problem
in thread Geo::ShapeFile memory problem

Being able to read right from one of these zip files without needing to unzip it to disk first would be a treat, but I realize it may be a little too much to ask for.

In that kind of mode, requiring me to read it sequentially would be no problem; I'd expect that kind of processing to be fairly common, e.g.:

my $dir0='h:/active/tiger_data';
my $sfips='42'; # pa

# make $base->{$blockid10}{color} via
# zipbyline reads a zip member sequentially
# as in http://search.cpan.org/~phred/Archive-Zip-1.59/lib/Archive/Zip.pm#Low-level_member_data_reading
#
# my $sn=$fips2state->{$sfips}.'2010.sf1';
# my $zf=$dir0.'/sf1/'.$sn.'.zip';
# my $mf=$fips2state->{$fips}.'geo2010.sf1';
# my $member=zipbyline_start($zf,$mf);
# while (my $line=zipbyline_read($member)){
#   ... pull out datums AREALAND AREAWATR POP100, create density
# } # line
# zipbyline_close($member);
# sort by density, total POP100,
# break into deciles,
# assign a decile color to each $base->{$blockid10}{color}

my $dir=$dir0.'/shapes';
my $state='tabblock2010_'.$sfips.'_pophu';
my $shapefn=$dir.'/'.$state.'/'.$state;
my $imgfn=$dir0.'/gifs/'.$state.'.gif';
# this points to the unzipped dir now,
# be nice to just point to $dir.'/'.$state.'.zip' instead
my $sf = Geo::ShapeFile->new($shapefn);
$sf->caching(shp => 0);
$sf->caching(dbf => 0);
$sf->caching(shx => 0);
$sf->caching(shapes_in_area => 0);

my $x_min=$sf->x_min();
my $x_max=$sf->x_max();
my $y_max=90-$sf->y_min(); # need to invert: 90 is top, 0 is bottom
my $y_min=90-$sf->y_max();
my $totalblocks = $sf->shapes();
# $totalblocks=5000;

my $xsize=$x_max-$x_min;
my $ysize=$y_max-$y_min;
my $imgy=5000;
my $yscale=$imgy/$ysize;
my $pfx=-0.00923452628555483*($sf->y_min)+1.15467278754118; # projection factor
my $imgx=$yscale*$xsize*$pfx;
my $xscale=$imgx/$xsize;

sub xproj { return ($_[0]-$x_min)*$xscale; }
sub yproj { return ((90-$_[0])-$y_min)*$yscale; }

# create a new image
my $im = GD::Image->new($imgx+1,$imgy+1);

for my $si (1 .. $totalblocks) {
  my %attr      = $sf->get_dbf_record($si);
  my $blockid10 = $attr{BLOCKID10};
  my $color     = $base->{$blockid10}{color};
  unless ($color) { $color = $yellow; }
  my $polygon = $sf->get_shp_record($si);
  for my $pi (1 .. $polygon->num_parts) {
    my $part = $polygon->get_part($pi);
    my $poly = GD::Polygon->new;
    for my $hash (@$part) {
      $poly->addPt(xproj($hash->{X}), yproj($hash->{Y}));
    }
    my $first = $part->[0];
    my $last  = $part->[-1];
    if ($first->{X} ne $last->{X} || $first->{Y} ne $last->{Y}) {
      $poly->addPt(xproj($first->{X}), yproj($first->{Y}));
    }
    $im->filledPolygon($poly, $color);
  } # pi
} # si

outlines();

open(my $img, '>', $imgfn);
binmode $img;
print $img $im->gif;
close $img;
exit;

sub outlines {
  my $state   = 'tl_2010_'.$sfips.'_county10';
  my $shapefn = $dir.'/'.$state.'/'.$state;
  # this too points at an unzipped dir,
  # be nice to point at $dir.'/'.$state.'.zip' instead
  my $sf = Geo::ShapeFile->new($shapefn);
  $sf->caching(shp => 0);
  $sf->caching(dbf => 0);
  $sf->caching(shx => 0);
  $sf->caching(shapes_in_area => 0);
  my $totalblocks = $sf->shapes();
  for my $si (1 .. $totalblocks) {
    my $polygon = $sf->get_shp_record($si);
    for my $pi (1 .. $polygon->num_parts) {
      my $part = $polygon->get_part($pi);
      my $poly = GD::Polygon->new;
      for my $hash (@$part) {
        $poly->addPt(xproj($hash->{X}), yproj($hash->{Y}));
      }
      my $first = $part->[0];
      my $last  = $part->[-1];
      if ($first->{X} ne $last->{X} || $first->{Y} ne $last->{Y}) {
        $poly->addPt(xproj($first->{X}), yproj($first->{Y}));
      }
      $im->openPolygon($poly, $black);
    } # pi
  } # si
}
PA output at this sendspace link while it lasts
MO output at this sendspace link while it lasts

Re^3: Geo::ShapeFile memory problem
by swl (Prior) on Apr 23, 2017 at 00:32 UTC

    Thanks Huck,

    Direct extraction from a zip file is potentially useful, but in my experience not a common use case. (Although perhaps that's because there are no tools to do so...).

    Maybe there is a module that supports reading from archives as a file handle? Archive::Zip has readFromFileHandle() but that would need special handling in Geo::ShapeFile each time data are accessed from the file.
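
    Something like Archive::Zip::MemberRead (it ships with the Archive::Zip distribution, if I remember correctly) might be the closest existing fit; it wraps a single member in a readline-style object. A rough, untested sketch, with placeholder archive and member names:

        use Archive::Zip;
        use Archive::Zip::MemberRead;

        # open the archive, then read one member line by line
        my $zip = Archive::Zip->new('archive.zip')
            or die 'cannot read archive';
        my $fh  = Archive::Zip::MemberRead->new($zip, 'some/member.txt');
        while (defined(my $line = $fh->getline())) {
            # process $line
        }
        $fh->close();

    As far as I can tell it only reads forward, though, so Geo::ShapeFile would still need special handling for its random access into the shp/shx/dbf data.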

    Also, one thing to watch for in any plotting code is holes in the polygons. I don't think the Tiger data have holes, but in the shapefile spec they are implied by vertex order instead of being explicitly flagged. https://www.esri.com/library/whitepapers/pdfs/shapefile.pdf
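
    If you need to tell outer rings from holes, the signed area of each part is enough: the spec stores outer rings clockwise and holes counter-clockwise. A minimal sketch (untested, assuming the same {X}/{Y} point access and closed rings as in your code):

        # returns true if a part is a hole (counter-clockwise ring).
        # assumes the ring is closed, i.e. the last point repeats the first.
        sub ring_is_hole {
            my $part = shift;               # arrayref of points with X/Y
            my $area = 0;                   # twice the signed area (shoelace)
            for my $i (0 .. $#$part - 1) {
                my ($p, $q) = ($part->[$i], $part->[$i + 1]);
                $area += $p->{X} * $q->{Y} - $q->{X} * $p->{Y};
            }
            return $area > 0;               # positive => counter-clockwise => hole
        }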

    Shawn.

      Also, one thing to watch for in any plotting code is holes in the polygons. I don't think the Tiger data have holes

      Ooooo, they do. I found that PDF about 9 hours before you mentioned it. I'm dealing with them now, at a huge cost; maybe more later.
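
      One crude way to paint holes with GD (not necessarily what I'll end up doing) is to classify each part by vertex order and then repaint the hole rings in a background colour. A rough sketch, where ring_is_hole() is some vertex-order test and $water is a GD colour index I'd have to allocate:

          my (@outer, @holes);
          for my $pi (1 .. $polygon->num_parts) {
              my $part = $polygon->get_part($pi);
              push @{ ring_is_hole($part) ? \@holes : \@outer }, $part;
          }
          # fill the outer rings first, then paint the holes over them;
          # crude: anything already drawn underneath a hole gets covered too
          for my $draw ([\@outer, $color], [\@holes, $water]) {
              my ($parts, $fill) = @$draw;
              for my $part (@$parts) {
                  my $poly = GD::Polygon->new;
                  $poly->addPt(xproj($_->{X}), yproj($_->{Y})) for @$part;
                  $im->filledPolygon($poly, $fill);
              }
          }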

      I'll live with unzipped dirs for now, and look closer into "seek"ing on a zip file and into the format of a raw .shp file, to see if my sequential reader would work for that kind of access.
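
      From what I can see in the spec, a raw .shp is friendly to sequential reading: a 100-byte file header, then each record is an 8-byte big-endian header (record number, content length in 16-bit words) followed by the content. A rough sketch, untested and on a plain filehandle rather than my zip reader:

          open my $fh, '<:raw', 'tabblock2010_42_pophu.shp' or die $!;
          read($fh, my $filehdr, 100) == 100 or die 'short file header';
          while (read($fh, my $rechdr, 8) == 8) {
              my ($recno, $words) = unpack 'N N', $rechdr;    # big-endian
              read($fh, my $content, $words * 2) == $words * 2
                  or die 'short record';
              my $shapetype = unpack 'V', $content;           # little-endian
              # ... unpack the bounding box, part offsets and points from $content
          }
          close $fh;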

      My cheap zip sequencer

      package cheap::zipbyline;
      use strict;
      use warnings;
      use Exporter;
      use Archive::Zip qw( :ERROR_CODES :CONSTANTS );
      our @ISA = qw( Exporter );

      # these CAN be exported.
      our @EXPORT_OK = qw( zipbyline_start zipbyline_read zipbyline_close );
      # these are exported by default.
      our @EXPORT = qw( );

      my %zbl;

      sub zipbyline_start {
          my $zf = shift;
          my $mf = shift;
          my $zip = Archive::Zip->new();
          unless ( $zip->read( $zf ) == AZ_OK ) { die 'read error'; }
          my ( $member, $status, $bufferRef );
          $member = $zip->memberNamed( $mf );
          $member->desiredCompressionMethod( COMPRESSION_STORED );
          $status = $member->rewindData();
          die "error $status" unless $status == AZ_OK;
          $zbl{$member} = '';
          return $member;
      } # zbl start

      sub zipbyline_read {
          my $member = shift;
          my ( $status, $bufferRef );
          my $nl = index($zbl{$member}, "\n");
          while ( ($nl == -1) && !$member->readIsDone() ) {
              ( $bufferRef, $status ) = $member->readChunk(1000);
              die "error $status"
                  if $status != AZ_OK && $status != AZ_STREAM_END;
              # do something with $bufferRef:
              $zbl{$member} .= $$bufferRef;
              $nl = index($zbl{$member}, "\n");
          } # while
          if ($nl == -1) { $zbl{$member} = undef; return $zbl{$member}; }
          my $line = substr($zbl{$member}, 0, $nl + 1);
          $zbl{$member} = substr($zbl{$member}, $nl + 1);
          return $line;
      } # zbl

      sub zipbyline_close {
          my $member = shift;
          delete $zbl{$member};
      } # zbl close
      This will probably get improvements so I can read two files out of the same zip at the same time without doing two my $zip = Archive::Zip->new() ... $zip->read($zf) setups; I haven't needed that yet.

        As someone has already mentioned, IO::Uncompress::Unzip has a filehandle interface that hides all the complexity of reading directly from a zip file. Looking at the post from a few days ago, the commented block below showed reading from the zip file:
        # my $sn=$fips2state->{$sfips}.'2010.sf1';
        # my $zf=$dir0.'/sf1/'.$sn.'.zip';
        # my $mf=$fips2state->{$fips}.'geo2010.sf1';
        # my $member=zipbyline_start($zf,$mf);
        # while (my $line=zipbyline_read($member)){
        #   ... pull out datums AREALAND AREAWATR POP100, create density
        # } # line
        # zipbyline_close($member);

        That would become this with IO::Uncompress::Unzip

        use IO::Uncompress::Unzip qw($UnzipError);

        my $sn = $fips2state->{$sfips}.'2010.sf1';
        my $zf = $dir0.'/sf1/'.$sn.'.zip';
        my $mf = $fips2state->{$fips}.'geo2010.sf1';
        my $member = IO::Uncompress::Unzip->new($zf, Name => $mf)
            or die "unzip failed: $UnzipError";
        while (<$member>) {
            # ... pull out datums AREALAND AREAWATR POP100, create density
        }
        close $member;
      Maybe there is a module that supports reading from archives as a file handle?

      I believe the core module IO::Uncompress::Unzip does this, and its objects can be used like filehandles, so I think it'd be fairly transparent.