rhzhang has asked for the wisdom of the Perl Monks concerning the following question:
Perl newbie here trying to make maps. I've written a script to extract the US census block information from the TIGER Line Shapefiles (ESRI format) into a plain tab-delimited spreadsheet. I'm on macOS Sierra. Specifically, I used Jason Kohles' Geo::ShapeFile module to read the shapefile, and copy each Block ID and its associated polygon (coordinates) into a two-column, tab-delimited text file.
Here is the code:
#!/usr/bin/perl use strict; use warnings; use Math::Round; use Geo::ShapeFile; use Geo::ShapeFile::Shape; use Geo::ShapeFile::Point; # Declare variables my $blockid; my $bcount; my $totalblocks; my $progress; my $shapefile; # used for loading the TIGER shapefile my %attr; # used to read the dbf file my $polygon; # used to read the shp file my @points; # used to read the shape into an array of points my $coor; # used to read the coordinates of each point my $x; my $y; my $pcount; # counter for the coordinates being written into output my $totalpoints; # total number of coordinates in each polygon # Variables for time keeping my $start = time; my $duration; # Prepare the TIGER spreadsheet. open (TIGER, ">tiger.txt") or die "Failed to open"; print TIGER "BLOCKID\tPolygon\n"; # make a header # Load the TIGER shapefile $shapefile = 0; $shapefile = Geo::ShapeFile->new ("tabblock2010_42_pophu"); # use the +TIGER census block special release (population & housing units only) $totalblocks = $shapefile->shapes(); for (1 .. $totalblocks) { %attr = $shapefile->get_dbf_record($_); $blockid = $attr{BLOCKID10}; $bcount++; print TIGER "$blockid\t\"geometry\":{\"type\":\"Polygon\",\"coordi +nates\":[["; # The polygon column is in the GEOJSON format $polygon = $shapefile->get_shp_record($_); @points = $polygon->points(); $pcount = 0; $totalpoints = scalar (@points); foreach $coor (@points) { $x = $coor->get_x(); $y = $coor->get_y(); print TIGER "[$x,$y]"; $pcount++; if ($pcount != $totalpoints) { # All but the last coordinate s +hould be followed by a comma print TIGER ","; } } print TIGER "]]}\n"; # finish the entry print "Done $bcount out of $totalblocks.\n"; %attr = (); $blockid = 0; $polygon = 0; @points = (); $totalpoints = 0; } close (TIGER); $shapefile = 0; select()->autoflush(1); # Show stopwatch $duration = time - $start; $duration = round ($duration / 60); print "\n\nThis script took $duration minutes\n\n"; exit;
This has been working well for smaller states like Connecticut. But when I run this for a large shapefile, the program accumulates a huge amount of memory, and cannot go to the end of the script. I have a piece at the end of the script that prints a stopwatch (execution time). For example, Pennsylvania has 421,545 census blocks, the script will take about 7 minutes to run, and it will display:
... Done 421543 out of 421545. Done 421544 out of 421545. Done 421545 out of 421545.
And it will get stuck there, never finishing and never displaying the stopwatch. And in my Activity Monitor will say perl is taking 7 GB of memory. But my output file is perfectly fine. Geo::ShapeFile doesn't have a close file command, so I suspect there's some problem with the ShapeFile remaining open and taking up all of the memory.
|
|---|
| Replies are listed 'Best First'. | |
|---|---|
|
Re: Geo::ShapeFile memory problem
by huck (Prior) on Apr 17, 2017 at 00:22 UTC | |
by rhzhang (Initiate) on Apr 18, 2017 at 00:58 UTC | |
|
Re: Geo::ShapeFile memory problem
by swl (Prior) on Apr 17, 2017 at 05:00 UTC | |
by swl (Prior) on Apr 23, 2017 at 00:36 UTC | |
by huck (Prior) on Apr 18, 2017 at 02:23 UTC | |
by swl (Prior) on Apr 23, 2017 at 00:32 UTC | |
by huck (Prior) on Apr 23, 2017 at 03:57 UTC | |
by pmqs (Friar) on Apr 23, 2017 at 19:33 UTC | |
| |
by haukex (Archbishop) on Apr 23, 2017 at 18:34 UTC | |
by rhzhang (Initiate) on Apr 18, 2017 at 01:03 UTC | |
|
Re: Geo::ShapeFile memory problem
by 1nickt (Canon) on Apr 16, 2017 at 21:22 UTC | |
|
Re: Geo::ShapeFile memory problem
by Anonymous Monk on Apr 16, 2017 at 21:57 UTC | |
|
Re: Geo::ShapeFile memory problem
by Anonymous Monk on Apr 16, 2017 at 22:35 UTC | |
by swl (Prior) on Apr 16, 2017 at 23:41 UTC |