in reply to Need an intelligent join algorithm for matching coordinates to shapefiles (.shp)

I'm not familiar with Census blocks: would it be fair to say they are rectangular? Are they space-filling and non-overlapping? Are they all the same size, or of varying size? The regularity of the problem directly affects how clever you can be with grid organization.
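To make the grid idea concrete, here is a minimal sketch in Python (not Perl, and not tied to Geo::ShapeFile). It assumes you have already extracted one axis-aligned bounding box `(xmin, ymin, xmax, ymax)` per block; the class name, `cell_size`, and the block ids are all hypothetical. Each box is registered in every grid cell it overlaps, so a point lookup only scans the shapes in one cell.

```python
from collections import defaultdict
from math import floor


class GridIndex:
    """Uniform grid over bounding boxes (xmin, ymin, xmax, ymax)."""

    def __init__(self, cell_size):
        self.cell_size = cell_size
        self.cells = defaultdict(list)  # (col, row) -> [(shape_id, bbox), ...]

    def _cell(self, x, y):
        # floor() keeps negative coordinates (e.g. western longitudes) correct
        return floor(x / self.cell_size), floor(y / self.cell_size)

    def insert(self, shape_id, bbox):
        xmin, ymin, xmax, ymax = bbox
        c0, r0 = self._cell(xmin, ymin)
        c1, r1 = self._cell(xmax, ymax)
        # register the shape in every cell its bounding box overlaps
        for col in range(c0, c1 + 1):
            for row in range(r0, r1 + 1):
                self.cells[(col, row)].append((shape_id, bbox))

    def candidates(self, x, y):
        """Ids of shapes whose bounding box contains (x, y)."""
        bucket = self.cells.get(self._cell(x, y), [])
        return [sid for sid, (xmin, ymin, xmax, ymax) in bucket
                if xmin <= x <= xmax and ymin <= y <= ymax]


grid = GridIndex(cell_size=10)
grid.insert("block_A", (0, 0, 15, 10))
grid.insert("block_B", (15, 0, 30, 10))
print(grid.candidates(14, 5))  # ['block_A']
```

This is where regularity matters: with blocks of similar size, one cell size fits them all; with widely varying sizes, large blocks land in many cells while small ones crowd into a few, and a single cell size becomes a compromise.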

At the least, I note that Geo::ShapeFile->new does file access, so it's likely a good idea to preload all shape objects rather than loading them at compare time. How much disk space do your shape files take up?


Re^2: Need an intelligent join algorithm for matching coordinates to shapefiles (.shp)
by whakka (Hermit) on Dec 17, 2008 at 20:01 UTC
    Census blocks are non-overlapping. They are rectangular and regular in size only in urban areas; elsewhere (most of the US) they're irregular and of varying size. By "space-filling" I assume you mean that each point is contained within exactly one block, which is true.
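Since most blocks are irregular polygons, a bounding-box match is only a candidate; the final decision needs a true point-in-polygon test. A minimal sketch of the standard even-odd (ray casting) rule in Python, assuming each block is a single ring of `(x, y)` vertices. Real blocks may have several parts or holes, which this sketch does not handle, and your shapefile library may already provide an equivalent test.

```python
def point_in_polygon(x, y, ring):
    """Even-odd rule: cast a ray toward +x and count edge crossings.

    ring is a list of (x, y) vertices; repeating the first vertex at the
    end is allowed. Points exactly on an edge may be classified either way.
    """
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        # does this edge straddle the horizontal line through y?
        if (y1 > y) != (y2 > y):
            # x coordinate where the edge crosses that line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


# an L-shaped "block": the notch at the upper right is outside
L_BLOCK = [(0, 0), (10, 0), (10, 5), (5, 5), (5, 10), (0, 10)]
print(point_in_polygon(2, 7, L_BLOCK))  # True
print(point_in_polygon(7, 7, L_BLOCK))  # False
```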

    Unfortunately, preloading them isn't an option: collectively they're about 6GB, although I'm sure there are smart ways around that. Thanks.

      Space-filling means that no portion of the map lacks an associated block (no gaps). The Geo::ShapeFile documentation says it only loads data as it needs it, so it's possible that only some fraction of the 6GB needs to be loaded for your problem; it might be worth running a test script to check.

      Knowing how G-men think, the QuadTree approach Joost suggests is definitely worth trying. IIRC, you essentially build a tree of nested rectangles, each split into four quadrants. You can then search this much smaller structure (which should certainly be well under 6GB) instead of reloading point data in your loop.
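A minimal region-quadtree sketch in Python, storing only each block's bounding box and id so the index stays small. This is an illustration of the general technique, not Joost's code or any particular module's API; `max_items` and `max_depth` are hypothetical tuning parameters. A box is pushed down into a child quadrant only if it fits entirely inside it; boxes that straddle a split line stay at the parent.

```python
class QuadTree:
    """Region quadtree over bounding boxes (xmin, ymin, xmax, ymax)."""

    def __init__(self, bounds, max_items=8, max_depth=12):
        self.bounds = bounds
        self.max_items = max_items
        self.max_depth = max_depth
        self.items = []       # (shape_id, bbox) pairs kept at this node
        self.children = None  # four sub-QuadTrees once this node splits

    def _child_bounds(self):
        xmin, ymin, xmax, ymax = self.bounds
        xm, ym = (xmin + xmax) / 2, (ymin + ymax) / 2
        return [(xmin, ymin, xm, ym), (xm, ymin, xmax, ym),
                (xmin, ym, xm, ymax), (xm, ym, xmax, ymax)]

    @staticmethod
    def _fits(inner, outer):
        return (outer[0] <= inner[0] and outer[1] <= inner[1]
                and inner[2] <= outer[2] and inner[3] <= outer[3])

    def insert(self, shape_id, bbox):
        if self.children is not None:
            for child in self.children:
                if self._fits(bbox, child.bounds):
                    child.insert(shape_id, bbox)
                    return
        # no child can hold it entirely (or not split yet): keep it here
        self.items.append((shape_id, bbox))
        if (self.children is None and len(self.items) > self.max_items
                and self.max_depth > 0):
            self._split()

    def _split(self):
        self.children = [QuadTree(b, self.max_items, self.max_depth - 1)
                         for b in self._child_bounds()]
        old, self.items = self.items, []
        for shape_id, bbox in old:
            self.insert(shape_id, bbox)  # redistribute into the new children

    def query(self, x, y):
        """Ids of shapes whose bounding box contains (x, y)."""
        hits = [sid for sid, (xmin, ymin, xmax, ymax) in self.items
                if xmin <= x <= xmax and ymin <= y <= ymax]
        if self.children is not None:
            for child in self.children:
                cxmin, cymin, cxmax, cymax = child.bounds
                if cxmin <= x <= cxmax and cymin <= y <= cymax:
                    hits.extend(child.query(x, y))
        return hits


tree = QuadTree((0, 0, 100, 100), max_items=2)
for sid, bbox in [("a", (0, 0, 10, 10)), ("b", (60, 60, 70, 70)),
                  ("c", (20, 20, 30, 30)), ("d", (40, 40, 60, 60))]:
    tree.insert(sid, bbox)
print(tree.query(45, 45))  # ['d']
print(tree.query(65, 65))  # ['b']
```

Only the few blocks returned by `query` would then need their full geometry loaded for an exact containment test, which is what keeps the per-point work small.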

      I think you've got that backwards. They should be rectangular in rural areas, like the underlying townships. I know they're certainly irregular in this urban area.

      UPDATE: Doh, sorry, I was thinking of tracts. Blocks are broken by natural boundaries, so you'd have to be in the desert to be square. OTOH, none of the divisions of census data in Boston are rectilinear.

      --
      In Bob We Trust, All Others Bring Data.