http://qs1969.pair.com?node_id=246967

hacker has asked for the wisdom of the Perl Monks concerning the following question:

I've been working with some C code that crunches the 2000 US Census data into CSV files, based on a specified proximity to an origin zipcode. The problem is that the C code is horribly slow, and I can't figure out why. On my PIII 1.3 GHz machine with 512 MB of RAM, it takes about 20 minutes to crunch the 987k input data file for zipcodes within a 0-25 mile radius of the given origin zipcode. That seems very slow.

The master 2000 Census data file contains records in this format:

    ZIP_CODE ONGITUD ATITUD
    00210 71.0132 43.00589
    00211 71.0132 43.00589
    00212 71.0132 43.00589
    00213 71.0132 43.00589
    00214 71.0132 43.00589
    00215 71.0132 43.00589
    ...
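As an illustration of the record layout above, here is a minimal sketch of parsing one line in C. The struct and function names (`zip_record`, `parse_zip_record`) are hypothetical and not taken from the original program, and the sketch assumes whitespace-separated fields with a five-digit zipcode first, then longitude, then latitude.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical record type; the field names are assumptions. */
struct zip_record {
    char   zip[6];     /* five digits plus the terminating NUL */
    double longitude;  /* second column of the data file */
    double latitude;   /* third column of the data file */
};

/* Parse one "ZIP LONGITUDE LATITUDE" line into *rec.
 * Returns 1 on success, 0 if the line does not match
 * (for example, the header line). */
int parse_zip_record(const char *line, struct zip_record *rec)
{
    char zip[6];
    double lon, lat;

    /* %5s reads at most five characters, so zip[] cannot overflow. */
    if (sscanf(line, "%5s %lf %lf", zip, &lon, &lat) != 3)
        return 0;

    /* Reject anything that is not exactly five decimal digits. */
    if (strlen(zip) != 5 || strspn(zip, "0123456789") != 5)
        return 0;

    strcpy(rec->zip, zip);
    rec->longitude = lon;
    rec->latitude = lat;
    return 1;
}
```

Keeping the zipcode as a string rather than an integer preserves the leading zeros seen in values such as 00210.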

There is a separate output file for each range (0-25.txt for zipcodes within 0-25 miles of the origin, 0-50.txt for zipcodes within 0-50 miles of the origin, etc.). Each file contains entries such as:

    00210,00210
    00210,00211
    00211,00210
    00210,00212
    00212,00210
    00210,00213
    ...

For each zipcode found in the master file (starting with origin == 00210 in this case), I want to output a file that contains all zipcodes within the specified proximity to that zipcode. So in the example above, all of the zipcodes within 0-25 miles of 00210 would be written to 0-25.txt, a CSV file containing the data shown above.
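The sample output suggests that a zipcode paired with itself appears once, while every other match appears in both directions. A minimal sketch of formatting one match that way follows. The function name `format_pair` is hypothetical, and the both-directions rule is an assumption inferred from the sample, not something the original code is known to do.

```c
#include <stdio.h>
#include <string.h>

/* Format the CSV line(s) for one matching pair into buf.
 * A zipcode matched with itself yields one line; any other pair yields
 * two lines, one per direction (an assumption based on the sample output).
 * Returns the number of characters needed, as snprintf does. */
int format_pair(char *buf, size_t size, const char *origin, const char *match)
{
    if (strcmp(origin, match) == 0)
        return snprintf(buf, size, "%s,%s\n", origin, match);

    return snprintf(buf, size, "%s,%s\n%s,%s\n",
                    origin, match, match, origin);
}
```

For example, formatting origin 00210 with match 00211 produces the two lines `00210,00211` and `00211,00210`.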

I have working distance functions that do this. They work (but are very slow) and look like:

    #include <math.h>

    #define EARTH_RADIUS 3956   /* miles */

    /* Convert degrees to radians. */
    static inline float deg_to_rad(float deg)
    {
        return (deg * M_PI / 180.0);
    }

    /* Function to calculate Great Circle distance between two points.
     * The trig functions work in radians, so all four arguments
     * must already be in radians (see deg_to_rad above). */
    static inline float great_circle_distance(float lat1, float long1,
                                              float lat2, float long2)
    {
        float delta_long, delta_lat, temp, distance;

        /* Find the deltas */
        delta_lat = lat2 - lat1;
        delta_long = long2 - long1;

        /* Find the GC distance */
        temp = pow(sin(delta_lat / 2.0), 2) +
               cos(lat1) * cos(lat2) * pow(sin(delta_long / 2.0), 2);
        distance = EARTH_RADIUS * 2 * atan2(sqrt(temp), sqrt(1 - temp));

        return (distance);
    }

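In Perl, all I have so far is the simple planar (Euclidean) distance, which is not equivalent to the great-circle calculation above: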

my $distance = sqrt(($x1-$x2)**2+($y1-$y2)**2);

My goal is to convert this to Perl so that I can gain the speed and efficiency of Perl, make it portable to Windows systems (where the current C code doesn't quite run yet), and expand my knowledge of Perl in general.

Has anyone done this? Any pointers that might be useful here?