How big are these two files? If just one of them is not too big to fit into memory, load all the data from that one file into an array, then go through the other file line by line and compute all the distances relative to the array elements.
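A minimal sketch of that approach, assuming each line is a simple "lat,lon" pair (for a self-contained demo the two "files" here are in-memory strings; in real use you'd open the actual file names):

```perl
use strict;
use warnings;

# Demo data standing in for the two files (hypothetical contents).
my $small_data = "40.0,-75.0\n41.0,-76.0\n";
my $big_data   = "40.5,-75.5\n39.9,-74.9\n";

# Load the smaller file entirely into memory: one [lat, lon] pair per element.
my @points;
open my $small, '<', \$small_data or die $!;
while (<$small>) {
    chomp;
    my ( $lat, $lon ) = split /,/;
    push @points, [ $lat, $lon ];
}
close $small;

# Stream the larger file line by line, comparing against every stored point.
open my $big, '<', \$big_data or die $!;
while ( my $line = <$big> ) {
    chomp $line;
    my ( $lat, $lon ) = split /,/, $line;
    for my $p (@points) {
        my $dlat = abs( $lat - $p->[0] );
        # ... real distance computation between ($lat,$lon) and ($$p[0],$$p[1]) goes here ...
    }
}
close $big;
print scalar(@points), " points loaded\n";
```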
If neither file will fit entirely in memory at one time, you could still speed things up a lot by reading a large number of records from the first file into an array, then for each line of the second file, compute all the distances relative to the current array; then read another chunk of file1 into the array and repeat. The point is to reduce the number of times you have to open and read the contents of the second file. Something like this:
    while ( !eof FIRST ) {
        my $i = 0;
        my ( @first_lats, @first_lons );
        while ( !eof FIRST and $i < 10000 ) {
            $_ = <FIRST>;
            if ( $csv->parse( $_ )) {
                my ( $lat, $lon ) = $csv->fields;
                $first_lats[$i] = $lat;
                $first_lons[$i] = $lon;
                $i++;
            }
            else {
                # report csv error
            }
        }
        check_distances( $i, \@first_lats, \@first_lons );
    }

    sub check_distances {
        my ( $n, $flat, $flon ) = @_;
        open SECOND, "second.file" or die $!;
        while (<SECOND>) {
            if ( $csv->parse($_) ) {
                my ( $slat, $slon ) = $csv->fields;
                for ( my $i = 0; $i < $n; $i++ ) {
                    # check distance from $slat,$slon to $$flat[$i],$$flon[$i]
                }
            }
        }
        close SECOND;
    }
(update: fixed problems in array declaration and array indexing -- still not tested, of course)
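For the "check distance" step itself, one option (not necessarily what you're using now) is the great-circle distance from the core Math::Trig module. Note its coordinate convention: the first argument of each pair is longitude and the second is colatitude, i.e. 90 minus latitude, both in radians. The radius constant below is the mean Earth radius converted to feet:

```perl
use strict;
use warnings;
use Math::Trig qw(deg2rad great_circle_distance);

# Mean Earth radius (~6371 km) expressed in feet, so results come out in feet.
my $EARTH_RADIUS_FEET = 20_902_231;

sub distance_feet {
    my ( $lat1, $lon1, $lat2, $lon2 ) = @_;
    # Math::Trig wants (theta, phi) = (longitude, 90 - latitude) in radians.
    return great_circle_distance(
        deg2rad($lon1), deg2rad( 90 - $lat1 ),
        deg2rad($lon2), deg2rad( 90 - $lat2 ),
        $EARTH_RADIUS_FEET
    );
}

# Sanity check: one degree of latitude is roughly 364,800 feet.
printf "%.0f feet\n", distance_feet( 40.0, -75.0, 41.0, -75.0 );
```

Then the inner loop's comment becomes something like `if ( distance_feet($slat, $slon, $$flat[$i], $$flon[$i]) < 1000 ) { ... }`.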
Another thing that will probably speed it up is to figure out what latitude difference corresponds to 1000 feet; if any two points differ in latitude by more than that amount, you can skip the more expensive lat-lon distance computation entirely.
(Doing the same for longitude is a little trickier, because you have to know in advance the highest value for latitude that you'll ever see in the data, and figure out the longitude distance that equals 1000 feet at that latitude. But if you can do that, it will save some run time.)
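A sketch of computing both thresholds, assuming roughly 364,000 feet per degree of latitude and the usual cos(latitude) shrinkage for longitude (the maximum latitude here is a made-up placeholder, you'd plug in the real maximum from your data):

```perl
use strict;
use warnings;
use Math::Trig qw(deg2rad);

# One degree of latitude is about 364,000 feet everywhere on the globe.
my $FEET_PER_DEG_LAT = 364_000;

# Assumed maximum latitude appearing in the data (placeholder value).
# At the highest latitude, 1000 feet spans the MOST degrees of longitude,
# so using it gives the most conservative threshold: the quick-reject
# test will never discard a pair that is really within 1000 feet.
my $max_lat = 45.0;

my $lat_threshold = 1000 / $FEET_PER_DEG_LAT;
my $lon_threshold = 1000 / ( $FEET_PER_DEG_LAT * cos( deg2rad($max_lat) ) );

printf "skip if dlat > %.6f or dlon > %.6f degrees\n",
    $lat_threshold, $lon_threshold;

# In the inner loop, before the full distance computation:
#   next if abs($slat - $$flat[$i]) > $lat_threshold;
#   next if abs($slon - $$flon[$i]) > $lon_threshold;
```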
Finally, if your data files are reliably simple -- just two comma-separated numeric values per line -- you might save a lot of run time by just using "split" or regex matching instead of Text::CSV_XS. I'm not completely sure of that, but if this job is running for hours or days, it would be worth a benchmark test to find out, if you haven't done that already.
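A benchmark for that comparison might look like the following, using the core Benchmark module on a representative line (Text::CSV_XS is only included if it's installed, so the snippet runs either way; the relative numbers on your machine are what matter):

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $line = "40.123456,-75.654321";

# Candidate parsers for simple two-column numeric lines.
my %tests = (
    split_fields => sub {
        my ( $lat, $lon ) = split /,/, $line;
    },
    regex_match => sub {
        my ( $lat, $lon ) = $line =~ /^([^,]+),([^,]+)$/;
    },
);

# Add Text::CSV_XS to the comparison only if the module is available.
if ( eval { require Text::CSV_XS; 1 } ) {
    my $csv = Text::CSV_XS->new;
    $tests{csv_xs} = sub {
        $csv->parse($line) or die "parse failed";
        my ( $lat, $lon ) = $csv->fields;
    };
}

cmpthese( 100_000, \%tests );
```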
In reply to Re: open - Unbuffered Write???
by graff
in thread open - Unbuffered Write???
by awohld