in reply to Millions of line segment intersection calcs: Looking for speed tips

I think that you are checking twice as many edges as you need to.

In your reversed edge test you are checking for {$p2}{$p1}

unless ( ($p1 == $p2) || $found->{$p2}->{$p1} ) {

But you are also setting the found edges with the pairings reversed.

$found->{$p2}->{$p1} = 1;

Which means that you are never detecting a reversed pair. Changing the setting code to

$found->{$p1}->{$p2} = 1;

means that when that pair comes up reversed, your test (as is) will detect them.

That step alone cuts the number of edges in half (as you intended).

You can save a little more there by using a hash instead of a hash ref, and concatenating the pairs so you use a single level hash, rather than a hash of zillions of little hashes. Ie.

my %found; ## instead of my $found = {}; ... unless ( ($p1 == $p2) || $found{ "$p2|$p1" } ) { ... $found->{ "$p1|$p2" } = 1;

Less indirection and less for the garbage collector to do.

Similar saving can be gained by using arrays rather than array references elsewhere, particularly in the building and passing and unwrapping of the paramaters to the SegmentIntersection() sub. You use many levels of indirection build an array ref to an array of arrays, then pass the array ref in to the sub where you have then to unwrap all the nesting and copy them to local vars. Removing several layers of indirection gained several percent performance--and made the code clearer.

Finally, little more can be gained by avoiding copying the parameters in the Determinant() sub. Also using integer math there, assuming that all your coordinates are less than 65536, this may gain a little.

<I am getting different results, but I beleive this is because you were producing reversed duplicates.

Assuming you agree that these results are correct, then the the following code runs in around 1/5th the time of your original.

#!/usr/bin/perl use YAML qw(Load); use Data::Dumper; use Carp; use Benchmark; use strict; my $precision = 7; my $delta = 10 ** (-$precision); my $points; # hash ref holding the raw data {'1' => [X,Y], '2' => [X,Y +], ...} { local $/; open(my $fh, "dump_301.dat") or die "Unable to open file: $!\n"; $points = Load( <$fh> ); croak($@) if $@; close($fh); } my $start_pairs = Benchmark->new; my $edges = find_unique_edges($points); print "Unique edges ", scalar(@{$edges}), "\n"; print "Pairs parsed in :", timestr(timediff(Benchmark->new, $start_pai +rs)), "\n"; my $start_intersects = Benchmark->new; my $neighborEdges = find_unintersected_edges($edges); print "Unintersected edges ", scalar(keys %{$neighborEdges}), "\n"; print "Intersects parsed in :", timestr(timediff(Benchmark->new, $star +t_intersects)), "\n"; sub find_unique_edges { my $MAP = shift; my @edges; # returnable data structure my $start_distance = Benchmark->new; my %found; # p1 is starting or anchor point of the line segment foreach my $p1 ( @{ $points } ) { # p2 is end point of the line segment foreach my $p2 ( @{ $points } ) { # We don't need to caculate if anchor and end are the same # or we have already seen these two pairs [reversed] befor +e unless( $p1 == $p2 or $found{ "$p2|$p1" } ) { my $edge = sqrt( ( $p1->[0] - $p2->[0] ) **2 + ( $p1->[1] - $p2->[1] ) **2 ); push @edges, [ $edge, $p1->[0], $p1->[1], $p2->[0], $p +2->[1] ]; $found{ "$p1|$p2" } = 1; } } } return \@edges; } sub find_unintersected_edges { my $PAIRS = shift; my $neighborEdges; # returnable data structure my $neighboorCNT = 1; foreach my $aref ( sort {$a->[0] <=> $b->[0] } @{$PAIRS}) { my $neighbor = 1; my @line_1_ref; push @line_1_ref, [ $aref->[ 1 ],$aref->[ 2 ] ]; # line 1 poin +t A X,Y push @line_1_ref, [ $aref->[ 3 ],$aref->[ 4 ] ]; # line 1 poin +t B X,Y for( my $n=1; $n <= $neighboorCNT; $n++ ) { if( $neighborEdges->{ $n } ) { my @points; push @points, @line_1_ref; # line 1 points push @points, [ $neighborEdges->{ $n }[ 1 ], $neighbor +Edges->{ $n }[ 2 ] ]; # line 2 point A X,Y push @points, [ $neighborEdges->{ $n }[ 3 ], $neighbor +Edges->{ $n }[ 4 ] ]; # line 2 point B X,Y # If a intersect is found, set false and quit checking if( SegmentIntersection( @points ) ) { $neighbor = 0; last; } } } if( $neighbor ) { $neighborEdges->{ $neighboorCNT } = $aref; $neighboorCNT++; } } return $neighborEdges ; } sub SegmentIntersection { my @p1 = @{ $_[0] }; # p1,p2 = segment 1 my @p2 = @{ $_[1] }; my @p3 = @{ $_[2] }; # p3,p4 = segment 2 my @p4 = @{ $_[3] }; my $d = Determinant( $p2[0] - $p1[0], $p3[0] - $p4[0], $p2[1] - $ +p1[1], $p3[1] - $p4[1] ); if( abs( $d ) < $delta ) { return 0; # parallel } my $n1 = Determinant( $p3[0] - $p1[0], $p3[0] - $p4[0], $p3[1] - $ +p1[1], $p3[1] - $p4[1] ); my $n2 = Determinant( $p2[0] - $p1[0], $p3[0] - $p1[0], $p2[1] - $ +p1[1], $p3[1] - $p1[1] ); if( !( $n1/$d < 1 and $n2/$d < 1 and $n1/$d > 0 and $n2/$d > 0 ) ) + { return 0; } return 1; } sub Determinant { use integer; return $_[0] * $_[1] - $_[2] * $_[3]; } __END__ P:\test>480481.pl "my" variable $edges masks earlier declaration in same scope at P:\tes +t\480481.pl line 25. "my" variable $neighborEdges masks earlier declaration in same scope a +t P:\test\480481.pl line 30. Unique edges 90300 Pairs parsed in : 1 wallclock secs ( 1.72 usr + 0.02 sys = 1.73 CPU) Unintersected edges 1778 Intersects parsed in :786 wallclock secs (763.19 usr + 0.55 sys = 763 +.74 CPU) __second__ P:\test>480481.pl Unique edges 90300 Pairs parsed in : 2 wallclock secs ( 1.77 usr + 0.02 sys = 1.78 CPU) Unintersected edges 3299 Intersects parsed in :729 wallclock secs (711.84 usr + 0.86 sys = 712 +.70 CPU) __third__ P:\test>480481.pl Unique edges 45150 Pairs parsed in : 1 wallclock secs ( 1.28 usr + 0.03 sys = 1.31 CPU) Unintersected edges 1699 Intersects parsed in :223 wallclock secs (217.66 usr + 0.22 sys = 217 +.87 CPU) __fourth__ P:\test>480481.pl Unique edges 45150 Pairs parsed in : 2 wallclock secs ( 1.27 usr + 0.03 sys = 1.30 CPU) Unintersected edges 1699 Intersects parsed in :191 wallclock secs (186.78 usr + 0.22 sys = 187 +.00 CPU) __fifth__ P:\test>480481.pl Unique edges 45150 Pairs parsed in : 1 wallclock secs ( 1.25 usr + 0.05 sys = 1.30 CPU) Unintersected edges 1699 Intersects parsed in :191 wallclock secs (186.95 usr + 0.11 sys = 187 +.06 CPU) __sixth__ P:\test>480481.pl Unique edges 45150 Pairs parsed in : 2 wallclock secs ( 1.30 usr + 0.03 sys = 1.33 CPU) Unintersected edges 1699 Intersects parsed in :177 wallclock secs (173.06 usr + 0.23 sys = 173 +.30 CPU)

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
The "good enough" maybe good enough for the now, and perfection maybe unobtainable, but that should not preclude us from striving for perfection, when time, circumstance or desire allow.

Replies are listed 'Best First'.
Re^2: Millions of line segment intersection calcs: Looking for speed tips (80% saved)
by tmoertel (Chaplain) on Aug 03, 2005 at 19:46 UTC
    I think that you are checking twice as many edges as you need to.

    I think you are right. My DT-based implementation computed 889 output edges for the for the dump_301 data. That's exactly half of the OP's expected 1778. Perhaps the OP's output comprised two equivalent sets of edges differing only in direction.

Re^2: Millions of line segment intersection calcs: Looking for speed tips (80% saved)
by drewhead (Beadle) on Aug 04, 2005 at 02:17 UTC
    That step alone cuts the number of edges in half (as you intended).
    Gotta love it when you have the right idea and botch it's implementation. :) Thanks, you are exactly correct. That drops things down into the 3 minute range for the example.

    My praticle upper bounds at the moment is 3000. I did some test on a 2001 (8 trillion + calcs) coord map with the old code and after 56 hours I hand't returned ... doh! Lets see if this will return now.

    Next I'm going to work on the suggestions extremely made and see what I can get this down to.