Line intersection, scaled to thousands of points

kiz has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: Line intersection, scaled to thousands of points by Masem (Monsignor) on Jul 18, 2001 at 15:21 UTC
I'm looking at Sedgewich's Algorithms in C++, where he discusses the general problem of line intersection (which is what you have here). The solution he presents is rather detailed, but the gist is that you want to sort on the y coordinate, and add the line segments (with some identification) to a binary tree when a line segment starts (based on the y position), and remove it when it ends. The tree should be sorted on the 'rightness' of a line segment to another (that is, the entire line being to the right of the other line). Note that 'rightness' is not transitive; if line C is right if line A, and line C is right of line B, line B need not be right of line A (it may intersection). So the key step when this tree is built and taken apart is that when a node is removed, and a new node is moved into place to balance the tree somewhere, you must explicit test the connection between any new node connections, as this is where you'll find the intersections of two lines; if 'rightness' or 'leftness' of a parent and a child node after a valid reconstruction can't be determined, then the two lines must intersect. A intersection-less set of lines will result in an empty tree when all is said and done. I strongly refer to that book or any other good book on computer algorithms for more details. The method should be O(Nlog N), both for the intitial point sorting, and for the building/traversing of the tree. Note that you could also just compare each and every line, but that's O(N^2). I don't think you can cut this down any further while some additional huristics that are based on human observation. Dr. Michael K. Neylon - mneylon-pm@masemware.com \|\| "You've left the lens cap of your mind on again, Pinky" - The Brain*	[reply]
Re: Re: Line intersection, scaled to thousands of points by Anonymous Monk on Jul 18, 2001 at 20:35 UTC
For graphics algorithms, check the Graphic Gems series of books. Many are raster-rendering, but vector-polygon processing is covered too. The old standby was the CALGO Collected Algorithms of the ACM. I used to have that on Microfiche subscription, but I think it's online now.) -- Bill	[reply]
Re: Re: Line intersection, scaled to thousands of points by John M. Dlugosz (Monsignor) on Jul 18, 2001 at 19:33 UTC
I like that way. Upon reading the question, I realized that the normal edge-sorting technique used in rendering does tell when lines cross, but it only samples the shape at the rastors. It makes sence, though, that you can compare intervals instead.	[reply]
Re: Line intersection, scaled to thousands of points by mattr (Curate) on Jul 18, 2001 at 16:44 UTC
Even these calculations are going to take a while if you are comparing every line segment to every other one. The binary tree is a very cool way to do it, and this kind of thing is done quite a bit in 3D (Z-Sorting and use of trees, for example quad trees which will break down the rendering problem by dividing cubes of space into recursively smaller areas to cover). But maybe there is a quicker way. I think you might be able to simplify this problem by preprocessing your data set. You shouldn't be trying to do some 200 million intersection tests. A (human generated) map is usually made of lots of relatively small line segments, mainly because it has to convey a good amount of information while remaining readable. I would start by giving every line segment an id number and put them in a database (or flat file, hash is fine). You should be able to quickly find what length line segment is most popular and what range of lengths around that covers say 80% of all the lines. Also, if you search starting with the longer ones, it should save you time, that is if long segments intersect more lines than do short segments. This characterization would give you a guide to creating a grid of cells across the map, so that you can sort your segments by spatial distribution, in effect reducing the number of pairs. It is possible that you could define some bounding boxes of your own (say around cities?) which would do this efficiently for many of the longer line segments as well. You can probably also do away with the bounding box calls and that epsilon stuff. You can also mark each segment in the database which has already been found to intersecting some other line, in effect taking that line segment out of circulation. One thing to note is your definition of intersection. I think the algorithm below will find two lines which touch at their endpoints to be intersecting lines, but your example of the lake seems to indicate that such a pair of line segments should not be said to intersect. In that case you can eliminate a lot of computation by doing such a test first. There is another approach which might be interesting. You could select a minimum resolution which would be needed to represent your map reasonably well, and provide a two dimensional array to represent this canvas. By drawing all the segments on this map with a routine which draws differently upon intersection, you can "download" the final answer by simply reading off the canvas at the end. Possibly you could do something with C-based drawing functions and various paint modes (and GIMP might be cool for that indeed, using an additive drawing mode) or collision detection functions (say in OpenGL), but I'm thinking here of just writing a line segment rendering function in Perl which "draws" to a data structure. The neat part is that each "pixel", being an array element, could contain a list of the id numbers of all lines which crossed that pixel, which would provide a very rich output. You only have to execute the drawing code 14000 times, which sounds pretty good compared to 200 million of anything, but I'm not sure which approach is faster. Perhaps the most interesting thing is that you can run it at arbitrary resolutions, so you could for example run it once at a low resolution, then divide the canvas into an arbitrary number of cells and then perhaps use mathematical approach afterwards, or perhaps skip making cells and just use math to check the questionable pairs. The pixel rendering approach is more interesting if you can grasp the entire image visually in a gestalt, for example by using colored filters. You might like to check out Gimp even just for fun since it is quite Perl-compatible. Lastly I think there is one more approach to consider, if this is a one-shot job. If you just draw all the lines and open up the image in Gimp or Photoshop, you can use a combination of using the magic wand, paint bucket, and zoom tool to try solving this by hand. You can also assign each line a unique RGB color combination and read its id with an eye-dropper (or Perl call if using Gimp). Finding the exact point of intersection is too much work if all we care about is whether two lines intersect at all. The intersection, if any, can be found by examining the signs of the two cross products (p2 - p0) × (p1 - p0) and (p3 - p0)× (p1 - p0). The line_intersect() subroutine returns a simple true or false value indicating whether two lines intersect: # line_intersect( $x0, $y0, $x1, $y1, $x2, $y2, $x3, $y3 ) # Returns true if the two lines defined by these points intersect. # In borderline cases, it relies on epsilon to decide. sub line_intersect { my ( $x0, $y0, $x1, $y1, $x2, $y2, $x3, $y3 ) = @_; my @box_a = bounding_box( 2, $x0, $y0, $x1, $y1 ); my @box_b = bounding_box( 2, $x2, $y2, $x3, $y3 ); # If even the bounding boxes do not intersect, give up right now. return 0 unless bounding_box_intersect( 2, @box_a, @box_b ); # If the signs of the two determinants (absolute values or lengths # of the cross products, actually) are different, the lines # intersect. my $dx10 = $x1 - $x0; my $dy10 = $y1 - $y0; my $det_a = determinant( $x2 - $x0, $y2 - $y0, $dx10, $dy10 ); my $det_b = determinant( $x3 - $x0, $y3 - $y0, $dx10, $dy10 ); return 1 if $det_a < 0 and $det_b > 0 or $det_a > 0 and $det_b < 0; if ( abs( $det_a ) < epsilon ) { if ( abs( $det_b ) < epsilon ) { # Both cross products are "zero". return 1; } elsif (abs($x3-$x2)<epsilon and abs($y3-$y2)<epsilon){ # The other cross product is "zero" and # the other vector (from (x2,y2) to (x3,y3)) # is also "zero". return 1; } } elsif ( abs( $det_b < epsilon ) ) { # The other cross product is "zero" and # the other vector is also "zero". return 1 if abs( $dx10 ) < epsilon and abs( $dy10 ) < epsilon; } return 0; # Default is no intersection. } # determinant( $x0, $y0, $x1, $y1 ) [from p. 432] # Computes the determinant given the four elements of a matrix # as arguments. # sub determinant { $_[0] * $_[3] - $_[1] * $_[2] } [download]	[reply] [d/l]
Re: Line intersection, scaled to thousands of points by grinder (Bishop) on Jul 18, 2001 at 15:10 UTC
Fascinating question, but I don't have any hard answers. I assume you're looking at the simpler `line_intersect` function on page 439 (1st ed). Have you run Devel::DProf on a semi-reasonable sample to see where the time is being burnt? Glancing at the code, I would assume that `bounding_box` and `determinant` would be good candidats for memoization, which will get you the on the RAM side of the space/time trade-off. You do have a lot of RAM, right? -- `g r i n d e r`	[reply]
Re: Line intersection, scaled to thousands of points by jmcnamara (Monsignor) on Jul 18, 2001 at 15:31 UTC
The chapter in question from "Mastering Algorithms with Perl", is available for download from the O'Reilly site. John. --	[reply]
PDL (Re: Line intersection, scaled to thousands of points) by Ea (Chaplain) on Jul 18, 2001 at 18:54 UTC
Here's another turtle for the fire: PDL (``Perl Data Language'') gives standard Perl the ability to compactly store and speedily manipulate the large N-dimensional data arrays which are the bread and butter of scientific computing. The gubbins that do the manipulation are written in C for a speed boost, which might help if you can't get around your scaling problems.	[reply]
Re: Line intersection, scaled to thousands of points by spudzeppelin (Pilgrim) on Jul 18, 2001 at 21:57 UTC
Well, the first part of this problem to consider, that makes it much more difficult, is that you are actually talking about the intersection of line segments, not lines themselves. This defeats the elementary proposition that two lines in a plane WILL intersect somewhere if their slopes are inequal. However, there is a "brute-force" approach that will eliminate most of the possible pairs to consider in a pairwise approach: instead of looking at the segments pairwise to determine whether they intersect, look at the bounding rectangles of the segments pairwise to see if they overlap. Then, run the pairwise segment comparison only on this smaller subset. For the purpose of this proposition, suppose that in some orthogonal, two-dimensional coordinate system (can you tell I really was a geometer at one point in my life?), the endpoints of a line segment are described by (a,b) and (p,q). Then the bounding rectangle of the line segment {(a,b),(p,q)} is the rectangle described by the point-set {(a,b),(a,q),(p,q),(p,b)}. It is a trivial exercise to demonstrate that if two line segments intersect, their bounding rectangles overlap; what we are actually interested in then is the contrapositive of that result: if their bounding rectangles do not overlap, two line segments do not intersect. Spud Zeppelin * spud@spudzeppelin.com	[reply]
Re: Line intersection, scaled to thousands of points by Zaxo (Archbishop) on Jul 18, 2001 at 17:18 UTC
If your shoreline data already knows which point is next down the beach, this becomes a problem in graph topology - grouping connected pieces of the shoreline "network". Pick a point and walk the list till you see the same point again. Pick an unvisited point, either an island or a neighboring lake, and repeat. This algorithm is linear in the number of points. Chapter 8 of the wolf book covers this, but is geared to more complex graphs. After Compline, Zaxo	[reply]
Re: Line intersection, scaled to thousands of points by ambrus (Abbot) on Feb 09, 2009 at 17:21 UTC
Chapter 33.2 of the Cormen–Leiserson–Rivest–Stein: Introduction to Algorithms book gives a good algorithm for solving this.	[reply]

g r i n d e r

`g r i n d e r`