Re: AoA data merging

You have a clear and fairly simple (though not trivial) criterion that applies to a portion of a very large and complex data structure. One approach would be to design a relational table schema to store the full data structure; this could have one table with a record for each "road chunk", another table for each X,Y point referenced by a road chunk, and perhaps a third table that relates the chunk records to the point records. By structuring the data this way, it will be easier to identify and manipulate just the information that you need to solve the joining problem (in fact, you probably want yet another table to hold the "chunk-chain" findings).

In fact, just breaking the problem down into separate data structures (tables) this way might clarify what the algorithm needs to do, whether or not you actually end up using an RDBMS.

It may be sufficient just to have the set of X,Y points as a unified data structure, with info on which road-chunk(s) each point belongs to; joining the road chunks is now just a matter of determining, for each point, how many road chunks contain it, and whether two (or more?) chunks have a given point as a terminus.

In other words, you started with an array of chunks, with each chunk containing an array of points -- try making a different structure the other way around: an array of points, and each one cites one or more road-chunks that it is a part of. This would be easiest if the point array is actually a hash, keyed by the X,Y coordinates.

my %pointdata;
my @chunkends;
for ( my $i=0; $i<@data; $i++ ) # road-chunks
{
  for ( my $j=0; $j<@{$data[$i]; $j++ ) # points in a chunk
  {
    my $key = join(",",$data[$i][$j]->{X},$data[$i][$j]->{Y});
    push @{$pointdata{$key}}, sprintf("%5.5d:%5.5d",$i,$j);
    $chunkends[$i] = $key if ( $j == 0 );
  }
  $chunkends[$i] .= "-$key";
}
# Now the %pointdata array contains all the information
# about junctures between chunks (these not necessarily 
# all end-point junctures -- perhaps some road chunks could
# intersect at the mid-points?)  Also, the @chunkends array
# cites the %pointdata keys for the endpoints of each chunk;
# if you only want to find end-point junctures, identify the
# %pointdata keys with multi-element entries and grep for
# each one among the strings in @chunkends.
[download]

None of that is tested, but I hope it might be helpful.

Comment on Re: AoA data merging Download Code

Replies are listed 'Best First'.
Re: Re: AoA data merging by jasonk (Parson) on Mar 11, 2003 at 14:37 UTC
In other words, you started with an array of chunks, with each chunk containing an array of points -- try making a different structure the other way around: an array of points, and each one cites one or more road-chunks that it is a part of. This would be easiest if the point array is actually a hash, keyed by the X,Y coordinates. Although this is a good idea, the problem with this approach is that order matters, if the points of the arrayrefs get rearranged it can change the results. This does suggest an alternate approach that I will have to play with though, if I were to put just the ends of each chunk into a simpler data structure, I could order them using only two points per chunk, then it would be pretty easy to identify which chunks those end-points are associated with, it's likely that would eliminate some of the problems I've been having with my earlier approach. Update: In a node I hadn't read yet, someone else had already thought of this approach, and posted code. I love perlmonks! Thanks for all the help guys, I have several things to try out now based on all your suggestions.	[reply]

Replies are listed 'Best First'.

Re: Re: AoA data merging
by jasonk (Parson) on Mar 11, 2003 at 14:37 UTC

Although this is a good idea, the problem with this approach is that order matters, if the points of the arrayrefs get rearranged it can change the results. This does suggest an alternate approach that I will have to play with though, if I were to put just the ends of each chunk into a simpler data structure, I could order them using only two points per chunk, then it would be pretty easy to identify which chunks those end-points are associated with, it's likely that would eliminate some of the problems I've been having with my earlier approach.

Update: In a node I hadn't read yet, someone else had already thought of this approach, and posted code. I love perlmonks! Thanks for all the help guys, I have several things to try out now based on all your suggestions.

[reply]