cpan Data structure

michaelg has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks

This one is a bit problematic. I need to read a file into a data structure that enables quick sorting, searching etc. The data look like this:

AAAAAAA
p 1 6
1105720 2453315
1106720 2453315
1106720 2453950
1106715 2453950
1106715 2454525
1105720 2454525
p 2 6
1105720 2456125
1106720 2456125
1106720 2456760
1106715 2456760
1106715 2457335
1105720 2457335
p 3 6
1105720 2458940
1106720 2458940
1106720 2459575
1106715 2459575
1106715 2460150
1105720 2460150
BBBBBBB
p 1 6
1105720 2461750
1106720 2461750
1106720 2462385
1106715 2462385
1106715 2462960
1105720 2462960
p 2 6
1112965 2466125
1113330 2466125
1113330 2466120
1114175 2466120
1114175 2467120
1112965 2467120
p 3 6
1118210 2453315
1119210 2453315
1119210 2454525
1118215 2454525
1118215 2454170
1118210 2454170
[download]

where AAA and BBB are first level keys, and the next are vertexes (x,y) of a polygon

At the end I need to compare the polygons from 2 files.
I thought to use hash to input the first level keys and then every set of polygons will be saved in a B tree structure.
Does any one of you knows a better way of doing that? If so, is there any fit module at the CPAN?

Edited by BazB fixed code tag, tidy/fixed formatting..and added readmore.

Comment on cpan Data structure Download Code

Replies are listed 'Best First'.
Re: cpan Data sturcture by Abigail-II (Bishop) on Jan 27, 2004 at 17:38 UTC
What an appropriate datastructure is entirely depends on what you want to store, and what kind of queries you want to perform. What do you want to store? Polygons? Points? Coordinates? Labels? And what kind of queries do you want to perform? Stabbing queries with the polygons? Exact matches on vertices? Line intersection queries? Range queries? Abigail	[reply]
Re: Re: cpan Data sturcture by Fletch (Bishop) on Jan 27, 2004 at 18:10 UTC
See also mjd's excelent Why Questions go Unanswered.	[reply]
Re: Re: cpan Data sturcture by Anonymous Monk on Jan 27, 2004 at 18:54 UTC
Hi Well I need to compare polyg ons to polygons , at one stage If the polygons does not match I consider it an error , o in data insertion and O(1) for comparison. Now if I need to check boolean operations between polygon , well here I have a problem .	[reply]
Re: cpan Data structure by punkish (Priest) on Jan 27, 2004 at 19:06 UTC
You need to tell us more about the context of your query... here are a few questions you might want to preempt -- what are these polygons? (I am assuming geographic as from a GIS or as created by a drawing program such as Omnigraffle or Illustrator, etc.) how are you getting these values? what do you mean by "compare the polygons from 2 files"? compare for what? congruency? similarity? If the above are simply x,y coords, the polys will be different if even one coord pair mismatches -- heck, just loop over the two data files and compare line by line -- after all, each coord is on its own line. Please provide more context, and I sure you will find help here. Also, you might want to retitle your post -- it is as much about comparing files as it is about asking for appropriate modules on CPAN. Which answer are you looking for? Good luck.	[reply]
Re: cpan Data structure by Vautrin (Hermit) on Jan 27, 2004 at 19:01 UTC
You know you might want to grab a copy of MySQL and store all the data in a set of tables using the DBI. The reason for this are simple: Databases have lots of functions you can use without having to code your own custom data structures. This saves lots of time. Most databases are optimized in ways you could never do in your life time even if you wrote your program in straight assembler. The reason for this is there are a lot of very smart people who spend their time figuring out how to make things quicker and working on databases. For instance, I couldn't code MySQL if I had an entire lifetime. Databases are very quick. I have a database with a hundred thousand records and I can get very specialized SELECTs done in a few hundreths of a second. Databases cache information for you. So if you perform the same calculation twice you don't need to worry about it.	[reply]
Re: cpan Data structure by Abigail-II (Bishop) on Jan 27, 2004 at 22:45 UTC
I'm a bit amazed by this answer. It seems like you think that MySQL is a magic wand of some kind. The OP is quite unclear what kind of queries he wants to perform - and until he clears that up, I don't see how a relational database is going to help you here. Let's adress your reasoning. Databases have lots of functions you can use without having to code your own custom data structures. This saves lots of time. Actually, relational databases have lots of functions that deal with tabular data. Polygons are not not tabular data - it's geometrical data. Most databases are optimized in ways you could never do in your life time even if you wrote your program in straight assembler. The reason for this is there are a lot of very smart people who spend their time figuring out how to make things quicker and working on databases. For instance, I couldn't code MySQL if I had an entire lifetime. Most databases are optimized, yes. But they are optimized to deal with transactions, and to perform queries in a relational structure. Things that don't have much to do with geometrical data. Databases are very quick. I have a database with a hundred thousand records and I can get very specialized SELECTs done in a few hundreths of a second. Actually, database aren't quick. Databases spend a lot of resources in being consistent, robust and atomic. They can find records reasonable quickly if they can make use of simple indices. I don't see what kind of indices you can use. Databases cache information for you. So if you perform the same calculation twice you don't need to worry about it. Yes, but it's not at all clear whether the OP's problem makes this relevant. Perhaps I'm seeing this all wrong, and you have a brilliant set of tables, indices are relations in mind. In that case, I'd like to see the schema. Abigail	[reply]