michaelg has asked for the wisdom of the Perl Monks concerning the following question:

Hi monks

This one is a bit problematic. I need to read a file into a data structure that enables quick sorting, searching etc. The data look like this:

AAAAAAA p 1 6 1105720 2453315 1106720 2453315 1106720 2453950 1106715 2453950 1106715 2454525 1105720 2454525 p 2 6 1105720 2456125 1106720 2456125 1106720 2456760 1106715 2456760 1106715 2457335 1105720 2457335 p 3 6 1105720 2458940 1106720 2458940 1106720 2459575 1106715 2459575 1106715 2460150 1105720 2460150 BBBBBBB p 1 6 1105720 2461750 1106720 2461750 1106720 2462385 1106715 2462385 1106715 2462960 1105720 2462960 p 2 6 1112965 2466125 1113330 2466125 1113330 2466120 1114175 2466120 1114175 2467120 1112965 2467120 p 3 6 1118210 2453315 1119210 2453315 1119210 2454525 1118215 2454525 1118215 2454170 1118210 2454170
where AAA and BBB are first level keys, and the next are vertexes (x,y) of a polygon

At the end I need to compare the polygons from 2 files.
I thought to use hash to input the first level keys and then every set of polygons will be saved in a B tree structure.
Does any one of you knows a better way of doing that? If so, is there any fit module at the CPAN?

Edited by BazB fixed code tag, tidy/fixed formatting..and added readmore.

Replies are listed 'Best First'.
Re: cpan Data sturcture
by Abigail-II (Bishop) on Jan 27, 2004 at 17:38 UTC
    What an appropriate datastructure is entirely depends on what you want to store, and what kind of queries you want to perform. What do you want to store? Polygons? Points? Coordinates? Labels? And what kind of queries do you want to perform? Stabbing queries with the polygons? Exact matches on vertices? Line intersection queries? Range queries?

    Abigail

      Hi

      Well I need to compare polyg ons to polygons , at one stage If the polygons does not match I consider it an error , o in data insertion and O(1) for comparison.


      Now if I need to check boolean operations between polygon , well here I have a problem .

Re: cpan Data structure
by punkish (Priest) on Jan 27, 2004 at 19:06 UTC
    You need to tell us more about the context of your query... here are a few questions you might want to preempt --

    • what are these polygons? (I am assuming geographic as from a GIS or as created by a drawing program such as Omnigraffle or Illustrator, etc.)
    • how are you getting these values?
    • what do you mean by "compare the polygons from 2 files"? compare for what? congruency? similarity?
    If the above are simply x,y coords, the polys will be different if even one coord pair mismatches -- heck, just loop over the two data files and compare line by line -- after all, each coord is on its own line.

    Please provide more context, and I sure you will find help here. Also, you might want to retitle your post -- it is as much about comparing files as it is about asking for appropriate modules on CPAN. Which answer are you looking for?

    Good luck.

Re: cpan Data structure
by Vautrin (Hermit) on Jan 27, 2004 at 19:01 UTC

    You know you might want to grab a copy of MySQL and store all the data in a set of tables using the DBI. The reason for this are simple:

    1. Databases have lots of functions you can use without having to code your own custom data structures. This saves lots of time.
    2. Most databases are optimized in ways you could never do in your life time even if you wrote your program in straight assembler. The reason for this is there are a lot of very smart people who spend their time figuring out how to make things quicker and working on databases. For instance, I couldn't code MySQL if I had an entire lifetime.
    3. Databases are very quick. I have a database with a hundred thousand records and I can get very specialized SELECTs done in a few hundreths of a second.
    4. Databases cache information for you. So if you perform the same calculation twice you don't need to worry about it.
      I'm a bit amazed by this answer. It seems like you think that MySQL is a magic wand of some kind. The OP is quite unclear what kind of queries he wants to perform - and until he clears that up, I don't see how a relational database is going to help you here.

      Let's adress your reasoning.

      Databases have lots of functions you can use without having to code your own custom data structures. This saves lots of time.
      Actually, relational databases have lots of functions that deal with tabular data. Polygons are not not tabular data - it's geometrical data.
      Most databases are optimized in ways you could never do in your life time even if you wrote your program in straight assembler. The reason for this is there are a lot of very smart people who spend their time figuring out how to make things quicker and working on databases. For instance, I couldn't code MySQL if I had an entire lifetime.
      Most databases are optimized, yes. But they are optimized to deal with transactions, and to perform queries in a relational structure. Things that don't have much to do with geometrical data.
      Databases are very quick. I have a database with a hundred thousand records and I can get very specialized SELECTs done in a few hundreths of a second.
      Actually, database aren't quick. Databases spend a lot of resources in being consistent, robust and atomic. They can find records reasonable quickly if they can make use of simple indices. I don't see what kind of indices you can use.
      Databases cache information for you. So if you perform the same calculation twice you don't need to worry about it.
      Yes, but it's not at all clear whether the OP's problem makes this relevant.

      Perhaps I'm seeing this all wrong, and you have a brilliant set of tables, indices are relations in mind. In that case, I'd like to see the schema.

      Abigail