Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
I have a file of items with their x,y coordinates.
e.g.
id,x,y
thingy1,70459.318061,95537.539359
.
.
.
thingy79036,70459.318061,95537.539359
I have a file with the lower left and upper right x,y coordinate pairs of a series of grids.
e.g.
map,grid,subgrid,x0,y0,x1,y1
044,15,2,061866,133466,062933,134399
.
.
.
107,14,4,105600,090533,106666,091466
I need to determine in which map-grid-subgrid is thingy*. Please note that there are ~100,000 thingies to match with a file of ~32,000 map-grid-subgrids.
Advice or redirection to the proper node would be appreciated.
|
|---|
| Replies are listed 'Best First'. | ||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
|
Re: Search file of coordinates
by BrowserUk (Patriarch) on Oct 17, 2002 at 02:47 UTC | ||||||||||||||||
Interesting problem:^), could do with a little more sample data to go on though. Anyway, the method I would use is to set up a 2-dimensional array of hashes of the grid data. To do this you need to map the grid x's and y's to a zero-based index (to conserve space). Unfortunately the data you supplied doesn't give me much clue on the spacing of the grid. It appears to be (approximately) 1066.66... in the x dimension and 933.33... in the y dimension. But this doesnt quite map exactly. However using this to divide the left-X and the top-Y and then substract 58 and 97 respectively should map the coordinates to 0..n indices. Then I store a hash of all the grid information.. Once the grid array has been built, it is then just a case of reading the place information one line at a time converting the coordinates in the same way and then using that to look up the required info. However, as the mapping isn't exact, you then need to compare the real (x,y) from the place info against that stored in the mapped hash and adjust by one up or down on each axis as necessary. That done, you have the mapped info to use as required. Note: For obvious reasons the code below is untested, but it compiles clean and should give you a starting point. I'd also be interested to know
P.s. I think that this should work regardless of whether the sources files are sorted. The neat bit (from an ex-C and other languages programmer's point of view) is the autovivication of the 2-dimensional array. Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring! | [reply] [d/l] | |||||||||||||||
|
Re: Search file of coordinates
by Anonymous Monk on Oct 17, 2002 at 23:46 UTC | ||||||||||||||||
Forgot to ask the Drafting Supervisor today. The increments are 1066.666......, and 933.333....... Here is a little more data: 062,25,6,030933,112933,032000,113866 062,25,7,028800,112000,029866,112933 062,25,8,029866,112000,030933,112933 062,25,9,030933,112000,032000,112933 063,01,1,032000,125066,033066,125999 063,01,2,033066,125066,034133,125999 The maps are arranged 17 wide by 12 high. Some maps around the edges are nonexistant because the area in question is not rectangular. Map 001, which happens to be one of the missing, would be in the upper right corner. There are 25 grids per map and 9 subgrids per grid. Thanks | [reply] | |||||||||||||||
by BrowserUk (Patriarch) on Oct 19, 2002 at 09:09 UTC | ||||||||||||||||
Sorry, but unless there are errors in either your supplied data or your description, I cannot make sense of the layout of your data. I know you said that map 001 would be top right, but the contiguous values you supplied for maps 62/63 show that the X values increase as the map/grid/subgrid numbers increase (within any given horizontally adjacent set of maps/grids/subgrids), so if the maps are arranged right-left, top-bottom, then both the grids and the subgrids must also be so and the X-coordinate values must run right-left also. So, as it will make no difference if I reverse all the schemes and model this as having X-coords, maps, grids and subgrids running left-right, (being simply the mirror image) and as this is more intuative for my brain, that is what I have done for the rest of this analysis. I tried laying the map numbers out on a 17x12 grid, left-right, top-bottom and reassuringly maps 62 and 63 are adjacent. However, map 44 comes out one row up and one to the left of map 62 but that didn't fit with the coordinates you supplied, its x-values being higher than those of map62 & 63. I also noted that as each subgrid has an x-dimension of 1066.67, then a map must have a x-dim of 1066.67*3*5= 16000, and as the boundary between cells 62 and 63 was at 32000, this placed 0 in the X-axis roughly central in your 17x12 grid., which didn't make sense. Then I thought maybe you made a typo and meant to say 'The maps are arranged 17 high by 12 wide' instead of "The maps are arranged 17 wide by 12 high", So I tried that. Again maps 62 and 63 are adjacent and this time they are only one map away from the left hand edge, which fitted perfectly with the x-coords supplied. Whats more, map 44 was way over to the right, which made sense from the supplied data. Looking good.
154000 +---+---+---+---+---+---+---+---+
|037|038|039|040|041|042|043|044|
140000 +---+---+---+---+---+---+---+---+
|049|050|051| | +->128000
126000 +---+---+---+ +->112000
|061|062|063|
112000 +---+---+---+
| | | +048000
| | +->032000b
| +->016000
+->000000
However, as you can see from the above, this layout puts the x-boundaries of map 44 at 112000 and 128000. The data you supplied was 044, 15, 2, 061866, 133466, 062933, 134399 A quick cross-reference with map 107 showed that its x-boundaries should be at 150000 and 166000 but again this doesn't fit the supplied data. I also found a caveat with my previous attempt at a solution to this problem. I attempted to create a 2D array of hashes each containing 7 elements as I had proposed, but on my machine (with 256MB ram) this resulted in an 'Out of memory' error from perl. (I'm not sure why it doesn't just get paged as other programs do. I can, (albeit slowly) edit a 500MB file with my in-memory editor without problems, but that's a side issue!). Anyway, whilst I could see ways of reducing the memory requirements of that solution, it was already halfway to the possible solution below. So I abondoned that in favour of what follows. In theory, if your layout is regular, you shouldn't need to use the grids file to map your thingy coordinates to Map/Grid/subgrid. It should be possible to derive them mathematically. If your data fitted my model (:^)..then the following code would map your coords directly. As its stands (with MapXN=12 and MapYN=17) it maps those subgrids supplied for maps 62/63 perfectly, but gets the map # wrong on those supplied for map 44 and 107 although the grid and subgrid numbers again are calculated correctly. If you switch these back to your stated 17x12, all the map numbers are wrong, but again the grid and subgrid calculate correctly. Read more... (3 kB)
I suspect that the reason is that your statement that Some maps around the edges are nonexistant because the area in question is not rectangular. Map 001, which happens to be one of the missing, would be in the upper right corner. is somewhat abiguous and that instead of numbering the maps 1-17 on the first row and 18-34 on the second and not using the map numbers for those outside the area. Instead, only those squares (maps) that contain part of the area in question are numbered. A 'picture' might clarify that.
And this is why I cannot make sense of the sample data. If this is the case, then you still don't need to have a file containing the mapping for all the individual map/grid/subgrid triples. You only need a mapping from the calculated map number to the actual map number. For example, in the above case these mapping would take considerably less memory.
I hope that this is of some help in getting you started on a solution. Thankyou for presenting an interesting problem. If you have any problems implementing this (or an alternative solution), come back with your code and I'm sure you'll get further assistance here, especially if you take the time to sign up and become a member. It's free, and well worth the insignificant trouble. Good luck. Cor! Like yer ring! ... HALO dammit! ... 'Ave it yer way! Hal-lo, Mister la-de-da. ... Like yer ring! | [reply] [d/l] [select] | |||||||||||||||