baxy77bax has asked for the wisdom of the Perl Monks concerning the following question:
On the other side I have another set of coordinates:

    start,stop
    22,25 >uname1
    344,360 >uname2
    433,540 >uname3
    432,532 >uname4
What I'm trying to figure out is which users ran a specific job at specific intervals, so I'm trying to map job_id intervals to uname intervals. The rule is that even if only part of a job_id interval crosses a uname interval, the pair should be reported. The thing is, there are over 20 million such intervals in each group (and the intervals overlap), and the allowed coordinate range is the same in both cases, spanning from 1 to 20 million.

    start,stop
    21,23 >job_id1
    255,345 >job_id2
    345,355 >job_id3
    356,366 >job_id4
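The "even a partial crossing counts" rule above boils down to a two-comparison test. Here is a minimal sketch, assuming both endpoints are inclusive (the post does not say); the sub name `overlaps` is made up for illustration.

```perl
use strict;
use warnings;

# Two closed intervals [s1,e1] and [s2,e2] overlap when each one
# starts no later than the other one ends.
# Assumption: start and stop are both inclusive.
sub overlaps {
    my ($s1, $e1, $s2, $e2) = @_;
    return ($s1 <= $e2 && $s2 <= $e1) ? 1 : 0;
}

# A few rows from the sample data: job interval first, then uname interval.
my @checks = (
    [ 21,  23,  22,  25],   # job_id1 vs uname1
    [255, 345, 344, 360],   # job_id2 vs uname2
    [356, 366, 433, 540],   # job_id4 vs uname3
);
for my $c (@checks) {
    printf "%s\n", overlaps(@$c) ? "overlap" : "no overlap";
}
```

On these rows it reports an overlap for the first two pairs and none for the third.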
Now what I was thinking about is to use the Bit::Vector library: create a bit vector, map both sets of coordinates onto the vector space, then see where they overlap and just remove the non-overlapping fields. But then it hit me: how will I track down which unames and job_ids those overlaps belong to? Then I thought about hashing. But how will I then find a coordinate among my hash keys that is less than X?
I mean, I would need to sort the hash keys and then loop through them to find the keys that are >= some start key (id) and <= some stop key (id).

find:

    my $uname = $hash{> than $start and <= than $stop} #???????
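A hash cannot answer "which keys fall between X and Y" directly, but a numerically sorted array of the keys plus a binary search can. The sketch below illustrates that idea on the sample uname rows; `%uname_at`, `lower_bound` and `keys_in_range` are hypothetical names, and it assumes every start coordinate is unique (real data with repeated starts would need a list per key).

```perl
use strict;
use warnings;

# uname intervals from the sample, keyed by start coordinate.
# Assumption: starts are unique.
my %uname_at = (22 => 'uname1', 344 => 'uname2', 433 => 'uname3', 432 => 'uname4');

# Sort the keys numerically once, up front.
my @starts = sort { $a <=> $b } keys %uname_at;

# Index of the first element >= $x (or scalar(@$keys) if there is none).
sub lower_bound {
    my ($keys, $x) = @_;
    my ($lo, $hi) = (0, scalar @$keys);
    while ($lo < $hi) {
        my $mid = int(($lo + $hi) / 2);
        if ($keys->[$mid] < $x) { $lo = $mid + 1 } else { $hi = $mid }
    }
    return $lo;
}

# All keys k with $from <= k <= $to, in ascending order.
sub keys_in_range {
    my ($keys, $from, $to) = @_;
    my @hits;
    for (my $i = lower_bound($keys, $from); $i < @$keys && $keys->[$i] <= $to; $i++) {
        push @hits, $keys->[$i];
    }
    return @hits;
}

my @found = map { $uname_at{$_} } keys_in_range(\@starts, 300, 432);
print join(',', @found), "\n";   # uname2,uname4
```

Note that this only finds intervals whose start lies inside the queried range. A uname interval that starts before the range and extends into it still overlaps, so a key lookup on its own is not the whole answer.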
And now I'm stuck and crying to you for help.
So let me summarize my problem: I need to map, if possible, job_id intervals to uname intervals and preserve the >uname1 >job_id2 tags. Keep in mind that those datasets have piled up over the years and are quite large, so a simple loop within a loop would not be a good solution.
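One possible way to avoid the loop within a loop is to sort both sets by start and sweep through them once, keeping only the intervals that are still "open". This is a sketch under assumptions, not a tuned solution: endpoints are taken as inclusive, everything is held in memory, and `overlapping_pairs` is a made-up name. Its cost grows with the number of intervals open at the same time, so heavily overlapping data will still be slow.

```perl
use strict;
use warnings;

# Sample data from the question: [start, stop, tag].
my @jobs   = ([21, 23, 'job_id1'], [255, 345, 'job_id2'],
              [345, 355, 'job_id3'], [356, 366, 'job_id4']);
my @unames = ([22, 25, 'uname1'], [344, 360, 'uname2'],
              [433, 540, 'uname3'], [432, 532, 'uname4']);

sub overlapping_pairs {
    my ($jobs, $unames) = @_;

    # Tag each interval with its set (0 = job, 1 = uname), then sort by start.
    my @all;
    push @all, [@$_, 0] for @$jobs;
    push @all, [@$_, 1] for @$unames;
    my @events = sort { $a->[0] <=> $b->[0] } @all;

    my @active = ([], []);   # intervals seen so far and not yet pruned, per set
    my @pairs;
    for my $iv (@events) {
        my ($start, $stop, $tag, $set) = @$iv;
        my $other = 1 - $set;

        # Intervals of the other set that ended before this start can
        # never overlap anything that comes later, so drop them.
        @{ $active[$other] } = grep { $_->[1] >= $start } @{ $active[$other] };

        # Whatever is left started no later than $start and has not ended yet.
        for my $o (@{ $active[$other] }) {
            push @pairs, $set == 0 ? [$tag, $o->[2]] : [$o->[2], $tag];
        }
        push @{ $active[$set] }, $iv;
    }
    return @pairs;   # each element is [job_tag, uname_tag]
}

print "$_->[0] $_->[1]\n" for overlapping_pairs(\@jobs, \@unames);
```

On the sample data this reports job_id1 with uname1, and job_id2, job_id3 and job_id4 each with uname2. Every pair is reported exactly once, at the moment the later-starting interval of the two is processed.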
thank you
baxy
PS
max for the coordinates in both cases is 20000000
PPS
to moritz
This is a fraction of the real data set, but don't worry about that. The dataset is, as I said, large, and I cannot hand-pick truly representative data to illustrate the problem.
and these are unames: this corresponds to the job lines

    14230157,14230182,3445:7:3:707:620
    3437306,3439308,3445:7:3:990:634
    14593103,14593128,3445:7:3:537:287
    16948765,16948768,127305:7:3:49:800
    12044820,12044845,127303:7:3:686:44
    11310494,11310519,127340:7:3:67:320
    19408728,19408753,127438:7:3:508:614
    17007683,17007685,127439:7:3:481:403
As I said, this is probably not a good example for illustrating the problem, so please do refer to the example above :)

    16820359,16821584,5:7:3:1:5
    17979480,17999505,4:7:3:948:200
    12491787,14491812,4:7:3:784:575
    17389967,18389969,34:7:3:617:920
    11671837,19671839,34:7:3:516:921
Replies are listed 'Best First'.

Re: mapping coordinates- suggestion needed
by moritz (Cardinal) on Oct 14, 2010 at 16:32 UTC

Re: mapping coordinates- suggestion needed
by jethro (Monsignor) on Oct 14, 2010 at 17:31 UTC

Re: mapping coordinates- suggestion needed
by BrowserUk (Patriarch) on Oct 14, 2010 at 16:30 UTC

Re: mapping coordinates- suggestion needed
by BrowserUk (Patriarch) on Oct 14, 2010 at 16:49 UTC
by baxy77bax (Deacon) on Oct 14, 2010 at 16:58 UTC
by BrowserUk (Patriarch) on Oct 14, 2010 at 17:31 UTC
by baxy77bax (Deacon) on Oct 15, 2010 at 06:17 UTC
by BrowserUk (Patriarch) on Oct 15, 2010 at 15:57 UTC

Re: mapping coordinates- suggestion needed
by jandrew (Chaplain) on Oct 15, 2010 at 03:38 UTC