Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Venerable monks

Lets say i had a information about a set of lines. The information i have is the length of a line in terms of its position along the x axis. So for each line i have properties start_x and end_x

I also have information about a set or coordinates which just have the property x_pos which is their location on the x axis

Lets say i wanted to find out if any of the co-ordinates reside in any of the lines. This result could be represented by a notional property of the coordinate called isWithinLine

What is the best way of finding if a coordinate is within a line? All of my data is in a database and can be loaded into any perl structures you recommend. I have tried doing this within a database using sql and it is very very slow.

Please bear in mind I am only a part-time perl programmer and have limited knowledge of the extent of all perls features. I'm more than happy to look them up if you could point me in the right direction.

thanks a lot
  • Comment on fast way of working with numerical positions

Replies are listed 'Best First'.
Re: fast way of working with numerical positions
by moritz (Cardinal) on Apr 27, 2011 at 15:02 UTC
    My replies to Optimizing a double loop show how a very similar problem can be solved with inversion lists in perl. However
    I have tried doing this within a database using sql and it is very very slow.

    makes me think that what you really need is a suitable index over the columns you are searching.

Re: fast way of working with numerical positions
by runrig (Abbot) on Apr 27, 2011 at 14:47 UTC
    Search on CPAN for IntSpan. Something there should work for you I think (though if your numbers are not integers, you might have to scale them to be integers). I haven't used any of them, so I can't recommend anything. Array::IntSpan or Set::IntSpan perhaps. Or maybe one that claims to be "::Fast" :-)
Re: fast way of working with numerical positions
by chrestomanci (Priest) on Apr 27, 2011 at 15:04 UTC

    You say co-ordinates, but you only mention an X axis. Are you working in a 1 dimensional co-ordinate system, or is it a 2 or 3 (or more) dimensional system?

    If you are working in more than one dimension, then you will find that an arbitrary point will rarely be exactly on an arbitrary line due to floating point precision issues. Instead you will need to test how close the point is to the line. There are a number of mathematical ways to do that. Off the top of my head an easy way would be to work out the area of the triangle composed of the point and lines's endpoints and then divide that by the line's length. That will give you half the distance of the point from the line.

    If you are working in only one dimension then a simple numerical comparison should suffice:

    say "point is on line" if( point_x >= line_start && point_x <= line_end )
Re: fast way of working with numerical positions
by BrowserUk (Patriarch) on Apr 27, 2011 at 16:26 UTC

    I don't believe we can offer you an efficient solution based on that scant information, without asking a few questions:

    • Are your coordinates integers or floating point values?
    • What is the minimum/maximum range of your coordinate space?
    • How many pairs of line coordinates are you dealing with?
    • How many single positions will you need to look up?

    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
      more information:

      positions are integers; there are millions of single positions; there are millions of lines; the length of the lines is typically in the 1000s

      I have tried this in a database with judicious use of joins. I'm happy to stick with the database but i thought due to the large numbers involved there might be a quicker solution in perl.

        You've omitted the overall range.

        A very efficient solution for a total range of 0 .. 100,000, becomes impractical for a range of 0 .. 4 billion.

        I'm happy to stick with the database

        'nuff said. You have your solution.

        but i thought due to the large numbers involved there might be a quicker solution in perl.

        There quite probably is, but if what you have is good enough, why expend the effort?


        Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
        "Science is about questioning the status quo. Questioning authority".
        In the absence of evidence, opinion is indistinguishable from prejudice.
Re: fast way of working with numerical positions
by Anonymous Monk on Apr 27, 2011 at 14:56 UTC
    Forgive me if I don't see the problem here, but this sounds like a database join, not something you want to be loading into perl one row at a time. More like you want to do the join (is it SQL?) in the DB and capture the results if any. Approx:
    SELECT * from lines, coords WHERE x_pos >= start_x AND x_pos <= end_x
Re: fast way of working with numerical positions
by fidesachates (Monk) on Apr 27, 2011 at 15:39 UTC
    I have a feeling we're making your problem bigger than it really is. While I'm of the opinion that this really should be fixed with a better index on the database side, you've asked for a perl solution so I'll provide as basic one as I can.

    Grab the information for the lines and concat them into a string separated by a comma. Simply add it on to a an array.
    $string = "$a,$b"; push(@array, $string);
    Then once you've loaded all lines into the array, you can grab the point data and loop through them. You can store the points into an array and since it's just a number, just store it as is (no need to turn it into a string). For each point you'll want to do something like this
    foreach my $value (@array) { my($a,$b) = split(/,/, $value); if($x >= $a && $x <= $b) { print "This point is on the line"; } }


    Please let us know if you need assistance grabbing the data from the database.