This effectively gives the tree hassle to the database.

Well, it's more like saying to the database "please perform a linear search for me". It's kind of overkill to use a database for that; you might as well use grep in Perl. And no, creating indices on start_value and end_value isn't going to help you in this case - the query may still lead to a table scan (and hence a linear search) just to report a single match.
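
To make the comparison concrete, here is a minimal sketch of the kind of linear search meant above, written in Python for brevity (a Perl grep over the same list would do the same work). The (start, end) pair layout, the function name `overlapping`, and the assumption that the query is "which segments overlap a given range" are all illustrative, not taken from the original question.

```python
def overlapping(segments, q_start, q_end):
    """Return every (start, end) pair that overlaps [q_start, q_end].

    Each segment is examined exactly once, so the cost grows
    linearly with the number of segments.
    """
    return [(s, e) for (s, e) in segments if s <= q_end and e >= q_start]


# Hypothetical sample data, for illustration only.
segments = [(10, 150), (400, 900), (120, 700), (5000, 5600)]
hits = overlapping(segments, 130, 450)
```

Whether this scan runs inside a database or in a script, every row or element still has to be looked at.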

Abigail

Re: Re: indexing segments
by EvdB (Deacon) on Oct 10, 2003 at 13:03 UTC
    I don't think it is a straight linear scan - there is possible confusion over what the data is:
    I have a sizeable array of segments (> 100K), each element basically has a length of about 100-1000 and a start/end value anywhere between 1 and a few hundred million.

    If it is just a list of single values then 'a length of about 100-1000' and 'a start/end value anywhere between 1 and a few hundred million' are mutually exclusive.

    I may be completely wrong, but it would appear that each array element is more involved than just a single value, yet can be summarised with a start and end value. This makes the database more suitable.

    There are also the storage considerations mentioned below.

    That said if it is just a list of values you are correct - grep or similar would be the way forward.

    --tidiness is the memory loss of environmental mnemonics

      If it is just a list of single values then 'a length of about 100-1000' and 'a start/end value anywhere between 1 and a few hundred million' are mutually exclusive.

      I guess I phrased this poorly. It is not a list of single values; each element has two values: a start position and an end position. What I meant was that the space these elements occupy can go up to a few hundred million (i.e., with only a few hundred thousand elements it is fairly sparsely populated), but the length of each element (end - start) is usually no more than a few thousand.
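
      One way data of this shape might be exploited, sketched in Python (the same idea ports to Perl with a sorted array and a hand-written binary search): because every segment is short, a segment that overlaps a query range must start no earlier than the query start minus the longest segment length. Sorting by start and bisecting narrows the candidates to a small window. The names `build_index` and `query`, the overlap-style query, and the sample data are assumptions for illustration.

```python
from bisect import bisect_left, bisect_right


def build_index(segments):
    """Sort segments by start and record the longest segment length."""
    ordered = sorted(segments)
    starts = [s for s, _ in ordered]
    max_len = max((e - s for s, e in ordered), default=0)
    return ordered, starts, max_len


def query(index, q_start, q_end):
    """Return segments overlapping [q_start, q_end].

    Only segments whose start lies in [q_start - max_len, q_end]
    can overlap, so binary search finds that window first.
    """
    ordered, starts, max_len = index
    lo = bisect_left(starts, q_start - max_len)
    hi = bisect_right(starts, q_end)
    return [(s, e) for s, e in ordered[lo:hi] if e >= q_start]


# Hypothetical sample data, for illustration only.
index = build_index([(10, 150), (400, 900), (120, 700),
                     (5000, 5600), (200_000_000, 200_000_800)])
near_start = query(index, 130, 450)
mid_range = query(index, 5500, 5700)
```

      The window stays small only while the longest segment is short relative to the whole range; a single very long segment would widen it for every query.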

      Hope that clarifies it.

      That said if it is just a list of values you are correct - grep or similar would be the way forward.

      grep is the slow way to go; its not being good enough is why I am seeking wisdom in the first place :)

        Just mulling on this now.

        If you have a start and end value for each element then you could plot each element on an xy chart. Then to get the ones you want you could select an area on the chart which matches your criteria.
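
        A rough sketch of that idea, in Python rather than Perl purely for brevity: treat each element as a point (x = start, y = end), and each kind of query becomes a rectangular area of the chart. The three area names, the helper names `region` and `select`, and the sample points are all made up for illustration.

```python
def region(kind, q_start, q_end):
    """Return a test on a point (x=start, y=end) for the chosen area."""
    if kind == "overlaps":   # up and to the left of the corner (q_end, q_start)
        return lambda x, y: x <= q_end and y >= q_start
    if kind == "inside":     # element lies wholly within the query range
        return lambda x, y: x >= q_start and y <= q_end
    if kind == "covers":     # element spans the whole query range
        return lambda x, y: x <= q_start and y >= q_end
    raise ValueError(f"unknown area: {kind}")


# Hypothetical sample points, for illustration only.
points = [(10, 150), (400, 900), (120, 700), (5000, 5600)]


def select(kind, q_start, q_end):
    """Pick out the points that fall in the requested area."""
    test = region(kind, q_start, q_end)
    return [(x, y) for x, y in points if test(x, y)]


overlaps = select("overlaps", 130, 450)
inside = select("inside", 100, 1000)
covers = select("covers", 130, 450)
```

        As written this still tests every point, so it only shows what the area looks like; getting a speed-up would need some structure that can find the points in an area without visiting them all.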

        Now my copy of 'Mastering Algorithms with Perl' has chapters on both graphs and sets. This might be a fruitful line of study.

        On the database front, Postgres has lots of built-in coordinate functions - although you're better off in Perl, methinks.

        I should have taken CS instead of Physics - alas, I can be of no more help until I have read up some more...

        --tidiness is the memory loss of environmental mnemonics