in reply to indexing segments

Not really an answer to your question but a solution:

You could create a database table such as:

create table elements ( position int8, start_value int8, end_value int8 );

Then, after filling the table, this query would give you the desired result:

select position from element where end_value >= 50000 and start_value +<= 1000000;

This effectively gives the tree hassle to the database.

UPDATE: Changed the SQL query to reflect the original query in the question (and a typo)

--tidiness is the memory loss of environmental mnemonics

Replies are listed 'Best First'.
Re: indexing segments
by Abigail-II (Bishop) on Oct 10, 2003 at 12:53 UTC
    This effectively give the tree hassle to the database.

    Well, it's more saying to the database "please perform a linear search for me". Kind of an overkill to use a database for that, you might as well use a grep statement in Perl. And no, creating indices on start_value and end_value isn't going to help you in this case - it may still lead to a table scan (and hence a linear search), and just report a single match.

    Abigail

      I don't think it is a straight linear scan - there is possible confusion over what the data is:
      I have a sizeable array of segments (> 100K), each element basically has a length of about 100-1000 and a start/end value anywhere between 1 and a few hundred million.

      If it is just a list of single values then 'a length of about 100-1000' and 'a start/end value anywhere between 1 and a few hundred million' are mutually exclusive.

      I may be completely wrong but it would appear that each array element is more involved than just a single value but can be summarised with a start and end value. This make the database more suitable.

      There are also the below mentioned storage considerations.

      That said if it is just a list of values you are correct - grep or similar would be the way forward.

      --tidiness is the memory loss of environmental mnemonics

        If it is just a list of single values then 'a length of about 100-1000' and 'a start/end value anywhere between 1 and a few hundred million' are mutually exclusive.

        I guess I phrased this poorly. It is not a list of single values, each element has two values: a start position, and an end position. What I meant was that the space these elements occupy can go up to a few hundred million (ie, with only a few hundred thousand elements it is fairly sparsely populated), but the length of each element (end - start) is usually no more than a few thousand.

        Hope that clarifies it.

        That said if it is just a list of values you are correct - grep or similar would be the way forward.

        grep is the slow way to go, it not being good enough is why I am seeking the wisdom in the first place :)

Re: Re: indexing segments
by glwtta (Hermit) on Oct 10, 2003 at 13:21 UTC
    Hm, I'm pretty sure this would do a seq scan on the table.

    In any case, that doesn't help me; my problem is that a linear search in memory would be too slow, throwing a database in there is definitely not going to help.