Would it be prohibitively expensive to pull your data set into a database and run SQL queries against it? Your 'loci' file could probably be loaded directly into a table with no additions. Your region file would probably need a unique, auto-incrementing ID column. You could then "SELECT min_pos, max_pos FROM region WHERE id=0000001;". Next, "SELECT * FROM loci WHERE pos > ? AND pos < ?;", executed with the bind values $min_pos and $max_pos, in that order (use >= and <= if the region bounds are meant to be inclusive).

I usually get SQL wrong on the first try, so you would need to craft your own. But this type of approach gives you the full power of a relational database. The biggest cost would be the time it takes to populate the database, which may be an issue for a data set this large.

Through the magic of DBI you can still enjoy the pleasure of using Perl in your solution. And through the deep magic of DBIx::Class and DBIx::Class::Schema::Loader, you could even get away with not having to worry about the SQL.
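
For what it's worth, here is a minimal DBI sketch of the two-query idea. Everything specific in it is an assumption on my part: it uses an in-memory SQLite database (via DBD::SQLite) so that it runs without a server, the table layout and the 'name' column are made up, and the handful of rows stand in for your real files. Treat it as a starting point, not a tested solution for your data.

```perl
use strict;
use warnings;
use DBI;

# Assumption: SQLite in memory (needs DBD::SQLite); swap the DSN for your own database.
my $dbh = DBI->connect('dbi:SQLite:dbname=:memory:', '', '',
    { RaiseError => 1, AutoCommit => 1 });

# Hypothetical schema; the real columns depend on what your files contain.
$dbh->do('CREATE TABLE region (id INTEGER PRIMARY KEY AUTOINCREMENT,
                               min_pos INTEGER, max_pos INTEGER)');
$dbh->do('CREATE TABLE loci (pos INTEGER, name TEXT)');
$dbh->do('CREATE INDEX loci_pos ON loci (pos)');

# Load everything inside one transaction; committing per row is much slower.
$dbh->begin_work;
my $ins_region = $dbh->prepare('INSERT INTO region (min_pos, max_pos) VALUES (?, ?)');
$ins_region->execute(@$_) for [100, 200], [500, 900];
my $ins_locus = $dbh->prepare('INSERT INTO loci (pos, name) VALUES (?, ?)');
$ins_locus->execute(@$_) for [50, 'a'], [150, 'b'], [199, 'c'], [200, 'd'], [600, 'e'];
$dbh->commit;

# Step 1: fetch the bounds of one region (the first auto-assigned id is 1).
my ($min_pos, $max_pos) = $dbh->selectrow_array(
    'SELECT min_pos, max_pos FROM region WHERE id = ?', undef, 1);

# Step 2: fetch the loci strictly inside those bounds, using placeholders.
my $loci = $dbh->selectall_arrayref(
    'SELECT pos, name FROM loci WHERE pos > ? AND pos < ? ORDER BY pos',
    undef, $min_pos, $max_pos);

print "$_->[0]\t$_->[1]\n" for @$loci;
```

The index on loci.pos is what keeps the range query from scanning the whole table, and wrapping the inserts in a single transaction is the usual way to keep the load time tolerable.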

If the dataset is short-lived enough that it proves to be prohibitively expensive to pull it all into a database, this entire post becomes just a filler between your question and the next post that provides a 100% Perl approach. ;)


Dave


In reply to Re: Best approach for large-scale data processing by davido
in thread Best approach for large-scale data processing by iangibson
