Is it prohibitively expensive to pull your data set into a database and do SQL queries on it? Your 'loci' file could probably be loaded directly into a table with no additions. Your region file would probably need a unique, auto-incrementing ID column. You could then "SELECT min_pos, max_pos FROM region WHERE id=0000001;". Next, "SELECT * FROM loci WHERE pos < ? AND pos > ?;", executed with bind values of $min_pos and $max_pos.
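A minimal DBI sketch of those two queries, assuming an SQLite file named regions.db and the table/column names used above (region, loci, min_pos, max_pos, pos) -- adjust to whatever your schema actually looks like:

    use strict;
    use warnings;
    use DBI;

    # Hypothetical SQLite file; any DBD driver would do here.
    my $dbh = DBI->connect( 'dbi:SQLite:dbname=regions.db', '', '',
        { RaiseError => 1 } );

    # Look up the boundaries of one region by its auto-incrementing id.
    my ( $min_pos, $max_pos ) = $dbh->selectrow_array(
        'SELECT min_pos, max_pos FROM region WHERE id = ?', undef, 1 );

    # Fetch every locus whose position falls inside that region.
    my $sth = $dbh->prepare('SELECT * FROM loci WHERE pos > ? AND pos < ?');
    $sth->execute( $min_pos, $max_pos );

    while ( my $row = $sth->fetchrow_hashref ) {
        print "$row->{pos}\n";
    }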
I usually get SQL wrong on the first try, so you would need to craft your own. But this type of approach gives you the full power of a relational database. The biggest problem would be the time it takes to populate the database in the first place, and that may be an issue.
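Populating the tables is where the time goes, so it helps to batch the inserts inside a single transaction. A rough sketch, assuming a whitespace-delimited loci file with illustrative column names (name, pos):

    use strict;
    use warnings;
    use DBI;

    my $dbh = DBI->connect( 'dbi:SQLite:dbname=regions.db', '', '',
        { RaiseError => 1, AutoCommit => 0 } );    # one big transaction

    # Column names here are illustrative; use whatever your loci file holds.
    my $insert = $dbh->prepare('INSERT INTO loci (name, pos) VALUES (?, ?)');

    open my $fh, '<', 'loci.txt' or die "Cannot open loci.txt: $!";
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $name, $pos ) = split /\s+/, $line;
        $insert->execute( $name, $pos );
    }
    close $fh;

    $dbh->commit;    # a single commit keeps the bulk load reasonably quick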
Through the magic of DBI you can still enjoy the pleasure of using Perl in your solution. And through the deep magic of DBIx::Class and DBIx::Class::Schema::Loader, you could even get away with not having to worry about the SQL.
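For the DBIx::Class route, Schema::Loader can introspect the existing tables and build the classes for you. A sketch, assuming the same hypothetical regions.db file; the result source names ('Region', 'Loci') are guesses, so check what the loader actually generates:

    use strict;
    use warnings;
    use DBIx::Class::Schema::Loader qw(make_schema_at);

    # Build schema classes in memory by introspecting the existing tables.
    make_schema_at(
        'My::Schema',
        { naming => 'current' },
        [ 'dbi:SQLite:dbname=regions.db', '', '' ],
    );

    my $schema = My::Schema->connect('dbi:SQLite:dbname=regions.db');

    # Moniker names depend on how the loader names your tables.
    my $region = $schema->resultset('Region')->find(1);
    my @loci   = $schema->resultset('Loci')->search(
        { pos => { '>' => $region->min_pos, '<' => $region->max_pos } }
    );
    print $_->pos, "\n" for @loci;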
If the dataset is short-lived enough that pulling it all into a database really is prohibitively expensive, this entire post becomes just filler between your question and the next post that provides a 100% Perl approach. ;)
Dave