If they are in FASTA or qual format, as the previous post said, then BioPerl is second to none for parsing sequence formats (although it will complain that they don't look like sequences if they contain funny characters: 'looks like you're using scores').
You say: 'I cannot store anything in arrays or variables since I have to parse 3 GB file.'
'The possible size of an array is only limited by how much memory you have'.
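Since the worry is the 3 GB file, here is a minimal pure-Perl sketch (not from this thread and not BioPerl) of reading a FASTA file one record at a time, so only the current record is ever held in memory. It assumes every record starts with a '>' header line and that lines end in "\n"; the each_record helper and the sample data are hypothetical.

```perl
use strict;
use warnings;

# Read a FASTA file one record at a time, so only the current record is held
# in memory however big the file is. Assumes each record starts with a '>'
# header line. each_record is a hypothetical helper, not a BioPerl function.
sub each_record {
    my ( $fh, $callback ) = @_;
    local $/ = "\n>";    # one read = everything up to the next header
    while ( my $chunk = <$fh> ) {
        chomp $chunk;        # drop the trailing "\n>" (the value of $/)
        $chunk =~ s/^>//;    # the first record still has its own '>'
        my ( $header, @lines ) = split /\n/, $chunk;
        $callback->( $header, join( '', @lines ) );
    }
    return;
}

# Demo on an in-memory "file"; for real use, open your 3 GB file instead.
my $data = ">seq1 first\nACGT\nTTGA\n>seq2\nNNAC\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @seen;
each_record( $fh, sub {
    my ( $header, $seq ) = @_;
    push @seen, [ $header, $seq ];
    print "$header: ", length($seq), " bases\n";
} );
close $fh;
```

Setting $/ to "\n>" makes each read return a whole record instead of a single line, and local restores normal line-by-line reading when the sub returns. For .qual files you would join the lines with a space rather than '' so the scores stay separated.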
Could you post the code you've tried, so we can see how you're storing things?
You may be able to get around the problem by using references.
Storing refs to nucleotides (just an example; I would need to see your code to tailor it better):
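use strict;
use warnings;

my $a_ref = \'A';
my $c_ref = \'C';
my $g_ref = \'G';
my $t_ref = \'T';
my $n_ref = \'N';    # anything that isn't A, C, G or T
# Then storing these values in an array reference:
my $base_ref = [];
my $file = shift @ARGV or die "usage: $0 seqs.fasta\n";
open my $fh, '<', $file or die "Cannot open $file: $!";
while ( my $line = <$fh> ) {
    chomp $line;
    # Get the correct values you need, e.g. skip FASTA '>' header lines
    next if $line =~ /^>/;
    for my $base ( split //, uc $line ) {
        my $nuc_ref = $base eq 'A' ? $a_ref
                    : $base eq 'C' ? $c_ref
                    : $base eq 'G' ? $g_ref
                    : $base eq 'T' ? $t_ref
                    :                $n_ref;
        push @{$base_ref}, $nuc_ref;
    }
}
close $fh;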
What is this doing?
Each element of the array is now just a reference to one of five shared scalars (A, C, G, T, N), rather than holding its own copy of the character. See perlreftut for more info*
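To make the idea concrete, here is a tiny self-contained version. The %nuc_ref hash is a hypothetical stand-in for the five separate $x_ref variables above, and a literal string replaces the file. It shows that every 'A' slot holds the very same reference and that dereferencing with $$_ gives the bases back. How much memory this actually saves is not measured here and will depend on your perl build.

```perl
use strict;
use warnings;

# One shared, read-only scalar per base (hypothetical hash version of $a_ref etc.)
my %nuc_ref = ( A => \'A', C => \'C', G => \'G', T => \'T', N => \'N' );

my @bases;
for my $base ( split //, 'ACGTXAAC' ) {
    # Unknown characters (here the 'X') fall back to the N reference
    push @bases, $nuc_ref{$base} // $nuc_ref{N};
}

# Dereference every element to get the sequence back
my $seq = join '', map { $$_ } @bases;
print "$seq\n";    # prints ACGTNAAC

# References compare equal (same address) when they point at the same scalar
print "same scalar\n" if $bases[0] == $bases[5];    # both are 'A'
```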
You may still run into the upper limit on array size (i.e. available memory), but I've read about 10,000 files into a single data structure before without a hitch. It's all about how you do it.
Update: If I've seriously overlooked something, please say so.
If you could post some examples of what you've tried, we may be able to streamline it.
Hope that helps,
john
P.S. The first post had a good idea about using a database.
* See perlreftut for more information; I'm not sure I can explain it that well.