Re: How do I backtrack while reading a file line-by-line?
by ikegami (Patriarch) on Oct 13, 2006 at 18:59 UTC
tell to save your spot, seek to return to it.
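A minimal, self-contained sketch of that approach (the demo file name and contents are made up for illustration):

```perl
use strict;
use warnings;

# Write a small demo file so the example is self-contained.
my $file = "demo.txt";
open my $out, '>', $file or die "open: $!";
print $out ">header\nAAAA\nCCCC\n";
close $out;

open my $fh, '<', $file or die "open: $!";
my $spot = 0;
while (my $line = <$fh>) {
    # tell() after a read gives the offset of the NEXT line,
    # so this saves the position just past each '>' marker.
    $spot = tell($fh) if $line =~ /^>/;
}
# Jump back to the saved position and re-read from there.
seek $fh, $spot, 0 or die "seek: $!";
my $again = <$fh>;
print $again;   # the line right after the last '>' header
close $fh;
unlink $file;
```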
Re: How do I backtrack while reading a file line-by-line?
by grep (Monsignor) on Oct 13, 2006 at 19:35 UTC
my @array = qw/ foo bar baz blah bar blah baz /;
my $save = 0;
my %done;
for (my $x = 0; $x <= $#array; $x++) {
    $save = $x if $array[$x] eq 'bar';
    print "X:$x SAVE:$save $array[$x]\n";
    if ( $array[$x] eq 'blah' and !defined($done{$x}) ) {
        $done{$x}++;
        $x = $save;
    }
}
grep
One dead unjugged rabbit fish later
That section on memory usage is very misleading. Tie::File keeps the index of every encountered line (i.e. every line up to the highest one read or written) in memory. In other words, if you do $tied[-1] or push @tied, ..., the index of every line in the file is loaded into memory (if it hasn't already been loaded).
Tie::File is still a very useful module.
from the POD:
memory - This is an upper limit on the amount of memory that Tie::File will consume at any time while managing the file. This is used for two things: managing the read cache and managing the deferred write buffer
I didn't find that misleading. It says to me that only chunks of the file data are loaded into memory. In fact, I had assumed that it loaded a full index of the lines at instantiation.
If the OP knows roughly how much data an average (or the largest) backtrack covers, the read cache could be optimized for memory usage/speed. Plus you get a layer of abstraction to hide any nastiness.
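A small sketch of tying a file with the memory option set (the demo file and the 1 MB cap are made up for illustration):

```perl
use strict;
use warnings;
use Tie::File;

# Demo file so the example is self-contained.
my $file = "lines.txt";
open my $out, '>', $file or die "open: $!";
print $out "line $_\n" for 1 .. 5;
close $out;

# Cap Tie::File's read cache / deferred-write buffer at roughly 1 MB.
tie my @lines, 'Tie::File', $file, memory => 1_000_000
    or die "tie failed: $!";

my $third = $lines[2];            # random access; records come back without "\n"
$lines[0] = 'line one, edited';   # writes through to the file
print "$third\n";

untie @lines;
unlink $file;
```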
grep
One dead unjugged rabbit fish later
Re: How do I backtrack while reading a file line-by-line?
by madbombX (Hermit) on Oct 13, 2006 at 19:37 UTC
Every time you come to a line that starts with '>', you could push it onto an array, then refer back to the array each time you want to access previous lines that started with '>'. I don't know how often you come across such lines, though (since this can create quite a large array).
That being said, to add onto ikegami's idea, you can use tell to find where each such line starts and push that offset onto an array. Then when you want to go back X lines, you can always seek to the saved position ($lines[-1] .. $lines[-4]).
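A self-contained sketch of that offset-array idea (the demo file and '>' markers are made up for illustration):

```perl
use strict;
use warnings;

# Demo file: three '>' headers with sequence lines between them.
my $file = "marks.fa";
open my $out, '>', $file or die "open: $!";
print $out ">one\nAAA\n>two\nCCC\n>three\nGGG\n";
close $out;

open my $fh, '<', $file or die "open: $!";
my @marks;                    # byte offset of each '>' line
my $pos = tell $fh;           # position BEFORE reading the next line
while (my $line = <$fh>) {
    push @marks, $pos if $line =~ /^>/;
    $pos = tell $fh;
}

# Backtrack to the second-to-last '>' line and re-read it.
seek $fh, $marks[-2], 0 or die "seek: $!";
my $header = <$fh>;
print $header;    # ">two"
close $fh;
unlink $file;
```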
Unfortunately, I am reading in files that contain genome data, and the lines starting with '>' correspond to the start of a new chromosome. So, a ~500 MB file will contain fewer than 50 lines starting with '>', and reading everything in between them into the buffer almost defeats the purpose of the buffer itself.
Thanks anyway tho.
Matt
Re: How do I backtrack while reading a file line-by-line?
by BrowserUk (Patriarch) on Oct 13, 2006 at 21:41 UTC
Sounds very much like you're trying to read a Fasta format sequence file?
You could use Bio::SeqIO, or if that is giving you problems you might try my crude Fasta load routine. It's the last code snippet in Re^5: Memory Usage in Regex On Large Sequence. That post/thread also shows a problem with the CPAN module along with one reason why its performance is not so good. Though that might have been fixed by now.
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
Lingua non convalesco, consenesco et abolesco. -- Rule 1 has a caveat! -- Who broke the cabal?
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
I am indeed trying to read Fasta files.
In fact, what I'm doing is creating a search function that, given a variable number of user-input DNA sequences (such as amino acid motifs, or transcription factor binding sites), searches a user-specified Fasta file for all hits, either in total or only within $interval bases of each other, and then outputs all the hits in both .html and .fasta format, with the .html having all the matches highlighted in various colors.
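The core of such a search, finding every (possibly overlapping) hit of one motif in a sequence string, might be sketched like this (find_hits() and the test sequence are made up, not Matt's actual code):

```perl
use strict;
use warnings;

# Hypothetical helper: return every 0-based start position of $motif
# in $seq, including overlapping hits, via a zero-width lookahead.
sub find_hits {
    my ($seq, $motif) = @_;
    my @hits;
    while ($seq =~ /(?=\Q$motif\E)/g) {
        push @hits, pos($seq);   # pos() is the match start for a zero-width match
    }
    return @hits;
}

my $seq  = "ACGTACGTAACGT";
my @hits = find_hits($seq, "ACGT");
print "@hits\n";   # 0 4 9
```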
So far, I haven't read up at all on modules, so I suppose that's the next step in my Perl learning curve.
Thanks for the pointer. I'll definitely check it out.
Matt
Re: How do I backtrack while reading a file line-by-line?
by holli (Abbot) on Oct 13, 2006 at 19:55 UTC
Re: How do I backtrack while reading a file line-by-line?
by blazar (Canon) on Oct 14, 2006 at 10:17 UTC
Nothing to do with your question, but...
while ($newline = <FILEHANDLE>){
use strict;
use warnings;
and then
while (my $newline = <FILEHANDLE>){
if ($newline = /^>/) {
This is most probably not what you want, since you're assigning to $newline. You want
if ($newline =~ /^>/) {
instead.
$stuff = $newline;
&play_with($stuff);
Unless play_with() modifies its argument, you may want to pass $newline directly to it, without going through an intermediate variable. But more importantly, the &-form of sub call is now obsolete and likely not to do what one may think, so unless you do know, don't!
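Putting those corrections together, the read loop might look like this ('data.txt' and this play_with() body are stand-ins for the OP's actual file and subroutine):

```perl
use strict;
use warnings;

# Demo file so the sketch is self-contained.
my $file = "data.txt";
open my $out, '>', $file or die "open: $!";
print $out ">header\nAAAA\n";
close $out;

my @seen;
sub play_with { my ($line) = @_; push @seen, $line }

open my $fh, '<', $file or die "open: $!";
while (my $newline = <$fh>) {
    if ($newline =~ /^>/) {    # =~ (bind to a match), not = (assignment)
        play_with($newline);   # plain call; no leading & needed
    }
}
close $fh;
unlink $file;
print scalar(@seen), " header line(s) found\n";
```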
Thanks.
I didn't know about the & form being deprecated, so I will get rid of that.
And &play_with($stuff) does indeed modify $stuff, so I guess I am doing the right thing there, although I can't take credit for doing it on purpose. ;)
Thanks for the info though.
Matt
if($newline = /^>/) {
was NOT deliberate, just a typo. In my actual code, it reads:
if($newline =~ /^>/) {