Re: How to avoid using array in concatenating string of multiple lines
by Zaxo (Archbishop) on Dec 09, 2004 at 10:58 UTC
You can accumulate the current string in a scalar and then print and reset it as soon as you know you'll want to. The code will look a lot like what you have.
#!/usr/bin/perl -w
use strict;
my ($string, $count);
while(<DATA>){
    s/\s//g;
    next if !$count and /^>/;
    if (/^>/ and $count) {
        print $count, ' : ', $string, $/;
        $string = '';
        $count++;
        next;
    }
    $count || $count++;
    $string .= $_;
}
print $count, ' : ', $string, $/;
__DATA__
> Seq 1 (two lines)
AAAAAAAAAAAAA
CCAAAAAAAAAAA
> Seq 2 (two lines)
AAAAAAAAAAAAA
AAAAAAAAAAAAA
> Seq 3 (one line)
TTTTTTTTTTTTAACTGAAGATTCGC
I've removed a fencepost error by checking that the > line is not the first.
Having a leading marker instead of a trailing one makes this a little awkward.
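The same accumulate-and-flush idea can be sketched as a small function. This is only an illustration: format_sequences is a hypothetical name, and building the output in a string (rather than printing directly) is an assumption made so the result is easy to check. The "if $count" guard is what handles the first header line.

```perl
use strict;
use warnings;

# Hypothetical helper: reads FASTA-like lines from $fh and returns
# one "N : SEQUENCE" line per record, using scalars only.
sub format_sequences {
    my ($fh) = @_;
    my ($out, $string, $count) = ('', '', 0);
    while (my $line = <$fh>) {
        $line =~ s/\s+//g;                 # drop newlines and any other whitespace
        if ($line =~ /^>/) {
            # Flush the previous record; skipped for the very first header.
            $out .= "$count : $string\n" if $count;
            $string = '';
            $count++;
            next;
        }
        $string .= $line;                  # accumulate the current sequence
    }
    # The last record has no following header, so flush it here.
    $out .= "$count : $string\n" if $count;
    return $out;
}

# Usage with an in-memory filehandle (sample data is made up for the example).
my $sample = "> Seq 1 (two lines)\nAAAAAAAAAAAAA\nCCAAAAAAAAAAA\n"
           . "> Seq 2 (one line)\nTTTTTTTTTTTTAACTGAAGATTCGC\n";
open my $in_fh, '<', \$sample or die "Can't open in-memory handle: $!";
print format_sequences($in_fh);
```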
| [reply] [d/l] |
Re: How to avoid using array in concatenating string of multiple lines
by reneeb (Chaplain) on Dec 09, 2004 at 12:02 UTC
use Bio::FASTASequence::File;
my $file = '/path/to/seq.fa';
my $obj = Bio::FASTASequence::File->new($file);
my $result_ref = $obj->get_result();
my $counter = 1;
foreach(keys(%{$result_ref})){
    print $counter,": ",$result_ref->{$_}->getSequence(),"\n";
    $counter++;
}
Your sequences:
>Seq1 (two lines)
AAAAAAAAAAAAA
CCAAAAAAAAAAA
>Seq2 (two lines)
AAAAAAAAAAAAA
AAAAAAAAAAAAA
>Seq3 (one line)
TTTTTTTTTTTTAACTGAAGATTCGC
| [reply] [d/l] [select] |
Re: How to avoid using array in concatenating string of multiple lines
by snowcrash (Friar) on Dec 09, 2004 at 11:01 UTC
#!/usr/bin/perl -w
use strict;
my $i = 0;
while(<DATA>){
    s/\s//g;
    if (/^>/) {
        print "\n" if $i;
        print ++$i, " : ";
        next;
    }
    chomp;
    print;
}
print "\n";
| [reply] [d/l] |
Re: How to avoid using array in concatenating string of multiple lines
by stajich (Chaplain) on Dec 09, 2004 at 13:07 UTC
I'd suggest using already written modules or looking at code from people who have already solved this problem. Bio::SeqIO for one. See the code in the next_seq method. The beauty is if you want to change the file format to genbank you replace 'fasta' with 'genbank'.
use Bio::SeqIO;
my $in = Bio::SeqIO->new(-format => 'fasta', -fh => \*DATA);
my $i = 1;
while( my $s = $in->next_seq ) {
    print $i++, " : ", $s->seq(), "\n";
}
__DATA__
> Seq 1 (two lines)
AAAAAAAAAAAAA
CCAAAAAAAAAAA
> Seq 2 (two lines)
AAAAAAAAAAAAA
AAAAAAAAAAAAA
> Seq 3 (one line)
TTTTTTTTTTTTAACTGAAGATTCGC
| [reply] [d/l] |
Thanks so much for your reply stajich.
Indeed the module is very very useful.
However, I ran into a problem while extending its usage. Perhaps you can give some advice.
Suppose I am taking a file as input (the content of the file is similar to my OP),
and wish to process that file in *multiple trials*. So I have the following code.
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;
my $file = $ARGV[0];
open INFILE, "<$file" or die "$0: Can't open file $file: $!";
for (my $trial = 1; $trial <=2; $trial++)
{
    seek(INFILE,0,0); #This is line 10
    print "Trial $trial\n";
    my $i =1;
    my $in = Bio::SeqIO->new(-format => 'fasta', -fh => \*INFILE);
    while( my $seq = $in->next_seq ) {
        print $i++, " : ", $seq->seq(), "\n";
    }
}
The code above (in particular the Bio::SeqIO part) triggers these warnings
on reaching the second trial.
Trial 1
1 : TGCAATCACTAGCAAGCTCTCGCTGCCGTCACTAGCCTGTGG
2 : GGGGCTAGGGTTAGTTCTGGANNNNNNNNNNNNNNNNNNNNN
seek() on closed filehandle INFILE at test.pl line 10.
Trial 2
readline() on closed filehandle INFILE at /usr/lib/perl5/site_perl/5.8.0/Bio/Root/IO.pm line 440.
I know that I can avoid these warnings by replacing the seek call with "open INFILE..".
But I am curious how I can solve this problem if I intend to keep the seek call,
since I find the solution neater that way. Hope to hear from you again.
| [reply] [d/l] [select] |
Then open the file within the for-loop:
#!/usr/bin/perl -w
use strict;
use Bio::SeqIO;
my $file = $ARGV[0];
for (my $trial = 1; $trial <=2; $trial++)
{
    open INFILE, "<$file" or die "$0: Can't open file $file: $!";
    seek(INFILE,0,0); #This is line 10
    print "Trial $trial\n";
    my $i =1;
    my $in = Bio::SeqIO->new(-format => 'fasta', -fh => \*INFILE);
    while( my $seq = $in->next_seq ) {
        print $i++, " : ", $seq->seq(), "\n";
    }
}
| [reply] [d/l] |
while( my $seq = $in->next_seq ) {
    print $i++, " : ", $seq->seq(), "\n";
}
will read until the end of the filehandle. (That is why the loop ended in the first place.) So subsequently calling next_seq on the $in object will give you the error message you are seeing.
You can
- Open the SeqIO object outside of the trial loop, and put seek(INFILE,0,0) inside the loop. Note that if the SeqIO object gets destroyed (or goes out of scope), the filehandle is closed unless you pass the -noclose => 1 option when initializing the SeqIO object. But if you are going to do this, just move the initialization of the SeqIO object outside the loop.
- Re-open the file each time in the loop (move the Bio::SeqIO initialization into the trial loop).
- Or read all the sequences in at once and keep them in memory. (put them into an array).
my @seqs;
while(my $s = $in->next_seq ) { push @seqs, $s; }
# now do your loop of trials
Option 3 might not work, depending on how many sequences there are and how big they are.
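To illustrate the first option without depending on BioPerl, here is a rough core-Perl sketch: one handle is opened once and rewound with seek before each trial. The run_trials name and the hand-rolled parsing loop are assumptions for the example, not Bio::SeqIO's behaviour. It only works because nothing closes the handle between trials; with Bio::SeqIO you would additionally need the -noclose => 1 option mentioned above, or keep the SeqIO object alive across trials.

```perl
use strict;
use warnings;

# Hypothetical helper: makes $trials passes over one already-open,
# seekable filehandle and returns the combined report as a string.
sub run_trials {
    my ($fh, $trials) = @_;
    my $out = '';
    for my $trial (1 .. $trials) {
        # Rewind to the start; this fails if the handle has been closed.
        seek($fh, 0, 0) or die "Can't rewind filehandle: $!";
        $out .= "Trial $trial\n";
        my ($i, $seq) = (0, '');
        while (my $line = <$fh>) {
            chomp $line;
            if ($line =~ /^>/) {
                $out .= "$i : $seq\n" if $i;   # flush the previous record
                $i++;
                $seq = '';
            }
            else {
                $seq .= $line;
            }
        }
        $out .= "$i : $seq\n" if $i;           # flush the last record
    }
    return $out;
}

# Usage with an in-memory handle (sample data is made up for the example).
my $sample = ">one\nACGT\nACGT\n>two\nTTTT\n";
open my $in_fh, '<', \$sample or die "Can't open in-memory handle: $!";
print run_trials($in_fh, 2);
close $in_fh;
```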
Re: How to avoid using array in concatenating string of multiple lines
by Anonymous Monk on Dec 09, 2004 at 10:43 UTC
$/ = "\n>";
That is also clumsy, for several reasons:
- You have to delete the trailing ">", at the end of every string (but the last)
- You have to delete the leading ">" at the start of the first record (the others lose theirs to the separator)
- You still have to delete every newline, and chomp won't work, because of the changed setting for $/.
Oh well. Such is life.
s/^>//;
s/>$//;
tr/\n//d;
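Put together, the record-separator approach might look roughly like the sketch below. The join_records name is hypothetical, and dropping the header line from each record is an extra step assumed here because the desired output contains only the sequence.

```perl
use strict;
use warnings;

# Hypothetical helper: splits FASTA-like text into records with $/ = "\n>"
# and returns one "N : SEQUENCE" line per record.
sub join_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Can't open in-memory handle: $!";
    local $/ = "\n>";              # each read now returns a whole record
    my ($out, $count) = ('', 0);
    while (my $rec = <$fh>) {
        $rec =~ s/^>//;            # leading ">" (present on the first record only)
        $rec =~ s/>$//;            # trailing ">" left over from the separator
        $rec =~ s/^[^\n]*\n//;     # assumption: the header line is not wanted
        $rec =~ tr/\n//d;          # join the remaining lines
        $out .= ++$count . " : $rec\n";
    }
    close $fh;
    return $out;
}

# Usage (sample data is made up for the example).
print join_records("> Seq 1 (two lines)\nAAAAAAAAAAAAA\nCCAAAAAAAAAAA\n"
                 . "> Seq 2 (one line)\nTTTTTTTTTTTTAACTGAAGATTCGC\n");
```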
| [reply] [d/l] [select] |
Re: How to avoid using array in concatenating string of multiple lines
by pingo (Hermit) on Dec 09, 2004 at 10:44 UTC
This is most likely not the best or prettiest way of doing it, but I think it works. :-)
my $counter = 1;
my $tmp = '';
foreach(<DATA>) {
    chomp;
    if (!/^>/) {
        $tmp .= $_;
    } else {
        print $counter++, " : $tmp\n" if length $tmp;
        $tmp = "";
    }
}
print $counter, " : $tmp\n" if length $tmp;
Update: Oops, missing chomp.
| [reply] [d/l] |
Re: How to avoid using array in concatenating string of multiple lines
by BrowserUk (Patriarch) on Dec 09, 2004 at 11:19 UTC
perl -ple"BEGIN{$/=qq'\n>'}s[>? Seq (\d+).*$][$1 : ]m; tr[\n.][]d" in >out
| [reply] [d/l] |
Re: How to avoid using array in concatenating string of multiple lines
by sasikumar (Monk) on Dec 09, 2004 at 11:49 UTC
Hi,
I feel this is better.
use strict;
open(DATA,"< c:\\temp.txt") || die "Sorry";
print grep (!/^>/,<DATA>);
As usual, there are a lot of ways to do the same thing in Perl.
Thanks
Sasi Kumar
Oops, sorry, I misunderstood.
Here it goes:
use strict;
open(DATA,"< c:\\temp.txt") || die "Sorry";
while(<DATA>){
    if ($_=~s/(^>\s+)//){
        print "\n";
    }
    else
    {
        chomp;
        print;
    }
}
| [reply] [d/l] [select] |
Re: How to avoid using array in concatenating string of multiple lines
by si_lence (Deacon) on Dec 09, 2004 at 11:09 UTC
Yet another version
si_lence
use strict;
my $count=1;
while(<DATA>){
    s/\s//g;
    chomp;
    /^>/ ? print "\n " . $count++ . ": " : print;
}
__DATA__
> Seq 1 (two lines)
AAAAAAAAAAAAA
CCAAAAAAAAAAA
> Seq 2 (two lines)
AAAAAAAAAAAAA
AAAAAAAAAAAAA
> Seq 3 (one line)
TTTTTTTTTTTTAACTGAAGATTCGC
| [reply] [d/l] |
Re: How to avoid using array in concatenating string of multiple lines
by Fletch (Bishop) on Dec 09, 2004 at 11:12 UTC
$ grep '^[ACGT]' foo | cat -n
1 AAAAAAAAAAAAA
2 CCAAAAAAAAAAA
3 AAAAAAAAAAAAA
4 AAAAAAAAAAAAA
5 TTTTTTTTTTTTAACTGAAGATTCGC
TMTOWTDI, some of them not involving perl at all . . .
Update: D'oh! Never mind me. That'll teach me to try and make a cogent point on 5 hours sleep . . .
| [reply] [d/l] |