Re: concatenating multiple lines without using . operator
by Corion (Patriarch) on Jun 13, 2012 at 13:24 UTC
|
No. That's why Perl has the concatenation operator.
Maybe you can explain to us the situation you have, then we can come up with interesting or applicable solutions to achieve your intended goal.
For example a combination of join and s/\r\n//g could work.
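To make that suggestion concrete, here is a minimal sketch of the join-plus-substitution idea. The helper name join_lines and the sample lines are made up for illustration; the only assumption is that each line may end in LF or CRLF.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: glue a list of lines into one string without
# ever using the "." operator.
sub join_lines {
    my @lines = @_;            # work on copies, leave the caller's data alone
    s/\r?\n\z// for @lines;    # drop a trailing LF or CRLF from each copy
    return join '', @lines;
}

my @raw = ( "AACC\r\n", "GGTT\n", "NNAA" );
print join_lines(@raw), "\n";    # prints AACCGGTTNNAA
```

Stripping the line endings before the join keeps the substitution simple, since it never has to look across line boundaries.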
| [reply] [d/l] |
|
|
while (<IN>) {
    chomp;
    if (/^>chr(\S*)$/) {
        $chr = $1;
        #print STDERR "[$chr]\n";
    }
    else {
        chomp $_;
        $_ =~ s/\s\r\n\t//g;
        $seq{$chr} = join ("",$_);
        #$seq{$chr} = `perl -pe 'chomp; END {print "\n" }' $file`;
        #$seq{$chr} .= $_;
        print "$seq{$chr}\n";
    }
    #print OUT "$seq{$chr}\n";
}
Thanks
| [reply] [d/l] |
|
|
$_ =~ s/\s\r\n\t//g;
See perlre and YAPE::Regex::Explain.
Q:\>perl -MYAPE::Regex::Explain -we "print for YAPE::Regex::Explain->new(shift)->explain;" "\s\r\n\t"
The regular expression:
(?-imsx:\s\r\n\t)
matches as follows:
NODE EXPLANATION
----------------------------------------------------------------------
(?-imsx: group, but do not capture (case-sensitive)
(with ^ and $ matching normally) (with . not
matching \n) (matching whitespace and #
normally):
----------------------------------------------------------------------
\s whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
\r '\r' (carriage return)
----------------------------------------------------------------------
\n '\n' (newline)
----------------------------------------------------------------------
\t '\t' (tab)
----------------------------------------------------------------------
) end of grouping
----------------------------------------------------------------------
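The explanation above can be checked with a small sketch. The sample string is invented for illustration; the point is only that the pattern as posted asks for four characters in a row, whereas \s on its own removes each whitespace character individually.

```perl
use strict;
use warnings;

my $line = "AC GT\tAC\r\n";

# As written in the post: matches only the four-character run
# whitespace, CR, LF, tab. That run never occurs here, so nothing changes.
( my $as_sequence = $line ) =~ s/\s\r\n\t//g;

# \s alone already covers space, tab, CR and LF, one character at a time.
( my $as_class = $line ) =~ s/\s//g;

print $as_sequence eq $line ? "sequence: unchanged\n" : "sequence: changed\n";
print "class: $as_class\n";    # prints class: ACGTAC
```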
| [reply] [d/l] [select] |
It seems that what you are looking for is the "s" modifier for regex. Try:
$_ =~ s/[\s\r\n\t]//sg;
Also note: since \s, \r, \n and \t are meant as alternative characters and NOT as a sequence, I have placed them in a [bracketed] character class.
From the docs:
s Treat string as single line. That is, change "." to match any character whatsoever, even a newline, which
normally it would not match.
Update: Ignore the suggestion to use the "s" modifier. It is not necessary. See jwkrahn note below.
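A short sketch of why the "s" modifier does not matter here, using a made-up two-line string: the modifier only changes what "." matches, and the bracketed class contains no ".".

```perl
use strict;
use warnings;

my $text = "AC\nGT\n";

# Without /s, "." stops at a newline; with /s it does not.
my ($no_s)   = $text =~ /(A.*T)/;     # no match: "." cannot cross the newline
my ($with_s) = $text =~ /(A.*T)/s;    # captures "AC\nGT"

# A character class contains no ".", so /s makes no difference to it.
( my $plain = $text ) =~ s/[\s\r\n\t]//g;
( my $flag  = $text ) =~ s/[\s\r\n\t]//sg;

print defined $no_s ? "matched without /s\n" : "no match without /s\n";
print $plain eq $flag ? "same result: $plain\n" : "different results\n";
```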
I hope life isn't a big joke, because I don't get it.
-SNL
| [reply] [d/l] |
Re: concatenating multiple lines without using . operator
by marto (Cardinal) on Jun 13, 2012 at 13:27 UTC
|
The "good way" is to use the concatenation operator which you want to avoid for some reason (this sounds like one of those interview questions). join could be used:
#!/usr/bin/perl
use strict;
use warnings;
my $foo = "123";
my $bar = "456";
$foo = join "", $foo, $bar;
print "$foo\n";
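The same join idea can be stretched to the multi-line case from the loop posted earlier in the thread, where join("", $_) has only one element to join and so each line overwrites the previous one. The following is a sketch only: the in-memory sample data and the %lines hash are assumptions made for illustration, not part of the original script.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample data standing in for the real input file.
my $fasta = ">chr1\nAACC\nGGTT\n>chrM\nGATC\nACAG\n";
open my $in, '<', \$fasta or die "Cannot open in-memory file: $!";

my ( %lines, $chr );
while ( my $line = <$in> ) {
    $line =~ s/\s+//g;                    # strip newline and stray whitespace
    if ( $line =~ /^>chr(\S*)$/ ) {
        $chr = $1;                        # remember the current header
    }
    elsif ( defined $chr ) {
        push @{ $lines{$chr} }, $line;    # collect, do not overwrite
    }
}
close $in;

# One join per sequence, and no "." anywhere.
my %seq = map { $_ => join( '', @{ $lines{$_} } ) } keys %lines;
print "$_\t$seq{$_}\n" for sort keys %seq;
```

Collecting the pieces in an array and joining once at the end also avoids rebuilding the string on every line.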
| [reply] [d/l] |
Re: concatenating multiple lines without using . operator
by solegaonkar (Beadle) on Jun 13, 2012 at 13:29 UTC
|
If you have to, you can use something like $concatenated = "$line1$line2" ...
But, as Corion said, knowing why you want to do this might be helpful in solving the problem...
| [reply] |
Re: concatenating multiple lines without using . operator
by frozenwithjoy (Priest) on Jun 14, 2012 at 06:23 UTC
|
I think the safer approach to accomplishing this is to use something like this:
#!/usr/bin/env perl
use strict;
use warnings;
use Bio::SeqIO;
use v5.10;    # or later... or change 'say' to 'print' X_x

my $fasta_in = "input.fa";
open my $fasta_out, ">", "output.fa"
    or die "Cannot open output.fa: $!";

my $seqio_in = Bio::SeqIO->new(
    -file   => $fasta_in,
    -format => 'Fasta',
);

my %seq_hash;
while ( my $seq_obj = $seqio_in->next_seq() ) {
    my $seq_id = $seq_obj->display_id();    # this is the sequence ID
    my $seq    = $seq_obj->seq();           # this is the actual sequence
    $seq_hash{$seq_id} = $seq;              # and hashed!

    # to print them to your screen in a "consolidated" FASTA format:
    say ">$seq_id";
    say $seq_hash{$seq_id};

    # to save to a file in a "consolidated" FASTA format:
    say $fasta_out ">$seq_id";
    say $fasta_out $seq_hash{$seq_id};
}
close $fasta_out or die "Cannot close output.fa: $!";
exit;
You can trim some of the stuff inside the while loop depending on what you actually want to do. For example, if you don't need to use the hash later, there is no point making it, etc.
I've tested this and it works. A sample input and corresponding output can be found here: https://gist.github.com/2928252.
| [reply] [d/l] |
|
|
#!/usr/bin/perl
use strict;
use warnings;
use Bio::SeqIO;

my $in  = Bio::SeqIO->new( -file   => "input1.txt",
                           -format => 'fasta' );
my $out = Bio::SeqIO->new( -file   => '>test.dat',
                           -format => 'fasta' );
while ( my $seq = $in->next_seq() ) {
    if ( $seq->id() =~ /^chr(\S*)$/ ) {
        $seq->display_id($1);    # change id
    }
    $out->write_seq($seq);
}
__END__
*** input 1
>chr1
AACCCCCCCCTCCCCCCGCTTCTGGCCACAGCACTTAAACACATCTCTGC
CAAACCCCAAAAACAAAGAACCCTAACACCAGCCTAACCAGATTTCAAAT
TTTATCTTTAGGCGGTATGCACTTTTAACAAAAAANNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
GCCCATCCTACCCAGCACACACACACCGCTGCTAACCCCATACCCCGAAC
CAACCAAACCCCAAAGACACCCCCCACAGTTTATGTAGCTTACCTCNNNN
>chrM
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCAT
TTGGTATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTG
GAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATT
CTATTATTTATCGCACCTACGTTCAATATTACAGGCGAACATACCTACTA
AAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATAACAATTGAAT
GTCTGCACAGCCGCTTTCCACACAGACATCATAACAAAANAATTTCCACC
>GJKKTUG01DYDGC
GGGTATTCCTTCTCCACCTTGCAGCTAACATCAGTGTTTCGTCTACTCAAGCACGCCAAC
ACGCCCTAGAGCGCCCTGTCCAGGGGATGGCAACCAACTCTGACCCTGCAAGTGCAGCAG
ACATGAGGAATACAAACTACAATCTTTTACTTGATGATGCAATGCCGGACAAACTCTAGA
>F0Z7V0F01EDB3V
AAGGCGAGNGGTATCACGCAGTAAGTTACGGTTTTCGGGTAACGCGTCNGNGGNACTAAC
CCACGGNGGGTAACCCGTCNCTACCGGTATAGGACTAAGGTTACCGGAACGTCGTGGGGT
ACCCCCCGGACGGGGACCGTCCCCTCATANAGTCAACNGTNTGAGATGGACTAACTCAAA
CCTAGTTTCAAGTACTATTTAACTTACTTACGTTACCCGTAATTTCGGCGTTTAGAGGCG
Output:
>1
AACCCCCCCCTCCCCCCGCTTCTGGCCACAGCACTTAAACACATCTCTGCCAAACCCCAA
AAACAAAGAACCCTAACACCAGCCTAACCAGATTTCAAATTTTATCTTTAGGCGGTATGC
ACTTTTAACAAAAAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
NNNNNNNNNNNNNNNNNNNNGCCCATCCTACCCAGCACACACACACCGCTGCTAACCCCA
TACCCCGAACCAACCAAACCCCAAAGACACCCCCCACAGTTTATGTAGCTTACCTCNNNN
>M
GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTT
CGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTC
GCAGTATCTGTCTTTGATTCCTGCCTCATTCTATTATTTATCGCACCTACGTTCAATATT
ACAGGCGAACATACCTACTAAAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATA
ACAATTGAATGTCTGCACAGCCGCTTTCCACACAGACATCATAACAAAANAATTTCCACC
>GJKKTUG01DYDGC
GGGTATTCCTTCTCCACCTTGCAGCTAACATCAGTGTTTCGTCTACTCAAGCACGCCAAC
ACGCCCTAGAGCGCCCTGTCCAGGGGATGGCAACCAACTCTGACCCTGCAAGTGCAGCAG
ACATGAGGAATACAAACTACAATCTTTTACTTGATGATGCAATGCCGGACAAACTCTAGA
>F0Z7V0F01EDB3V
AAGGCGAGNGGTATCACGCAGTAAGTTACGGTTTTCGGGTAACGCGTCNGNGGNACTAAC
CCACGGNGGGTAACCCGTCNCTACCGGTATAGGACTAAGGTTACCGGAACGTCGTGGGGT
ACCCCCCGGACGGGGACCGTCCCCTCATANAGTCAACNGTNTGAGATGGACTAACTCAAA
CCTAGTTTCAAGTACTATTTAACTTACTTACGTTACCCGTAATTTCGGCGTTTAGAGGCG
Chris
| [reply] [d/l] [select] |
|
|
My impression is that the OP wanted each sequence on a single line, whereas write_seq auto-formats FASTA output into lines of 60 nucleotides/amino acids. That's why I settled on:
say $fasta_out $seq_hash{$seq_id};
You should be able to set the width with $seq_obj->Bio::SeqIO::fasta::width($new_width). I'm able to set a new width and $seq_obj->Bio::SeqIO::fasta::width() returns this new width; however, I can't get it to actually print using the new width... it just reverts to 60. Any suggestions?
-Mike
edit: btw, the code I posted does keep the sequences in Fasta format.
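For anyone who just needs control over the line width without going through write_seq, here is a plain-Perl sketch. The helper name wrap_seq and the sample sequence are made up for illustration, and this does not explain why the width setter above is ignored; one untested guess is that it has to be set on the output stream object rather than on the sequence object.

```perl
use strict;
use warnings;

# Hypothetical helper: split a sequence into lines of at most $width
# characters. A width of 0 means "keep everything on one line".
# Call it in list context.
sub wrap_seq {
    my ( $seq, $width ) = @_;
    return ($seq) unless $width;
    return $seq =~ /(.{1,$width})/g;
}

my $seq = 'ACGT' x 5;    # 20 characters
print ">example\n";
print "$_\n" for wrap_seq( $seq, 8 );
```

With a width of 8 the 20-character sample comes out as two full lines and one short one; passing 0 gives the single-line form discussed above.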
| [reply] [d/l] [select] |