concatenating multiple lines without using . operator

anonym has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: concatenating multiple lines without using . operator by Corion (Patriarch) on Jun 13, 2012 at 13:24 UTC
No. That's why Perl has the concatenation operator. Maybe you can explain to us the situation you have, then we can come up with interesting or applicable solutions to achieve your intended goal. For example a combination of join and `s/\r\n//g` could work.
Re^2: concatenating multiple lines without using . operator by anonym (Acolyte) on Jun 13, 2012 at 13:31 UTC
Thanks. Yep, I already tried `$_ =~ s/\r\n//g; $seq{$chr} = join ("", $_);` but it doesn't concatenate as expected. Below is my code:

    while (<IN>) {
        chomp;
        if (/^>chr(\S*)$/) {
            $chr = $1;
            #print STDERR "[$chr]\n";
        }
        else {
            chomp $_;
            $_ =~ s/\s\r\n\t//g;
            $seq{$chr} = join ("",$_);
            #$seq{$chr} = `perl -pe 'chomp; END {print "\n" }' $file`;
            #$seq{$chr} .= $_;
            print "$seq{$chr}\n";
        }
        #print OUT "$seq{$chr}\n";
    }

Thanks
Re^3: concatenating multiple lines without using . operator by Corion (Patriarch) on Jun 13, 2012 at 13:36 UTC
Why didn't you show this code when you asked your initial question? Please also explain what this regular expression in your code is supposed to do: `$_ =~ s/\s\r\n\t//g;` See perlre and YAPE::Regex::Explain.

    Q:\>perl -MYAPE::Regex::Explain -we "print for YAPE::Regex::Explain->new(shift)->explain;" "\s\r\n\t"
    The regular expression:

    (?-imsx:\s\r\n\t)

    matches as follows:

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    (?-imsx:                 group, but do not capture (case-sensitive)
                             (with ^ and $ matching normally) (with . not
                             matching \n) (matching whitespace and #
                             normally):
    ----------------------------------------------------------------------
      \s                       whitespace (\n, \r, \t, \f, and " ")
    ----------------------------------------------------------------------
      \r                       '\r' (carriage return)
    ----------------------------------------------------------------------
      \n                       '\n' (newline)
    ----------------------------------------------------------------------
      \t                       '\t' (tab)
    ----------------------------------------------------------------------
    )                        end of grouping
    ----------------------------------------------------------------------
Re^3: concatenating multiple lines without using . operator by AnomalousMonk (Archbishop) on Jun 13, 2012 at 14:37 UTC
`$seq{$chr} = join ("",$_);` The join Perl built-in joins a list of strings. In the statement above, the list consists of the single string contained in the `$_` scalar, so the output of `join` will be exactly the same as the input.
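A minimal sketch of the point above, assuming the goal is one concatenated string per `>chr` header: `join` only does something useful once it is handed a list, so one option is to push every sequence line onto a per-header array and join once at the end. The `read_fasta` helper and the in-memory sample are hypothetical, made up for illustration; they are not from the original code.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: collect the sequence lines under each header,
# then join each list once. No "." operator is needed.
sub read_fasta {
    my ($fh) = @_;
    my (%lines, $chr);
    while (my $line = <$fh>) {
        $line =~ s/\s+//g;            # drop newline, CR, tabs and spaces
        next unless length $line;     # skip blank lines
        if ($line =~ /^>chr(\S*)$/) {
            $chr = $1;                # start of a new record
        }
        elsif (defined $chr) {
            push @{ $lines{$chr} }, $line;
        }
    }
    my %seq = map { $_ => join('', @{ $lines{$_} }) } keys %lines;
    return \%seq;
}

# Illustrative input held in a string instead of a real file.
my $sample = ">chr1\nAACC\nGGTT\n>chrM\nGATC\nACAG\n";
open my $in, '<', \$sample or die "cannot open in-memory file: $!";
my $seq = read_fasta($in);
close $in;

printf "%s: %s\n", $_, $seq->{$_} for sort keys %$seq;
```

With the sample above this prints `1: AACCGGTT` and `M: GATCACAG`. Whether collecting into an array is preferable to `.=` is a matter of taste; it is shown here only because the question asked to avoid the `.` operator.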
Re^3: concatenating multiple lines without using . operator by NetWallah (Canon) on Jun 13, 2012 at 13:45 UTC
~~It seems that what you are looking for is the "s" modifier for regex.~~ Try: `$_ =~ s/[\s\r\n\t]//sg;` Also note: since the \s \r .. etc. are alternative characters, and NOT a sequence, I have placed them in [brackets].

From the docs: "s: Treat string as single line. That is, change "." to match any character whatsoever, even a newline, which normally it would not match."

Update: Ignore the suggestion to use the "s" modifier. It is not necessary. See jwkrahn's note below.

I hope life isn't a big joke, because I don't get it. -SNL
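To make the sequence-versus-alternatives distinction concrete, here is a small sketch. The helper names and the sample line are hypothetical. It relies on the fact, shown in the YAPE::Regex::Explain output earlier in the thread, that `\s` already covers `\r`, `\n` and `\t`, so `\s+` on its own should be enough.

```perl
use strict;
use warnings;

# Pattern from the question: a literal run of whitespace, CR, LF, TAB,
# in exactly that order.
sub strip_as_sequence {
    my ($s) = @_;
    $s =~ s/\s\r\n\t//g;
    return $s;
}

# \s alone already matches space, \t, \r, \n and \f.
sub strip_as_class {
    my ($s) = @_;
    $s =~ s/\s+//g;
    return $s;
}

my $line = "ACGT \tTTGA\r\n";

printf "sequence pattern changed the line: %s\n",
    strip_as_sequence($line) eq $line ? 'no' : 'yes';
printf "class pattern result: [%s]\n", strip_as_class($line);
```

For this sample the first pattern leaves the line untouched, while the second yields `ACGTTTGA`.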
Re^4: concatenating multiple lines without using . operator by jwkrahn (Abbot) on Jun 13, 2012 at 19:36 UTC
Re^5: concatenating multiple lines without using . operator by NetWallah (Canon) on Jun 13, 2012 at 23:25 UTC
Re^4: concatenating multiple lines without using . operator by anonym (Acolyte) on Jun 13, 2012 at 13:54 UTC
Re^5: concatenating multiple lines without using . operator by NetWallah (Canon) on Jun 13, 2012 at 15:49 UTC
Re: concatenating multiple lines without using . operator by marto (Cardinal) on Jun 13, 2012 at 13:27 UTC
The "good way" is to use the concatenation operator which you want to avoid for some reason (this sounds like one of those interview questions). join could be used:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $foo = "123";
    my $bar = "456";

    $foo = join "", $foo, $bar;
    print "$foo\n";
Re: concatenating multiple lines without using . operator by solegaonkar (Beadle) on Jun 13, 2012 at 13:29 UTC
If you have to, you can use something like `$concatenated = "$line1$line2"` ... But, as Corion said, knowing why you want to do this might be helpful in solving the problem...
Re: concatenating multiple lines without using . operator by frozenwithjoy (Priest) on Jun 14, 2012 at 06:23 UTC
I think the safer approach to accomplishing this is to use something like this:

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Bio::SeqIO;
    use v5.10;    #or later... or change 'say' to 'print' X_x

    my $fasta_in = "input.fa";
    open my $fasta_out, ">", "output.fa";

    my $seqio_in = Bio::SeqIO->new(
        -file   => $fasta_in,
        -format => 'Fasta',
    );

    my ( $seq_obj, %seq_hash );
    while ( my $seq_obj = $seqio_in->next_seq() ) {
        my $seq_id = $seq_obj->display_id();    #this is the sequence ID
        my $seq    = $seq_obj->seq();           #this is the actual sequence
        $seq_hash{$seq_id} = $seq;              #and hashed!

        #to print them to your screen in a "consolidated" FASTA format:
        say ">$seq_id";
        say $seq_hash{$seq_id};

        #to save to a file in a "consolidated" FASTA format:
        say $fasta_out ">$seq_id";
        say $fasta_out $seq_hash{$seq_id};
    }
    exit;

You can trim some of the stuff inside the while loop depending on what you actually want to do. For example, if you don't need to use the hash later, there is no point making it, etc. I've tested this and it works. A sample input and corresponding output can be found here: https://gist.github.com/2928252.
Re^2: concatenating multiple lines without using . operator by Cristoforo (Curate) on Jun 14, 2012 at 19:40 UTC
To keep everything in 'fasta' format, you probably want to use Bio::SeqIO's write_seq(). Sample showing output writing:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Bio::SeqIO;

    my $in  = Bio::SeqIO->new( -file => "input1.txt", -format => 'fasta');
    my $out = Bio::SeqIO->new( -file => '>test.dat',  -format => 'fasta');

    while ( my $seq = $in->next_seq() ) {
        if ($seq->id() =~ /^chr(\S)$/) {
            $seq->display_id($1);    # change id
        }
        $out->write_seq($seq);
    }

    __END__
    ** input 1
    >chr1
    AACCCCCCCCTCCCCCCGCTTCTGGCCACAGCACTTAAACACATCTCTGC
    CAAACCCCAAAAACAAAGAACCCTAACACCAGCCTAACCAGATTTCAAAT
    TTTATCTTTAGGCGGTATGCACTTTTAACAAAAAANNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    GCCCATCCTACCCAGCACACACACACCGCTGCTAACCCCATACCCCGAAC
    CAACCAAACCCCAAAGACACCCCCCACAGTTTATGTAGCTTACCTCNNNN
    >chrM
    GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCAT
    TTGGTATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTG
    GAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATT
    CTATTATTTATCGCACCTACGTTCAATATTACAGGCGAACATACCTACTA
    AAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATAACAATTGAAT
    GTCTGCACAGCCGCTTTCCACACAGACATCATAACAAAANAATTTCCACC
    >GJKKTUG01DYDGC
    GGGTATTCCTTCTCCACCTTGCAGCTAACATCAGTGTTTCGTCTACTCAAGCACGCCAAC
    ACGCCCTAGAGCGCCCTGTCCAGGGGATGGCAACCAACTCTGACCCTGCAAGTGCAGCAG
    ACATGAGGAATACAAACTACAATCTTTTACTTGATGATGCAATGCCGGACAAACTCTAGA
    >F0Z7V0F01EDB3V
    AAGGCGAGNGGTATCACGCAGTAAGTTACGGTTTTCGGGTAACGCGTCNGNGGNACTAAC
    CCACGGNGGGTAACCCGTCNCTACCGGTATAGGACTAAGGTTACCGGAACGTCGTGGGGT
    ACCCCCCGGACGGGGACCGTCCCCTCATANAGTCAACNGTNTGAGATGGACTAACTCAAA
    CCTAGTTTCAAGTACTATTTAACTTACTTACGTTACCCGTAATTTCGGCGTTTAGAGGCG

Output:

    >1
    AACCCCCCCCTCCCCCCGCTTCTGGCCACAGCACTTAAACACATCTCTGCCAAACCCCAA
    AAACAAAGAACCCTAACACCAGCCTAACCAGATTTCAAATTTTATCTTTAGGCGGTATGC
    ACTTTTAACAAAAAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
    NNNNNNNNNNNNNNNNNNNNGCCCATCCTACCCAGCACACACACACCGCTGCTAACCCCA
    TACCCCGAACCAACCAAACCCCAAAGACACCCCCCACAGTTTATGTAGCTTACCTCNNNN
    >M
    GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTT
    CGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTC
    GCAGTATCTGTCTTTGATTCCTGCCTCATTCTATTATTTATCGCACCTACGTTCAATATT
    ACAGGCGAACATACCTACTAAAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATA
    ACAATTGAATGTCTGCACAGCCGCTTTCCACACAGACATCATAACAAAANAATTTCCACC
    >GJKKTUG01DYDGC
    GGGTATTCCTTCTCCACCTTGCAGCTAACATCAGTGTTTCGTCTACTCAAGCACGCCAAC
    ACGCCCTAGAGCGCCCTGTCCAGGGGATGGCAACCAACTCTGACCCTGCAAGTGCAGCAG
    ACATGAGGAATACAAACTACAATCTTTTACTTGATGATGCAATGCCGGACAAACTCTAGA
    >F0Z7V0F01EDB3V
    AAGGCGAGNGGTATCACGCAGTAAGTTACGGTTTTCGGGTAACGCGTCNGNGGNACTAAC
    CCACGGNGGGTAACCCGTCNCTACCGGTATAGGACTAAGGTTACCGGAACGTCGTGGGGT
    ACCCCCCGGACGGGGACCGTCCCCTCATANAGTCAACNGTNTGAGATGGACTAACTCAAA
    CCTAGTTTCAAGTACTATTTAACTTACTTACGTTACCCGTAATTTCGGCGTTTAGAGGCG

Chris
Re^3: concatenating multiple lines without using . operator by frozenwithjoy (Priest) on Jun 16, 2012 at 03:46 UTC
My impression is that s/he wanted the sequence to be on a single line, whereas `write_seq` auto-formats fasta output to columns of 60 nucleotides/amino acids. That's why I settled on: `say $fasta_out $seq_hash{$seq_id};` You should be able to set the width with `$seq_obj->Bio::SeqIO::fasta::width($new_width)`. I'm able to set a new width and `$seq_obj->Bio::SeqIO::fasta::width()` returns this new width; however, I can't get it to actually print using the new width... it just reverts to 60. Any suggestions?

-Mike

edit: btw, the code I posted does keep the sequences in Fasta format.
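If controlling the line width is the only sticking point, a pure-Perl sketch like the one below may be enough as a stopgap. It is not BioPerl's implementation, and `wrap_seq` is a hypothetical helper. One thing that may be worth checking separately, and which I have not verified here, is whether `width` is meant to be called on the output `Bio::SeqIO` stream object rather than on the sequence object.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: break a sequence into lines
# of at most $width characters. A false or non-positive width leaves
# the sequence on a single line.
sub wrap_seq {
    my ($seq, $width) = @_;
    return "$seq\n" unless $width && $width > 0;
    my @chunks = $seq =~ /.{1,$width}/g;    # list of fixed-size pieces
    return join '', map { "$_\n" } @chunks;
}

my $seq = 'ACGTACGTAC';
print ">wrapped_at_4\n", wrap_seq($seq, 4);
print ">single_line\n",  wrap_seq($seq, 0);
```

With the sample string, the first record comes out as `ACGT`, `ACGT`, `AC` on three lines and the second as `ACGTACGTAC` on one line.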
Re^4: concatenating multiple lines without using . operator by Cristoforo (Curate) on Jun 16, 2012 at 16:08 UTC
Re^5: concatenating multiple lines without using . operator by frozenwithjoy (Priest) on Jun 16, 2012 at 16:51 UTC