anonym has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, is there a good way to concatenate multiple lines into a single line without using the concatenation operator (.) on the $_ variable? Thanks.

Replies are listed 'Best First'.
Re: concatenating multiple lines without using . operator
by Corion (Patriarch) on Jun 13, 2012 at 13:24 UTC

    No. That's why Perl has the concatenation operator.

    Maybe you can explain to us the situation you have, then we can come up with interesting or applicable solutions to achieve your intended goal.

    For example a combination of join and s/\r\n//g could work.

      Thanks. Yep, I already tried $_ =~ s/\r\n//g; $seq{$chr} = join ("", $_); but it doesn't concatenate as expected. Below is my code:

      while (<IN>) {
          chomp;
          if (/^>chr(\S*)$/) {
              $chr = $1;
              #print STDERR "[$chr]\n";
          }
          else {
              chomp $_;
              $_ =~ s/\s\r\n\t//g;
              $seq{$chr} = join ("",$_);
              #$seq{$chr} = `perl -pe 'chomp; END {print "\n" }' $file`;
              #$seq{$chr} .= $_;
              print "$seq{$chr}\n";
          }
          #print OUT "$seq{$chr}\n";
      }

      Thanks

        Why didn't you show this code when you asked your initial question?

        Please also explain what this regular expression in your code is supposed to do:

        $_ =~ s/\s\r\n\t//g;

        See perlre and YAPE::Regex::Explain.

        Q:\>perl -MYAPE::Regex::Explain -we "print for YAPE::Regex::Explain->new(shift)->explain;" "\s\r\n\t"
        The regular expression:

        (?-imsx:\s\r\n\t)

        matches as follows:

        NODE                     EXPLANATION
        ----------------------------------------------------------------------
        (?-imsx:                 group, but do not capture (case-sensitive)
                                 (with ^ and $ matching normally) (with . not
                                 matching \n) (matching whitespace and #
                                 normally):
        ----------------------------------------------------------------------
          \s                       whitespace (\n, \r, \t, \f, and " ")
        ----------------------------------------------------------------------
          \r                       '\r' (carriage return)
        ----------------------------------------------------------------------
          \n                       '\n' (newline)
        ----------------------------------------------------------------------
          \t                       '\t' (tab)
        ----------------------------------------------------------------------
        )                        end of grouping
        ----------------------------------------------------------------------

        $seq{$chr} = join ("",$_);

        The Perl built-in join joins a list of strings. In the statement above, the list consists of the single string contained in the $_ scalar, so the output of join will be exactly the same as the input.
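
        To make join actually concatenate, it needs a list with more than one element. The following is a minimal sketch, assuming FASTA-style input like the code above; the sub name read_sequences and the in-memory sample data are hypothetical and only for illustration. It collects the pieces of each record in an array and joins them once at the end, so neither "." nor ".=" is used:

```perl
use strict;
use warnings;

# Hypothetical helper: read FASTA-style lines from a filehandle and return
# a hash of id => sequence (on a single line), without using "." or ".=".
sub read_sequences {
    my ($fh) = @_;
    my ( %parts, $chr );
    while ( my $line = <$fh> ) {
        $line =~ s/\s+//g;                     # strip newlines, CRs, tabs, spaces
        next unless length $line;
        if ( $line =~ /^>chr(\S*)$/ ) {
            $chr = $1;                         # start of a new record
        }
        elsif ( defined $chr ) {
            push @{ $parts{$chr} }, $line;     # collect the pieces in a list
        }
    }
    my %seq;
    # join receives a real list here, so it does concatenate
    $seq{$_} = join "", @{ $parts{$_} } for keys %parts;
    return %seq;
}

# Usage with an in-memory filehandle (made-up sample data)
my $data = ">chr1\nAACC\nGGTT\n>chrM\nGATC\nACAG\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
my %seq = read_sequences($fh);
print "$_: $seq{$_}\n" for sort keys %seq;
```

        The same idea works with a real file: open it and pass the filehandle to the sub instead of the in-memory one.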

        It seems that what you are looking for is the "s" modifier for regex. Try:
        $_ =~ s/[\s\r\n\t]//sg;
        Also note: since \s, \r, etc. are alternative characters, and NOT a sequence, I have placed them in [brackets].

        From the docs:

        s     Treat string as single line. That is, change "." to match any character whatsoever, even a newline, which normally it would not match.
        Update: Ignore the suggestion to use the "s" modifier. It is not necessary. See jwkrahn's note below.
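
        As a rough illustration of the points above (sequence versus character class, and why /s makes no difference here), here is a small sketch. The sample string is made up for the demonstration:

```perl
use strict;
use warnings;

# Made-up sample: a sequence fragment with mixed whitespace inside and at the end
my $raw = "AACC \r\n\tGGTT\r\n";

# Sequence: matches only whitespace, CR, LF, tab in exactly that order
( my $as_sequence = $raw ) =~ s/\s\r\n\t//g;

# Character class: removes any one of these characters wherever it occurs
( my $as_class = $raw ) =~ s/[\s\r\n\t]//g;

# \s already includes \r, \n and \t, so the shorter pattern does the same
( my $short = $raw ) =~ s/\s//g;

# /s only changes what "." matches; with no "." in the pattern it has no effect
( my $with_s = $raw ) =~ s/[\s\r\n\t]//sg;

printf "sequence pattern left %d characters\n", length $as_sequence;
printf "character class left %d characters\n",  length $as_class;
print  "class, \\s alone and /s variant agree\n"
    if $as_class eq $short and $as_class eq $with_s;
```

        With this sample, the sequence pattern removes only the one spot where the four characters appear in that exact order and leaves the trailing "\r\n", while the other three patterns all give "AACCGGTT".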

                     I hope life isn't a big joke, because I don't get it.
                           -SNL

Re: concatenating multiple lines without using . operator
by marto (Cardinal) on Jun 13, 2012 at 13:27 UTC

    The "good way" is to use the concatenation operator which you want to avoid for some reason (this sounds like one of those interview questions). join could be used:

    #!/usr/bin/perl
    use strict;
    use warnings;

    my $foo = "123";
    my $bar = "456";
    $foo = join "", $foo, $bar;
    print "$foo\n";

Re: concatenating multiple lines without using . operator
by solegaonkar (Beadle) on Jun 13, 2012 at 13:29 UTC
    If you have to, you can use something like $concatenated = "$line1$line2" ... But, as Corion said, knowing why you want to do this might be helpful in solving the problem...
Re: concatenating multiple lines without using . operator
by frozenwithjoy (Priest) on Jun 14, 2012 at 06:23 UTC
    I think the safer approach to accomplishing this is to use something like this:
    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Bio::SeqIO;
    use v5.10;    #or later... or change 'say' to 'print' X_x

    my $fasta_in = "input.fa";
    open my $fasta_out, ">", "output.fa"
        or die "Cannot open output.fa: $!";

    my $seqio_in = Bio::SeqIO->new(
        -file   => $fasta_in,
        -format => 'Fasta',
    );

    my %seq_hash;
    while ( my $seq_obj = $seqio_in->next_seq() ) {
        my $seq_id = $seq_obj->display_id();    #this is the sequence ID
        my $seq    = $seq_obj->seq();           #this is the actual sequence
        $seq_hash{$seq_id} = $seq;              #and hashed!

        #to print them to your screen in a "consolidated" FASTA format:
        say ">$seq_id";
        say $seq_hash{$seq_id};

        #to save to a file in a "consolidated" FASTA format:
        say $fasta_out ">$seq_id";
        say $fasta_out $seq_hash{$seq_id};
    }
    exit;

    You can trim some of the stuff inside the while loop depending on what you actually want to do. For example, if you don't need to use the hash later, there is no point making it, etc.

    I've tested this and it works. A sample input and corresponding output can be found here: https://gist.github.com/2928252.

      To keep everything in 'fasta' format, you probably want to use Bio::SeqIO's write_seq().

      Sample showing output writing:

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Bio::SeqIO;

      my $in  = Bio::SeqIO->new( -file => "input1.txt", -format => 'fasta');
      my $out = Bio::SeqIO->new( -file => '>test.dat',  -format => 'fasta');

      while ( my $seq = $in->next_seq() ) {
          if ($seq->id() =~ /^chr(\S*)$/) {
              $seq->display_id($1); # change id
          }
          $out->write_seq($seq);
      }

      __END__
      *** input 1
      >chr1
      AACCCCCCCCTCCCCCCGCTTCTGGCCACAGCACTTAAACACATCTCTGC
      CAAACCCCAAAAACAAAGAACCCTAACACCAGCCTAACCAGATTTCAAAT
      TTTATCTTTAGGCGGTATGCACTTTTAACAAAAAANNNNNNNNNNNNNNN
      NNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
      GCCCATCCTACCCAGCACACACACACCGCTGCTAACCCCATACCCCGAAC
      CAACCAAACCCCAAAGACACCCCCCACAGTTTATGTAGCTTACCTCNNNN
      >chrM
      GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCAT
      TTGGTATTTTCGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTG
      GAGCCGGAGCACCCTATGTCGCAGTATCTGTCTTTGATTCCTGCCTCATT
      CTATTATTTATCGCACCTACGTTCAATATTACAGGCGAACATACCTACTA
      AAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATAACAATTGAAT
      GTCTGCACAGCCGCTTTCCACACAGACATCATAACAAAANAATTTCCACC
      >GJKKTUG01DYDGC
      GGGTATTCCTTCTCCACCTTGCAGCTAACATCAGTGTTTCGTCTACTCAAGCACGCCAAC
      ACGCCCTAGAGCGCCCTGTCCAGGGGATGGCAACCAACTCTGACCCTGCAAGTGCAGCAG
      ACATGAGGAATACAAACTACAATCTTTTACTTGATGATGCAATGCCGGACAAACTCTAGA
      >F0Z7V0F01EDB3V
      AAGGCGAGNGGTATCACGCAGTAAGTTACGGTTTTCGGGTAACGCGTCNGNGGNACTAAC
      CCACGGNGGGTAACCCGTCNCTACCGGTATAGGACTAAGGTTACCGGAACGTCGTGGGGT
      ACCCCCCGGACGGGGACCGTCCCCTCATANAGTCAACNGTNTGAGATGGACTAACTCAAA
      CCTAGTTTCAAGTACTATTTAACTTACTTACGTTACCCGTAATTTCGGCGTTTAGAGGCG

      Output:

      >1
      AACCCCCCCCTCCCCCCGCTTCTGGCCACAGCACTTAAACACATCTCTGCCAAACCCCAA
      AAACAAAGAACCCTAACACCAGCCTAACCAGATTTCAAATTTTATCTTTAGGCGGTATGC
      ACTTTTAACAAAAAANNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNNN
      NNNNNNNNNNNNNNNNNNNNGCCCATCCTACCCAGCACACACACACCGCTGCTAACCCCA
      TACCCCGAACCAACCAAACCCCAAAGACACCCCCCACAGTTTATGTAGCTTACCTCNNNN
      >M
      GATCACAGGTCTATCACCCTATTAACCACTCACGGGAGCTCTCCATGCATTTGGTATTTT
      CGTCTGGGGGGTGTGCACGCGATAGCATTGCGAGACGCTGGAGCCGGAGCACCCTATGTC
      GCAGTATCTGTCTTTGATTCCTGCCTCATTCTATTATTTATCGCACCTACGTTCAATATT
      ACAGGCGAACATACCTACTAAAGTGTGTTAATTAATTAATGCTTGTAGGACATAATAATA
      ACAATTGAATGTCTGCACAGCCGCTTTCCACACAGACATCATAACAAAANAATTTCCACC
      >GJKKTUG01DYDGC
      GGGTATTCCTTCTCCACCTTGCAGCTAACATCAGTGTTTCGTCTACTCAAGCACGCCAAC
      ACGCCCTAGAGCGCCCTGTCCAGGGGATGGCAACCAACTCTGACCCTGCAAGTGCAGCAG
      ACATGAGGAATACAAACTACAATCTTTTACTTGATGATGCAATGCCGGACAAACTCTAGA
      >F0Z7V0F01EDB3V
      AAGGCGAGNGGTATCACGCAGTAAGTTACGGTTTTCGGGTAACGCGTCNGNGGNACTAAC
      CCACGGNGGGTAACCCGTCNCTACCGGTATAGGACTAAGGTTACCGGAACGTCGTGGGGT
      ACCCCCCGGACGGGGACCGTCCCCTCATANAGTCAACNGTNTGAGATGGACTAACTCAAA
      CCTAGTTTCAAGTACTATTTAACTTACTTACGTTACCCGTAATTTCGGCGTTTAGAGGCG

      Chris

        My impression is that s/he wanted the sequence to be on a single line, whereas write_seq auto-formats FASTA output into lines of 60 nucleotides/amino acids. That's why I settled on:

        say $fasta_out $seq_hash{$seq_id};

        You should be able to set the width with $seq_obj->Bio::SeqIO::fasta::width($new_width). I'm able to set a new width and $seq_obj->Bio::SeqIO::fasta::width() returns this new width; however, I can't get it to actually print using the new width... it just reverts to 60. Any suggestions?

        -Mike

        edit: btw, the code I posted does keep the sequences in Fasta format.