sugar has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, This query also refers to the post "printing the length of string variables". The program works for single-line strings, but not for strings that span multiple lines. There is a chomp for the strings, but I think it doesn't work. Please guide.
use strict;
use warnings;

my ($head);
while(<DATA>){
    my ($x, $str, $len, $s);
    $s = $_;
    if($s=~m/^>/){
        chomp($s);
        ($head)=(split(/ /,$s))[0];
    }
    else{
        chomp($s);
        $len=length($s);
    }
    print "$head length=$len\n$s\n" if($s !~m/^>/);
}
__DATA__
>IDnumber1 length=350
AGCTG
AAGTCGCT
>IDnumber2 length=350
AGAACGT
ACC
>IDnumber3 length=350
AGC
ACTTCGCTAACT

Expected output:
-----------------
>IDnumber1 length=13
AGCTGAAGTCGCT
>IDnumber2 length=10
AGAACGTACC
>IDnumber3 length=15
AGCACTTCGCTAACT

Re: chopping new line while counting length
by ikegami (Patriarch) on Jan 15, 2009 at 10:00 UTC

    [ Link to other thread for reference: printing the length of string variables ]

    The following uses a single-line lookahead, and it happens to only keep one sequence in memory at a time.

    use strict;
    use warnings;

    my $fh = \*DATA;

    my $line = <$fh>;
    while (defined($line)) {
        my $head = $line;

        $line = <$fh>;
        die("Premature EOF\n") if !defined($line);
        chomp( my $seq = $line );

        for (;;) {
            $line = <$fh>;
            last if !defined($line) || $line =~ /^>/;
            chomp( $seq .= $line );
        }

        my $len = length($seq);
        $head =~ s/(length=)\d+/$1$len/;
        print("$head$seq\n");
    }

    __DATA__
    >IDnumber1 length=350
    AGCTG
    AAGTCGCT
    >IDnumber2 length=350
    AGAACGT
    ACC
    >IDnumber3 length=350
    AGC
    ACTTCGCTAACT

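
For anyone who wants to reuse the one-line lookahead idea, here is a minimal sketch (not ikegami's code) that wraps it in a closure. The name make_record_reader and the in-memory sample handle are assumptions made for illustration. Unlike the code above, it does not die on a header with no sequence lines; it returns an empty sequence instead.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns a closure that yields
# one [header, sequence] pair per call, and nothing once input is exhausted.
sub make_record_reader {
    my ($fh) = @_;
    my $line = <$fh>;                  # the one-line lookahead buffer

    return sub {
        return if !defined $line;      # no more records
        chomp( my $head = $line );
        my $seq = '';
        while ( defined( $line = <$fh> ) ) {
            last if $line =~ /^>/;     # next header: keep it in the buffer
            chomp $line;
            $seq .= $line;             # join the sequence lines
        }
        return [ $head, $seq ];
    };
}

# In-memory sample input; identifiers and sequences are taken from the thread.
my $sample = ">IDnumber1 length=350\nAGCTG\nAAGTCGCT\n"
           . ">IDnumber2 length=350\nAGAACGT\nACC\n";
open my $in, '<', \$sample or die "Cannot open in-memory handle: $!";

my $next = make_record_reader($in);
while ( my $rec = $next->() ) {
    my ( $head, $seq ) = @$rec;
    my $len = length $seq;
    $head =~ s/(length=)\d+/$1$len/;   # same substitution as in the reply
    print "$head\n$seq\n";
}
```

Only the current record and the single buffered line are held at any time, so memory use does not grow with the size of the file.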
Re: chopping new line while counting length
by johngg (Canon) on Jan 15, 2009 at 10:21 UTC

    A few points about your code spring to mind.

    • Inside the while loop you have if and else code branches. The first thing you do in each is chomp the string. It would make more sense to do the chomp before the if.

    • You assign $_ to $s. If you are doing this for readability then choose a variable name that is more meaningful, otherwise just operate on $_.

    • You are having difficulty with multiple-line data because you operate on each data line individually rather than accumulating the data lines then processing them all once you reach the next header.

    • What do you do to process the last data item when you reach end of file?

    I think this code will do what you want. I have moved the process of getting the data length and printing the item into a subroutine.

    use strict;
    use warnings;

    my $header = q{};
    my $dataAccumulator;

    while( <DATA> )
    {
        chomp;
        if( m{^>} )
        {
            printDataItem() if $header;
            ( $header ) = m{^.(\S+)};
            $dataAccumulator = q{};
        }
        else
        {
            $dataAccumulator .= $_;
        }
    }

    printDataItem();

    sub printDataItem
    {
        print
            qq{>$header length=},
            length $dataAccumulator,
            qq{\n$dataAccumulator\n};
    }

    __DATA__
    >IDnumber1 length=350
    AGCTG
    AAGTCGCT
    >IDnumber2 length=350
    AGAACGT
    ACC
    >IDnumber3 length=350
    AGC
    ACTTCGCTAACT

    The output.

    >IDnumber1 length=13
    AGCTGAAGTCGCT
    >IDnumber2 length=10
    AGAACGTACC
    >IDnumber3 length=15
    AGCACTTCGCTAACT

    I hope this is of use.

    Cheers,

    JohnGG

    Update: Corrected typo.
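
The accumulate-then-flush approach in this reply can also be written without file-scoped variables shared between the loop and the subroutine. This is only a sketch under that assumption, not JohnGG's code; the name collect_records is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): takes a list of lines and
# returns a list of [id, sequence] pairs.
sub collect_records {
    my @lines = @_;                    # copy, so the caller's data is untouched
    my @records;
    my ( $id, $seq );

    for my $line (@lines) {
        chomp $line;                   # chomp once, before branching
        if ( $line =~ m{^>(\S+)} ) {
            # A new header: flush the record collected so far, if any.
            push @records, [ $id, $seq ] if defined $id;
            ( $id, $seq ) = ( $1, q{} );
        }
        elsif ( defined $id ) {
            $seq .= $line;             # accumulate the data lines
        }
    }

    # End of input: the last record has not been flushed yet.
    push @records, [ $id, $seq ] if defined $id;
    return @records;
}

# Sample input taken from the thread's third record.
my @input = ( ">IDnumber3 length=350\n", "AGC\n", "ACTTCGCTAACT\n" );
for my $rec ( collect_records(@input) ) {
    my ( $id, $seq ) = @$rec;
    print ">$id length=", length($seq), "\n$seq\n";
}
```

The final push after the loop is the answer to the end-of-file question in the list above: without it the last record would never be emitted.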

Re: chopping new line while counting length
by andye (Curate) on Jan 15, 2009 at 11:51 UTC
    An alternative way to do it:

    #!/usr/bin/perl -w
    undef $/;
    use strict;

    $_ = <DATA>;

    while (m/\G(>IDnumber\d+ length=)350\n([^\n]+)\n([^\n]+)\n/g) {
        print $1.length($2.$3)."\n$2$3\n";
    }

    __DATA__
    >IDnumber1 length=350
    AGCTG
    AAGTCGCT
    >IDnumber2 length=350
    AGAACGT
    ACC
    >IDnumber3 length=350
    AGC
    ACTTCGCTAACT

    Hope that's helpful to you.

    Best wishes, andye
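
The regex above matches records with exactly two sequence lines and a length field of 350, which fits the sample data. If records can have any number of sequence lines, a variation along these lines might work. It is a sketch, not andye's code; the name relabel is made up for illustration, and the slurp uses local $/ so the change to the record separator stays scoped to one block.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): takes the whole input as one
# string, joins each record's sequence lines and rewrites its length= field.
sub relabel {
    my ($text) = @_;
    my $out = '';

    # A header line, then every following line that does not start with ">".
    while ( $text =~ /\G(>\S+ length=)\d+\n((?:[^>\n][^\n]*\n?)*)/g ) {
        my ( $prefix, $body ) = ( $1, $2 );
        ( my $seq = $body ) =~ tr/\n//d;      # drop the line breaks
        $out .= $prefix . length($seq) . "\n$seq\n";
    }
    return $out;
}

# In-memory sample input; identifiers and sequences are taken from the thread.
my $sample = ">IDnumber2 length=350\nAGAACGT\nACC\n"
           . ">IDnumber3 length=350\nAGC\nACTTCGCTAACT\n";
open my $in, '<', \$sample or die "Cannot open in-memory handle: $!";

my $text = do { local $/; <$in> };            # slurp; $/ is restored afterwards
print relabel($text);
```

Because of the \G anchor, matching stops at the first record that does not fit the pattern (a blank line between records, for example), so this is no more forgiving of malformed input than the original.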