de2425 has asked for the wisdom of the Perl Monks concerning the following question:

I'm having some difficulty with the print function. I am setting up a spreadsheet of different companies and the products they carry that match another company's products. For instance, if company A has the same product as the company I'm comparing it to, the column gets a Y printed in it. If company A doesn't have this product, an N is printed. I am having no difficulty with the actual comparison. However, I would then like to print a summary count of the respective Ys and Ns. I have set up a counter and the number comes out correct, but I cannot get it to print where I want without numerous duplications of the statement. If I print within the while loop, I end up with several hundred lines from the counter all saying the same thing. If I print at the end of all of my while loops, the line ends up on line #921, which is far from where I need it. Basically, what I'm asking is whether there's a way to place the print statement in the while loop and have it print only one time.

#!/usr/bin/perl -w

open (IN, "C:/Work/Cytokine/ING_cytokines_20080805.txt");
while (<IN>){ #start while loop
    chomp;
    @t=split(/\t/,$_); #splits the line on tabs into @t
    $ING{$t[9]}=1; #records the tenth column as a key in %ING
} # end while loop
close IN;

open (OUT, ">C:/work/Cytokine/Cytokine.txt") or die "cannot open";
open (IN, "C:/work/Cytokine/CytokineArrays.txt") or die "cannot open";
while(<IN>){ #start of while loop
    chomp;
    @cytokine=split(/\t/,$_); #splits the line on tabs into @cytokine

    #begin if statements
    if($cytokine[1]=~/\S+/ and exists $ING{$cytokine[1]}){ #group 1 - compares second column of input file to %ING
        print OUT "$cytokine[0]\t$cytokine[1]\tY\t";
        $SABioY++;}
    elsif ($cytokine[1] =~ /\d+/) {
        print OUT "$cytokine[0]\t$cytokine[1]\tN\t";
        $SABioN++;}
    else {
        print OUT "$cytokine[0]\t$cytokine[1]\t\t";
    }
    if ($SABioY ge 0) {print OUT "SA Biosciences has $SABioY products."};
    if ($SABioN ge 0) {print OUT "SA Biosciences does not have $SABioN products."};

    if($cytokine[3]=~/\S+/ and exists $ING{$cytokine[3]}){ #group 2
        print OUT "$cytokine[2]\t$cytokine[3]\tY\t";}
    elsif ($cytokine[3] =~ /\d+/) {
        print OUT "$cytokine[2]\t$cytokine[3]\tN\t";
    }else {
        print OUT "$cytokine[2]\t$cytokine[3]\t\t";
    }

    if($cytokine[5]=~/\S+/ and exists $ING{$cytokine[5]}){ #group 3
        print OUT "$cytokine[4]\t$cytokine[5]\tY\t";}
    elsif ($cytokine[5] =~ /\d+/) {
        print OUT "$cytokine[4]\t$cytokine[5]\tN\t";
    }else {
        print OUT "$cytokine[4]\t$cytokine[5]\t\t";
    }

    if($cytokine[7]=~/\S+/ and exists $ING{$cytokine[7]}){ #group 4
        print OUT "$cytokine[6]\t$cytokine[7]\tY\t";}
    elsif ($cytokine[7] =~ /\d+/) {
        print OUT "$cytokine[6]\t$cytokine[7]\tN\t";
    }else {
        print OUT "$cytokine[6]\t$cytokine[7]\t\t";
    }

    if($cytokine[9]=~/\S+/ and exists $ING{$cytokine[9]}){ #group 5
        print OUT "$cytokine[8]\t$cytokine[9]\tY\t";}
    elsif ($cytokine[9] =~ /\d+/) {
        print OUT "$cytokine[8]\t$cytokine[9]\tN\t";
    }else {
        print OUT "$cytokine[8]\t$cytokine[9]\t\t";
    }

    if($cytokine[11]=~/\S+/ and exists $ING{$cytokine[11]}){ #group 6
        print OUT "$cytokine[10]\t$cytokine[11]\tY\t";}
    elsif ($cytokine[11] =~ /\d+/) {
        print OUT "$cytokine[10]\t$cytokine[11]\tN\t";
    }else {
        print OUT "$cytokine[10]\t$cytokine[11]\t\t";
    }

    if($cytokine[13]=~/\S+/ and exists $ING{$cytokine[13]}){ #group 7
        print OUT "$cytokine[12]\t$cytokine[13]\tY\t";}
    elsif ($cytokine[13] =~ /\d+/) {
        print OUT "$cytokine[12]\t$cytokine[13]\tN\t";
    }else {
        print OUT "$cytokine[12]\t$cytokine[13]\t\t";
    }

    if($cytokine[15]=~/\S+/ and exists $ING{$cytokine[15]}){ #group 8
        print OUT "$cytokine[14]\t$cytokine[15]\tY\t";}
    elsif ($cytokine[15] =~ /\d+/) {
        print OUT "$cytokine[14]\t$cytokine[15]\tN\t";
    }else {
        print OUT "$cytokine[14]\t$cytokine[15]\t\t";
    }

    if($cytokine[17]=~/\S+/ and exists $ING{$cytokine[17]}){ #group 9
        print OUT "$cytokine[16]\t$cytokine[17]\tY\t";}
    elsif ($cytokine[17] =~ /\d+/) {
        print OUT "$cytokine[16]\t$cytokine[17]\tN\t";
    }else {
        print OUT "$cytokine[16]\t$cytokine[17]\t\t";
    }

    if($cytokine[19]=~/\S+/ and exists $ING{$cytokine[19]}){ #group 10
        print OUT "$cytokine[16]\t$cytokine[19]\tY\t";}
    elsif ($cytokine[19] =~ /\d+/) {
        print OUT "$cytokine[18]\t$cytokine[19]\tN\t";
    }else {
        print OUT "$cytokine[18]\t$cytokine[19]\t\t";
    }

    if($cytokine[21]=~/\S+/ and exists $ING{$cytokine[21]}){ #group 11
        print OUT "$cytokine[20]\t$cytokine[21]\tY\t";}
    elsif ($cytokine[21] =~ /\d+/) {
        print OUT "$cytokine[20]\t$cytokine[21]\tN\t";
    }else {
        print OUT "$cytokine[20]\t$cytokine[21]\t\t";
    }

    if($cytokine[23]=~/\S+/ and exists $ING{$cytokine[23]}){ #group 12 (\n used in the final group for formatting)
        print OUT "$cytokine[22]\t$cytokine[23]\tY\t\n";}
    elsif ($cytokine[23] =~ /\d+/) {
        print OUT "$cytokine[22]\t$cytokine[23]\tN\t\n";
    }else {
        print OUT "$cytokine[22]\t$cytokine[23]\t\t\n";
    }
} # end of while loop
close IN;
close OUT;

Re: Print function
by Jenda (Abbot) on Sep 02, 2008 at 15:58 UTC

    Looks to me like all the sections except the first one look the same, right? Why do you repeat the code then? Use a loop:

    for my $id (1..11) {
        if($cytokine[2*$id+1]=~/\S+/ and exists $ING{$cytokine[2*$id+1]}) {
            print OUT "$cytokine[2*$id]\t$cytokine[2*$id+1]\tY\t";
        } elsif ($cytokine[2*$id+1] =~ /\d+/) {
            print OUT "$cytokine[2*$id]\t$cytokine[2*$id+1]\tN\t";
        } else {
            print OUT "$cytokine[2*$id]\t$cytokine[2*$id+1]\t\t";
        }
    }
    Also please do be consistent with the code formatting.

    This aside ... I do not understand when you want to print the counts. Line #921 doesn't mean anything to me. Do you want to print them after you process all the lines from CytokineArrays.txt? Or earlier? And if so, when?
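
    If the counts are meant to appear once, after every line has been processed, one possible shape is to keep a pair of counters per company inside the loop and print them only after the loop ends. The following is only a sketch under assumptions: %ING and @rows are made-up stand-ins for the two input files, only two groups are shown instead of twelve, the output is collected in a string rather than written to OUT, and the cells are joined with tabs, so the trailing tab of the original output is not reproduced.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the two input files: %ING holds the reference
# company's product IDs, @rows holds already-split lines of the comparison
# sheet (name/ID pairs; two groups here instead of twelve).
my %ING  = map { $_ => 1 } qw(101 102 205);
my @rows = (
    [ 'IL-2', '101', 'IL-2', '301' ],
    [ 'IL-4', '102', 'IL-4', '205' ],
    [ 'IL-6', '',    'IL-6', '999' ],
);

my $groups = 2;               # 12 in the original script
my @yes    = (0) x $groups;   # per-company count of Y
my @no     = (0) x $groups;   # per-company count of N
my $out    = '';

for my $row (@rows) {
    my @cells;
    for my $g (0 .. $groups - 1) {
        my ($name, $id) = @{$row}[ 2 * $g, 2 * $g + 1 ];
        my $flag = '';
        if ($id =~ /\S/ and exists $ING{$id}) {
            $flag = 'Y';
            $yes[$g]++;
        }
        elsif ($id =~ /\d/) {
            $flag = 'N';
            $no[$g]++;
        }
        push @cells, $name, $id, $flag;
    }
    $out .= join("\t", @cells) . "\n";   # one newline per input row
}

# The totals are emitted once, after the row loop has finished.
for my $g (0 .. $groups - 1) {
    $out .= sprintf "Company %d: %d matching, %d not matching\n",
        $g + 1, $yes[$g], $no[$g];
}
print $out;
```

    Because the summary lines sit outside the row loop, each one is printed a single time no matter how many rows the input has.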

      Thank you for responding. I very much appreciate it as I'm a complete novice at this. I am sorry if I'm not more clear. When I was referring to line #921, what I was trying to say is that there are approximately 900 lines of blank space between where my output ends and the totals are printed when I place the total print statement outside the initial while loop. If I place the print statement for the count inside the while statement, it prints the statements approximately 900 times. I cannot figure out why it is doing this. I would prefer to print totals after each different company is printed.

        Are those lines empty or full of tab characters? Maybe the CytokineArrays.txt file ends with some 900 empty lines. Try adding

        next unless /\S/;
        after the chomp; in the second loop.
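
        To see the effect of that line in isolation, here is a small sketch. It assumes a made-up input held in a string (two data lines followed by empty and tab-only lines) and reads it through an in-memory filehandle instead of the real file.

```perl
use strict;
use warnings;

# Hypothetical input: two data lines, then an empty line, a tab-only line
# and another empty line, standing in for the tail of the real file.
my $data = "IL-2\t101\nIL-4\t102\n\n\t\t\n\n";
open my $in, '<', \$data or die "cannot open in-memory file: $!";

my $kept = 0;
my @lines;
while (<$in>) {
    chomp;
    next unless /\S/;    # skip lines that hold nothing but whitespace
    push @lines, $_;
    $kept++;
}
close $in;

print "kept $kept lines\n";
```

        Only the two lines that contain a non-whitespace character reach the rest of the loop body, so nothing is printed for the empty tail of the file.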

        Also, there is no difference between $var =~ /\S+/ and $var =~ /\S/, except that the latter will probably be quicker. Both return true whenever there is at least one non-whitespace character anywhere in $var. The same goes for \d. Maybe you wanted $var =~ /^\S+$/, which means: make sure $var contains only non-whitespace characters and is not empty.

        Update: fixed the typo noticed by jwkrahn. I meant \S+ and wrote \s+.
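
        The behaviour of the three patterns can be checked with a few sample strings. This is just an illustrative sketch; the sample values and the %result hash are made up for the demonstration.

```perl
use strict;
use warnings;

# For each sample string, record whether it matches /\S+/, /\S/ and /^\S+$/.
my %result;
for my $var ('abc', 'a b', '   ', '') {
    $result{$var} = [
        ($var =~ /\S+/   ? 1 : 0),
        ($var =~ /\S/    ? 1 : 0),
        ($var =~ /^\S+$/ ? 1 : 0),
    ];
    printf "%-7s plus=%d single=%d anchored=%d\n", "'$var'", @{ $result{$var} };
}
```

        The first two columns always agree, while the anchored pattern rejects 'a b' because of the embedded space. Note that $ also matches just before a trailing newline, so a value that has not been chomped can still pass the anchored test.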