AWallBuilder has asked for the wisdom of the Perl Monks concerning the following question:

Hi all,

I am trying to write a Perl script that loops through the files in a directory. First I want to find all the files that have 'MG1' in the name, then I want to count the lines in three different files (chosen by filename). I then want to repeat this up to MG72 and output the data as a table.

I think I have a few errors. I'm not sure about the pattern matching for the file name, and I don't think I'm calling wc -l properly.

Any advice? thanks

#!/usr/bin/perl -w
my $Num_Bact_Virus_Chimera=0;
my $Num_Bact_Bact_Chimera=0;
my $Num_Virus_Virus_Chimera=0;
my $outfile='MC38_ChimeraTable.txt';
my $MG_Num=1;
open(OUT, ">$outfile") || die "Can't open outputfile $!\n";
print OUT join("\t",qw (MG_Num Num_Bact_Virus_Chimeras Num_Bact_Bact_Chimera Num_Virus_Virus_Chimera), "\n") ;
while ($MG_Num <= 72) {
@files=</outputMC38/*.MG$MG_Num.*>;
foreach $file (@files) {
if ($file=qr/HybridViralBactContigsList.txt/){
$Num_Bact_Virus_Chimera = qx/wc -l $file/;
$Num_Bact_Virus_Chimera=$Num_Bact_Virus_Chimera-1;
}
if ($file=qr/HybridOnlyBactContigsList.txt/){
$Num_Bact_Bact_Chimera= qx/wc -l $file/;
$Num_Bact_Bact_Chimera=$Num_Bact_Bact_Chimera-1;
}
if ($file=qr/HybridOnlyVirusContigsList.txt/){
$Num_Virus_Virus_Chimera=qx/wc -l $file/;
$Num_Virus_Virus_Chimera=$Num_Virus_Virus_Chimera-1;
}
print OUT join("\t",$MG_Num,$Num_Bact_Virus_Chimera,$Num_Bact_Bact_Chimera,$Num_Virus_Virus_Chimera,"\n");
}
}
$MG_Num=$MG_Num+1;

Re: Help with pattern matching and calling wc -l
by roboticus (Chancellor) on Mar 10, 2010 at 12:58 UTC

    AWallBuilder:

    You should show some sample input, the actual output and expected output to make it simpler for us to see the error. As it is, we need to either:

    • Be lucky and spot the error immediately.
    • Write the code to a file and execute it, making up data as we go along and guessing about the output you want, or
    • Give up and go to the next question.

    Also, if you indent your code to show the structure, it makes things a bit easier for us to see how it works. It should also help you when you're creating/modifying/debugging it. It even makes some errors pretty obvious, like so:

    #!/usr/bin/perl -w
    my $Num_Bact_Virus_Chimera=0;
    my $Num_Bact_Bact_Chimera=0;
    my $Num_Virus_Virus_Chimera=0;
    my $outfile='MC38_ChimeraTable.txt';
    my $MG_Num=1;
    open(OUT, ">$outfile") || die "Can't open outputfile $!\n";
    print OUT join("\t",qw (
            MG_Num
            Num_Bact_Virus_Chimeras
            Num_Bact_Bact_Chimera
            Num_Virus_Virus_Chimera
        ), "\n") ;
    while ($MG_Num <= 72) {
        @files=</outputMC38/*.MG$MG_Num.*>;
        foreach $file (@files) {
            if ($file=qr/HybridViralBactContigsList.txt/){
                $Num_Bact_Virus_Chimera = qx/wc -l $file/;
                $Num_Bact_Virus_Chimera=$Num_Bact_Virus_Chimera-1;
            }
            if ($file=qr/HybridOnlyBactContigsList.txt/){
                $Num_Bact_Bact_Chimera= qx/wc -l $file/;
                $Num_Bact_Bact_Chimera=$Num_Bact_Bact_Chimera-1;
            }
            if ($file=qr/HybridOnlyVirusContigsList.txt/){
                $Num_Virus_Virus_Chimera=qx/wc -l $file/;
                $Num_Virus_Virus_Chimera=$Num_Virus_Virus_Chimera-1;
            }
            print OUT join("\t",
                $MG_Num,$Num_Bact_Virus_Chimera,
                $Num_Bact_Bact_Chimera,
                $Num_Virus_Virus_Chimera,
                "\n");
        }
    }
    $MG_Num=$MG_Num+1;

    Now that the code has some visual structure, one error immediately sticks out: Your while loop is checking when $MG_Num reaches 72, but you don't modify $MG_Num inside your loop. You only change it after the end of your while loop. I'm sure that's only one of your errors, but as you don't have sample data, expected and actual output and the error message, I'll just stop here.... ;^)
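
The loop-counter point can be sketched in isolation. This is only an illustration, not the original script: the sub names `mg_numbers` and `mg_numbers_for` are made up, and the loop bodies just record which values are visited.

```perl
use strict;
use warnings;

# Hypothetical helper: return the MG numbers a while loop visits.
sub mg_numbers {
    my ($last) = @_;
    my @seen;
    my $mg_num = 1;
    while ($mg_num <= $last) {
        push @seen, $mg_num;
        $mg_num++;    # the increment has to happen inside the loop body
    }
    return @seen;
}

# The same iteration as a for loop, where the counter cannot be forgotten.
sub mg_numbers_for {
    my ($last) = @_;
    my @seen;
    for my $mg_num (1 .. $last) {
        push @seen, $mg_num;
    }
    return @seen;
}

print join(",", mg_numbers(5)), "\n";
```

If the increment sits after the closing brace of the while loop, the condition never changes and the loop runs forever on the first value.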

    ...roboticus

      Thanks, everyone. Okay, as cranky as roboticus sounded, the indenting DID help, and so did the wc -l < $file. The script is working fine now.
      #!/usr/bin/perl -w
      my $Num_Bact_Virus_Chimera=0;
      my $Num_Bact_Bact_Chimera=0;
      my $Num_Virus_Virus_Chimera=0;
      my $outfile='MC38_ChimeraTable.txt';
      my $MG_Num=1;
      open(OUT, ">$outfile") || die "Can't open outputfile $!\n";
      print OUT join("\t",qw (MG_Num Num_Bact_Virus_Chimeras Num_Bact_Bact_Chimera Num_Virus_Virus_Chimera), "\n") ;
      while ($MG_Num <= 72) {
          @files=</g/bork6/mende/MGSimulation/hybridcontigs/outputMC38/*.MG$MG_Num.*>;
          foreach $file (@files) {
              if ($file=~/HybridViralBactContigsList/){
                  $Num_Bact_Virus_Chimera=qx/wc -l < $file/;
                  $Num_Bact_Virus_Chimera=$Num_Bact_Virus_Chimera-1;
              }
              if ($file=~/HybridOnlyBactContigsList/){
                  $Num_Bact_Bact_Chimera=qx/wc -l < $file/;
                  $Num_Bact_Bact_Chimera=$Num_Bact_Bact_Chimera-1;
              }
              if ($file=~/HybridOnlyVirusContigsList/){
                  $Num_Virus_Virus_Chimera=qx/wc -l < $file/;
                  $Num_Virus_Virus_Chimera=$Num_Virus_Virus_Chimera-1;
              }
          }
          print OUT join("\t",$MG_Num,$Num_Bact_Virus_Chimera,$Num_Bact_Bact_Chimera,$Num_Virus_Virus_Chimera,"\n");
          $MG_Num=$MG_Num+1;
      }

      Output file

      MG_Num  Num_Bact_Virus_Chimeras  Num_Bact_Bact_Chimera  Num_Virus_Virus_Chimera
      1       0                        3502                   0
      2       0                        3356                   4
      3       0                        3363                   17
      4       1                        3374                   41
      5       2                        3499                   93
      6       0                        3498                   89
      7       0                        4005                   0
      8       1                        4415                   4
      9       0                        3986                   17
      10      4                        4382                   41
      11      3                        4381                   80
      12      1                        4415                   80
      13      0                        1                      0
      14      0                        0                      4
      15      0                        0                      16
      16      0                        0                      44
      17      0                        1                      86
      18      0                        1                      87
      19      0                        3133                   0
      20      0                        3301                   3
      21      0                        3120                   9

        Ah, well, I didn't intend to sound cranky. But I'm glad that it was some help to you.

        ...roboticus

Re: Help with pattern matching and calling wc -l
by Corion (Patriarch) on Mar 10, 2010 at 12:25 UTC

    Why do you think that you have a few errors?

    What is the following line supposed to do?

    if ($file=qr/HybridViralBactContigsList.txt/){

    (hint: You might want to read perlop for the difference between = and =~, and perlre for the meaning of . and anchors, and File::Basename for the general approach)
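
To make the hint concrete, here is a minimal sketch of the difference. The sub name `chimera_kind`, the returned labels, and the assumption that the file names end in `.txt` are illustrative only; `basename` is the real function exported by File::Basename.

```perl
use strict;
use warnings;
use File::Basename qw(basename);

# Hypothetical helper: classify a path by its file name only.
# Assumes the names end in ".txt", which the thread does not confirm.
sub chimera_kind {
    my ($path) = @_;
    my $name = basename($path);    # strip the directory part

    # =~ applies the pattern; \. matches a literal dot; \z anchors at the end.
    return 'bact_virus'  if $name =~ /HybridViralBactContigsList\.txt\z/;
    return 'bact_bact'   if $name =~ /HybridOnlyBactContigsList\.txt\z/;
    return 'virus_virus' if $name =~ /HybridOnlyVirusContigsList\.txt\z/;
    return;    # no match
}

# By contrast, "=" assigns: the test is always true and $file is overwritten.
my $file = 'foo.txt';
if ($file = qr/HybridViralBactContigsList.txt/) {
    print "assignment was true, \$file now holds a compiled regex\n";
}
```

An unescaped `.` matches any character, so without the backslash the pattern would also accept a name such as `HybridViralBactContigsListXtxt`.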

      Thanks, I guess I should use =~. I also removed the . from the pattern. What I want that line of the script to do: when I find a file with the string 'HybridViralBact' in its name, I want to count the number of lines in that file. I think my main error now is with the wc -l syntax. The error is:

      Argument "3503 /outputMC38..." isn't numeric in subtraction (-) at Perl_Chimera_File_Summ.pl line 30.

      #!/usr/bin/perl -w
      my $Num_Bact_Virus_Chimera=0;
      my $Num_Bact_Bact_Chimera=0;
      my $Num_Virus_Virus_Chimera=0;
      my $outfile='MC38_ChimeraTable.txt';
      my $MG_Num=1;
      open(OUT, ">$outfile") || die "Can't open outputfile $!\n";
      print OUT join("\t",qw (MG_Num Num_Bact_Virus_Chimeras Num_Bact_Bact_Chimera Num_Virus_Virus_Chimera), "\n") ;
      while ($MG_Num <= 72) {
      @files=</g/bork6/mende/MGSimulation/hybridcontigs/outputMC38/*.MG$MG_Num.*>;
      foreach $file (@files) {
      if ($file=~/HybridViralBactContigsList/){
      $Num_Bact_Virus_Chimera=qx/wc -l $file/;
      $Num_Bact_Virus_Chimera=$Num_Bact_Virus_Chimera-1;
      }
      if ($file=~/HybridOnlyBactContigsList/){
      $Num_Bact_Bact_Chimera=qx/wc -l $file/;
      $Num_Bact_Bact_Chimera=$Num_Bact_Bact_Chimera-1;
      }
      if ($file=~/HybridOnlyVirusContigsList/){
      $Num_Virus_Virus_Chimera=qx/wc -l $file/;
      $Num_Virus_Virus_Chimera=$Num_Virus_Virus_Chimera-1;
      }
      print OUT join("\t",$MG_Num,$Num_Bact_Virus_Chimera,$Num_Bact_Bact_Chimera,$Num_Virus_Virus_Chimera,"\n");
      }
      }
      $MG_Num=$MG_Num+1;

        $Num_Bact_Virus_Chimera=$Num_Bact_Virus_Chimera-1;

        Maybe you should check that $Num_Bact_Virus_Chimera contains what you think it does. See print:

        print "Got '$Num_Bact_Virus_Chimera' lines.\n";

Re: Help with pattern matching and calling wc -l
by cdarke (Prior) on Mar 10, 2010 at 12:51 UTC
    wc -l prints the number of lines in the file, followed by the file name. A sneaky circumvention is:
    qx/wc -l < $file/;
    which makes wc read the file from stdin, so it has no file name to print. The downside is that the redirection needs a shell process, so it is slow if you have a large number of files. One alternative is to open the file yourself and count the lines; another is to split the file name off the wc output before using it as a number.
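
Both alternatives can be sketched in a few lines. The sub names `count_lines` and `count_from_wc_output` are made up for illustration, and the regex assumes the wc output starts with the count, as in the error message quoted earlier in the thread.

```perl
use strict;
use warnings;

# Count lines by reading the file directly; no external process is started.
sub count_lines {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Can't open $path: $!\n";
    my $count = 0;
    while (my $line = <$fh>) {
        $count++;
    }
    close($fh);
    return $count;
}

# Alternative: keep only the leading number from "wc -l FILE" style output.
sub count_from_wc_output {
    my ($wc_output) = @_;
    my ($count) = $wc_output =~ /^\s*(\d+)/;
    return $count;    # undef if the output did not start with a number
}
```

In the original script, `count_lines($file) - 1` would then take the place of the `qx/wc -l $file/` call followed by the subtraction.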