in reply to explanation of passages of a Perl script

I'm glad you haven't written this yourself because it is not in a good way. There are 2 problems with the layout which don't help with analysis:

  1. Inconsistent indenting and padding
  2. No comments or documentation whatsoever

We can fix the first by using perltidy. This gives the much more legible:

$file = shift; print "Window length\n"; $kmer = <STDIN>; chomp ($kmer); print "Minimum quality score cut-off\n"; $cut_off = <STDIN>; chomp ($cut_off); open (MYFILE, ">R.txt"); if (open (FASTQ, $file)) { while ($header1 = <FASTQ>) { $dna = <FASTQ>; $header2 = <FASTQ>; $qual = <FASTQ>; @dna = split ('', $dna); @qual = split ('', $qual); @num_value = (); @scores = (); foreach $qscore (@qual) { $num_value = ord ($qscore) - 33; push (@num_value, $num_value); } foreach $value (@num_value) { if ($value < $cut_off) { pop (@qual); } else { last; } } $sub1 = substr ($dna, -$#qual); $sub11 = substr ($qual, -$#qual); @qscopy = reverse @num_value; foreach $value (@qscopy) { if ($value < $cut_off) { pop (@qual); } else { last; } } $sub2 = substr ($sub1, 0, $#qual); $sub22 = substr ($sub11, 0, $#qual); for ($i = 0; $i <= $#qual - ($kmer - 1); $i++) { @scores = @num_value[$i .. $i + $kmer - 1]; $sum = 0; foreach $score (@scores) { $sum += $score; } if (($sum / $kmer) < $cut_off) { last; } } $sub3 = substr ($sub2, 0, ($i - 1)); $sub33 = substr ($sub22, 0, ($i - 1)); print MYFILE "$header1$sub3\n$header2$sub33\n\n"; } } else { print "Error!\n"; } close MYFILE;

See how much more structured that appears?. Now, here's what's wrong with the actual code:

  1. No strict
  2. No warnings
  3. No sanitising of arguments or inputs
  4. No declarations of variables
  5. Use of multiple variables ($sub1, $sub2, $sub3) where arrays would be better
  6. Use of C-ish rather than perlish "for" loops
  7. 2-argument form of open where 3-argument would be better
  8. Bareword instead of lexical filehandles (this might be because it's very old code but it's an easy one to update)

As to why they have written it like this, I could not say. Without comments, documentation, sample inputs and so on it is a guessing game. If there's one point in particular which you don't understand then maybe highlight that?

Replies are listed 'Best First'.
Re^2: explanation of passages of a Perl script
by uuuiii (Novice) on Feb 06, 2018 at 15:10 UTC
    Thank you for answering

    I have doubts about all the script because like you I think it is a bit confusing... I'm a new learner of Perl and in general of programming language and as exercise I'm trying to modify this script in order to make it better.

    I understand for what it has been done (it is a quality trimming for a FASTQ file) but if I were the programmer I dont' think I would have written the script like that.

    Maybe this will be a stupid question but I have serious problems to understand the variables (like $sub) that have been used, in particular when we have the feature "substr". I have to find more info in the manual and I will do that for sure... also the variables -$#qual: why I have to write them like that?

    Moreover, I noted the absence of the use strict, use warnings and probably of use diagnostics... In fact I rewrite the script using them and with use script I don't have any problem but when I use warnings and diagnostics I have an error of unitialized value in operation that I can't really resolve (if you prefer I could write you down my script but it is practically the same thing only with "my". I have also changed the way to read the FASTQ file preferring the simple way to write directly in the script the name of the FASTQ file of my interest).

    I'm really sorry to annoy all of you with these problems but I would really like to understand how to work with this script and how I could ameliorate it...

    Thank you very much
      I have serious problems to understand the variables (like $sub) that have been used, in particular when we have the feature "substr".

      substr() is a built-in function in perl. All of the built-in functions are documented and if your machine has the perl documentation installed you can simply run perldoc -f substr to read it. Alternatively, it is available online here.

      also the variables -$#qual: why I have to write them like that?

      $#qual is a special syntax, it returns the index of the last element in the array @qual. See the Arrays: A Tutorial/Reference for a discussion of this and many other aspects of using arrays in perl.

      when I use warnings and diagnostics I have an error of unitialized value in operation that I can't really resolve

      You can find the offending term and then surround it with a suitable "if" statement. Suppose $foo might sometimes be undefined, but you want to print it, you might then use:

      if (defined $foo) { print $foo; }

      or, equivalently

      print $foo if defined $foo;

      Perl is a very big language with lots to learn. I think you would benefit from reading some of the Tutorials here covering the areas that you know less well. Stick at it - you will soon be amazed at what you can achieve.

      Further to hippo's excellent comments, I would add with regard to

      $dna = <FASTQ>; ...; $qual = <FASTQ>; @dna = split ('', $dna); @qual = split ('', $qual);
      and similar from the tidied source that the code uses the same name for variables of different type. IMHO, it's not, in general, good practice to do this; eventually, you'll give yourself a terrible headache. Don't do that. In general, use informative, distinctive names for variables, subroutines, etc.


      Give a man a fish:  <%-{-{-{-<

Re^2: explanation of passages of a Perl script
by Anonymous Monk on Feb 06, 2018 at 19:45 UTC
    Lets see you rewrite that cish foreach loop.... Yeah thought so
      for my $i (0 .. @qual - $kmer) {

      Q.E.D.