explanation of passages of a Perl script

uuuiii has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: explanation of passages of a Perl script by hippo (Archbishop) on Feb 06, 2018 at 14:24 UTC
I'm glad you haven't written this yourself because it is not in a good way. There are 2 problems with the layout which don't help with analysis: Inconsistent indenting and padding No comments or documentation whatsoever We can fix the first by using perltidy. This gives the much more legible: $file = shift; print "Window length\n"; $kmer = <STDIN>; chomp ($kmer); print "Minimum quality score cut-off\n"; $cut_off = <STDIN>; chomp ($cut_off); open (MYFILE, ">R.txt"); if (open (FASTQ, $file)) { while ($header1 = <FASTQ>) { $dna = <FASTQ>; $header2 = <FASTQ>; $qual = <FASTQ>; @dna = split ('', $dna); @qual = split ('', $qual); @num_value = (); @scores = (); foreach $qscore (@qual) { $num_value = ord ($qscore) - 33; push (@num_value, $num_value); } foreach $value (@num_value) { if ($value < $cut_off) { pop (@qual); } else { last; } } $sub1 = substr ($dna, -$#qual); $sub11 = substr ($qual, -$#qual); @qscopy = reverse @num_value; foreach $value (@qscopy) { if ($value < $cut_off) { pop (@qual); } else { last; } } $sub2 = substr ($sub1, 0, $#qual); $sub22 = substr ($sub11, 0, $#qual); for ($i = 0; $i <= $#qual - ($kmer - 1); $i++) { @scores = @num_value[$i .. $i + $kmer - 1]; $sum = 0; foreach $score (@scores) { $sum += $score; } if (($sum / $kmer) < $cut_off) { last; } } $sub3 = substr ($sub2, 0, ($i - 1)); $sub33 = substr ($sub22, 0, ($i - 1)); print MYFILE "$header1$sub3\n$header2$sub33\n\n"; } } else { print "Error!\n"; } close MYFILE; [download] See how much more structured that appears?. Now, here's what's wrong with the actual code: No strict No warnings No sanitising of arguments or inputs No declarations of variables Use of multiple variables (`$sub1, $sub2, $sub3`) where arrays would be better Use of C-ish rather than perlish "for" loops 2-argument form of open where 3-argument would be better Bareword instead of lexical filehandles (this might be because it's very old code but it's an easy one to update) As to why they have written it like this, I could not say. Without comments, documentation, sample inputs and so on it is a guessing game. If there's one point in particular which you don't understand then maybe highlight that?	[reply] [d/l] [select]
Re^2: explanation of passages of a Perl script by uuuiii (Novice) on Feb 06, 2018 at 15:10 UTC
Thank you for answering I have doubts about all the script because like you I think it is a bit confusing... I'm a new learner of Perl and in general of programming language and as exercise I'm trying to modify this script in order to make it better. I understand for what it has been done (it is a quality trimming for a FASTQ file) but if I were the programmer I dont' think I would have written the script like that. Maybe this will be a stupid question but I have serious problems to understand the variables (like $sub) that have been used, in particular when we have the feature "substr". I have to find more info in the manual and I will do that for sure... also the variables -$#qual: why I have to write them like that? Moreover, I noted the absence of the use strict, use warnings and probably of use diagnostics... In fact I rewrite the script using them and with use script I don't have any problem but when I use warnings and diagnostics I have an error of unitialized value in operation that I can't really resolve (if you prefer I could write you down my script but it is practically the same thing only with "my". I have also changed the way to read the FASTQ file preferring the simple way to write directly in the script the name of the FASTQ file of my interest). I'm really sorry to annoy all of you with these problems but I would really like to understand how to work with this script and how I could ameliorate it... Thank you very much	[reply]
Re^3: explanation of passages of a Perl script by hippo (Archbishop) on Feb 06, 2018 at 15:42 UTC
I have serious problems to understand the variables (like $sub) that have been used, in particular when we have the feature "substr". substr() is a built-in function in perl. All of the built-in functions are documented and if your machine has the perl documentation installed you can simply run `perldoc -f substr` to read it. Alternatively, it is available online here. also the variables -$#qual: why I have to write them like that? `$#qual` is a special syntax, it returns the index of the last element in the array `@qual`. See the Arrays: A Tutorial/Reference for a discussion of this and many other aspects of using arrays in perl. when I use warnings and diagnostics I have an error of unitialized value in operation that I can't really resolve You can find the offending term and then surround it with a suitable "if" statement. Suppose $foo might sometimes be undefined, but you want to print it, you might then use: `if (defined $foo) { print $foo; }` [download] or, equivalently `print $foo if defined $foo;` [download] Perl is a very big language with lots to learn. I think you would benefit from reading some of the Tutorials here covering the areas that you know less well. Stick at it - you will soon be amazed at what you can achieve.	[reply] [d/l] [select]
Re^3: explanation of passages of a Perl script by AnomalousMonk (Archbishop) on Feb 06, 2018 at 16:32 UTC
Further to hippo's excellent comments, I would add with regard to `$dna = <FASTQ>; ...; $qual = <FASTQ>; @dna = split ('', $dna); @qual = split ('', $qual);` [download] and similar from the tidied source that the code uses the same name for variables of different type. IMHO, it's not, in general, good practice to do this; eventually, you'll give yourself a terrible headache. Don't do that. In general, use informative, distinctive names for variables, subroutines, etc. Give a man a fish: `<%-{-{-{-<`	[reply] [d/l] [select]
Re^2: explanation of passages of a Perl script by Anonymous Monk on Feb 06, 2018 at 19:45 UTC
Lets see you rewrite that cish foreach loop.... Yeah thought so	[reply]
Re^3: explanation of passages of a Perl script by hippo (Archbishop) on Feb 06, 2018 at 19:57 UTC
`for my $i (0 .. @qual - $kmer) {` [download] Q.E.D.	[reply] [d/l]
Re: explanation of passages of a Perl script by Eily (Monsignor) on Feb 06, 2018 at 16:20 UTC
I'd really like to understand this script That's probably a bad idea. Either you just want to understand how it works, and you've already seen by yourself that it's probably not the good way to do it, so you shouldn't learn from it ; or you want to use it as a base for custom modifications, and I'm pretty sure that writing from scratch will be easier (there's only 78 lines of code, it's like a lot of work went into it). The thread that poj referenced contains a link to the Bio::SeqIO::fastq module, which might be the best starting point for you if you want to reuse something that works. Or if you want to rewrite the whole thing (that's one way to learn), it will be easier for us to help you on specific blocking points rather than this quite obscure script.	[reply]
Re: explanation of passages of a Perl script by poj (Abbot) on Feb 06, 2018 at 15:48 UTC
This looks very similar to this recent post Quality trimming of fastq file. Why ? Update : crossposted here https://www.biostars.org/p/297182/ poj	[reply]
Re^2: explanation of passages of a Perl script by uuuiii (Novice) on Feb 06, 2018 at 15:54 UTC
don't know, I found the script online. I will read this post, maybe I found something useful. Thanks Update: yes, I wrote also there in order to have more info. Is that a problem? if it is I will remove the post.	[reply]
Re^3: explanation of passages of a Perl script by AnomalousMonk (Archbishop) on Feb 06, 2018 at 16:42 UTC
... I [crossposted] to have more info. Is that a problem? Crossposting is not a problem, but it is considered polite to note and link to the crosspost so that the efforts of those trying to help you are not needlessly duplicated at the far-flung corners of the InterWebs. Please see What shortcuts can I use for linking to other information? Update: Is that a problem? if it is I will remove the post. In general, removing a post is Frowned Upon. Updating a post to add more info is ok — like this update! Please see How do I change/delete my post? Give a man a fish: `<%-{-{-{-<`	[reply] [d/l]
Re^4: explanation of passages of a Perl script by uuuiii (Novice) on Feb 06, 2018 at 16:45 UTC
Re^5: explanation of passages of a Perl script by AnomalousMonk (Archbishop) on Feb 06, 2018 at 16:53 UTC
Re: explanation of passages of a Perl script by Anonymous Monk on Feb 06, 2018 at 14:40 UTC
The `last` verb, used quite-frequently here, causes the enclosing `for` or `foreach` loop to end immediately. Like you, I find the code as-written to be cryptic and confusing. Variable names like `$sub2` and `$sub22` do not convey any semantic meaning or clues. Simple comments, directed to "the next programmer who will be looking at this," would help enormously. "Talk to me ... tell me what you are doing here, and why. Use variable-names that are suggestive." Use `use strict; use warnings;` and declare all of the variables that you use. Such niceties might mean nothing to the Perl compiler, but they mean everything to real people.	[reply]