Trouble skipping lines using Perl

LeBran has asked for the wisdom of the Perl Monks concerning the following question:

Hello all, I'm struggling to get Perl to skip lines that contain certain features, since I don't need those lines. Basically I'm just unsure of the syntax, though it appears to work in some places and not in others.

Looking at the following table:

 chrM    2928    A    G    17    0    0%    A    46    105    69.54%    R    Somatic    1    1.23052720566294E-008    17    29    50    55    4    13    0    0
 chr1    108310    T    C    9    0    0%    T    3    5    62.5%    Y    Somatic    1    0.0090497738    2    1    2    3    7    2    0    0
 chr1    726958    T    A    11    0    0%    T    9    4    30.77%    W    Somatic    1    0.0672877847    6    3    1    3    5    6    0    0
 chr1    1412720    C    A    33    0    0%    C    22    6    21.43%    M    Somatic    1    0.0067850063    10    12    3    3    14    19    0    0
chr1    1822396    C    G    6    0    0%    C    4    4    50%    S    Somatic    1    0.0699300699    3    1    2    2    4    2    0    0
 chr1    1822457    C    T    10    0    0%    C    4    4    50%    Y    Somatic    1    0.022875817    3    1    2    2    6    4    0    0

I'm trying to skip the lines which are chrM rather than chr1, the following code is what I came up with (I'm also extracting different columns)

 while (my $line = <FILE>) {
    next if ($. == 1);
    chomp $line;
    my @sepline = split ("\t", $line);
    my $chromosome = $sepline[0];
    my $chrpos = $sepline[1];
    my $nmreads = $sepline[8];
    my $mutants = $sepline[9];
    my $totalreads = $nmreads + $mutants;
    next if ($chromosome = /^chrM/);
    print ("$nmreads $mutants $totalreads\n");
    

}

The script works fine if I comment out the second `next if` towards the bottom of the script. I've also tried `next if ($chromosome = "chrM");` and `if ($chromosome = "chrM") { next; }`, but neither works. Is there something incorrect about my syntax, or am I simply going about it completely the wrong way? Appreciate any help, cheers.


Replies are listed 'Best First'.
Re: Trouble skipping lines using Perl by haukex (Archbishop) on Nov 21, 2017 at 16:01 UTC
`next if ($chromosome =~ /^chrM/);` You're very close: you just need to use the "binding operator" `=~` instead of `=`, and your code works. See also perlrequick and perlretut. Note that if you're using warnings, which you should, you should have gotten the warning "`Use of uninitialized value $_ in pattern match (m//)`". See also Use strict and warnings. BTW, please enclose your sample input data in `<code>` tags as well (not `<p>`).
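A minimal sketch of the binding operator in use. The helper name `is_mito` is made up for illustration and is not part of the original script:

```perl
use strict;
use warnings;

# Hypothetical helper: true when the chromosome field starts with "chrM".
sub is_mito {
    my ($chromosome) = @_;
    # =~ binds the pattern to $chromosome; a bare = would assign instead.
    return $chromosome =~ /^chrM/ ? 1 : 0;
}

for my $name ('chrM', 'chr1') {
    print "$name: ", (is_mito($name) ? 'skip' : 'keep'), "\n";
}
```

This should report `skip` for chrM and `keep` for chr1.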
Re^2: Trouble skipping lines using Perl by LeBran (Initiate) on Nov 21, 2017 at 16:11 UTC
Hi haukex, thanks very much, works perfectly now. I went with `next if ($chromosome =~ "chrM");`. I was using both strict and warnings, and using the quotes instead of the regex doesn't produce any warnings :D My apologies about the input data format. Cheers
Re^3: Trouble skipping lines using Perl by haukex (Archbishop) on Nov 21, 2017 at 16:19 UTC
LeBran wrote: I went with `next if ($chromosome =~ "chrM");`
That works, but personally I wouldn't write it that way, because writing a regex like, for example, `/chrM/` or `m{chrM}` makes it more visually clear what you want to do (and also allows you to add modifiers).
LeBran wrote: I was using both Strict and Warnings and using the quotes instead of the regex doesn't produce any warnings
Are you sure? `next if ($chromosome = "chrM");` should have given you the warning "`Found = in conditional, should be ==`". Perhaps you're not enabling warnings correctly?
Update, regarding "My apologies about the input data format": you can edit your posts (please mark updates as such), see How do I change/delete my post?
Re^4: Trouble skipping lines using Perl by LeBran (Initiate) on Nov 21, 2017 at 16:55 UTC
Re^3: Trouble skipping lines using Perl by roboticus (Chancellor) on Nov 21, 2017 at 16:30 UTC
LeBran: It didn't give you any warnings because the expression `$chromosome = /^chrM/` is perfectly fine. It just doesn't do what you want it to. Instead of checking whether $chromosome starts with "chrM", it checks whether $_ starts with "chrM", and then sets $chromosome to a true value if it does, and a false value otherwise. Since you're not using $_ while parsing your lines, it never starts with "chrM" and always returns a false value. It's a common enough mistake that I could see a case being made for `if ($var = /rex/)` generating a warning, as I expect that `if ($var = ($_ =~ /rex/))` is pretty uncommon (at least, when looking at my code).
...roboticus
When your only tool is a hammer, all problems look like your thumb.
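The behaviour described above can be checked with a small sketch. The sub name `assign_match` and its arguments are invented for this illustration; `$_` is set explicitly so the match has something defined to test:

```perl
use strict;
use warnings;

# Hypothetical demo: returns 1 or 0 depending on what the accidental
# assignment leaves in $chromosome, with $_ set to a caller-chosen value.
sub assign_match {
    my ($chromosome, $topic) = @_;
    local $_ = $topic;
    $chromosome = /^chrM/;    # matches against $_, then overwrites $chromosome
    return $chromosome ? 1 : 0;
}

# The original content of $chromosome plays no part in the result.
print assign_match('chrM', 'chr1 ...'), "\n";
print assign_match('chr1', 'chrM ...'), "\n";
```

Only the second argument, the value placed in `$_`, decides the outcome.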
Re^4: Trouble skipping lines using Perl by LeBran (Initiate) on Nov 21, 2017 at 17:03 UTC
Re^5: Trouble skipping lines using Perl by AnomalousMonk (Archbishop) on Nov 21, 2017 at 17:23 UTC
Re^3: Trouble skipping lines using Perl by AnomalousMonk (Archbishop) on Nov 21, 2017 at 17:49 UTC
LeBran wrote: I went with `next if ($chromosome =~ "chrM");`
Also note that `$chromosome =~ "chrM"` matches if `"chrM"` is found anywhere in the `$chromosome` string:

 c:\@Work\Perl\monks>perl -wMstrict -le
 "my $chromosome = 'foo xx chrM yy bar';
  print 'found a chrM' if $chromosome =~ 'chrM';
 "
 found a chrM

This is because you no longer anchor the match to the start of the string (as you do in the code in the OP) with the `^` match anchor regex operator. The match `/^chrM/` would IMHO be better for what you seem to want.
Another point is that the data posted in the OP has a leading space or spaces in some cases. Leading whitespace will cause the `/^chrM/` match to fail. If leading whitespace may be present in real data, I would recommend something along the lines of `/^\s*chrM/` instead.
Give a man a fish: `<%-{-{-{-<`
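To compare the three patterns side by side, here is a rough sketch. The sample strings and the helper `count_matches` are invented for illustration:

```perl
use strict;
use warnings;

# Sample values: exact, leading space, embedded, and a different chromosome.
my @samples = ('chrM', ' chrM', 'foo xx chrM yy bar', 'chr1');

# Hypothetical helper: how many of the given strings a pattern accepts.
sub count_matches {
    my ($re, @strings) = @_;
    return scalar grep { $_ =~ $re } @strings;
}

print 'unanchored /chrM/:   ', count_matches(qr/chrM/,      @samples), "\n";
print 'anchored /^chrM/:    ', count_matches(qr/^chrM/,     @samples), "\n";
print 'tolerant /^\s*chrM/: ', count_matches(qr/^\s*chrM/,  @samples), "\n";
```

The unanchored pattern accepts every string containing chrM, the anchored one only the exact-start case, and the `\s*` variant additionally tolerates leading whitespace.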
Re^4: Trouble skipping lines using Perl by LeBran (Initiate) on Nov 22, 2017 at 11:03 UTC
Re: Trouble skipping lines using Perl by Eily (Monsignor) on Nov 21, 2017 at 16:16 UTC
`$chromosome =~ /^chrM/` may do what you need if you want $chromosome to start with chrM (that's what `^` means: the start of the string). Note that if you want to check whether it is equal to "chrM", this is done with the `eq` operator, so `$chromosome eq "chrM"` (see Relational Operators and Equality Operators). Also, FYI, if you want to check that a string is contained in another, you can use index, which is easier to use and less tricky than regular expressions.
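A short sketch of both alternatives, using made-up helper names (`is_exactly`, `contains`, `starts_with`) that are not part of the original code:

```perl
use strict;
use warnings;

# Hypothetical helpers wrapping eq and index.
sub is_exactly  { my ($s, $want) = @_; return $s eq $want ? 1 : 0 }
# index returns the position of the substring, or -1 when it is absent.
sub contains    { my ($s, $sub)  = @_; return index($s, $sub) >= 0 ? 1 : 0 }
sub starts_with { my ($s, $sub)  = @_; return index($s, $sub) == 0 ? 1 : 0 }

my $chromosome = 'chrM';
print "exact match\n"      if is_exactly($chromosome, 'chrM');
print "contains chrM\n"    if contains('foo chrM bar', 'chrM');
print "starts with chrM\n" if starts_with($chromosome, 'chrM');
```

Note that `eq` compares strings, whereas `==` compares numbers, so `eq` is the one to use for chromosome names.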
Re^2: Trouble skipping lines using Perl by LeBran (Initiate) on Nov 21, 2017 at 17:05 UTC
Hi, yeah, that `eq` operator actually seems really useful, and presumably it would deal with certain warnings related to using a `=`. Cheers
Re: Trouble skipping lines using Perl by Laurent_R (Canon) on Nov 21, 2017 at 20:01 UTC
You've been given complete answers to your question, but I would comment that:
- You don't need to chomp your line, since you are not using its end.
- You're not using $chrpos anywhere, so you don't need to process it.
- In the code below there is also, in principle, no use for $chromosome, but I left it because maybe you actually want to use it.
- It is better to discard useless lines at the beginning of the loop, rather than splitting them, etc., and then not using the result of this work.
So the code could be made significantly shorter (and probably slightly faster) as follows, without losing any clarity:

 while (my $line = <FILE>) {
     next if $. == 1;
     next if $line =~ /^\s*chrM/;
     my ($chromosome, $nmreads, $mutants) = (split "\t", $line)[0, 8, 9];
     # note: $chromosome is not used, this could be simplified further
     my $totalreads = $nmreads + $mutants;
     print ("$nmreads $mutants $totalreads\n");
 }

I'm not saying that your code was bad, it wasn't, but I'm just trying to show some opportunities for improvement.
Update: Fixed a typo (`s/loosing/losing/`), thanks to 1nickt for letting me know.
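For anyone who wants to try the shortened loop without the real input file, here is a self-contained sketch. It assumes tab-separated data with one header line; the sub name `summarise`, the header names, and the truncated 11-column sample rows are invented for this illustration, and a lexical filehandle on an in-memory string stands in for `FILE`:

```perl
use strict;
use warnings;

# Hypothetical wrapper around the loop: returns one summary string per kept row.
sub summarise {
    my ($fh) = @_;
    my @out;
    while (my $line = <$fh>) {
        next if $. == 1;                # skip the header line
        next if $line =~ /^\s*chrM/;    # skip mitochondrial rows early
        # Columns 8 and 9 are not the last field, so no chomp is needed.
        my ($nmreads, $mutants) = (split /\t/, $line)[8, 9];
        push @out, join ' ', $nmreads, $mutants, $nmreads + $mutants;
    }
    return @out;
}

# Made-up header plus two rows cut down to the first 11 columns.
my $data = join '', map { join("\t", @$_) . "\n" } (
    [ map { "col$_" } 0 .. 10 ],
    [ qw(chrM 2928 A G 17 0 0% A 46 105 69.54%) ],
    [ qw(chr1 108310 T C 9 0 0% T 3 5 62.5%) ],
);

open my $fh, '<', \$data or die "Cannot open in-memory file: $!";
print "$_\n" for summarise($fh);
close $fh;
```

With this sample, only the chr1 row should be reported.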
Re^2: Trouble skipping lines using Perl by LeBran (Initiate) on Nov 22, 2017 at 11:06 UTC
Hi, yeah, I can see the logic there. It is certainly a lot neater, and the time-saving aspect (though probably minor in this case) is something I should try to remember. I did actually use $chromosome and $chrpos in another print statement later; I just hadn't reached that part yet because the `next if` statement was bugging me haha :D Thanks for the help