rattytatty has asked for the wisdom of the Perl Monks concerning the following question:

I have a problem with a pattern match. I want to print what the match is, whether by using $1 or by assigning the match to another variable. It seems to be taking a scalar of the value instead of the value itself, and I don't know why, unless Perl 5.10.1 is an issue.

my $procount2 = 1;
foreach my $readingframe2 (@protein) {
    my $protein;
    print "string is $readingframe2\n";
    my $fada;
    if ($readingframe2 =~ /MHGR/) {
        print "$1\n";
        ($protein) = $readingframe2 =~ /MHGR/;
        $fada = length($protein);
        print "motif is $protein\n";
        if ($fada >= 1) {
            print "length of motif is $fada\n";
            print OUT1 "Protein $procount2\n$protein\n";
            $procount2++;
        }
    }
}

Here is what comes up in the console:

$ perl readingframe.pl
Enter file name t.txt
Enter output file name o.txt
string is MHGRRRRRRRRRRRRRRRRRRRRRRRRRRRRRD_MHGRRRRRRRRRRRRRRRRRRRRRRRRRRRRRD_
Use of uninitialized value $1 in concatenation (.) or string at readingframe.pl line 68, <> line 2.
motif is 1
length of motif is 1
string is CMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAVTECMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAVT
string is AWPPPPPPPPPPPPPPPPPPPPPPPPPPPPP_LNAWPPPPPPPPPPPPPPPPPPPPPPPPPPPPP_L

Replies are listed 'Best First'.
Re: Regular expression problems
by brx (Pilgrim) on Apr 23, 2012 at 16:10 UTC
    if ($readingframe2 =~ /MHGR/) {
        print "$1\n";
        ($protein) = $readingframe2 =~ /MHGR/;
    $1 contains whatever you capture with parentheses. There are no parentheses in this regex, so nothing is put in $1.
    What would you like to find in $protein?

    example:
    $ perl -e '$s="pilpoil";if ($s=~/p(.)lp(.)il/) { print "h".$1."h".$2;}'
    hiho
      Thanks for the help. I used this as an example, but I will be using it to find an ORF: a start codon 'M', followed by any of the other amino acids, up to a stop codon '_'. I think that will be:
      =~ /(M)[GAVLIFWPSTCYNQDEKRH]+(_)/
      or
      =~ /(M[GAVLIFWPSTCYNQDEKRH]+_)/
      I'm not sure, but I will try both. Thanks again.
        You may want to put 'M' and '_' inside the parentheses to capture everything in one shot, or concatenate 'M'.$1.'_' afterwards, depending on what you want to do. One way to do it:
        #!/usr/bin/perl -w
        use strict;

        my $seqnum = 1;
        while (my $seq = <DATA>) {
            chomp $seq;
            print "sequence #", $seqnum++, ":\n";
            while ($seq =~ /M([GAVLIFWPSTCYNQDEKRH]+)_/g) {
                print "\t", $1, "\n";
                # or: print "\t","M${1}_","\n";
            }
        }
        __DATA__
        MHGRRRRRRRRRRRRRRRRRRRRRRRRRRRRRD_MHGRRRRRRRRD_
        CMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAVTECMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAVT
        AWPPPPPPPPPPPPPPPPPPPPPPPPPPPPP_LNAWPPPPPPPPPPPPPPPPPPPPPPPPPPPPP_L
        FOOBARMNOTTHISONE_XYZMTHISYES_MTHISNOT_
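
The two patterns proposed above behave differently: /(M)[...]+(_)/ captures only the 'M' and the '_', while /(M[...]+_)/ captures the whole ORF. A rough sketch of the difference, assuming a hypothetical helper named find_orfs and a shortened, made-up sample sequence (neither is from the original script):

```perl
use strict;
use warnings;

# Hypothetical helper: returns every substring that starts with 'M',
# continues with one or more of the listed residues, and ends with '_'.
sub find_orfs {
    my ($seq) = @_;
    # In list context, a /g match returns the captures from all matches.
    return $seq =~ /(M[GAVLIFWPSTCYNQDEKRH]+_)/g;
}

my $seq = 'MHGRRD_MHGRD_';    # shortened, made-up sample

# Capturing 'M' and '_' separately loses the residues in between.
my ($start, $stop) = $seq =~ /(M)[GAVLIFWPSTCYNQDEKRH]+(_)/;
print "separate captures: $start $stop\n";

# Capturing the whole pattern keeps each full ORF.
print "ORF: $_\n" for find_orfs($seq);
```

Note that the character class does not contain 'M', so with this pattern an ORF cannot contain an internal 'M'.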
Re: Regular expression problems
by choroba (Cardinal) on Apr 23, 2012 at 16:14 UTC
    If you want to populate $1, you have to use parentheses:
    $readingframe =~ /(MHGR)/;
    $protein = $1;
    or you can assign it directly:
    ($protein) = $readingframe =~ /(MHGR)/;
    In your case, though, you can just output MHGR, as there is nothing dynamic in your regular expression.

    See also Regexp Quote Like Operators, perlretut, perlre.
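
The "motif is 1" line in the original output follows from how the match operator behaves in list context: with no capturing parentheses, a successful match returns the list (1), so ($protein) receives 1 rather than the matched text. A minimal sketch of the difference (the variable names and the sample string are made up for illustration):

```perl
use strict;
use warnings;

# Hypothetical sample string, similar to the one in the question.
my $frame = 'MHGRRRRD_';

# No capturing parentheses: in list context a successful match returns (1).
my ($no_capture) = $frame =~ /MHGR/;

# With capturing parentheses: the list holds the captured text instead.
my ($with_capture) = $frame =~ /(MHGR)/;

print "without parens: $no_capture\n";
print "with parens:    $with_capture\n";
```

This is also why length($protein) came out as 1 in the question: it measured the string "1", not the motif.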

Re: Regular expression problems
by pvaldes (Chaplain) on Apr 23, 2012 at 16:21 UTC

    "Use of uninitialized value $1..." appears because $1 is in fact undefined in your example.

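    my $procount2 = 1;
    foreach (@protein) {    # no named loop variable, so each string is in $_
        my $protein;
        print "string is $_\n";
        if (/MHGR/) {
            print "$1\n", "motif is $_\n";    ##### <----- $1 is undefined here: no capturing parentheses
            if (length($_) >= 1) {
                # function calls are not interpolated inside double quotes
                print "length of motif is ", length($_), "\n";
                print OUT1 "Protein $procount2\n$_\n";
                $procount2++;
            }
        }
    }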
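
Putting the advice in this thread together, one possible corrected version of the original loop might look like the sketch below. It is not the poster's final script: it assumes @protein holds the reading-frame strings (two made-up ones are used here), prints to STDOUT instead of the OUT1 filehandle, and uses a hypothetical helper named motif_report.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the report lines for one reading frame,
# or an empty list when the motif is absent.
sub motif_report {
    my ($frame, $count) = @_;
    # The parentheses populate $1; it is only used after a successful match.
    if ($frame =~ /(MHGR)/) {
        my $protein = $1;
        my $len     = length $protein;
        return (
            "motif is $protein",
            "length of motif is $len",
            "Protein $count",
            $protein,
        );
    }
    return;
}

# Made-up stand-ins for the contents of @protein.
my @protein   = ('MHGRRRRD_MHGRRD_', 'CMAAAAVTE');
my $procount2 = 1;

foreach my $readingframe2 (@protein) {
    print "string is $readingframe2\n";
    my @report = motif_report($readingframe2, $procount2);
    next unless @report;    # no motif in this frame
    print "$_\n" for @report;
    $procount2++;
}
```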