seaver has asked for the wisdom of the Perl Monks concerning the following question:

Update: Typo found by toolic and subsequently Anon. Thanks!

Dear all

This seems like it should be pretty obvious, and it works for everything else, so I'm convinced I'm missing something. I do a match on a string, and then change that string into a URL.

Sample string: LOC100282561 [Source:RefSeq peptide;Acc:NP_001148941]
Desired match: LOC100282561
Desired substitution: <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&term=LOC100282561" target="_blank">LOC100282561</a>

Now, the code works in many other instances, but this is the first time I've matched this particular text. It finds the match but fails to make the substitution, so I'm convinced there are some hidden characters that I'm missing here, but I don't know what.

use strict;
use warnings;

my $Text="LOC100282561 [Source:RefSeq peptide;Acc:NP_001148941]";
my %VisitedLinks=();

#Searching for NCBI Entrez Gene IDs
$_ = $Text;
my @OriginalArray = /(LOC\d{9})/g;
for (my $i=0; $i < @OriginalArray; $i++) {
    if (!defined($VisitedLinks{$OriginalArray[$i]})) {
        $VisitedLinks{$OriginalArray[$i]} = 1;
        my $Link = EntrezGeneLinks($OriginalArray[$i]);
        my $Find = $OriginalArray[$i];
        $Text =~ s/$Find$/$Link/g;
    }
}
print $Text,"\n";

sub EntrezGeneLinks {
    my ($ID) = @_;
    return '<a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&term='.$ID.'" target="_blank">'.$ID.'</a>';
}

Replies are listed 'Best First'.
Re: Matches but not substituting
by toolic (Bishop) on Jun 03, 2011 at 14:53 UTC
    It would be easier to diagnose if you provide a self-contained code sample that anyone can run. Regardless, my best guess is that
    $Text =~ s/$Find$/$Link/g;
    should be:
    $Text =~ s/$Find/$Link/;
    If LOC100282561 is not at the end of the $Text string, then the $ anchor prevents the substitution.
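
    To make the effect of the anchor concrete, here is a minimal sketch using the sample string from the question. The "<ID>" replacement text and the variable names are placeholders for illustration only, not part of the original script.

    ```perl
    use strict;
    use warnings;

    # Sample string from the question; the ID sits at the start, not the end.
    my $text = "LOC100282561 [Source:RefSeq peptide;Acc:NP_001148941]";
    my $find = "LOC100282561";

    # With the trailing $ the pattern only matches an ID at the very end of
    # the string (or just before a final newline), so nothing is replaced.
    (my $anchored = $text) =~ s/$find$/<ID>/g;

    # Without the anchor the ID is replaced wherever it occurs.
    (my $unanchored = $text) =~ s/$find/<ID>/g;

    print "anchored:   $anchored\n";
    print "unanchored: $unanchored\n";
    ```

    The anchored version should print the string unchanged, while the unanchored one should start with <ID>.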

    Update: with your updated code and my proposed fix, here is the output I get:

    <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&term=LOC100282561" target="_blank">LOC100282561</a> [Source:RefSeq peptide;Acc:NP_001148941]
    Is that what you expect?

      Oh boy, I knew the answer would be obvious! That "$" was a typo introduced many moons ago, but for the first time, I had to parse text that didn't have the match at the end of the string, so it never threw up on me until now.

      Thanks for pointing it out, problem solved, phew! I was really hitting my head on the desk over this one.

        This sounds like an excellent time to introduce a test suite for the script. Actually any time is an excellent time to introduce unit tests, but right now is mostly the best time.
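
        As a rough sketch of what such a test might look like, the following uses the core Test::More module. EntrezGeneLinks is copied from the question; link_gene_ids is a hypothetical wrapper that is not in the original script, and it does the replacement in a single s///ge pass rather than with the original loop and %VisitedLinks hash.

        ```perl
        use strict;
        use warnings;
        use Test::More tests => 3;

        # Same link builder as in the question.
        sub EntrezGeneLinks {
            my ($ID) = @_;
            return '<a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&term='
                 . $ID . '" target="_blank">' . $ID . '</a>';
        }

        # Hypothetical wrapper (assumption, not from the original script):
        # replace every LOC ID in the text with its link in one pass.
        sub link_gene_ids {
            my ($text) = @_;
            $text =~ s/(LOC\d{9})/EntrezGeneLinks($1)/ge;
            return $text;
        }

        my $link = EntrezGeneLinks('LOC100282561');

        # The case that exposed the stray $ anchor: ID not at the end.
        is( link_gene_ids('LOC100282561 [Source:RefSeq peptide;Acc:NP_001148941]'),
            $link . ' [Source:RefSeq peptide;Acc:NP_001148941]',
            'ID at the start of the string is linked' );

        # The case that always worked: ID at the end.
        is( link_gene_ids('similar to LOC100282561'),
            'similar to ' . $link,
            'ID at the end of the string is linked' );

        is( link_gene_ids('no gene ID here'),
            'no gene ID here',
            'text without an ID is left alone' );
        ```

        A test for the ID-at-the-start case would have failed as soon as the stray anchor was introduced.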

        True laziness is hard work
      Maybe $Text =~ s/\Q$Find\E/$Link/g;
        \Q and \E are not needed here because the only characters in the regular expression part of s/// are LOC0123456789, and none of those is a metacharacter.
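
        A quick way to check this, as a small sketch: quotemeta() applies the same escaping as \Q...\E, so comparing its result with the input shows whether quoting changes anything. The variable names below are illustrative only.

        ```perl
        use strict;
        use warnings;

        my $find = 'LOC100282561';
        my $text = 'LOC100282561 [Source:RefSeq peptide;Acc:NP_001148941]';

        # The ID contains only letters and digits, so quoting returns it unchanged.
        my $id_unchanged = quotemeta($find) eq $find ? 1 : 0;

        # The full sample string contains [ and ] plus other non-word
        # characters, so quoting it produces a different string.
        my $text_changed = quotemeta($text) ne $text ? 1 : 0;

        print "ID unchanged by quoting: $id_unchanged\n";
        print "full text changed:       $text_changed\n";
        ```

        Quoting is still a harmless habit whenever the interpolated value might someday contain something other than letters and digits.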
Re: Matches but not substituting
by kennethk (Abbot) on Jun 03, 2011 at 14:55 UTC
    The problem is likely that [ and ] are metacharacters in regular expressions, used to define character classes. You can avoid your issue by escaping them using \Q and \E (see Quote and Quote-like Operators):

    $Text =~ s/\Q$Find\E$/$Link/g;

Re: Matches but not substituting
by NetWallah (Canon) on Jun 04, 2011 at 05:07 UTC
    Others have determined the problem. My suggestion is to use a more perlish style. I offer:
    #Searching for NCBI Entrez Gene IDs
    my @OriginalArray = ( $Text =~ /(LOC\d{9})/g );
    for my $loc( @OriginalArray){
        next if $VisitedLinks{$loc};
        $VisitedLinks{$loc} = 1;
        my $Link = $self->EntrezGeneLinks($loc);
        $Text =~ s/\Q$loc\E/$Link/g;
    }

                "XML is like violence: if it doesn't solve your problem, use more."

Re: Matches but not substituting
by Anonymous Monk on Jun 03, 2011 at 14:53 UTC

      I updated my code so that it should run as a stand-alone piece of code.

        Aha, you anchor using $ which makes it fail
        #!/usr/bin/perl --
        use strict;
        use warnings;

        my $Text="LOC100282561 [Source:RefSeq peptide;Acc:NP_001148941]";
        my %VisitedLinks=();

        #Searching for NCBI Entrez Gene IDs
        $_ = $Text;
        my @OriginalArray = /(LOC\d{9})/g;
        use DDS; Dump( \@OriginalArray );
        for (my $i=0; $i < @OriginalArray; $i++) {
            if (!defined($VisitedLinks{$OriginalArray[$i]})) {
                $VisitedLinks{$OriginalArray[$i]} = 1;
                my $Link = EntrezGeneLinks($OriginalArray[$i]);
                my $Find = $OriginalArray[$i];
                use DDS; Dump( $Link, $Find, $Text );
                #~ $Text =~ s/$Find$/$Link/g;
                $Text =~ s/$Find/$Link/g;
                use DDS; Dump( $Text, );
            }
        }
        print $Text,"\n";

        sub EntrezGeneLinks {
            my ($ID) = @_;
            return '<a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&term='.$ID.'" target="_blank">'.$ID.'</a>';
        }
        __END__
        $ARRAY1 = [ 'LOC100282561' ];
        $VAR1 = '<a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&term=LOC100282561" target="_blank">LOC100282561</a>';
        $VAR2 = 'LOC100282561';
        $VAR3 = 'LOC100282561 [Source:RefSeq peptide;Acc:NP_001148941]';
        $VAR1 = '<a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&term=LOC100282561" target="_blank">LOC100282561</a> [Source:RefSeq peptide;Acc:NP_001148941]';
        <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&term=LOC100282561" target="_blank">LOC100282561</a> [Source:RefSeq peptide;Acc:NP_001148941]