Matches but not substituting

seaver has asked for the wisdom of the Perl Monks concerning the following question:

Update: Typo found by toolic and subsequently Anon. Thanks!

Dear all

This seems like it should be pretty obvious, and it works for everything else, so I'm convinced I'm missing something. I do a match on a string, and then change that string into a URL.

Sample string: LOC100282561 [Source:RefSeq peptide;Acc:NP_001148941]
Desired match: LOC100282561
Desired substitution: <a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&term=LOC100282561" target="_blank">LOC100282561</a>

Now, the code works in many other instances, but this is the first time I've tried matching this particular text. It finds the match but fails to make the substitution, so I'm convinced there are some hidden characters that I'm missing here, but I don't know what.

use strict;
use warnings;

my $Text="LOC100282561  [Source:RefSeq peptide;Acc:NP_001148941]";
my %VisitedLinks=();

#Searching for NCBI Entrez Gene IDs
$_ = $Text;
my @OriginalArray = /(LOC\d{9})/g;
for (my $i=0; $i < @OriginalArray; $i++) {
    if (!defined($VisitedLinks{$OriginalArray[$i]})) {
        $VisitedLinks{$OriginalArray[$i]} = 1;
        my $Link = EntrezGeneLinks($OriginalArray[$i]);
        my $Find = $OriginalArray[$i];
        $Text =~ s/$Find$/$Link/g;
    }
}

print $Text,"\n";

sub EntrezGeneLinks {
    my ($ID) = @_;

    return '<a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&term='.$ID.'" target="_blank">'.$ID.'</a>';
}

Replies are listed 'Best First'.
Re: Matches but not substituting by toolic (Bishop) on Jun 03, 2011 at 14:53 UTC
It would be easier to diagnose if you provide a self-contained code sample that anyone can run. Regardless, my best guess is that `$Text =~ s/$Find$/$Link/g;` should be: `$Text =~ s/$Find/$Link/;` If LOC100282561 is not at the end of the $Text string, then the $ anchor prevents the substitution. Update: with your updated code and my proposed fix, here is the output I get: `<a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&term=LOC100282561" target="_blank">LOC100282561</a> [Source:RefSeq peptide;Acc:NP_001148941]` Is that what you expect?
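A minimal sketch of the anchor behaviour described above, using the sample string from the question. The variable names and the shortened `<a>` markup are illustrative stand-ins, not part of the original script.

```perl
use strict;
use warnings;

my $find = 'LOC100282561';
my $link = "<a>$find</a>";    # shortened stand-in for the real link markup

my $with_anchor    = 'LOC100282561 [Source:RefSeq peptide;Acc:NP_001148941]';
my $without_anchor = $with_anchor;

# With the trailing $, the pattern only matches when the ID sits at the end
# of the string (or just before a final newline), so nothing is replaced.
my $count_anchored = ( $with_anchor =~ s/$find$/$link/g ) || 0;

# Without the anchor, the ID is replaced wherever it occurs.
my $count_plain = ( $without_anchor =~ s/$find/$link/g ) || 0;

print "anchored: $count_anchored replacement(s)\n";
print "plain:    $count_plain replacement(s)\n";
```

Checking the return value of `s///` like this is a cheap way to notice a substitution that silently did nothing.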
Re^2: Matches but not substituting by seaver (Pilgrim) on Jun 03, 2011 at 15:14 UTC
Oh boy, I knew the answer would be obvious! That "$" was a typo introduced many moons ago, but for the first time, I had to parse text that didn't have the match at the end of the string, so it never threw up on me until now. Thanks for pointing it out, problem solved, phew! I was really hitting my head on the desk over this one.
Re^3: Matches but not substituting by GrandFather (Saint) on Jun 04, 2011 at 00:26 UTC
This sounds like an excellent time to introduce a test suite for the script. Actually any time is an excellent time to introduce unit tests, but right now is mostly the best time. True laziness is hard work
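As a rough illustration of what such a test could look like, here is a sketch using the core Test::More module. The helpers `link_gene_ids` and `make_link` are hypothetical: they wrap the loop and the `EntrezGeneLinks` sub from the question (with the stray `$` removed) so that the behaviour can be checked in isolation.

```perl
use strict;
use warnings;
use Test::More tests => 3;

# Hypothetical helper: the loop from the question with the stray $ removed.
sub link_gene_ids {
    my ($text) = @_;
    my %seen;
    for my $id ( $text =~ /(LOC\d{9})/g ) {
        next if $seen{$id}++;    # link each distinct ID only once
        my $link = make_link($id);
        $text =~ s/$id/$link/g;
    }
    return $text;
}

# Hypothetical stand-in for EntrezGeneLinks from the question.
sub make_link {
    my ($id) = @_;
    return '<a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&term='
         . $id . '" target="_blank">' . $id . '</a>';
}

my $id   = 'LOC100282561';
my $link = make_link($id);

# The case that exposed the bug: the ID is followed by more text.
is( link_gene_ids( $id . ' [Source:RefSeq peptide]' ),
    $link . ' [Source:RefSeq peptide]',
    'ID at the start of the string is linked' );

# The case that always worked: the ID ends the string.
is( link_gene_ids( 'gene ' . $id ),
    'gene ' . $link,
    'ID at the end of the string is linked' );

is( link_gene_ids('no gene IDs here'),
    'no gene IDs here',
    'text without IDs is unchanged' );
```

A test for the first case would have caught the anchor typo as soon as it was introduced.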
Re^2: Matches but not substituting by Anonymous Monk on Jun 03, 2011 at 14:55 UTC
Maybe `$Text =~ s/\Q$Find\E/$Link/g;`
Re^3: Matches but not substituting by toolic (Bishop) on Jun 03, 2011 at 15:10 UTC
\Q and \E are not needed here because the only characters in the regular expression part of s/// are `LOC0123456789`, and none of those is a metacharacter.
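To make the distinction concrete, here is a small sketch contrasting the captured ID with a string that does contain metacharacters. The `$label` value is an assumption chosen for illustration; the original script never interpolates it into a pattern.

```perl
use strict;
use warnings;

my $text = 'LOC100282561 [Source:RefSeq peptide]';

# The captured ID holds only letters and digits, so quotemeta leaves it as is.
my $id = 'LOC100282561';
print quotemeta($id) eq $id ? "ID needs no escaping\n" : "ID needs escaping\n";

# A literal containing [ and ] behaves differently when interpolated.
my $label = '[Source:RefSeq peptide]';
( my $unescaped = $text ) =~ s/$label/X/;        # brackets form a character class
( my $escaped   = $text ) =~ s/\Q$label\E/X/;    # brackets are matched literally

print "unescaped: $unescaped\n";
print "escaped:   $escaped\n";
```

In the unescaped case the pattern is a character class, so it replaces the first single character that belongs to the class rather than the whole bracketed label.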
Re: Matches but not substituting by kennethk (Abbot) on Jun 03, 2011 at 14:55 UTC
The problem is likely that `[` and `]` are metacharacters in regular expressions used to define character classes. You can avoid your issue by escaping them using `\Q` and `\E` (see Quote and Quote like Operators): `$Text =~ s/\Q$Find\E$/$Link/g;`
Re: Matches but not substituting by NetWallah (Canon) on Jun 04, 2011 at 05:07 UTC
Others have determined the problem. My suggestion is to use a more perlish style. I offer:
#Searching for NCBI Entrez Gene IDs
my @OriginalArray = ( $Text =~ /(LOC\d{9})/g );
for my $loc (@OriginalArray) {
    next if $VisitedLinks{$loc};
    $VisitedLinks{$loc} = 1;
    my $Link = $self->EntrezGeneLinks($loc);
    $Text =~ s/\Q$loc\E/$Link/g;
}
"XML is like violence: if it doesn't solve your problem, use more."
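For comparison, a single-pass variant is sketched below. This is not something proposed in the thread; it is an alternative that assumes `EntrezGeneLinks` is a plain sub, as in the stand-alone code from the question, and uses the `/e` modifier to call it from the replacement side.

```perl
use strict;
use warnings;

sub EntrezGeneLinks {
    my ($ID) = @_;
    return '<a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&term='
         . $ID . '" target="_blank">' . $ID . '</a>';
}

my $Text = 'LOC100282561 [Source:RefSeq peptide;Acc:NP_001148941]';

# /e evaluates the replacement as Perl code and /g rewrites every ID in one
# pass. Text that has already been replaced is not rescanned, so this sketch
# needs no bookkeeping hash.
( my $Linked = $Text ) =~ s/(LOC\d{9})/EntrezGeneLinks($1)/ge;

print "$Linked\n";
```

Whether this fits depends on the rest of the script: if `%VisitedLinks` is consulted elsewhere, the loop-based version keeps that record and this one does not.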
Re: Matches but not substituting by Anonymous Monk on Jun 03, 2011 at 14:53 UTC
Um, that code doesn't run :) see How do I post a question effectively? See also quotemeta, because regular expressions are a mini-language
Re^2: Matches but not substituting by seaver (Pilgrim) on Jun 03, 2011 at 15:02 UTC
I updated my code so that it should run as a stand-alone piece of code.
Re^3: Matches but not substituting by Anonymous Monk on Jun 03, 2011 at 15:12 UTC
Aha, you anchor using $ which makes it fail
#!/usr/bin/perl --
use strict;
use warnings;
my $Text="LOC100282561 [Source:RefSeq peptide;Acc:NP_001148941]";
my %VisitedLinks=();
#Searching for NCBI Entrez Gene IDs
$_ = $Text;
my @OriginalArray = /(LOC\d{9})/g;
use DDS; Dump( \@OriginalArray );
for (my $i=0; $i < @OriginalArray; $i++) {
    if (!defined($VisitedLinks{$OriginalArray[$i]})) {
        $VisitedLinks{$OriginalArray[$i]} = 1;
        my $Link = EntrezGeneLinks($OriginalArray[$i]);
        my $Find = $OriginalArray[$i];
        use DDS; Dump( $Link, $Find, $Text );
        #~ $Text =~ s/$Find$/$Link/g;
        $Text =~ s/$Find/$Link/g;
        use DDS; Dump( $Text, );
    }
}
print $Text,"\n";
sub EntrezGeneLinks {
    my ($ID) = @_;
    return '<a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&term='.$ID.'" target="_blank">'.$ID.'</a>';
}
__END__
$ARRAY1 = [ 'LOC100282561' ];
$VAR1 = '<a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&term=LOC100282561" target="_blank">LOC100282561</a>';
$VAR2 = 'LOC100282561';
$VAR3 = 'LOC100282561 [Source:RefSeq peptide;Acc:NP_001148941]';
$VAR1 = '<a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&term=LOC100282561" target="_blank">LOC100282561</a> [Source:RefSeq peptide;Acc:NP_001148941]';
<a href="http://www.ncbi.nlm.nih.gov/sites/entrez?db=gene&term=LOC100282561" target="_blank">LOC100282561</a> [Source:RefSeq peptide;Acc:NP_001148941]
Re^3: Matches but not substituting by toolic (Bishop) on Jun 03, 2011 at 15:07 UTC
See Re: Matches but not substituting