Problems with regex grouping

patgas has asked for the wisdom of the Perl Monks concerning the following question:

The problem of sorting a file full of nothing but HTML mailto: links came across my desk, and armed with Perl I naturally solved it in a few minutes (a miracle for me, this stuff is finally starting to stick). Then I noticed that not all of the links in the file actually contained mailto: even if they were linking to email addresses. So now I'm stuck trying to change my regex to match either the entire word "mailto:" or nothing at all. My best guess is commented out below. Can anyone offer me advice? (And yes, I have read Death to Dot Star! even if I don't fully comprehend it yet.)

#!/usr/bin/perl -w
use strict;

my %names;

while ( <> ) {
    s/\s+/ /ig;
    /<a href="(.*)">(.*)<\/a><br>/;
    # /<a href="(mailto:)??(.*)">(.*)<\/a><br>/;
    $names{"$2($1)"} = $1;
}

open OUT, ">sortboater.txt" or die "Cannot open file: $!";

for ( sort keys %names ) {
    /^(.*)\(.*\)$/;
    print OUT qq(<a href="mailto:$names{$_}">$1</a><br />\n);
}

close OUT or die "Cannot close file: $!";

-- More than perfect! Let us engage the Concord!


Replies are listed 'Best First'.
Re: Problems with regex grouping by abstracts (Hermit) on Aug 17, 2001 at 21:45 UTC
Hello,

The regexp you want is:

my @ar = ();    # empty array
while (<>) {
    # for every match in the line
    # (in case there is more than one email in a line)
    while (/<a href="(mailto:)?([^"]+)">([^<]+)<\/a>/g) {
        push @ar, "$3\t$2";
    }
}
open ...
for (sort @ar) {
    my ($name, $email) = split /\t/;
    print OUT qq(<a href="mailto:$email">$name</a><br />\n);
}

Update: If you have only one link per line, you can do this:

while (<>) {
    if (/<a href="(mailto:)?([^"]+)">([^<]+)<\/a>/) {
        push @ar, "$3\t$2";
    } else {
        warn "Err: $_\n";
    }
}

Hope this helps,
Aziz
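As a rough illustration of why the commented-out guess in the question misbehaves, here is a minimal sketch that compares the lazy `??` quantifier with the greedy `?`. It is not code from the thread: the sample lines and the `extract` helper are made up for the demonstration, and the character classes carry a `+` so they can match whole addresses. With `??` the optional group prefers to match nothing, so `mailto:` is swallowed by the next capture.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample input, not taken from the original file.
my @lines = (
    '<a href="mailto:bob@example.com">Bob</a><br>',
    '<a href="alice@example.com">Alice</a><br>',
);

# Illustrative helper: returns (address, name) on a match, else an empty list.
sub extract {
    my ($re, $line) = @_;
    return $line =~ $re ? ($2, $3) : ();
}

# Lazy: the optional group tries to match zero times first.
my $lazy   = qr/<a href="(mailto:)??([^"]+)">([^<]+)<\/a>/;
# Greedy: the optional group grabs "mailto:" whenever it is present.
my $greedy = qr/<a href="(mailto:)?([^"]+)">([^<]+)<\/a>/;

for my $line (@lines) {
    my ($lazy_addr)   = extract($lazy,   $line);
    my ($greedy_addr) = extract($greedy, $line);
    print "lazy: $lazy_addr | greedy: $greedy_addr\n";
}
```

For the first sample line the lazy pattern yields `mailto:bob@example.com` while the greedy one yields `bob@example.com`; for the second line both yield `alice@example.com`.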
Re: Re: Problems with regex grouping by damian1301 (Curate) on Aug 17, 2001 at 21:59 UTC
To avoid throwing stuff in `$1` when you don't need to, there is a great feature called clustering in REs with the `(?:stuff here)` stuff :). Just change

/<a href="(mailto:)?([^"]+)">([^<"]+)<\/a>/g

to

/<a href="(?:mailto:)?([^"]+)">([^<"]+)<\/a>/g

That way `mailto:` is not captured at all, and the address and link text land in `$1` and `$2` instead of `$2` and `$3`. You may also want to consider `chomp`ing the line before matching, because a newline could get into the captures somewhere and cause some trouble. Hope I helped out. :):):)

Later

`$_.=($=+(6<<1));print(chr(my$a=$_));$^H=$_+$_;$_=$^H; print chr($_-39); # Easy but its ok.`
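The effect of clustering on capture numbering can be checked with a small sketch like the one below. The sample line is hypothetical, and the patterns include a `+` on each character class so that they match whole addresses and names rather than single characters.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample line, not taken from the original file.
my $line = '<a href="mailto:bob@example.com">Bob</a><br>';

# Capturing group: "mailto:" occupies $1, so the address is $2 and the name $3.
my @capturing = $line =~ /<a href="(mailto:)?([^"]+)">([^<"]+)<\/a>/;

# Clustering with (?:...): nothing is stored for "mailto:",
# so the address moves to $1 and the name to $2.
my @clustered = $line =~ /<a href="(?:mailto:)?([^"]+)">([^<"]+)<\/a>/;

print scalar(@capturing), " captures: @capturing\n";
print scalar(@clustered), " captures: @clustered\n";
```

Any code that switches to the clustered pattern has to shift its capture variables accordingly, for example building the sort key from `"$2\t$1"` rather than `"$3\t$2"`.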
(dkubb) Re: (1) Problems with regex grouping by dkubb (Deacon) on Aug 17, 2001 at 23:29 UTC
Have you thought about taking a different approach and using CPAN modules to do the work for you? Since you are searching for email addresses in a string, this looks like the perfect job for Email::Find. Here is some working sample code to demonstrate how it's used.