patgas has asked for the wisdom of the Perl Monks concerning the following question:

The problem of sorting a file full of nothing but HTML mailto: links came across my desk, and armed with Perl I naturally solved it in a few minutes (a miracle for me; this stuff is finally starting to stick). Then I noticed that not all of the links in the file actually contained mailto:, even though they were linking to email addresses. So now I'm stuck trying to change my regex to match either the entire word "mailto:" or nothing at all. My best guess is the commented-out line below. Can anyone offer me advice? (And yes, I have read Death to Dot Star! even if I don't fully comprehend it yet.)
    #!/usr/bin/perl -w
    use strict;

    my %names;

    while ( <> ) {
        s/\s+/ /ig;
        /<a href="(.*)">(.*)<\/a><br>/;
        # /<a href="(mailto:)??(.*)">(.*)<\/a><br>/;
        $names{"$2($1)"} = $1;
    }

    open OUT, ">sortboater.txt" or die "Cannot open file: $!";

    for ( sort keys %names ) {
        /^(.*)\(.*\)$/;
        print OUT qq(<a href="mailto:$names{$_}">$1</a><br />\n);
    }

    close OUT or die "Cannot close file: $!";
-- More than perfect! Let us engage the Concord!

Re: Problems with regex grouping
by abstracts (Hermit) on Aug 17, 2001 at 21:45 UTC
    Hello

    The regexp you want is:

        my @ar = (); # empty array
        while (<>) {
            # for every match in the line
            # (if there are more than one email in a line)
            while (/<a href="(mailto:)?([^"]*)">([^<]*)<\/a>/g) {
                push @ar, "$3\t$2";
            }
        }
        open ...
        for (sort @ar) {
            my ($name, $email) = split /\t/;
            print OUT qq(<a href="mailto:$email">$name</a><br />\n);
        }

    Update: If you have only one link per line, you can do this:

        while (<>) {
            if (/<a href="(mailto:)?([^"]*)">([^<]*)<\/a>/) {
                push @ar, "$3\t$2";
            } else {
                warn "Err: $_\n";
            }
        }
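For anyone wondering why the commented-out attempt in the question did not work, here is a minimal sketch comparing it with the pattern above. The sample links and the helper names lazy_address and greedy_address are made up for illustration. The lazy (mailto:)?? prefers to match nothing, and the (.*) that follows will happily swallow "mailto:" itself, so the prefix ends up inside the address. A greedy (mailto:)? grabs the prefix whenever it is present, and [^"]* stops at the closing quote.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pattern from the question's commented-out line (lazy optional group).
sub lazy_address {
    my ($line) = @_;
    return $line =~ /<a href="(mailto:)??(.*)">(.*)<\/a><br>/ ? $2 : undef;
}

# Pattern from the reply (greedy optional group, negated character classes).
sub greedy_address {
    my ($line) = @_;
    return $line =~ /<a href="(mailto:)?([^"]*)">([^<]*)<\/a>/ ? $2 : undef;
}

# Hypothetical sample input: one link with mailto:, one without.
my @lines = (
    '<a href="mailto:alice@example.com">Alice</a><br>',
    '<a href="bob@example.com">Bob</a><br>',
);

for my $line (@lines) {
    print "lazy:   ", lazy_address($line),   "\n";
    print "greedy: ", greedy_address($line), "\n";
}
```

On the first sample line the lazy version should report the address with "mailto:" still attached, while the greedy version reports the bare address; on the second line both agree.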

    Hope this helps,,,

    Aziz,,,

      To avoid putting something in $1 that you don't need, regexes have a great feature called clustering: (?:stuff here) groups without capturing :). Just change
      /<a href="(mailto:)?([^"]*)">([^<"]*)<\/a>/g
      to

      /<a href="(?:mailto:)?([^"]*)">([^<"]*)<\/a>/g

      And that way the mailto: prefix is no longer captured, so the address lands in $1 and the link text in $2.
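One thing to watch when switching to clustering is that the capture numbers shift down by one. The sketch below runs both patterns on a made-up sample link; the helper names with_capture and with_cluster are only for illustration. Code that pushed "$3\t$2" with the capturing version would need "$2\t$1" with the clustering one.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Capturing version: $1 is the optional prefix, $2 the address, $3 the name.
sub with_capture {
    my ($line) = @_;
    return unless $line =~ /<a href="(mailto:)?([^"]*)">([^<"]*)<\/a>/;
    return ($3, $2);    # (name, address)
}

# Clustering version: the prefix is grouped but not captured,
# so the address is $1 and the name is $2.
sub with_cluster {
    my ($line) = @_;
    return unless $line =~ /<a href="(?:mailto:)?([^"]*)">([^<"]*)<\/a>/;
    return ($2, $1);    # (name, address)
}

# Hypothetical sample input.
my $line = '<a href="mailto:alice@example.com">Alice</a>';

print join("\t", with_capture($line)), "\n";
print join("\t", with_cluster($line)), "\n";
```

Both calls should return the same name and address pair; only the variable numbering inside the subs differs.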

      You may also want to consider chomping the captured address and link text, because a newline could get in there somewhere and cause some trouble.
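On the chomping suggestion: perlvar documents the numbered capture variables as read-only, so the chomp has to be done on copies. Here is a minimal sketch, assuming a made-up input where the link text carries a trailing newline (which [^<"]* will happily match, for example when the file is not read line by line). The helper name clean_link is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns (name, address) with trailing newlines removed, or an empty list.
sub clean_link {
    my ($text) = @_;
    return unless $text =~ /<a href="(?:mailto:)?([^"]*)">([^<"]*)<\/a>/;
    my ($email, $name) = ($1, $2);    # copy first: $1 and $2 are read-only
    chomp($email, $name);             # chomp accepts a list of variables
    return ($name, $email);
}

# Hypothetical sample: the link text is followed by a newline before </a>.
my $text = '<a href="alice@example.com">Alice' . "\n" . '</a>';
my ($name, $email) = clean_link($text);
print "$name\t$email\n";
```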

      Hope I helped out. :):):) Later

      $_.=($=+(6<<1));print(chr(my$a=$_));$^H=$_+$_;$_=$^H; print chr($_-39); # Easy but its ok.
(dkubb) Re: (1) Problems with regex grouping
by dkubb (Deacon) on Aug 17, 2001 at 23:29 UTC

    Have you thought about taking a different approach and using CPAN modules to do the work for you? Since you are searching for email addresses in a string, this looks like the perfect job for Email::Find.

    Here is some working sample code to demonstrate how it's used.