jcpunk has asked for the wisdom of the Perl Monks concerning the following question:

I need to untaint an email address, and my regex looks right (to my eyes), but it does not work correctly in the program.
if ($tainted =~ /\w{3}[\w.]*\@[\w.]+/) { $untainted = "$1\@$2"; }
I do not see my error, but $untainted always ends up being equal to "@". Any thoughts?
jcpunk

by the way thanks for all the help that was, is, and will be

Re: slightly broken reg-ex
by hardburn (Abbot) on Jul 02, 2003 at 14:32 UTC

    Advice on parsing e-mail addresses: don't. It's a lot more complex than it looks at first glance. The only regex capable of getting anywhere close to right is a few hundred characters long, and even that doesn't handle embedded comments.

    Instead, use Email::Valid (which contains that massive regex already) to check if the address conforms to RFC 822 (which defines the format for e-mail addresses), then use a simple /(.*)/ to untaint it if it passes (that is, if you really have to untaint it).

    Update: About the code you wrote above--you're not capturing any data in the regex, so $1 and $2 won't be set by the match. For future reference, you need to do something like:

    if($str =~ /([\w\.]+)\@([\w\.]+)/) { $untainted = "$1\@$2"; }

    But my comments above still apply.
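
A minimal sketch of the validate-then-untaint approach recommended above, assuming Email::Valid is installed from CPAN (it is not a core module, so it is loaded defensively here). The helper name untaint_email and the sample addresses are hypothetical, and only the basic address() check is used:

```perl
use strict;
use warnings;

# Email::Valid comes from CPAN, so check whether it can be loaded.
my $have_email_valid = eval { require Email::Valid; 1 } ? 1 : 0;

# Hypothetical helper: returns an untainted address, or undef if the
# input is rejected (or if Email::Valid is not available).
sub untaint_email {
    my ($tainted) = @_;
    return undef unless $have_email_valid;

    # address() returns the address if it looks valid, undef otherwise.
    my $checked = Email::Valid->address($tainted);
    return undef unless defined $checked;

    # Only after validation is the permissive capture acceptable;
    # a value taken from a regex capture is what Perl treats as untainted.
    my ($untainted) = $checked =~ /(.*)/;
    return $untainted;
}

for my $candidate ('jcpunk@example.com', 'no-at-sign-here') {
    my $result = untaint_email($candidate);
    print "$candidate => ", (defined $result ? $result : '(rejected)'), "\n";
}
```

The point of the ordering is that /(.*)/ on its own accepts anything, so it only makes sense once some other check has already vouched for the string.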

    ----
    I wanted to explore how Perl's closures can be manipulated, and ended up creating an object system by accident.
    -- Schemer

    Note: All code is untested, unless otherwise stated

      Thanks, I will look into how to use that; it looks pretty simple...
      jcpunk


Re: slightly broken reg-ex
by sauoq (Abbot) on Jul 02, 2003 at 14:34 UTC

    You aren't capturing anything with parens in your regex. So, $1 and $2 are always empty.

    BTW, a regex for valid emails will be much more complex than yours.
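
To illustrate the capturing point, here is a small sketch using only core Perl. The sample address and variable names are made up for the demonstration; the two patterns are the original one and the same pattern with parentheses added:

```perl
use strict;
use warnings;

# Hypothetical sample input; a real program would get this from outside.
my $tainted = 'jcpunk@example.com';

# Original pattern: it matches, but it has no groups, so $1 stays undefined.
my $original_sets_1 = ($tainted =~ /\w{3}[\w.]*\@[\w.]+/ && defined $1) ? 1 : 0;

# Same pattern with capture groups around the local part and the host.
my ($local, $host);
if ($tainted =~ /(\w{3}[\w.]*)\@([\w.]+)/) {
    ($local, $host) = ($1, $2);
}

print "original pattern set \$1: ", ($original_sets_1 ? "yes" : "no"), "\n";
print "captured: $local\@$host\n" if defined $local;
```

With nothing captured, "$1\@$2" interpolates two empty values around the literal @, which is why $untainted came out as just "@".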

    -sauoq
    "My two cents aren't worth a dime.";
    
      Indeed, thanks for looking into that. I'm thinking the valid email module sounds kind of nice (one post up for the link).
      jcpunk


Re: slightly broken reg-ex
by halley (Prior) on Jul 02, 2003 at 14:39 UTC
    I agree with the recommendation for Email::Valid. The issue with your code seems straightforward: there's never any assignment to $1 or $2. Those should be "captured" by parenthesized groups in your regex. You may also want to trim or anchor the expression to ensure you're not clipping out an improper subset of the offered string.
    if ($tainted =~ /
            ^                # don't start mid-string
            \s*              # skip any leading space
            (\w{3}[\w.]*)    # prefix goes to capture 1
            \@
            ([\w.]+)         # hostname goes to capture 2
            \s*              # skip any trailing space
            $                # don't end mid-string
        /x)
    {
        $untainted = "$1\@$2";
    }
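
A rough sketch of why the anchors matter, using only core Perl. The helper names and the hostile-looking sample string are hypothetical; the patterns are the unanchored and anchored versions discussed in this thread:

```perl
use strict;
use warnings;

# Unanchored: happily pulls an address-shaped piece out of a longer string.
sub extract_unanchored {
    my ($str) = @_;
    return $str =~ /(\w{3}[\w.]*)\@([\w.]+)/ ? "$1\@$2" : undef;
}

# Anchored: the whole string (apart from surrounding whitespace) must match.
sub extract_anchored {
    my ($str) = @_;
    return $str =~ /^\s*(\w{3}[\w.]*)\@([\w.]+)\s*$/ ? "$1\@$2" : undef;
}

my $messy = 'x; rm -rf /; abc@example.com | cat';
my $clean = '  abc@example.com  ';

for my $input ($messy, $clean) {
    my $loose  = extract_unanchored($input);
    my $strict = extract_anchored($input);
    print "input:      [$input]\n";
    print "unanchored: ", (defined $loose  ? $loose  : '(no match)'), "\n";
    print "anchored:   ", (defined $strict ? $strict : '(no match)'), "\n";
}
```

The unanchored version quietly accepts the messy input and returns only the part it liked, while the anchored version rejects it outright, which is usually the safer behavior when untainting.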

    --
    [ e d @ h a l l e y . c c ]

      thank you very much for the time you put into that
      jcpunk


Re: slightly broken reg-ex
by Abstraction (Friar) on Jul 02, 2003 at 14:35 UTC
    Give this a shot.

    if ($tainted =~ /(\w{3}[\w.]*)\@([\w.]+)/){ $untainted = "$1\@$2"; }


      Thanks for that, I appreciate the help.
      jcpunk
