http://qs1969.pair.com?node_id=11137297


in reply to Splitting in while loop

The usual "don't use a regex when a real parser exists" applies here too. Single quotes are technically a valid character in email addresses, as are commas and spaces when quoted. The following uses Email::Address to correctly parse and split such (admittedly very unusual and not recommended) addresses.

use warnings;
use strict;
use Email::Address;

while (<DATA>) {
    for my $addr (Email::Address->parse($_)) {
        print $addr->address, "\n";
    }
}

__DATA__
'me'@here.com, "West, Casey" <casey@localhost>
"those,foo"@there.com
others@there.com
you@there.com,them@there.com
"Hello, World"@example.com

Outputs:

'me'@here.com
casey@localhost
"those,foo"@there.com
others@there.com
you@there.com
them@there.com
"Hello, World"@example.com
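
For comparison, here is a minimal sketch of what a naive comma split does with the first input line. It uses core Perl only; naive_split is a hypothetical helper invented for this illustration, not something from the original post or from Email::Address.

```perl
use warnings;
use strict;

# Hypothetical helper (an assumption for illustration): treat every comma
# as an address separator, which ignores quoting entirely.
sub naive_split {
    my ($line) = @_;
    return grep { length } split /\s*,\s*/, $line;
}

my $line = q{'me'@here.com, "West, Casey" <casey@localhost>};

# Brackets make the boundaries of each piece visible.
print "[$_]\n" for naive_split($line);
```

This yields three pieces rather than two addresses, because the comma inside the quoted display name "West, Casey" is treated as a separator. That is the kind of breakage a real parser avoids.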

Re^2: Splitting in while loop
by hippo (Bishop) on Oct 07, 2021 at 10:59 UTC
    Single quotes are technically a valid character in email addresses

    For those of you who think this is just pedantry I can assure you that it is not.

    Many years ago, when defensive programming was less widespread than now, a customer of $WORK did indeed have such an email address. It followed the pattern of jack.o'malley@bigcorp.com and caused no end of trouble for various systems which poor, hapless Jack was required to use. It even showed up one or two areas of $WORK's systems where such a potential injection was either mismanaged or misreported. These days Jack should have no trouble with systems from responsible coders, but there are still plenty of slapdash operators out there who will struggle with this, even now.

    If you persist in using home-grown regexen to parse email addresses (or HTML or XML or SQL or ...) then you should be aware that sooner or later it will come back to bite you.


    🦛

      ... email address. It followed the pattern of jack.o'malley@bigcorp.com ...

      Hmm, let me guess: The first name was Robert. ;-)

      Alexander

      --
      Today I will gladly share my knowledge and experience, for there are no sweeter words than "I told you so". ;-)
      Points taken, thanks guys, but in my case I'm just using the single quotes as a quick hack to visually show the limits of the address in the output, e.g. that there are no leading/trailing spaces, etc. In the rare event that I did end up with this kind of thing being output:
      'jack.o'malley@bigcorp.com'
      it wouldn't matter.
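
The quick hack described above might look something like the following sketch. The helper name show_limits and the sample values are assumptions made for illustration; the thread does not show the actual code.

```perl
use warnings;
use strict;

# Hypothetical helper (not from the thread): wrap a value in single quotes
# so that any leading or trailing whitespace becomes visible in the output.
sub show_limits {
    my ($value) = @_;
    return sprintf q{'%s'}, $value;
}

# One address containing an apostrophe, one with stray padding.
print show_limits($_), "\n"
    for q{jack.o'malley@bigcorp.com}, q{ padded@example.com };
```

The quotes here are purely a display aid, so an apostrophe inside the address only makes the output look odd; nothing downstream parses it.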