For testing these email addresses, you could try Regexp::Pattern::Email.

Thx, kcott, it seems to do the trick:

$ ./2.email.kcott.pl 
NOK: |Elmer Fudd
|
NOK: |Daffy Duck
|
NOK: |Alternate
|
NOK: |Phone
|
NOK: |No
|
NOK: |7/13/2017
|
NOK: |Yes
|
NOK: |9/09/2006
|
OK:  |daffy@gmail.com
|
OK:  |Elmer.am@gmail.com
|
NOK: |12/5/2019
|
OK:  |бесполезное.использование.кота@gmail.com
|
OK:  |kobernIU@hotmail.comp
|
OK:  |drüben@msn.com
|
OK:  |manilow@barry76@gmail.com
|
OK:  |moc.liamg@نالی بلی
|
OK:  |時髦的貓@gmail.com
|
OK:  |pen@ничего.net 
|
OK:  |last@nothing.nyet
|
NOK: |
|
cardinality: 10
Elmer.am@gmail.com
 daffy@gmail.com
 drüben@msn.com
 kobernIU@hotmail.comp
 last@nothing.nyet
 manilow@barry76@gmail.com
 moc.liamg@نالی بلی
 pen@ничего.net 
 бесполезное.использование.кота@gmail.com
 時髦的貓@gmail.com

$ cat 2.email.kcott.pl
#!/usr/bin/perl
use v5.028;    # strictness implied
use warnings;
use Path::Tiny;
binmode STDOUT, ":utf8";

# to install: cpanm Regexp::Pattern::Email
use Regexp::Pattern;

my $file_in  = path("/home/pi/Documents/curate/1.sscce.email.txt");
my $file_out = path('/home/pi/Documents/curate/1.kcott.email.output.txt');

my @addrs = $file_in->lines_utf8;
my @matching;

for my $addr (@addrs) {
    if ( $addr =~ re("Email::email_address") ) {
        say "OK:  |$addr|";
        push( @matching, $addr );
    }
    else {
        say "NOK: |$addr|";
    }
}

@matching = sort(@matching);
say "cardinality: ", scalar @matching;
my $string = join( " ", @matching );
say "$string";
$file_out->spew_utf8($string);
__END__
$

This seems to accomplish its task, but it had a side effect on this platform that I'm struggling to understand. The output was to be written with Path::Tiny. Every time I ran the script, I ended up with the proper output file plus a phantom file like:

1.kcott.email.output.txt93601288741312

It was of zero size and appeared in my file explorer (I don't even know what the file explorer is called on this Raspberry Pi, even after looking through its menus). When I selected these files and hit the delete key, I got:

1.kcott.email.output.txt323160262002: Error when getting information for file “/home/pi/Documents/curate/1.kcott.email.output.txt323160262002”: No such file or directory
1.kcott.email.output.txt3642662573981: Error when getting information for file “/home/pi/Documents/curate/1.kcott.email.output.txt3642662573981”: No such file or directory
1.kcott.email.output.txt35531339026259: Error when getting information for file “/home/pi/Documents/curate/1.kcott.email.output.txt35531339026259”: No such file or directory
1.kcott.email.output.txt35631638814375: Error when getting information for file “/home/pi/Documents/curate/1.kcott.email.output.txt35631638814375”: No such file or directory
1.kcott.email.output.txt93601288741312: Error when getting information for file “/home/pi/Documents/curate/1.kcott.email.output.txt93601288741312”: No such file or directory

Meanwhile, ls -al in the terminal showed no trace of them. I took a screenshot to prove to myself that it was happening.

Is there an io layer going on that I'm not accounting for?
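One possible explanation, which I have not verified on this machine: the Path::Tiny documentation says that spew writes to a temporary file in the same directory and then renames it over the target. The phantom names do look like the output path with digits appended, so the file explorer may simply be catching the short-lived temporary file before the rename. Below is a minimal sketch of that write-then-rename technique in core Perl. The helper atomic_spew and its naming scheme are my own assumptions for illustration, not Path::Tiny's actual code.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Spec;

# Hypothetical helper, not Path::Tiny's actual code: write to a temporary
# name next to the target, then rename it into place.
sub atomic_spew {
    my ( $path, $data ) = @_;
    my $tmp = $path . $$ . int( rand( 2**31 ) );    # assumed naming: path + PID + random number
    open( my $fh, '>:encoding(UTF-8)', $tmp ) or die "open $tmp: $!";
    print {$fh} $data;
    close($fh) or die "close $tmp: $!";
    rename( $tmp, $path ) or die "rename $tmp: $!";
    return $tmp;    # returned only so the caller can inspect the name
}

# Work in a throwaway directory so nothing real is touched.
my $dir    = tempdir( CLEANUP => 1 );
my $target = File::Spec->catfile( $dir, '1.kcott.email.output.txt' );
my $tmp    = atomic_spew( $target, "daffy\@gmail.com\n" );

print "temporary name: $tmp\n";
print "temporary file still there? ", ( -e $tmp    ? "yes" : "no" ), "\n";
print "target exists? ",              ( -e $target ? "yes" : "no" ), "\n";
```

If this guess is right, the temporary name exists only between the open and the rename, which would fit both the zero size and the "No such file or directory" errors on delete.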

Anyways, the world will keep spinning despite this. Curious as I am, I took a look inside Regexp-Pattern-Email/source/lib/Regexp/Pattern/Email.pm

How on earth could anyone or anything figure out what is going on in the regex that lies in the middle of this otherwise short module:

pat => qr((?:(?^:(?:(?^:(?>(?^:(?^:(?>(?^:(?>(?^:(?>(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*))|\.|\s*"(?^:(?^:[^\\"])|(?^:\\(?^:[^\x0A\x0D])))+"\s*))+))|(?>(?^:(?^:(?>(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*))|(?^:(?>(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*"(?^:(?^:[^\\"])|(?^:\\(?^:[^\x0A\x0D])))*"(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*)))+))?)(?^:(?>(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*<(?^:(?^:(?^:(?>(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*(?^:(?>[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+)*))(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*))|(?^:(?>(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*"(?^:(?^:[^\\"])|(?^:\\(?^:[^\x0A\x0D])))*"(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*)))\@(?^:(?^:(?>(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*(?^:(?>[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+)*))(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*))|(?^:(?>(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*\[(?:\s*(?^:(?^:[^\[\]\\])|(?^:\\(?^:[^\x0A\x0D]))))*\s*\](?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*))))>(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*)))|(?^:(?^:(?^:(?>(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*(?^:(?>[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+)*))(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*))|(?^:(?>(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*"(?^:(?^:[^\\"])|(?^:\\(?^:[^\x0A\x0D])))*"(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*)))\@(?^:(?^:(?>(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*(?^:(?>[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+)*))(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*))|(?^:(?>(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*\[(?:\s*(?^:(?^:[^\[\]\\])|(?^:\\(?^:[^\x0A\x0D]))))*\s*\](?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*)))))(?>(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))*)))),

Why does this have to be so complicated?
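Part of the answer may be that nobody typed it in this form. This is a guess on my part, and the module's documentation would be the place to confirm where the pattern comes from: the repeated (?^: ... ) wrappers are what Perl emits when one qr// object is interpolated into another, so the dump looks like the stringified result of assembling small named pieces. A minimal sketch of that technique follows. The variable names are hypothetical, the character classes are copied from the dump, and the comment and whitespace handling that makes up most of the real pattern is left out.

```perl
use strict;
use warnings;

# Hypothetical piece names; the character classes are copied from the dump.
my $atom     = qr/[^\x00-\x1F\x7F()<>\[\]:;\@\\,."\s]+/;    # run of "ordinary" characters
my $dot_atom = qr/$atom(?:\.$atom)*/;                       # atoms joined by dots
my $quoted   = qr/"(?:[^\\"]|\\[^\x0A\x0D])*"/;             # "quoted string"
my $literal  = qr/\[(?:[^\[\]\\]|\\[^\x0A\x0D])*\]/;        # [domain literal]
my $local    = qr/(?:$dot_atom|$quoted)/;
my $domain   = qr/(?:$dot_atom|$literal)/;
my $addr     = qr/$local\@$domain/;

# Stringifying the final object shows every nested piece expanded in place.
print "$addr\n";
print length("$addr"), " characters\n";

for my $candidate ( 'daffy@gmail.com', 'Elmer Fudd', 'manilow@barry76@gmail.com' ) {
    my $loose  = $candidate =~ $addr       ? 'match' : 'no match';
    my $strict = $candidate =~ /\A$addr\z/ ? 'match' : 'no match';
    print "$candidate: unanchored=$loose anchored=$strict\n";
}
```

Seven short lines already print as a dense one-liner, and each extra layer multiplies the size. The loop also suggests why manilow@barry76@gmail.com came out OK in my run: an unanchored pattern only has to find an address somewhere in the line, not match the whole line.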


In reply to Re^2: regex for unicode email addresses by Aldebaran
in thread regex for unicode email addresses by Aldebaran
