$ ./2.email.kcott.pl
NOK: |Elmer Fudd
|
NOK: |Daffy Duck
|
NOK: |Alternate
|
NOK: |Phone
|
NOK: |No
|
NOK: |7/13/2017
|
NOK: |Yes
|
NOK: |9/09/2006
|
OK: |daffy@gmail.com
|
OK: |Elmer.am@gmail.com
|
NOK: |12/5/2019
|
OK: |бесполезное.использование.кота@gmail.com
|
OK: |kobernIU@hotmail.comp
|
OK: |drüben@msn.com
|
OK: |manilow@barry76@gmail.com
|
OK: |moc.liamg@نالی بلی
|
OK: |時髦的貓@gmail.com
|
OK: |pen@ничего.net
|
OK: |last@nothing.nyet
|
NOK: |
|
cardinality: 10
Elmer.am@gmail.com
daffy@gmail.com
drüben@msn.com
kobernIU@hotmail.comp
last@nothing.nyet
manilow@barry76@gmail.com
moc.liamg@نالی بلی
pen@ничего.net
бесполезное.использование.кота@gmail.com
時髦的貓@gmail.com
$ cat 2.email.kcott.pl
#!/usr/bin/perl
use v5.028;    # strictness implied
use warnings;
use Path::Tiny;
binmode STDOUT, ":utf8";

# to install: cpanm Regexp::Pattern::Email
use Regexp::Pattern;
my $file_in  = path("/home/pi/Documents/curate/1.sscce.email.txt");
my $file_out = path('/home/pi/Documents/curate/1.kcott.email.output.txt');
my @addrs    = $file_in->lines_utf8;
my @matching;
for my $addr (@addrs) {
    if ( $addr =~ re("Email::email_address") ) {
        say "OK: |$addr|";
        push( @matching, $addr );
    }
    else {
        say "NOK: |$addr|";
    }
}
@matching = sort(@matching);
say "cardinality: ", scalar @matching;
my $string = join( " ", @matching );
say "$string";
$file_out->spew_utf8($string);
__END__
$
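One detail in the run above: the closing | lands on its own line, and the space-joined list prints one address per line. That would be consistent with each element of @addrs still carrying its trailing newline, since lines_utf8 was called without asking it to chomp. Here is a minimal sketch of the difference, using an in-memory filehandle and made-up sample lines as stand-ins for the real input file. Path::Tiny also documents a chomp option, lines_utf8({ chomp => 1 }), which should have the same effect.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for 1.sscce.email.txt.
my $data = "Elmer Fudd\ndaffy\@gmail.com\n";

# Read lines from an in-memory filehandle; each element keeps its "\n".
open my $fh, '<', \$data or die "cannot open in-memory file: $!";
my @raw = <$fh>;
close $fh;

# chomp on a copy strips the trailing newline from every element.
chomp( my @clean = @raw );

print "raw:   |$_|\n" for @raw;      # the closing | wraps onto the next line
print "clean: |$_|\n" for @clean;    # the closing | stays on the same line
```

With chomped lines, join( " ", @matching ) would also produce a genuinely space-separated string instead of one address per line.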
This seems to accomplish its task, but it has a side effect on this platform that I'm struggling to understand. Output was to be written by Path::Tiny. Every time I ran the script, I ended up with the proper output file plus a phantom file like:
1.kcott.email.output.txt93601288741312
, of zero size, that appeared in my file explorer. I don't even know what to call that on this Raspberry Pi, even having looked through its menus. When I selected the phantom files and hit the delete key, I got:
1.kcott.email.output.txt323160262002: Error when getting information for file “/home/pi/Documents/curate/1.kcott.email.output.txt323160262002”: No such file or directory
1.kcott.email.output.txt3642662573981: Error when getting information for file “/home/pi/Documents/curate/1.kcott.email.output.txt3642662573981”: No such file or directory
1.kcott.email.output.txt35531339026259: Error when getting information for file “/home/pi/Documents/curate/1.kcott.email.output.txt35531339026259”: No such file or directory
1.kcott.email.output.txt35631638814375: Error when getting information for file “/home/pi/Documents/curate/1.kcott.email.output.txt35631638814375”: No such file or directory
1.kcott.email.output.txt93601288741312: Error when getting information for file “/home/pi/Documents/curate/1.kcott.email.output.txt93601288741312”: No such file or directory
, and ls -al in the terminal showed no trace of them. I took a screenshot to prove to myself that it was happening.
Is there an I/O layer at work here that I'm not accounting for?
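A possible explanation, offered as a guess and not a diagnosis: the Path::Tiny documentation says spew writes atomically, meaning the data goes to a temporary file in the same directory, which is then renamed over the target. A file explorer that watches the directory could briefly list that temporary file and then fail to find it, which would match the "No such file or directory" errors. The same documentation describes append with a truncate option as a way to replace a file's contents in place, without the temporary file. The sketch below tries both in a scratch directory; the directory and the file name output.txt are made up for illustration.

```perl
use strict;
use warnings;
use Path::Tiny;

# Scratch directory that is removed when $dir goes out of scope.
my $dir = Path::Tiny->tempdir;
my $out = $dir->child('output.txt');    # hypothetical file name

# spew: documented as writing a temporary file, then renaming it into place.
$out->spew_utf8("first\n");

# append with truncate: documented as replacing the contents in place.
$out->append_utf8( { truncate => 1 }, "second\n" );

# Only the target should remain once the writes have finished.
my @names = map { $_->basename } $dir->children;
print "files left: @names\n";
print "contents:   ", $out->slurp_utf8;
```

If the phantom entries stop appearing when spew_utf8 is swapped for the truncating append, that would support the temporary-file theory. The price is losing the atomic replacement.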
Anyway, the world will keep spinning despite this. Curious as I am, I took a look inside Regexp-Pattern-Email/source/lib/Regexp/Pattern/Email.pm
How on earth could anyone or anything figure out what is going on in the regex that sits in the middle of an otherwise short module:
pat => qr((?:(?^:(?:(?^:(?>(?^:(?^:(?>(?^:(?>(?^:(?>(?^:(?^:(?>\s*\((?
+:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s
++))*[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+(?^:(?^:(?>\s*\((?:\s*(?^:(?^:
+(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*))|\.|\s
+*"(?^:(?^:[^\\"])|(?^:\\(?^:[^\x0A\x0D])))+"\s*))+))|(?>(?^:(?^:(?>(?
+^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*
+\s*\)\s*))|(?>\s+))*[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+(?^:(?^:(?>\s*
+\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(
+?>\s+))*))|(?^:(?>(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(
+?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*"(?^:(?^:[^\\"])|(?^:\\(?^:[^
+\x0A\x0D])))*"(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[
+^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*)))+))?)(?^:(?>(?^:(?^:(?>\s*\((?
+:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s
++))*<(?^:(?^:(?^:(?>(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\
+\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*(?^:(?>[^\x00-\x1F\x7F()<>\
+[\]:;@\\,."\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+)*))(?^:(?^:(?
+>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*
+))|(?>\s+))*))|(?^:(?>(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^
+:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*"(?^:(?^:[^\\"])|(?^:\\(?
+^:[^\x0A\x0D])))*"(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(
+?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*)))\@(?^:(?^:(?>(?^:(?^:(?>\s
+*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|
+(?>\s+))*(?^:(?>[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+(?:\.[^\x00-\x1F\x
+7F()<>\[\]:;@\\,."\s]+)*))(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))
+|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*))|(?^:(?>(?^:(?^:(?>
+\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*)
+)|(?>\s+))*\[(?:\s*(?^:(?^:[^\[\]\\])|(?^:\\(?^:[^\x0A\x0D]))))*\s*\]
+(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|)
+)*\s*\)\s*))|(?>\s+))*))))>(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+)
+)|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*)))|(?^:(?^:(?^:(?>(
+?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))
+*\s*\)\s*))|(?>\s+))*(?^:(?>[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+(?:\.[
+^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]+)*))(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(
+?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*))|(?^:(?
+>(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|
+))*\s*\)\s*))|(?>\s+))*"(?^:(?^:[^\\"])|(?^:\\(?^:[^\x0A\x0D])))*"(?^
+:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\
+s*\)\s*))|(?>\s+))*)))\@(?^:(?^:(?>(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[
+^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*(?^:(?>[^\x0
+0-\x1F\x7F()<>\[\]:;@\\,."\s]+(?:\.[^\x00-\x1F\x7F()<>\[\]:;@\\,."\s]
++)*))(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D
+]))|))*\s*\)\s*))|(?>\s+))*))|(?^:(?>(?^:(?^:(?>\s*\((?:\s*(?^:(?^:(?
+>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))*\[(?:\s*(?
+^:(?^:[^\[\]\\])|(?^:\\(?^:[^\x0A\x0D]))))*\s*\](?^:(?^:(?>\s*\((?:\s
+*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D]))|))*\s*\)\s*))|(?>\s+))
+*)))))(?>(?^:(?>\s*\((?:\s*(?^:(?^:(?>[^()\\]+))|(?^:\\(?^:[^\x0A\x0D
+]))|))*\s*\)\s*))*)))),
Why does this have to be so complicated?
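Part of the answer may be that nobody typed that pattern by hand. Every (?^:...) group is what Perl prints when a compiled qr// object is turned into a string. The deep nesting therefore suggests the pattern was assembled by interpolating many small qr// pieces (comment, atom, quoted string, domain and so on) and then dumped as one string. Here is a toy sketch of that effect, with deliberately simplified, hypothetical pieces that are not the module's actual grammar:

```perl
use strict;
use warnings;

# Hypothetical building blocks, far simpler than the real mail grammar.
my $atom      = qr/[^\s\@.]+/;               # run of characters without space, @ or dot
my $dot_atom  = qr/$atom(?:\.$atom)*/;       # atoms joined by dots
my $addr_spec = qr/$dot_atom\@$dot_atom/;    # local part, @, domain

# Stringifying the top-level object shows every nested piece inline.
my $as_text = "$addr_spec";
print "$as_text\n";
print length($as_text), " characters from three short definitions\n";

# The composed pattern still works as an ordinary regex.
for my $candidate ( 'daffy@gmail.com', 'Elmer Fudd' ) {
    my $verdict = $candidate =~ /\A$addr_spec\z/ ? 'OK' : 'NOK';
    print "$verdict: $candidate\n";
}
```

Each interpolation copies the whole sub-pattern, wrapper included, so a grammar with a dozen named rules that reference one another can balloon into something like the listing above while the source that generates it stays readable.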