BlueLines has asked for the wisdom of the Perl Monks concerning the following question:
As part of my current contract, I've got to deal with a text field that contains email addresses. These addresses should be separated by whitespace, commas, or combinations thereof (it's going to be used by salespeople, so simply telling them to follow a specific format is out of the question). Here's some example input:
foo@foo.com bar@bar.com, foo@foobar.com
bar@foo.com , bar@foobar.com
etc. I struggled with a regex for an hour or so, and came up with this:
my $addresses = $form_fields->{addresses};
my @addresses = map( $_ =~/(\S+)/, split /\b(?:\s+|(?:\s*,\s*))\b/, $addresses);
where $addresses contains the data POST'ed from the textarea. I feel like this should be possible without the map, but I couldn't make it work (any regex I came up with would choke on leading / trailing whitespace). Anyone care to turn this into one regex?
BlueLines
Disclaimer: This post may contain inaccurate information, be habit forming, cause atomic warfare between peaceful countries, speed up male pattern baldness, interfere with your cable reception, exile you from certain third world countries, ruin your marriage, and generally spoil your day. No batteries included, no strings attached, your mileage may vary.
Re: Parsing data that may or may not be CSV
by gav^ (Curate) on Apr 19, 2002 at 03:10 UTC
my $data = join '', <DATA>;
$data =~ s/,/ /g;
my @addresses = $data =~ /(\S+)/g;
print join "\n", @addresses;
__DATA__
foo@foo.com bar@bar.com, foo@foobar.com
bar@foo.com , bar@foobar.com
This should work fine, as email addresses shouldn't contain a comma.
update: thanks to dws for suggesting to substitute a space for a comma.
gav^
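The substitute-then-match idea above can probably be folded into a single global match by treating commas as separators directly. This is a minimal sketch, assuming (as the post does) that no address contains a comma; the sample string is illustrative and modelled on the input in the question.

```perl
use strict;
use warnings;

# Illustrative input with leading/trailing whitespace and mixed separators.
my $data = ' foo@foo.com bar@bar.com, foo@foobar.com
bar@foo.com , bar@foobar.com ';

# In list context a /g match returns every match. The class [^\s,]
# means "neither whitespace nor a comma", so each maximal run of such
# characters is taken as one address.
my @addresses = $data =~ /[^\s,]+/g;

print "$_\n" for @addresses;
```

Because this collects the tokens rather than splitting on the separators, leading or trailing whitespace should never produce an empty element.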
Re: Parsing data that may or may not be CSV
by dws (Chancellor) on Apr 19, 2002 at 02:55 UTC
Try a different split regexp. This one seems to work, though it may be just a step in the right direction:
while ( <DATA> ) {
print join "\n", split /(?:\s+,?\s*|,\s*)/, $_;
print "\n";
}
__DATA__
foo@foo.com bar@bar.com, foo@foobar.com
bar@foo.com , bar@foobar.com
use Data::Dumper;

my @addrs;
# Collect the fields from every input line into one array.
while ( <DATA> ) {
    push @addrs, split /(?:\s+,?\s*|,\s*)/, $_;
}
print Dumper \@addrs;
__DATA__
foo@foo.com bar@bar.com, foo@foobar.com
bar@foo.com , bar@foobar.com
With leading whitespace in the input, this produces the following output:
$VAR1 = [
'',
'foo@foo.com',
'bar@bar.com',
'foo@foobar.com',
'bar@foo.com',
'bar@foobar.com'
];
That empty entry at $addrs[0] is what I'm trying to avoid.
BlueLines
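The empty first element comes from how split treats the two ends of the string: a separator that matches at the very start yields an empty leading field, while trailing empty fields are dropped by default. Here is a minimal sketch of two common workarounds; the sample input and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Illustrative input that starts and ends with whitespace.
my $line = ' foo@foo.com bar@bar.com, foo@foobar.com ';

# The separator matches at position 0, so split returns '' first.
# The trailing empty field is removed by split itself.
my @raw = split /[\s,]+/, $line;

# Workaround 1: drop empty fields after splitting.
my @filtered = grep { length } @raw;

# Workaround 2: strip leading separators before splitting.
(my $trimmed = $line) =~ s/^[\s,]+//;
my @split_trimmed = split /[\s,]+/, $trimmed;

print scalar(@raw), " raw fields, ", scalar(@filtered), " after grep\n";
```

Both workarounds take an extra step beyond a single split, which is why a global match on the non-separator characters is often the simpler route.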
Re: Parsing data that may or may not be CSV
by Zaxo (Archbishop) on Apr 19, 2002 at 03:12 UTC
Parsing email addresses with a regex is not easy; Mail::Address should help.
For splitting the list,
my @addresses = split /(?:\s|,)+/, $cgi->param('addresses');
The (?:..) construct is for grouping only, without setting $1. It's used to prevent split from inserting separators as extra array elements. A character class of whitespace and comma could be used instead.
After Compline, Zaxo
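The difference between capturing and non-capturing groups in a split pattern, and the character-class alternative mentioned above, can be sketched as follows. The input string and variable names are illustrative only.

```perl
use strict;
use warnings;

my $list = 'foo@foo.com, bar@bar.com';

# Capturing group: split also returns what the group last matched,
# so a separator shows up between the two addresses.
my @with_seps = split /(\s|,)+/, $list;

# Non-capturing group: only the fields come back.
my @grouped = split /(?:\s|,)+/, $list;

# Character class of whitespace and comma: no grouping needed.
my @classed = split /[\s,]+/, $list;

print scalar(@with_seps), " vs ", scalar(@grouped), " vs ", scalar(@classed), "\n";
```

The character class and the non-capturing alternation should behave the same here; the class is simply shorter to read.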
This doesn't work either:
#!/usr/bin/perl
use Data::Dumper;
my $addresses = ' foo@foo.com bar@bar.com, foo@foobar.com
bar@foo.com , bar@foobar.com ';
my @addresses = split /(?:\s|,)+/, $addresses;
print Dumper \@addresses;
produces:
$VAR1 = [
'',
'foo@foo.com',
'bar@bar.com',
'foo@foobar.com',
'bar@foo.com',
'bar@foobar.com'
];
I agree that parsing email addresses via a regex is next to impossible, but fortunately for me I'm not required to verify the addresses :-)
BlueLines