PerlMonks  

Parsing data that may or may not be CSV

by BlueLines (Hermit)
on Apr 19, 2002 at 02:47 UTC ( [id://160415]=perlquestion )

BlueLines has asked for the wisdom of the Perl Monks concerning the following question:

As part of my current contract, I've got to deal with a text field that contains email addresses. These addresses should be separated by whitespace or commas or combinations thereof (it's going to be used by salespeople, so simply telling them to follow a specific format is out of the question). Here's some example input:
foo@foo.com bar@bar.com, foo@foobar.com bar@foo.com , bar@foobar.com
etc. I struggled with a regex for an hour or so, and came up with this:
    my $addresses = $form_fields->{addresses};
    my @addresses = map( $_ =~/(\S+)/, split /\b(?:\s+|(?:\s*,\s*))\b/, $addresses);
where $addresses contains the data POST'ed from the textarea. I feel like this should be possible without the map, but I couldn't make it work (any regex I came up with would choke on leading / trailing whitespace). Anyone care to turn this into one regex?

BlueLines

Disclaimer: This post may contain inaccurate information, be habit forming, cause atomic warfare between peaceful countries, speed up male pattern baldness, interfere with your cable reception, exile you from certain third world countries, ruin your marriage, and generally spoil your day. No batteries included, no strings attached, your mileage may vary.

Replies are listed 'Best First'.
Re: Parsing data that may or may not be CSV
by gav^ (Curate) on Apr 19, 2002 at 03:10 UTC
    What about:
        my $data = join '', <DATA>;
        $data =~ s/,/ /g;
        my @addresses = $data =~ /(\S+)/g;
        print join "\n", @addresses;
        __DATA__
        foo@foo.com bar@bar.com, foo@foobar.com bar@foo.com , bar@foobar.com
    This should work fine, as email addresses shouldn't contain a comma.
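The same idea can be collapsed into a single match, which is roughly what the original question asked for. The following is only a sketch: it assumes that no address contains whitespace or a comma, and the sample string (with padding added at both ends) is illustrative rather than real form data.

```perl
use strict;
use warnings;

# Sample input modelled on the question, padded with leading and
# trailing whitespace to show that neither causes empty entries.
my $input = '  foo@foo.com bar@bar.com, foo@foobar.com bar@foo.com , bar@foobar.com ';

# In list context, m//g returns every match. Each match is a run of
# characters that are neither whitespace nor commas.
my @addresses = $input =~ /[^\s,]+/g;

print "$_\n" for @addresses;
```

Because this matches the addresses rather than the separators, there is no split involved and so no leading empty field to clean up.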

    update: thanks to dws for suggesting substituting a space for a comma.

    gav^

Re: Parsing data that may or may not be CSV
by dws (Chancellor) on Apr 19, 2002 at 02:55 UTC
    Try a different split regexp. This one seems to work, though it may be just a step in the right direction:
        while ( <DATA> ) {
            print join "\n", split /(?:\s+,?\s*|,\s*)/, $_;
            print "\n";
        }
        __DATA__
        foo@foo.com bar@bar.com, foo@foobar.com bar@foo.com , bar@foobar.com
      This doesn't work:
          use Data::Dumper;
          my @addrs;
          while ( <DATA> ) {
              push @addrs, split /(?:\s+,?\s*|,\s*)/, $_;
          }
          print Dumper \@addrs;
          __DATA__
          foo@foo.com bar@bar.com, foo@foobar.com bar@foo.com , bar@foobar.com
      produces the following output:
          $VAR1 = [
                    '',
                    'foo@foo.com',
                    'bar@bar.com',
                    'foo@foobar.com',
                    'bar@foo.com',
                    'bar@foobar.com'
                  ];
      That empty entry at $addrs[0] is what I'm trying to avoid.
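For context, split produces a leading empty field whenever the separator matches at the very start of the string (for example, when the input begins with whitespace), while trailing empty fields are dropped by default. Two possible workarounds are sketched below. The padded sample string and the variable names are assumptions made for illustration, not part of the original code.

```perl
use strict;
use warnings;

# Illustrative input with leading and trailing whitespace.
my $input = ' foo@foo.com bar@bar.com, foo@foobar.com bar@foo.com , bar@foobar.com ';

# Option 1: keep the split pattern from above and discard empty fields.
my @filtered = grep { length } split /(?:\s+,?\s*|,\s*)/, $input;

# Option 2: turn commas into spaces, then use the special split ' '
# form, which skips leading whitespace and splits on whitespace runs.
(my $spaced = $input) =~ tr/,/ /;
my @awk_style = split ' ', $spaced;

print "$_\n" for @filtered;
```

Option 2 relies on the documented special case of a single-space string as the split pattern. It only applies once commas are out of the way, which is why the tr/// comes first.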

      BlueLines

Re: Parsing data that may or may not be CSV
by Zaxo (Archbishop) on Apr 19, 2002 at 03:12 UTC

    Parsing email addresses with a regex is not easy; Mail::Address should help.

    For splitting the list,

    my @addresses = split /(?:\s|,)+/, $cgi->param('addresses');
    The (?:..) construct is for grouping only, without setting $1. It's used to prevent split from inserting separators as extra array elements. A character class of whitespace and comma could be used instead.
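To make the difference concrete, here is a small sketch comparing a capturing group, the non-capturing group, and the character-class alternative. The two-address sample string is made up for the comparison.

```perl
use strict;
use warnings;

# Hypothetical two-address input with a comma-plus-space separator.
my $input = 'a@example.com, b@example.com';

# Capturing parentheses: split also returns the captured separator text,
# so an extra element appears between the two addresses.
my @with_capture = split /(\s|,)+/, $input;

# Non-capturing group: only the fields themselves are returned.
my @non_capturing = split /(?:\s|,)+/, $input;

# Character class: same separators, no grouping needed.
my @char_class = split /[\s,]+/, $input;

printf "%d %d %d\n",
    scalar @with_capture, scalar @non_capturing, scalar @char_class;
```

Note that none of these patterns avoids a leading empty field if the input starts with a separator. That is a property of split itself rather than of the grouping.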

    After Compline,
    Zaxo

      This doesn't work either:
          #!/usr/bin/perl
          use Data::Dumper;
          my $addresses = ' foo@foo.com bar@bar.com, foo@foobar.com bar@foo.com , bar@foobar.com ';
          my @addresses = split /(?:\s|,)+/, $addresses;
          print Dumper \@addresses;
      produces:
          $VAR1 = [
                    '',
                    'foo@foo.com',
                    'bar@bar.com',
                    'foo@foobar.com',
                    'bar@foo.com',
                    'bar@foobar.com'
                  ];
      I agree that parsing email addresses via a regex is next to impossible, but fortunately for me I'm not required to verify the addresses :-)

      BlueLines

