Parsing a csv file from Exchange

cajun has asked for the wisdom of the Perl Monks concerning the following question:

Re: Parsing a csv file from Exchange by vladb (Vicar) on May 21, 2002 at 00:01 UTC
If you are only interested in pulling out the SMTP email address from the field, you could get it to work this way:

`# ... up to this point,
# you must have read 1 line from the file ...
# and (for the sake of example) saved the email field into the $email_field variable.
my @emails;
while ($email_field =~ m/SMTP:([^\%]+)\%/g) {
    # validate matched email first... (as per UPDATE)
    push @emails, $1;    # add email address to the list
}
# @emails array now contains all SMTP email addresses
# found in the field.`

Here, I look for any substring in the field bounded by the string 'SMTP:' on the left and the '%' character on the right. The `/g` option lets the regular expression pick up successive occurrences of a matched string (one on every loop cycle). You could read more on this and many other features of Perl regular expressions here.

UPDATE: thanks jeffa. The reason I used the `while` loop is to leave room for some email address validation inside the loop for every match, so that only 'correct' email addresses go into the array. Of course, a one-liner would work better if no such thing were required, or if validation happened later on the @emails array.
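The validation step mentioned in the update could look roughly like the sketch below. It assumes a deliberately loose check (no whitespace, exactly one `@`, a dot in the domain part); the `looks_like_email` helper and the sample field are hypothetical and not from the original post, and real address validation is considerably more involved.

```perl
use strict;
use warnings;

# Hypothetical sample field; the second SMTP entry is deliberately malformed.
my $email_field =
    'SMTP:makeusbetter@domain.com%X400:c=US;a= ;p=domain%SMTP:not-an-address%';

# Hypothetical helper: a loose sanity check, not full RFC validation.
sub looks_like_email {
    my ($candidate) = @_;
    return $candidate =~ /\A[^\s\@]+\@[^\s\@]+\.[^\s\@]+\z/;
}

my @emails;
while ($email_field =~ m/SMTP:([^%]+)%/g) {
    my $candidate = $1;    # copy $1 before another match can overwrite it
    push @emails, $candidate if looks_like_email($candidate);
}

print "$_\n" for @emails;
```

Copying `$1` into a lexical before calling the helper matters here, because the helper runs its own regex match.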
(jeffa) 2Re: Parsing a csv file from Exchange by jeffa (Bishop) on May 21, 2002 at 00:12 UTC
No need for the while loop and push, and you want to drop the last % sign in case the entry in question is the last one in the field:

`my @emails = $email_field =~ m/SMTP:([^\%]+)/g;`

Note to cajun: you do know about the Text::CSV CPAN modules, correct?

jeffa
Re: (jeffa) 2Re: Parsing a csv file from Exchange by cajun (Chaplain) on May 21, 2002 at 06:28 UTC
Thanks to all. This works fine as long as SMTP is the first address listed in $email_field. Unfortunately, Bill doesn't always put SMTP first (see my example above). If anything other than SMTP is first, this breaks.
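For the case where SMTP is not the first entry, one possible sketch follows. It assumes entries are separated by `%` and that the last entry may lack a trailing `%`. The sample line is the second data line posted elsewhere in this thread; anchoring on `(?:^|%)` is an assumption added here so that `SMTP:` is only matched at the start of an entry.

```perl
use strict;
use warnings;

# Sample field in which SMTP is the third entry, not the first.
my $email_field =
    'CCMAIL:Hunter, Sandy at domain%MS:domainin/domain/sandy%SMTP:sandyh@domain.com%X400:c=US;a= ;p=domain';

# Match SMTP: only at the start of the field or right after a % separator,
# then capture everything up to the next % (or the end of the field).
my @emails = $email_field =~ m/(?:^|%)SMTP:([^%]+)/g;

print "$_\n" for @emails;
```

Because the match is global and not tied to the start of the string, the position of the SMTP entry within the field should not matter.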
Re: Parsing a csv file from Exchange by greenFox (Vicar) on May 21, 2002 at 03:34 UTC
Others have provided specific regexes to extract the SMTP address; here is a more generic solution which will grab all the fields into a structure:

`while (<DATA>) {
    chomp;
    my %rec = ();
    while (
        /
        ([^:]*)    # any amount of non ":"'s - better as ([A-Z0-9]*)?
        :          # followed by a colon
        ([^%]*)    # any amount of non "%"'s
        %{0,1}     # followed by 0 or 1 % characters
        /xg
    ) {
        $rec{$1} = $2;
    }
    print $rec{SMTP}, "\n";    # for example
}
__DATA__
SMTP:makeusbetter@domain.com%X400:c=US;a= ;p=domain ;o=domain;s=makeusbetter;%CCMAIL:makeusbetter at domain%
CCMAIL:Hunter, Sandy at domain%MS:domainin/domain/sandy%SMTP:sandyh@domain.com%X400:c=US;a= ;p=domain`

The second line of your sample data did not end with a %, which I suspect it normally would, so I made the % character optional. I think for your actual data you could change the regex to `/([^:]*):([^%]*)%/g`.
(jeffa) 2Re: Parsing a csv file from Exchange by jeffa (Bishop) on May 21, 2002 at 14:30 UTC
Very nice, but instead of throwing away the hash on each loop, why not store it in an array?

`use strict;
use Data::Dumper;

# assumes the same __DATA__ section as in the parent post
my @rec;
while (<DATA>) {
    chomp;
    my %rec;
    $rec{$1} = $2 while /([^:]*):([^%]*)%{0,1}/xg;
    push @rec, {%rec};    # store a copy of the hash for this line
}
print Dumper \@rec;
print $rec[0]->{SMTP}, "\n";
print $rec[1]->{SMTP}, "\n";`

jeffa
Re: Parsing a csv file from Exchange by strat (Canon) on May 21, 2002 at 10:12 UTC
I'd just do it with an ordinary split, e.g.

`my %emails = ();
foreach ( split(/\%/, $emailcolumn) ) {
    if ( my ($key, $value) = split(/\s*\:\s*/, $_, 2) ) {
        push( @{ $emails{$key} }, $value );
    }
} # foreach
# do something with %emails`

Then you've got a data structure like the following (a hash of arrays):

`%emails = (
    SMTP   => [ 'makeusbetter@domain.com', ... ],
    X400   => [ 'c=US;a= ;p=domain ;o=domain;s=makeusbetter', ... ],
    smtp   => [ ..., ... ],
    CCMAIL => [ ..., ... ],
);`

Beware that Exchange sometimes delivers both SMTP and smtp addresses. I think that SMTP: is the main address, whereas the smtp: entries are additional addresses, but I am not absolutely sure.
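A runnable sketch of the split approach, showing one way to collect SMTP addresses regardless of key case. The sample column value (including the `alias@domain.com` entry) is hypothetical, and listing the upper-case `SMTP` key first simply follows the unconfirmed guess above that it marks the main address.

```perl
use strict;
use warnings;

# Hypothetical column value mixing upper- and lower-case SMTP keys.
my $emailcolumn =
    'SMTP:makeusbetter@domain.com%smtp:alias@domain.com%CCMAIL:makeusbetter at domain%';

# Build a hash of arrays: address type => list of values.
my %emails;
foreach my $entry (split /%/, $emailcolumn) {
    my ($key, $value) = split /\s*:\s*/, $entry, 2;
    next unless defined $value;    # skip entries without a colon
    push @{ $emails{$key} }, $value;
}

# Collect every SMTP address; ASCII sort puts 'SMTP' before 'smtp'.
my @all_smtp;
for my $key (sort { $a cmp $b } keys %emails) {
    push @all_smtp, @{ $emails{$key} } if lc($key) eq 'smtp';
}

print "$_\n" for @all_smtp;
```

Keeping the original key case in the hash preserves the SMTP/smtp distinction, so the decision about which one counts as primary can be made later.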
Re: Parsing a csv file from Exchange by arunhorne (Pilgrim) on May 21, 2002 at 00:07 UTC
If you only want the SMTP address (which appears in your example to always be transferred using the X400 protocol), then something like the following may form a suitable basis for work:

`# single quotes keep Perl from interpolating @domain inside the string
my $line = 'SMTP:makeusbetter@domain.com%X400:c=US;a= ;p=domain ;o=domain;s=makeusbetter;%CCMAIL:makeusbetter at domain%';
if ($line =~ m/SMTP\:(.*)\%X400/) {
    print "Email address is: $1\n";
}`

Arun