cajun has asked for the wisdom of the Perl Monks concerning the following question:

I'm trying to parse an exported CSV listing of the addresses from our Exchange server. I can split on the commas and get the various fields, with one exception. One of the fields is Email Addresses, which contains SMTP, MS, CC, and X400 addresses in seemingly no particular order. Here are a couple of samples of this field:

SMTP:makeusbetter@domain.com%X400:c=US;a= ;p=domain ;o=domain;s=makeusbetter;%CCMAIL:makeusbetter at domain%
CCMAIL:Hunter, Sandy at domain%MS:domainin/domain/sandy%SMTP:sandyh@domain.com%X400:c=US;a= ;p=domain

The question is, how can I parse just the portion of the address I want from this field? I'm really only interested in the SMTP address. If there is an easier way of accomplishing this (from Unix), pointers would be appreciated.

Replies are listed 'Best First'.
Re: Parsing a csv file from Exchange
by vladb (Vicar) on May 21, 2002 at 00:01 UTC
    If you are only interested in pulling out the SMTP email address from the field, you could get it to work this way:
    # ... up to this point,
    # you must have read 1 line from the file ...
    # and (for sake of example) the email field saved into the $email_field variable.
    my @emails;
    while ($email_field =~ m/SMTP:([^\%]+)\%/g) {
        # validate matched email first... (as per UPDATE)
        push @emails, $1;   # add email address to the list
    }
    # @emails array now contains all SMTP email addresses
    # found in the field.
    Here, I look for any substring in the field bounded by the string 'SMTP:' on the left and a '%' character on the right. The /g option lets the regular expression pick up successive occurrences of a matched string (one per loop cycle). You could read more on this and many other features of Perl regular expressions here.

    UPDATE: thanks jeffa. The reason I used the while loop is to also possibly include some email address validation inside the loop for every match. Therefore, only 'correct' email addresses would go in the array. Of course, a one-liner would work better if no such thing was required or validation occurred later on the @emails array.
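A minimal sketch of what validating inside the loop might look like. The helper name `looks_like_email` and its deliberately loose pattern are assumptions for illustration only, not real address validation. The pattern here also drops the trailing `%` so that an address at the very end of the field is still picked up.

```perl
use strict;
use warnings;

# Hypothetical helper: a loose sanity check (something@something.something),
# NOT RFC-compliant validation.
sub looks_like_email {
    my ($addr) = @_;
    return $addr =~ /\A[^\s\@]+\@[^\s\@]+\.[^\s\@]+\z/;
}

# Collect every SMTP: entry in the field that passes the check.
sub smtp_addresses {
    my ($email_field) = @_;
    my @emails;
    while ($email_field =~ m/SMTP:([^%]+)/g) {
        my $candidate = $1;    # copy $1 before another regex runs
        push @emails, $candidate if looks_like_email($candidate);
    }
    return @emails;
}

my $field = 'SMTP:makeusbetter@domain.com%X400:c=US;a= ;p=domain ;o=domain;s=makeusbetter;%CCMAIL:makeusbetter at domain%';
print "$_\n" for smtp_addresses($field);
```

If no validation is needed, the loop collapses to a single list-context match.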

    _____________________
    $"=q;grep;;$,=q"grep";for(`find . -name ".saves*~"`){s;$/;;;/(.*-(\d+) +-.*)$/;$_=["ps -e -o pid | "," $2 | "," -v "," "]`@$_`?{print" ++ $1"}:{print"- $1"}&&`rm $1`;print"\n";}
      No need for the while loop and push, and you want to drop the last % sign in case the field in question is the last one:
      my @emails = $email_field =~ m/SMTP:([^\%]+)/g;
      Note to cajun: you do know about the Text::CSV CPAN modules, correct?

      jeffa

      L-LL-L--L-LL-L--L-LL-L--
      -R--R-RR-R--R-RR-R--R-RR
      B--B--B--B--B--B--B--B--
      H---H---H---H---H---H---
      (the triplet paradiddle with high-hat)
      
        Thanks to all. This works fine as long as SMTP is the first address listed in $email_field. Unfortunately Bill doesn't always put SMTP first. (see my example above). If anything other than SMTP is first, this breaks.
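For what it is worth, the pattern itself is not anchored, so it should find `SMTP:` wherever it sits in the string. The sketch below runs the list-context match against both sample fields from the question. The last two lines show one possible explanation for the breakage, which is an assumption and not confirmed in this thread: the second sample contains a comma ("Hunter, Sandy"), so a plain split on commas cuts that field in two before the regex ever sees it.

```perl
use strict;
use warnings;

# The two sample fields from the original question.
my @fields = (
    'SMTP:makeusbetter@domain.com%X400:c=US;a= ;p=domain ;o=domain;s=makeusbetter;%CCMAIL:makeusbetter at domain%',
    'CCMAIL:Hunter, Sandy at domain%MS:domainin/domain/sandy%SMTP:sandyh@domain.com%X400:c=US;a= ;p=domain',
);

my @found;
for my $email_field (@fields) {
    # List-context match with /g returns every captured SMTP address.
    my @emails = $email_field =~ m/SMTP:([^%]+)/g;
    push @found, @emails;
    print "@emails\n";
}

# Assumption: the field was obtained with a naive split on commas.
my @pieces = split /,/, $fields[1];
print scalar(@pieces), " pieces; the first is '$pieces[0]'\n";
```

If that is what is happening, a CSV-aware parser such as the Text::CSV module mentioned above would be the place to look, provided the export quotes fields that contain commas.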
Re: Parsing a csv file from Exchange
by greenFox (Vicar) on May 21, 2002 at 03:34 UTC
    Others have provided specific regexes to extract the SMTP address; here is a more generic solution which will grab all the fields into a structure:
    while (<DATA>){
        chomp;
        my %rec=();
        while ( /
            ([^:]*)   # any amount of non ":"'s -better as ([A-Z0-9]*)?
            :         # followed by a colon
            ([^%]*)   # any amount of non "%"'s
            %{0,1}    # followed by 0 or 1 % characters
            /xg ){
            $rec{$1}=$2;
        }
        print $rec{SMTP}, "\n";   # for example
    }
    __DATA__
    SMTP:makeusbetter@domain.com%X400:c=US;a= ;p=domain ;o=domain;s=makeusbetter;%CCMAIL:makeusbetter at domain%
    CCMAIL:Hunter, Sandy at domain%MS:domainin/domain/sandy%SMTP:sandyh@domain.com%X400:c=US;a= ;p=domain

    The second line of your sample data did not end with a %, which I suspect it normally would, so I made the % char optional. I think for your actual data you could change the re to /([^:]*):([^%]*)%/g
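To illustrate the difference between the optional and the mandatory trailing %, here is a small sketch that parses the second sample line with both patterns. The sub name `parse_field` and the variable names are made up for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: collect KEY:value pairs from one field into a hash,
# using whichever pattern is passed in.
sub parse_field {
    my ($field, $re) = @_;
    my %rec;
    $rec{$1} = $2 while $field =~ /$re/g;
    return \%rec;
}

my $optional  = qr/([^:]*):([^%]*)%?/;   # trailing % may be missing
my $mandatory = qr/([^:]*):([^%]*)%/;    # trailing % required

# Second sample line from the question; it has no % after the X400 entry.
my $line = 'CCMAIL:Hunter, Sandy at domain%MS:domainin/domain/sandy%SMTP:sandyh@domain.com%X400:c=US;a= ;p=domain';

my $loose = parse_field($line, $optional);
my $tight = parse_field($line, $mandatory);

print "optional %:  ", join(', ', sort keys %$loose), "\n";
print "mandatory %: ", join(', ', sort keys %$tight), "\n";
```

With the mandatory form, the unterminated X400 entry at the end of the line is silently dropped, which is harmless if only the SMTP entry matters and every real record ends in %.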

    --
    my $chainsaw = 'Perl';

      Very nice, but instead of throwing away the hash each loop, why not store it in an array?
      use strict;
      use Data::Dumper;

      my @rec;
      while (<DATA>){
          chomp;
          my %rec;
          $rec{$1}=$2 while /([^:]*):([^%]*)%{0,1}/xg;
          push @rec,{%rec};
      }
      print Dumper \@rec;
      print $rec[0]->{SMTP}, "\n";
      print $rec[1]->{SMTP}, "\n";

      jeffa

Re: Parsing a csv file from Exchange
by strat (Canon) on May 21, 2002 at 10:12 UTC
    I'd just do it with an ordinary split... e.g.
    my %emails = ();
    foreach ( split(/\%/, $emailcolumn) ) {
        if ( my ($key, $value) = split(/\s*\:\s*/, $_, 2) ) {
            push ( @{ $emails{$key} }, $value);
        }
    } # foreach
    # do something with %emails
    Then you've got a data structure as the following (Hash of Arrays):
    %emails = (
        SMTP   => [ 'makeusbetter@domain.com', ...],
        X400   => [ 'c=US;a= ;p=domain ;o=domain;s=makeusbetter', ... ],
        smtp   => [ ..., ...],
        CCMAIL => [..., ...]
    );
    Beware that Exchange sometimes delivers both SMTP and smtp addresses. I think that SMTP: is the main address, whereas the smtp: entries are additional addresses, but I am not absolutely sure.
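A sketch of how the SMTP/smtp distinction could be handled on top of the hash of arrays. It assumes, as guessed above and not verified, that the uppercase SMTP: entry is the primary address. The sub names are hypothetical and `alias@domain.com` is made-up sample data. Unlike the loop above, this variant skips entries that contain no colon.

```perl
use strict;
use warnings;

# Split the column on % and build a hash of arrays keyed by address type.
sub parse_addresses {
    my ($emailcolumn) = @_;
    my %emails;
    for my $entry (split /%/, $emailcolumn) {
        my ($key, $value) = split /\s*:\s*/, $entry, 2;
        next unless defined $value;    # skip entries without a colon
        push @{ $emails{$key} }, $value;
    }
    return \%emails;
}

# Hypothetical helper: prefer the first SMTP: entry, fall back to smtp:.
sub primary_smtp {
    my ($emails) = @_;
    for my $key (qw(SMTP smtp)) {
        return $emails->{$key}[0] if $emails->{$key};
    }
    return;
}

my $emails = parse_addresses('smtp:alias@domain.com%SMTP:makeusbetter@domain.com%CCMAIL:makeusbetter at domain%');
print primary_smtp($emails), "\n";
```

Keeping the two keys separate in the hash leaves the choice of which address counts as primary to the caller.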

    Best regards,
    perl -le "s==*F=e=>y~\*martinF~stronat~=>s~[^\w]~~g=>chop,print"

Re: Parsing a csv file from Exchange
by arunhorne (Pilgrim) on May 21, 2002 at 00:07 UTC

    If you only want the SMTP address (which in your examples is always followed directly by the X400 entry), then something like the following may form a suitable basis for work:

    # single quotes keep @domain from being interpolated as an array
    my $line = 'SMTP:makeusbetter@domain.com%X400:c=US;a= ;p=domain ;o=domain;s=makeusbetter;%CCMAIL:makeusbetter at domain%';
    if ($line =~ m/SMTP\:(.*)\%X400/) {
        print "Email address is: $1\n";
    }

    Arun