Greetings, monks.
Recently, while reinstalling Windows XP on a home machine, I forgot to back up the Thunderbird address book (oops). But since Thunderbird stores mail in mbox-format files, I suspected Perl could help. The following code is the result of that suspicion: it grabs the From: and To: headers, parses them for email addresses (and associated names), and generates a CSV file that Thunderbird can import as an address book.
```perl
#!perl -w
#
# Parse an mbox file (Thunderbird et al.) and extract email addresses (and names, if any).
# Generate a CSV file suitable for import into Thunderbird's Address Book.
# Written because /somebody/ deleted the address book :D
# bjp 2008-12-21
#
use strict;

my %owner;    # email address => display name

# Thunderbird address-book CSV columns. The header line is printed with a
# trailing comma (one extra empty column), and every data row is built with
# the same number of fields so the columns always line up.
my @columns = (
    'First Name',     'Last Name',       'Display Name',   'Nickname',
    'Primary Email',  'Secondary Email', 'Work Phone',     'Home Phone',
    'Fax Number',     'Pager Number',    'Mobile Number',  'Home Address',
    'Home Address 2', 'Home City',       'Home State',     'Home ZipCode',
    'Home Country',   'Work Address',    'Work Address 2', 'Work City',
    'Work State',     'Work ZipCode',    'Work Country',   'Job Title',
    'Department',     'Organization',    'Web Page 1',     'Web Page 2',
    'Birth Year',     'Birth Month',     'Birth Day',      'Custom 1',
    'Custom 2',       'Custom 3',        'Custom 4',       'Notes',
);

my $input_mbox = $ARGV[0] || die "Usage: $0 <mbox file>\n";

my $to_buf    = '';
my $within_to = 0;

open( my $fh, '<', $input_mbox ) or die "open $input_mbox: $!\n";
while (<$fh>) {
    chomp;
    s/\r\z//;    # tolerate CRLF line endings

    # A folded (continued) To: line starts with whitespace. Any other line,
    # including the blank line that ends the headers, closes the To: header.
    if ( $within_to && !/^\s/ ) {
        $within_to = 0;
        parse_save_addrs($to_buf) if length $to_buf;
        $to_buf = '';
    }

    if (/^From:\s*(.*)/) {
        parse_save_addrs($1);
    }
    elsif (/^To:\s*(.*)/) {
        $within_to = 1;
        $to_buf    = $1;
    }
    elsif ($within_to) {
        s/^\s+//;
        $to_buf .= " $_";    # unfold with a space so names are not glued together
    }
}
# Flush a To: header that ran up to the end of the file.
parse_save_addrs($to_buf) if length $to_buf;
close $fh;

print join( ',', @columns, '' ), "\n";
for my $email ( sort keys %owner ) {
    my @row = ('') x @columns;
    @row[ 2, 4 ] = ( $owner{$email}, $email );    # Display Name, Primary Email
    print join( ',', @row, '' ), "\n";
}

sub strip_lt_white {
    my ($str) = @_;
    $str =~ s/^\s+//;
    $str =~ s/\s+$//;
    return $str;
}

sub parse_save_addrs {
    my ($addrs) = @_;
    my @list = split /[,;]/, $addrs;
    for my $entry (@list) {
        $entry = strip_lt_white($entry);
        next unless length $entry;    # e.g. a trailing comma
        if ($entry =~ /
              (?:
                 ['"]*          # optional quotes
                 ([^<'"]*)      # anything up to end quotes or email delim
                 ['"]*          # optional quotes
                 \s*            # optional whitespace
                 <(.+)>         # email in lt-gt delimiters
              )
              |
              ([-+._a-z0-9]+\@[-.a-z0-9]+)   # bare address ('-' first, so it is literal)
            /xi)
        {
            my $name  = $1 || '';
            my $email = $2 || $3;
            # skip junk
            next if $email =~ /[<>=]/ || $email =~ /mailto:/i;
            $name  = strip_lt_white($name);
            $email = strip_lt_white($email);
            $owner{$email} ||= $name;
        }
        else {
            print STDERR "Entry parse failed: $entry\n";
        }
    }
}
```
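One alternative to tracking `$within_to` by hand is to treat each message's header block as a unit: read from an mbox `From ` separator line to the first blank line, glue folded continuation lines back on with a space, and ignore the message body, so `From:`/`To:` lines quoted inside forwarded mail are not collected. This is a sketch that assumes the classic mbox layout (one `From ` separator line per message, with body lines starting with `From ` escaped as `>From `); `mbox_address_headers` is a hypothetical helper name, not part of any module.

```perl
use strict;
use warnings;

# Hypothetical helper: return the unfolded values of every From: and To:
# header in an mbox filehandle, skipping message bodies entirely.
# Assumes classic mbox: a "From " separator line starts each message,
# headers end at the first empty line, folded lines begin with whitespace.
sub mbox_address_headers {
    my ($fh) = @_;
    my @values;            # unfolded header values, in file order
    my $in_headers = 0;    # true between a "From " line and the blank line
    my $current;           # [field name, value] of the header being built

    # Save the current header if it is one we want, then forget it.
    my $flush = sub {
        push @values, $current->[1]
            if $current && $current->[0] =~ /^(?:From|To)\z/i;
        undef $current;
    };

    while ( my $line = <$fh> ) {
        $line =~ s/\r?\n\z//;    # strip LF or CRLF line ending
        if ( $line =~ /^From / ) {    # mbox message separator
            $flush->();
            $in_headers = 1;
            next;
        }
        next unless $in_headers;
        if ( $line eq '' ) {          # blank line ends the header block
            $flush->();
            $in_headers = 0;
        }
        elsif ( $line =~ /^\s+(.*)/ && $current ) {    # folded continuation
            $current->[1] .= " $1";
        }
        elsif ( $line =~ /^([\x21-\x39\x3B-\x7E]+):\s*(.*)/ ) {    # new field
            my ( $name, $value ) = ( $1, $2 );
            $flush->();
            $current = [ $name, $value ];
        }
    }
    $flush->();
    return @values;
}
```

With a helper like this, the main loop of the script could shrink to `parse_save_addrs($_) for mbox_address_headers($fh);`, leaving the address parsing unchanged. Adding `Cc` to the field-name pattern would be a one-word change if those addresses are wanted too.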
This was enough to 'get the job done'. But how could I improve that code? Is there any advantage to using a CSV module (e.g. Text::CSV)? Also, using that regex to parse the names and addresses strikes me as fragile. How would you monks attack this problem? Any criticism of the above code is welcome too.
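On the Text::CSV question: the main thing a CSV module buys is quoting. A display name like `Smith, John`, or one containing a double quote, has to be wrapped in quotes (with inner quotes doubled), or it shifts every later column in the row. The same comma is what makes the address regex fragile, because `split /,|;/` cuts `"Smith, John" <js@example.com>` into two broken entries. Here is a core-Perl sketch of both fixes. `split_address_list`, `parse_entry` and `csv_field` are hypothetical helper names, and the parsing deliberately ignores parts of RFC 5322 (comments in parentheses, group syntax, MIME encoded-word names).

```perl
use strict;
use warnings;

# Hypothetical helper: split an address-list header value on commas or
# semicolons that are NOT inside double quotes, so that
# '"Smith, John" <js@example.com>' stays one entry.
sub split_address_list {
    my ($value) = @_;
    # Each entry is a run of quoted strings and non-separator characters.
    my @entries = $value =~ /((?:"(?:[^"\\]|\\.)*"|[^,;"])+)/g;
    s/^\s+|\s+$//g for @entries;    # trim each entry in place
    return grep { length } @entries;
}

# Hypothetical helper: return (name, email) for one entry, or an empty
# list if it does not look like an address. Simplified, not full RFC 5322.
sub parse_entry {
    my ($entry) = @_;
    if ( $entry =~ /^(.*?)\s*<([^<>\s]+\@[^<>\s]+)>\s*$/ ) {
        my ( $name, $email ) = ( $1, $2 );
        $name =~ s/^"(.*)"$/$1/;    # drop surrounding double quotes
        $name =~ s/\\(.)/$1/g;      # undo backslash escapes
        return ( $name, $email );
    }
    return ( '', $1 ) if $entry =~ /^([^<>\s"]+\@[^<>\s"]+)$/;    # bare address
    return;
}

# Hypothetical helper: quote one CSV field only when needed (RFC 4180 style):
# wrap it in double quotes and double any embedded quotes.
sub csv_field {
    my ($f) = @_;
    return $f unless $f =~ /[",\r\n]/;
    $f =~ s/"/""/g;
    return qq{"$f"};
}
```

Each From:/To: value would go through `split_address_list`, each entry through `parse_entry`, and each output row through `join ',', map { csv_field($_) } @row`. For anything beyond a one-off, the CPAN modules Text::CSV (output quoting, plus robust parsing) and Email::Address (RFC 5322 address parsing) cover more edge cases than this sketch; their documentation describes the exact options.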