Greetings, monks.

Recently, while reinstalling XP on a home machine, I forgot to back up the Thunderbird address book. Oops. But since Thunderbird stores mail in mbox-format files, I suspected Perl could help. The following code is the result of that suspicion: it grabs the From and To headers, parses them for email addresses (and any associated names), and then generates a CSV file that Thunderbird can import as an address book.

#!perl -w
#
# Parse an mbox file (Thunderbird et al.) and extract email addresses (and names, if any)
# Generate a CSV file suitable for import into Thunderbird's Address Book
# Written because /somebody/ deleted the address book :D
# bjp 2008-12-21
#

use strict;

my %owner;    # email address => display name (may be empty)

# CSV header row expected by Thunderbird's importer; built by concatenation
# so it is still a single line in the output
my $col_line =
    'First Name,Last Name,Display Name,Nickname,Primary Email,Secondary Email,'
  . 'Work Phone,Home Phone,Fax Number,Pager Number,Mobile Number,Home Address,'
  . 'Home Address 2,Home City,Home State,Home ZipCode,Home Country,Work Address,Work Address 2,'
  . 'Work City,Work State,Work ZipCode,Work Country,Job Title,Department,Organization,Web Page 1,'
  . 'Web Page 2,Birth Year,Birth Month,Birth Day,Custom 1,Custom 2,Custom 3,Custom 4,Notes,'
  . "\n";

my $input_mbox = $ARGV[0] || die "Usage: $0 <mbox file>\n";
my $to_buf     = '';
my $within_to  = 0;

open( my $fh, '<', $input_mbox ) or die "open: $!\n";

while (<$fh>) {
    s/\r?\n\z//;    # strip LF or CRLF line endings

    if (/^From:/) {
        s/^From://;
        s/^\s+//;
        parse_save_addrs($_);
    }

    # A To: header ends at the first line that is not a folded continuation
    # (continuations start with whitespace). This also catches the blank line
    # that ends the headers, so body text is never appended to $to_buf.
    if ( $within_to && !/^\s+\S/ ) {
        $within_to = 0;
    }

    if ( /^To:/ || $within_to ) {
        $within_to = 1;
        s/^To://;
        s/^\s+//;
        $to_buf .= $_;
    }

    if ( !$within_to && length $to_buf ) {
        #print "to_buf: $to_buf\n\n";
        parse_save_addrs($to_buf);
        $to_buf = '';
    }
}
parse_save_addrs($to_buf) if length $to_buf;    # file ended inside a To: header
close $fh;

print $col_line;
for ( sort keys %owner ) {
    # note: display names containing commas or quotes are not CSV-quoted here
    print ",,$owner{$_},,$_,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,\n";
}

sub strip_lt_white {
    my ($str) = @_;
    $str =~ s/^\s+//;
    $str =~ s/\s+$//;
    $str;
}

sub parse_save_addrs {
    my ($addrs) = @_;
    my @list = split /,|;/, $addrs;
    for my $entry (@list) {
        $entry =~ s/^\s+//;
        $entry =~ s/\s+$//;
        if ($entry =~ /
            (?:
                (?:['"]*)        # optional quotes
                ([^<'"]*)        # anything up to end quotes or email delim
                (?:['"]*)        # optional quotes
                \s*              # optional whitespace
                <(.+)>           # email in lt-gt delimiters
            )
            |
            # bare address; '-' comes first in each class so it is a literal
            # hyphen rather than a range such as '.-a'
            ([-+._a-z0-9]+\@[-.a-z0-9]+)
            /xi
        ) {
            #no warnings;
            #print "1=[$1] 2=[$2] 3=[$3]\n";
            my $name  = $1 || '';
            my $email = $2 || $3;

            # skip junk
            next if $email =~ /<|>|=/ || $email =~ /mailto:/i;

            $name  = strip_lt_white($name);
            $email = strip_lt_white($email);

            $owner{$email} ||= $name;
            #print "name=[$name], email=[$email]\n";
        }
        else {
            print STDERR "Entry parse failed: $entry\n";
        }
        # print "$entry\n";
    }
}
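One way to restructure the header scan is sketched below, assuming a conventional mbox layout: each message begins with a "From " separator line, the headers end at the first empty line, and body lines that begin with "From " have already been escaped by the mail client. Folded header lines (which, per RFC 5322, start with whitespace) are joined onto the previous header with a single space, so the To: state machine collapses into one pass. header_values is a hypothetical name, and picking up Cc: as well is an assumption about what you would want in the address book.

```perl
use strict;
use warnings;

# Sketch: return the unfolded values of every From:, To: and Cc: header in an
# mbox stream, in file order. Assumes "From " separator lines and blank-line
# header terminators; body text is never inspected.
sub header_values {
    my ($fh) = @_;
    my @values;           # unfolded header values collected so far
    my $in_headers = 0;   # true between a "From " line and the next blank line
    my $current;          # index in @values of the header being unfolded, if any

    while ( defined( my $line = <$fh> ) ) {
        $line =~ s/\r?\n\z//;    # handle both LF and CRLF line endings

        if ( $line =~ /^From / ) {    # mbox message separator
            $in_headers = 1;
            undef $current;
            next;
        }
        next unless $in_headers;

        if ( $line eq '' ) {          # blank line ends the header block
            $in_headers = 0;
            undef $current;
            next;
        }
        if ( $line =~ /^\s+(.*)/ ) {  # folded continuation of previous header
            $values[$current] .= " $1" if defined $current;
            next;
        }
        if ( $line =~ /^(?:From|To|Cc):\s*(.*)/i ) {
            push @values, $1;
            $current = $#values;
        }
        else {
            undef $current;           # a header we are not interested in
        }
    }
    return @values;
}
```

In the original script this would replace the while loop with something like parse_save_addrs($_) for header_values($fh); keeping the header-versus-body decision in one place also stops "To:"-looking lines in message bodies from being harvested.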

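This was enough to get the job done, but how could I improve the code? Is there any advantage to using a CSV module such as Text::CSV, for instance to quote display names that contain commas? Also, parsing the names and addresses with that regex strikes me as fragile: the split on commas and semicolons already breaks a quoted display name like "Smith, John" <john@example.com> into two pieces before the regex ever sees it. How would you monks attack this problem? Any criticism of the code above is welcome too.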
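On both questions, CPAN modules are the usual answer: Text::CSV for writing the file, and Mail::Address or Email::Address for parsing address headers. The core-Perl sketch below only illustrates what such modules handle for you: splitting an address list on commas or semicolons that are outside double quotes and angle brackets, peeling off the display name, and quoting CSV fields that contain commas or quotes in the RFC 4180 style. split_addr_list, name_and_addr and csv_field are hypothetical names, and the splitter deliberately ignores RFC 5322 comments and backslash-escaped quotes.

```perl
use strict;
use warnings;

# Split an address-list header on commas/semicolons that sit outside "..." and
# <...>, so a display name such as "Smith, John" stays with its address.
# Simplified on purpose: RFC 5322 comments (...) and \" escapes are ignored.
sub split_addr_list {
    my ($value) = @_;
    my @parts;
    my $buf      = '';
    my $in_quote = 0;    # currently inside "..."
    my $in_angle = 0;    # currently inside <...>
    for my $ch ( split //, $value ) {
        if    ( $ch eq '"' && !$in_angle ) { $in_quote = !$in_quote }
        elsif ( $ch eq '<' && !$in_quote ) { $in_angle = 1 }
        elsif ( $ch eq '>' && !$in_quote ) { $in_angle = 0 }
        elsif ( ( $ch eq ',' || $ch eq ';' ) && !$in_quote && !$in_angle ) {
            push @parts, $buf;    # separator: close the current entry
            $buf = '';
            next;
        }
        $buf .= $ch;
    }
    push @parts, $buf;

    # trim surrounding whitespace and drop empty entries (e.g. a trailing comma)
    my @trimmed;
    for my $p (@parts) {
        $p =~ s/^\s+//;
        $p =~ s/\s+$//;
        push @trimmed, $p if length $p;
    }
    return @trimmed;
}

# Return (display name, address) for one entry, or an empty list if it looks
# like neither  Name <addr>  nor a bare  user@host  address.
sub name_and_addr {
    my ($entry) = @_;
    if ( $entry =~ /^(.*?)\s*<([^<>]+)>$/ ) {
        my ( $name, $addr ) = ( $1, $2 );
        $name =~ s/^(['"])(.*)\1$/$2/;    # drop one pair of surrounding quotes
        return ( $name, $addr );
    }
    return ( '', $entry ) if $entry =~ /^[^\s<>\@]+\@[^\s<>\@]+$/;
    return;
}

# Quote one CSV field when it contains a comma, quote, CR or LF, doubling any
# embedded quotes (the kind of quoting Text::CSV performs for you).
sub csv_field {
    my ($field) = @_;
    return $field unless $field =~ /[",\r\n]/;
    $field =~ s/"/""/g;
    return qq{"$field"};
}

# Demo: one To: value whose display name contains a comma
for my $entry ( split_addr_list(q{"Smith, John" <john@example.com>, jane@example.org}) ) {
    my ( $name, $addr ) = name_and_addr($entry) or next;
    # first five Thunderbird columns only; a full row would pad the rest
    print join( ',', '', '', csv_field($name), '', csv_field($addr) ), "\n";
}
```

Run as-is, the demo prints ,,"Smith, John",,john@example.com followed by ,,,,jane@example.org, so the comma inside the name survives both the split and the CSV output.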


Life is denied by lack of attention,
whether it be to cleaning windows
or trying to write a masterpiece...
-- Nadia Boulanger

Original post: Reconstructing a Thunderbird address book from an mbox file, by missingthepoint