Category: Text Processing
Author/Contact Info: Tim Lewis (LewisT@UAH.EDU)
Description:
PINE is a common text-based email viewer on many UNIX systems. The PINE program stores email in large text files, which makes it very handy to archive your old email... except that there's no table of contents at the beginning of the file to let you know what messages are stored there. This script solves that problem by parsing the PINE email store and creating a separate table of contents from the headers of each email. The resulting TOC lists the message number, title, sender info and date in formatted columns. I usually concatenate the TOC and email storage file, and then save the resulting file in my email archives.
Note: This script works very well with version 3.96 of PINE, which I use, but there are other versions that I have not tested it on.
PLEASE comment on this code. I'm a fairly new perl programmer and would appreciate feedback on how to improve my programming.
|
#!/usr/bin/perl
use warnings;
use strict;
# Both file names are required
if (@ARGV < 2) {
    die "Usage: pinetoc inputfile outputfile\n";
}
open (INFILE, "<$ARGV[0]") or die "Could not open input file!\n";
open (OUTFILE, ">$ARGV[1]") or die "Could not open output file!\n";
##### Variables #####
my $From = "";        # Used to store the from address
my $Subject = "";     # Used to store the subject
my $Date = "";        # Used to store the message date
my $LetterNum = 0;    # Counts the number of emails
my $HeaderFlag = -1;  # Flag is < 0 when we're searching for a new email
                      # Flag is >= 0 and < 7 when we're getting header info
                      # Flag is > 6 when we've found all the header info
                      # (From adds 1, Subject adds 2, Date adds 4)
##### Main Loop #####
while (<INFILE>) {
    # Look for a new message (all messages have a header line beginning "X-UIDL: ")
    if (/^X-UIDL: \w{32}/) {
        if ($HeaderFlag > 0) {
            # We haven't got all the header info yet... but we'll write anyway
            &WriteTOCline ($LetterNum, $From, $Subject, $Date, $HeaderFlag);
        }
        $LetterNum++;
        # Clear the message data variables
        $HeaderFlag = 0;
        $From = "";
        $Subject = "";
        $Date = "";
    }
    if ($HeaderFlag < 0) {
        # Do nothing -- already found the header info, so we're searching for a new letter
    }
    elsif ($HeaderFlag < 7) {
        if ($_ =~ "^From:") {
            s/(From: |"|(\[|<)[^\]>]*(\]|>)|\n)//g;  # remove a buncha stuff to isolate the name
            s/^\s*|\s*$//g;                          # remove leading or trailing whitespace
            $From = $_;
            $HeaderFlag += 1;
        }
        elsif ($_ =~ "^Subject:") {
            s/Subject:|\n//g;                        # remove stuff to isolate the subject
            s/^\s*|\s*$//g;                          # remove leading or trailing whitespace
            $Subject = $_;
            if ($Subject eq "") {
                $Subject = "(Blank subject)";
            }
            $HeaderFlag += 2;
        }
        elsif ($_ =~ "^Date:") {
            ($Date) = ($_ =~ /Date: (\w+, \w+ \w+ \w+)/);
            $HeaderFlag += 4;
        }
    }
    else {
        # We've got all the header info
        &WriteTOCline ($LetterNum, $From, $Subject, $Date, $HeaderFlag);
        $HeaderFlag = -1;
    }
}
close INFILE;
close OUTFILE;
exit 0;
##### Subroutine for writing the TOC #####
sub WriteTOCline {
    my($LetterNum, $From, $Subject, $Date, $HeaderFlag) = @_;
    my @Error = ("", "From", "Subject", "", "Date");
    my $ErrorNum = $HeaderFlag ^ 7;   # bits still set here are the missing fields
    if ($ErrorNum > 7) {
        print "Error: Too much header info in letter $LetterNum titled '$Subject'\n";
    }
    elsif ($ErrorNum > 0) {
        print "Error: Missing '$Error[$ErrorNum]' field in message $LetterNum\n";
    }
    # Write to output file (all cases)
    printf OUTFILE "%-4d %-30.30s %-20.20s %-16.16s\n", $LetterNum, $Subject, $From, $Date
        or die "Could not write to output file!\n";
}
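One limit of the `@Error` lookup in `WriteTOCline`: it names a field only when exactly one is missing. If both From and Subject are absent, `$HeaderFlag ^ 7` is 3 and `$Error[3]` is an empty string. A minimal sketch of decoding every missing bit follows; `missing_fields` and `%BitFor` are hypothetical names that are not part of the script, and the bit values are taken from the `+= 1`, `+= 2` and `+= 4` lines above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Bit values match the script: From adds 1, Subject adds 2, Date adds 4.
my %BitFor = (From => 1, Subject => 2, Date => 4);

# missing_fields is a hypothetical helper, not part of the original script.
# It returns the name of every header whose bit is not set in $flag.
sub missing_fields {
    my ($flag) = @_;
    return grep { !($flag & $BitFor{$_}) } qw(From Subject Date);
}

# A flag of 4 means only the Date header was seen.
print join(", ", missing_fields(4)), "\n";
```

This only helps while each bit is set at most once. Because the script adds to the flag with `+=`, a repeated header spills into the other bits; setting bits with `|=` instead would keep each one meaningful, at the cost of no longer detecting duplicates.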
|
(jjhorner)PineTOC
by jjhorner (Hermit) on Aug 01, 2000 at 04:35 UTC
|
Pretty good code, from just a quick peek, but
even though you are declaring your variables, you aren't
checking up on yourself with the warnings and strict pragmas.
Please use them. Even experienced Perl programmers use
them.
"-w" (or "use warnings") and "use strict" are
your friends.
cut-n-paste the following code and run it as your punishment.
#!/usr/bin/perl -w
use strict;
my $i;
for($i = 0; $i < 100; $i++) {
print "I will use strict and warnings.\n";
};
J. J. Horner
Linux, Perl, Apache, Stronghold, Unix
jhorner@knoxlug.org http://www.knoxlug.org/
|
Thanks for your input! I updated my code, and ran my penance program like a good monk. =)
RE: PineTOC
by splinky (Hermit) on Aug 01, 2000 at 08:28 UTC
|
First off, not a bad bit of code. I notice that you're
checking the returns from your opens, which
is a very good thing.
I can't help but wonder why, in WriteTOCline,
you take the two-step approach of sprintf
followed by print instead of just using
printf.
And now, a few more Perlish ways to do a few things:
You can shorten if ($_ =~ /^X-UIDL: \w{32}/) {
to if (/^X-UIDL: \w{32}/) {. The $_
is implied on matches unless another variable is explicitly
used.
All instances of $Variable = $Variable + n
can be shortened to $Variable += n with no
loss of readability to anyone who knows Perl (or C, for that
matter).
Probably the biggest change you could make, and one which
would be very educational for you, would be to read RFC
822, which defines the format of email messages, and use
that knowledge to set $/ to something useful
so that you could slurp up whole messages at a time instead
of reading them one line at a time.
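A minimal sketch of that idea, assuming the store is mbox-style so that every message starts with a "From " separator line (whether a given PINE store follows this exactly is an assumption, not something checked here). `read_messages` and the sample data are hypothetical, for illustration only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Assumption: each message starts with a "From " separator line at the
# beginning of a line. A body line starting with "From " would split a
# message early, so treat this as a sketch rather than a full mbox parser.
sub read_messages {
    my ($fh) = @_;
    local $/ = "\nFrom ";          # one read now returns one whole message
    my @messages;
    while (my $record = <$fh>) {
        chomp $record;             # strips a trailing "\nFrom " separator
        # Only the header block (everything before the first blank line) is needed.
        my ($header) = split /\n\n/, $record, 2;
        next unless defined $header && length $header;
        my %field;
        for my $name (qw(From Subject Date)) {
            # The colon keeps "From:" distinct from the "From " separator line.
            ($field{$name}) = $header =~ /^${name}:[ \t]*(.*)$/m;
        }
        push @messages, \%field;
    }
    return @messages;
}

# Demonstration on an in-memory sample instead of a real mail file.
my $sample = join "\n",
    'From alice@example.com Tue Aug  1 04:35:00 2000',
    'From: Alice <alice@example.com>',
    'Subject: First',
    'Date: Tue, 1 Aug 2000 04:35:00 +0000',
    '',
    'Body one.',
    '',
    'From bob@example.com Tue Aug  1 08:28:00 2000',
    'From: Bob <bob@example.com>',
    'Subject: Second',
    'Date: Tue, 1 Aug 2000 08:28:00 +0000',
    '',
    'Body two.',
    '';
open my $fh, '<', \$sample or die "Could not open sample: $!\n";
my @messages = read_messages($fh);
close $fh;
printf "%-10s %s\n", $_->{Subject}, $_->{From} for @messages;
```

With whole messages in hand, the per-line state flag is no longer needed: a header that fails to match simply leaves its field undefined.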
Finally, I'll rain on your parade a bit by telling you
that you're reinventing the wheel. If you want the
semi-official Perl package for handling email, have a look
at Graham Barr's
MailTools bundle.
*Woof*
|
Thanks for your input. I updated my code based on your comments.
I originally used the "sprintf" followed by "print" because I didn't know the "printf" command would take formatting. Thanks for pointing this out!
I've read parts of RFC822 in the past, but I'm not sure how relevant it would be to this situation. PINE stores its messages with all the RFC headers, true, but is the PINE message store totally 822 compliant? Maybe it is, but I assume that PINE probably changes the formatting of the messages and headers slightly when it stores them. Certainly, the messages in the store don't end in a single period on a line by itself (the signal for the end of an SMTP email). Still, I'm sure you're right in saying that there is a more efficient way to "slurp up whole messages".
Thanks for the reference to MailTools. I'll take a look.
Tally