in reply to Re^2: Renaming html email dumps according to sender and date
in thread Renaming html email dumps according to sender and date

Hello, brethren!
This is what i did after your precious advices.
Now my mailman program correctly parses that messy html and extracts the right fields to use 'em in renaming the file itself according to name and date.
You should then check another post o'mine named "Epiphany" that I'll post in the "Meditations" section, as it's related to this script developing process...something i'm pretty silly!
It also checks for already existing filenames and adds a roman number to distinguish between them.
And, *WOW*, it's strict compliant! ;-)

Thanks to everyone, ++ you all!

Here's my code:
#!/usr/bin/perl -w # # mailman # $\="\n"; use strict; use HTML::TokeParser; use Date::Manip; $ARGV[0] || die "\n\tUsage: rp FILENAME\n\n"; my $p = HTML::TokeParser->new("$ARGV[0]") || die "\nthe file $ARGV[0] +doesn't exist.\n\n";
######################################## # # insert here the list of known names # ######################################## my %nomi=( 'elenaleonardi' => 'ele', 'lellobove' => 'cte', 'dark.prg' => 'mazzini', 'robertopar' => 'parolisi', 'meloinfo' => 'melo', 'massimogab' => 'max', 'simofiore' => 'simone', 'mpagnucci'=> 'pagnucci', 'zeromega'=> 'pagnucci', 'melojunior'=> 'melo', 'mmelillo'=> 'melo', 'marcomelillo'=> 'melo', 'matteobagn'=> 'matteo', 'uncoucou' => 'lorena', 'zappagalattica' => 'zappa', 'gavrilus' => 'zappa', 'carloalbertodue' => 'carlone', 'bugman996' => 'bug', 'michel' => 'ziobudda', 'Bagnoli' => 'matteo', 'salciaiola' => 'stegualerci', ); ######################################## # # hash of main data types # ######################################## my %hash; while (my $t = $p->get_token){ next unless $t->[0] eq 'S' and $t->[1] eq 'th' and defined $t->[2]{'class'} and $t->[2]{'class'} eq 'DatTh'; my $key = $p->get_text('/th'); chop $key; $p->get_tag('td'); my $value = $p->get_text('/td'); $value =~ s/^\s*//; $value =~ s/\s*\[Add\]\s*$//; $value=~s/[AP]M.*$//; $hash{$key} = $value; } my $name=$hash{From}; my $date=$hash{Date}; &Date_Init("TZ=GMT","DateFormat=UK"); $date=UnixDate($date,"%y%m%d"); $name=~s/\@.*//; $name=~s/.*\"//; $name=~s/.*\<//; if (exists $nomi{$name}) { $name=$nomi{$name}; } $name=$name.$date.".html"; ########################################### # # find contemporaries and number'em # using roman numbers upto V. more than V # messages a day from the same folk are # *EVIL* !!! :-P # ########################################### if (-e $name) { $name=~s/.html/II.html/; } if (-e $name) { $name=~s/II.html/III.html/; } if (-e $name) { $name=~s/III.html/IV.html/; } if (-e $name) { $name=~s/IV.html/V.html/; } rename $ARGV[0], $name if (! -e $name) || die "\nToo many! Please rena +me manually.\n\n"; print "\nRenamed as: "; print "$name\n\n";

Replies are listed 'Best First'.
Re: My solution
by Mr_Jon (Monk) on Aug 23, 2004 at 17:22 UTC
    A nice way of adding incremental roman numerals, as you do at the end of your script, would be to use the Math::Roman module (handily mentioned in the Perl Cookbook). This would impose no arbitrary limit on the messages you receive from the same person - it even goes above the 'highest' Roman numeral of 5000.
    #!/usr/bin/perl -w use strict; use Math::Roman qw(roman); my $roman = new Math::Roman; $\ = "\n"; while (<DATA>) { chomp; my $old_name = $_; if ($old_name =~ /^(.+\d{6})(\w+)?(\.htm)$/) { my $old_roman = $2 || 1; my $new_roman = roman("$old_roman") + 1; my $new_name = $1 . $new_roman . $3; print "$old_name => $new_name"; } } __DATA__ filename230804.htm filename230804IV.htm filename230804V.htm filename230804II.htm filename230804X.htm
    Output:
    filename230804.htm => filename230804II.htm filename230804IV.htm => filename230804V.htm filename230804V.htm => filename230804VI.htm filename230804II.htm => filename230804III.htm filename230804X.htm => filename230804XI.htm
    Of course, you could simply append 'normal' numbers to achieve the same result, but that wouldn't be half as much fun...