Re^2: Renaming html email dumps according to sender and date

To add to this excellent reply: you can use the Date::Manip module to convert the extracted date string to YYMMDD format, and so create a date-stamped output filename (after stripping the extra date info in brackets).

Although this module could well be considered overkill for such a simple task, it can deal with a huge variety of input formats, so it is ideal for a situation (like this one) where the incoming data is subject to change outside your control.

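#!/usr/bin/perl -w
use strict;

$\ = "\n";    # output record separator: print appends a newline

use Date::Manip;
# Set the time zone and read ambiguous dates such as 9/8/2004 as day/month
&Date_Init("TZ=GMT","DateFormat=UK"); # or whatever

while (<DATA>) {
    # Parse each sample line and print it as YYMMDD
    print UnixDate($_,"%y%m%d");
}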

__DATA__
Mon,  9 Aug 2004 10:28 AM
9th August 2004 10:28
10:28 9/8/2004 
Aug 9 2004 10:28
10:28 09 August 2004
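The other two steps mentioned above (stripping the bracketed part of the date, and building the output filename) could be sketched roughly as follows. This is only an illustration under assumptions: the helper names `strip_bracketed` and `stamped_name`, the `(BST)` trailer in the sample date, and the `.htm` extension with the sender name placed before the stamp are all hypothetical, since the exact header format is not shown here.

```perl
use strict;
use warnings;

# Hypothetical helper: remove any (...) or [...] trailer from a date string,
# e.g. "Mon,  9 Aug 2004 10:28 AM (BST)" -> "Mon,  9 Aug 2004 10:28 AM".
sub strip_bracketed {
    my ($date) = @_;
    $date =~ s/\s*[\(\[][^\)\]]*[\)\]]//g;   # drop bracketed chunks
    $date =~ s/^\s+|\s+$//g;                 # trim leading/trailing space
    return $date;
}

# Hypothetical helper: combine a sender name and a YYMMDD stamp into a
# filename, replacing runs of non-alphanumeric characters with "_".
sub stamped_name {
    my ($sender, $stamp) = @_;
    (my $safe = $sender) =~ s/[^A-Za-z0-9]+/_/g;
    $safe =~ s/^_+|_+$//g;                   # no leading/trailing underscores
    return $safe . $stamp . '.htm';
}
```

The cleaned string from `strip_bracketed` is what you would hand to `UnixDate($clean, "%y%m%d")`, and the resulting stamp then goes into `stamped_name`.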

My solution by Sigmund (Pilgrim) on Aug 21, 2004 at 16:08 UTC
Hello, brethren! This is what I did after your valuable advice. Now my mailman program correctly parses that messy html and extracts the right fields to use 'em in renaming the file itself according to name and date. You should then check another post o' mine named "Epiphany" that I'll post in the "Meditations" section, as it's related to this script's development process... where I was pretty silly! It also checks for already existing filenames and adds a roman number to distinguish between them. And, WOW, it's strict compliant! ;-) Thanks to everyone, ++ you all! Here's my code:

#!/usr/bin/perl -w
#
# mailman
#
$\="\n";
use strict;
use HTML::TokeParser;
use Date::Manip;
$ARGV[0] || die "\n\tUsage: rp FILENAME\n\n";
my $p = HTML::TokeParser->new("$ARGV[0]") || die "\nthe file $ARGV[0] doesn't exist.\n\n";

Read more... (3 kB)
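Since the rest of the script is not shown, here is a rough sketch of how a check for existing filenames with a Roman-numeral suffix might look. It is not Sigmund's actual code: `to_roman`, `free_name` and the optional `$exists` hook are hypothetical names, the numbering is assumed to start at II for the first duplicate, and the conversion is hand-rolled so that only core Perl is needed.

```perl
use strict;
use warnings;

# Convert a positive integer to a Roman numeral (hand-rolled, core Perl only).
sub to_roman {
    my ($n) = @_;
    my @map = ([1000,'M'], [900,'CM'], [500,'D'], [400,'CD'], [100,'C'],
               [90,'XC'],  [50,'L'],   [40,'XL'], [10,'X'],   [9,'IX'],
               [5,'V'],    [4,'IV'],   [1,'I']);
    my $out = '';
    for my $pair (@map) {
        my ($value, $glyph) = @$pair;
        while ($n >= $value) { $out .= $glyph; $n -= $value; }
    }
    return $out;
}

# Return "$base.htm" if that name is unused, otherwise the first unused name
# among "${base}II.htm", "${base}III.htm", and so on.
# $exists is an optional hook (hypothetical); it defaults to a plain -e test.
sub free_name {
    my ($base, $exists) = @_;
    $exists ||= sub { -e $_[0] };
    my $name = $base . '.htm';
    my $n    = 1;
    while ($exists->($name)) {
        $n++;
        $name = $base . to_roman($n) . '.htm';
    }
    return $name;
}
```

Calling `free_name('filename230804')` before each rename would then give a name that does not clobber an earlier dump from the same sender on the same day.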
Re: My solution by Mr_Jon (Monk) on Aug 23, 2004 at 17:22 UTC
A nice way of adding incremental roman numerals, as you do at the end of your script, would be to use the Math::Roman module (handily mentioned in the Perl Cookbook). This would impose no arbitrary limit on the messages you receive from the same person - it even goes above the 'highest' Roman numeral of 5000.

#!/usr/bin/perl -w
use strict;
use Math::Roman qw(roman);
my $roman = new Math::Roman;
$\ = "\n";
while (<DATA>) {
    chomp;
    my $old_name = $_;
    if ($old_name =~ /^(.+\d{6})(\w+)?(\.htm)$/) {
        my $old_roman = $2 || 1;
        my $new_roman = roman("$old_roman") + 1;
        my $new_name = $1 . $new_roman . $3;
        print "$old_name => $new_name";
    }
}
__DATA__
filename230804.htm
filename230804IV.htm
filename230804V.htm
filename230804II.htm
filename230804X.htm

Output:

filename230804.htm => filename230804II.htm
filename230804IV.htm => filename230804V.htm
filename230804V.htm => filename230804VI.htm
filename230804II.htm => filename230804III.htm
filename230804X.htm => filename230804XI.htm

Of course, you could simply append 'normal' numbers to achieve the same result, but that wouldn't be half as much fun...