uxbod has asked for the wisdom of the Perl Monks concerning the following question:
I am working on a project where I need to extract attachments from an email whose file names are in Simplified Chinese. The problem is that when they are extracted to the file system I end up with names like ????.doc!
I put together a little test script to show what I mean:
    #!/usr/bin/perl
    use MIME::Parser;
    use MIME::Parser::Filer;
    use File::Path;    # provides rmtree

    my $tempdir = "extract";
    ( -d $tempdir ) or mkdir $tempdir, 0755 or die "mkdir: $!";

    my $parser = new MIME::Parser;
    $parser->output_under("/home/uxbod/extract");
    $parser->extract_uuencode(1);

    my $entity = $parser->parse_open("/home/uxbod/testmessage");
    foreach my $part ( $entity->parts_DFS ) {
        # skip container parts that have no body of their own
        next if ( !$part->bodyhandle );
        my $rec_filename = $part->head->recommended_filename;
        my $filename     = $part->bodyhandle->path;
        print "Recommended: $rec_filename Alternative : $filename\n";
    }
    $parser->filer->purge;
    rmtree $tempdir;
and when this runs I see the following output:
    [uxbod@gateway ~]# ./testextract.pl
    ignoring text in character set `GB2312' at /usr/share/perl5/MIME/Parser/Filer.pm line 659
    ignoring text in character set `GB2312' at /usr/share/perl5/MIME/Parser/Filer.pm line 659
    Recommended: =?gb2312?B?MzYw0MLOxbzgsuItMTItMDEtQ2hpIFNpbXAudHh0?= Alternative : /home/uxbod/extract/msg-1321526988-4755-0/1
    Recommended: =?gb2312?B?MzYw0MLChLFPnHktMTItMDEtQ2hpIFRyYWQudHh0?= Alternative : /home/uxbod/extract/msg-1321526988-4755-0/1-1
As you can see, the recommended file names of the last two MIME entities are still RFC 2047 encoded-words in gb2312, and the files themselves are written under generic names. How can I get the correct name onto the file system? If I extract the file through an email client and transfer it across to that system, it does look okay:
-rw-r--r-- 1 uxbod uxbod 34304 Nov 15 10:42 撰稿材料.doc
Any help would be very very much appreciated.
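One possible direction, offered as a minimal sketch and not as the approach taken in the replies: the core Encode module can decode an RFC 2047 encoded-word such as the first "Recommended:" value above through its 'MIME-Header' encoding. The variable names ($raw, $name, $fs_name) are illustrative only, and the sketch assumes the target file system expects UTF-8 file names.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# Encoded-word copied from the first "Recommended:" line of the output above.
my $raw = '=?gb2312?B?MzYw0MLOxbzgsuItMTItMDEtQ2hpIFNpbXAudHh0?=';

# 'MIME-Header' decodes RFC 2047 encoded-words into a Perl character string.
my $name = decode( 'MIME-Header', $raw );

# Assumption: the file system uses UTF-8 names, so turn the characters
# back into UTF-8 bytes before using them as a file name.
my $fs_name = encode( 'UTF-8', $name );

# $fs_name already holds bytes, so print it without an encoding layer.
binmode STDOUT, ':raw';
print "$fs_name\n";
```

In the test script, the same two calls could be applied to $rec_filename inside the loop, and the result used to rename the file found at $part->bodyhandle->path. A name that comes from an email should be sanitised first (for example by stripping any directory separators) before it is used on disk.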
Re: MIME::Parser::Filer and filenames in Simplified Chinese
by zwon (Abbot) on Nov 17, 2011 at 15:57 UTC
by uxbod (Initiate) on Nov 21, 2011 at 16:21 UTC
by Anonymous Monk on Nov 21, 2011 at 16:36 UTC
by uxbod (Initiate) on Nov 21, 2011 at 16:58 UTC
by uxbod (Initiate) on Nov 21, 2011 at 17:10 UTC
by uxbod (Initiate) on Nov 21, 2011 at 17:40 UTC