WilliamDee has asked for the wisdom of the Perl Monks concerning the following question:

Hi PerlMonks, I've got a knotty problem with the MIME::Parser. When extracting a text-file attachment from an email it puts in extra ^M characters on Windows (expected behavior which is spoken of in the documentation).

My question is: is there a way around this behavior by manually tweaking a setting? Or, if necessary, somewhere to tweak or subclass the code (not ideal given that I'm not *that* great with Perl)?

Example of the code I'm using (the email is already written to file - that section of code appears to be working perfectly given that I can extract .jpg and .csv attachments without problems):

...
my $parser = new MIME::Parser;
$parser->output_dir("unpacking");
$parser->parse_open("receiving/message-1.msg");
...

When used with a .csv file it comes out perfectly. Rename the .csv file to a .txt file and resend, the extracted file has the extra ^M's on the end of each line.

Your thoughts and hints will be most appreciated.

Cheers!
William

PS: I'm using ActiveState's ActivePerl, version 5.16.3, built for MSWin32-x86-multi-thread. The binary build is: 1603 [296746].

PPS: In case it is relevant or you're interested, here is the appropriate MIME-part of the two saved message files:

(.csv)
------_=_NextPart_001_01CF26C0.66530897
Content-Type: application/x-msexcel; name="stuff.csv"
Content-Transfer-Encoding: base64
Content-Description: stuff.csv
Content-Disposition: attachment; filename="stuff.csv"

YSxiLGMsZCxlDQphLGIsY2MsZGQsZWVlDQphYSxiYmIsY2NjYyxkZGRkZCxlZWVlZWUNCg==
------_=_NextPart_001_01CF26C0.66530897--

...and...

(.txt)
------_=_NextPart_001_01CF26C0.A0C3FDAB
Content-Type: text/plain; name="stuff.txt"
Content-Transfer-Encoding: base64
Content-Description: stuff.txt
Content-Disposition: attachment; filename="stuff.txt"

YSxiLGMsZCxlDQphLGIsY2MsZGQsZWVlDQphYSxiYmIsY2NjYyxkZGRkZCxlZWVlZWUNCg==
------_=_NextPart_001_01CF26C0.A0C3FDAB--
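
For anyone following along, here is a minimal sketch of what the payload contains. It decodes the base64 body quoted in the two parts (identical in both, with the line-wrap "+" removed) using the core MIME::Base64 module. The second half rests on an assumption that the thread does not confirm: that the extra ^M characters come from a text-mode write on Windows translating each \n into \r\n. The variable names are illustrative only.

```perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);   # core module

# Base64 payload copied from the message parts (same in .csv and .txt).
my $payload = 'YSxiLGMsZCxlDQphLGIsY2MsZGQsZWVlDQphYSxiYmIsY2NjYyxkZGRkZCxlZWVlZWUNCg==';
my $decoded = decode_base64($payload);

# Count the CRLF pairs already present in the decoded bytes.
my $crlf_count = () = $decoded =~ /\r\n/g;

# ASSUMPTION: simulate a Windows text-mode write, where every \n becomes \r\n.
(my $text_mode = $decoded) =~ s/\n/\r\n/g;
my $doubled = () = $text_mode =~ /\r\r\n/g;

printf "CRLF pairs in payload: %d\n", $crlf_count;
printf "CR CR LF runs after simulated text-mode write: %d\n", $doubled;
```

If that assumption holds, the attachment already carries Windows line endings, and saving it as text adds a second CR to each line, while the binary path used for the .csv leaves the bytes alone.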

Replies are listed 'Best First'.
Re: Is it possible to force MIME::Parser to extract text-files on a Windows system without the extra CR's on the end of lines?
by Athanasius (Archbishop) on Feb 11, 2014 at 02:44 UTC

    Hello WilliamDee, and welcome to the Monastery!

    I’m not familiar with MIME::Parser, but it occurs to me that it might be easier to just accept the output as-is, and post-process to remove the unwanted CR characters.

    For example, if you know that an extracted string has no carriage returns that you want to keep, post-processing is as simple as:

    $string =~ s/\r//g;

    If you need to be more precise, you can use a look-ahead assertion to remove only carriage returns that occur immediately before newlines:

    $string =~ s/\r(?=\n)//g;
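
    To make the difference between the two substitutions concrete, here is a small sketch on a made-up sample string (the sample and variable names are illustrative, not taken from a real extracted file). Note that with a doubled CR before the newline, the look-ahead form removes only the CR that directly precedes \n.

```perl
use strict;
use warnings;

# Made-up sample: a doubled CR before LF, a lone CR mid-line, a plain CRLF.
my $sample = "one\r\r\ntwo\rstill two\r\n";

(my $strip_all = $sample) =~ s/\r//g;         # removes every CR
(my $strip_eol = $sample) =~ s/\r(?=\n)//g;   # removes only a CR directly before \n

# Print with CR and LF made visible.
for my $pair ([all => $strip_all], [eol => $strip_eol]) {
    (my $shown = $pair->[1]) =~ s/\r/\\r/g;
    $shown =~ s/\n/\\n/g;
    print "$pair->[0]: $shown\n";
}
```

    The first form also deletes the CR in the middle of the second line, while the second form keeps it and turns the doubled ending into a single CRLF.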

    Hope that helps,

    Athanasius <°(((><contra mundum Iustus alius egestas vitae, eros Piratica,

      Thank you for the welcome and the idea, Athanasius.

      Post-processing only the text files is a possibility if there are no other options. I'll admit that I'm not keen on the thought of slurping large files (2+ megabytes) into memory again and doing a regex replace like the following:

      $fileguts =~ s/\r{2,}\n/\r\n/g;

      That should be reasonably efficient. Your idea does raise another thought though: avoiding the mangling of files which come out of Unix-based systems, where \n would be changed to \r\n. It might be preferable to do something like:

      $fileguts =~ s/\r+\n/\n/g;

      In the interest of not potentially mangling files - for the moment I will continue to hang out in the hope of another, MIME::Parser-based fix. :)
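
      If the slurping is the main worry, the same substitution can be applied one line at a time. The following is only a sketch: normalize_stream is a hypothetical helper name, and in-memory filehandles stand in for the real extracted file and its replacement.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out line by line, collapsing any run of
# CRs before a newline into a single "\n". Both handles are set to raw mode
# so that no further newline translation happens on Windows.
sub normalize_stream {
    my ($in, $out) = @_;
    binmode $in;
    binmode $out;
    while (my $line = <$in>) {
        $line =~ s/\r+\n\z/\n/;
        print {$out} $line;
    }
    return;
}

# Demonstration with in-memory handles standing in for real files.
my $source = "a,b\r\r\nc,d\r\ne,f\n";
my $result = '';
open my $in,  '<', \$source or die "cannot open in-memory input: $!";
open my $out, '>', \$result or die "cannot open in-memory output: $!";
normalize_stream($in, $out);
close $in;
close $out;
```

      Only one line is held in memory at a time, so the size of the attachment matters much less than with the slurp-and-replace approach.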

      Cheers!
      William

      PS: Another possibility might be to change the original MIME message before writing to disk, say from:

      Content-Type: text/plain;

      To:

      Content-Type: application/x-msexcel;

      A bit of an ugly hack to trick MIME::Parser, though probably doable. And it might be preferable to the extra disk-load/regex-replace/disk-save cycle. While I'm not expecting hundreds of files per minute/second, it is best to assume that something like that might happen if an ISP error suddenly causes a surge or someone attempts a DoS/mailbomb attack.

        Thank you Athanasius, I have gone down the path of changing the content-type to something that will extract text/plain as binary files (application/x-msexcel). The code I'm using now is:

        # open the file in raw/binary output for writing
        open MAILOUT, '>:raw', "$receiving/message-$thetime-$popcount.msg"
            or LogWrite("Unable to open message-$thetime-$popcount.msg for writing: $!");
        # get the email into a temporary variable
        my $hold = $pop->HeadAndBody($popcount);
        # force it to use binary saving
        $hold =~ s/text\/plain/application\/x-msexcel/g;
        # write to file
        print MAILOUT $hold;
        # close the file
        close MAILOUT;

        And the text-files extracted by MIME::Parser are now saved without extra \r characters added to them.

        Cheers for the help! :)
        William

Re: Is it possible to force MIME::Parser to extract text-files on a Windows system without the extra CR's on the end of lines?
by Anonymous Monk on Feb 11, 2014 at 02:59 UTC

    Is it possible to force MIME::Parser to extract text-files on a Windows system without the extra CR's on the end of lines?

    Probably not, but I wouldn't even bother looking :) Grab Path::Tiny and/or File::Find::Rule, find the target files, and normalize the newlines to what you need them to be.

    update: OTOH, if you subclass MIME::Parser::FileUnder, you could normalize the newlines in "purgable"

    So no, it isn't possible to force MIME::Parser to do this with a setting, only with code.
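
    A rough sketch of the find-and-normalize idea follows. To keep it self-contained it uses the core File::Find module in place of Path::Tiny or File::Find::Rule, a temporary directory in place of the real "unpacking" directory, and a hypothetical helper called normalize_file. It assumes the goal is to keep Windows line endings and only drop the extra CRs.

```perl
use strict;
use warnings;
use File::Find qw(find);      # core; stands in for File::Find::Rule
use File::Temp qw(tempdir);   # core; only used to build a demo directory

# Hypothetical helper: rewrite one file in place in raw mode, collapsing
# any run of CRs before LF into a single CRLF.
sub normalize_file {
    my ($path) = @_;
    open my $in, '<:raw', $path or die "cannot read $path: $!";
    my $data = do { local $/; <$in> };
    close $in;
    $data =~ s/\r+\n/\r\n/g;
    open my $out, '>:raw', $path or die "cannot write $path: $!";
    print {$out} $data;
    close $out or die "cannot close $path: $!";
    return;
}

# Demo directory standing in for the parser's output directory.
my $dir  = tempdir(CLEANUP => 1);
my $demo = "$dir/stuff.txt";
open my $fh, '>:raw', $demo or die "cannot create $demo: $!";
print {$fh} "a,b\r\r\nc,d\r\r\n";
close $fh;

# Visit every *.txt file under the directory and normalize it.
find(sub { normalize_file($File::Find::name) if -f && /\.txt\z/i }, $dir);
```

    Restricting the walk to .txt files leaves binary attachments such as the .jpg files untouched.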

      Thank you, I was rather afraid of that. I guess that I'll have to stretch my knowledge and see how bad a job that I can manage.

      Cheers!
      William