in reply to Is it possible to force MIME::Parser to extract text-files on a Windows system without the extra CR's on the end of lines?

Hello WilliamDee, and welcome to the Monastery!

I’m not familiar with MIME::Parser, but it occurs to me that it might be easier to just accept the output as-is, and post-process to remove the unwanted CR characters.

For example, if you know that an extracted string has no carriage returns that you want to keep, post-processing is as simple as:

$string =~ s/\r//g;

If you need to be more precise, you can use a look-ahead assertion to remove only carriage returns that occur immediately before newlines:

$string =~ s/\r(?=\n)//g;
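To make the difference between the two substitutions concrete, here is a small sketch. The sub names `strip_all_cr` and `strip_cr_before_lf` and the sample string are made up for illustration; only the two regexes come from the post above.

```perl
use strict;
use warnings;

# Remove every carriage return, wherever it occurs.
sub strip_all_cr {
    my ($string) = @_;
    $string =~ s/\r//g;
    return $string;
}

# Remove only carriage returns that are immediately followed by a newline.
sub strip_cr_before_lf {
    my ($string) = @_;
    $string =~ s/\r(?=\n)//g;
    return $string;
}

# Hypothetical sample: a lone CR in mid-line plus a CRLF line ending.
my $sample = "progress\r100%\r\n";

# Escape the control characters so the results are visible when printed.
for my $result (strip_all_cr($sample), strip_cr_before_lf($sample)) {
    (my $visible = $result) =~ s/\r/\\r/g;
    $visible =~ s/\n/\\n/g;
    print "$visible\n";
}
```

The first substitution drops the mid-line carriage return as well; the second keeps it and removes only the one in front of the newline. Note that with a doubled ending such as `\r\r\n`, the look-ahead version removes just one carriage return and leaves `\r\n`.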

Hope that helps,

Athanasius <°(((>< contra mundum
Iustus alius egestas vitae, eros Piratica,

Re^2: Is it possible to force MIME::Parser to extract text-files on a Windows system without the extra CR's on the end of lines?
by WilliamDee (Initiate) on Feb 11, 2014 at 04:01 UTC

    Thank you for the welcome and the idea, Athanasius.

    Post-processing only the text files is a possibility if there are no other options. I'll admit that I'm not keen on the thought of slurping large files (2+ megabytes) into memory again and doing a regex replace like the following:

    $fileguts =~ s/\r{2,}\n/\r\n/g;

    That should be reasonably efficient. Your idea does raise another thought, though: avoiding the mangling of files that come from Unix-based systems, where \n gets changed to \r\n. It might be preferable to do something like:

    $fileguts =~ s/\r+\n/\n/g;

    In the interest of not potentially mangling files, I will hold out for the moment in the hope of another, MIME::Parser-based fix. :)
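If the post-processing route is taken after all, the slurping concern can be sidestepped by working one line at a time. The following is only a sketch under assumptions: `normalize_line_endings` and its two path arguments are hypothetical names, and it applies the second substitution above (`\r+\n` to `\n`) to each line in turn.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in_path to $out_path, turning any run of
# carriage returns before a newline into a bare newline, one line at a time.
sub normalize_line_endings {
    my ($in_path, $out_path) = @_;

    # :raw on both handles so no CRLF translation happens behind our back.
    open my $in,  '<:raw', $in_path  or die "Cannot read $in_path: $!";
    open my $out, '>:raw', $out_path or die "Cannot write $out_path: $!";

    while (my $line = <$in>) {
        # Each line ends at "\n", so the run of CRs (if any) sits at the end.
        $line =~ s/\r+\n\z/\n/;
        print {$out} $line;
    }

    close $in;
    close $out or die "Cannot close $out_path: $!";
    return;
}
```

Only one line is held in memory at a time, and carriage returns that are not part of a line ending are left alone. It still costs an extra read and write per file, so it does not remove the disk overhead.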

    Cheers!
    William

    PS: Another possibility might be to change the original MIME message before writing to disk, say from:

    Content-Type: text/plain;

    To:

    Content-Type: application/x-msexcel;

    A bit of an ugly hack to trick MIME::Parser, though probably doable, and it might be preferable to the extra disk-load/regex-replace/disk-save cycle. While I'm not expecting hundreds of files per minute (or per second), it is best to assume that something like that might happen if an ISP error suddenly causes a surge or someone attempts a DoS/mailbomb attack.

      Thank you, Athanasius. I have gone down the path of changing the content-type to something that makes MIME::Parser extract text/plain parts as binary files (application/x-msexcel). The code I'm using now is:

      # open the file in raw/binary output for writing
      open MAILOUT, '>:raw', "$receiving/message-$thetime-$popcount.msg"
          or LogWrite("Unable to open message-$thetime-$popcount.msg for writing: $!");
      # get the email into a temporary variable
      my $hold = $pop->HeadAndBody($popcount);
      # force it to use binary saving
      $hold =~ s/text\/plain/application\/x-msexcel/g;
      # write to file
      print MAILOUT $hold;
      # close the file
      close MAILOUT;

      And the text-files extracted by MIME::Parser are now saved without extra \r characters added to them.
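One caveat with a global substitution is that it also rewrites any occurrence of "text/plain" inside the message body. A possible refinement, shown here only as a sketch, is to touch the string only where it follows a Content-Type header name at the start of a line. The sub name `relabel_text_plain` is hypothetical, and the pattern assumes the type sits on the same line as the header name (a folded header would be missed).

```perl
use strict;
use warnings;

# Hypothetical helper: relabel text/plain parts as application/x-msexcel,
# but only on lines that begin with a Content-Type header name.
sub relabel_text_plain {
    my ($message) = @_;

    # /m lets ^ match at the start of every line; /i ignores header case.
    $message =~ s{^(Content-Type:[ \t]*)text/plain}{${1}application/x-msexcel}gmi;

    return $message;
}
```

In the code above it would be used as `my $hold = relabel_text_plain($pop->HeadAndBody($popcount));`. This is plain pattern matching, not MIME parsing, so a body line that happens to start with "Content-Type: text/plain" would still be rewritten.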

      Cheers for the help! :)
      William

        $cough->binary(1); ... print ...