busch4al has asked for the wisdom of the Perl Monks concerning the following question:

I am trying to consistently save PDF email attachments. I have been able to save everything else with no problem. I decode with MIME::QuotedPrint before saving (because the Content-Transfer-Encoding is quoted-printable), but some PDFs open fine and some do not. When I look at the file's text in Notepad, it looks like the line endings might be off a bit (no pun intended). Has anyone run into this type of thing before? Help, please!

Re: saving pdf attachments
by tachyon (Chancellor) on Feb 26, 2003 at 21:46 UTC

    What are you using to parse your mail messages? Also if you save/cut and paste the whole message into a doc.zip, you should be able to open it with WinZip (GOK as to why) and see all the parts. Check you can do the extract to a valid PDF just to be sure you have valid attachments.

    cheers

    tachyon

    s&&rsenoyhcatreve&&&s&n.+t&"$'$`$\"$\&"&ee&&y&srve&&d&&print

      Hi Tachyon, I am using my own MimeReader.pm (I got frustrated trying to get MIME::Tools to work). I have
      all the parts and can save binary, text, and quoted-printable (usually; I am just having trouble with PDFs).
      I saved the troublesome attachment from Outlook, and that copy is valid. I also compared that file to the
      one I save (which doesn't work) and found that the line endings are the probable problem. I am just not sure what
      is wrong with them.
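One way to pin down what differs is to compare the two files byte by byte and print the bytes around the first mismatch in hex, so that CR (0d) and LF (0a) become visible. The following is a minimal sketch using only core Perl; the helper names and the file names good.pdf and bad.pdf are hypothetical and not part of MimeReader.pm.

```perl
use strict;
use warnings;

# Return the offset of the first differing byte between two byte strings,
# or -1 if the strings are identical. If one string is a prefix of the
# other, the offset is the length of the shorter string.
sub first_diff_offset {
    my ($x, $y) = @_;
    my $min = length($x) < length($y) ? length($x) : length($y);
    for my $i (0 .. $min - 1) {
        return $i if substr($x, $i, 1) ne substr($y, $i, 1);
    }
    return length($x) == length($y) ? -1 : $min;
}

# Render a byte string as space-separated hex pairs.
sub hex_bytes {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02x', ord($_)) } split //, $bytes;
}

# Read a whole file in binary mode so no newline translation happens.
sub slurp_binary {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;
    local $/;    # slurp mode
    my $data = <$fh>;
    close $fh;
    return $data;
}

# Hypothetical usage (file names are assumptions):
#   my $good = slurp_binary('good.pdf');
#   my $bad  = slurp_binary('bad.pdf');
#   my $off  = first_diff_offset($good, $bad);
#   if ($off >= 0) {
#       print "first difference at byte $off\n";
#       print 'good: ', hex_bytes(substr($good, $off, 16)), "\n";
#       print 'bad:  ', hex_bytes(substr($bad,  $off, 16)), "\n";
#   }
```

If the first mismatch shows 0d 0a on one side and a lone 0a or 0d on the other, that supports the line-ending theory.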

        OK, so the problem is that your MimeReader.pm cannot cope with the MIME it is getting.

        You are probably not parsing the corrupt MIME that some mail clients send 'correctly'! I have a webmail app with its own MIME parser built in. It has proven pretty reliable in practice at extracting attachments. Drop me a line and I will send you the code. Email jfreeman@tassie.net.au

        cheers

        tachyon


Re: saving pdf attachments
by Jaap (Curate) on Feb 26, 2003 at 21:10 UTC
    Can you post a piece of the correct PDF and a piece of the bad PDF?
    You might also want to post the piece of your Perl code that handles this.
      Hi Jaap, Here's the code that saves the attachment:

      if ($code =~ /quoted-printable/i) {
          print "printing $num to c:\\$filename\n";
          open(K, ">c:\\$filename");
          binmode(K); #not sure when to use this, seems to be necessary for pdf attachments
          print K MIME::QuotedPrint::decode_qp(join "", @{ $message->{body}->{data} });
          close K;
      }

      The error when trying to open the bad pdf is:
      "The file is damaged and cannot be repaired."

      Here's a snippet of a good one (note that in Notepad the
      line-endings are little squares - \r I presume):

      %PDF-1.2 %ßÜÂÞ - Business Objects 1 0 obj << /Type /Pages /Count 1 /Kids [ 4 0 R ] >> endobj 2 0 obj << /CreationDate (D:20030206205259Z) /ModDate (D:20030206205259Z)

      And here's a bad one:

      %PDF-1.2 %ßÜÂÞ - Business Objects 1 0 obj << /Type /Pages /Count 17 /Kids [ 4 0 R 5 0 R 6 0 R 7 0 R 8 0 R 9 0 R 10 0 R 11 0 R 12 0 R 13 0 R 14 0 R 15 0 R 16 0 R 17 0 R 18 0 R 19 0 R 20 0 R ] >> endobj 2 0 obj << /CreationDate (D:20030206205456Z) /ModDate (D:20030206205456Z)

      Any advice would be great!

        It sounds like the PDF reader may be very strict about line endings. On Windows a line ends in \r\n, while on *nix platforms a line ends in just \n. It could be that the quoted-printable decoder messes up the newlines.
        Although I do not have a clear answer for you, you can test this further by substituting other line endings for the ones in the bad PDF:
        my $pdfContent = MIME::QuotedPrint::decode_qp(join "", @{ $message->{body}->{data} });
        $pdfContent =~ s/\n/\r\n/g;
        Play a little with the line endings and test it.
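Before substituting anything, it may help to count which line endings the decoded data actually contains. Here is a minimal sketch, assuming the core MIME::QuotedPrint module; count_line_endings and the sample string are hypothetical, made up for illustration. Note that a blanket s/\n/\r\n/g also rewrites any LF that is already preceded by a CR, so counting first shows whether that matters for a given file.

```perl
use strict;
use warnings;
use MIME::QuotedPrint qw(decode_qp);

# Count each kind of line ending in a byte string.
# CRLF is tried first so it is counted as one unit,
# not as a separate CR plus LF.
sub count_line_endings {
    my ($bytes) = @_;
    my %count = (crlf => 0, cr => 0, lf => 0);
    while ($bytes =~ /(\r\n|\r|\n)/g) {
        my $eol = $1;
        if    ($eol eq "\r\n") { $count{crlf}++ }
        elsif ($eol eq "\r")   { $count{cr}++ }
        else                   { $count{lf}++ }
    }
    return \%count;
}

# Hypothetical sample: encoded CR and LF bytes (=0D, =0A) mixed with a
# literal newline, standing in for a quoted-printable message body.
my $encoded = "line one=0D=0Aline two=0Dline three\n";
my $decoded = decode_qp($encoded);

my $c = count_line_endings($decoded);
printf "CRLF=%d CR=%d LF=%d\n", @{$c}{qw(crlf cr lf)};
```

Running the same count on the valid file saved from Outlook and on the broken file should show which kind of line ending is being lost or changed.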