I've been tearing my hair out these last two nights. This problem feels like a ghost haunting my Perl stream. Following the MIME::Parser documentation, I've written a simple parser that extracts any attached PDF files for further use in a fax application. However, when the email containing the PDF file is parsed, some of the base64 data suddenly disappears: when I diff the original and the extracted base64 streams, small 1-byte chunks are missing. The problem only occurs when the PDF file is larger than about 1 MB. Files of any other type are extracted without problems; it only happens with PDF documents.

Have any of you wise people seen this problem before? I suspect a slurp problem, but I have been unable to prove it. The first sign of corruption was that the length of the base64 data was no longer a multiple of 4.

Versions: MIME-tools 5.420, Perl 5.8.8.

UPDATE: I just ran a test with the same file zipped, and there were no errors in the base64 data. Now I'm really baffled.

UPDATE: Third time's the charm. After emailing the file a third time, the data is no longer corrupt. Looks like I'm the one to blame: some kind of pre-processing error. Note to self: if it does not work, recreate the data source, because maybe you screwed it up somehow. Sorry.
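For anyone chasing a similar symptom, the diff of original versus extracted base64 can be narrowed down with a small helper that normalizes both streams (dropping line breaks and anything outside the base64 alphabet) and reports the first offset where they diverge, together with the multiple-of-4 length check. This is only a minimal sketch; the sub names `first_base64_difference` and `base64_length_ok` are hypothetical and not part of MIME-tools.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: offset of the first character at which two base64
# streams differ after normalization, or -1 if they are identical.
sub first_base64_difference {
    my ($original, $extracted) = @_;
    for ($original, $extracted) {
        tr|A-Za-z0-9+/=||cd;    # keep only base64 alphabet and padding
    }
    my $max = length($original) < length($extracted)
            ? length($original) : length($extracted);
    for my $i (0 .. $max - 1) {
        return $i if substr($original, $i, 1) ne substr($extracted, $i, 1);
    }
    # One stream is a prefix of the other: they diverge where the shorter ends.
    return length($original) == length($extracted) ? -1 : $max;
}

# Hypothetical helper: true if the normalized length is a multiple of 4.
sub base64_length_ok {
    my ($str) = @_;
    (my $clean = $str) =~ tr|A-Za-z0-9+/=||cd;
    return length($clean) % 4 == 0;
}

my $orig = "QUJD\nREVG\n";    # base64 of "ABCDEF"
my $bad  = "QUJD\nREV\n";     # same stream with one character dropped
my $offset = first_base64_difference($orig, $bad);
print "$offset\n";            # prints 7
my $status = base64_length_ok($bad) ? "length ok" : "length not a multiple of 4";
print "$status\n";            # prints "length not a multiple of 4"
```

Reporting an offset rather than a yes/no answer makes it easy to look at the surrounding bytes in both files and see whether the loss happens at a line boundary or in the middle of a line.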

The files used in the code below can be found here: http://www.cyberdoc.dk/perlmonks/

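#!/usr/bin/perl -w
use strict;
use MIME::Parser;

my $parser = MIME::Parser->new;
$parser->decode_bodies(0);   # leave bodies encoded so the raw base64 can be inspected
$parser->decode_headers(0);  # leave RFC 2047 encoded words in headers as-is
$parser->output_to_core(1);  # keep parsed bodies in memory instead of on disk
$parser->tmp_to_core(1);     # keep temporary data in memory too

my $file = "63602021-5.eml";
open(my $email, '<', $file) or die "Cannot open $file: $!";
binmode $email;
my $raw = do { local $/; <$email> };   # slurp the whole message
close($email);

my $entity = $parser->parse_data($raw);

dump_entity($entity) if $entity;

sub dump_entity {
  my $ent = shift;
  my @parts = $ent->parts;
  if (@parts) {
    dump_entity($_) foreach @parts;    # recurse into multipart containers
  } else {
    if (scalar($ent->head->mime_type) eq "application/pdf") {
      print "Part: "
            . $ent->head->recommended_filename
            . " (" . scalar($ent->head->mime_type) . ")\n";
      my $data_org = $ent->bodyhandle->as_string();
      print length($data_org) . "\n";  # raw (still encoded) body length
      $data_org = fixbase64($data_org);
      print length($data_org) . "\n";  # length after normalizing line breaks
      writefile("/tmp/test.b64", $data_org);
      my $data_bin = decode_base64($data_org);
      print length($data_bin) . "\n";  # decoded PDF length
    }
  }
}

sub writefile {
  my ($file, $data) = @_;
  open(my $handle, '>', $file) or die "Cannot write $file: $!";
  binmode $handle;
  print $handle $data;
  close($handle);
}

# Strip everything outside the base64 alphabet, then rewrap at 76 chars per line.
sub fixbase64 {
  my $str = shift;
  my $res = "";
  $str =~ tr|A-Za-z0-9+=/||cd;
  while ($str =~ /(.{1,76})/gs) {
    $res .= $1 . "\n";
  }
  return $res;
}

# Pure-Perl base64 decoder: map the base64 alphabet onto the uuencode
# alphabet and let unpack("u") do the bit shuffling, 60 chars at a time.
sub decode_base64 {
  local($^W) = 0;
  my $str = shift;
  my $res = "";
  $str =~ tr|A-Za-z0-9+=/||cd;
  if (length($str) % 4) {
    warn("Length of base64 data ["
         . length($str) . "] not a multiple of 4");
  }
  $str =~ s/=+$//;                   # strip padding
  $str =~ tr|A-Za-z0-9+/| -_|;       # base64 alphabet -> uuencode alphabet
  while ($str =~ /(.{1,60})/gs) {
    my $len = chr(32 + length($1)*3/4);   # uuencode length byte
    $res .= unpack("u", $len . $1);
  }
  return $res;
}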
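MIME::Base64, which ships with Perl 5.8, exports its own `decode_base64`. One way to rule the hand-rolled decoder in or out is to run both over the same input and compare. The sketch below assumes the post's routine is copied under the hypothetical name `uu_decode_base64` (so it does not clash with the imported function) and that the input is the `/tmp/test.b64` file written by the script above; the file path can be overridden on the command line.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Base64 qw(decode_base64);

# The post's pure-Perl decoder, renamed (hypothetical name) so it does not
# clash with MIME::Base64::decode_base64.
sub uu_decode_base64 {
    my $str = shift;
    my $res = "";
    $str =~ tr|A-Za-z0-9+=/||cd;              # drop line breaks and junk
    if (length($str) % 4) {
        warn "Length of base64 data [" . length($str) . "] not a multiple of 4\n";
    }
    $str =~ s/=+$//;                           # strip padding
    $str =~ tr|A-Za-z0-9+/| -_|;               # base64 alphabet -> uuencode alphabet
    while ($str =~ /(.{1,60})/gs) {
        my $len = chr(32 + length($1) * 3 / 4);   # uuencode length byte
        $res .= unpack("u", $len . $1);
    }
    return $res;
}

# Compare both decoders on the extracted base64, if the file exists.
my $file = @ARGV ? $ARGV[0] : '/tmp/test.b64';
if (open(my $fh, '<', $file)) {
    binmode $fh;
    my $b64 = do { local $/; <$fh> };
    close($fh);
    my $uu   = uu_decode_base64($b64);
    my $core = decode_base64($b64);
    printf "hand-rolled: %d bytes, MIME::Base64: %d bytes, %s\n",
        length($uu), length($core), ($uu eq $core ? "identical" : "DIFFERENT");
}
```

If both decoders agree on well-formed input but the length check still fires on the extracted data, the characters were already missing before decoding, which points at the message itself rather than the decoding step.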

In reply to MIME::Parser odd PDF error. by cyberdoc