cyberdoc has asked for the wisdom of the Perl Monks concerning the following question:

I've been ripping my hair out these last two nights. My problem seems like one of those ghosts haunting my Perl stream. Following the MIME::Parser documentation, I've made a simple parser to output any attached PDF files for further use in a fax application. However, the email containing the PDF file is parsed in such a way that base64 data suddenly disappears: when I diff the original and extracted base64 streams, small 1-byte chunks are missing. The problem only occurs when the PDF file is larger than ~1 MB. Any other file type is extracted without problems; it only occurs with PDF documents.

Have any of you wise people seen this problem before? I suspect that it is a slurp problem, but I have been unable to prove it. The first sign of corruption was that the length of the base64 data was no longer a multiple of 4.

MIME-tools 5.420, Perl 5.8.8.

UPDATE: Just did a test with the same file zipped, and there were no errors in the base64 data. Now I'm really baffled.

UPDATE: Third time's the charm. After emailing the file a third time, the data is no longer corrupt. Looks like I'm the one to blame: some kind of pre-processing error. Note to self: if it does not work, recreate the data source. Maybe you screwed it up somehow. Sorry.
The files used in the code below can be found here: http://www.cyberdoc.dk/perlmonks/
#!/usr/bin/perl -w
use strict;
use MIME::Parser;

my $parser = MIME::Parser->new;
$parser->decode_bodies(0);     # leave bodies in their encoded (base64) form
$parser->decode_headers(0);
$parser->output_to_core(1);    # keep parsed bodies in memory
$parser->tmp_to_core(1);       # keep temporary data in memory too

local($/) = undef;             # slurp
open(my $email, '<', "63602021-5.eml") or die "Cannot open 63602021-5.eml: $!";
binmode $email;
my $entity = $parser->parse_data(<$email>);
close($email);

dump_entity($entity) if $entity;

# Walk the MIME tree and report on every application/pdf leaf part.
sub dump_entity {
    my $ent   = shift;
    my @parts = $ent->parts;
    if (@parts) {
        dump_entity($_) for @parts;
    }
    else {
        if (scalar($ent->head->mime_type) eq "application/pdf") {
            print "Part: " . $ent->head->recommended_filename
                . " (" . scalar($ent->head->mime_type) . ")\n";
            my $data_org = $ent->bodyhandle->as_string();
            print length($data_org) . "\n";
            $data_org = fixbase64($data_org);
            print length($data_org) . "\n";
            writefile("/tmp/test.b64", $data_org);
            my $data_bin = decode_base64($data_org);
            print length($data_bin) . "\n";
        }
    }
}

# Write $data to $file in binary mode.
sub writefile {
    my $file = shift;
    my $data = shift;
    open(my $handle, '>', $file) or die "Cannot write $file: $!";
    binmode $handle;
    print $handle $data;
    close($handle);
}

# Strip non-base64 characters and re-wrap the data at 76 columns.
sub fixbase64 {
    my $str = shift;
    my $res = "";
    $str =~ tr|A-Za-z0-9+=/||cd;
    while ($str =~ /(.{1,76})/gs) {
        $res .= $1 . "\n";
    }
    return $res;
}

# Pure-Perl base64 decoder: map to the uuencode alphabet, then unpack("u").
sub decode_base64 {
    local($^W) = 0;
    my $str = shift;
    my $res = "";
    $str =~ tr|A-Za-z0-9+=/||cd;
    if (length($str) % 4) {
        warn("Length of base64 data [" . length($str) . "] not a multiple of 4");
    }
    $str =~ s/=+$//;
    $str =~ tr|A-Za-z0-9+/| -_|;
    while ($str =~ /(.{1,60})/gs) {
        my $len = chr(32 + length($1)*3/4);
        $res .= unpack("u", $len . $1);
    }
    return $res;
}
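
The "length is no longer a multiple of 4" symptom can be checked in isolation, without MIME::Parser in the picture. The following is only a minimal sketch: it uses the core MIME::Base64 module instead of the hand-rolled decoder, the helper name check_base64 is hypothetical, and the payload is generated stand-in data rather than the real attachment.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical helper (not from the original post): strip everything that is
# outside the base64 alphabet, then report whether the remaining length is a
# multiple of 4. Returns the cleaned string and a 1/0 flag.
sub check_base64 {
    my ($raw) = @_;
    (my $clean = $raw) =~ tr|A-Za-z0-9+/=||cd;
    my $ok = (length($clean) % 4 == 0) ? 1 : 0;
    return ($clean, $ok);
}

# Stand-in data; a real run would pass $ent->bodyhandle->as_string instead.
my $payload = join '', map { chr($_ % 256) } 0 .. 999;
my $encoded = encode_base64($payload);    # wrapped at 76 columns

my ($clean, $ok) = check_base64($encoded);
print "intact: ", ($ok ? "yes" : "no"), "\n";

# Simulate the symptom from the post: a single character goes missing.
my $damaged = $encoded;
substr($damaged, 100, 1, '');
my (undef, $ok_damaged) = check_base64($damaged);
print "damaged: ", ($ok_damaged ? "yes" : "no"), "\n";
```

The length test only catches losses that break the 4-character grouping; if exactly four characters vanish, the length still looks valid, so it is a first indicator rather than proof of integrity.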

Replies are listed 'Best First'.
Re: MIME::Parser odd PDF error.
by cyberdoc (Novice) on May 28, 2007 at 16:55 UTC
    As it says in the updated description, it turned out to be a pre-processing error. I recreated the data source from scratch and emailed the file again. Afterwards there were no errors and everything worked perfectly. I apologize for my ignorance.
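
One way to rule the data source in or out early is to compare a digest of the file as it existed before mailing with a digest of the decoded attachment. This is a rough sketch only, assuming both byte strings are available in memory: same_bytes is a hypothetical helper, and the "PDF" here is generated stand-in data, not the file from the post.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);
use MIME::Base64 qw(encode_base64 decode_base64);

# Hypothetical helper: true (1) if both byte strings share an MD5 digest.
sub same_bytes {
    my ($left, $right) = @_;
    return md5_hex($left) eq md5_hex($right) ? 1 : 0;
}

# Stand-in for the PDF on disk before it was attached and mailed.
my $original = "%PDF-1.4\n" . ("x" x 5000);

# Stand-in for the attachment body as it appears in the .eml file.
my $attachment_b64 = encode_base64($original);

# Decode and compare against the original bytes.
my $extracted = decode_base64($attachment_b64);
print "match: ", (same_bytes($original, $extracted) ? "yes" : "no"), "\n";
```

If the digests differ for a freshly created message but match after the file is re-sent, that points at the input rather than the parser, which is what happened here.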