another_monkey has asked for the wisdom of the Perl Monks concerning the following question:

Dear monks, I would appreciate your kind help with the following problem. We have an Apache server functioning as a forward proxy, with ext_filter configured: whenever the response has the PDF MIME type, the filter (a Perl script) is called, and the PDF's content can be read from STDIN. We read the PDF from STDIN, write it to a file, and that's all. This almost always works well, but for one specific website the PDF comes out malformed when written in the following way:
my $input_file = shift;
binmode STDIN;
open(OUT, ">" . $input_file);
binmode OUT;
foreach my $line (<STDIN>){
    print OUT $line;
}
close OUT;
If we instead set the filter to use 'tee', the file is written correctly. Analyzing the malformed PDF shows that the xref table in the file we write is corrupted, and Adobe Reader fails to open it. We have already tried sysopen, sysread, etc., the ':raw' layer, and several other ways of writing a binary file properly (cut-and-paste code from the documentation on writing binary files), and nothing worked. Only with the Linux 'tee' utility as the filter was the file written correctly. That doesn't help us, because we need to write the file from STDIN as part of the Perl script. Any suggestions? If there were some way to call 'tee' with a system call and give it the Perl program's STDIN, that might work. Many thanks in advance.
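The 'tee' idea could be wired up roughly as follows. This is a minimal sketch, not a tested filter. It assumes that a POSIX `tee` is on the PATH and that the script has not read anything from STDIN before the call. The helper name `tee_stdin_to` is hypothetical. The list form of `system` passes no shell, and the child process inherits the Perl process's STDIN and STDOUT file descriptors. As a result, `tee` receives the bytes directly and passes them through on STDOUT, just as it does when it is set as the filter itself.

```perl
use strict;
use warnings;

# Hypothetical helper: hand this process's STDIN to the external `tee`
# utility, which writes every byte to $output_file and echoes it to STDOUT.
# Assumes nothing has been read from STDIN yet: bytes already pulled into
# Perl's input buffer would never reach tee.
sub tee_stdin_to {
    my ($output_file) = @_;

    # List form: no shell is involved, so unusual filenames pass through as-is.
    # tee inherits file descriptors 0 (STDIN) and 1 (STDOUT) from this process.
    system( 'tee', '--', $output_file ) == 0
        or die "tee failed for $output_file (status $?)\n";
    return;
}

# Run as a filter only when an output filename is given on the command line.
tee_stdin_to( shift @ARGV ) if @ARGV;
```

If the script needs to do nothing else after the copy, `exec` with the same argument list would replace the Perl process with `tee` entirely.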

Re: Writing PDF binary file from stream yields malformed PDF
by shawnhcorey (Friar) on Jun 26, 2013 at 14:56 UTC

    Use the three-argument open and set the files' layers to raw

    binmode STDIN, ':raw';
    my $output_file = shift @ARGV;
    open my $out_fh, '>:raw', $output_file
        or die "could not open $output_file: $!\n";
    while( my $line = <STDIN> ){
        print {$out_fh} $line
            or die "could not print to $output_file: $!\n";
    }
    close $out_fh or die "could not close $output_file: $!\n";

      Use the three-argument open and set the files' layers to raw

      Three-argument open: good. With :raw: good. while: good. And a plain binmode already means :raw.

      Thanks! I've tried it, but it still doesn't work on the special PDF (works for others, not for the problematic one). I've also applied Anonymous's corrections (thanks!).
Re: Writing PDF binary file from stream yields malformed PDF
by mprentice (Sexton) on Jun 26, 2013 at 13:53 UTC

    Do you have an example PDF that breaks? I tried this code on my Mac with a 4.9 MB PDF in Perl 5.18, and the output PDF opened without error in Adobe Reader for Mac.

Re: Writing PDF binary file from stream yields malformed PDF
by BrowserUk (Patriarch) on Jun 27, 2013 at 08:46 UTC

    The problem here is that you are using readline() on a binary file.

    You are relying on the fact that many binary files happen to contain bytes that look like line ends.

    And using for with a file handle means you are slurping the whole file into a list.

    And if the file contains (say) a lot of packed binary integers, it could mean that you are reading and writing the file in a gazillion iddy-biddy chunks.
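To make the point about line ends concrete, the following small self-contained sketch counts how many pieces readline returns from binary data, compared with fixed-size read() calls. The sub name `count_records` and the 16 KB byte pattern are illustrative assumptions. Readline splits the data wherever a 0x0A byte happens to occur, so the number and size of the "lines" depend entirely on the content.

```perl
use strict;
use warnings;

# Hypothetical helper: count the records returned when reading $data
# (a byte string) through an in-memory filehandle.
#   $mode eq 'readline' -> each record ends at a "\n" (0x0A) byte
#   otherwise           -> fixed read() chunks of $size bytes
sub count_records {
    my ( $data, $mode, $size ) = @_;
    open my $fh, '<:raw', \$data or die "in-memory open failed: $!";
    my $count = 0;
    if ( $mode eq 'readline' ) {
        while ( defined( my $line = <$fh> ) ) { $count++ }
    }
    else {
        my $buffer;
        while ( read( $fh, $buffer, $size ) ) { $count++ }
    }
    close $fh;
    return $count;
}

# 16 KB of bytes cycling 0x00..0xFF: it contains 64 bytes equal to 0x0A.
my $binary = join '', map { chr( $_ % 256 ) } 0 .. 256 * 64 - 1;

printf "readline pieces: %d\n", count_records( $binary, 'readline' );    # 65
printf "4096-byte reads: %d\n", count_records( $binary, 'read', 4096 );  # 4
```

Real PDF streams are compressed data, so where their 0x0A bytes fall, and therefore how the file gets chopped up, is essentially arbitrary.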

    You say you've tried sysread and syswrite but don't show any code, or how the results failed -- i.e., did you compare the input and output files?
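One quick way to answer that question is to compare the bytes that `tee` saved with the bytes the Perl filter saved. Below is a minimal sketch built around a hypothetical helper, `first_difference`. It returns the offset of the first mismatching byte, or undef if the files are identical. The helper reads both files in fixed-size blocks, so binary content is never split on line ends. The default 64 KB block size is an arbitrary choice.

```perl
use strict;
use warnings;

# Hypothetical helper: return the 0-based offset of the first byte at which
# two files differ, or undef if their contents are identical.
sub first_difference {
    my ( $file_a, $file_b, $block ) = @_;
    $block ||= 64 * 1024;
    open my $fa, '<:raw', $file_a or die "open $file_a: $!\n";
    open my $fb, '<:raw', $file_b or die "open $file_b: $!\n";
    my $offset = 0;
    while (1) {
        # read() fills each buffer fully except at end of file,
        # so the two files stay aligned block by block.
        my $na = read( $fa, my $buf_a, $block );
        my $nb = read( $fb, my $buf_b, $block );
        die "read error: $!\n" unless defined $na && defined $nb;
        last if $na == 0 && $nb == 0;    # both files ended together
        if ( $buf_a ne $buf_b ) {
            # Locate the first differing (or missing) byte in this block.
            my $i = 0;
            $i++ while $i < $na && $i < $nb
                && substr( $buf_a, $i, 1 ) eq substr( $buf_b, $i, 1 );
            return $offset + $i;
        }
        $offset += $na;
    }
    return;    # undef in scalar context: no difference found
}

# Usage: perl compare.pl from_tee.pdf from_perl.pdf   (hypothetical names)
if ( @ARGV == 2 ) {
    my $diff = first_difference(@ARGV);
    print defined $diff
        ? "first difference at byte $diff\n"
        : "files are identical\n";
}
```

The offset can then be checked against the xref table to see which object or stream the damage falls in.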

    Your best bet would be to replace the read/write loop above with something like:

    open OUT ...
    {
        local $\;
        my $buffer;
        while( read( STDIN, $buffer, 4096 ) ) {
            print OUT $buffer;
        }
    }
    close OUT;
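A fuller, self-contained version of that loop follows, sketched as a hypothetical `copy_binary` helper that accepts any input and output handles. It assumes both handles should carry raw bytes, so it calls binmode on each. It checks read() for undef, so an I/O error is not mistaken for end of file, and it checks print(), so a failed write is reported. The 4096-byte default block size matches the snippet above and is otherwise arbitrary.

```perl
use strict;
use warnings;

# Hypothetical helper: copy everything from $in to $out in fixed-size
# blocks, never treating the data as lines. Returns the number of bytes copied.
sub copy_binary {
    my ( $in, $out, $block ) = @_;
    $block ||= 4096;
    binmode $in;     # no CRLF or encoding translation on input
    binmode $out;    # ... or on output
    local $\;        # make sure print appends nothing to each block
    my $total = 0;
    while (1) {
        my $got = read( $in, my $buffer, $block );
        die "read failed: $!\n" unless defined $got;
        last if $got == 0;    # end of file
        print {$out} $buffer or die "write failed: $!\n";
        $total += $got;
    }
    return $total;
}

# As a filter script: copy STDIN into the file named on the command line.
if (@ARGV) {
    my $output_file = shift @ARGV;
    open my $out_fh, '>', $output_file or die "open $output_file: $!\n";
    copy_binary( \*STDIN, $out_fh );
    close $out_fh or die "close $output_file: $!\n";
}
```

Because the helper takes handles rather than names, you can exercise it against in-memory files before pointing it at STDIN.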

    With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.

      look ma, no line ends

      local $/ = \(8 * 1024); # blocksize
      while( readline ...
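Here is the idea spelled out. When `$/` is set to a reference to an integer, readline stops looking for line ends and instead returns records of at most that many bytes. The last record may be shorter. Below is a minimal sketch of the copy loop written that way. The helper name `copy_in_records` and the 8 KB default are assumptions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: copy $in to $out using readline in record mode.
# Setting $/ to a reference to a number makes <$in> return blocks of
# that many bytes instead of "lines".
sub copy_in_records {
    my ( $in, $out, $size ) = @_;
    $size ||= 8 * 1024;
    binmode $in;
    binmode $out;
    local $/ = \$size;    # record mode: fixed-length reads
    local $\;             # print adds nothing after each record
    my $records = 0;
    while ( defined( my $record = <$in> ) ) {
        print {$out} $record or die "write failed: $!\n";
        $records++;
    }
    return $records;
}
```

Both `local` declarations restore the previous values of `$/` and `$\` when the sub returns, so the rest of the script keeps normal line-oriented reads.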

        No shit Sherlock!

        Now explain it to the OP who might benefit from it -- if you can bring yourself to type an actual explanation that is.

