Re: I have a problem in finding corrupted PDF files

Your usage is basically correct, but there are several things to note. The filenames in @all (as returned by readdir) don't include any directory components of the path, so unless your $ARGV[0] is always '.', you probably want:

    my $pdf = PDF->new("$path/$aa");
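
If you'd rather not build the path by string interpolation, here is a minimal sketch of the same idea using the core module File::Spec. The helper name files_in is made up for illustration, and I'm assuming the directory comes from $ARGV[0], as in your script.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: return the full paths of the plain files in $dir.
# readdir() yields bare names, so each one is joined with $dir again.
sub files_in {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "Can't open directory '$dir': $!\n";
    my @paths =
        grep { -f $_ }                            # keep plain files only
        map  { File::Spec->catfile($dir, $_) }    # prepend the directory
        readdir($dh);
    closedir($dh);
    return sort @paths;
}

# Assumption: the directory is given as the first argument, default '.'.
my $path = @ARGV ? $ARGV[0] : '.';
print "$_\n" for files_in($path);
```

The -f test also drops '.' and '..', which readdir returns along with everything else.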

Next, and more importantly, PDF->new() does die in some cases, which makes the entire script terminate. To work around this, you need to wrap that call in eval { ... }:

foreach my $aa (@all) {
    my $pdf;
    # eval traps a die inside PDF->new() and leaves the message in $@
    eval { $pdf = PDF->new("$path/$aa") };

    if (!$@ && $pdf->IsaPDF) {
        print "$aa is a pdf file\n";
    } else {
        print "$aa is corrupted\n";
    }
}
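
If what you're after is a list of the bad files together with the reason, something along these lines might do. This is only a sketch: find_corrupted and $fake_check are hypothetical names, and $fake_check is a stand-in for sub { PDF->new($_[0])->IsaPDF } so that the example runs without the PDF module installed.

```perl
use strict;
use warnings;

# Hypothetical helper: run $check->($file) for every file and return a hash
# of file => reason for each file where the check died or returned false.
sub find_corrupted {
    my ($check, @files) = @_;
    my %bad;
    for my $file (@files) {
        my $ok = eval { $check->($file) };    # a die in here is trapped
        if ($@) {
            chomp(my $err = $@);              # keep the die message as the reason
            $bad{$file} = $err;
        } elsif (!$ok) {
            $bad{$file} = 'not recognised as PDF';
        }
    }
    return %bad;
}

# Stand-in check (assumption): dies like the parser does on "broken" files.
my $fake_check = sub {
    my ($file) = @_;
    die "Can't read cross-reference section, according to trailer\n"
        if $file =~ /broken/;
    return $file =~ /\.pdf$/;
};

my %bad = find_corrupted($fake_check, qw(good.pdf broken.pdf notes.txt));
print "$_: $bad{$_}\n" for sort keys %bad;
```

With the real module you would pass a check that calls PDF->new on the full path. The loop keeps going after a die because each call runs in its own eval.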

Lastly, I'm not at all sure whether the very simple method used to identify PDF files, i.e.

# from PDF/Parse.pm

sub IsaPDF {
    return ($_[0]->{Header} != undef) ;
}

with Header being extracted as the version component of the basic PDF header (e.g. "%PDF-1.6" --> "1.6"), would actually detect more subtle corruption in the file. So, unless the problem is severe enough to make the parser die, the error would likely go unnoticed.
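
As a rough illustration of a slightly stricter test than the header alone, the following sketch also looks for the "%%EOF" marker that a complete PDF carries near its end, which catches truncated files. It is not part of the PDF module; looks_like_pdf is a name I made up, the 1024-byte tail window is my assumption, and it won't detect a damaged cross-reference table or broken objects.

```perl
use strict;
use warnings;
use Fcntl qw(SEEK_END);

# Hypothetical helper: return the PDF version string if $file starts with
# "%PDF-x.y" and contains "%%EOF" in its last 1024 bytes, undef otherwise.
sub looks_like_pdf {
    my ($file) = @_;
    open(my $fh, '<:raw', $file) or return;

    # Header: the file has to start with "%PDF-<major>.<minor>".
    my $head = '';
    read($fh, $head, 16);
    my ($version) = $head =~ /\A%PDF-(\d+\.\d+)/ or return;

    # Trailer: look for the end-of-file marker in the tail of the file.
    my $size     = -s $fh;
    my $tail_len = $size < 1024 ? $size : 1024;
    seek($fh, -$tail_len, SEEK_END) or return;
    my $tail = '';
    read($fh, $tail, $tail_len);
    return unless $tail =~ /%%EOF/;

    return $version;
}

for my $file (@ARGV) {
    my $version = looks_like_pdf($file);
    print defined $version
        ? "$file: PDF $version\n"
        : "$file: header or %%EOF missing\n";
}
```

A file that passes this can still be corrupt, so treat it as a cheap pre-filter before the eval-wrapped parse, not as a replacement for it.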

Re^2: I have a problem in finding corrupted PDF files by hurix_03 (Initiate) on Dec 11, 2007 at 05:39 UTC
Thanks for your reply. But I'm still getting the following errors: 1) "Bad object reference" at the line containing ("$path/$aa"). 2) If the PDF file is corrupted, it shows the error "Can't read cross-reference section, according to trailer" and terminates the program. I want to get a list of the corrupted PDF files.
Re^3: I have a problem in finding corrupted PDF files by almut (Canon) on Dec 11, 2007 at 11:29 UTC
> If the pdf file is corrupted one, it shows the error "Can't read cross-reference section, according to trailer" and terminate the program.

This is exactly one of the cases I had in mind when I said "`PDF->new()` does die in some cases". As this is a simple, straightforward `die` message in PDF/Parse.pm (line 82),

    sub ReadCrossReference_pass1 {
        ...
        $_=PDF::Core::PDFGetline ($fd,\$offset);
        die "Can't read cross-reference section, according to trailer\n" if ! /xref\r?\n?/ ;
        ...
    }

I'm pretty sure (actually, I've tried it) that it will be caught if you wrap the call in an `eval` block with curly braces, as shown in my previous reply. That's exactly what the eval BLOCK form (Perl's exception handling mechanism) is for. If you still can't keep the script running in that case, please post the exact code you're using.
Re^3: I have a problem in finding corrupted PDF files by fxy (Initiate) on Jan 27, 2008 at 15:33 UTC
Hi: You can try a popular PDF file recovery tool called Advanced PDF Repair to repair your PDF file. It is a powerful tool to repair corrupt or damaged PDF files. Detailed information about Advanced PDF Repair can be found at http://www.datanumen.com/apdfr/ and you can also download a free demo version at http://www.datanumen.com/apdfr/apdfr.exe

Alan