in reply to I have a problem in finding corrupted PDF files
Your usage is basically correct, but there are several things to note. First, the filenames in @all (as returned by readdir) don't include the directory part of the path, so unless your $ARGV[0] is always '.', you probably want

```perl
my $pdf = PDF->new("$path/$aa");
```
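Next, and more importantly, PDF->new() dies in some cases, which terminates the entire script. To work around this, wrap that call in eval { ... }:

```perl
foreach my $aa (@all) {
    # readdir also returns '.', '..' and subdirectories; skip anything
    # that isn't a plain file so it isn't reported as "corrupted"
    next unless -f "$path/$aa";

    my $pdf;
    # trap the die() inside PDF->new so one bad file doesn't abort the loop
    eval { $pdf = PDF->new("$path/$aa") };
    if (!$@ && $pdf->IsaPDF) {
        print "$aa is a pdf file\n";
    } else {
        print "$aa is corrupted\n";
    }
}
```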
Lastly, I'm not at all sure whether the very simple method used to identify PDF files, i.e.

```perl
# from PDF/Parse.pm
sub IsaPDF {
    return ($_[0]->{Header} != undef) ;
}
```

with Header being extracted as the version component of the basic PDF header (e.g. "%PDF-1.6" --> "1.6"), would actually detect more subtle corruption in the file. IsaPDF only checks that a version string was parsed from the header, so a file with an intact header but a damaged body or missing trailer would still pass. In other words, unless the damage is bad enough to make the parser die, it would likely go unnoticed...
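If you want something a bit stricter than IsaPDF, here is a minimal sketch in plain core Perl (it doesn't use the PDF module at all). It assumes that a complete PDF has a "%PDF-x.y" header near the start and an "%%EOF" marker near the end, and scans 1024 bytes at each end; the helper name looks_like_complete_pdf and the 1024-byte window are my own hypothetical choices, not anything from PDF::Parse. This mainly catches truncated files (for instance, an incomplete download); it still doesn't validate the cross-reference table or the object structure.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the PDF module): returns 1 if $file has a
# "%PDF-x.y" header in its first $window bytes and an "%%EOF" marker in its
# last $window bytes, 0 otherwise. A heuristic, not a full validation.
sub looks_like_complete_pdf {
    my ($file, $window) = @_;
    $window ||= 1024;    # assumed scan window at each end of the file

    # the lexical handle is closed automatically when the sub returns
    open(my $fh, '<:raw', $file) or return 0;
    my $size = -s $fh;
    return 0 unless $size;    # empty file (or size unavailable)

    # Header check: same idea as IsaPDF, but done on the raw bytes.
    my $head = '';
    read($fh, $head, $window);
    return 0 unless $head =~ /%PDF-\d\.\d/;

    # Trailer check: a truncated file usually loses its final "%%EOF".
    my $start = $size > $window ? $size - $window : 0;
    seek($fh, $start, 0) or return 0;
    my $tail = '';
    read($fh, $tail, $window);
    return $tail =~ /%%EOF/ ? 1 : 0;
}

sub main {
    my $path = @_ ? $_[0] : '.';
    opendir(my $dh, $path) or die "Cannot open directory $path: $!\n";
    while (defined(my $name = readdir $dh)) {
        my $file = "$path/$name";
        next unless -f $file;    # skip '.', '..' and subdirectories
        if (looks_like_complete_pdf($file)) {
            print "$name looks complete\n";
        } else {
            print "$name is corrupted or truncated\n";
        }
    }
    closedir $dh;
}

main(@ARGV) unless caller;
```

Like your script, it takes the directory as its first argument. You could also combine it with the eval/IsaPDF check above and only treat a file as good if both tests pass.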
Re^2: I have a problem in finding corrupted PDF files
by hurix_03 (Initiate) on Dec 11, 2007 at 05:39 UTC
by almut (Canon) on Dec 11, 2007 at 11:29 UTC
by fxy (Initiate) on Jan 27, 2008 at 15:33 UTC