Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Here is my script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my $path_to_file1   = "adminhelp.pdf";
my $path_to_newfile = "hello_world.out";

# start displaying the HTML page
print "Content-type: text/html\n\n";
print "<html><head></head>\r\n";
print "<body>\r\n";

open(my $readfile, '<', $path_to_file1)
    or die "Couldn't open the file [$path_to_file1] to read: $!";
print "The file [$path_to_file1] was opened for reading<br>";

my $size = -s $path_to_file1;    # size in bytes, same as (stat)[7]
print "The size of file [$path_to_file1] is $size<br>";

open(my $writefile, '>', $path_to_newfile)
    or die "Couldn't open the file [$path_to_newfile] to write: $!";
print "The file [$path_to_newfile] was opened for writing<br>";

# PDF is binary: switch off newline translation on BOTH handles
binmode $readfile;
binmode $writefile;

# copy in 16 KB chunks and print each buffer unchanged
# (a here-document would append a newline to every chunk)
while (read($readfile, my $buf, 16384)) {
    print {$writefile} $buf;
}
close($readfile);
close($writefile);

print "<br><br>Finished copying from [$path_to_file1] to [$path_to_newfile]<br>";
print "<br><br></body></html>";
exit;
```

However, when I open the file I wrote to, its contents read as follows:
```
%PDF-1.3
[binary section]
2 0 obj
<< /BitsPerComponent 8 /ColorSpace/DeviceRGB /Filter[/DCTDecode] /Height 73 /Subtype/Image /Type/XObject /Length 2855 /Width 107 >>
stream
[binary section]
endstream
endobj
1 0 obj
<< /Length 5281 /Filter [/FlateDecode] >>
stream
[binary section]
<< /MediaBox[0 0 612 792] /Resources<>/ProcSet[/PDF/Text/ImageC]/Font<>>> /Type/Page /Contents 7 0 R /Parent 4 0 R >>
endobj
10 0 obj
<< /MediaBox[0 0 612 792] /Resources<>/ProcSet[/PDF/Text/ImageC]/Font<>>> /Type/Page /Contents 9 0 R /Parent 4 0 R >>
endobj
12 0 obj
<< /MediaBox[0 0 612 792] /Resources<>/ProcSet[/PDF/Text/ImageC]/Font<>>> /Type/Page /Contents 11 0 R /Parent 4 0 R >>
endobj
14 0 obj
<< /MediaBox[0 0 612 792] /Resources<>/ProcSet[/PDF/Text/ImageC]/Font<>>> /Type/Page /Contents 13 0 R /Parent 4 0 R >>
endobj
18 0 obj
<< /Type/Pages/Count 1 /Parent 17 0 R /Kids[16 0 R] >>
endobj
16 0 obj
<< /MediaBox[0 0 612 792] /Resources<>/ProcSet[/PDF/Text/ImageC]/Font<>>> /Type/Page /Contents 15 0 R /Parent 18 0 R >>
endobj
32 0 obj
<< /PageMode/UseNone /Type/Catalog /OpenAction[3 0 R/XYZ null null null] /PageLabels 19 0 R /Pages 17 0 R >>
endobj
33 0 obj
<< /Subject() /CreationDate(D:20020516122721) /Producer(Jaws PDF Creator, Word macro v2.11.29) /Author(vivek) /Keywords() /Title(This help page consists of following modules) /Creator(Microsoft Word 9.0) >>
endobj
xref
0 34
0000000000 65535 f
0000003034 00000 n
0000000015 00000 n
0000057178 00000 n
0000057076 00000 n
0000008389 00000 n
0000057365 00000 n
0000015319 00000 n
0000057552 00000 n
0000019184 00000 n
0000057739 00000 n
0000024202 00000 n
0000057927 00000 n
0000028903 00000 n
0000058116 00000 n
0000033317 00000 n
0000058376 00000 n
0000057014 00000 n
0000058305 00000 n
0000038708 00000 n
0000038747 00000 n
0000038927 00000 n
0000042174 00000 n
0000042380 00000 n
0000042750 00000 n
0000043107 00000 n
0000048691 00000 n
0000048897 00000 n
0000049361 00000 n
0000049731 00000 n
0000056334 00000 n
0000056540 00000 n
0000058566 00000 n
0000058691 00000 n
trailer
<< /Size 34 /Root 32 0 R /Info 33 0 R /ID[] >>
startxref
58914
%%EOF
```
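The script copies the raw PDF file, so what it writes is PDF syntax, not extracted text. Streams marked `/Filter [/FlateDecode]` hold zlib-compressed data, which is why even the text-bearing parts are unreadable in the raw file. A rough sketch of inflating such streams, assuming only the core Compress::Zlib module and streams whose `/Length` is a literal number (indirect lengths such as `6 0 R` are skipped); `inflate_streams` is a hypothetical helper, not something from this thread:

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Compress::Zlib qw(compress uncompress);

# Hypothetical helper: find each "<< ... /Length N ... >> stream" header,
# take exactly N bytes after it, and try to inflate them as zlib data.
# Streams that are not FlateDecode (e.g. /DCTDecode JPEG images) fail to
# inflate and are skipped. Assumes /Length is a direct integer.
sub inflate_streams {
    my ($pdf) = @_;
    my @inflated;
    while ($pdf =~ m{/Length\s+(\d+)\b(?!\s+\d+\s+R)[^>]*>>\s*stream\r?\n}g) {
        my $len   = $1;
        my $start = pos($pdf);
        my $data  = substr($pdf, $start, $len);
        pos($pdf) = $start + $len;          # resume the search after the data
        my $text = uncompress($data);       # undef if not valid zlib data
        push @inflated, $text if defined $text;
    }
    return @inflated;
}

# Usage: perl inflate_streams.pl adminhelp.pdf
if (@ARGV) {
    open(my $fh, '<:raw', $ARGV[0]) or die "Can't open $ARGV[0]: $!";
    my $pdf = do { local $/; <$fh> };
    close $fh;
    print "--- stream ---\n$_\n" for inflate_streams($pdf);
}
```

Inflated content streams contain PDF drawing operators such as `(Hello) Tj`, not plain prose, and fonts with custom encodings can still garble the strings, so a dedicated PDF parsing module is the more robust route for real text extraction.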
Edit: BazB deleted the binary sections of the PDF (removed sections are marked [binary section]).
janitored by ybiC: balanced <readmore> tags; retitled from the less-than-descriptive "Help on Perl Script"
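The read/print loop in the question is meant to be a plain binary copy. A minimal sketch of the same idea in core Perl only: `copy_binary` (a hypothetical name) opens both handles with the `:raw` layer, copies in 16384-byte chunks like the original script, and returns the byte count so the result can be checked with File::Compare. An identical copy of a PDF is still a PDF, which is why the output file shows PDF markup rather than text.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Compare qw(compare);
use File::Temp qw(tempdir);

# Hypothetical helper: copy $src to $dst with no newline or encoding
# translation, $chunk bytes at a time. Returns the number of bytes copied.
sub copy_binary {
    my ($src, $dst, $chunk) = @_;
    $chunk ||= 16384;                        # same buffer size as the script
    open(my $in,  '<:raw', $src) or die "Can't read $src: $!";
    open(my $out, '>:raw', $dst) or die "Can't write $dst: $!";
    my $total = 0;
    while (1) {
        my $n = read($in, my $buf, $chunk);
        die "Read error on $src: $!" unless defined $n;
        last if $n == 0;                     # end of file
        print {$out} $buf or die "Write error on $dst: $!";
        $total += $n;
    }
    close($in);
    close($out) or die "Can't close $dst: $!";
    return $total;
}

# Usage: perl copy_binary.pl adminhelp.pdf hello_world.out
if (@ARGV == 2) {
    my $n = copy_binary($ARGV[0], $ARGV[1]);
    my $same = compare($ARGV[0], $ARGV[1]) == 0 ? "identical" : "DIFFERENT";
    print "Copied $n bytes; files are $same\n";
}
```

For a straight copy, the core File::Copy module's `copy` does the same job in one call; the explicit loop is only useful when the chunks need to be inspected or transformed.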
Re: Extract text from PDF
by Corion (Patriarch) on Nov 29, 2003 at 09:47 UTC
by Anonymous Monk on Nov 29, 2003 at 10:26 UTC
by Roger (Parson) on Nov 29, 2003 at 12:02 UTC
by Corion (Patriarch) on Nov 30, 2003 at 01:10 UTC
by Anonymous Monk on Nov 29, 2003 at 11:31 UTC