Re: Text from PDF
by steves (Curate) on Oct 26, 2004 at 17:16 UTC
PDF::FDF::Simple claims to
be able to extract some subset of text from PDF files to
strings, although I have never personally used it. I'd be
interested to hear how capable it is for this task if you
decide to try it.
Re: Text from PDF
by gellyfish (Monsignor) on Oct 26, 2004 at 16:34 UTC
You could use ps2ascii, a tool built on Ghostscript. Versions are available for both Windows and Unix.
/J\
Re: Text from PDF
by Popcorn Dave (Abbot) on Oct 26, 2004 at 16:28 UTC
Re: Text from PDF
by saberworks (Curate) on Oct 26, 2004 at 16:57 UTC
If you don't need a pure-Perl solution, you can use the Linux utility pdftotext to extract all the text, then use Perl to parse it from there.
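A minimal sketch of that pdftotext-then-Perl approach, assuming a Unix-like system with the pdftotext binary on the PATH. The sub names pdf_to_text and split_pages and the /invoice/ pattern are made up for illustration; substitute whatever you actually need to match.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run pdftotext on one file and return its output as a single string.
# Assumption: pdftotext is installed and on PATH.
sub pdf_to_text {
    my ($pdf_path) = @_;
    # List-form pipe open bypasses the shell, so odd file names are safe.
    # "-layout" keeps the physical layout; "-" sends the text to stdout.
    open(my $fh, '-|', 'pdftotext', '-layout', $pdf_path, '-')
        or die "Cannot run pdftotext: $!";
    local $/;    # slurp mode: read everything in one go
    my $text = <$fh>;
    close($fh) or die "pdftotext failed on $pdf_path (exit status $?)";
    return defined $text ? $text : '';
}

# pdftotext separates pages with a form feed character (\f).
sub split_pages {
    my ($text) = @_;
    return split /\f/, $text;
}

# Only run when a PDF path is given on the command line.
if (@ARGV) {
    my @pages = split_pages(pdf_to_text($ARGV[0]));
    for my $i (0 .. $#pages) {
        # Hypothetical filter: print lines that mention "invoice".
        my @matches = grep { /invoice/i } split /\n/, $pages[$i];
        print "page ", $i + 1, ": $_\n" for @matches;
    }
}
```

Run it as `perl script.pl some.pdf`. Keeping the extraction and the page splitting in separate subs means the parsing half can be reused on text that came from some other converter.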
Re: Text from PDF
by punch_card_don (Curate) on Oct 26, 2004 at 17:36 UTC
Re: Text from PDF
by dragonchild (Archbishop) on Oct 27, 2004 at 13:04 UTC
PDF::Extract seems to be where you want to look.
Being right, does not endow the right to be rude; politeness costs nothing. Being unknowing, is not the same as being stupid. Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence. Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.
Re: Text from PDF
by Anonymous Monk on Oct 26, 2004 at 21:56 UTC
Thanx for all the replies. For the record: (1) I've looked at other utils, but wanted a perl solution; (2) saw the FDF module & don't know that I want to bring it in since I've never heard of FDF's; (3) I want a script to do what Adobe does (badly) by saving the pdf to text.
Also for the record: I give up. I'm gonna buy somethin'
Re: Text from PDF
by steves (Curate) on Oct 27, 2004 at 10:08 UTC
I played around with
PDF::FDF::Simple and I couldn't get it to extract text
from PDF files. I thought that FDF was just a subset of PDF
but there must be more to it than that. Then I looked around
for free PDF-to-text tools and was surprised to find that
there aren't many that are truly free. Ghostscript may be
your best free option. It apparently has a tool for getting
text from PDF documents. Another one I found is a Java tool
named PDFBox.