Re: Perl variant of linux tool strings
by Tanktalus (Canon) on Mar 23, 2005 at 20:34 UTC
|
Depending on your needs, this may be doable with just a regular expression...
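# Assumes the file contents are in $_.
# The quantifier goes inside the capture so $1 holds the whole run,
# not just its last character.
while (/([[:print:]]{4,})/g)
{
    print "$1\n";
}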
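As a fuller sketch of the same idea, the regex can be wrapped in a small script that reads a file as raw bytes. The sub name printable_runs and the default minimum length of 4 are my own choices for illustration, not part of any module, and the /a flag (Perl 5.14 or later) is an assumption made to keep [[:print:]] limited to ASCII.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every run of at least $min printable ASCII characters in $data.
# (printable_runs is a hypothetical helper name, not a library function.)
sub printable_runs {
    my ($data, $min) = @_;
    $min //= 4;
    # /a restricts [[:print:]] to ASCII; /g in list context returns all captures.
    return $data =~ /([[:print:]]{$min,})/ag;
}

# When given a file name, read it as raw bytes and print one run per line.
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<:raw', $file or die "Cannot open $file: $!";
    my $data = do { local $/; <$fh> };    # slurp the whole file
    close $fh;
    print "$_\n" for printable_runs($data, 4);
}
```

Saved under a name of your choosing, it would be run with the file to scan as its only argument. Slurping keeps the code short but holds the whole file in memory, so very large files may need to be read in chunks instead.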
| [reply] [d/l] |
Re: Perl variant of linux tool strings
by duct_tape (Hermit) on Mar 23, 2005 at 20:46 UTC
|
| [reply] |
I'd like to collect the words from a PDF or Word document.
So far Perl Power Tools does a very good job. Thanks!
| [reply] |
For collecting words from PDF documents, you can use
the ps2ascii utility, which comes with
Ghostscript. It runs the document through Ghostscript
using a special device that outputs only ASCII text.
Since Ghostscript can handle PDFs too, ps2ascii works fine on
them (although I did have
compatibility problems with some PDFs, depending
on the generating program and the version of Ghostscript).
This doesn't work for Word documents, of course.
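A minimal sketch of driving ps2ascii from Perl, assuming the utility is installed and on your PATH. The sub names words_from_text and words_from_pdf are made up for this example, and the definition of a "word" (letters, digits, apostrophes) is only one possible choice.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split text into "words": runs of ASCII letters, digits, or apostrophes.
# (Hypothetical helper; adjust the character class to taste.)
sub words_from_text {
    my ($text) = @_;
    return $text =~ /([A-Za-z0-9']+)/g;
}

# Run ps2ascii on a file and collect the words from its standard output.
# Assumes a ps2ascii executable can be found on PATH.
sub words_from_pdf {
    my ($pdf) = @_;
    open my $ps, '-|', 'ps2ascii', $pdf or die "Cannot run ps2ascii: $!";
    my @words = map { words_from_text($_) } <$ps>;
    close $ps or warn "ps2ascii exited with status $?\n";
    return @words;
}

# When given a file name, print one word per line.
if (@ARGV) {
    print "$_\n" for words_from_pdf($ARGV[0]);
}
```

The list form of the pipe open passes the file name straight to ps2ascii without going through a shell, so names containing spaces or quotes need no escaping.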
| [reply] |
Re: Perl variant of linux tool strings
by dragonchild (Archbishop) on Mar 23, 2005 at 20:35 UTC
|
| [reply] [d/l] |
Thanks for the answers. What I want from this is the 'words' inside a file, for example a PDF file.
| [reply] |
What's your real question? Do you want to parse a PDF file, or do you want to extract the ASCII sequences from within a non-ASCII file?
Being right, does not endow the right to be rude; politeness costs nothing. Being unknowing, is not the same as being stupid. Expressing a contrary opinion, whether to the individual or the group, is more often a sign of deeper thought than of cantankerous belligerence. Do not mistake your goals as the only goals; your opinion as the only opinion; your confidence as correctness. Saying you know better is not the same as explaining you know better.
| [reply] |