Re: MSWORD TO TEXT
by ChOas (Curate) on Jan 31, 2001 at 15:10 UTC
Hey!
I found this for ya:
LAOLA is a collection of documentation and Perl programs dealing with the binary file formats of Windows program documents.
LAOLA gives access to the raw document streams of any program that uses "structured storage" technology to save its documents.
ELSER deals specifically with these streams as they appear in Word 6 and Word 7 documents.
You can find it here
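To make the "structured storage" point concrete: Word 6/7 files of the kind ELSER handles are OLE2 compound documents, and every such file starts with the fixed 8-byte signature D0 CF 11 E0 A1 B1 1A E1. Below is a minimal Perl sketch that checks for that signature before you hand a file to LAOLA or any other parser. The sub names are made up for illustration; this is not LAOLA's API.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Fixed 8-byte signature at offset 0 of every OLE2 compound
# ("structured storage") file.
my $OLE2_MAGIC = "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1";

# True if the byte string starts with the OLE2 signature.
sub looks_like_ole2 {
    my ($header) = @_;
    return length($header) >= 8 && substr($header, 0, 8) eq $OLE2_MAGIC;
}

# Reads the first 8 bytes of a file (in binary mode) and checks them.
sub file_is_ole2 {
    my ($path) = @_;
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $header = '';
    read($fh, $header, 8);
    close $fh;
    return looks_like_ole2($header);
}

# Usage: perl ole2check.pl file1.doc file2.doc ...
for my $path (@ARGV) {
    printf "%s: %s\n", $path,
        file_is_ole2($path) ? 'OLE2 structured storage' : 'not OLE2';
}
```

A positive result only tells you the container format; the Word-specific streams inside still need something like ELSER or OLE::Storage to decode.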
GreetZ!,
print "profeth still\n" if /bird|devil/;
Hey!
Completely right, but it also mentions its successor, OLE::Storage
(available from your local CPAN mirror), which uses Perl 5 and does more...
GreetZ!,
print "profeth still\n" if /bird|devil/;
Re: MSWORD TO TEXT
by Trinary (Pilgrim) on Jan 31, 2001 at 21:13 UTC
Did a couple of searches and came up with this... The consensus
seems to be that there aren't any Word .doc parsers, and your only hope is to
use Win32::OLE, which apparently doesn't suit your needs. OLE::Storage just might do the trick, though; check it out.
I could've sworn there was a command-line utility to do this conversion
that would work, but I'm unable to find it right now...will search around some more.
Trinary
For this part, I just wrote some code to convert Word to text using Win32::OLE. Its limitation is that it only runs on a Windows machine, but it works correctly.
I could've sworn there was a command-line utility to do this conversion that would work, but I'm
unable to find it right now...will search around some more.
Here at work we have word2x installed, which works on Word 6 documents (according to the man page),
but since I don't have any Word documents at all, I can't test it.
Maybe have a look at http://word2x.alcom.co.uk.
The README points to http://www.kfa-juelich.de/isr/1/texconv.html "for a list of other converters".
As one can guess from the URL (kfa-juelich), it is a TeX site...
I could swear I once ran across a Word-to-ASCII converter, but I can't remember its name or where I saw it, sorry.
Regards
Stefan K
$dom = "skamphausen.de"; ## May The Open Source Be With You!
$Mail = "mail\@$dom"; $Url = "http://www.$dom";
Re: (Zigster) MSWORD TO TEXT
by zigster (Hermit) on Apr 12, 2001 at 19:11 UTC
I use the UNIX command 'strings'; it works fine and dandy with most Word docs I have come across. The output is a little rough, but in most cases I can read the document. It all depends on how clean you want the output to be.
--
Zigster
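What strings does is simple enough to sketch in a few lines of Perl, and the sketch also shows one reason the output can be rough or incomplete: Word 97 and later may store text as UTF-16LE (each ASCII character followed by a zero byte), which plain strings skips. GNU strings can catch those runs with `-e l`; the hypothetical utf16le_strings sub below does the same by hand. This is a rough illustration under those assumptions, not a replacement for strings(1).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Roughly what `strings` does: return every run of at least $min
# printable ASCII characters (space through tilde, plus tab).
sub ascii_strings {
    my ($data, $min) = @_;
    $min //= 4;    # same default minimum length as strings(1)
    return $data =~ /([\x20-\x7E\t]{$min,})/g;
}

# UTF-16LE text stores each ASCII character followed by a zero byte,
# so plain `strings` never sees a long enough run. Match those
# character/zero pairs instead and strip the zero bytes.
sub utf16le_strings {
    my ($data, $min) = @_;
    $min //= 4;
    my @runs = $data =~ /((?:[\x20-\x7E\t]\x00){$min,})/g;
    s/\x00//g for @runs;
    return @runs;
}

# Usage: perl mystrings.pl file1.doc file2.doc ...
for my $path (@ARGV) {
    open my $fh, '<:raw', $path or die "Cannot open $path: $!";
    my $data = do { local $/; <$fh> };
    close $fh;
    print "$_\n" for ascii_strings($data), utf16le_strings($data);
}
```

Neither sub knows anything about the Word format, so formatting codes and stray binary fragments that happen to be printable will still show up in the output.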
Zigster,
All I can say about strings is WOW! That works perfectly on Word 2k, WordPerfect 8, and Excel 2k files. Combined with pdftotext you have a nearly complete solution for extracting text from common user docs, which I'm doing for a search engine for a web-based document management site.
Just goes to show that if there's something you want to do on Unix/Linux, chances are the tool is already sitting on your hard drive.
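For the search-engine use case above, the dispatch step might look like the following Perl sketch: choose pdftotext for PDFs and strings for the Word, WordPerfect and Excel files, then read the command's standard output through a list-form pipe, so no shell quoting is involved. The extension table and the sub names are assumptions for illustration; it expects both tools to be on your PATH, and `pdftotext FILE -` is the usual way to make pdftotext write to stdout.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical extension table: each entry builds the command (as a
# list) that prints the file's text on standard output.
my %EXTRACTOR = (
    pdf => sub { ('pdftotext', $_[0], '-') },
    doc => sub { ('strings', $_[0]) },
    wpd => sub { ('strings', $_[0]) },
    xls => sub { ('strings', $_[0]) },
);

# Returns the command for a file, or an empty list if its extension
# is missing or not handled.
sub extractor_command {
    my ($path) = @_;
    my ($ext) = $path =~ /\.([^.\/]+)\z/ or return;
    my $builder = $EXTRACTOR{ lc $ext } or return;
    return $builder->($path);
}

# Runs the extractor through a list-form pipe open and returns its
# whole output, or undef if the file type is not handled.
sub extract_text {
    my ($path) = @_;
    my @cmd = extractor_command($path) or return;
    open my $pipe, '-|', @cmd or die "Cannot run $cmd[0]: $!";
    my $text = do { local $/; <$pipe> } // '';
    close $pipe;
    return $text;
}

# Usage: perl extract.pl report.pdf letter.doc ...
for my $path (@ARGV) {
    my $text = extract_text($path);
    if (defined $text) { print $text }
    else               { warn "No extractor for $path\n" }
}
```

Keeping the table separate from the pipe handling means adding another converter later is a one-line change.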
Glad to know it worked for you; I would be very interested in seeing the result when you have completed it. As a full-on UNIX head working in an MS world, I would find a complete toolset for converting MS docs to ASCII of great interest. Please msg me when/if you complete the tools.
Cheers
--
Zigster
Re: MSWORD TO TEXT
by buckaduck (Chaplain) on Apr 12, 2001 at 19:04 UTC
I'm pretty happy with the freeware program catdoc.
It doesn't handle anything fancy like OLE objects, but it
does a fine job extracting plain text from a plain Word document,
including the Office97 format.
If nothing else, the link above will point you to other good
resources.
buckaduck
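If you want catdoc when it's installed but still get something on machines that only have strings, a small Perl wrapper can pick the command at run time. A minimal sketch, assuming catdoc and strings both print the extracted text to stdout when given a file name; find_in_path and doc_to_text_command are hypothetical helpers, not part of catdoc.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Spec;

# Returns the full path of $prog if an executable file of that name
# exists in one of @dirs (defaults to the directories listed in $PATH).
sub find_in_path {
    my ($prog, @dirs) = @_;
    @dirs = File::Spec->path unless @dirs;
    for my $dir (@dirs) {
        my $full = File::Spec->catfile($dir, $prog);
        return $full if -f $full && -x _;
    }
    return;
}

# Prefer catdoc, which actually parses the Word format; fall back to
# strings, which just dumps printable runs.
sub doc_to_text_command {
    my ($doc, @dirs) = @_;
    return ('catdoc', $doc) if find_in_path('catdoc', @dirs);
    return ('strings', $doc);
}

# Usage: perl doc2txt.pl file1.doc file2.doc ...
for my $doc (@ARGV) {
    my @cmd = doc_to_text_command($doc);
    system(@cmd) == 0 or warn "$cmd[0] failed on $doc\n";
}
```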