in reply to How can I read the .docx file in perl?
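Unzip it: a .docx file is just a ZIP archive of XML files. Then you can dig through them with your favorite XML parser. For many documents you'll only need the files in the word directory. The body text is in word/document.xml, and headers, footers, footnotes and so on sit in their own files alongside it.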
    $ unzip hack.docx
    Archive: hack.docx
      inflating: [Content_Types].xml
      inflating: _rels/.rels
      inflating: word/_rels/document.xml.rels
      inflating: word/document.xml
      inflating: word/_rels/header2.xml.rels
      inflating: word/footer2.xml
      inflating: word/footer1.xml
      inflating: word/footer3.xml
      inflating: word/header2.xml
      inflating: word/header1.xml
      inflating: word/endnotes.xml
      inflating: word/footnotes.xml
      inflating: word/header3.xml
     extracting: word/media/image1.jpeg
      inflating: word/theme/theme1.xml
      inflating: word/_rels/settings.xml.rels
      inflating: word/settings.xml
      inflating: word/styles.xml
      inflating: word/webSettings.xml
      inflating: word/numbering.xml
      inflating: docProps/app.xml
      inflating: docProps/core.xml
      inflating: word/fontTable.xml
      inflating: docProps/custom.xml
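If you'd rather not unpack to disk, you can read word/document.xml straight out of the archive. The following is only a minimal sketch, assuming the CPAN modules Archive::Zip and XML::LibXML are installed. The sub name docx_paragraphs is made up for illustration. It joins the text of the <w:t> runs in each <w:p> paragraph and ignores everything else. Tabs, line breaks, fields, headers and footers are therefore dropped, and text inside text boxes may show up more than once.

```perl
#!/usr/bin/env perl
# Sketch: print the paragraph text of a .docx without unzipping it to disk.
# Assumes the CPAN modules Archive::Zip and XML::LibXML are installed.
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);
use XML::LibXML;

# WordprocessingML namespace used by word/document.xml
my $W_NS = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

# Hypothetical helper: return one string per paragraph of the main document part.
sub docx_paragraphs {
    my ($path) = @_;
    my $zip = Archive::Zip->new;
    $zip->read($path) == AZ_OK or die "cannot read $path as a zip archive\n";

    # Body text lives in word/document.xml; headers, footers, footnotes
    # etc. are separate parts and are not read here.
    my $member = $zip->memberNamed('word/document.xml')
        or die "$path has no word/document.xml\n";
    my $xml = $member->contents;

    my $doc = XML::LibXML->load_xml(string => $xml);
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs(w => $W_NS);

    # Each <w:p> is a paragraph; its visible text is split across <w:t> runs.
    my @paras;
    for my $p ($xpc->findnodes('//w:body//w:p')) {
        push @paras, join '', map { $_->textContent } $xpc->findnodes('.//w:t', $p);
    }
    return @paras;
}

# Only run as a command when given a file name.
if (@ARGV) {
    binmode STDOUT, ':encoding(UTF-8)';
    print "$_\n" for docx_paragraphs($ARGV[0]);
}
```

Saved as, say, docx2txt.pl (any name will do), `perl docx2txt.pl hack.docx` prints one line per paragraph. Using XPath with a registered prefix avoids having to match Word's namespace prefixes by hand.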
...roboticus
When your only tool is a hammer, all problems look like your thumb.
Re^2: How can I read the .docx file in perl?
by locked_user sundialsvc4 (Abbot) on Apr 17, 2013 at 13:35 UTC