Unzip it ... it's just a collection of XML files. Then you can dig through them with your favorite XML parser. For many documents, you'll need only the items in the word directory.
$ unzip hack.docx Archive: hack.docx inflating: [Content_Types].xml inflating: _rels/.rels inflating: word/_rels/document.xml.rels inflating: word/document.xml inflating: word/_rels/header2.xml.rels inflating: word/footer2.xml inflating: word/footer1.xml inflating: word/footer3.xml inflating: word/header2.xml inflating: word/header1.xml inflating: word/endnotes.xml inflating: word/footnotes.xml inflating: word/header3.xml extracting: word/media/image1.jpeg inflating: word/theme/theme1.xml inflating: word/_rels/settings.xml.rels inflating: word/settings.xml inflating: word/styles.xml inflating: word/webSettings.xml inflating: word/numbering.xml inflating: docProps/app.xml inflating: docProps/core.xml inflating: word/fontTable.xml inflating: docProps/custom.xml
...roboticus
When your only tool is a hammer, all problems look like your thumb.
In reply to Re: How can I read the .docx file in perl?
by roboticus
in thread How can I read the .docx file in perl?
by prabuvos
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |