in reply to Re^2: To Read and Edit docx files in Windows 7
in thread To Read and Edit docx files in Windows 7

I've just created a short Word document called December_12.docx and copied it to a Unix platform. Then I made a copy of it called December_12.zip. Unzipping it shows this:
$ cp December_12.docx December_12.zip
$ unzip December_12.zip
Archive:  December_12.zip
  inflating: [Content_Types].xml
  inflating: _rels/.rels
  inflating: word/_rels/document.xml.rels
  inflating: word/document.xml
  inflating: word/theme/theme1.xml
  inflating: word/settings.xml
  inflating: word/webSettings.xml
  inflating: word/stylesWithEffects.xml
  inflating: docProps/core.xml
  inflating: word/styles.xml
  inflating: word/fontTable.xml
  inflating: docProps/app.xml
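Since a .docx file is an ordinary zip archive, the cp/unzip round trip isn't strictly needed in a script. Here is a minimal sketch that pulls word/document.xml (the member name from the listing above) straight out of the .docx, assuming the IO::Uncompress::Unzip module that ships with Perl since 5.10. The helper name read_document_xml is hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use IO::Uncompress::Unzip qw(unzip $UnzipError);

# Hypothetical helper: return the raw (UTF-8 encoded) bytes of
# word/document.xml read directly from a .docx file, without
# renaming it or unpacking it on disk. A .docx is just a zip archive.
sub read_document_xml {
    my ($docx) = @_;
    my $xml;
    # The Name option selects a single member of the archive.
    unzip $docx => \$xml, Name => 'word/document.xml'
        or die "Cannot read word/document.xml from $docx: $UnzipError\n";
    return $xml;
}

# Usage: perl docx_xml.pl December_12.docx
for my $docx (@ARGV) {
    print read_document_xml($docx);
}
```

The result is the same XML you get after unzipping by hand, so it can be fed to whatever extraction code you use next.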
Now you could, in principle, edit the word/document.xml file, except that the XML looks quite messy:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document xmlns:wpc="http://schemas.microsoft.com/office/word/2010/wordprocessingCanvas"
    xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
    xmlns:o="urn:schemas-microsoft-com:office:office"
    xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
    xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
    xmlns:v="urn:schemas-microsoft-com:vml"
    xmlns:wp14="http://schemas.microsoft.com/office/word/2010/wordprocessingDrawing"
    xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
    xmlns:w10="urn:schemas-microsoft-com:office:word"
    xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
    xmlns:w14="http://schemas.microsoft.com/office/word/2010/wordml"
    xmlns:wpg="http://schemas.microsoft.com/office/word/2010/wordprocessingGroup"
    xmlns:wpi="http://schemas.microsoft.com/office/word/2010/wordprocessingInk"
    xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml"
    xmlns:wps="http://schemas.microsoft.com/office/word/2010/wordprocessingShape"
    mc:Ignorable="w14 wp14">
  <w:body>
    <w:p w:rsidR="006C10F5" w:rsidRDefault="00C4263D"/>
    <w:p w:rsidR="00C4263D" w:rsidRPr="00C4263D" w:rsidRDefault="00C4263D">
      <w:pPr><w:rPr><w:lang w:val="en-US"/></w:rPr></w:pPr>
      <w:r w:rsidRPr="00C4263D"><w:rPr><w:lang w:val="en-US"/></w:rPr><w:t>December 12, 2014.</w:t></w:r>
    </w:p>
    <w:p w:rsidR="00C4263D" w:rsidRPr="00C4263D" w:rsidRDefault="00C4263D">
      <w:pPr><w:rPr><w:lang w:val="en-US"/></w:rPr></w:pPr>
      <w:proofErr w:type="gramStart"/>
      <w:r w:rsidRPr="00C4263D"><w:rPr><w:lang w:val="en-US"/></w:rPr><w:t xml:space="preserve">The quick brown </w:t></w:r>
      <w:proofErr w:type="spellStart"/>
      <w:r w:rsidRPr="00C4263D"><w:rPr><w:lang w:val="en-US"/></w:rPr><w:t>fox</w:t></w:r>
      <w:proofErr w:type="spellEnd"/>
      <w:r w:rsidRPr="00C4263D"><w:rPr><w:lang w:val="en-US"/></w:rPr><w:t xml:space="preserve"> jumps over the lazy dog.</w:t></w:r>
      <w:proofErr w:type="gramEnd"/>
    </w:p>
    <w:p w:rsidR="00C4263D" w:rsidRDefault="00C4263D">
      <w:pPr><w:rPr><w:lang w:val="en-US"/></w:rPr></w:pPr>
    </w:p>
    <w:p w:rsidR="00C4263D" w:rsidRPr="00C4263D" w:rsidRDefault="00C4263D">
      <w:pPr><w:rPr><w:lang w:val="en-US"/></w:rPr></w:pPr>
      <w:bookmarkStart w:id="0" w:name="_GoBack"/><w:bookmarkEnd w:id="0"/>
    </w:p>
    <w:sectPr w:rsidR="00C4263D" w:rsidRPr="00C4263D">
      <w:pgSz w:w="11906" w:h="16838"/>
      <w:pgMar w:top="1417" w:right="1417" w:bottom="1417" w:left="1417" w:header="708" w:footer="708" w:gutter="0"/>
      <w:cols w:space="708"/>
      <w:docGrid w:linePitch="360"/>
    </w:sectPr>
  </w:body>
</w:document>
The content of the Word document was only these two lines:
December 12, 2014.
The quick brown fox jumps over the lazy dog.
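The mapping from the XML to those two lines is fairly regular: each w:p element is a paragraph, and its visible text is spread over one or more w:t elements (the second line is split into three runs around the w:proofErr spelling/grammar markers). A minimal regex-based sketch in plain Perl follows, assuming Word's usual w: prefix, no CDATA sections and no text boxes nested inside paragraphs. The helper names paragraph_texts and xml_unescape are hypothetical, and a real XML parser such as XML::LibXML or XML::Twig would be more robust than regexes.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Decode the five predefined XML entities plus numeric character
# references in a single pass, so "&amp;lt;" correctly becomes "&lt;".
sub xml_unescape {
    my ($s) = @_;
    my %named = (lt => '<', gt => '>', amp => '&', quot => '"', apos => "'");
    $s =~ s{&(?:#x([0-9A-Fa-f]+)|#([0-9]+)|(lt|gt|amp|quot|apos));}
           {defined $1 ? chr(hex $1) : defined $2 ? chr($2) : $named{$3}}ge;
    return $s;
}

# Return the text of every non-empty paragraph (<w:p>) in a
# WordprocessingML document, one string per paragraph, in order.
sub paragraph_texts {
    my ($xml) = @_;
    my @paras;
    # \b keeps <w:pPr>/<w:pgSz> out; the (?<!/) look-behind skips
    # self-closing empty paragraphs such as <w:p .../>.
    while ($xml =~ m{<w:p\b[^>]*(?<!/)>(.*?)</w:p>}gs) {
        my $inner = $1;
        # Join all text runs; <w:proofErr>, <w:rPr> etc. are ignored.
        my $text = join '', $inner =~ m{<w:t\b[^>]*>([^<]*)</w:t>}g;
        push @paras, xml_unescape($text) if length $text;
    }
    return @paras;
}

# Usage: perl docx_text.pl word/document.xml
binmode STDOUT, ':encoding(UTF-8)';
for my $file (@ARGV) {
    open my $fh, '<:encoding(UTF-8)', $file or die "Cannot open $file: $!\n";
    my $xml = do { local $/; <$fh> };
    close $fh;
    print "$_\n" for paragraph_texts($xml);
}
```

To keep only the fox sentence, something like `my ($fox) = grep { /fox/ } paragraph_texts($xml);` would do. Text inside a table is also stored in w:p elements (within w:tbl, w:tr and w:tc), so this sketch would return the cell contents as paragraphs in document order, but without the row/column structure.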

Re^4: To Read and Edit docx files in Windows 7
by DVCHAL (Novice) on Dec 11, 2014 at 07:05 UTC
Thanks for the sample, Laurent. Is there any way to extract the content from the XML file with a Perl script? In your example, how would I extract only "The quick brown fox jumps over the lazy dog" from the messy XML file with a Perl script? And if the content is a table, can we still read it from the XML?