Are you working with MS Word 2007 or 2010 docx files? If so, you can open the file (it's a Zip archive) and process the /word/document.xml file inside it. A paragraph's start tag looks like this: <w:p w:rsidR="00E56F3D" w:rsidRDefault="00E56F3D" w:rsidP="00E56F3D"> (the rsid attribute values differ between paragraphs and may be absent) and the paragraph ends with </w:p>. The text of a paragraph is not contiguous: it is split across runs (<w:r>) and broken up by formatting tags, so you will need to combine the sub-tags to rebuild the whole paragraph. The actual text is contained in <w:t>...</w:t> elements.
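
As a rough illustration of the approach described above, here is a minimal Python sketch using only the standard library. The function names (paragraphs_from_xml, read_docx_paragraphs) are made up for this example, and it assumes the main body lives at word/document.xml inside the archive. It only joins <w:t> text, so tabs, line breaks, and other non-text elements are ignored, and a paragraph nested inside another (for example in a text box) would show up both on its own and as part of its parent.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace that the "w:" prefix is bound to in document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraphs_from_xml(xml_bytes):
    """Return one string per <w:p>, joining the text of all its <w:t> descendants."""
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for p in root.iter(f"{{{W_NS}}}p"):
        # Text is split across runs, so collect every <w:t> under this paragraph
        pieces = [t.text or "" for t in p.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(pieces))
    return paragraphs


def read_docx_paragraphs(docx):
    """docx: a path or a binary file-like object holding a .docx archive."""
    with zipfile.ZipFile(docx) as archive:
        return paragraphs_from_xml(archive.read("word/document.xml"))
```

Calling read_docx_paragraphs("report.docx") (the file name is just a placeholder) would return a list of strings, one per paragraph, with empty strings for empty paragraphs.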