Well, a .docx file is actually just a zip archive, and the metadata you speak of is stored inside it as XML (the core properties normally live in docProps/core.xml). In Perl you can open the archive, extract that .xml file, and then parse it for what you need. Each of these steps is handled by modules that can be found on CPAN, for example Archive::Zip for the archive and XML::LibXML for the parsing.
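To make the extraction step concrete, here is a minimal sketch using Archive::Zip. The helper name `read_docx_member` is made up for illustration, and the default member `docProps/core.xml` assumes the usual Office Open XML layout; a given file may keep additional metadata elsewhere (docProps/app.xml, for instance).

```perl
use strict;
use warnings;
use Archive::Zip qw(:ERROR_CODES);

# Hypothetical helper: return the raw XML of one member of a .docx archive.
# Defaults to docProps/core.xml, where the core properties are assumed to live.
sub read_docx_member {
    my ($path, $member) = @_;
    $member //= 'docProps/core.xml';

    my $zip = Archive::Zip->new();
    $zip->read($path) == AZ_OK
        or die "Cannot read '$path' as a zip archive\n";

    # contents() returns undef when the archive has no such member.
    my $xml = $zip->contents($member);
    defined $xml
        or die "No member '$member' in '$path'\n";

    return $xml;
}
```

Something like `print read_docx_member('report.docx');` (with `report.docx` standing in for your own file) would then dump the metadata XML so you can see what there is to parse.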
/me nods...
IIRC, a docx is a zip-compressed package of XML files with a well-known public schema. If you cannot find a CPAN module that already does what you want, one approach is to write code that unzips it, then attacks the XML content using XPath expressions ... thus avoiding the need to write code that walks the XML's internal structure by hand. But it is extremely likely that what you are doing is “a thing already done.”
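A rough sketch of the XPath idea, using XML::LibXML and assuming `$xml` already holds the text of docProps/core.xml pulled out of the archive. The function name `core_properties` and the choice of four fields are illustrative only; the namespace URIs are the ones the core-properties part normally declares.

```perl
use strict;
use warnings;
use XML::LibXML;
use XML::LibXML::XPathContext;

# Namespace URIs normally declared by docProps/core.xml.
my %NS = (
    cp      => 'http://schemas.openxmlformats.org/package/2006/metadata/core-properties',
    dc      => 'http://purl.org/dc/elements/1.1/',
    dcterms => 'http://purl.org/dc/terms/',
);

# Hypothetical helper: pick a few core properties out of the XML text.
# findvalue() yields an empty string for any element that is absent.
sub core_properties {
    my ($xml) = @_;

    my $doc = XML::LibXML->load_xml(string => $xml);
    my $xpc = XML::LibXML::XPathContext->new($doc);
    $xpc->registerNs($_, $NS{$_}) for keys %NS;

    return {
        title    => $xpc->findvalue('/cp:coreProperties/dc:title'),
        creator  => $xpc->findvalue('/cp:coreProperties/dc:creator'),
        created  => $xpc->findvalue('/cp:coreProperties/dcterms:created'),
        modified => $xpc->findvalue('/cp:coreProperties/dcterms:modified'),
    };
}
```

The prefixes have to be registered because every element in that file is namespaced; an unprefixed XPath such as `//title` would match nothing.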