At a Unix operations department that serves development projects, work arrives as a Word document. When the work is done, the Word document needs to be updated to reflect the solution actually applied. I had a pile of 76 documents with requirements similar enough that I could create 76 lines of CSV and write a Perl script to do the 76 pieces of work. That leaves about 10 days of paperwork (about 1 hour per request). So I was given a choice: either sit and type for ten days, or write more Perl scripts to do it for me.

One of the requirements is to add a row to the change history table in the document. Having found that .docx files are a zipped set of mainly XML files, but that Archive::Zip cannot extract to Windows, I used Archive::Zip::Member to open each member as a filehandle. The other members just get written into a new directory to be re-archived later into a new .docx file; the member containing the document body, Document.xml, I fed to XML::Parser, getting back an array of deeply nested arrays and hashes. Several arrays deep in that structure comes a "w:tbl" tag followed by an array reference to where the table starts. But I need to match a string several levels deeper still to know whether I have found the table I want to update.
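For reference, a minimal sketch of what that structure looks like, assuming XML::Parser was called with Style => 'Tree' (the style that returns nested arrays and hashes of this shape): each element is a name followed by a content arrayref whose first entry is the attribute hash, then name => content pairs, with text nodes using the name 0. The sample tree is hand-built rather than parsed from a real Document.xml, and the outline() helper is hypothetical. It prints only element names and non-blank text, which cuts out most of the empty air when inspecting the tree.

```perl
use strict;
use warnings;

# Hypothetical, hand-built sample in the shape XML::Parser returns with
# Style => 'Tree': an element is a name followed by a content arrayref
# [ \%attributes, name => content, name => content, ... ], and a text
# node is the pair 0 => "text".
my $tree = [
    'w:body', [ {},
        0, "\n  ",
        'w:tbl', [ {},
            0, "\n    ",
            'w:tr', [ {},
                'w:tc', [ {},
                    'w:p', [ {},
                        'w:r', [ {}, 'w:t', [ {}, 0, 'Change History' ] ],
                    ],
                ],
            ],
        ],
    ],
];

# Turn a flat list of name => content pairs into indented lines, one per
# element and one per text node that contains something other than
# whitespace. Attribute hashes and blank text are skipped.
sub outline {
    my ($pairs, $depth) = @_;
    my @lines;
    for (my $i = 0; $i < @$pairs; $i += 2) {
        my ($name, $content) = @$pairs[ $i, $i + 1 ];
        my $pad = '  ' x $depth;
        if ($name eq '0') {                    # text node
            push @lines, qq{$pad"$content"} if $content =~ /\S/;
        }
        else {                                 # element: drop the attr hash
            push @lines, "$pad<$name>";
            my @kids = @$content[ 1 .. $#$content ];
            push @lines, outline(\@kids, $depth + 1);
        }
    }
    return @lines;
}

# The top level is a single name => content pair, so it can be passed as is.
my @lines = outline($tree, 0);
print "$_\n" for @lines;
```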

So far I have a fledgling docx package with a trivial new() method and two recursive traversal routines that check ref() to decide how to traverse each element they find. I devised an addressing scheme, { treetop => $arrayref, immedparent => $someref, ixork => $ArrayIndexOrHashKey }, and the recursive routines aim to return such an address for every match to a search string.
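A minimal sketch of that kind of routine, written in plain core Perl rather than taken from the package described above: one recursive sub that dispatches on ref() and returns addresses in the same { treetop, immedparent, ixork } shape. The sub name find_addresses and the sample data are hypothetical. It matches only plain scalar values (not hash keys) and has no guard against cyclic structures.

```perl
use strict;
use warnings;

# Walk any mix of unblessed array and hash refs below $node. For each
# plain scalar value matching $re, return an address hash in the
# { treetop, immedparent, ixork } form: the root of the walk, the
# container holding the match, and the array index or hash key of the
# match inside that container. Hash keys themselves are not searched,
# other refs (including blessed objects) are not descended into, and
# cyclic structures are not guarded against.
sub find_addresses {
    my ($treetop, $node, $re) = @_;

    # Pair every child with its index (arrays) or key (hashes).
    my @slots;
    if    (ref $node eq 'ARRAY') { @slots = map { [ $_, $node->[$_] ] } 0 .. $#$node }
    elsif (ref $node eq 'HASH')  { @slots = map { [ $_, $node->{$_} ] } sort keys %$node }

    my @found;
    for my $slot (@slots) {
        my ($ixork, $elem) = @$slot;
        if (ref $elem) {
            push @found, find_addresses($treetop, $elem, $re);   # go deeper
        }
        elsif (defined $elem && $elem =~ $re) {
            push @found,
                { treetop => $treetop, immedparent => $node, ixork => $ixork };
        }
    }
    return @found;
}

# Usage on a small made-up structure.
my $data = [
    'w:tbl', [ {}, 'w:t', [ {}, 0, 'Change History' ] ],
    { note => 'Change History' },
];
my @hits = find_addresses($data, $data, qr/Change History/);

for my $hit (@hits) {
    my ($p, $k) = @$hit{qw(immedparent ixork)};
    my $value = ref $p eq 'ARRAY' ? $p->[$k] : $p->{$k};
    print "ixork=$k value=$value\n";
}
```

Sorting the hash keys keeps the order of results stable between runs, which makes stepping through or comparing output much easier.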

It didn't work first time (although it compiles and already does some kind of traversal), and because of the ridiculous amount of empty air in these structures, stepping through with the debugger to find out why my search routines don't return anything useful takes ages.

Meanwhile, as I headbang my way through this for another day or two: does anyone have a better idea (e.g. a module suggestion) for traversing an arbitrarily deep and baggy structure of arrays and hashes, looking for two matches and recording a) the reference in the array element immediately after the first match, b) the reference of the parent of the second match, and c) the index or key of the second match within that parent?
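A minimal sketch of one way to get a), b) and c) without a module, again assuming XML::Parser's Tree style. Instead of a generic ref() walk it relies on the fact that every content array is an attribute hash followed by name => content pairs. The helper names find_tables and find_text, the search pattern and the sample document are all hypothetical. Stage one records a), the arrayref following each w:tbl name; stage two searches inside each such table and records b), the array holding the matching text, and c), the index of that text within it.

```perl
use strict;
use warnings;

# Hypothetical Tree-style document with two tables; only the second
# contains the text we are looking for.
my $tree = [
    'w:document', [ {},
        'w:body', [ {},
            'w:p',   [ {}, 'w:r', [ {}, 'w:t', [ {}, 0, 'Introduction' ] ] ],
            'w:tbl', [ {}, 'w:tr', [ {}, 'w:tc', [ {}, 'w:p', [ {},
                'w:r', [ {}, 'w:t', [ {}, 0, 'Server list' ] ] ] ] ] ],
            'w:tbl', [ {}, 'w:tr', [ {}, 'w:tc', [ {}, 'w:p', [ {},
                'w:r', [ {}, 'w:t', [ {}, 0, 'Change History' ] ] ] ] ] ],
        ],
    ],
];

# a) Collect the content arrayref that follows each 'w:tbl' name.
# A content array is [ \%attrs, name => content, ... ], so pairs start
# at index 1; name 0 marks a text node.
sub find_tables {
    my ($content) = @_;
    my @tables;
    for (my $i = 1; $i < @$content; $i += 2) {
        my ($name, $kid) = @$content[ $i, $i + 1 ];
        next if $name eq '0';
        push @tables, $kid if $name eq 'w:tbl';
        push @tables, find_tables($kid);      # also finds nested tables
    }
    return @tables;
}

# b) and c): text nodes under $content that match $re, reported as the
# content array holding the text and the index of the text within it.
sub find_text {
    my ($content, $re) = @_;
    my @hits;
    for (my $i = 1; $i < @$content; $i += 2) {
        my ($name, $kid) = @$content[ $i, $i + 1 ];
        if ($name eq '0') {
            push @hits, { immedparent => $content, ixork => $i + 1 }
                if $kid =~ $re;
        }
        else {
            push @hits, find_text($kid, $re);
        }
    }
    return @hits;
}

# The root is a single name => content pair, so start from its content.
my @results;
for my $table (find_tables($tree->[1])) {
    my @hits = find_text($table, qr/change history/i);
    next unless @hits;
    push @results, { table => $table, %{ $hits[0] } };   # first match only
}

for my $r (@results) {
    print "table found; text at index $r->{ixork}: ",
        $r->{immedparent}[ $r->{ixork} ], "\n";
}
```

Because the returned references point into the live tree, anything done through them (for example pushing a new 'w:tr' => [...] pair onto $r->{table}) changes the parsed structure in place. Turning the tree back into XML for re-zipping is a separate step not covered here.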

Thanks in advance!

-S

One world, one people


In reply to Traversing arbitrarily deep and baggy structures by anonymized user 468275
