Hi,

I'm trying to write a script to reconstruct directory structures and file names that are described by an XML file, but I'm meeting with mixed success. As far as Perl scripting goes, I'm still using my training wheels.

At work we have an application that archives directories and files by renaming them all to MD5 hash names, then tossing the lot into a single directory. It writes the description of "which file goes where" into an XML document.

Unfortunately, it also tosses in about 10 attributes for every item, only two of which I really need: the original name and the MD5 equivalent name. I found an example of a script that does something similar and was able to modify it for my needs. The script doesn't seem to cope with anything as complex as my XML document, though. It spits the output out as one long unbroken string of MD5 names, followed by another unbroken string of file names.

It is getting the directory structures right, but just mashing everything in the directory together. I'm just not understanding how to make the script isolate individual attributes correctly for each XML element. Here is an example of the XML data structure:

    <ncp_directory op="ADD" ipol="0" iprs="0" uppol="0" upprs="0" upol="0" uprs="0" vpol="1" vnipol="1" rpol="1" user_specific="0" ntperm="0" name="$dir1" flags="" lm="129232888600305382" cr="129232888600305382" >
      <ncp_directory op="ADD" ipol="0" iprs="0" uppol="0" upprs="0" upol="0" uprs="0" vpol="1" vnipol="1" rpol="1" user_specific="0" ntperm="0" name="CutePDFWriter" flags="" lm="129232886309260678" cr="129232711066448490" >
        <ncp_file op="ADD" ipol="0" iprs="0" uppol="0" upprs="0" upol="0" uprs="0" vpol="1" vnipol="1" rpol="1" name="cpwmon2k.dll" length="87552" md5="27A8QATED9I2Ox8F65OGEPPDCIV" flags="a" lm="129018983800000000" cr="129232711245774126" gac_register_op="SAME" register="false" />
        <ncp_directory op="ADD" ipol="0" iprs="0" uppol="0" upprs="0" upol="0" uprs="0" vpol="1" vnipol="1" rpol="1" user_specific="0" ntperm="0" name="converter" flags="" lm="129232881029776793" cr="129232870612045954" >
          <ncp_directory op="ADD" ipol="0" iprs="0" uppol="0" upprs="0" upol="0" uprs="0" vpol="1" vnipol="1" rpol="1" user_specific="0" ntperm="0" name="GPLGS" flags="" lm="129232870625951047" cr="129232870612202191" >
            <ncp_file op="ADD" ipol="0" iprs="0" uppol="0" upprs="0" upol="0" uprs="0" vpol="1" vnipol="1" rpol="1" name="gsdll32.dll" length="2768896" md5="5F7UGLCH9K3GKxBNML1LM0G3RNL" flags="a" lm="127407045220000000" cr="129232870614545746" gac_register_op="SAME" register="false" />
            <ncp_file op="ADD" ipol="0" iprs="0" uppol="0" upprs="0" upol="0" uprs="0" vpol="1" vnipol="1" rpol="1" name="a010013l.pfb" length="69958" md5="7EDJ7V7QHMBQ1x6HLC54FG0OP6T" flags="a" lm="126854965940000000" cr="129232870612202191" gac_register_op="SAME" />
            <!--- truncated for brevity sake-->
            <ncp_file op="ADD" ipol="0" iprs="0" uppol="0" upprs="0" upol="0" uprs="0" vpol="1" vnipol="1" rpol="1" name="z003034l.pfb" length="113405" md5="D6I2GGENUCQLEx6FMO1IPG1E8F7" flags="a" lm="126854124840000000" cr="129232870625951047" gac_register_op="SAME" />
            <ncp_file op="ADD" ipol="0" iprs="0" uppol="0" upprs="0" upol="0" uprs="0" vpol="1" vnipol="1" rpol="1" name="zeroline.ps" length="2567" md5="FETPJPBOOF039xCCTQFGII9DNN0" flags="a" lm="126588917680000000" cr="129232870625951047" gac_register_op="SAME" />
          </ncp_directory>
          <ncp_file op="ADD" ipol="0" iprs="0" uppol="0" upprs="0" upol="0" uprs="0" vpol="1" vnipol="1" rpol="1" name="GSSetup.exe" length="122880" md5="E61K8P45E8D81x3T3E47C8QIP0U" flags="a" lm="127751787000000000" cr="129232870612045954" gac_register_op="SAME" />
        </ncp_directory>

As you can see, the structure is: a directory element, the files it contains (each with its attributes), then the closing directory tag, and so on. Here is my script:

    use XML::XPath;

    my $file = 'ncpobjs.xml';
    my $xp = XML::XPath->new(filename => $file);

    foreach my $ncptype ($xp->find('//ncp_directory')->get_nodelist){
        print $ncptype->find('ncp_file')->string_value;
        print ' (' . $ncptype->find('@name') . ') ';
        print $ncptype->find('ncp_file/@md5'), " ", $ncptype->find('ncp_file/@name'), "\n";
        print "\n";
    }

And here is an example of the quasi-gibberish that I'm getting as output for each directory level:

(x64) ALSOO431VHGO2x825OF80GN8RNM605U9UMHOR3M1xEQIPMMRKFK3F0 PSCRIPT.HLPPSCRIPT.NTF

So it boils down to this: how do I change this odd output into something like "MD5 Name = File Name" for each file element? I have the feeling I might need another for loop inside to deal with the files; I just can't figure out where to place it. Any insight would be very much appreciated!
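
For reference, here is a rough sketch of where such an inner loop might go. It is not a confirmed fix. It assumes that printing a whole node set concatenates its members, which would explain the run-together output, and that looping over the ncp_file nodes one at a time and reading each attribute with getAttribute avoids that. The inline sample is a cut-down version of the excerpt above with most attributes dropped, and the @lines array is only there for illustration.

```perl
use strict;
use warnings;
use XML::XPath;

# Cut-down sample modeled on the excerpt above (most attributes omitted).
my $xml = <<'XML';
<ncp_directory name="converter">
  <ncp_directory name="GPLGS">
    <ncp_file name="gsdll32.dll" md5="5F7UGLCH9K3GKxBNML1LM0G3RNL" />
    <ncp_file name="zeroline.ps" md5="FETPJPBOOF039xCCTQFGII9DNN0" />
  </ncp_directory>
  <ncp_file name="GSSetup.exe" md5="E61K8P45E8D81x3T3E47C8QIP0U" />
</ncp_directory>
XML

my $xp = XML::XPath->new(xml => $xml);
my @lines;    # collected output lines (illustrative helper)

# Outer loop: every directory element, at any depth.
foreach my $dir ($xp->find('//ncp_directory')->get_nodelist) {
    push @lines, '(' . $dir->getAttribute('name') . ')';

    # Inner loop: only the files that are direct children of this directory.
    foreach my $file ($dir->find('ncp_file')->get_nodelist) {
        push @lines, $file->getAttribute('md5') . ' = ' . $file->getAttribute('name');
    }
}

print "$_\n" for @lines;
```

Each file should then come out on its own line as "MD5 Name = File Name" under the name of the directory that contains it. For the real data, the constructor call would go back to XML::XPath->new(filename => $file).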


In reply to Having problems accessing individual attributes in xml by Gemenon
