Hi Guys,

I'm pulling XML data from a REST API feed and I need to convert said data to CSV for entry in a database. The API is for web analytics data which spits out slightly different XML files depending upon the report chosen. I want my conversion script to be able to handle slight variations in the XML schema.

First, here's the current state of my code. It works for reports which have only 2 dimensions. If a report has more dimensions, additional 'DataRow' nodes are added as children. As the Xpaths change, my script breaks, printing only the first 2 dimensions and no data. I'm a using a loop per level, so I'm hoping there's a cleaner way to go about it than having 5+ nested loops.
perl <<'EOF' use warnings; use strict; use XML::LibXML; my($data) = './work/data.xml'; my($file) = './out/data.csv'; open(my $out, '>', $file) or die "Could not open file '$file' $!"; my($parser) = XML::LibXML->new(); my $xmldata = $parser->parse_file($data); my @childarray; #The first 2 loop blocks print column headings #Dimensions ########################### #Fixed Date Heading ########################### push( @childarray, qq("WTDate")); for my $node ( $xmldata->findnodes( '/DimensionalReport/ReportDefiniti +on/list/Dimension' )) { for my $dimension ($node->findnodes('*[2]')) { my $child = $dimension->to_literal; if ($dimension->nodeName eq "string") { $child = qq("$child"); } push( @childarray, $child); } } #Measures for my $node ( $xmldata->findnodes( '/DimensionalReport/ReportDefiniti +on/list/Measure' )) { for my $measure ($node->findnodes('*[1]')) { my $child = $measure->to_literal; if ($measure->nodeName eq "string") { $child = qq("$child"); } push( @childarray, $child); } } print $out join(",", @childarray), "\n"; undef @childarray; #Print Data #Records for my $node ( $xmldata->findnodes( '/DimensionalReport/list/DataRow' +)) { for my $lvldim ($node->findnodes('./list/list/DataRow')) { my $dim1 = $node->getAttribute("name"); if ($node->nodeName eq "string" or $node->nodeName eq "list") +{ $dim1 = qq("$dim1"); } push( @childarray, $dim1); my $child = $lvldim->getAttribute("name"); if ($lvldim->nodeName eq "string" or $lvldim->nodeName eq "lis +t") { $child = qq("$child"); } push( @childarray, $child); for my $lvlmeas ($lvldim->findnodes('./list/*')) { my $child = $lvlmeas->to_literal; if ($lvlmeas->nodeName eq "string" or $lvlmeas->nodeName e +q "list") { $child = qq("$child"); } push( @childarray, $child); } print $out join(",", @childarray), "\n"; undef @childarray; } } close $out; EOF }
Now I will provide two data examples. Here is a simple one:

https://ezcrypt.it/jk6n#4qRm2gc3F7f0RnOMqL5bPaYl

Expected output should be something like this:
"Time Period","Active Visits","Page Views","Clickthroughs","Daily Visi +tors","Weekly Visitors","Monthly Visitors","Quarterly Visitors","Year +ly Visitors","Single Page View Visits","Entry Page Visits","Bounce Ra +te" 1/1/2013,9609.00,36456.00,604.00,9265.00,8948.00,9265.00,9265.00,9265. +00,4444.00,9608.00,46.25 1/2/2013,22088.00,76762.00,10291.00,21460.00,21088.00,21263.00,21263.0 +0,21263.00,12022.00,22054.00,54.51

And, something with a few more dimensions:

https://ezcrypt.it/kk6n#eVINakVb1teCRmxKvdbtirFi

And the goal output:
"WTDate","DMA","Most Recent Campaign Demand Channel","Most Recent Camp +aign Partner","Most Recent Campaign Marketing Program","Most Recent C +ampaign Marketing Activity","Most Recent Campaign Description","Most +Recent Campaign ID","Visits","Page Views","Clickthroughs","Orders","R +evenue","Average Revenue per Order","Units","Average Units per Order" +,"Average Visit Duration (Minutes)","Average Visit Page Views","Hits" +,"Daily Campaign Visitors","Weekly Campaign Visitors","Monthly Campai +gn Visitors","Quarterly Campaign Visitors","Yearly Campaign Visitors" +,"New Campaign Visitors" 2013-01-01,"500 (Portland-Auburn:ME-NH)","online","null","null","null" +,"null","ban_384938",384,4859,38,3,4958,50.00,4,2,2.3,3.1,3844,200,40 +0,500,593,2,203 2013-01-01,"500 (Portland-Auburn:ME-NH)","online","null","null","null" +,"null","ban_384950",390,4859,38,3,4958,50.00,4,2,2.3,3.1,3844,200,40 +0,500,593,2,203 2013-01-01,"501 (New York:CT-NJ-NY-PA)","online","null","null","null", +"null","ban_384950",35,200,38,3,4958,50.00,4,2,2.3,3.1,3844,200,400,5 +00,593,2,203

I apologize for my sloppy code; this is my first perl script. Any feedback to push me in the right direction would be much appreciated.

Thanks

In reply to LibXml - Convert XML to CSV from API Feed by Devon

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.