in reply to Need help with complex hash of hashes.

The XML parsing in your code seems ... odd. Can you provide a sample of the XML input? Is it really XML-encoded CSV or does the input actually have structure? If I read the code correctly, your XML parsing is implicitly assuming that all of the XML fields will be in the same order in every record.

Maybe something like:

my @items; for my $record (@records) { my %item = (); for my $tag ($record->children) { $item{$tag->name} = $tag->text } push @items, {%item}; }

This gets rid of all of the $*_pos variables and directly converts XML to an array of hashes for further processing.

The series of regex substitutions can go in their own sub cleanup. Then change the relevant line to $item{cleanup $tag->name} = cleanup $tag->text assuming that cleanup is declared with the appropriate prototype.

sub cleanup ($) { my $work = shift; $work =~ s/\r|\n//g; # Cleanup Carraige Returns $work =~ s/, / /g; # Cleanup Comma Space $work =~ s/,/ /g; # Cleanup Comma $work =~ s/"//g; # Cleanup Parentheses $work =~ s/^\s+|\s+$//g; # Trim spaces $work =~ s/([^[:ascii:]]+)/unidecode($1)/ge; return $work; }

This will create some differences, for example, $data[$class_pos] is now $items[$n]{sys_class_name} after the XML parsing loop, where $n is an index into the @items array.

Replies are listed 'Best First'.
Re^2: Need help with complex hash of hashes.
by vlturner (Sexton) on Jan 27, 2016 at 13:28 UTC

    I need the positions. The reason is I don't know anything about the data, and need to keep track fo things by column header. Yes, I will fix the cleanup, it was quick an dirty to just get rid of crap in the CMDB that shouldn't be there to begin with.

      I need the positions

      I don't understand this. With XML the order of the tags shouldn't matter. Anyway, here is my best guess at what you are trying to do, create a CSV of all the data and a hash structure of some of it.

      #!/usr/bin/perl use strict; use warnings; use XML::Twig; use Text::CSV; use Data::Dump 'pp'; my @header = (); my @csv_rows = (); my $order_by = 'u_ci_id'; my %seen = (); my %arx_data = (); # parse xml in chunks my $twig = 'XML::Twig'->new( twig_handlers => { result => \&record } ); my $input_xml = get_xml(); # change as required $twig->parse($input_xml); # data dump pp \%arx_data; #pp \@csv_rows; #pp \@header; # sort and dump csv my @sorted = sort { $b->[0] cmp $a->[0] } @csv_rows; unshift @sorted,[@header]; # add header my $outfile = 'xmldump.csv'; my $csv = Text::CSV->new ( { binary => 1 } ) or die "Cannot use CSV: ".Text::CSV->error_diag (); $csv->eol("\n"); if ($outfile){ open my $fh, ">:encoding(utf8)", $outfile or die "$outfile: $!"; for (@sorted){ shift @$_; # remove sort field $csv->print ($fh, $_); } close $fh or die "new.csv: $!"; } else { for (@sorted){ shift @$_; $csv->print (\*STDOUT, $_); } } # parse 1 record sub record { my ($e,$t) = @_; # csv records unless (@header){ my @info_tags = $t->children; @header = map{ $_->name } @info_tags; unshift @header,$order_by; # add sort field } # extract data in same order as header row my @data = map{ $t->field($_) } @header; push @csv_rows,clean(@data); #returns array ref # build hash structure my $app = $t->field('u_application_id'); my $u_ci = $t->field('u_ci_id'); my $class = $t->field('sys_class_name'); my $name = $t->field('name'); my $ip = $t->field('ip_address'); if ( $class eq 'cmdb_ci_appl'){ $arx_data{$u_ci}{'name'} = $name; } if ( $class eq 'cmdb_ci_svr' ){ next if $seen{$app}{$ip}++; push @{$arx_data{$app}{'members'}},$ip; } } sub clean { my @f = @_; for (@f){ s/\r|\n//g; # Cleanup Carraige Returns s/, / /g; # Cleanup Comma Space s/,/ /g; # Cleanup Comma s/"//g; # Cleanup Parentheses s/^\s+|\s+$//g; # Trim spaces s/([^[:ascii:]]+)/unidecode($1)/ge; } return \@f; }
      poj

        I need to use that final hash to build out this CSV file. Now your code is great because it properly dumps the XML.csv, and now while I have the data in the big hash build out this secondary CSV file too. You can see where the list of member IP's need to go separated bu semicolons. member1;member2;member3

        #,,,,,,,, # Console Version: ,9.6.0,,,,,,, # Version: ,12.0.0,,,,,,, # Timestamp: ,2015-11-03 13:25:53 EST,,,,,,, #,,,,,,,, # Source: ,host1,,,,,,, # StartTime: ,2015-11-03 09:12:00 EST,,,,,,, # EndTime: ,2015-11-03 10:12:00 EST,,,,,,, # Data Type: ,Business Groups,,,,,,, #,,,,,,,, Status,Name,Description,Members,Id,InboundUtilizationBandwidth,Outboun +dUtilizationBandwidth,Appliances,CollectOnDirector W,Application1,,10.10.10.1;10.10.10.2;10.4.23.34,68,0,0,,

        So my original code contained the following.

        my $twig = 'XML::Twig'->new; $twig->xparse($input_xml);

        And POJ's code contains the following.

        my $twig = 'XML::Twig'->new( twig_handlers => { result => \&record }) +; $twig->parse($input_xml);

        POJ's code when I do 10000 records is blowing up in the Twig module as follows.

        DB<2> c Exiting subroutine via next at vance.pl line 91. at vance.pl line 91. main::record(XML::Twig=HASH(0x6003ead30), XML::Twig::Elt=HASH(0x60 +2c20e90)) called at /usr/lib/perl5/site_perl/5.22/XML/Twig.pm line 23 +48 XML::Twig::_twig_end(XML::Parser::Expat=HASH(0x6003fafe0), "result +") called at /usr/lib/perl5/site_perl/5.22/x86_64-cygwin-threads/XML/ +Parser/Expat.pm line 474 XML::Parser::Expat::parse(XML::Parser::Expat=HASH(0x6003fafe0), "< +?xml version=\"1.0\" encoding=\"UTF-8\"?><response><result><chec"...) + called at /usr/lib/perl5/site_perl/5.22/x86_64-cygwin-threads/XML/Pa +rser.pm line 187 eval {...} called at /usr/lib/perl5/site_perl/5.22/x86_64-cygwin-t +hreads/XML/Parser.pm line 186 XML::Parser::parse(XML::Twig=HASH(0x6003ead30), "<?xml version=\"1 +.0\" encoding=\"UTF-8\"?><response><result><chec"...) called at /usr/ +lib/perl5/site_perl/5.22/XML/Twig.pm line 767 eval {...} called at /usr/lib/perl5/site_perl/5.22/XML/Twig.pm lin +e 767 XML::Twig::parse(XML::Twig=HASH(0x6003ead30), "<?xml version=\"1.0 +\" encoding=\"UTF-8\"?><response><result><chec"...) called at vance.p +l line 33 Exiting subroutine via next at vance.pl line 91.

        It is deep in the Twig module when I single step, but I think its tied to some sort of overflow condition. Any ideas as to how to fix this? POJ, why did you change those lines? Here is the current code.

        use strict; use warnings; use XML::Twig; use Text::CSV; use Data::Dump 'pp'; use Data::Dumper; use REST::Client; use MIME::Base64; use Text::Unidecode; use utf8; my @header = (); my @csv_rows = (); my $order_by = 'u_ci_id'; my %seen = (); my %arx_data = (); # parse xml in chunks my $twig = 'XML::Twig'->new( twig_handlers => { result => \&record }) +; my $client = REST::Client->new(host => 'https://staplessb.service-now. +com'); my $encoded_auth = encode_base64("1774580:Natalie1", ''); $client->GET("/api/now/table/cmdb_ci?sysparm_limit=10000", {'Authorization' => "Basic $encoded_auth", 'Accept' => 'application/xml'}); my $input_xml = $client->responseContent(); $twig->parse($input_xml); # data dump pp \%arx_data; #pp \@csv_rows; #pp \@header; # sort and dump csv my @sorted = sort { $b->[0] cmp $a->[0] } @csv_rows; unshift @sorted,[@header]; # add header my $outfile = 'xmldump.csv'; my $csv = Text::CSV->new ( { binary => 1 } ) or die "Cannot use CSV: ".Text::CSV->error_diag (); $csv->eol("\n"); if ($outfile){ open my $fh, ">:encoding(utf8)", $outfile or die "$outfile: $!"; for (@sorted){ shift @$_; # remove sort field $csv->print ($fh, $_); } close $fh or die "new.csv: $!"; } else { for (@sorted){ shift @$_; $csv->print (\*STDOUT, $_); } } # parse 1 record sub record { my ($e,$t) = @_; # csv records unless (@header){ my @info_tags = $t->children; @header = map{ $_->name } @info_tags; unshift @header,$order_by; # add sort field } # extract data in same order as header row my @data = map{ $t->field($_) } @header; push @csv_rows,clean(@data); #returns array ref # build hash structure my $app = $t->field('u_application_id'); my $u_ci = $t->field('u_ci_id'); my $class = $t->field('sys_class_name'); my $name = $t->field('name'); my $ip = $t->field('ip_address'); my $fqdn = $t->field('fqdn'); my $dns = $t->field('dns_domain'); my $mac = $t->field('mac'); if ( $class eq 'cmdb_ci_appl'){ $arx_data{$u_ci}{'name'} = $name; } else { next if $seen{$app}{$ip}++; push @{$arx_data{$app}{'members'}},$ip; } } sub clean { my @f = @_; for (@f){ s/\r|\n//g; # Cleanup Carraige Returns s/, / /g; # Cleanup Comma Space s/,/ /g; # Cleanup Comma s/"//g; # Cleanup Parentheses s/^\s+|\s+$//g; # Trim spaces s/([^[:ascii:]]+)/unidecode($1)/ge; } return \@f; }

      Does the first row returned contain a set of empty tags and subsequent rows contain the same tags with data? Something like:

      <result> <checked_in></checked_in> <change_request></change_request> <po_number></po_number> </result> <result> <checked_in>yesterday</checked_in> <change_request>AQX123</change_request> <po_number>42</po_number> </result>

      Until, of course, you get to:

      <result> <po_number>42</po_number> <checked_in>1 week ago</checked_in> <change_request>AQX120</change_request> </result>

      And your parsing by position produces bogus results.

      Or is it really XML-encoded CSV? More like:

      <result> <checked_in></checked_in> <change_request></change_request> <po_number></po_number> </result> <result> <bogon>yesterday</bogon> <bogon>AQX123</bogon> <bogon>42</bogon> </result>

      In the former case, you don't need the positions after all, and the fact that they change from run to run should indicate that they are not significant. In the latter case, your CMDB belongs on the Daily WTF. Your code mentions an option for JSON output. Perhaps that module is less broken?

        So there are 250K CI records. You have CI's for devices with their IP addresses where 'name' is the hostname and a referral CI ( 'u_application_id' )to the application they are children of, and you have CI's that are the application, where 'name' is the application name. You have no control of the order of the records. If the class name 'sys_class_name' is equal to 'cmdb_ci_appl' then it is an application name. If not then it is probably a device and if its 'u_application_id' CI matches and application CI 'u_ci_id' then its 'ip_address' is a member of that application. Now all lines have a CI 'u_ci_id' something like CI10001127. I hope this helps. We are trying to end up with a hash that looks like this.

        my %application = ( 'CI10335478' => { name => 'app_name1', member => [ 'MemberIP1', 'MemberIP2', 'MemberIP6', ], }, 'CI10334984' => { name => 'app_name2', member => [ 'MemberIP4', 'MemberIP2', ], },

        But are ending up with this.

        { CI10001127 => { name => "Exchange 2003" }, CI10004051 => { name => "Centralized Payment Services Group (CPSG)" +}, CI10047601 => { name => "PGP Desktop" } }

        With no members.....

        if ( $class eq 'cmdb_ci_appl'){ $arx_data{$u_ci}{'name'} = $name; } else { next if $seen{$app}{$ip}++; push @{$arx_data{$app}{'members'}},$ip; }

        Here is an XML record

        <result><checked_in></checked_in><change_request></change_request><po_ +number></po_number><u_encryption_schema></u_encryption_schema><correl +ation_id></correlation_id><cfg_auto_provider></cfg_auto_provider><sup +ported_by></supported_by><u_service_catalog>true</u_service_catalog>< +first_discovered></first_discovered><u_oem_part_numbe></u_oem_part_nu +mbe><sent_for_repair></sent_for_repair><owned_by></owned_by><gl_accou +nt></gl_account><u_project_id></u_project_id><managed_by></managed_by +><asset></asset><maintenance_schedule></maintenance_schedule><u_chg>f +alse</u_chg><category>Hardware</category><delivery_date></delivery_da +te><install_status>8</install_status><u_maintenance_window>Friday 19: +00 - 08:00</u_maintenance_window><virtual>false</virtual><u_sub_class +></u_sub_class><dns_domain></dns_domain><u_incident>true</u_incident> +<u_parts_software_needed></u_parts_software_needed><change_control></ +change_control><checked_out></checked_out><u_decommission_stamp></u_d +ecommission_stamp><purchase_date></purchase_date><order_date></order_ +date><u_template></u_template><skip_sync>false</skip_sync><lease_id>< +/lease_id><u_sevice_options></u_sevice_options><vendor></vendor><sys_ +id>094722740a0a3c9b01928362f7df6468</sys_id><u_maint_contract_sla>1</ +u_maint_contract_sla><sys_created_by>1522337</sys_created_by><u_sla_s +upport_tie></u_sla_support_tie><sys_domain_path>/</sys_domain_path><s +ubcategory>Computer</subcategory><start_date></start_date><comments>< +/comments><location><link>https://foobar.bozo-now.com/api/now/table/c +mn_location/f7eaf4e40ff13100ac2b348ce1050e35</link><value>f7eaf4e40ff +13100ac2b348ce1050e35</value></location><unverified>false</unverified +><justification></justification><u_disposal_date></u_disposal_date><s +ys_domain><link>https://foobar.bozo-now.com/api/now/table/sys_user_gr +oup/global</link><value>global</value></sys_domain><sys_tags></sys_ta +gs><sys_mod_count>194</sys_mod_count><cost_cc>USD</cost_cc><u_total_p +urchase_cost>5576.73</u_total_purchase_cost><monitor>false</monitor>< +sys_updated_on>2015-10-13 07:33:56</sys_updated_on><warranty_expirati +on>2013-05-20</warranty_expiration><invoice_number></invoice_number>< +fqdn></fqdn><cost></cost><u_retail_grp></u_retail_grp><u_server_type> +</u_server_type><ip_address>10.10.10.10</ip_address><u_manufacture_da +te_dummy>2013</u_manufacture_date_dummy><last_discovered>2015-10-13 0 +2:10:13</last_discovered><model_id><link>https://foobar.bozo-now.com/ +api/now/table/cmdb_model/0b40134cfdaf9804103f34566ba9df5a</link><valu +e>0b40134cfdaf9804103f34566ba9df5a</value></model_id><manufacturer><l +ink>https://foobar.bozbozo-now.com/api/now/table/core_company/41be01e +e0a0a3c0e59174d9f419c61de</link><value>41be01ee0a0a3c0e59174d9f419c61 +de</value></manufacturer><u_inc>false</u_inc><company></company><due> +</due><cfg_auto_management_server></cfg_auto_management_server><u_pro +posed>false</u_proposed><u_change>true</u_change><u_work_notes></u_wo +rk_notes><u_store_location></u_store_location><asset_tag>SCO.ST.10563 +4</asset_tag><discovery_source>Manual Entry</discovery_source><u_appl +ication_id></u_application_id><assignment_group></assignment_group><c +an_print>false</can_print><u_problem>true</u_problem><department></de +partment><u_manufacture_date>2013-07-02</u_manufacture_date><support_ +group><link>https://foobar.bozo-now.com/api/now/table/sys_user_group/ +4436591d0a0a3c0e76d6e42af5bba6e7</link><value>4436591d0a0a3c0e76d6e42 +af5bba6e7</value></support_group><u_exclusion>false</u_exclusion><sys +_created_on>2010-11-01 21:10:45</sys_created_on><u_display_name>lnxnm +sprag27</u_display_name><u_alias></u_alias><cost_center></cost_center +><short_description>NOMS</short_description><sys_updated_by>1742882</ +sys_updated_by><name>lnxnmsprag27</name><u_environment>PROD</u_enviro +nment><u_function></u_function><due_in></due_in><u_change_approver>tr +ue</u_change_approver><install_date></install_date><assigned></assign +ed><u_active_stamp></u_active_stamp><u_retail_group></u_retail_group> +<serial_number>USE020N1VV</serial_number><u_work_notes__asset_></u_wo +rk_notes__asset_><repair_contract_id></repair_contract_id><assigned_t +o></assigned_to><mac_address></mac_address><model_number></model_numb +er><schedule></schedule><u_ci_id>CI10110212</u_ci_id><returned_from_r +epair></returned_from_repair><ng_assignment_flag>ready</ng_assignment +_flag><sys_class_name>cmdb_ci_linux_server</sys_class_name><u_procure +ment_product_number></u_procurement_product_number><attributes></attr +ibutes><u_location_stamp>2015-06-18 14:54:20</u_location_stamp><fault +_count>0</fault_count><operational_status>1</operational_status><cfg_ +auto_change></cfg_auto_change></result>