in reply to Re: Need help with complex hash of hashes.
in thread Need help with complex hash of hashes.

I need the positions. The reason is I don't know anything about the data, and need to keep track fo things by column header. Yes, I will fix the cleanup, it was quick an dirty to just get rid of crap in the CMDB that shouldn't be there to begin with.

  • Comment on Re^2: Need help with complex hash of hashes.

Replies are listed 'Best First'.
Re^3: Need help with complex hash of hashes.
by poj (Abbot) on Jan 27, 2016 at 14:18 UTC
    I need the positions

    I don't understand this. With XML the order of the tags shouldn't matter. Anyway, here is my best guess at what you are trying to do, create a CSV of all the data and a hash structure of some of it.

    #!/usr/bin/perl use strict; use warnings; use XML::Twig; use Text::CSV; use Data::Dump 'pp'; my @header = (); my @csv_rows = (); my $order_by = 'u_ci_id'; my %seen = (); my %arx_data = (); # parse xml in chunks my $twig = 'XML::Twig'->new( twig_handlers => { result => \&record } ); my $input_xml = get_xml(); # change as required $twig->parse($input_xml); # data dump pp \%arx_data; #pp \@csv_rows; #pp \@header; # sort and dump csv my @sorted = sort { $b->[0] cmp $a->[0] } @csv_rows; unshift @sorted,[@header]; # add header my $outfile = 'xmldump.csv'; my $csv = Text::CSV->new ( { binary => 1 } ) or die "Cannot use CSV: ".Text::CSV->error_diag (); $csv->eol("\n"); if ($outfile){ open my $fh, ">:encoding(utf8)", $outfile or die "$outfile: $!"; for (@sorted){ shift @$_; # remove sort field $csv->print ($fh, $_); } close $fh or die "new.csv: $!"; } else { for (@sorted){ shift @$_; $csv->print (\*STDOUT, $_); } } # parse 1 record sub record { my ($e,$t) = @_; # csv records unless (@header){ my @info_tags = $t->children; @header = map{ $_->name } @info_tags; unshift @header,$order_by; # add sort field } # extract data in same order as header row my @data = map{ $t->field($_) } @header; push @csv_rows,clean(@data); #returns array ref # build hash structure my $app = $t->field('u_application_id'); my $u_ci = $t->field('u_ci_id'); my $class = $t->field('sys_class_name'); my $name = $t->field('name'); my $ip = $t->field('ip_address'); if ( $class eq 'cmdb_ci_appl'){ $arx_data{$u_ci}{'name'} = $name; } if ( $class eq 'cmdb_ci_svr' ){ next if $seen{$app}{$ip}++; push @{$arx_data{$app}{'members'}},$ip; } } sub clean { my @f = @_; for (@f){ s/\r|\n//g; # Cleanup Carraige Returns s/, / /g; # Cleanup Comma Space s/,/ /g; # Cleanup Comma s/"//g; # Cleanup Parentheses s/^\s+|\s+$//g; # Trim spaces s/([^[:ascii:]]+)/unidecode($1)/ge; } return \@f; }
    poj

      I need to use that final hash to build out this CSV file. Now your code is great because it properly dumps the XML.csv, and now while I have the data in the big hash build out this secondary CSV file too. You can see where the list of member IP's need to go separated bu semicolons. member1;member2;member3

      #,,,,,,,, # Console Version: ,9.6.0,,,,,,, # Version: ,12.0.0,,,,,,, # Timestamp: ,2015-11-03 13:25:53 EST,,,,,,, #,,,,,,,, # Source: ,host1,,,,,,, # StartTime: ,2015-11-03 09:12:00 EST,,,,,,, # EndTime: ,2015-11-03 10:12:00 EST,,,,,,, # Data Type: ,Business Groups,,,,,,, #,,,,,,,, Status,Name,Description,Members,Id,InboundUtilizationBandwidth,Outboun +dUtilizationBandwidth,Appliances,CollectOnDirector W,Application1,,10.10.10.1;10.10.10.2;10.4.23.34,68,0,0,,

      So my original code contained the following.

      my $twig = 'XML::Twig'->new; $twig->xparse($input_xml);

      And POJ's code contains the following.

      my $twig = 'XML::Twig'->new( twig_handlers => { result => \&record }) +; $twig->parse($input_xml);

      POJ's code when I do 10000 records is blowing up in the Twig module as follows.

      DB<2> c Exiting subroutine via next at vance.pl line 91. at vance.pl line 91. main::record(XML::Twig=HASH(0x6003ead30), XML::Twig::Elt=HASH(0x60 +2c20e90)) called at /usr/lib/perl5/site_perl/5.22/XML/Twig.pm line 23 +48 XML::Twig::_twig_end(XML::Parser::Expat=HASH(0x6003fafe0), "result +") called at /usr/lib/perl5/site_perl/5.22/x86_64-cygwin-threads/XML/ +Parser/Expat.pm line 474 XML::Parser::Expat::parse(XML::Parser::Expat=HASH(0x6003fafe0), "< +?xml version=\"1.0\" encoding=\"UTF-8\"?><response><result><chec"...) + called at /usr/lib/perl5/site_perl/5.22/x86_64-cygwin-threads/XML/Pa +rser.pm line 187 eval {...} called at /usr/lib/perl5/site_perl/5.22/x86_64-cygwin-t +hreads/XML/Parser.pm line 186 XML::Parser::parse(XML::Twig=HASH(0x6003ead30), "<?xml version=\"1 +.0\" encoding=\"UTF-8\"?><response><result><chec"...) called at /usr/ +lib/perl5/site_perl/5.22/XML/Twig.pm line 767 eval {...} called at /usr/lib/perl5/site_perl/5.22/XML/Twig.pm lin +e 767 XML::Twig::parse(XML::Twig=HASH(0x6003ead30), "<?xml version=\"1.0 +\" encoding=\"UTF-8\"?><response><result><chec"...) called at vance.p +l line 33 Exiting subroutine via next at vance.pl line 91.

      It is deep in the Twig module when I single step, but I think its tied to some sort of overflow condition. Any ideas as to how to fix this? POJ, why did you change those lines? Here is the current code.

      use strict; use warnings; use XML::Twig; use Text::CSV; use Data::Dump 'pp'; use Data::Dumper; use REST::Client; use MIME::Base64; use Text::Unidecode; use utf8; my @header = (); my @csv_rows = (); my $order_by = 'u_ci_id'; my %seen = (); my %arx_data = (); # parse xml in chunks my $twig = 'XML::Twig'->new( twig_handlers => { result => \&record }) +; my $client = REST::Client->new(host => 'https://staplessb.service-now. +com'); my $encoded_auth = encode_base64("1774580:Natalie1", ''); $client->GET("/api/now/table/cmdb_ci?sysparm_limit=10000", {'Authorization' => "Basic $encoded_auth", 'Accept' => 'application/xml'}); my $input_xml = $client->responseContent(); $twig->parse($input_xml); # data dump pp \%arx_data; #pp \@csv_rows; #pp \@header; # sort and dump csv my @sorted = sort { $b->[0] cmp $a->[0] } @csv_rows; unshift @sorted,[@header]; # add header my $outfile = 'xmldump.csv'; my $csv = Text::CSV->new ( { binary => 1 } ) or die "Cannot use CSV: ".Text::CSV->error_diag (); $csv->eol("\n"); if ($outfile){ open my $fh, ">:encoding(utf8)", $outfile or die "$outfile: $!"; for (@sorted){ shift @$_; # remove sort field $csv->print ($fh, $_); } close $fh or die "new.csv: $!"; } else { for (@sorted){ shift @$_; $csv->print (\*STDOUT, $_); } } # parse 1 record sub record { my ($e,$t) = @_; # csv records unless (@header){ my @info_tags = $t->children; @header = map{ $_->name } @info_tags; unshift @header,$order_by; # add sort field } # extract data in same order as header row my @data = map{ $t->field($_) } @header; push @csv_rows,clean(@data); #returns array ref # build hash structure my $app = $t->field('u_application_id'); my $u_ci = $t->field('u_ci_id'); my $class = $t->field('sys_class_name'); my $name = $t->field('name'); my $ip = $t->field('ip_address'); my $fqdn = $t->field('fqdn'); my $dns = $t->field('dns_domain'); my $mac = $t->field('mac'); if ( $class eq 'cmdb_ci_appl'){ $arx_data{$u_ci}{'name'} = $name; } else { next if $seen{$app}{$ip}++; push @{$arx_data{$app}{'members'}},$ip; } } sub clean { my @f = @_; for (@f){ s/\r|\n//g; # Cleanup Carraige Returns s/, / /g; # Cleanup Comma Space s/,/ /g; # Cleanup Comma s/"//g; # Cleanup Parentheses s/^\s+|\s+$//g; # Trim spaces s/([^[:ascii:]]+)/unidecode($1)/ge; } return \@f; }

        Try changing last lines of record subroutine to this

        . . if ( $class eq 'cmdb_ci_appl'){ $arx_data{$u_ci}{'name'} = $name; } elsif (! $seen{$app}{$ip}++) { push @{$arx_data{$app}{'members'}},$ip; } $t->purge ; # add this line to free memory }
        poj
Re^3: Need help with complex hash of hashes.
by jcb (Parson) on Jan 28, 2016 at 02:52 UTC

    Does the first row returned contain a set of empty tags and subsequent rows contain the same tags with data? Something like:

    <result> <checked_in></checked_in> <change_request></change_request> <po_number></po_number> </result> <result> <checked_in>yesterday</checked_in> <change_request>AQX123</change_request> <po_number>42</po_number> </result>

    Until, of course, you get to:

    <result> <po_number>42</po_number> <checked_in>1 week ago</checked_in> <change_request>AQX120</change_request> </result>

    And your parsing by position produces bogus results.

    Or is it really XML-encoded CSV? More like:

    <result> <checked_in></checked_in> <change_request></change_request> <po_number></po_number> </result> <result> <bogon>yesterday</bogon> <bogon>AQX123</bogon> <bogon>42</bogon> </result>

    In the former case, you don't need the positions after all, and the fact that they change from run to run should indicate that they are not significant. In the latter case, your CMDB belongs on the Daily WTF. Your code mentions an option for JSON output. Perhaps that module is less broken?

      So there are 250K CI records. You have CI's for devices with their IP addresses where 'name' is the hostname and a referral CI ( 'u_application_id' )to the application they are children of, and you have CI's that are the application, where 'name' is the application name. You have no control of the order of the records. If the class name 'sys_class_name' is equal to 'cmdb_ci_appl' then it is an application name. If not then it is probably a device and if its 'u_application_id' CI matches and application CI 'u_ci_id' then its 'ip_address' is a member of that application. Now all lines have a CI 'u_ci_id' something like CI10001127. I hope this helps. We are trying to end up with a hash that looks like this.

      my %application = ( 'CI10335478' => { name => 'app_name1', member => [ 'MemberIP1', 'MemberIP2', 'MemberIP6', ], }, 'CI10334984' => { name => 'app_name2', member => [ 'MemberIP4', 'MemberIP2', ], },

      But are ending up with this.

      { CI10001127 => { name => "Exchange 2003" }, CI10004051 => { name => "Centralized Payment Services Group (CPSG)" +}, CI10047601 => { name => "PGP Desktop" } }

      With no members.....

      if ( $class eq 'cmdb_ci_appl'){ $arx_data{$u_ci}{'name'} = $name; } else { next if $seen{$app}{$ip}++; push @{$arx_data{$app}{'members'}},$ip; }

        After parsing the XML into something sane (see my previous comment), the next step is to build the hash you want:

        my %applications; for my $item (@items) { if ($item->{sys_class_name} eq 'cmdb_ci_appl') { $applications{$item->{u_ci_id}}{name} = $item->{name} } else # assume device { push @{$applications{$item->{u_application_id}}->{member}}, $item->{ip_address} if $item->{ip_address} } }

        This code has minimal consistency checks. Making it fully robust is left as an exercise for the reader.

      Here is an XML record

      <result><checked_in></checked_in><change_request></change_request><po_ +number></po_number><u_encryption_schema></u_encryption_schema><correl +ation_id></correlation_id><cfg_auto_provider></cfg_auto_provider><sup +ported_by></supported_by><u_service_catalog>true</u_service_catalog>< +first_discovered></first_discovered><u_oem_part_numbe></u_oem_part_nu +mbe><sent_for_repair></sent_for_repair><owned_by></owned_by><gl_accou +nt></gl_account><u_project_id></u_project_id><managed_by></managed_by +><asset></asset><maintenance_schedule></maintenance_schedule><u_chg>f +alse</u_chg><category>Hardware</category><delivery_date></delivery_da +te><install_status>8</install_status><u_maintenance_window>Friday 19: +00 - 08:00</u_maintenance_window><virtual>false</virtual><u_sub_class +></u_sub_class><dns_domain></dns_domain><u_incident>true</u_incident> +<u_parts_software_needed></u_parts_software_needed><change_control></ +change_control><checked_out></checked_out><u_decommission_stamp></u_d +ecommission_stamp><purchase_date></purchase_date><order_date></order_ +date><u_template></u_template><skip_sync>false</skip_sync><lease_id>< +/lease_id><u_sevice_options></u_sevice_options><vendor></vendor><sys_ +id>094722740a0a3c9b01928362f7df6468</sys_id><u_maint_contract_sla>1</ +u_maint_contract_sla><sys_created_by>1522337</sys_created_by><u_sla_s +upport_tie></u_sla_support_tie><sys_domain_path>/</sys_domain_path><s +ubcategory>Computer</subcategory><start_date></start_date><comments>< +/comments><location><link>https://foobar.bozo-now.com/api/now/table/c +mn_location/f7eaf4e40ff13100ac2b348ce1050e35</link><value>f7eaf4e40ff +13100ac2b348ce1050e35</value></location><unverified>false</unverified +><justification></justification><u_disposal_date></u_disposal_date><s +ys_domain><link>https://foobar.bozo-now.com/api/now/table/sys_user_gr +oup/global</link><value>global</value></sys_domain><sys_tags></sys_ta +gs><sys_mod_count>194</sys_mod_count><cost_cc>USD</cost_cc><u_total_p +urchase_cost>5576.73</u_total_purchase_cost><monitor>false</monitor>< +sys_updated_on>2015-10-13 07:33:56</sys_updated_on><warranty_expirati +on>2013-05-20</warranty_expiration><invoice_number></invoice_number>< +fqdn></fqdn><cost></cost><u_retail_grp></u_retail_grp><u_server_type> +</u_server_type><ip_address>10.10.10.10</ip_address><u_manufacture_da +te_dummy>2013</u_manufacture_date_dummy><last_discovered>2015-10-13 0 +2:10:13</last_discovered><model_id><link>https://foobar.bozo-now.com/ +api/now/table/cmdb_model/0b40134cfdaf9804103f34566ba9df5a</link><valu +e>0b40134cfdaf9804103f34566ba9df5a</value></model_id><manufacturer><l +ink>https://foobar.bozbozo-now.com/api/now/table/core_company/41be01e +e0a0a3c0e59174d9f419c61de</link><value>41be01ee0a0a3c0e59174d9f419c61 +de</value></manufacturer><u_inc>false</u_inc><company></company><due> +</due><cfg_auto_management_server></cfg_auto_management_server><u_pro +posed>false</u_proposed><u_change>true</u_change><u_work_notes></u_wo +rk_notes><u_store_location></u_store_location><asset_tag>SCO.ST.10563 +4</asset_tag><discovery_source>Manual Entry</discovery_source><u_appl +ication_id></u_application_id><assignment_group></assignment_group><c +an_print>false</can_print><u_problem>true</u_problem><department></de +partment><u_manufacture_date>2013-07-02</u_manufacture_date><support_ +group><link>https://foobar.bozo-now.com/api/now/table/sys_user_group/ +4436591d0a0a3c0e76d6e42af5bba6e7</link><value>4436591d0a0a3c0e76d6e42 +af5bba6e7</value></support_group><u_exclusion>false</u_exclusion><sys +_created_on>2010-11-01 21:10:45</sys_created_on><u_display_name>lnxnm +sprag27</u_display_name><u_alias></u_alias><cost_center></cost_center +><short_description>NOMS</short_description><sys_updated_by>1742882</ +sys_updated_by><name>lnxnmsprag27</name><u_environment>PROD</u_enviro +nment><u_function></u_function><due_in></due_in><u_change_approver>tr +ue</u_change_approver><install_date></install_date><assigned></assign +ed><u_active_stamp></u_active_stamp><u_retail_group></u_retail_group> +<serial_number>USE020N1VV</serial_number><u_work_notes__asset_></u_wo +rk_notes__asset_><repair_contract_id></repair_contract_id><assigned_t +o></assigned_to><mac_address></mac_address><model_number></model_numb +er><schedule></schedule><u_ci_id>CI10110212</u_ci_id><returned_from_r +epair></returned_from_repair><ng_assignment_flag>ready</ng_assignment +_flag><sys_class_name>cmdb_ci_linux_server</sys_class_name><u_procure +ment_product_number></u_procurement_product_number><attributes></attr +ibutes><u_location_stamp>2015-06-18 14:54:20</u_location_stamp><fault +_count>0</fault_count><operational_status>1</operational_status><cfg_ +auto_change></cfg_auto_change></result>

        If you run this against your XML it will filter out the relevant tags

        #!/usr/bin/perl use strict; use XML::Twig; my $input_xml = 'big.xml'; # tags to keep my $re = qr/u_application_i|u_ci_id|sys_class_name|name|ip_address/; my $twig = 'XML::Twig'->new( twig_handlers => { result => \&record }, ); open my $fh,'>','small.xml' or die "$!"; $twig->parsefile($input_xml); sub record { my ($e,$t) = @_; for ($t->children){ $_->cut unless ($_->name =~ $re); } $t->flush($fh,pretty_print => 'indented'); }

        Result

        <result> <ip_address>10.10.10.10</ip_address> <u_application_id></u_application_id> <u_display_name>lnxnmsprag27</u_display_name> <name>lnxnmsprag27</name> <u_ci_id>CI10110212</u_ci_id> <sys_class_name>cmdb_ci_linux_server</sys_class_name> </result>
        poj

        So here is some sample code that parses this into a hash with the sample XML embedded in the script at the __DATA__ marker:

        #!/usr/bin/perl use strict; use warnings; use XML::Twig; use Data::Dumper; my $xml; ($xml = join '',<DATA>) =~ s/\n//g; my $twig = XML::Twig->new; $twig->xparse($xml); my @records = $twig->root->children; my @items; for my $record (@records) { my %item = (); for my $tag ($record->children) { $item{$tag->name} = $tag->text } push @items, {%item}; } print Data::Dumper->Dump([\@items], ['items']); __DATA__ <response><result><checked_in></checked_in><change_request></change_re +quest> <po_number></po_number><u_encryption_schema></u_encryption_schema> <correlation_id></correlation_id><cfg_auto_provider></cfg_auto_provide +r> <supported_by></supported_by><u_service_catalog>true</u_service_catalo +g> <first_discovered></first_discovered><u_oem_part_numbe></u_oem_part_nu +mbe> <sent_for_repair></sent_for_repair><owned_by></owned_by> <gl_account></gl_account><u_project_id></u_project_id><managed_by></ma +naged_by> <asset></asset><maintenance_schedule></maintenance_schedule> <u_chg>false</u_chg><category>Hardware</category> <delivery_date></delivery_date><install_status>8</install_status> <u_maintenance_window>Friday 19:00 - 08:00</u_maintenance_window> <virtual>false</virtual><u_sub_class></u_sub_class><dns_domain></dns_d +omain> <u_incident>true</u_incident> <u_parts_software_needed></u_parts_software_needed> <change_control></change_control><checked_out></checked_out> <u_decommission_stamp></u_decommission_stamp><purchase_date></purchase +_date> <order_date></order_date><u_template></u_template><skip_sync>false</sk +ip_sync> <lease_id></lease_id><u_sevice_options></u_sevice_options><vendor></ve +ndor> <sys_id>094722740a0a3c9b01928362f7df6468</sys_id> <u_maint_contract_sla>1</u_maint_contract_sla> <sys_created_by>1522337</sys_created_by><u_sla_support_tie></u_sla_sup +port_tie> <sys_domain_path>/</sys_domain_path><subcategory>Computer</subcategory +> <start_date></start_date><comments></comments> <location><link>https://foobar.bozo-now.com/api/now/table/cmn_location +/f7 eaf4e40ff13100ac2b348ce1050e35</link><value>f7eaf4e40ff13100ac2b348ce 1050e35</value></location><unverified>false</unverified> <justification></justification><u_disposal_date></u_disposal_date> <sys_domain><link>https://foobar.bozo-now.com/api/now/table/sys_user_g +rou p/global</link><value>global</value></sys_domain><sys_tags></sys_tags> <sys_mod_count>194</sys_mod_count><cost_cc>USD</cost_cc> <u_total_purchase_cost>5576.73</u_total_purchase_cost><monitor>false</ +monitor> <sys_updated_on>2015-10-13 07:33:56</sys_updated_on> <warranty_expiration>2013-05-20</warranty_expiration> <invoice_number></invoice_number><fqdn></fqdn><cost></cost> <u_retail_grp></u_retail_grp><u_server_type></u_server_type> <ip_address>10.10.10.10</ip_address> <u_manufacture_date_dummy>2013</u_manufacture_date_dummy> <last_discovered>2015-10-13 02:10:13</last_discovered> <model_id><link>https://foobar.bozo-now.com/api/now/table/cmdb_model/0 +b401 34cfdaf9804103f34566ba9df5a</link><value>0b40134cfdaf9804103f3 4566ba9df5a</value></model_id> <manufacturer><link>https://foobar.bozbozo-now.com/api/now/table/core_ +company/ 41be01ee0a0a3c0e59174d9f419c61de</link><value>41be01ee0a0a3c0e59174d9 f419c61de</value></manufacturer><u_inc>false</u_inc><company></company +> <due></due><cfg_auto_management_server></cfg_auto_management_server> <u_proposed>false</u_proposed><u_change>true</u_change> <u_work_notes></u_work_notes><u_store_location></u_store_location> <asset_tag>SCO.ST.105634</asset_tag> <discovery_source>Manual Entry</discovery_source> <u_application_id></u_application_id><assignment_group></assignment_gr +oup> <can_print>false</can_print><u_problem>true</u_problem> <department></department><u_manufacture_date>2013-07-02</u_manufacture +_date> <support_group><link>https://foobar.bozo-now.com/api/now/table/ sys_user_group/4436591d0a0a3c0e76d6e42af5bba6e7</link> <value>4436591d0a0a3c0e76d6e42af5bba6e7</value></support_group> <u_exclusion>false</u_exclusion> <sys_created_on>2010-11-01 21:10:45</sys_created_on> <u_display_name>lnxnmsprag27</u_display_name><u_alias></u_alias> <cost_center></cost_center><short_description>NOMS</short_description> <sys_updated_by>1742882</sys_updated_by> <name>lnxnmsprag27</name><u_environment>PROD</u_environment> <u_function></u_function><due_in></due_in> <u_change_approver>true</u_change_approver><install_date></install_dat +e> <assigned></assigned><u_active_stamp></u_active_stamp> <u_retail_group></u_retail_group><serial_number>USE020N1VV</serial_num +ber> <u_work_notes__asset_></u_work_notes__asset_> <repair_contract_id></repair_contract_id><assigned_to></assigned_to> <mac_address></mac_address><model_number></model_number> <schedule></schedule><u_ci_id>CI10110212</u_ci_id> <returned_from_repair></returned_from_repair> <ng_assignment_flag>ready</ng_assignment_flag> <sys_class_name>cmdb_ci_linux_server</sys_class_name> <u_procurement_product_number></u_procurement_product_number> <attributes></attributes> <u_location_stamp>2015-06-18 14:54:20</u_location_stamp> <fault_count>0</fault_count> <operational_status>1</operational_status> <cfg_auto_change></cfg_auto_change></result></response>

        This puts the entries into a hash and then dumps the hash to show that parsing works. I'll elaborate further in the other subthread.