vlturner has asked for the wisdom of the Perl Monks concerning the following question:

Since everyone seems to have missed the question asked and moved on, I will aske the same question differently. poj is on the right track. So he sent this.

my %application = ( 'CI10335478' => { name => 'app_name1', IP => [ 'MemberIP 1', 'MemberIP 2', 'MemberIP 6', ], }, 'CI10334984' => { name => 'app_name2', IP => [ 'MemberIP 4', 'MemberIP 2', ], }, );

I have 250K records in XML that are parsing correctly. So now I need to get them into the hash. $data$name_pos contains the application name on application CI's. $data$application_pos contains the CI of the application we need to look up $data$name_pos on device CI's So $application_pos and $data$ci_pos should be equal. I have to process > 5K of records to start getting intersections of the data, and I don't want to put that here, and its all in XML. So my idea was to write my data to the hash or array when I come across them. So I don't have a blob to just push into the structure. So I start with this and use our since there are lots of subcomponents writing to this hash. One writes the CI and the name Then another writes the members. I am getting all sorts of warnings using the examples that were sent, mostly related to "experimental"

our %arx_data = (); Need code to build HOH here... Something like this. if ( $data[$class_pos] eq "cmdb_ci_appl" ) { my @ci_record = { description => $data[$ci_pos], name => +$data[$name_pos] }; push @{arx_data{ $data[$application_pos] }}, @ci_record; print Dumper( %arx_data ) ; } if ( $data[$application_pos] ne "" && $data[$ip_address_pos] n +e "" ) { my @member = { description => $data[$ci_pos], name => $da +ta[$name_pos] }; $arx_data{ $data[$application_pos] }{ 'members' } = $data[ +$ip_address_pos]; print Dumper( %arx_data ) ;

Since I have been criticized for no posting the code, here goes...

#!/usr/bin/perl use strict; use warnings; use 5.014; use MIME::Base64; use Data::Dumper; use utf8; use Text::Unidecode; use XML::Twig; use REST::Client; use Data::Validate::Domain qw(is_hostname); use Socket; use Net::DNS; use Net::MAC; # Just in case we want to request JSON instead of XML from CMDB #use JSON; Just in case we want to request JSON instead of XML from CM +DB my $scriptversion = "1.01 (Vance Turner, Staples Inc., 1/20/2016)"; our $header_count = 0; our $fqdn_pos = 0; our $ip_address_pos = 0; our $dns_domain_pos = 0; our $name_pos = 0; our $mac_address_pos = 0; our $application_pos = 0; our $ci_pos = 0; our $class_pos = 0; our @arx_data = (); our $arx_data_ref = \@arx_data; # declare our command line options variables. our $host = 'https://staplessb.service-now.com'; our $wanthelp = 0; our $record_limit = 10; our $class = ''; our $user = ''; our $pwd = ''; our $outputfile = ''; our $forcefile = 0; our $verbosity = 1; sub usage { my $err = shift and select STDERR; print <<EOU; usage: perl $0 [-r <max_records>] [-c <class>] [-q <quot>] [-w <width> +] [-d <dtfmt>] [-F] [-f] [-D <cols>] [-u] [-v <level>] [-o <outputfilename>] -t <host_url> use <host_url> for the CMDB connection to op +en. (https://staplessb.service-now.com) -r <record_limit> use <record_limit> as the maximum nomber of +CMDB records to retrieve, default = '100' -c <class> use <class> as the CMDB class records you wa +nt to retrieve. -u <username> CMDB username to use -p <password> CMDB password to use -v <level> verbosity, default = $verbosity, maximum = 4 +, written to stderr. -o <outputfilename> write output to file named <outputfilename>, + defaults to stdout if no outputfile is specified -f force usage of <outputfilename> if it alread +y exists (unlink before use) Examples: perl $0 -u user -p password -r max_records -o outputfile.csv (cre +ates outputfile.csv) perl $0 -u user -p password -v 4 -r max_records outputfile.csv (cre +ates outputfile.csv, shows progress msgs) EOU exit $err; } # usage use Getopt::Long qw(:config bundling nopermute passthrough); GetOptions ( "help|h|?" => \$wanthelp , "t=s" => \$host, "r=i" => \$record_limit, "c=s" => \$class, "u=s" => \$user, "p=s" => \$pwd, "o=s" => \$outputfile, "f" => \$forcefile, "v:1" => \$verbosity, ); if ($wanthelp) { usage(1); } if ($verbosity > 1) { print STDERR "\$0 script version " . $scriptversion . "\n\n"; } if ($verbosity > 1) { print STDERR "Output file is " . $outputfile . "\n"; } -s $outputfile && $forcefile and unlink $outputfile; if (-s $outputfile) { print STDERR "File '$outputfile' already exists. Overwrite? [y/N] +> N\b"; scalar <STDIN> =~ m/^[yj](es|a)?$/i or exit; } my $client = REST::Client->new(host => $host); my $encoded_auth = encode_base64("$user:$pwd", ''); $client->GET("/api/now/table/cmdb_ci?sysparm_limit=$record_limit", {'Authorization' => "Basic $encoded_auth", 'Accept' => 'application/xml'}); my $input_xml = $client->responseContent(); my $field= $ARGV[0] || 'u_ci_id'; my $twig = 'XML::Twig'->new; $twig->xparse($input_xml); my $root = $twig->root; my @records = $root->children; my @sorted = sort { $b->first_child( $field)->text cmp $a->first_child( $field)->text } @records; if ($outputfile ne "") { open my $outputfile_fh, '>:encoding(utf8)', $outputfile or die "Ca +nnot open: $outputfile: $!"; close($outputfile_fh) || warn "close failed: $!"; } my $header_printed = 0; #for my $record (@sorted) { for my $record (@records) { my @info_tags = $record->children; my @data; for my $info_tag (@info_tags) { my $extract = $header_printed ? 'text' : 'name'; if ( $header_printed != 1) { if ( $info_tag->$extract eq 'fqdn' ) { $fqdn_pos = $header_count; } if ( $info_tag->$extract eq 'ip_address' ) { $ip_address_pos = $header_count; } if ( $info_tag->$extract eq 'dns_domain' ) { $dns_domain_pos = $header_count; } if ( $info_tag->$extract eq 'name' ) { $name_pos = $header_count; } if ( $info_tag->$extract eq 'mac_address' ) { $mac_address_pos = $header_count; } if ( $info_tag->$extract eq 'u_application_id' ) { $application_pos = $header_count; } if ( $info_tag->$extract eq 'u_ci_id' ) { $ci_pos = $header_count; } if ( $info_tag->$extract eq 'sys_class_name' ) { $class_pos = $header_count; } } my $work = $info_tag->$extract; $work =~ s/\r|\n//g; # Cleanup Carraige Returns $work =~ s/, / /g; # Cleanup Comma Space $work =~ s/,/ /g; # Cleanup Comma $work =~ s/"//g; # Cleanup Parentheses $work =~ s/^\s+|\s+$//g; # Trim spaces $work =~ s/([^[:ascii:]]+)/unidecode($1)/ge; push @data, $work; $header_count++; } if ( $header_printed == 1) { # if ( $data[$ip_address_pos] eq "" && $data[$name_pos] ne "" & +& is_hostname($data[$name_pos])) { # my $address = inet_aton($data[$name_pos]) || "Error: Can' +t resolve."; # if ( $address ne "" && $address ne "Error: Can't resolve. +" ) { # $address = inet_ntoa($address); # splice @data, ($ip_address_pos), 1, $address; # my $cmdb_name = $data[$name_pos]; # my $cmdb_lc_name = lc $cmdb_name; # splice @data, ($name_pos), 1, $cmdb_lc_name; # } # } # if ( $data[$fqdn_pos] eq "" && $data[$ip_address_pos] ne "" ) + { # my $ip_addr = inet_aton($data[$ip_address_pos]) || "Erro +r: Can't resolve."; # if ( $ip_addr ne "" && $ip_addr ne "Error: Can't resolve. +" ) { # my $hostname = gethostbyaddr($ip_addr, AF_INET ) || +"Error: Can't resolve."; # if ( $hostname ne "" && $hostname ne "Error: Can't resolv +e." ) { # my $lc_hostname = lc $hostname; # splice @data, ($fqdn_pos), 1, $lc_hostname; # my $cmdb_name = $data[$name_pos]; # my $cmdb_lc_name = lc $cmdb_name; # splice @data, ($name_pos), 1, $cmdb_lc_name; # } # } # } # if ( $data[$mac_address_pos] ne "" ) { # my $mac = uc $data[$mac_address_pos]; # $mac =~ s/\'//g; # $mac =~ s/\-//g; # $mac =~ s/\://g; # $mac =~ s/\"//g; # splice @data, ($mac_address_pos), 1, $mac; # } # if ( $data[$dns_domain_pos] eq "" && $data[$fqdn_pos] ne "" ) + { # my $dns_domain = $data[$fqdn_pos]; # $dns_domain =~ s/.*?\.//; # splice @data, ($dns_domain_pos), 1, $dns_domain; # } if ( $data[$class_pos] eq "cmdb_ci_appl" ) { my @ci_record = { description => $data[$ci_pos], name => +$data[$name_pos] }; push(@$arx_data_ref, @ci_record); print Dumper( @$arx_data_ref ) ; } if ( $data[$application_pos] ne "" && $data[$ip_address_pos] n +e "" ) { my @member = { member => $data[$ip_address_pos] }; push(@$arx_data_ref, @member); print Dumper( @$arx_data_ref ) ; } } if ($outputfile ne "") { open my $outputfile_fh, '>>:encoding(utf8)', $outputfile or di +e "Cannot open: $outputfile: $!"; say $outputfile_fh join ',', map qq("$_"), @data; close($outputfile_fh) || warn "close failed: $!"; } else { say join ',', map qq("$_"), @data; } $header_printed = 1; $header_count = 0; }

Here is the XML mising the data. Also, the order spewed from the CMDB is not garenteed to be in the sam order each time for some reason.

<?xml version="1.0" encoding="UTF-8"?> <response> <result> <checked_in></checked_in> <change_request></change_request> <po_number></po_number> <correlation_id></correlation_id> <cfg_auto_provider></cfg_auto_provider> <supported_by></supported_by> <u_service_catalog></u_service_catalog> <first_discovered></first_discovered> <u_oem_part_numbe></u_oem_part_numbe> <sent_for_repair></sent_for_repair> <owned_by></owned_by> <gl_account></gl_account> <managed_by></managed_by> <asset></asset> <u_project_id></u_project_id> <maintenance_schedule></maintenance_schedule> <u_chg></u_chg> <category></category> <delivery_date></delivery_date> <install_status></install_status> <u_maintenance_window></u_maintenance_window> <virtual></virtual> <u_sub_class></u_sub_class> <dns_domain></dns_domain> <u_incident></u_incident> <u_parts_software_needed></u_parts_software_needed> <change_control></change_control> <checked_out></checked_out> <u_decommission_stamp></u_decommission_stamp> <purchase_date></purchase_date> <order_date></order_date> <u_template></u_template> <skip_sync></skip_sync> <lease_id></lease_id> <u_sevice_options></u_sevice_options> <vendor></vendor> <sys_id></sys_id> <u_maint_contract_sla></u_maint_contract_sla> <sys_created_by></sys_created_by> <u_sla_support_tie></u_sla_support_tie> <subcategory></subcategory> <start_date></start_date> <comments></comments> <location> <link></link> <value></value> </location> <unverified></unverified> <justification></justification> <u_disposal_date></u_disposal_date> <sys_tags></sys_tags> <sys_domain> <link></link> <value></value> </sys_domain> <sys_mod_count></sys_mod_count> <cost_cc></cost_cc> <u_total_purchase_cost></u_total_purchase_cost> <monitor></monitor> <sys_updated_on></sys_updated_on> <warranty_expiration></warranty_expiration> <invoice_number></invoice_number> <fqdn></fqdn> <cost></cost> <u_retail_grp></u_retail_grp> <u_server_type></u_server_type> <ip_address></ip_address> <u_manufacture_date_dummy></u_manufacture_date_dummy> <last_discovered></last_discovered> <model_id> <link></link> <value></value> </model_id> <manufacturer></manufacturer> <u_inc>false</u_inc> <company></company> <due></due> <cfg_auto_management_server></cfg_auto_management_server> <u_proposed></u_proposed> <u_change></u_change> <u_work_notes></u_work_notes> <u_store_location></u_store_location> <asset_tag></asset_tag> <discovery_source></discovery_source> <u_application_id></u_application_id> <assignment_group></assignment_group> <can_print></can_print> <u_problem></u_problem> <department></department> <u_manufacture_date></u_manufacture_date> <support_group> <link></link> <value></value> </support_group> <u_exclusion></u_exclusion> <sys_created_on></sys_created_on> <u_display_name></u_display_name> <u_alias></u_alias> <cost_center></cost_center> <short_description></short_description> <sys_updated_by></sys_updated_by> <name></name> <u_environment></u_environment> <u_function></u_function> <due_in></due_in> <u_change_approver></u_change_approver> <install_date></install_date> <assigned></assigned> <u_active_stamp></u_active_stamp> <u_retail_group></u_retail_group> <serial_number></serial_number> <u_work_notes__asset_></u_work_notes__asset_> <repair_contract_id></repair_contract_id> <assigned_to></assigned_to> <mac_address></mac_address> <model_number></model_number> <schedule></schedule> <u_ci_id></u_ci_id> <returned_from_repair></returned_from_repair> <ng_assignment_flag></ng_assignment_flag> <u_procurement_product_number></u_procurement_product_number> <sys_class_name></sys_class_name> <attributes></attributes> <u_location_stamp></u_location_stamp> <fault_count></fault_count> <cfg_auto_change></cfg_auto_change> </result> </response>';

Replies are listed 'Best First'.
Re: Need help with complex hash of hashes.
by jcb (Parson) on Jan 27, 2016 at 00:22 UTC

    The XML parsing in your code seems ... odd. Can you provide a sample of the XML input? Is it really XML-encoded CSV or does the input actually have structure? If I read the code correctly, your XML parsing is implicitly assuming that all of the XML fields will be in the same order in every record.

    Maybe something like:

    my @items; for my $record (@records) { my %item = (); for my $tag ($record->children) { $item{$tag->name} = $tag->text } push @items, {%item}; }

    This gets rid of all of the $*_pos variables and directly converts XML to an array of hashes for further processing.

    The series of regex substitutions can go in their own sub cleanup. Then change the relevant line to $item{cleanup $tag->name} = cleanup $tag->text assuming that cleanup is declared with the appropriate prototype.

    sub cleanup ($) { my $work = shift; $work =~ s/\r|\n//g; # Cleanup Carraige Returns $work =~ s/, / /g; # Cleanup Comma Space $work =~ s/,/ /g; # Cleanup Comma $work =~ s/"//g; # Cleanup Parentheses $work =~ s/^\s+|\s+$//g; # Trim spaces $work =~ s/([^[:ascii:]]+)/unidecode($1)/ge; return $work; }

    This will create some differences, for example, $data[$class_pos] is now $items[$n]{sys_class_name} after the XML parsing loop, where $n is an index into the @items array.

      I need the positions. The reason is I don't know anything about the data, and need to keep track fo things by column header. Yes, I will fix the cleanup, it was quick an dirty to just get rid of crap in the CMDB that shouldn't be there to begin with.

        I need the positions

        I don't understand this. With XML the order of the tags shouldn't matter. Anyway, here is my best guess at what you are trying to do, create a CSV of all the data and a hash structure of some of it.

        #!/usr/bin/perl use strict; use warnings; use XML::Twig; use Text::CSV; use Data::Dump 'pp'; my @header = (); my @csv_rows = (); my $order_by = 'u_ci_id'; my %seen = (); my %arx_data = (); # parse xml in chunks my $twig = 'XML::Twig'->new( twig_handlers => { result => \&record } ); my $input_xml = get_xml(); # change as required $twig->parse($input_xml); # data dump pp \%arx_data; #pp \@csv_rows; #pp \@header; # sort and dump csv my @sorted = sort { $b->[0] cmp $a->[0] } @csv_rows; unshift @sorted,[@header]; # add header my $outfile = 'xmldump.csv'; my $csv = Text::CSV->new ( { binary => 1 } ) or die "Cannot use CSV: ".Text::CSV->error_diag (); $csv->eol("\n"); if ($outfile){ open my $fh, ">:encoding(utf8)", $outfile or die "$outfile: $!"; for (@sorted){ shift @$_; # remove sort field $csv->print ($fh, $_); } close $fh or die "new.csv: $!"; } else { for (@sorted){ shift @$_; $csv->print (\*STDOUT, $_); } } # parse 1 record sub record { my ($e,$t) = @_; # csv records unless (@header){ my @info_tags = $t->children; @header = map{ $_->name } @info_tags; unshift @header,$order_by; # add sort field } # extract data in same order as header row my @data = map{ $t->field($_) } @header; push @csv_rows,clean(@data); #returns array ref # build hash structure my $app = $t->field('u_application_id'); my $u_ci = $t->field('u_ci_id'); my $class = $t->field('sys_class_name'); my $name = $t->field('name'); my $ip = $t->field('ip_address'); if ( $class eq 'cmdb_ci_appl'){ $arx_data{$u_ci}{'name'} = $name; } if ( $class eq 'cmdb_ci_svr' ){ next if $seen{$app}{$ip}++; push @{$arx_data{$app}{'members'}},$ip; } } sub clean { my @f = @_; for (@f){ s/\r|\n//g; # Cleanup Carraige Returns s/, / /g; # Cleanup Comma Space s/,/ /g; # Cleanup Comma s/"//g; # Cleanup Parentheses s/^\s+|\s+$//g; # Trim spaces s/([^[:ascii:]]+)/unidecode($1)/ge; } return \@f; }
        poj

        Does the first row returned contain a set of empty tags and subsequent rows contain the same tags with data? Something like:

        <result> <checked_in></checked_in> <change_request></change_request> <po_number></po_number> </result> <result> <checked_in>yesterday</checked_in> <change_request>AQX123</change_request> <po_number>42</po_number> </result>

        Until, of course, you get to:

        <result> <po_number>42</po_number> <checked_in>1 week ago</checked_in> <change_request>AQX120</change_request> </result>

        And your parsing by position produces bogus results.

        Or is it really XML-encoded CSV? More like:

        <result> <checked_in></checked_in> <change_request></change_request> <po_number></po_number> </result> <result> <bogon>yesterday</bogon> <bogon>AQX123</bogon> <bogon>42</bogon> </result>

        In the former case, you don't need the positions after all, and the fact that they change from run to run should indicate that they are not significant. In the latter case, your CMDB belongs on the Daily WTF. Your code mentions an option for JSON output. Perhaps that module is less broken?

Re: Need help with complex hash of hashes.
by Anonymous Monk on Jan 27, 2016 at 00:00 UTC

    :) That is a great first draft, you've got code, some warnings ... now narrow down the question until you have 20 lines of code

      This is the section I need to properly build out the HoH correctly. You can see it isn't working since VAR50 and VAR52 have the same description. It would be better if the CI was the key too.

      if ( $data[$class_pos] eq "cmdb_ci_appl" ) { my @ci_record = { description => $data[$ci_pos], name => +$data[$name_pos] }; push(@$arx_data_ref, @ci_record); print Dumper( @$arx_data_ref ) ; } if ( $data[$application_pos] ne "" && $data[$ip_address_pos] n +e "" ) { my @member = { description => $data[$application_pos], mem +ber => $data[$ip_address_pos] }; push(@$arx_data_ref, @member); print Dumper( @$arx_data_ref ) ; } $VAR49 = { 'name' => 'WINS', 'description' => 'CI10183028' }; $VAR50 = { 'description' => 'CI10199533', 'member' => '10.71.193.110' }; $VAR51 = { 'member' => '10.100.16.214', 'description' => 'CI10208139' }; $VAR52 = { 'member' => '10.100.32.156', 'description' => 'CI10199533' };

      See this message for your 20 lines, that was asked first. http://www.perlmonks.org/?node_id=1153574