How to Check Hashes for Missing Items when Keys can be Values and vice versa

ozboomer has asked for the wisdom of the Perl Monks concerning the following question:


Re: How to Check Hashes for Missing Items when Keys can be Values and vice versa
by Athanasius (Cardinal) on Jul 26, 2017 at 08:38 UTC

Hello ozboomer,

The code allows me to see the sites used within each "dsk" item... but I also want to see the "dsk" items used at each site. Can I do that with a single hash... or (as I expect) I'll need to maintain at least a couple of hashes?

Yes, unless you change to a different approach (e.g. a database), you’ll need another hash for this. But building it is easy: just add another line to your second foreach loop:

...
my %site_2_dsk;

foreach my $key ( sort keys %data_hash ) {
    my ($site, $dsk) = split /:/, $key;
    push @{ $output_hash{$dsk} }, $site;
    push @{ $site_2_dsk{$site} }, $dsk;
}
...

BTW, note the use of my above. Why aren’t you using strict (and warnings)?
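For anyone who wants to try the loop on its own, here is a minimal sketch that runs under strict and warnings. The four composite keys are sample values picked for illustration (they are not the full data set), and the print loops at the end are only there to show the two resulting structures.

```perl
use strict;
use warnings;

# Composite "site:dsk" keys, shaped like the ones the first loop builds.
# These four sample keys are illustrative only.
my %data_hash = map { $_ => 1 } qw( 377:96 377:97 512:70 1108:70 );

my ( %output_hash, %site_2_dsk );

foreach my $key ( sort keys %data_hash ) {
    my ( $site, $dsk ) = split /:/, $key;
    push @{ $output_hash{$dsk} }, $site;    # dsk  -> list of sites
    push @{ $site_2_dsk{$site} }, $dsk;     # site -> list of dsks
}

for my $site ( sort { $a <=> $b } keys %site_2_dsk ) {
    print "Site $site: @{ $site_2_dsk{$site} }\n";
}
for my $dsk ( sort { $a <=> $b } keys %output_hash ) {
    print "Dsk $dsk: @{ $output_hash{$dsk} }\n";
}
```

Because the composite keys are sorted as strings, the order inside each list is not numeric; sort the lists numerically when printing if that matters.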

Hope that helps,

Athanasius <°(((>< contra mundum Iustus alius egestas vitae, eros Piratica,


Re: How to Check Hashes for Missing Items when Keys can be Values and vice versa
by haukex (Archbishop) on Jul 26, 2017 at 08:35 UTC

Personally I would probably build a hash of hashes, plus its inverse (no problem if the input data isn't too big):

use warnings;
use strict;

my (%sites,%dsks);
while (<DATA>) {
    my ($site, $buf) = split /,/;
    for (split /:/, $buf) {
        for my $dsk (grep {$_!=0} /([0-9]+)!([0-9]+)!$/) {
            $sites{$site}{$dsk}++;
            $dsks{$dsk}{$site}++;
        }
    }
}

for my $dsk (sort {$a<=>$b} keys %dsks) {
    print "Dsk $dsk: ", join(", ",
        sort {$a<=>$b} keys %{ $dsks{$dsk} } ), "\n";
}

for my $site (sort {$a<=>$b} keys %sites) {
    print "Site $site: ", join(", ",
        sort {$a<=>$b} keys %{ $sites{$site} } ), "\n";
}

As for your question about missing values, that's definitely a case of TIMTOWTDI. See for example the FAQ "How do I compute the difference of two arrays? How do I compute the intersection of two arrays?" You can also just iterate over the list of expected keys and check each one in the target hash via exists, but that's the brute-force method. There's also a trick I sometimes like to use that involves deleting a hash slice (again, only if the input data isn't too big, because it's not the most efficient method). Here I'll demonstrate it by listing the "dsk"s (disks?) that are missing from each site:

my %alldsks = map {$_=>1} keys %dsks;
for my $site (sort {$a<=>$b} keys %sites) {
    my @sitedsks = keys %{ $sites{$site} };
    my %missingdsks = %alldsks;
    delete @missingdsks{@sitedsks};
    print "Site $site MISSING: ", join(", ",
        sort {$a<=>$b} keys %missingdsks), "\n";
}
__END__
Site 377 MISSING: 70, 71, 90, 91, 92, 93, 189, 190, 204, 205, 206, 207, 550, 551, 554
Site 512 MISSING: 71, 96, 97, 204, 205, 206, 207, 550, 551, 554
Site 587 MISSING: 70, 71, 90, 91, 92, 93, 96, 97, 189, 190, 204, 205, 206, 207
Site 1108 MISSING: 90, 91, 92, 93, 96, 97, 189, 190, 550, 551, 554

I'd also recommend you play it safer and Use strict and warnings.
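The brute-force alternative mentioned above (walk the expected keys and test each with exists) could look roughly like this. The %sites sample and the @expected list are made up for the example; only the grep/exists idiom is the point.

```perl
use strict;
use warnings;

# Hypothetical sample: site -> { dsk => count }, shaped like %sites above.
my %sites = (
    377 => { 96  => 1, 97  => 1 },
    587 => { 550 => 1, 551 => 1, 554 => 1 },
);

# Hypothetical list of dsks we expect to see at every site.
my @expected = ( 96, 97, 550 );

my %missing;
for my $site ( sort { $a <=> $b } keys %sites ) {
    # Keep only the expected dsks that have no key in this site's hash.
    $missing{$site} = [ grep { !exists $sites{$site}{$_} } @expected ];
    print "Site $site MISSING: ", join( ", ", @{ $missing{$site} } ), "\n";
}
```

Unlike the hash-slice version, this keeps the expected list in its original order and does not need a copy of the "all dsks" hash per site.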


Re: How to Check Hashes for Missing Items when Keys can be Values and vice versa
by sn1987a (Curate) on Jul 26, 2017 at 11:21 UTC

Ultimately, I expect to use defined() to see if an element exists or not

In addition to the other excellent comments:
To determine whether a key exists in a hash, use exists. The function defined tests for definedness (i.e. not undef).
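A small sketch of the difference, with invented keys and values: one true value, one empty string, one explicit undef, and one key that was never stored.

```perl
use strict;
use warnings;

# Illustrative hash: one key with a true value, one with an empty string,
# one explicitly undef. Key 99 is never stored at all.
my %dsk = ( 70 => 1, 75 => "", 98 => undef );

my %report;
for my $key ( 70, 75, 98, 99 ) {
    $report{$key} = join " ",
        ( exists  $dsk{$key} ? "exists"  : "missing" ),
        ( defined $dsk{$key} ? "defined" : "undef"   ),
        ( $dsk{$key}         ? "true"    : "false"   );
    print "$key: $report{$key}\n";
}
```

Only exists tells key 98 (stored, value undef) apart from key 99 (never stored); defined and a plain truth test treat them the same.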


Re: How to Check Hashes for Missing Items when Keys can be Values and vice versa
by ozboomer (Friar) on Jul 26, 2017 at 12:07 UTC

Many thanks, everyone, for the useful responses.

I've had a bit of a go with some of the suggestions... and I have something that does what I need (I think - more testing required, as usual). The updated sample code follows:-

# Ref: http://www.perlmonks.com/?node_id=1196078

use Data::Dumper;

%data_hash   = ();
%output_hash = ();

@master_dsks  = ( 70, 71, 75, 90, 91, 92, 93, 96, 97, 
                  98, 99, 190, 204, 205, 550, 551 );   
                
@master_sites  = ( 350, 377, 510, 512, 580, 587, 
                   590, 1100, 1105, 1107, 1108 );

# ----
                   
printf("All Known Dsks:\n");       # Show ALL the known dsks
foreach (@master_dsks) {
   printf("%s ", $_);
}
printf("\n\n");

# ----
                   
printf("All Known Sites:\n");      # Show ALL the known sites
foreach (@master_sites) {
   printf("%s ", $_);
}
printf("\n\n");

# ----
                   
while( <DATA> ) {                          # Build list of unique (site:dsk) items
   ($site, $buf) = split(/,/, $_);

   @input_item = split(/:/, $buf);
   foreach $input_field (@input_item) {    # EX: "VAR8=36!206!207!"
      @dsk_list = ($input_field =~ /([0-9]+)!([0-9]+)!$/);  # Get last 2 of 3 items

      foreach $dsk (@dsk_list) {           # Each dsk item in the input...
         next if ($dsk == 0);              # Skip '0' dsk items
         $key = $site . ":" . $dsk;        # Build composite key
         $data_hash{$key}++;               # ...and save it
      }
   }
}

foreach $key ( sort keys %data_hash ) {    # Build list of dsk -> (multi sites)
   ($site, $dsk) = split(/:/, $key);

   push( @{ $output_hash{$dsk} }, $site ); # ... dsk -> (multi sites)
   push( @{ $site_2_dsk{$site} }, $dsk );  # !!! ADDITION !!! ... site -> (multi dsks)
}

# ----

printf("List of sites for each used dsk:\n");
foreach $dsk (sort {$a <=> $b} keys %output_hash) {  # Show list of sites for each dsk
   printf("Dsk: %d: ... ", $dsk);
   foreach $site (sort {$a <=> $b} @{$output_hash{$dsk}}) {
      printf("  %d ", $site);
   }
   printf("\n");
}
printf("\n");

printf("List of dsks for each used site:\n");
foreach $site (sort {$a <=> $b} keys %site_2_dsk) {  # Show list of dsks for each site
   printf("Site: %d: ... ", $site);
   foreach $dsk (sort {$a <=> $b} @{$site_2_dsk{$site}}) {
      printf("  %d ", $dsk);
   }
   printf("\n");
}
printf("\n");

# ----

my %master_dsks_hash = map { $_ , "" } @master_dsks;   # Hash of ALL dsks
delete @master_dsks_hash{keys %output_hash};           # Delete the USED dsks
@unused_dsks = (keys %master_dsks_hash);               # ...leaving the UNUSED dsks

printf("Dsks that are known but unused:\n");
foreach (sort {$a<=>$b} @unused_dsks) {
   printf("%s ", $_);
}
printf("\n\n");

# ----

my %master_sites_hash = map { $_ , "" } @master_sites; # Hash of ALL sites
delete @master_sites_hash{keys %site_2_dsk};           # Delete the USED sites
@unused_sites = (keys %master_sites_hash);             # ...leaving the UNUSED sites

printf("Sites that are known but unused:\n");
foreach (sort {$a<=>$b} @unused_sites) {
   printf("%s ", $_);
}
printf("\n\n");

__DATA__
1108,VAR6=36!204!205!:VAR8=36!206!207!:VAR13=36!70!0!:VAR14=36!70!71!:VAR15=36!71!0!
377,VAR12=36!97!96!
512,VAR6=36!90!91!:VAR8=36!92!93!:VAR11=36!0!70!:VAR12=36!189!190!
587,VAR2=36!550!0!:VAR4=36!554!0!:VAR6=36!551!0!

....and the output:-

All Known Dsks:
70 71 75 90 91 92 93 96 97 98 99 190 204 205 550 551 

All Known Sites:
350 377 510 512 580 587 590 1100 1105 1107 1108 

List of sites for each used dsk:
Dsk: 70: ...   512   1108 
Dsk: 71: ...   1108 
Dsk: 90: ...   512 
Dsk: 91: ...   512 
Dsk: 92: ...   512 
Dsk: 93: ...   512 
Dsk: 96: ...   377 
Dsk: 97: ...   377 
Dsk: 189: ...   512 
Dsk: 190: ...   512 
Dsk: 204: ...   1108 
Dsk: 205: ...   1108 
Dsk: 206: ...   1108 
Dsk: 207: ...   1108 
Dsk: 550: ...   587 
Dsk: 551: ...   587 
Dsk: 554: ...   587 

List of dsks for each used site:
Site: 377: ...   96   97 
Site: 512: ...   70   90   91   92   93   189   190 
Site: 587: ...   550   551   554 
Site: 1108: ...   70   71   204   205   206   207 

Dsks that are known but unused:
75 98 99 

Sites that are known but unused:
350 510 580 590 1100 1105 1107

BTW.. Not using the 'warnings' and 'strict' pragmas is a fair enough comment.. but this is isolated, sample code... so I'm not too fussed about using them in this context.

Similarly, as I've been cutting code since the 1970s or something, I tend to pre-declare constants, variables, etc at the top of a block or module and then I know where to find all the initializations and comments about the identifiers I use in the code... instead of trying to find the 'first instance' (the 'my' declaration) of an identifier's use in some part of a mass of code when debugging/trying to understand some code - 'tis just easier for me.

..and Re: the issue of 'exists' ... trying to understand the perldoc description gives me too much of a headache:-

A hash or array element can be true only if it's defined and defined only if it exists, but the reverse doesn't necessarily hold true.

...but I take the point.

Thanks again for the most useful posts.


Re^2: How to Check Hashes for Missing Items when Keys can be Values and vice versa

by choroba (Cardinal) on Jul 26, 2017 at 12:26 UTC

> I know where to find all the initializations and comments about the identifiers

With good variable names, no comments are needed. And there shouldn't be a block larger than one screen, so you don't have to scroll to find the initialization. See Skimmable Code by schwern.

($q=q:Sq=~/;[c](.)(.)/;chr(-||-|5+lengthSq)`"S|oS2"`map{chr |+ord
}map{substrSq`S_+|`|}3E|-|`7**2-3:)=~y+S|`+$1,++print+eval$q,q,a,


Re^2: How to Check Hashes for Missing Items when Keys can be Values and vice versa

by BillKSmith (Monsignor) on Jul 26, 2017 at 13:18 UTC

Bill


Re^2: How to Check Hashes for Missing Items when Keys can be Values and vice versa

by pryrt (Abbot) on Jul 26, 2017 at 13:31 UTC

trying to understand the perldoc description gives me too much of a headache:-

A hash or array element can be true only if it's defined and defined only if it exists, but the reverse doesn't necessarily hold true.

Here's a Venn(ish) diagram in beautiful ASCII art that may or may not help, with examples on the side:

universe of possible hash elements in Perl
+--------------------------------------+
| elements that exist                  |            $hash{element_exists} = undef;          # this example: exists, undefined, false
|  +--------------------------------+  |
|  | elements that are defined      |  |            $hash{element_defined} = function_def();# this example: exists, defined, unknown false/true
|  |  +--------------------------+  |  |
|  |  |  elements that are true  |  |  |            $hash{element_true} = 1;                # this example: exists, defined, true
|  |  +--------------------------+  |  |
|  |                                |  |
|  |  +--------------------------+  |  |
|  |  : elements that are false  :  |  |            $hash{element_false} = function_false();# this example: exists, defined, false
|  |  :  +-------------------+   :  |  |
|  |  :  | false but defined |   :  |  |            $hash{element_false_defined} = 0;       # this example: exists, defined, false
|  |  :  +-------------------+   :  |  |
|  +--:--------------------------:--+  |
|     :                          :     |
|  +--:--------------------------:--+  |
|  |  :  +-------------------+   :  |  |
|  |  :  | false but undef   |   :  |  |            $hash{element_false_undefined} = undef; # this example: exists, undefined, false
|  |  :  +-------------------+   :  |  |
|  |  +--------------------------+  |  |
|  | elements that are undefined    |  |            $hash{element_undefined} = undef;       # this example: exists, undefined, false
|  +--------------------------------+  |
+--------------------------------------+

Note that the ASCII art combined with wanting space for labels sometimes implies there is room in the Perl universe for combinations that aren't actually possible: for example, there are no elements that are undefined but not false, because perl coerces undefined to false.
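The nesting can also be walked in code. This is only a sketch: region is a hypothetical helper (not a Perl built-in) that names the innermost box an element currently sits in, and the key 'k' is arbitrary. It also shows that an element which has never been stored sits outside every box until something assigns to it.

```perl
use strict;
use warnings;

# Hypothetical helper: name the innermost box of the diagram an element is in.
sub region {
    my ( $hash_ref, $key ) = @_;
    return "does not exist"           unless exists  $hash_ref->{$key};
    return "exists, undefined, false" unless defined $hash_ref->{$key};
    return $hash_ref->{$key} ? "exists, defined, true"
                             : "exists, defined, false";
}

my %hash;
my @seen;

push @seen, region( \%hash, 'k' );    # never assigned
$hash{k} = undef;
push @seen, region( \%hash, 'k' );    # key stored, value undef
$hash{k} = 0;
push @seen, region( \%hash, 'k' );    # defined but false
$hash{k} = 1;
push @seen, region( \%hash, 'k' );    # true
delete $hash{k};
push @seen, region( \%hash, 'k' );    # removed again

print "$_\n" for @seen;
```

The tests run from the outside in (exists, then defined, then truth), which is the same order as the boxes.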


Re: How to Check Hashes for Missing Items when Keys can be Values and vice versa
by thanos1983 (Parson) on Jul 26, 2017 at 13:52 UTC

Hello ozboomer,

This is not a big improvement, but in case you are interested, you can replace the foreach loops with while loops. See the sample code below:

#!/usr/bin/perl
use strict;
use warnings;
# use Benchmark qw(:all) ; # WindowsOS
use Benchmark::Forking qw( timethese cmpthese ); # UnixOS

my @preserved = @ARGV;

sub while_test {
    my %data_hash   = ();
    my %output_hash = ();

    @ARGV = @preserved; # restore original @ARGV

    while (<>) {                          # Build list of unique (site:dsk) items
        my ($site, $buf) = split(/,/);

        my @input_item = split(/:/, $buf);
        while ( defined ( my $input_field = shift @input_item ) ) { # EX: "VAR8=36!206!207!"
            my @dsk_list = ($input_field =~ /([0-9]+)!([0-9]+)!$/);  # Get last 2 of 3 items

            while ( defined ( my $dsk = shift @dsk_list ) ) { # Each dsk item in the input...
                next if ($dsk == 0);              # Skip '0' dsk items
                my $key = $site . ":" . $dsk;     # Build composite key
                $data_hash{$key}++;               # ...and save it
            }
        }
    }

    my @sort_data_keys = sort keys %data_hash;
    while ( defined ( my $key = shift (@sort_data_keys) ) ) { # Build list of dsk -> (multi sites)
        my ($site, $dsk) = split(/:/, $key);
        push( @{$output_hash{$dsk} }, $site );
    }

    my @sort_output_hash = sort {$a <=> $b} keys %output_hash;
    while ( defined ( my $dsk = shift (@sort_output_hash) ) ) { # Show list of sites for each dsk
        # printf("Dsk: %d:\n", $dsk);
        foreach my $site (sort {$a <=> $b} @{$output_hash{$dsk}}) {
            # printf("  %d\n", $site);
        }
        # printf("\n");
    }
}

sub foreach_test {
    my %data_hash   = ();
    my %output_hash = ();

    @ARGV = @preserved; # restore original @ARGV

    while(<>) {                          # Build list of unique (site:dsk) items
        my ($site, $buf) = split(/,/);

        my @input_item = split(/:/, $buf);
        foreach my $input_field (@input_item) {    # EX: "VAR8=36!206!207!"
            my @dsk_list = ($input_field =~ /([0-9]+)!([0-9]+)!$/);  # Get last 2 of 3 items

            foreach my $dsk (@dsk_list) {         # Each dsk item in the input...
                next if ($dsk == 0);              # Skip '0' dsk items
                my $key = $site . ":" . $dsk;     # Build composite key
                $data_hash{$key}++;               # ...and save it
            }
        }
    }

    foreach my $key ( sort keys %data_hash ) {    # Build list of dsk -> (multi sites)
        my ($site, $dsk) = split(/:/, $key);
        push( @{$output_hash{$dsk} }, $site );
    }

    foreach my $dsk (sort {$a <=> $b} keys %output_hash) {  # Show list of sites for each dsk
        # printf("Dsk: %d:\n", $dsk);
        foreach my $site (sort {$a <=> $b} @{$output_hash{$dsk}}) {
            # printf("  %d\n", $site);
        }
        # printf("\n");
    }
}

my $results = timethese(1000000, { While => \&while_test,
                   ForEach => \&foreach_test, }, 'none');
cmpthese( $results );

__END__

$ perl test.pl in.txt
           Rate   While ForEach
While   14286/s      --    -10%
ForEach 15898/s     11%      --

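Keep in mind that all the arrays used in the while loops are emptied by shift, so this only works if you do not need those arrays again. Note also that in the run above the foreach version was about 11% faster than the while version, so the rewrite is not guaranteed to give a boost; benchmark it against your own data before switching.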
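To make the side effect concrete, here is a tiny sketch with three sample values. It only counts iterations and checks how many elements are left afterwards.

```perl
use strict;
use warnings;

my @sites = ( 377, 512, 587 );    # sample values only

# foreach walks the array and leaves it intact.
my $count_foreach = 0;
foreach my $site (@sites) { $count_foreach++ }
my $left_after_foreach = scalar @sites;

# while/shift consumes the array as it goes.
my $count_while = 0;
while ( defined( my $site = shift @sites ) ) { $count_while++ }
my $left_after_while = scalar @sites;

print "foreach visited $count_foreach, $left_after_foreach left\n";
print "while   visited $count_while, $left_after_while left\n";
```

One more difference worth knowing: the defined test makes the while/shift form stop early if the array happens to contain an undef element, whereas foreach would visit it.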

Hope this helps, BR.

Seeking for Perl wisdom...on the process of learning...not there...yet!


Re: How to Check Hashes for Missing Items when Keys can be Values and vice versa
by ozboomer (Friar) on Jul 31, 2017 at 00:52 UTC

For what it's worth.. and I don't know if it's "too clever for my own good", here's something I've built to assist in the creation of the '2-way' structures.

...and all I'll need to do as time goes on is add to the conditional where there's a '@B_list = ' in the code...

Just for the curious :) ...

# ----------------------------------------------------------------------------------
#  Build_Joined_Hashes - Build hashes to assist '2-way' queries
#
#  Description:
#    Create hashes where B -> (A1, A2, ...) and A -> (B1, B2, ...)
#    That is:
#       1.  given B, get a list of A's that refer to B
#       2.  given A, get a list of B's that refer to A
#
#  Uses Globals:
#
#  Notes:
#    - VP DSK hash: 2324(LX INT) -> VAR2=36!550!0!:VAR4=36!554!0!:VAR6=36!551!0!
#    - TC DSK hash: 9705(LX TC) -> 11,13-JAN-2014:13-JAN-2014
# ----------------------------------------------------------------------------------

sub Build_Joined_Hashes
{
   my ($input_hash_ref, $type, $A_B_hash_ref, $B_A_hash_ref) = @_;

   my ($buf, $A_item, $input_field, $B_item, $key);
   
   my (@input_item, @B_list);
   
   my (%tmp_hash);
   
   
   %tmp_hash      = ();  # hash: A:B -> (count)

   %$A_B_hash_ref = ();  # List of Bs that are used in As
   %$B_A_hash_ref = ();  # List of As that are used in Bs
   
   foreach $A_item (keys %$input_hash_ref) {           # For each 'A' item...
      $buf = $$input_hash_ref{$A_item};                # ..get 'B usage list' record
      @input_item = split(/:/, $buf);                  # Get each 'B usage' item

      foreach $input_field (@input_item) {             # For each 'B usage' item...

         if ($type eq "VAR") {                         # Get list of 'B' instances...
            @B_list = ($input_field =~                 # ... when VAR...
                          /([0-9]+)!([0-9]+)!$/);

         } elsif ($type eq "TC") {
            @B_list = ($input_field =~                 # ... when TC...
                          /^([0-9]+),/);

         }

         foreach $B_item (@B_list) {                   # Each 'B' instance...
            next if ($B_item == 0);                    # Skip '0' items
            $key = $A_item . ":" . $B_item;            # Make composite key: A:B -> (count)
            $tmp_hash{$key}++;
         }
      }
   }

   foreach $key ( sort keys %tmp_hash ) {              # For every 'B usage in A' instance...
      ($A_item, $B_item) = split(/:/, $key);

      push( @{ $$A_B_hash_ref{$B_item} }, $A_item );   # Build list of B -> (A1, A2, ...)
      push( @{ $$B_A_hash_ref{$A_item} }, $B_item );   # ...       and A -> (B1, B2, ...)
   }

   return;

} # end Build_Joined_Hashes

Perhaps it would be a better approach(?) to simply do things using DBD::CSV and treat everything as a database(!) and use SQL or sumfin'...
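Short of going to a database, one way to avoid growing the if/elsif chain in Build_Joined_Hashes might be a lookup table of patterns keyed by record type. This is only a sketch under a few assumptions: %extract_for and build_joined_hashes are hypothetical names, the variant returns the two hashes instead of filling caller-supplied references, and the lists come out in input order rather than sorted. The two patterns and the sample records are the ones from the sub and its header comment.

```perl
use strict;
use warnings;

# Hypothetical table: record type -> regex whose captures are the 'B' items.
# The two patterns are the ones used in Build_Joined_Hashes above.
my %extract_for = (
    VAR => qr/([0-9]+)!([0-9]+)!$/,
    TC  => qr/^([0-9]+),/,
);

# Sketch of a variant that returns the two hashes instead of filling refs.
sub build_joined_hashes {
    my ( $input_hash_ref, $type ) = @_;
    my $regex = $extract_for{$type}
        or die "Unknown record type '$type'\n";

    my ( %seen, %a_for_b, %b_for_a );
    for my $a_item ( sort keys %$input_hash_ref ) {
        for my $field ( split /:/, $input_hash_ref->{$a_item} ) {
            for my $b_item ( $field =~ $regex ) {
                next if $b_item == 0;                    # skip '0' items
                next if $seen{"$a_item:$b_item"}++;      # skip duplicates
                push @{ $a_for_b{$b_item} }, $a_item;    # B -> (A1, A2, ...)
                push @{ $b_for_a{$a_item} }, $b_item;    # A -> (B1, B2, ...)
            }
        }
    }
    return ( \%a_for_b, \%b_for_a );
}

my %vp = ( 587 => 'VAR2=36!550!0!:VAR4=36!554!0!:VAR6=36!551!0!' );
my ( $sites_for_dsk, $dsks_for_site ) = build_joined_hashes( \%vp, 'VAR' );
print "587: @{ $dsks_for_site->{587} }\n";
```

Supporting a new record type then means adding one entry to %extract_for, and an unknown type fails loudly instead of silently producing empty hashes.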
