Print hash keys and lookup the keys for values in another file

Magnolia25 has asked for the wisdom of the Perl Monks concerning the following question:

Re: Print hash keys and lookup the keys for values in another file by haukex (Archbishop) on Feb 01, 2017 at 15:55 UTC
Hi Magnolia25,

Note that your code contains some errors that prevent it from compiling. I'm going to assume you copied the code you showed out of a larger program and that you've actually got `use warnings; use strict;` and the missing variable declarations. In the future, please try to provide code that compiles (SSCCE).

First, in `last if !line;`, you're missing the `$` sigil from `$line`.

Second, note how you've got a comma in `(split(/;/, $line)),[1,0]`. This is creating several values: the return values of split, plus the anonymous array `[1,0]`. What you probably meant is `(split...)[1,0]`, which returns a list consisting of the second return value of split followed by the first return value of the split.

If you fix these issues, you've still got a conceptual problem. The way I understand it, you're trying to store a hash with multiple values per key. One way to do this is with a "hash of arrays" or "hash of hashes", which are described with lots of example code in perldsc.

Because this sounds very much like a homework assignment, I'm reluctant to do too much of your work for you :-) However, if you look at perlreftut and perldsc, I hope it will become more clear how you can accomplish your tasks:

1. Read the file line-by-line, as you already are.
2. split the line into its columns.
3. Store the values in your hash of hashes, such as `$hash{$col[1]}{$col[0]}=1;`, or into a hash of arrays via `push @{ $hash{$col[1]} }, $col[0];` (see push).

For parsing files like this, it really would be best to use a module like Text::CSV.

Your "Step 2" sounds very much like "Step 1", except with different columns. If you also need to filter the "Step 2" input file, I would suggest that you store the first column from your "Step 1" input file in a hash, and when you're parsing your "Step 2" input file, check whether that value exists in the hash you created.

Hope this helps,
-- Hauke D
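To make the list slice and the two data structures mentioned above concrete, here is a minimal sketch. The sample lines and the variable names (`@lines`, `%hoh`, `%hoa`, `$material`, `$county`) are assumptions chosen for illustration; they are not taken from the original program.

```perl
use strict;
use warnings;

# Hypothetical sample input in a semicolon-separated layout.
my @lines = (
    "CLAYCOUNTY;Wood;statecode=FL",
    "SUWANNEECOUNTY;Wood;statecode=TX",
    "CLAYCOUNTY;Wood;statecode=FL",
);

my (%hoh, %hoa);
for my $line (@lines) {
    # List slice: second field first, then the first field.
    my ($material, $county) = (split /;/, $line)[1, 0];

    $hoh{$material}{$county} = 1;          # hash of hashes: duplicates collapse
    push @{ $hoa{$material} }, $county;    # hash of arrays: duplicates are kept
}

print join(",", sort keys %{ $hoh{Wood} }), "\n";   # CLAYCOUNTY,SUWANNEECOUNTY
print scalar(@{ $hoa{Wood} }), "\n";                # 3
```

The hash of hashes is the natural choice when only the unique second-level keys matter; the hash of arrays keeps every occurrence in input order.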
Re^2: Print hash keys and lookup the keys for values in another file by Lotus1 (Vicar) on Feb 01, 2017 at 16:45 UTC
I didn't comment on this section before since the program seems to have had very little effort put into it. The `last if !line;` part will never do anything even if corrected, since `$line` is only declared and assigned a value if the regex matches.

`my $line = $_ if /\bWood\b/; chomp ($line); last if !line;`

Perhaps something like `last if /^\s*$/` placed ahead of the `my $line = $_ ...` would work.
Re^3: Print hash keys and lookup ... by haukex (Archbishop) on Feb 02, 2017 at 12:43 UTC
Hi Lotus1,

"The `last if !line;` part will never do anything even if corrected"

Turns out I missed it yesterday - the line does do something:

    $ perl -wMstrict -n
    my $line = $_ if /\bWood\b/;
    chomp ($line);
    print "BANG!\n" if !$line;
    __END__
    Hello, Wood
    World
    Foobar

Outputs:

    Use of uninitialized value $line in scalar chomp at - line 2, <> line 2.
    BANG!

The `... if !$line` gets triggered when `$line` is false, which is a case that can happen.

There's another thing I missed yesterday: `my $line = $_ if /\bWood\b/;`, to which perlsyn has to say this:

"NOTE: The behaviour of a `my`, `state`, or `our` modified with a statement modifier conditional or loop construct (for example, `my $x if ...`) is undefined. The value of the `my` variable may be `undef`, any previously assigned value, or possibly anything else. Don't rely on it. Future versions of perl might do something different from the version of perl you try it out on. Here be dragons."

So really, that entire loop body needs a rewrite, maybe:

    while (<$fh>) {
        chomp;
        next unless /\bWood\b/;
        ...
    }

Regards,
-- Hauke D
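A self-contained variant of that loop rewrite might look like the sketch below. The sample data, the in-memory filehandle and the `@matches` array are assumptions added only so the snippet runs on its own; the point is that `$line` is declared unconditionally and non-matching lines are skipped with `next`.

```perl
use strict;
use warnings;

# Hypothetical sample data, read through an in-memory filehandle.
my $data = "Hello, Wood\nWorld\n\nFoobar Wood\n";
open my $fh, '<', \$data or die "Can't open in-memory file: $!";

my @matches;
while (my $line = <$fh>) {             # unconditional declaration, no "my ... if"
    chomp $line;
    next unless $line =~ /\bWood\b/;   # skip empty and non-matching lines
    push @matches, $line;
}
close $fh;

print "$_\n" for @matches;
```

Because the declaration no longer depends on a statement modifier, the behaviour does not rely on anything perlsyn marks as undefined.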
Re^4: Print hash keys and lookup ... by Lotus1 (Vicar) on Feb 02, 2017 at 14:42 UTC
Re^3: Print hash keys and lookup ... (updated) by haukex (Archbishop) on Feb 01, 2017 at 17:00 UTC
Hi Lotus1,

"The `last if !line;` part will never do anything even if corrected"

You're right of course, I was at that moment more focused on getting the code compiling, and there was lots more to explain, so I skipped the line ~~that doesn't do anything~~ :-)

Update: Oops, the line does do something, see my other reply.

Thanks,
-- Hauke D
Re^2: Print hash keys and lookup the keys for values in another file by Magnolia25 (Sexton) on Feb 02, 2017 at 04:53 UTC
"in `last if !line;`, you're missing the $ sigil from `$line`."

Sorry, that was a typo.
Re: Print hash keys and lookup the keys for values in another file by Lotus1 (Vicar) on Feb 01, 2017 at 15:42 UTC
From the first file you are trying to find a unique list of the counties that match the word 'Wood'. That means the counties need to be the keys, and the values can be an empty anonymous array.

For the second file you only need to iterate through the lines, split each line and look up the third column in the hash. If the key `exists`, then push the first column into the array stored as the value for that key in your hash. After reading through file 2 you can make a simple function to print out the values.

This is a great chance for you to learn how to use a hash of arrays. Let us know what questions you have as you work through these steps.
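The steps described above can be sketched as follows. The two arrays stand in for the two input files, and all names (`@file1`, `@file2`, `%codes_for`, `print_codes`) as well as the `Brick` line are hypothetical, added only for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for the two input files.
my @file1 = (
    "CLAYCOUNTY;Wood;statecode=FL",
    "NASSAUCOUNTY;Brick;statecode=UT",
    "CLAYCOUNTY;Wood;statecode=FL",
);
my @file2 = (
    "119736;Residential;CLAYCOUNTY",
    "920232;Residential;NASSAUCOUNTY",
    "448094;Residential;CLAYCOUNTY",
);

# File 1: every county matching 'Wood' becomes a key with an empty array.
my %codes_for;
for (@file1) {
    my ($county, $material) = split /;/;
    next unless $material eq 'Wood';
    $codes_for{$county} ||= [];
}

# File 2: keep a code only if its county (third column) is a known key.
for (@file2) {
    my ($code, undef, $county) = split /;/;
    push @{ $codes_for{$county} }, $code if exists $codes_for{$county};
}

# A simple function to print the collected values.
sub print_codes {
    my ($href) = @_;
    for my $county (sort keys %$href) {
        print "$county: @{ $href->{$county} }\n";
    }
}
print_codes(\%codes_for);   # CLAYCOUNTY: 119736 448094
```

The `exists` test matters here: without it, `push @{ ... }` would autovivify an entry for every county seen in the second file.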
Re: Print hash keys and lookup the keys for values in another file -- oneliner and more explained by Discipulus (Canon) on Feb 02, 2017 at 10:07 UTC
Dear Magnolia25,

Make sure to follow the wiser advice you already received. Be sure to understand all haukex said and read the docs he linked.

Pay attention to the final datastructure you want to fill: under the `Wood` key you are talking about a HashOfHashes like `$hoh{Wood}{CLAYCOUNTY}` with values `119736, 448094,..` I think, as haukex already suggested, a HashOfArrays in the form `$hoh{Wood}{CLAYCOUNTY}[119736, 448094,..]` is more appropriate.

In fact, when you arrange your data into the most appropriate datastructure, everything runs smoother.

In principle the basis of all programming is a plain description of the solution. This is also in the signature of one of our esteemed brothers (the name escapes me at the moment).

Your plain solution, or pseudocode, looks like:

    parse file-1
        for every line:
            remove newline
            split it using ';' filling an array
            check if the element 1 (array start from 0) of such array is 'Wood'
            if yes create a sub key using element zero of the above array and assign as value an anonymous, empty array []
    close file-1

    parse file-2
        for every line:
            remove newline
            split it using ';' filling an array
            if the element 2 is an already present key in the hash
                push the element 0 into the values of such key (we have defined it as an anonymous array [] above)
    close file-2

    print out the datastructure in the way you like

These are the basics; you can make it as complex as you like. Then YOU must translate it into Perl code: this heavily depends on the level you are at. Go slowly, translating into Perl using well-known best practices and idioms. Ask only about parts you do not understand how to translate, or that give you back unexpected results.

For your information (but eminently for my own amusement!) you must be aware that Perl is powerful enough to get that job done in a few keystrokes.
`perlrun` describes the switches used below, but the code below is tricky and uses some advanced techniques like the `BEGIN` and `END` blocks, the ternary operator `? :` and the `eof` trick (setting `$f=0`, where `$f` means `$first`) to parse the second file in a different way from the first one... If you want to learn more you are welcome!

Given `file-1.txt` and `file-2.txt`, the two files you describe, the following oneliner does the trick:

    # pay attention to windows doublequote around the oneliner, on Linux is perl -e '...' not perl -e ".."
    perl -MData::Dumper -F";" -lane "BEGIN{$f=1};$f?$hoh{$F[1]}{$F[0]}=[]:push @{$hoh{Wood}{$F[2]}},$F[0];$f=0 if eof; END {print Dumper \%hoh}" file-1.txt file-2.txt

    $VAR1 = {
      'Wood' => {
        'NASSAUCOUNTY' => [ '920232', '727659', '471817', '983043', '578286' ],
        'SUWANNEECOUNTY' => [ '640802', '403866', '828788', '751490', '972562', '367541', '481360' ],
        'CLAYCOUNTY' => [ '119736', '448094', '206893', '333743', '172534', '785275', '995932', '223488', '433512' ]
      }
    };

In the case of such a cryptic oneliner you can enjoy the ability of the core module B::Deparse to help your understanding: as per the synopsis of that module, you just need to call the oneliner prepending `perl -MO=Deparse` to see it in a somewhat more readable form:

    # spacing and comments added
    perl -MO=Deparse -MData::Dumper -F";" -lane "BEGIN{$f=1};$f?$hoh{$F[1]}{$F[0]}=[]:push @{$hoh{Wood}{$F[2]}},$F[0]; $f=0 if eof; END{print Dumper \%hoh}" file-1.txt file-2.txt

    # B::Deparse translate the above into the below:

    # the following block is added by the -l switch
    # automatically handling newlines
    BEGIN { $/ = "\n"; $\ = "\n"; }
    use Data::Dumper;
    # this while block is added by the -n switch (see also -p for completeness)
    LINE: while (defined($_ = <ARGV>)) {
        chomp $_;
        # the special @F array (see perlvar) is called into play by the -a (autosplit) switch
        # (UPDATE) the -F";" switch states that we are going to automatically split using ';' instead of the default (space)
        our(@F) = split(/;/, $_, 0);
        # this is our BEGIN block explicitly put in our oneliner setting $f = 1 before doing anything
        # please note the BEGIN blocks are executed as they are seen by the compiler, so even if put inside the while
        # it is executed only once, as it is seen
        sub BEGIN {
            $f = 1;
        }
        # our part: the ternary '? :' is ' IF ? THEN : ELSE'
        # IF $f is true (ie we are processing the first file)
        $f ?
        # THEN use @F elements to create a subkey assigning to it an empty anonymous array []
        $hoh{$F[1]}{$F[0]} = [] :
        # ELSE (we are processing the second file)
        # push into the subkey (that holds an array) the value of the first element
        push(@{$hoh{'Wood'}{$F[2]};}, $F[0]);
        # the trick: eof is true when we reach the end of a file: so at the end of file-1 it happens to be true: we trap it and
        # we change $f to 0 ie we are stating we are processing, from now on, the second (or third) file
        $f = 0 if eof;
        # our END block executed only once at the end of program is used to dump the datastructure
        # it can be concealed using the 'eskimo operator' trick, I prefer Data::Dump over Data::Dumper
        sub END {
            print Dumper(\%hoh);
        }
        ;
    }
    # B::Deparse tell us the syntax is OK
    -e syntax OK

L*

There are no rules, there are no thumbs..
Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
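For readers who prefer a script to a oneliner, the `eof` trick can be sketched in long form as below. This is only an illustration under stated assumptions: the two temporary files, their sample contents and the names `write_temp` and `$first` are hypothetical, and `File::Temp` is used solely to keep the snippet self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write a small hypothetical input file and return its name.
sub write_temp {
    my (@lines) = @_;
    my ($fh, $name) = tempfile(UNLINK => 1);
    print {$fh} "$_\n" for @lines;
    close $fh or die "Can't close $name: $!";
    return $name;
}

# Stand-ins for file-1.txt and file-2.txt, read via <> like the oneliner does.
@ARGV = (
    write_temp("CLAYCOUNTY;Wood;statecode=FL"),
    write_temp("119736;Residential;CLAYCOUNTY", "448094;Residential;CLAYCOUNTY"),
);

my %hoh;
my $first = 1;                                   # true while reading the first file
while (my $line = <>) {
    chomp $line;
    my @F = split /;/, $line;
    if ($first) {
        $hoh{ $F[1] }{ $F[0] } = [];             # file 1: register the county
    }
    else {
        push @{ $hoh{Wood}{ $F[2] } }, $F[0];    # file 2: collect the codes
    }
    $first = 0 if eof;    # eof without parentheses: end of the file currently being read
}

print "$_: @{ $hoh{Wood}{$_} }\n" for sort keys %{ $hoh{Wood} };
```

Like the oneliner, this sketch pushes every code from the second file under its county without checking that the county was registered from the first file; an `exists` test would be needed to filter.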
Re^2: Print hash keys and lookup the keys for values in another file -- oneliner and more explained by Magnolia25 (Sexton) on Feb 10, 2017 at 16:05 UTC
    #!/usr/bin/perl -w
    use warnings;
    use strict;
    use Data::Dumper;

    my (@county,%result);

    my $inFile01 ="FILE01.dat";
    #CLAYCOUNTY;Wood;statecode=FL
    #CLAYCOUNTY;Wood;statecode=FL
    #SUWANNEECOUNTY;Wood;statecode=FL
    #SUWANNEECOUNTY;Wood;statecode=TX
    #SUWANNEECOUNTY;Wood;statecode=TX
    #SUWANNEECOUNTY;Wood;statecode=TX
    #NASSAUCOUNTY;Wood;statecode=UT

    open(DATA01,'<',$inFile01)or die("Can't open input file\"$inFile01\":$!\n");
    while (<DATA01>) {
        # Skipping if the line is empty or a comment
        next if ( $_ =~ /^\s$/ );
        next if ( $_ =~ /^#\s/ );
        my ($county,$srch,$stcode) = split(";",$_);
        chomp($srch,$county);
        if ($srch eq "Wood") {
            push (@county,$county)
        }
    }
    close(DATA01);

    my $inFile02 ="FILE02.dat";
    #119736;Residential;CLAYCOUNTY
    #448094;Residential;CLAYCOUNTY
    #206893;Residential;CLAYCOUNTY
    #333743;Residential;CLAYCOUNTY
    #172534;Residential;CLAYCOUNTY
    #785275;Residential;CLAYCOUNTY
    #995932;Residential;CLAYCOUNTY
    #223488;Residential;CLAYCOUNTY
    #433512;Residential;CLAYCOUNTY
    #640802;Residential;SUWANNEECOUNTY
    #403866;Residential;SUWANNEECOUNTY
    #828788;Residential;SUWANNEECOUNTY
    #751490;Residential;SUWANNEECOUNTY
    #972562;Residential;SUWANNEECOUNTY
    #367541;Residential;SUWANNEECOUNTY
    #481360;Residential;SUWANNEECOUNTY
    #920232;Residential;NASSAUCOUNTY
    #727659;Residential;NASSAUCOUNTY
    #471817;Residential;NASSAUCOUNTY
    #983043;Residential;NASSAUCOUNTY
    #578286;Residential;NASSAUCOUNTY

    foreach my $cnty (@county) {
        my @countycode;
        open(DATA02,'<',$inFile02)or die("Can't open input file\"$inFile02\":$!\n");
        while (<DATA02>) {
            # Skipping if the line is empty or a comment
            next if ( $_ =~ /^\s$/ );
            next if ( $_ =~ /^#\s/ );
            my ($code,$attr,$countyy) = split (";",$_);
            chomp ($code,$attr,$countyy);
            if ($countyy eq $cnty) {
                push @countycode, $code;
            }
            $result{$cnty} = [@countycode]
        }
    }
    close(DATA02);

    print Dumper \%result;

    # loop variable named $var so that it matches its use below
    foreach my $var (keys %result) {
        print "$var" . "\n";
        my $op = join "\|", @{$result{$var}};
        print "$op" . "\n";
    }

I am able to get the desired values.
One last thing: I now need to print the output to another file, in the format below. For all the values in `@{$result{$var}}` I need to print as follows to an output file (no particular order). For 119736 I need two lines in the file as below (similarly for all the others):

    L\|A\|119736\|119736\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|
    M\|A\|119736\|\|\|\|Wood\|Wood\|CONSTANT_STRING

The complete file looks like:

    L\|A\|119736\|119736\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|
    M\|A\|119736\|\|\|\|Wood\|Wood\|CONSTANT_STRING
    L\|A\|448094\|119736\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|
    M\|A\|448094\|\|\|\|Wood\|Wood\|CONSTANT_STRING
    L\|A\|206893\|206893\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|
    M\|A\|206893\|\|\|\|Wood\|Wood\|CONSTANT_STRING
    L\|A\|333743\|333743\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|
    M\|A\|333743\|\|\|\|Wood\|Wood\|CONSTANT_STRING
    L\|A\|172534\|172534\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|
    M\|A\|172534\|\|\|\|Wood\|Wood\|CONSTANT_STRING
    ..... .... ....

Thanks.
Re^3: Print hash keys and lookup the keys for values in another file -- oneliner and more explained by poj (Abbot) on Feb 10, 2017 at 17:18 UTC
Use a hash rather than 2 loops. Format the output with printf.

    #!/usr/bin/perl
    use warnings;
    use strict;
    use Data::Dumper;

    my $search = 'Wood';

    my %file01 = ();
    my $inFile01 = "FILE01.dat";
    open DATA01,'<',$inFile01 or die "Can't open file '$inFile01' : $!";
    while (<DATA01>) {
        chomp;
        # Skipping if the line is empty or a comment
        next if ( $_ =~ /^\s$/ );
        next if ( $_ =~ /^#\s/ );
        my ($county,$var,$stcode) = split ";",$_;
        if ($var eq $search) {
            $file01{$county} = 1;
        }
    }
    close DATA01;
    print Dumper \%file01;

    my %result = ();
    my $inFile02 = "FILE02.dat";
    open DATA02,'<',$inFile02 or die "Can't open file '$inFile02' : $!";
    while (<DATA02>) {
        chomp;
        # Skipping if the line is empty or a comment
        next if ( $_ =~ /^\s$/ );
        next if ( $_ =~ /^#\s/ );
        my ($code,$attr,$county) = split ";",$_;
        if ( exists $file01{$county} ) {
            push @{ $result{$county} },$code
        }
    }
    close DATA02;
    print Dumper \%result;

    my $fmt_1 = "L\|A\|%d\|%d\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\|\n";
    my $fmt_2 = "M\|A\|%d\|\|\|\|%s\|%s\|CONSTANT_STRING\n";

    foreach my $county (keys %result) {
        for my $id ( sort @{$result{$county}} ){
            printf $fmt_1,$id,$id;
            printf $fmt_2,$id,$search,$search;
        }
    }

poj
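The code above prints to STDOUT, while the question asked for an output file. A minimal sketch of sending the same kind of `printf` output to a filehandle follows. The sample `%result` contents, the `$buffer` variable and the `output.dat` name mentioned in the comment are assumptions for illustration, and only the shorter `M` record is shown. In a double-quoted Perl string `\|` is simply `|`, so the format is written here without the backslashes.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the %result hash built from the two files.
my %result = ( CLAYCOUNTY => [ 448094, 119736 ] );
my $search = 'Wood';

# An in-memory filehandle keeps the sketch self-contained; for a real file,
# replace \$buffer with a filename such as 'output.dat' (name assumed here).
my $buffer = '';
open my $out, '>', \$buffer or die "Can't open output: $!";

for my $county (sort keys %result) {
    for my $id (sort { $a <=> $b } @{ $result{$county} }) {   # numeric sort
        printf {$out} "M|A|%d||||%s|%s|CONSTANT_STRING\n", $id, $search, $search;
    }
}
close $out or die "Can't close output: $!";

print $buffer;
```

Using a lexical filehandle with `printf {$out} ...` keeps the output destination explicit, and the numeric sort block avoids the string ordering that a bare `sort` applies to the IDs.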
Re^4: Print hash keys and lookup the keys for values in another file -- oneliner and more explained by Magnolia25 (Sexton) on Feb 12, 2017 at 15:09 UTC