Magnolia25 has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I have two files, FILE-1 and FILE-2, in the formats given below.

FILE-1

CLAYCOUNTY;Wood;statecode=FL
CLAYCOUNTY;Wood;statecode=FL
SUWANNEECOUNTY;Wood;statecode=FL
SUWANNEECOUNTY;Wood;statecode=TX
SUWANNEECOUNTY;Wood;statecode=TX
SUWANNEECOUNTY;Wood;statecode=TX
NASSAUCOUNTY;Wood;statecode=UT

Step 1: Parse FILE-1 with the search key "Wood" and store the file columns in a hash. I want to use the 2nd and 1st columns, with the 2nd column being the key and the distinct entries of the 1st column being the values.

e.g. for Key "Wood" values are
CLAYCOUNTY,
SUWANNEECOUNTY,
NASSAUCOUNTY

    open (my $fh, "<", $file) or die "Can't open the file $file: ";
    while (<$fh>) {
        my $line = $_ if /\bWood\b/;
        chomp ($line);
        last if !$line;
        %hash2 = (%hash2 , (split(/;/, $line)),[1,0]);
    }
    print Dumper \%hash2;

FILE-2 Format as below

119736;Residential;CLAYCOUNTY
448094;Residential;CLAYCOUNTY
206893;Residential;CLAYCOUNTY
333743;Residential;CLAYCOUNTY
172534;Residential;CLAYCOUNTY
785275;Residential;CLAYCOUNTY
995932;Residential;CLAYCOUNTY
223488;Residential;CLAYCOUNTY
433512;Residential;CLAYCOUNTY
640802;Residential;SUWANNEECOUNTY
403866;Residential;SUWANNEECOUNTY
828788;Residential;SUWANNEECOUNTY
751490;Residential;SUWANNEECOUNTY
972562;Residential;SUWANNEECOUNTY
367541;Residential;SUWANNEECOUNTY
481360;Residential;SUWANNEECOUNTY
920232;Residential;NASSAUCOUNTY
727659;Residential;NASSAUCOUNTY
471817;Residential;NASSAUCOUNTY
983043;Residential;NASSAUCOUNTY
578286;Residential;NASSAUCOUNTY

Step 2: Take the distinct column 1 values of FILE-1 from the step above, find the corresponding column 1 values in FILE-2, and write them to an output file.
In FILE-2 I want to use the 3rd and 1st column, with the 3rd column being key and 1st column being the values.

e.g. For key
CLAYCOUNTY values are 119736, 448094, 206893, 333743, 172534, 785275, 995932, 223488, 433512
SUWANNEECOUNTY are 640802,403866,828788,751490,972562,367541,481360
NASSAUCOUNTY are 920232,727659, 471817, 983043, 578286

Please provide suggestions to move ahead on this. Thanks.

Re: Print hash keys and lookup the keys for values in another file
by haukex (Archbishop) on Feb 01, 2017 at 15:55 UTC

    Hi Magnolia25,

    Note that your code contains some errors that prevent it from compiling. I'm going to assume you copied the code you showed out of a larger program and that you've actually got use warnings; use strict; and the missing variable declarations. In the future, please try to provide code that compiles (SSCCE).

    First, in last if !line;, you're missing the $ sigil from $line.

    Second, note how you've got a comma in (split(/;/, $line)),[1,0]. This is creating several values: the return values of split, plus the anonymous array [1,0]. What you probably meant is (split...)[1,0], which returns a list consisting of the second return value of split followed by the first return value of the split.
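    To make the difference visible, here is a minimal sketch comparing the two forms on one of the sample records. The variable names are made up for illustration and are not from the original program.

```perl
use strict;
use warnings;

my $line = 'CLAYCOUNTY;Wood;statecode=FL';

# Comma form: the three fields from split, plus one anonymous array [1, 0].
my @with_comma = (split(/;/, $line), [1, 0]);

# Slice form: field 1 first, then field 0, taken from the split result.
my ($key, $value) = (split /;/, $line)[1, 0];

print scalar(@with_comma), " elements with the comma\n";
print "key=$key value=$value\n";
```

    The comma form silently appends an array reference to the list, which is why the hash ended up with unexpected content.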

    If you fix these issues, you've still got a conceptual problem. The way I understand it, you're trying to store a hash with multiple values per key. One way to do this is with a "hash of arrays" or "hash of hashes", which are described with lots of example code in perldsc.

    Because this sounds very much like a homework assignment, I'm reluctant to do too much of your work for you :-) However, if you look at perlreftut and perldsc, I hope it will become more clear how you can accomplish your tasks:

    1. Read the file line-by-line, as you already are.
    2. split the line into its columns*.
    3. Store the values in your hash of hashes, such as $hash{$col[1]}{$col[0]}=1;, or into a hash of arrays via push @{ $hash{$col[1]} }, $col[0]; (see push).

    * For parsing files like this, it really would be best to use a module like Text::CSV.
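    As a rough illustration of step 3 (a sketch only, using a shortened copy of the sample data held in an array instead of a file), the two structures behave differently with duplicate counties: the hash of hashes collapses them, the hash of arrays keeps every occurrence.

```perl
use strict;
use warnings;
use Data::Dumper;

# Shortened sample records; a real program would read these from FILE-1.
my @lines = (
    'CLAYCOUNTY;Wood;statecode=FL',
    'CLAYCOUNTY;Wood;statecode=FL',
    'SUWANNEECOUNTY;Wood;statecode=TX',
    'NASSAUCOUNTY;Wood;statecode=UT',
);

my (%hoh, %hoa);
for my $line (@lines) {
    my @col = split /;/, $line;
    $hoh{ $col[1] }{ $col[0] } = 1;        # hash of hashes: duplicates collapse
    push @{ $hoa{ $col[1] } }, $col[0];    # hash of arrays: duplicates and order kept
}

# The distinct counties are simply the inner keys of the hash of hashes.
my @distinct = sort keys %{ $hoh{Wood} };

$Data::Dumper::Sortkeys = 1;
print Dumper(\%hoh, \%hoa);
```

    If only the distinct values matter, the hash of hashes is probably the more convenient of the two.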

    Your "Step 2" sounds very much like "Step 1", except with different columns. If you also need to filter the "Step 2" input file, I would suggest that you store the first column from your "Step 1" input file in a hash, and when you're parsing your "Step 2" input file, seeing if that value exists in the hash you created.

    Hope this helps,
    -- Hauke D

      I didn't comment on this section before since the program seems to have had very little effort put into it. The last if !line; part will never do anything even if corrected since $line is only declared and assigned a value if the regex matches.

      my $line = $_ if /\bWood\b/; chomp ($line); last if !line;

      Perhaps something like last if /^\s*$/ placed ahead of the my $line = $_ ... would work.

        Hi Lotus1,

        The last if !line; part will never do anything even if corrected

        Turns out I missed it yesterday - the line does do something:

        $ perl -wMstrict -n my $line = $_ if /\bWood\b/; chomp ($line); print "BANG!\n" if !$line; __END__ Hello, Wood World Foobar

        Outputs:

        Use of uninitialized value $line in scalar chomp at - line 2, <> line 2.
        BANG!

        The ... if !$line gets triggered when $line is false, which is a case that can happen.

        There's another thing I missed yesterday: my $line = $_ if /\bWood\b/;, to which perlsyn has to say this:

        NOTE: The behaviour of a my, state, or our modified with a statement modifier conditional or loop construct (for example, my $x if ...) is undefined. The value of the my variable may be undef, any previously assigned value, or possibly anything else. Don't rely on it. Future versions of perl might do something different from the version of perl you try it out on. Here be dragons.

        So really, that entire loop body needs a rewrite, maybe:

        while (<$fh>) { chomp; next unless /\bWood\b/; ... }
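        One way the elided loop body might be filled in, as a sketch only: it reads from an in-memory filehandle so that it runs without any files, and the "Brick" record and the %seen hash are made up for illustration.

```perl
use strict;
use warnings;

# In-memory stand-in for FILE-1; the blank line and the "Brick" record are invented.
my $data = "CLAYCOUNTY;Wood;statecode=FL\n\n"
         . "NASSAUCOUNTY;Brick;statecode=UT\n"
         . "SUWANNEECOUNTY;Wood;statecode=TX\n";
open my $fh, '<', \$data or die "Can't open in-memory file: $!";

my %seen;
while (my $line = <$fh>) {                 # unconditional my, no statement modifier
    chomp $line;
    next unless $line =~ /\bWood\b/;       # skips blank and non-matching lines
    my ($key, $county) = (split /;/, $line)[1, 0];
    $seen{$key}{$county} = 1;
}
close $fh;

print join(', ', sort keys %{ $seen{Wood} }), "\n";
```

        Declaring the variable unconditionally in the loop header avoids the undefined behaviour perlsyn warns about.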

        Regards,
        -- Hauke D

        Hi Lotus1,

        The last if !line; part will never do anything even if corrected

        You're right of course, I was at that moment more focused on getting the code compiling, and there was lots more to explain, so I skipped the line that doesn't do anything :-) Update: Oops, the line does do something, see my other reply.

        Thanks,
        -- Hauke D

      in last if !line;, you're missing the $ sigil from $line.

      Sorry, that was a typo.

Re: Print hash keys and lookup the keys for values in another file
by Lotus1 (Vicar) on Feb 01, 2017 at 15:42 UTC

    From the first file you are trying to find a unique list of the counties that match the word 'Wood'. That means the counties need to be the keys, and each value can be an empty anonymous array. For the second file you only need to iterate through the lines, split each line and look up the third column in the hash. If the key exists, push the first column onto the array stored as the value for that key in your hash. After reading through file 2 you can write a simple function to print out the values. This is a great chance for you to learn how to use a hash of arrays. Let us know what questions you have as you work through these steps.
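    The steps above could be sketched roughly as follows. This is only one possible reading of them: the data is a shortened in-memory copy of the two files, the DUVALCOUNTY and Brick records are made up to show the filtering, and the names %codes_for and format_counties are hypothetical.

```perl
use strict;
use warnings;

# Shortened sample data; DUVALCOUNTY, Brick and 555555 are invented values.
my @file1 = (
    'CLAYCOUNTY;Wood;statecode=FL',
    'CLAYCOUNTY;Wood;statecode=FL',
    'NASSAUCOUNTY;Wood;statecode=UT',
    'DUVALCOUNTY;Brick;statecode=FL',
);
my @file2 = (
    '119736;Residential;CLAYCOUNTY',
    '448094;Residential;CLAYCOUNTY',
    '920232;Residential;NASSAUCOUNTY',
    '555555;Residential;DUVALCOUNTY',
);

# File 1: every county matching 'Wood' becomes a key with an empty array.
my %codes_for;
for my $line (@file1) {
    my ($county, $type) = split /;/, $line;
    $codes_for{$county} //= [] if $type eq 'Wood';
}

# File 2: push the first column only when the county is already a key.
for my $line (@file2) {
    my ($code, undef, $county) = split /;/, $line;
    push @{ $codes_for{$county} }, $code if exists $codes_for{$county};
}

# Simple helper that returns one formatted line per county.
sub format_counties {
    my ($codes) = @_;
    my @out;
    for my $county (sort keys %$codes) {
        push @out, "$county: " . join(', ', @{ $codes->{$county} });
    }
    return @out;
}

print "$_\n" for format_counties(\%codes_for);
```

    The exists test matters here: pushing without it would autovivify a key for every county in file 2, matching or not.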

Re: Print hash keys and lookup the keys for values in another file -- oneliner and more explained
by Discipulus (Canon) on Feb 02, 2017 at 10:07 UTC
    dear Magnolia25,

    Make sure to follow the wiser advice you already received. Be sure to understand all haukex said and read the docs linked by him.

    Pay attention to the final data structure you want to fill: under the Wood key you are describing a HashOfHashes like $hoh{Wood}{CLAYCOUNTY} with the values 119736, 448094, ... I think, as haukex already suggested, a HashOfArrays in the form $hoh{Wood}{CLAYCOUNTY} = [119736, 448094, ...] is more appropriate.

    In fact, when you arrange your data into the most appropriate data structure, everything runs more smoothly.

    In principle, the basis of all programming is a plain description of the solution. This is also in the signature of one of our esteemed brothers (the name escapes me at the moment).

    Your plain solution or pseudocode, looks like:

    parse file-1
        for every line:
            remove newline
            split it using ';' filling an array
            check if the element 1 (array start from 0) of such array is 'Wood'
            if yes create a sub key using element zero of the above array
                and assign as value an anonymous, empty array []
    close file-1
    parse file-2
        for every line:
            remove newline
            split it using ';' filling an array
            if the element 2 is an already present key in the hash
                push the element 0 into the values of such key
                (we have defined it as an anonymous array [] above)
    close file-2
    print out the datastructure in the way you like

    These are the basics; you can make it as complex as you like. Then YOU must translate it into Perl code: how you do that depends heavily on your level. Translate slowly into Perl, using well-known best practices and idioms. Ask only about the parts you do not know how to translate or that give you unexpected results.

    For your information (but eminently for my own amusement!), be aware that Perl is powerful enough to get that job done in a few keystrokes. perlrun describes the switches used below, but the code is tricky and uses some advanced techniques such as BEGIN and END blocks, the ternary operator ' ? : ' and the eof trick (setting $f=0, where $f stands for $first) to parse the second file differently from the first one... If you want to learn more, you are welcome!

    Given file-1.txt and file-2.txt, the two files you describe, the following one-liner does the trick:

    # pay attention to windows doublequote around the oneliner, on Linux it is perl -e '...' not perl -e ".."
    perl -MData::Dumper -F";" -lane "BEGIN{$f=1};$f?$hoh{$F[1]}{$F[0]}=[]:push @{$hoh{Wood}{$F[2]}},$F[0];$f=0 if eof; END {print Dumper \%hoh}" file-1.txt file-2.txt

    $VAR1 = {
              'Wood' => {
                          'NASSAUCOUNTY' => [
                                              '920232',
                                              '727659',
                                              '471817',
                                              '983043',
                                              '578286'
                                            ],
                          'SUWANNEECOUNTY' => [
                                                '640802',
                                                '403866',
                                                '828788',
                                                '751490',
                                                '972562',
                                                '367541',
                                                '481360'
                                              ],
                          'CLAYCOUNTY' => [
                                            '119736',
                                            '448094',
                                            '206893',
                                            '333743',
                                            '172534',
                                            '785275',
                                            '995932',
                                            '223488',
                                            '433512'
                                          ]
                        }
            };

    With such a cryptic one-liner you can use the core module B::Deparse to help you understand it: as the synopsis of that module shows, you just need to add -MO=Deparse to the one-liner's switches to see it in a somewhat more readable form:

    # spacing and comments added
    perl -MO=Deparse -MData::Dumper -F";" -lane "BEGIN{$f=1};$f?$hoh{$F[1]}{$F[0]}=[]:push @{$hoh{Wood}{$F[2]}},$F[0]; $f=0 if eof; END{print Dumper \%hoh}" file-1.txt file-2.txt

    # B::Deparse translates the above into the below:

    # the following block is added by the -l switch
    # automatically handling newlines
    BEGIN { $/ = "\n"; $\ = "\n"; }
    use Data::Dumper;
    # this while block is added by the -n switch (see also -p for completeness)
    LINE: while (defined($_ = <ARGV>)) {
        chomp $_;
        # the special @F array (see perlvar) is called into play by the -a (autosplit) switch
        # (UPDATE) the -F";" switch states that we are going to automatically split
        # using ';' instead of the default (space)
        our(@F) = split(/;/, $_, 0);
        # this is our BEGIN block explicitly put in our oneliner, setting $f = 1 before doing anything
        # please note the BEGIN blocks are executed as they are seen by the compiler,
        # so even if put inside the while it is executed only once, as it is seen
        sub BEGIN {
            $f = 1;
        }
        # our part: the ternary '? :' is ' IF ? THEN : ELSE'
        # IF $f is true (ie we are processing the first file)
        $f ?
        # THEN use @F elements to create a subkey assigning to it an empty anonymous array []
        $hoh{$F[1]}{$F[0]} = [] :
        # ELSE (we are processing the second file)
        # push into the subkey (that holds an array) the value of the first element
        push(@{$hoh{'Wood'}{$F[2]};}, $F[0]);
        # the trick: eof is true when we reach the end of a file: so at the end of file-1
        # it happens to be true: we trap it and we change $f to 0, ie we are stating
        # we are processing, from now on, the second (or third) file
        $f = 0 if eof;
        # our END block, executed only once at the end of program, is used to dump the datastructure
        # it can be concealed using the 'eskimo operator' trick, I prefer Data::Dump over Data::Dumper
        sub END {
            print Dumper(\%hoh);
        }
        ;
    }
    # B::Deparse tells us the syntax is OK
    -e syntax OK

    L*

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.
      #!/usr/bin/perl -w
      use warnings;
      use strict;
      use Data::Dumper;

      my (@county,%result);

      my $inFile01 ="FILE01.dat";
      #CLAYCOUNTY;Wood;statecode=FL
      #CLAYCOUNTY;Wood;statecode=FL
      #SUWANNEECOUNTY;Wood;statecode=FL
      #SUWANNEECOUNTY;Wood;statecode=TX
      #SUWANNEECOUNTY;Wood;statecode=TX
      #SUWANNEECOUNTY;Wood;statecode=TX
      #NASSAUCOUNTY;Wood;statecode=UT

      open(DATA01,'<',$inFile01)or die("Can't open input file\"$inFile01\":$!\n");
      while (<DATA01>) {
          # Skipping if the line is empty or a comment
          next if ( $_ =~ /^\s*$/ );
          next if ( $_ =~ /^#\s*/ );
          my ($county,$srch,$stcode) = split(";",$_);
          chomp($srch,$county);
          if ($srch eq "Wood") {
              push (@county,$county)
          }
      }
      close(DATA01);

      my $inFile02 ="FILE02.dat";
      #119736;Residential;CLAYCOUNTY
      #448094;Residential;CLAYCOUNTY
      #206893;Residential;CLAYCOUNTY
      #333743;Residential;CLAYCOUNTY
      #172534;Residential;CLAYCOUNTY
      #785275;Residential;CLAYCOUNTY
      #995932;Residential;CLAYCOUNTY
      #223488;Residential;CLAYCOUNTY
      #433512;Residential;CLAYCOUNTY
      #640802;Residential;SUWANNEECOUNTY
      #403866;Residential;SUWANNEECOUNTY
      #828788;Residential;SUWANNEECOUNTY
      #751490;Residential;SUWANNEECOUNTY
      #972562;Residential;SUWANNEECOUNTY
      #367541;Residential;SUWANNEECOUNTY
      #481360;Residential;SUWANNEECOUNTY
      #920232;Residential;NASSAUCOUNTY
      #727659;Residential;NASSAUCOUNTY
      #471817;Residential;NASSAUCOUNTY
      #983043;Residential;NASSAUCOUNTY
      #578286;Residential;NASSAUCOUNTY

      foreach my $cnty (@county) {
          my @countycode;
          open(DATA02,'<',$inFile02)or die("Can't open input file\"$inFile02\":$!\n");
          while (<DATA02>) {
              # Skipping if the line is empty or a comment
              next if ( $_ =~ /^\s*$/ );
              next if ( $_ =~ /^#\s*/ );
              my ($code,$attr,$countyy) = split (";",$_);
              chomp ($code,$attr,$countyy);
              if ($countyy eq $cnty) {
                  push @countycode, $code;
              }
              $result{$cnty} = [@countycode]
          }
          # close the handle once per pass, inside the foreach loop
          close(DATA02);
      }
      print Dumper \%result;

      foreach my $key (keys %result) {
          print "$key" . "\n";
          # $key is the loop variable; the undeclared $var would not compile under strict
          my $op = join "|", @{$result{$key}};
          print "$op" . "\n";
      }

      I am able to get the desired values. One last thing: I now need to print the output to another file, in the format below. For each value in @{$result{$key}} I need to print two lines to an output file (in no particular order). For example, 119736 needs the two lines below (and similarly for all the other values).

      L|A|119736|119736|||||||||||||||||||||||
      M|A|119736||||Wood|Wood|CONSTANT_STRING

      complete file looks like

      L|A|119736|119736|||||||||||||||||||||||
      M|A|119736||||Wood|Wood|CONSTANT_STRING
      L|A|448094|119736|||||||||||||||||||||||
      M|A|448094||||Wood|Wood|CONSTANT_STRING
      L|A|206893|206893|||||||||||||||||||||||
      M|A|206893||||Wood|Wood|CONSTANT_STRING
      L|A|333743|333743|||||||||||||||||||||||
      M|A|333743||||Wood|Wood|CONSTANT_STRING
      L|A|172534|172534|||||||||||||||||||||||
      M|A|172534||||Wood|Wood|CONSTANT_STRING
      ..... .... ....
      Thanks.

        Use a hash rather than 2 loops. Format the output with printf

        #!/usr/bin/perl
        use warnings;
        use strict;
        use Data::Dumper;

        my $search = 'Wood';

        my %file01 = ();
        my $inFile01 = "FILE01.dat";
        open DATA01,'<',$inFile01 or die "Can't open file '$inFile01' : $!";
        while (<DATA01>) {
            chomp;
            # Skipping if the line is empty or a comment
            next if ( $_ =~ /^\s*$/ );
            next if ( $_ =~ /^#\s*/ );
            my ($county,$var,$stcode) = split ";",$_;
            if ($var eq $search) {
                $file01{$county} = 1;
            }
        }
        close DATA01;
        print Dumper \%file01;

        my %result = ();
        my $inFile02 = "FILE02.dat";
        open DATA02,'<',$inFile02 or die "Can't open file '$inFile02' : $!";
        while (<DATA02>) {
            chomp;
            # Skipping if the line is empty or a comment
            next if ( $_ =~ /^\s*$/ );
            next if ( $_ =~ /^#\s*/ );
            my ($code,$attr,$county) = split ";",$_;
            # keep only the counties recorded from the first file
            if ( exists $file01{$county} ) {
                push @{ $result{$county} },$code
            }
        }
        close DATA02;
        print Dumper \%result;

        my $fmt_1 = "L|A|%d|%d|||||||||||||||||||||||\n";
        my $fmt_2 = "M|A|%d||||%s|%s|CONSTANT_STRING\n";
        foreach my $county (keys %result) {
            for my $id ( sort @{$result{$county}} ){
                printf $fmt_1,$id,$id;
                printf $fmt_2,$id,$search,$search;
            }
        }
        poj