darkmoon has asked for the wisdom of the Perl Monks concerning the following question:

Hi, I am new to Perl. I have two files that I need to compare to find the matching and non-matching data. I have two problems. Question 1: one of my hashes captures only the second 'num' row. I tried `push @{hash1{name1}},$x1,$y1,$x2,$y2`, but it still returns only the second 'num' row.

File1 :
name foo
num 111 222 333 444
name jack
num 999 111 222 333
num 333 444 555 777
File2:
name jack
num 999 111 222 333
num 333 444 555 777
name foo
num 666 222 333 444
This is my code:
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

my $input1=$ARGV[0];
my $input2=$ARGV[1];
my %hash1;
my %hash2;
my $name1;
my $name2;
my $x1;
my $x2;
my $y2;
my $y1;

open my $fh1,'<', $input1 or die "Cannot open file : $!\n";
while (<$fh1>) {
    chomp;
    if(/^name\s+(\S+)/) {
        $name1 = $1;
    }
    if(/^num\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)/) {
        $x1 = $1;
        $y1 = $2;
        $x2 = $3;
        $y2 = $4;
    }
    $hash1{$name1}=[$x1,$y1,$x2,$y2];
}
close $fh1;
print Dumper (\%hash1);

open my $fh2,'<', $input2 or die "Cannot open file : $!\n";
while (<$fh2>) {
    chomp;
    if(/^name\s+(\S+)/) {
        $name2 = $1;
    }
    if(/^num\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)/) {
        $x1 = $1;
        $y1 = $2;
        $x2 = $3;
        $y2 = $4;
    }
    $hash2{$name2}=[$x1,$y1,$x2,$y2];
}
close $fh2;
print Dumper (\%hash2);
My output:
$VAR1 = {
          'jack' => [
                      '333',
                      '444',
                      '555',
                      '777'
                    ],
          'foo' => [
                     '111',
                     '222',
                     '333',
                     '444'
                   ]
        };
$VAR1 = {
          'jack' => [
                      '333',
                      '444',
                      '555',
                      '777'
                    ],
          'foo' => [
                     '666',
                     '222',
                     '333',
                     '444'
                   ]
        };
My expected Output:
$VAR1 = {
          'jack' => [
                      '999',
                      '111',
                      '222',
                      '333',
                      '333',
                      '444',
                      '555',
                      '777'
                    ],
          'foo' => [
                     '111',
                     '222',
                     '333',
                     '444'
                   ]
        };
$VAR1 = {
          'jack' => [
                      '999',
                      '111',
                      '222',
                      '333',
                      '333',
                      '444',
                      '555',
                      '777'
                    ],
          'foo' => [
                     '666',
                     '222',
                     '333',
                     '444'
                   ]
        };

Question 2: I tried to use a foreach loop to match the keys and values of the two hashes and print the result as a table. This is what I tried:

print "\t\tFIle1\t\t\t\t\tFile2\n";
print "Name\tX1\tY1\tX2\tY2\t\t\tX1\tY1\tX2\tY2\n";
foreach my $k1(keys %hash1) {
    foreach my $k2(keys %hash2) {
        if($hash1{$name} eq $hash2{$name2}) {
            if($hash1{$x1}{$y1}{$x2}{$y2} == $hash2{$x1}{$y1}{$x2}{$y2}) {
                print "$name\$x1\$y1\$x2\$y2\n";
            }
        }
    }
}

but I'm getting only the header:

                File1                                           File2
Name    X1      Y1      X2      Y2                      X1      Y1      X2      Y2

My desired output for matching data:

                File1                                   File2
Name    x1      y1      x2      y2              x1      y1      x2      y2
jack    999     111     222     333             999     111     222     333
        333     444     555     777             333     444     555     777
Any help?

Replies are listed 'Best First'.
Re: how to push multiples row of values into hash and do comparison
by Discipulus (Canon) on Oct 18, 2018 at 07:27 UTC
    Hello darkmoon and welcome to the monastery and to the wonderful world of Perl!

    For the first question: you are dealing with a multiline match, in the sense that after a name is found, every following num line belongs to it. This is better achieved with something like $current_name declared outside the loop. You were right with the `push @{hash1{name1}},$x1,$y1,$x2,$y2` technique, but you missed the dollar sigils before hash1 and name1. In your actual code, the statement $hash1{$name1}=[$x1,$y1,$x2,$y2]; reinitializes the value to a new array each time.

    Notice that I used push @{$hash1{$current_name}}, $_ for $line =~ /\d+/g, which pushes every number found in a line into the array. You may want more robust control by wrapping the push in a check such as if ($line =~ /^num\s+\d/) { ... }

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my $current_name;
    my %hash1;

    while (my $line = <DATA>){
        chomp $line;
        if($line =~ /^name\s+(\S+)/) {$current_name = $1;}
        push @{$hash1{$current_name}}, $_ for $line =~ /\d+/g
    }
    print Dumper \%hash1;

    __DATA__
    name foo
    num 111 222 333 444
    name jack
    num 999 111 222 333
    num 333 444 555 777

    See the above running at webperl demo by haukex (oh oh oh!;)

    L*

    UPDATE

    For the second question you can take advantage of the exists function: for each key of the first hash, if the same key exists in the second one, print the second list as well. If you want the values in groups of four it's a bit trickier, but feasible.
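
    The exists idea could be sketched roughly as follows. This is only an illustration: the two hashes are filled in by hand with the values from the question, and the helper names rows_of_four and matching_rows are made up for this example. It assumes each row holds exactly four values.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Sample data copied from the question; in real code these hashes
    # would be filled by the parsing loop.
    my %hash1 = (
        foo  => [qw(111 222 333 444)],
        jack => [qw(999 111 222 333 333 444 555 777)],
    );
    my %hash2 = (
        jack => [qw(999 111 222 333 333 444 555 777)],
        foo  => [qw(666 222 333 444)],
    );

    # Hypothetical helper: split a flat list into rows of four values,
    # each row joined with tabs.
    sub rows_of_four {
        my @values = @{ $_[0] };
        my @rows;
        push @rows, join "\t", splice(@values, 0, 4) while @values;
        return @rows;
    }

    # Hypothetical helper: for every name present in both hashes, keep
    # the rows of the first list that also occur in the second list.
    sub matching_rows {
        my ($first, $second) = @_;
        my %result;
        for my $name (sort keys %$first) {
            next unless exists $second->{$name};
            my %seen = map { $_ => 1 } rows_of_four($second->{$name});
            my @same = grep { $seen{$_} } rows_of_four($first->{$name});
            $result{$name} = \@same if @same;
        }
        return %result;
    }

    my %match = matching_rows(\%hash1, \%hash2);

    print "Name\tX1\tY1\tX2\tY2\t\tX1\tY1\tX2\tY2\n";
    for my $name (sort keys %match) {
        my $label = $name;
        for my $row (@{ $match{$name} }) {
            print "$label\t$row\t\t$row\n";
            $label = '';    # print the name only on the first row
        }
    }
    ```

    Comparing whole rows as tab-joined strings keeps the test simple; if the order of values inside a row should not matter, the rows would have to be normalized first.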

    UPDATE October 22 in response to Re^2: how to push multiples row of values into hash and do comparison

    $_ is Perl's default variable (the default input and pattern-searching space): see perlvar

    In my code I use it only in the statement push @{$hash1{$current_name}}, $_ for $line =~ /\d+/g. Let's break the example down a bit. In list context, a regex with the g modifier returns the list of all matches (see perlrequick), so $line =~ /\d+/g returns a list.

    The first part, push @{$hash1{$current_name}}, $_, pushes onto the array the default variable of the foreach loop, that is $_. Translated into English it reads: push onto this array every match obtained by matching one or more digits in this line.
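
    To make this concrete, here is a small sketch that writes the same loop twice, once with the implicit $_ and once with a named loop variable. The sample line and the array names are made up for the illustration.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    my $line = 'num 999 111 222 333';

    # In list context, a match with /g returns every match as a list.
    my @numbers = $line =~ /\d+/g;

    # Statement-modifier form: $_ is set to each element in turn.
    my @short;
    push @short, $_ for $line =~ /\d+/g;

    # The same loop written out with a named variable instead of $_.
    my @long;
    for my $number ($line =~ /\d+/g) {
        push @long, $number;
    }

    print "@numbers\n";
    print "@short\n";
    print "@long\n";
    ```

    All three arrays end up with the same four numbers; the short form simply lets Perl supply the loop variable.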

    There are no rules, there are no thumbs..
    Reinvent the wheel, then learn The Wheel; may be one day you reinvent one of THE WHEELS.

      Hi, thanks for the response and sorry for the late reply. May I know which variable $_ refers to? I'm quite confused about it, as I see this symbol in many other places.

Re: how to push multiples row of values into hash and do comparison
by thanos1983 (Parson) on Oct 18, 2018 at 08:40 UTC

    Hello darkmoon,

    Welcome to the Monastery.

    Regarding your second question, you can use something like this:

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Data::Dumper;

    my %some_data = (
        jack => ['999', '111', '222', '333', '333', '444', '555', '777'],
        foo  => ['111', '222', '333', '444'],
    );

    my %other_data = (
        jack => ['999', '111', '222', '333', '333', '444', '555', '777'],
        foo  => ['111', '222', '333', '444'],
    );

    # print Dumper \%other_data;

    use Test::More tests => 1;
    is_deeply(\%other_data, \%some_data, 'data structures should be the same');

    __END__
    $ perl test.pl
    1..1
    ok 1 - data structures should be the same

    Simple and effective. Hope this helps, BR.

    Seeking for Perl wisdom...on the process of learning...not there...yet!
      Thank you for your response ! :)
Re: how to push multiples row of values into hash and do comparison
by BillKSmith (Monsignor) on Oct 18, 2018 at 13:46 UTC

    Your use of 'strict' is an excellent practice. However, declaring all your variables at the start of your file defeats much of the advantage. You should declare each variable in the smallest possible scope. (Strict will usually tell you if you place a declaration in too small a scope or if you try to use a variable that is out of scope.) This style makes it nearly impossible to accidentally use data left over from a previous iteration or from elsewhere in your program. When you are debugging a problem with a variable, you can be certain that the error occurs in the block where it is declared.

    Consider an example from your code. It appears that you are storing stale values of $x1, $x2, etc. in your hash. Even if it is not a problem, you must disprove it. If those variables were declared inside the loop, the question would never arise.
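
    As a rough sketch of what that could look like (not the only way to write it), the loop below keeps only the name outside the loop and declares the four values at the point where they are captured. The sub name parse_lines and the in-memory list of lines are assumptions made for the illustration; the original reads from a file.

    ```perl
    #!/usr/bin/perl
    use strict;
    use warnings;

    # Sample lines from the question's File1, kept in memory for the sketch.
    my @lines = (
        'name foo',
        'num 111 222 333 444',
        'name jack',
        'num 999 111 222 333',
        'num 333 444 555 777',
    );

    # Hypothetical helper: build a hash of name => list of all num values.
    sub parse_lines {
        my @input = @_;
        my %hash;
        my $name;    # must outlive one iteration: num lines belong to the last name seen
        for my $line (@input) {
            if ($line =~ /^name\s+(\S+)/) {
                $name = $1;
            }
            elsif (defined $name
                and $line =~ /^num\s+(\S+)\s+(\S+)\s+(\S+)\s+(\S+)/) {
                # Declared here, so nothing can be left over from an earlier line.
                my ($x1, $y1, $x2, $y2) = ($1, $2, $3, $4);
                push @{ $hash{$name} }, $x1, $y1, $x2, $y2;
            }
        }
        return %hash;
    }

    my %hash1 = parse_lines(@lines);
    print "$_: @{ $hash1{$_} }\n" for sort keys %hash1;
    ```

    Because the four values exist only inside the elsif block, a name line can never store numbers that belong to the previous record.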

    Bill

      Hi, thanks for the reply. I got an error when I declared the variables inside the while loop, so I thought I could declare them as global variables so that I can still get the output after the while loop.

        Before addressing your comment, let me clear up one point of confusion. In Perl, the term 'global variable' means the same thing as 'package variable' (declared with 'our' or 'use vars'). You did not declare any of these. By moving the declaration, you increased the scope of a lexical variable to include the entire file.
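
        A small sketch of the difference, with variable and sub names invented purely for the illustration:

        ```perl
        #!/usr/bin/perl
        use strict;
        use warnings;

        our $counter = 0;        # package variable, also reachable as $main::counter
        my  $label   = 'rows';   # lexical variable; its scope is the rest of this file

        sub bump {
            $counter++;                 # package variables are visible everywhere
            return "$counter $label";   # $label was declared earlier in the same file
        }

        my @seen;
        for my $i (1 .. 3) {
            my $text = bump();   # lexical that exists for one loop iteration only
            push @seen, $text;
        }
        # $text and $i no longer exist here; under strict, naming them
        # would be a compile-time error.

        print "$_\n" for @seen;
        print "main::counter is $main::counter\n";
        ```

        Only $counter is a global in Perl's sense. $label is a lexical whose scope happens to be the whole file, which is the situation your program is in.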

        When you see a message that a variable is not declared, it can mean any of several things.

        • You forgot to declare the variable.
        • You misspelled the variable.
        • You used the variable some place you did not intend.
        • You declared the variable in too narrow a scope.
        Removing 'strict' will suppress the message, but not fix any of the errors. Moving the declaration to the start of the file will probably also suppress the message. It will 'fix' the scope problem, but none of the others. In exchange, you have given up this protection against errors you may make in the future.

        The solution is 'smallest possible scope'. Move the declaration not to the start of the file, but to the start of the smallest block which includes every necessary use of the variable.

        Bill
Re: how to push multiples row of values into hash and do comparison
by tybalt89 (Monsignor) on Oct 18, 2018 at 20:31 UTC
    #!/usr/bin/perl

    # https://perlmonks.org/?node_id=1224208

    use strict;
    use warnings;
    use Data::Dump 'dd';

    my $file1 = <<END;
    name foo
    num 111 222 333 444
    name jack
    num 999 111 222 333
    num 333 444 555 777
    END

    my $file2 = <<END;
    name jack
    num 999 111 222 333
    num 333 444 555 777
    name foo
    num 666 222 333 444
    END

    my %hash1 = gethash($file1);
    my %hash2 = gethash($file2);
    #dd 'file1', \%hash1, 'file2', \%hash2;

    print "\t\tFile1\t\t\t\t\tFile2\n";
    print "Name\tX1\tY1\tX2\tY2\t\tX1\tY1\tX2\tY2\n";
    for my $key ( keys %hash1 ) {
        my $name = $key;
        for ( match( $hash1{$key}, $hash2{$key} ) ) {
            print "$name\t$_\t\t$_\n";
            $name = '';
        }
    }

    sub match {
        my ($one, $two) = @_;
        my @same;
        for my $item (@$one) {
            push @same, grep $item eq $_, @$two;
        }
        return @same;
    }

    sub gethash {
        local $_ = shift;
        s/^\s+//gm;
        s/[ \t]+/\t/g;
        my %hash;
        while( /^name\s+(\S+)\n(num.*\n)*/gm ) {
            my $name = $1;
            $hash{$name} = [ $& =~ /^num\s*(.*)/gm ];
        }
        return %hash;
    }

    Outputs :

                    File1                                   File2
    Name    X1      Y1      X2      Y2              X1      Y1      X2      Y2
    jack    999     111     222     333             999     111     222     333
            333     444     555     777             333     444     555     777

      Thank you for your response. I tried your code and it is awesome! However, I cannot get any output after modifying your code. I want to take the input files from @ARGV.

      #!/usr/bin/perl
      use strict;
      use warnings;
      use Data::Dump 'dd';

      my $input1=$ARGV[0];
      my $input2=$ARGV[1];

      open my $fh1, '<', $input1 or die "Cannot open file : $!\n";
      my %hash1 = gethash($fh1);
      close $fh1;

      open my $fh2,'<',$input2 or die "Cannot open file : $!\n";
      my %hash2 = gethash($fh2);
      close $fh2;

      dd 'file1', \%hash1, 'file2', \%hash2;

      print "\t\tFile1\t\t\t\t\tFile2\n";
      print "Name\tX1\tY1\tX2\tY2\t\tX1\tY1\tX2\tY2\n";
      for my $key ( keys %hash1 ) {
          my $name = $key;
          for ( match( $hash1{$key}, $hash2{$key} ) ) {
              print "$name\t$_\t\t$_\n";
              $name = '';
          }
      }

      sub match {
          my ($one, $two) = @_;
          my @same;
          for my $item (@$one) {
              push @same, grep $item eq $_, @$two;
          }
          return @same;
      }

      sub gethash {
          local $_ = shift;
          s/^\s+//gm;
          s/[\t]+/\t/g;
          my %hash;
          while(/^name\s+(\S+)\n(num.*\n)*/gm ) {
              my $name = $1;
              $hash{$name} = [$& =~ /^num\s*(.*)/gm ];
          }
          return %hash;
      }

      Any idea ?

        It would help to actually read the file data (untested change)

        sub gethash {
            my $fh = shift;
            local $/;            # slurp mode: read the whole file in one go
            local $_ = <$fh>;
            s/^\s+//gm;
            s/[ \t]+/\t/g;       # runs of spaces or tabs become a single tab
            my %hash;
            while(/^name\s+(\S+)\n(num.*\n)*/gm ) {
                my $name = $1;
                $hash{$name} = [$& =~ /^num\s*(.*)/gm ];
            }
            return %hash;
        }
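
        The key step is reading the content behind the filehandle into a string before any regex runs. A minimal sketch of just that step follows, using an in-memory filehandle so it needs no files on disk. The helper name slurp is made up for the illustration.

        ```perl
        #!/usr/bin/perl
        use strict;
        use warnings;

        # Sample data from the question, held in a string for the sketch.
        my $text = join "\n",
            'name foo',
            'num 111 222 333 444',
            'name jack',
            'num 999 111 222 333',
            'num 333 444 555 777',
            '';

        # Hypothetical helper: return everything readable from a filehandle.
        sub slurp {
            my ($fh) = @_;
            local $/;              # no record separator: one read returns it all
            return scalar <$fh>;
        }

        # Opening a reference to a scalar gives an in-memory filehandle.
        open my $fh, '<', \$text or die "Cannot open in-memory file: $!\n";
        my $content = slurp($fh);
        close $fh;

        my @names = $content =~ /^name\s+(\S+)/gm;
        print "names found: @names\n";
        ```

        Passing the filehandle itself to a sub that expects text, as in the modified script, makes the regexes run against the stringified handle (something like GLOB(0x...)), so nothing matches.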
Re: how to push multiples row of values into hash and do comparison (SQL DBD::CSV)
by Anonymous Monk on Oct 18, 2018 at 09:15 UTC
    Use DBD::CSV to treat the files like an SQL database.