Re: Comparing / Searching through Hashes

Hello to you too, Athanasius! :)

I'm afraid that the "goed braaf" duplicate is a mistake that slipped in by means of copying and pasting... I have already corrected it.

I have to thank you though, you're giving me such helpful advice - in a language I can understand! I hope you will continue doing so, if you don't mind.. :)

aaron_baugher: The first columns definitely have to match and the second column has to match partly, yes. The reason is that they are false friends of one another.

In the first column we have the Afrikaans word 'irriterend'. In Dutch, the word for 'irriterend' can be either 'irritant' or 'vervelend'.

Like in English and German: 'Chef' in English means 'someone who cooks for a living', but a 'Chef' in German is the 'director or boss of a company'.. Does that make better sense now? I don't really know how else to explain it, I'm afraid..

Thanks for all the advice, though! You're really helping me to make progress! :)

And I would also like to apologise to anyone that I might have annoyed by posting a new question.. I'm still new at this whole posting thing. Hopefully it will improve, just like my coding hopefully will!


Replies are listed 'Best First'.


Re^2: Comparing / Searching through Hashes
by aaron_baugher (Curate) on Jun 28, 2012 at 21:15 UTC

Thanks for the clarification. In that case, I think you're on the right track: put fileB in a hash, then go through fileA checking each key from fileA for existence as a key in fileB. That's the standard idiom for this kind of thing, but in your case there will be the extra step that once you find a match on the keys from the first column, you'll also need to check for a match on the second column. That might look something like the code below (untested). The tricky part may be that inner if comparison. In mine, I'm just testing to see if either value is found as a substring in the other. If you need something more sophisticated, you'll have to adjust that there.

# %b is a hash already containing the values from fileB, with the
# first column as keys and the second column as values.
# $file_of_matches is a filehandle opened for writing to one output file
# $file_of_misses is a filehandle opened for writing to the other output file
open my $fileA, '<', 'fileA' or die "Cannot open fileA: $!";
while( my $line = <$fileA> ){                # get a line from fileA
  chomp $line;
  my( $k, $v ) = split /\t/, $line;          # split the line on tab
  if( defined $v and exists $b{$k} ){        # do first columns match?
    # does one second column value contain the other as a substring?
    # \Q...\E quotes regex metacharacters so the values match literally
    if( $b{$k} =~ /\Q$v\E/ or $v =~ /\Q$b{$k}\E/ ){
      print $file_of_matches "$line\n";      # yes, so print it to the match file
      next;                                  # and loop to the next line
    }
  }
  print $file_of_misses "$line\n";           # no, so print it to the non-match file
}
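
The loop over fileA assumes that %b has already been filled from fileB. A minimal sketch of that step might look like the following. The helper name load_pairs, the in-memory filehandle, and the sample line are all invented for illustration; the real fileB would be opened by name, as fileA is.

```perl
use strict;
use warnings;

# Hypothetical helper: read tab-separated "key<TAB>value" lines from a
# filehandle into a hash reference (first column => second column).
sub load_pairs {
    my ($fh) = @_;
    my %pairs;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $k, $v ) = split /\t/, $line, 2;   # at most two fields
        next unless defined $v;                 # skip lines without a tab
        $pairs{$k} = $v;                        # a repeated key keeps the last value
    }
    return \%pairs;
}

# Invented sample data standing in for fileB.
my $fileB_text = "irriterend\tirritant OR vervelend\n";
open my $fileB, '<', \$fileB_text
    or die "Cannot open in-memory fileB: $!";
my $b_pairs = load_pairs($fileB);
close $fileB;

print "$_ => $b_pairs->{$_}\n" for sort keys %$b_pairs;
```

This version returns a hash reference, so a lookup reads $b_pairs->{$k} rather than $b{$k}; copying it into a plain hash with %$b_pairs would give the %b used in the loop.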

By the way, note that this:

while( my $line = <$fileA> ){
  # do stuff with $line
}

# replaces this:

while( <$fileA> ){
  my $line = $_;
  # do stuff with $line
}

It saves a line and avoids potential bugs that may be caused by using $_ sort of halfway, that is, mixing $_ and $line in the same loop.
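
One way such a bug can arise is sketched below; it is only an illustration and not necessarily the case I had in mind when writing the note. The helper count_lines and both reader subs are hypothetical. The point is that while( <$fh> ) assigns to the global $_, so any code called inside the loop that does the same leaves $_ undefined for the caller.

```perl
use strict;
use warnings;

# Hypothetical helper that reads with the implicit $_ and therefore
# overwrites the global $_ as a side effect.
sub count_lines {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory text: $!";
    my $n = 0;
    while ( <$fh> ) { $n++ }    # the final read leaves $_ undefined
    return $n;
}

# Halfway use of $_: read into $_, call the helper, then use $_ again.
sub read_halfway {
    my ($text) = @_;
    open my $in, '<', \$text or die "Cannot open in-memory text: $!";
    my @seen;
    while ( <$in> ) {
        chomp;
        count_lines("x\ny\n");
        push @seen, $_;         # $_ was clobbered by count_lines
    }
    return @seen;
}

# Named lexical: the helper cannot touch $line.
sub read_named {
    my ($text) = @_;
    open my $in, '<', \$text or die "Cannot open in-memory text: $!";
    my @seen;
    while ( my $line = <$in> ) {
        chomp $line;
        count_lines("x\ny\n");
        push @seen, $line;
    }
    return @seen;
}

my @halfway = read_halfway("alpha\nbeta\n");
my @named   = read_named("alpha\nbeta\n");
my $lost    = grep { !defined $_ } @halfway;
print "named kept: @named\n";
print "halfway lost $lost of ", scalar(@halfway), " lines\n";
```

With the named variable both lines survive, while the $_ version ends up storing undefined values.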

Aaron B.
Available for small or large Perl jobs; see my home node.
