SayWhat?! has asked for the wisdom of the Perl Monks concerning the following question:

Hello wise ones!

I'm currently trying to compare my hash's keys and values by means of a regex. Here is a sample of my input (a tab-separated, two-column text file):

vriendelik	aardig
polisieman	agent
net-net	amper
gedierte	beest
naak	bloot
homoseksueel	flikker
bronstig	geil
menskop	hoofd
toedraai	inpakken
dierekop	kop
onskuldig	onnozel
perdeteler	paardenfokker
dennebol	pijnappel

as well as my code thus far:

#!/usr/bin/perl-w
use strict;
use warnings;
use open ':utf8';
use autodie;

open NONMATCHINPUT, "<OutputNonMatchedWords.txt";
open OIC, ">OutputIdenticalCognates.txt";
open ONIC, ">OutputNonIdenticalCognates.txt";
open ONC, ">OutputNonCognates.txt";

my %nonmatchhash;

while (my $line = <NONMATCHINPUT>) {
    chomp $line;
    #split the line on tab
    my ($nonmatchhashkeys, $nonmatchhashvalues) = split /\t/, $line;
    $nonmatchhash{$nonmatchhashkeys} = $nonmatchhashvalues;

    #if the values of the hash are exactly the same as the keys of the hash
    if ($nonmatchhash{$nonmatchhashkeys} = $nonmatchhash{$nonmatchhashvalues}) {
        #print both key and value to OutputIdenticalCognates.txt, separated by a tab
        print OIC "$nonmatchhashkeys\t$nonmatchhashvalues\n";
    }

    #assign each key in the hash to $AfrColumn1token
    foreach my $AfrColumn1token (keys %nonmatchhash) {
        #if the Afrikaans word ($AfrColumn1token) contains: anything, followed by 'agtig', followed by 'e' or 'er' or 'ste' (optional), at the end of the string
        if ($AfrColumn1token =~ /(.*)(agtig)(e|er|ste)?$/) {
            #then, by using a foreach, assign each value in the hash to $DutColumn2token
            foreach my $DutColumn2token (values %nonmatchhash) {
                #And then, if the Dutch word ($DutColumn2token) contains: anything, followed by 'achtig', followed by 'e' or 'er' or 'ste' (optional), at the end of the string
                if ($DutColumn2token =~ /(.*)(achtig)(e|er|ste)?$/) {
                    #print it to OutputNonIdenticalCognates.txt
                    print ONIC "$AfrColumn1token\t$DutColumn2token\n";
                }
            }
        }
        else {
            #else, print it to OutputNonCognates.txt
            print ONC "$AfrColumn1token\t$DutColumn2token\n";
        }
    }
}

I want to check whether the hash's key matches the pattern I wrote in the regex. If it does, I want to do a similar check on the hash's values, again with a regex.

To explain:

if $AfrColumn1token consists of: anything, followed by 'agtig', followed by 'e' or 'er' or 'ste' (optional), at the end of the string,
#then check to see if $DutColumn2token consists of: anything, followed by 'achtig', followed by 'e' or 'er' or 'ste' (optional), at the end of the string.
#then that particular key and value must be written to OutputNonIdenticalCognates.txt, else write the pair to OutputNonCognates.txt.
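Before wiring the rule into nested loops, the Afrikaans pattern can be checked on its own. This sketch runs the question's pattern, unchanged, over a few illustrative word forms (not taken from the sample input) and prints what each capture group holds; $3 is undef when the optional ending is absent.

```perl
use strict;
use warnings;

# The Afrikaans pattern from the question, unchanged:
# $1 = stem, $2 = 'agtig', $3 = optional inflection (e, er or ste).
my $afr_rule = qr/(.*)(agtig)(e|er|ste)?$/;

# Illustrative word forms, not from the input file.
for my $word (qw(kinderagtig kinderagtige kinderagtigste vriendelik)) {
    if ( $word =~ $afr_rule ) {
        my $ending = defined $3 ? $3 : '(none)';
        print "$word: stem=$1 suffix=$2 ending=$ending\n";
    }
    else {
        print "$word: no match\n";
    }
}
```

The same check with the 'achtig' pattern on the Dutch side tells you whether the regexes themselves are the problem or the loop around them.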

I then want to repeat the complete foreach loop eleven (11) times, because I have 11 different rules I need to implement in my program.
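Rather than copying the whole loop eleven times, one option is to keep the rules in a list and loop over that list for each key/value pair. A minimal sketch under assumptions: only the agtig/achtig rule comes from this post; the lik/lijk rule is a hypothetical placeholder, and the sub name matches_any_rule is made up for illustration. The remaining rules would be added to @rules the same way.

```perl
use strict;
use warnings;

# Each rule pairs an Afrikaans pattern with the Dutch pattern it maps to.
# The patterns are compiled once with qr//.
my @rules = (
    # From the question: -agtig(e|er|ste) <-> -achtig(e|er|ste)
    [ qr/agtig(?:e|er|ste)?$/, qr/achtig(?:e|er|ste)?$/ ],
    # Hypothetical placeholder rule: -lik <-> -lijk
    [ qr/lik$/, qr/lijk$/ ],
);

# Hypothetical helper: 1 if any rule matches the Afrikaans word
# AND its own Dutch translation, else 0.
sub matches_any_rule {
    my ( $afr, $dut ) = @_;
    for my $rule (@rules) {
        my ( $afr_re, $dut_re ) = @$rule;
        return 1 if $afr =~ $afr_re && $dut =~ $dut_re;
    }
    return 0;
}

# Usage with illustrative pairs.
for my $pair ( [ 'vriendelik', 'vriendelijk' ], [ 'vriendelik', 'aardig' ] ) {
    my ( $afr, $dut ) = @$pair;
    print "$afr\t$dut\t", ( matches_any_rule( $afr, $dut ) ? 'cognate' : 'non-cognate' ), "\n";
}
```

Adding a rule then means adding one line to @rules instead of another copy of the loop.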

What I would like to know is: is that nested foreach allowed in Perl? If so, what is wrong with it, given that the output files OutputNonIdenticalCognates.txt and OutputNonCognates.txt are empty? If it's not allowed, how can I change it so that it does what I'd like it to do?

Thank you in advance! :)

Re: Comparing hash keys and values with Regular Expressions
by CountZero (Bishop) on Jun 30, 2012 at 17:41 UTC
    This looks strange:
    if ($nonmatchhash{$nonmatchhashkeys} = $nonmatchhash{$nonmatchhashvalues})
    You are using the "=" assignment operator in a test. If you want to test whether key and value are equal, you should use "eq".
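    A small sketch of the difference, using a hypothetical one-pair hash: with '=' the if stores the right-hand value and then tests that stored value, so the branch runs whenever the value is true (any non-empty string other than "0"), whether or not anything was equal. 'eq' only compares and changes nothing.

```perl
use strict;
use warnings;

my %nonmatchhash = ( naak => 'bloot' );    # hypothetical one-pair hash
my $key   = 'naak';
my $other = 'kop';

# '=' copies $other into $nonmatchhash{naak}; the if then tests 'kop',
# which is a true value, so the branch runs even though nothing was compared.
if ( $nonmatchhash{$key} = $other ) {
    print "assignment branch taken; value is now '$nonmatchhash{$key}'\n";
}

# 'eq' compares the two strings and leaves the hash alone.
if ( $key eq $nonmatchhash{$key} ) {
    print "equal\n";
}
else {
    print "'$key' ne '$nonmatchhash{$key}'\n";
}
```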

    By the way: the Dutch word for "onskuldig" is "onnozel" (with a "z" rather than a "s").

    CountZero

    "A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little or too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

    My blog: Imperial Deltronics

      Hi there! I tested

      if ($nonmatchhash{$nonmatchhashkeys} = $nonmatchhash{$nonmatchhashvalues})

      with both '=' and 'eq', and the output was the same both times: correct, which is exactly what I wanted it to be. But the thing is, my problem actually lies in the 'foreach'. Thanks anyway, and thanks for the correction of 'onnozel'. :)

        Yet, you are doing totally different things:
        • = assignment operator
        • == numeric test for equality
        • eq string test for equality
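        A short sketch of the three operators side by side, with hypothetical values. '==' converts both sides to numbers, so '10' and '10.0' are equal; 'eq' compares characters, so they are not; '=' does no comparison at all. With words such as 'naak', '==' would also warn that the argument isn't numeric.

```perl
use strict;
use warnings;

my ( $x, $y ) = ( '10', '10.0' );

# Numeric comparison: both strings convert to the number 10.
my $num_equal = ( $x == $y ) ? 1 : 0;

# String comparison: the characters differ.
my $str_equal = ( $x eq $y ) ? 1 : 0;

# Assignment: copies $y into $x and yields the new value of $x.
my $assigned = ( $x = $y );

print "== : $num_equal\neq : $str_equal\n=  : $assigned (x is now $x)\n";
```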

        CountZero

        1. "with both '=' and 'eq', and the output was the same both times: correct" is NOT a comprehensive test of the truth or falsity of the advice you were given... which is itself correct, while your usage is NOT.
        2. You initially addressed your question to "wise ones." It would be consistent to carefully consider their answers, rather than applying your logically inadequate testing and then knocking out an answer that denies their wisdom.

        with both '=' and 'eq', and the output was the same both times: correct, which is exactly what I wanted it to be.

        That is what they all say

        my %foo;
        my %bar = ( 1, 2 );

        if( $foo{1} = $bar{1} ){
            warn "assignment is assignment";
        }

        if( $foo{1} = 'any true value' ){
            warn "assign any true value, expression is true";
        }
        warn "foo eq bar ", int ( $foo{1} eq $bar{1} );
        __END__
        assignment is assignment at - line 5.
        assign any true value, expression is true at - line 9.
        foo eq bar 0 at - line 11.

        Just because the output is the same doesn't mean much; even a broken clock is right twice a day.

        Think of it a different way: you have a problem you can't solve and you're asking for help in solving it. Maybe, just maybe, those you're asking for help know something you don't know.

        CountZero

        OK, I admit I made a mistake when I said the output was the same. Sorry 'bout that, but that's how we learn, right? After testing that piece of code again with '=', I got the output I desired. But when I tested it with 'eq', I got the following message, as well as empty output:

        Useless use of string eq in void context

        I also got this error message for every single line of my code:

        Use of uninitialized value in string eq

        So what's that all about then? :s And how can I correct it by using 'eq' instead of '=' then?
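        Both messages can be reproduced in a few lines. This is a hedged reading, since the 'eq' version of the code wasn't posted. "Useless use of string eq in void context" means an 'eq' whose result nobody uses. That is what happens if the '=' on the line that stores the pair ($nonmatchhash{$nonmatchhashkeys} = $nonmatchhashvalues;) was also changed to 'eq', which would leave the hash empty and could explain the empty output. Separately, $nonmatchhash{$nonmatchhashvalues} looks up the Dutch word as a key, which usually doesn't exist and is therefore undef, hence "Use of uninitialized value in string eq". The sketch keeps '=' for storing and compares the two words from 'split' directly; the input lines are hypothetical.

```perl
use strict;
use warnings;

my %nonmatchhash;

for my $line ( "naak\tbloot", "kop\tkop" ) {    # hypothetical input lines
    my ( $afr, $dut ) = split /\t/, $line;

    # Keep '=' here: this line stores the pair in the hash.
    # Writing 'eq' instead would compare and discard the result
    # ("Useless use of string eq in void context") and leave the hash empty.
    $nonmatchhash{$afr} = $dut;

    # Compare the two words themselves. $nonmatchhash{$dut} would look up the
    # Dutch word as a key, which is usually undef ("uninitialized value").
    if ( $afr eq $dut ) {
        print "identical: $afr\t$dut\n";
    }
}
```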

Re: Comparing hash keys and values with Regular Expressions
by zentara (Cardinal) on Jun 30, 2012 at 16:11 UTC
    Look at dialup spam removal with Net::POP3 and see what are called precompiled regexes built from an array of strings. Aristotle shows improved code in his response. The idea is to build the precompiled regexes from your strings once, then loop through your data only once, checking every line against them.
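    A sketch of that idea, assuming the strings are word suffixes like the question's 'agtig' (the second suffix, 'lik', is a hypothetical example). Each string is turned into a regex with qr// once, before the data loop, and \Q...\E / quotemeta keep any regex metacharacters in the strings literal. It shows both a list of separate compiled patterns and a single combined alternation; the word list is illustrative.

```perl
use strict;
use warnings;

# Hypothetical suffix strings; only 'agtig' comes from the question.
my @suffixes = qw(agtig lik);

# Compile one pattern per string, once, before reading any data.
my @patterns = map { qr/\Q$_\E(?:e|er|ste)?$/ } @suffixes;

# Or combine all strings into one alternation, also compiled once.
my $alternation = join '|', map { quotemeta($_) } @suffixes;
my $any_suffix  = qr/(?:$alternation)(?:e|er|ste)?$/;

# Single pass over the data, testing each word against the compiled patterns.
for my $word (qw(kinderagtige vriendelik naak)) {
    my @hits = grep { $word =~ $patterns[$_] } 0 .. $#patterns;
    printf "%-13s any=%d hits=%s\n", $word,
        ( $word =~ $any_suffix ? 1 : 0 ),
        join( ',', map { $suffixes[$_] } @hits ) || '-';
}
```

    The separate patterns tell you which string matched; the combined alternation is a single test when you only need to know whether any of them did.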

    I'm not really a human, but I play one on earth.
    Old Perl Programmer Haiku ................... flash japh
Re: Comparing hash keys and values with Regular Expressions
by Kenosis (Priest) on Jun 30, 2012 at 22:49 UTC

    You've received some excellent advice about your code. At the risk of confusing the issue (and I apologize in advance if it does), consider the following:

    use Modern::Perl;
    use open ':utf8';
    use autodie;

    my (%nonmatchhash, @hashvalues);

    open my $NONMATCHINPUT, '<', 'OutputNonMatchedWords.txt';
    do {
        /(.*)\t(.*)/;
        $nonmatchhash{$1} = $2;
        push @hashvalues, $2;
    } for <$NONMATCHINPUT>;
    close $NONMATCHINPUT;

    while ( my ( $key, $value ) = each %nonmatchhash ) {
        given ($key) {
            when (/\A$value\z/) {
                say "key '$key' eq value '$value'";
            }
            when (/(.*)(agtig)(e|er|ste)?$/) {
                say "$key\t$_" for grep /(.*)(achtig)(e|er|ste)?$/, @hashvalues;
            }
            default {
                say "No match for key '$key'";
            }
        }
    }

    From what I could gather from your code, it looks like you're taking an action after testing for three conditions, viz., 1) key/value equality, 2) matching a key, then a value under that key, and 3) no match. The script above handles these three cases, and may assist your coding decisions.
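    The script above uses given/when, which Perl has marked experimental since 5.18, so here is a sketch of the same three cases with plain if/elsif/else. Two deliberate differences, both assumptions about what the original poster wants: each key is tested only against its own value (rather than against every value via grep over @hashvalues), and a key that matches 'agtig' without a matching 'achtig' value falls through to the no-match case, as the question describes. The sub name classify and the sample pairs are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: returns which of the three cases a pair falls into.
sub classify {
    my ( $key, $value ) = @_;
    if ( $key eq $value ) {
        return 'identical';
    }
    elsif ( $key =~ /agtig(?:e|er|ste)?$/ && $value =~ /achtig(?:e|er|ste)?$/ ) {
        return 'non-identical cognate';
    }
    else {
        return 'non-cognate';
    }
}

my %nonmatchhash = (    # hypothetical pairs
    kop         => 'kop',
    kinderagtig => 'kinderachtig',
    naak        => 'bloot',
);

for my $key ( sort keys %nonmatchhash ) {
    printf "%s\t%s\t%s\n", $key, $nonmatchhash{$key},
        classify( $key, $nonmatchhash{$key} );
}
```

    In the original program, each return value would select one of the three output filehandles (OIC, ONIC or ONC) instead of being printed.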

    Hope this helps!