iterating hash keys?

R56 has asked for the wisdom of the Perl Monks concerning the following question:

Replies are listed 'Best First'.
Re: using hashes by BrowserUk (Patriarch) on Sep 26, 2013 at 14:56 UTC
"and iterate through all the hash keys"

Don't ever iterate hash keys! (Well, hardly ever :) The major purpose of hashes is that you can look up the value associated with any key directly, avoiding iteration. For your purpose, the major part of the code should be something like:

    while( <$names_to_be_replaced_file> ) {     ## read each line
        s[\b([a-z]+)\b][ $name_id{ $1 } ]ge;    ## find words, look them up and replace them with the id
        print;                                  ## send the modified lines to stdout
    }

Simple and very efficient.

With the rise and rise of 'Social' network sites: 'Computers are making people easier to use everyday'
Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
"Science is about questioning the status quo. Questioning authority".
In the absence of evidence, opinion is indistinguishable from prejudice.
Re^2: using hashes by R56 (Sexton) on Sep 26, 2013 at 16:51 UTC
Thanks for the help Browser, but apparently I'm way behind in Perl knowledge yet, as I don't really get the code... That's how I know I'm overcomplicating something that is really simple :| Is the `$1` var pointing to the value of the hash?
Re^3: using hashes by BrowserUk (Patriarch) on Sep 26, 2013 at 17:09 UTC
"Is the $1 var pointing to the value of the hash?"

`$1` captures the words in the string one at a time. This `$hash{ $1 }` looks that word up in the hash and returns the associated value (id). The `ge` causes the ids to be substituted for every word in the line. Perhaps this will clarify things?

    %hash = ( brown=>1, fox=>2, quick=>3, the=>4 );;
    $line = 'the quick brown fox';;
    $line =~ s[\b([a-z]+)\b][ $hash{ $1 } ]ge;;
    print $line;;
    4 3 1 2
Re^4: using hashes by R56 (Sexton) on Sep 26, 2013 at 18:21 UTC
Re^3: using hashes by hdb (Monsignor) on Sep 26, 2013 at 17:30 UTC
In order to make things even more complicated, I recommend replacing `$hash{ $1 }` with `$hash{ $1 } // $1`, which means that if `$1` is not found in your hash, the word is replaced with itself, i.e. left unchanged.
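For anyone following along, here is a minimal sketch of that suggestion. The hash contents and the sample line are made up for illustration, and the `//` (defined-or) operator assumes Perl 5.10 or later.

```perl
use strict;
use warnings;

# Hypothetical lookup table and input line, for illustration only.
my %hash = ( brown => 1, fox => 2, quick => 3, the => 4 );
my $line = 'the quick red fox';

# `//` (defined-or) falls back to the original word whenever
# the hash has no defined value for it.
$line =~ s[\b([a-z]+)\b][ $hash{ $1 } // $1 ]ge;

print "$line\n";    # prints "4 3 red 2"
```

Without the `// $1` part, `red` would be replaced by an empty string and `use warnings` would complain about an uninitialized value.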
Re^3: using hashes by aaron_baugher (Curate) on Sep 26, 2013 at 17:16 UTC
`s[\b([a-z]+)\b][ $name_id{ $1 } ]ge;`

The 's' at the beginning says to find a pattern and replace it. The 'g' at the end says to repeat this process as many times as possible. The 'e' at the end says that the replacement part should be evaluated as code, not treated as literal text.

In the first part, the pattern, the `\b` matches a "word boundary," the boundary between word characters and non-word characters like your commas. `[a-z]+` means a string of 1 or more consecutive lowercase letters. The parentheses around that capture whatever is matched within them and save it in the special variable `$1`.

In the replacement part, `$1` contains the matched word, so this becomes a simple lookup for that word as a key in the `%name_id` hash, replacing it with the value corresponding to that key. As mentioned before, because of the 'g', this entire process is repeated for each match found in the line.

Aaron B.
Available for small or large Perl jobs; see my home node.
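To see what the 'g' contributes, here is a small sketch that runs the same substitution with and without it. The hash contents and variable names are hypothetical, chosen only for the demonstration.

```perl
use strict;
use warnings;

# Hypothetical data, for illustration only.
my %name_id = ( kiwis => 3726, oranges => 23 );

my $first_only = 'kiwis,oranges';
my $every_word = 'kiwis,oranges';

# Without /g the substitution stops after the first match.
$first_only =~ s[\b([a-z]+)\b][ $name_id{ $1 } ]e;

# With /g it is repeated for every word in the line.
$every_word =~ s[\b([a-z]+)\b][ $name_id{ $1 } ]ge;

print "$first_only\n";    # prints "3726,oranges"
print "$every_word\n";    # prints "3726,23"
```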
Re: using hashes by kennethk (Abbot) on Sep 26, 2013 at 15:06 UTC
First, what mtmcc said. Second, a quote from the illustrious prophet:

"Doing linear scans over an associative array is like trying to club someone to death with a loaded Uzi." -- TimToady

You should put your keys into a hash, yes, but then just iterate over your array. The array values are exactly what you need to access the hash values. So it might look like:

    my %id = (bananas => 456,
              oranges => 23,
              peaches => 897236,
              kiwis   => 3726,
              );
    my @replaces = ('kiwis','oranges','bananas','bananas');
    for my $i (0 .. $#replaces) {
        $replaces[$i] = $id{$replaces[$i]};
    }

If I were going to actually write this, I'd take advantage of the fact that the loop iterator for Foreach Loops is an lvalue for the array element (`$_ = $id{$_} for @replaces;`), but that might be a little too magical for your taste given your familiarity with the language.

#11929 First ask yourself `How would I do this without a computer?' Then have the computer do it the same way.
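The "magical" one-liner mentioned above can be tried on the same data. This is only a sketch of the aliasing behaviour, reusing the `%id` and `@replaces` names from the post.

```perl
use strict;
use warnings;

my %id = ( bananas => 456, oranges => 23, peaches => 897236, kiwis => 3726 );
my @replaces = ( 'kiwis', 'oranges', 'bananas', 'bananas' );

# Inside the loop, $_ is an alias for the current element,
# so assigning to it changes @replaces in place.
$_ = $id{$_} for @replaces;

print join( ',', @replaces ), "\n";    # prints "3726,23,456,456"
```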
Re^2: using hashes by R56 (Sexton) on Sep 26, 2013 at 15:53 UTC
Thanks Kenneth, I understand your code, but I may have more than one name in the same line, such as:

    bananas,peaches,kiwis
    peaches,peaches
    pineapple
    (...)

So we couldn't use that kind of cycling on the array positions, right? Or am I missing something?
Re^3: using hashes by AnomalousMonk (Archbishop) on Sep 26, 2013 at 16:17 UTC
"... am I missing something?"

You're missing what BrowserUk said here, an approach that processes an entire line at a time. The next part to think about is what happens if you encounter a 'word' in a line that doesn't exist in your translation hash, e.g., the line `"peaches,peaches,foobar,kiwis\n"` (hints: `exists`, `next`, maybe `//` (defined-or) or `?:` (ternary/conditional operator) – see perlop for the latter two).
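One way to follow those hints, sketched with `exists` and the ternary operator. The hash contents are hypothetical, and `foobar` is deliberately left out of the translation hash.

```perl
use strict;
use warnings;

# Hypothetical translation hash; 'foobar' is deliberately missing.
my %name_id = ( peaches => 897236, kiwis => 3726 );
my $line = "peaches,peaches,foobar,kiwis\n";

# Replace a word only if it exists as a key; otherwise keep it as it is.
$line =~ s[\b([a-z]+)\b][ exists $name_id{$1} ? $name_id{$1} : $1 ]ge;

print $line;    # prints "897236,897236,foobar,3726"
```

Unlike `//`, the `exists` test also works on Perl versions older than 5.10, and it distinguishes a missing key from a key whose value happens to be undefined.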
Re: iterating hash keys? by kcott (Archbishop) on Sep 26, 2013 at 18:59 UTC
G'day R56,

Welcome to the monastery.

Firstly, a word about your data. The term list has a special meaning in Perl: see "perldata: List value constructors". I've taken what you've described as lists to be records in files. Given you wrote "... the 'names to be replaced' file ...", that seems correct for the second list; although, until I had read that far, I initially thought you might have been talking about a list of lists (which is something different; see perllol).

Anyway, this means you (probably) have a CSV (comma-separated values) file which is best read using a module like Text::CSV. The reason for this is that there are all sorts of gotchas with CSV files which have already been coded for in these modules. As an example, consider two records: "`apples, red,cherries`" and "`apples, red cherries`". If you had an ID for "`apples, red`", how would you handle the replacement in those two records?

So, I'd suggest you check whether your data really is as simple as the examples you've posted; and consider the chances of it staying that way in the future. You may need to revisit whatever solution you choose based on those findings. The solution I provide below assumes nothing more complex than what you currently show.

Here's my take on a solution. I create a hash mapping names to IDs (same as you). Next, I use the keys of that hash to create a regex with an alternation (e.g. `bananas|oranges|...`) such that only the names with IDs will be matched. Finally, the replacements are made and the new data is output.

    #!/usr/bin/env perl

    use strict;
    use warnings;
    use autodie;

    my $in_file_name_id = 'pm_1055846_name_id_data.txt';
    my $in_file_name_replace = 'pm_1055846_name_replace_data.txt';
    my $out_file_name_replaced = 'pm_1055846_name_replaced_out.txt';

    # Each "name id" record becomes one key/value pair.
    open my $in_id_fh, '<', $in_file_name_id;
    my %id_for = map { split } <$in_id_fh>;
    close $in_id_fh;

    # Alternation of all known names, anchored on word boundaries.
    my $re = '\b(' . join('|', keys %id_for) . ')\b';

    open my $in_replace_fh, '<', $in_file_name_replace;
    open my $out_replaced_fh, '>', $out_file_name_replaced;

    while (<$in_replace_fh>) {
        s/$re/$id_for{$1}/g;
        print $out_replaced_fh $_;
    }

Here are the files. Notice I added "`pineapples`", which didn't have an ID, and so wasn't replaced.

    $ cat pm_1055846_name_id_data.txt
    bananas 456
    oranges 23
    peaches 897236
    kiwis 3726

    $ cat pm_1055846_name_replace_data.txt
    bananas,oranges
    peaches,peaches,peaches
    kiwis
    oranges
    kiwis,oranges,bananas,bananas
    bananas,oranges,pineapples,peaches,kiwis

    $ cat pm_1055846_name_replaced_out.txt
    456,23
    897236,897236,897236
    3726
    23
    3726,23,456,456
    456,23,pineapples,897236,3726

-- Ken
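The alternation step can be tried without any files. The sketch below hard-codes the sample data; the `sort` is only there to make the printed pattern reproducible, and `quotemeta` is an extra precaution that is not part of the code above, in case a name ever contains a regex metacharacter.

```perl
use strict;
use warnings;

# Sample data from the post, hard-coded instead of read from a file.
my %id_for = ( bananas => 456, oranges => 23, peaches => 897236, kiwis => 3726 );

# Build the pattern: word boundary, any known name, word boundary.
my $re = '\b(' . join( '|', map { quotemeta($_) } sort keys %id_for ) . ')\b';
print "$re\n";      # prints "\b(bananas|kiwis|oranges|peaches)\b"

my $line = 'bananas,oranges,pineapples,peaches,kiwis';
$line =~ s/$re/$id_for{$1}/g;
print "$line\n";    # prints "456,23,pineapples,897236,3726"
```

Because only known names appear in the pattern, `pineapples` never matches and there is no missing-key case to handle.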
Re^2: iterating hash keys? by R56 (Sexton) on Sep 27, 2013 at 12:11 UTC
Hey Ken, good to be here :) Thank you for the patience to write all that. I don't know yet if the data will be this simple at all times, but it's always better to cover all the options if it doesn't sacrifice speed. Will definitely try out your code to see if I can improve this!
Re^2: iterating hash keys? by R56 (Sexton) on Sep 27, 2013 at 14:09 UTC
Well, comparing to what I had, your code is faster than the speed of light! Is there a simple way for the s// to also include names with hyphens in the middle?
Re^3: iterating hash keys? by kcott (Archbishop) on Sep 28, 2013 at 06:29 UTC
"Well, comparing to what I had, your code is faster than the speed of light!" That's a good start. :-) "Is there a simple way for the s// to also include names with hyphens in the middle?" The short answer is: yes. The longer answer depends on details. I found a reference you made to input data with hyphens in "Re^8: using hashes"; however, you provided no indication of the output you wanted (except that `20-10,25` was the wrong output when `bana-na,banana` was the input). The following is based on the code I provided earlier. Given these input files: `$ cat pm_1055846_name_id_data.txt bananas 456 oranges 23 peaches 897236 kiwis 3726 banana 25 bana 20 bana-na 15 na 10` [download] `$ cat pm_1055846_name_replace_data.txt bananas,oranges peaches,peaches,peaches kiwis oranges kiwis,oranges,bananas,bananas bananas,oranges,pineapples,peaches,kiwis bana-na,banana ba-na-na,bana-bana,bana-nana` [download] If you want output like this: `$ cat pm_1055846_name_replaced_out.txt 456,23 897236,897236,897236 3726 23 3726,23,456,456 456,23,pineapples,897236,3726 15,25 ba-10-10,20-20,20-nana` [download] Change `my $re = '\b(' . join('\|', keys %id_for) . ')\b';` [download] to `my $re = '\b(' . join('\|', sort { $b cmp $a } keys %id_for) . ')\b';` [download] If you want output like this: `$ cat pm_1055846_name_replaced_out.txt 456,23 897236,897236,897236 3726 23 3726,23,456,456 456,23,pineapples,897236,3726 15,25 ba-na-na,bana-bana,bana-nana` [download] Change `my $re = '\b(' . join('\|', keys %id_for) . ')\b';` [download] to `my $re = '(^\|,)(' . join('\|', sort { $b cmp $a } keys %id_for) . ')(?= +,\|$)';` [download] and `s/$re/$id_for{$1}/g;` [download] to `s/$re/$1$id_for{$2}/g;` [download] If you want something different to these, and are unable to work it out for yourself, provide details as outlined in the "How do I post a question effectively?" guidelines. 
It would also be useful to advise what version of Perl you're using: I wrote those changes for v5.8; a more efficient version could have been written for a later version. As a hint for doing this yourself, see `(?<=pattern) \K` under Look-Around Assertions in "perlre: Extended Patterns" — `\K` was introduced in v5.10.0 (see "perl5100delta: Regular expressions" for this, and other, regex enhancements). -- Ken	[reply] [d/l] [select]
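As a rough idea of where that hint leads, here is one possible `\K` version of the second variant. It is a sketch only, assuming Perl 5.10 or later; the data is a hard-coded subset of the sample files, and `quotemeta` is an added precaution rather than part of the code above.

```perl
use strict;
use warnings;

# Hypothetical subset of the name-to-ID data shown above.
my %id_for = ( banana => 25, bana => 20, 'bana-na' => 15, na => 10 );

my $names = join '|', map { quotemeta($_) } sort { $b cmp $a } keys %id_for;

# (?:^|,)\K : require start of line or a comma, but keep it out of the match
# (?=,|$)   : require a comma or end of line next, without consuming it
my $re = qr/(?:^|,)\K($names)(?=,|$)/;

my @lines = ( 'bana-na,banana', 'ba-na-na,bana-bana,bana-nana' );
s/$re/$id_for{$1}/g for @lines;

print "$_\n" for @lines;
# prints "15,25" and then "ba-na-na,bana-bana,bana-nana"
```

With `\K` the leading comma is no longer part of the matched text, so it does not have to be captured and put back in the replacement.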
Re^4: iterating hash keys? by R56 (Sexton) on Sep 30, 2013 at 14:12 UTC
Re: using hashes by mtmcc (Hermit) on Sep 26, 2013 at 14:51 UTC
What have you already tried? Also, have a look through this: How do I post a question effectively?
Re^2: using hashes by R56 (Sexton) on Sep 26, 2013 at 15:08 UTC
Something like this (assuming @lines as the array that has the input):

    for my $line (@lines) {
        while(my ($find, $replace) = each %ids) {
            s/$find/$replace/g
        }
    }
Re^3: using hashes by kennethk (Abbot) on Sep 26, 2013 at 15:47 UTC
This should work, and is clear to read. While it is not optimally efficient, efficiency shouldn't be your concern at this stage. If this isn't working, you need to post more information about your actual script. Posting real input, expected output, and actual code (all wrapped in `<code>` tags) will greatly facilitate the debugging, as discussed in How do I post a question effectively?
Re^4: using hashes by R56 (Sexton) on Sep 26, 2013 at 16:44 UTC
Re^5: using hashes by kennethk (Abbot) on Sep 26, 2013 at 17:44 UTC
Re^3: using hashes by arrestee (Novice) on Sep 26, 2013 at 15:58 UTC
For an effective solution to your problem, see BrowserUK's comment below. As to why the code you've shown doesn't work, it's probably because you're storing each line of your file/array in `$line`, but doing your substitution against `$_`. Try this: `$line =~ s/$find/$replace/g`.
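Put together, the corrected loop might look like the sketch below. The contents of `%ids` and `@lines` are hypothetical. It only fixes the binding problem described above; it still walks every hash key for every line, which is the part the other replies advise against.

```perl
use strict;
use warnings;

# Hypothetical data, for illustration only.
my %ids   = ( bananas => 456, oranges => 23, kiwis => 3726 );
my @lines = ( "bananas,oranges\n", "kiwis\n" );

for my $line (@lines) {
    while ( my ( $find, $replace ) = each %ids ) {
        # Bind the substitution to $line; a bare s/// would act on $_.
        $line =~ s/$find/$replace/g;
    }
}

print @lines;    # prints "456,23" and "3726", one per line
```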
Re^4: using hashes by R56 (Sexton) on Sep 26, 2013 at 16:46 UTC
