Word frequency in an array

monkeybus has asked for the wisdom of the Perl Monks concerning the following question:

Re: Word frequency in an array by FunkyMonk (Bishop) on Jun 10, 2007 at 13:51 UTC
> What is the best way to count how many times the word "foo" occurs in @array?

If you only want to count occurrences, use grep in scalar context: `my $count_foo = grep (/foo/, @array);`
Re^2: Word frequency in an array by McDarren (Abbot) on Jun 10, 2007 at 16:33 UTC
Let's use that in an example: `my @array = qw(foo bar food foofighters kung-foo); my $count_foo = grep (/foo/, @array); print "$count_foo\n";`

This prints "4", which may or may not be what the OP was after (I suspect not). `my $count_foo = grep { $_ eq 'foo' } @array;` might be more what the OP was looking for.

Cheers, Darren :)
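A minimal sketch of the difference between the two `grep` forms, using the sample array from the post above. The variable names `$count_substring` and `$count_exact` are illustrative and not from the thread:

```perl
use strict;
use warnings;

my @array = qw(foo bar food foofighters kung-foo);

# Substring match: counts every element that contains "foo" anywhere.
my $count_substring = grep { /foo/ } @array;

# Exact match: counts only elements that are exactly the string "foo".
my $count_exact = grep { $_ eq 'foo' } @array;

print "substring: $count_substring, exact: $count_exact\n";
# prints "substring: 4, exact: 1"
```

In both cases `grep` is evaluated in scalar context (it is assigned to a scalar), so it returns the number of matching elements rather than the elements themselves.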
Re: Word frequency in an array by moritz (Cardinal) on Jun 10, 2007 at 13:46 UTC
If you need to look up counts more than once, build a hash: `my %f; for (@array) { $f{$_}++; }`

That doesn't work if you are interested in substrings of array elements.
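The hash approach could be sketched as follows. The sample data and the names `%freq` and `$count_foo` are assumptions made for illustration; only the `$hash{$_}++` idiom comes from the post:

```perl
use strict;
use warnings;

my @array = qw(foo bar foo baz foo bar);

# One pass over the array builds a count for every distinct element.
my %freq;
$freq{$_}++ for @array;

# Any later lookup is a plain hash access; no rescan of @array is needed.
for my $word (sort keys %freq) {
    print "$word: $freq{$word}\n";
}

# A word that never occurred has no key, so fall back to 0.
my $count_foo = $freq{foo} || 0;
my $count_qux = $freq{qux} || 0;
```

Because the keys are whole array elements, this counts exact matches only, which is the limitation the post mentions for substrings.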
Re: Word frequency in an array by McDarren (Abbot) on Jun 10, 2007 at 16:47 UTC
> Hang on, I figured it out. `@fool = grep (/foo/, @array);`

Are you sure? What you have done there is extract all the elements of @array that contain the string 'foo' and place them in another array called @fool. If you want to know how many there are, you still need to do something with @fool. Of course, this is as simple as: `my $count_foo = scalar @fool;`

But there is no real need for the intermediate array, so you could have just done: `my $count_foo = grep (/foo/, @array);`

But wait, are you sure that is what you really want? Consider the following example array. How many matches would you expect? `my @array = qw(foo bar food foofighters kung-foo);`

If the answer is only one, then you need to use something like this: `my $count_foo = grep { $_ eq 'foo' } @array;`

Cheers, Darren :)
Re: Word frequency in an array by blazar (Canon) on Jun 10, 2007 at 22:03 UTC
> Hang on, I figured it out. `@fool = grep (/foo/, @array);`

FWIW you may rewrite that as `@fool = "@array" =~ /foo/g;`

But I suspect you really want `my $fool = grep $_ eq 'foo', @array;`
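One caveat worth illustrating: the `grep` form and the `"@array" =~ /foo/g` form do not always produce the same list. A small sketch, with sample data and variable names chosen here purely for illustration:

```perl
use strict;
use warnings;

my @array = qw(foo bar foofoo kung-foo);

# grep returns the elements that contain "foo".
my @elements = grep { /foo/ } @array;

# A global match in list context returns each matched substring,
# so an element containing "foo" twice contributes two entries.
my @matches = "@array" =~ /foo/g;

# Exact comparison counts only elements equal to "foo".
my $exact = grep { $_ eq 'foo' } @array;

print scalar(@elements), " ", scalar(@matches), " $exact\n";
# prints "3 4 1"
```

Here `@elements` holds `foo`, `foofoo` and `kung-foo`, while `@matches` holds four copies of the string `foo`.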
Re: Word frequency in an array by cool (Scribe) on Jun 10, 2007 at 15:28 UTC
Its just avoiding grep; again a trivial soln.

    #! /usr/bin/perl
    use strict;
    use warnings;
    my $x=1;
    my @arr= qw(foo cho roh foo kho foo moo foo);
    foreach(@arr){print $x++ if (/foo/)}

But it can be done using spl variable of reg ex also, if I am right?? Any takers?

    #! /usr/bin/perl
    use strict;
    use warnings;
    my $x=1;
    my @arr= qw(foo cho roh foo kho foo moo foo);
    my $str=join ' ',@arr;
    $str=~ /foo/;
    print $&;
    #### In place of $&; we can use that for no
    #### for no. of matches.
Re^2: Word frequency in an array by davido (Cardinal) on Jun 10, 2007 at 16:20 UTC
Ok, your solutions:

The first one is less than optimal. First, you're starting with $x = 1, which means that after the loop terminates $x will overstate the count by one. Why not start with $x = 0, and then pre-increment instead of post-incrementing $x? In other words, ++$x, instead of $x++.

The next issue is the regexp you used. It will match just about anything containing "foo", including "foolish". Is that intentional? Maybe `/^foo$/` would be better, or perhaps `/\bfoo\b/`.

And the last thing to mention is the use of print within the loop. You're printing on each iteration, which creates an I/O bottleneck, plus a lot of clutter. If $x started at zero, you could print after the loop terminates.

Your second solution goes to a lot of extra work and memory inefficiency by creating $str as a temporary stringified version of @arr. And the other problem is that $& only shows the actual most recent match, not some count of the number of possible times the regular expression could have matched. Don't use a special variable, use this: `my $count = () = $str =~ m/\bfoo\b/g;`

But I still feel it's a bad solution because you're creating a temporary string unnecessarily. The grep solution is probably the best for a one-time count. The hash solution is probably better if you're doing the count several times, but it does have two problems: you're still creating the temporary copy (the hash), and the creation of a hash is a more computationally expensive operation than running through the array one time counting, as is done in the grep method.

One other thing: "utioecia". There you go; the keystrokes you saved by abbreviating "special" and "solution." You can cut and paste them into your future posts so that you can retain clarity without wasting those eight keystrokes. ;)

Dave
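The suggestions in this reply could be put together roughly like this. It is only a sketch: the array is the one from the parent post, while `$hyphen` and its test string are added here to illustrate a word-boundary caveat that the thread does not spell out:

```perl
use strict;
use warnings;

my @arr = qw(foo cho roh foo kho foo moo foo);

# Loop version: start at zero, pre-increment, print once after the loop.
my $x = 0;
for (@arr) {
    ++$x if /^foo$/;
}
print "loop count: $x\n";       # prints "loop count: 4"

# Regex version: the empty list forces list context on the global match,
# and assigning that list assignment to a scalar yields the match count.
my $str   = "@arr";
my $count = () = $str =~ m/\bfoo\b/g;
print "regex count: $count\n";  # prints "regex count: 4"

# Caveat: \b treats "-" as a non-word character, so the "foo" in
# "kung-foo" also matches /\bfoo\b/.
my $hyphen = () = "kung-foo foo" =~ m/\bfoo\b/g;
print "with hyphen: $hyphen\n"; # prints "with hyphen: 2"
```

If hyphenated words should not count, comparing whole elements with `eq` or anchoring with `/^foo$/` on each element avoids the issue.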
Re^3: Word frequency in an array by blazar (Canon) on Jun 11, 2007 at 17:06 UTC
> Why not start with `$x = 0`, and then pre-increment instead of post-incrementing `$x`? In other words, `++$x`, instead of `$x++`.

Indeed. In fact, it's also worth remembering that {pre,post}-{increment,decrement} behave intelligently: first of all they do not complain under warnings and, in the case of the post- forms, they "coerce to numeric value", that is, they return `0`:

    errol:~ [19:01:32]$ perl -wMstrict -le 'my $x; print map $x++, (1) x 3'
    012
Re^3: Word frequency in an array by cool (Scribe) on Jun 10, 2007 at 19:03 UTC
Hi Dave,

Thanks for giving insight of the solutions and giving me prototypes to copy and paste ;)

> And the other problem is that $& only shows the actual most recent match, not some count of the number of possible times

Actually that is what I mentioned, pl read comments in

    #! /usr/bin/perl
    use strict;
    use warnings;
    my $x=1;
    my @arr= qw(foo cho roh foo kho foo moo foo);
    my $str=join ' ',@arr;
    $str=~ /foo/;
    print $&;
    #### In place of $&; we can use that for no
    #### for no. of matches.

Now, I posted this piece to get suggestion from people, what in regular expression can be used (in place of $&)

> But it can be done using spl variable of reg ex also, if I am right?? Any takers?

to count the no of matches in one go using special variable, if there is any!! and I think I encountered that somewhere!
Re^2: Word frequency in an array by blazar (Canon) on Jun 11, 2007 at 17:29 UTC
> Its just avoiding grep; again a trivial soln.

cool, I know that u r c00l and all, but could u plz stop im-talking? Many find it plain annoying and it hinders communication between people here. It is appropriate where it is appropriate, that is in IMs et similia, in which case your primary goal is speed and not clarity. But here it's just the opposite.

> `my $x=1; foreach(@arr){print $x++ if (/foo/)}`

Initialization and "efficiency" issues apart, which were duly pointed out by davido, just reasoning solely in terms of user interaction, what benefit could come of incrementally printing the counter at each iteration? You're only interested in the final value anyway. Not to mention the `/foo/` gotcha mentioned several times in this thread.

> But it can be done using spl variable of reg ex also, if I am right?? Any takers?

cool, I know that u r c00l and all, but could u plz stop im-talking?

> `my $str=join ' ',@arr;`

That is just like `my $str="@arr";` that is, unless you've changed `$"`. And if you haven't then it's a very convenient idiom. If you have, then you should have done so locally in a block anyway, unless yours is a very very special situation.

> `$str=~ /foo/;`

That is just like `"@arr" =~ /foo/;` with no need for an intermediate variable.

> `print $&; #### In place of $&; we can use that for no #### for no. of matches.`

I understand what you mean, but two things. First, your match is not a global one (you have to use the `/g` modifier for that), so the number of matches will always be at most one. Second, no, there is no such variable and no need for it, since a global match in list context returns the list of all the matches (or of all the captures if capturing parens are there), and one can count that list to get the number of them, for example with the `my $count = () = ...` assignment davido showed.
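A short sketch of the points made in this reply, assuming a small sample array. The names `$joined`, `$interp`, `$csv`, `$after` and `$n` are illustrative only:

```perl
use strict;
use warnings;

my @arr = qw(foo cho roh foo);

# With the default list separator, interpolation equals join ' '.
my $joined = join ' ', @arr;
my $interp = "@arr";
print "same\n" if $joined eq $interp;   # prints "same"

# Changing $" alters interpolation, so keep the change local to a block.
my $csv;
{
    local $" = ',';
    $csv = "@arr";
}
my $after = "@arr";   # separator is a space again outside the block

# A global match in list context returns every match; assigning that
# list to () in scalar context gives the number of matches.
my $n = () = "@arr" =~ /foo/g;
print "$csv | $after | $n\n";
# prints "foo,cho,roh,foo | foo cho roh foo | 2"
```

Without the `/g` modifier the match in the last step could succeed at most once, which is why the original `$str =~ /foo/` cannot be used for counting.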