Finding duplicate words in a paragraph

perladdict has asked for the wisdom of the Perl Monks concerning the following question:

Re: Finding duplicate words in a paragraph by Zaxo (Archbishop) on Jun 05, 2006 at 11:58 UTC
It's easier to use `split` to do that. When you want to handle uniqueness or duplication, think of using a hash.

```perl
my %hash;
while (<>) {
    for (split) {
        # $hash{$_}++ returns the old count, so this is false the first
        # time a word is seen and true for every repeat after that
        print "dup word '$_' occurs at line $.\n" if $hash{$_}++;
    }
}
```

You can make that `$hash{+lc}++` if you want to ignore case.

After Compline,
Zaxo
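A minimal sketch of the same hash idea, packaged so it can be tested without `<>`. The sub name `find_repeats`, its `$fold_case` flag and the in-memory sample text are hypothetical, not from the post. Like the one-liner, it reports every repeat anywhere in the input (not only adjacent ones), and punctuation stays attached to words, so `end` and `end.` count as different words.

```perl
use strict;
use warnings;

# Hypothetical helper: return [word, line number] for every word that was
# already seen earlier in the input, using the "$seen{$key}++" idiom.
# $fold_case (an assumption added here) lowercases words first, like the
# $hash{+lc}++ variant.
sub find_repeats {
    my ($fh, $fold_case) = @_;
    my %seen;
    my @repeats;
    while (my $line = <$fh>) {
        # split ' ' is the same whitespace split that a bare `split` does
        for my $word (split ' ', $line) {
            my $key = $fold_case ? lc $word : $word;
            # post-increment yields the old count: true from the 2nd sighting
            push @repeats, [ $word, $. ] if $seen{$key}++;
        }
    }
    return @repeats;
}

# Read from an in-memory string instead of <> for the demo.
my $text = "The cat sat\non the mat\nthe end\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
for my $r (find_repeats($fh, 1)) {
    print "dup word '$r->[0]' occurs at line $r->[1]\n";
}
close $fh;
```

With case folding on, `The` on line 1 and `the` on lines 2 and 3 are treated as one word, so the repeats are reported at lines 2 and 3.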
Re: Finding duplicate words in a paragraph by liverpole (Monsignor) on Jun 05, 2006 at 11:37 UTC
Hi perladdict,

First of all, wrapping your code in "code tags" means that you need to start with "<code>", NOT with "<here is my code>". You'll get more people willing to take a shot at helping you if they can read the code easily from the beginning.

Secondly, what is the "expected result"? We have no way of knowing for sure unless you tell us up front!

s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/
Re: Finding duplicate words in a paragraph by msk_0984 (Friar) on Jun 05, 2006 at 11:36 UTC
Hi, I think we can do it this way. I've used an example from the Perl Cookbook to explain it. The problem it solves is finding the Nth match in a string, not just the first one. For example, suppose you'd like to find the word preceding the third occurrence of "fish":

`One fish two fish red fish blue fish`

Solution: use the /g modifier in a while loop, keeping count of matches:

```perl
my $WANT  = 3;
my $count = 0;
# Assumes the string above is in $_
while (/(\w+)\s+fish\b/gi) {
    if (++$count == $WANT) {
        print "The third fish is a $1 one.\n";
        # Warning: don't `last' out of this loop
    }
}
```

which prints:

`The third fish is a red one.`

Or use a repetition count and a repeated pattern like this:

`/(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;`

Code tags added by GrandFather
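The counting loop can also be wrapped in a sub so that both N and the target word become parameters. This is a sketch under that assumption: `word_before_nth` is a hypothetical name, not the Cookbook's code.

```perl
use strict;
use warnings;

# Hypothetical helper: return the word immediately before the $n-th
# occurrence of $target in $text, or undef if there are fewer than $n.
sub word_before_nth {
    my ($text, $target, $n) = @_;
    my $count = 0;
    # \Q...\E quotes any regex metacharacters in $target
    while ($text =~ /(\w+)\s+\Q$target\E\b/gi) {
        # Returning early is safe here: $text is a private copy, so the
        # pos() left behind by the /g match is discarded with it.
        return $1 if ++$count == $n;
    }
    return;    # undef in scalar context
}

my $fishy = 'One fish two fish red fish blue fish';
for my $n (1 .. 5) {
    my $word = word_before_nth($fishy, 'fish', $n);
    printf "%d: %s\n", $n, defined $word ? $word : '(none)';
}
```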
Re: Finding duplicate words in a paragraph by ciderpunx (Vicar) on Jun 05, 2006 at 12:36 UTC
howzabout:

```perl
#!/usr/bin/perl -w
use strict;

my @para = qw|this is my paragraph. this is not all that interesting. hey ho|;
my %hash = ();
print "non-unique words: \n";

## the interesting bit ##
# count each word; a count above 1 means it appeared more than once
$hash{$_}++ for @para;
# sorted so the output order is stable (hash order is not)
for (sort keys %hash) { print "$_, " if $hash{$_} > 1 }
#########################
print "\n";
```

gives:

```
$ perl tmp.pl
non-unique words: 
is, this, 
$
```

--
charlieharvey.org.uk
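A variation of the count-then-report approach, assuming you want `paragraph.` and `paragraph`, or `This` and `this`, to count as the same word. The sub name `repeated_words` and the normalisation rules (lowercase, strip leading and trailing non-word characters) are assumptions added here.

```perl
use strict;
use warnings;

# Hypothetical helper: count every word in $text and return
# [word, count] pairs for words seen more than once, sorted by word.
# Normalisation is an assumption: words are lowercased and any
# non-word characters at either end (such as a trailing '.') are removed.
sub repeated_words {
    my ($text) = @_;
    my %count;
    for my $raw (split ' ', $text) {
        (my $word = lc $raw) =~ s/^\W+|\W+$//g;
        next if $word eq '';    # token was pure punctuation
        $count{$word}++;
    }
    return map  { [ $_, $count{$_} ] }
           sort { $a cmp $b }
           grep { $count{$_} > 1 } keys %count;
}

my $para = 'this is my paragraph. this is not all that interesting. hey ho';
print "non-unique words:\n";
printf "%s (%d)\n", @$_ for repeated_words($para);
```

Sorting the keys explicitly keeps the report in a predictable order, since Perl's hash order varies between runs.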
Re: Finding duplicate words in a paragraph by japhy (Canon) on Jun 05, 2006 at 13:43 UTC
It looks like you mean you want to find a word that is followed by itself in a paragraph, right? If so, I think the only real error in your code is `$/ = " "`, which should be `$/ = ""`. That is, the empty string, not a string with a space in it. Setting `$/` to the empty string turns on paragraph mode, so each read returns a whole paragraph (records are separated by one or more blank lines), whereas `" "` ends each record at a single space.

Jeff `japhy` Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and `perl` hacker
How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart
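A minimal sketch of that approach, assuming "duplicate" means a word immediately followed by itself, possibly across a line break inside the paragraph. The original question's code isn't shown in the thread, so the sub name `doubled_words` and the backreference regex are illustrative assumptions, not the poster's code.

```perl
use strict;
use warnings;

# Hypothetical helper: read $fh in paragraph mode and return
# [paragraph number, word] for each word immediately followed by itself.
sub doubled_words {
    my ($fh) = @_;
    my @found;
    local $/ = '';    # paragraph mode: each read returns one paragraph
    while (my $para = <$fh>) {
        # \b(\w+) captures a word; \s+\1\b requires the same word again.
        # Under /i the backreference also matches case-insensitively.
        while ($para =~ /\b(\w+)\s+\1\b/gi) {
            push @found, [ $., $1 ];    # $. counts records, i.e. paragraphs
        }
    }
    return @found;
}

my $text = "This is is a test.\nIt spans the\nthe line break.\n\n"
         . "A second paragraph paragraph here.\n";
open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
printf "paragraph %d: '%s'\n", @$_ for doubled_words($fh);
close $fh;
```

Because `\s+` also matches a newline, a doubled word split across two lines of the same paragraph (`the` / `the` above) is still caught, which a line-by-line read would miss.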