perladdict has asked for the wisdom of the Perl Monks concerning the following question:

Hi Monks,

I am trying to find duplicate words in a paragraph.

#!/usr/bin/perl -w
$/=" ";
while(<>) {
    while($_=~m/\b(\w+\b)(\s+\1)\b/xig) {
        print "dup words '$1' occurs at paragraph $.\n";
    }
}

After running this, I am unable to get the expected result. Please suggest how I can get the correct result.

Code tags added by GrandFather

Replies are listed 'Best First'.
Re: Finding duplicate words in a paragraph
by Zaxo (Archbishop) on Jun 05, 2006 at 11:58 UTC

    It's easier to use split for that. When you want to handle uniqueness or duplication, think of using a hash.

    my %hash;
    while (<>) {
        for (split) {
            print "dup word '$_' occurs at line $.\n" if $hash{$_}++;
        }
    }

    You can make that $hash{+lc}++ if you want to ignore case.
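
A self-contained sketch of the hash approach with the case-insensitive variant, for illustration only. The helper name `repeated_words` and the sample text are assumptions, and the input comes from an in-memory filehandle instead of `<>` so the example runs without a file:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns [line_number, word] for every word that
# has already been seen earlier in the text (ignoring case).
sub repeated_words {
    my ($text) = @_;
    my (%seen, @repeats);

    # In-memory filehandle, so $. counts lines just as it would with <>.
    open my $fh, '<', \$text or die "cannot open in-memory file: $!";
    while (my $line = <$fh>) {
        for my $word (split ' ', $line) {
            # Post-increment: false on the first sighting, true afterwards.
            push @repeats, [ $., $word ] if $seen{ lc $word }++;
        }
    }
    close $fh;
    return @repeats;
}

my $sample = "The cat saw the dog\nA dog saw a bird\n";
for my $hit (repeated_words($sample)) {
    my ($line_no, $word) = @$hit;
    print "dup word '$word' occurs at line $line_no\n";
}
```

Because the split is on whitespace only, punctuation stays attached to words, so "dog." and "dog" would count as different keys unless the punctuation is stripped first.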

    After Compline,
    Zaxo

Re: Finding duplicate words in a paragraph
by liverpole (Monsignor) on Jun 05, 2006 at 11:37 UTC
    Hi perladdict,

    First of all, wrapping your code in "code tags" means that you need to start with "<code>", NOT with "<here is my code>".  You'll get more people willing to take a shot at helping you if they can read the code easily from the beginning.

    Secondly, what is the "expected result"?  We have no way of knowing for sure unless you tell us up front!


    s''(q.S:$/9=(T1';s;(..)(..);$..=substr+crypt($1,$2),2,3;eg;print$..$/
Re: Finding duplicate words in a paragraph
by msk_0984 (Friar) on Jun 05, 2006 at 11:36 UTC

    Hi, I feel we can do it this way. I have used an example from the Cookbook to explain it.

    To generalise, you may want to find the Nth match in a string, not just the first one. For example, you'd like to find the word preceding the third occurrence of "fish":

    One fish two fish red fish blue fish

    Solution

    Use the /g modifier in a while loop, keeping count of matches:

    $WANT = 3;
    $count = 0;
    while (/(\w+)\s+fish\b/gi) {
        if (++$count == $WANT) {
            print "The third fish is a $1 one.\n";
            # Warning: don't `last' out of this loop
        }
    }
    The third fish is a red one.

    Or use a repetition count and repeated pattern like this:

    /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i;
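
Both forms can be tried in one runnable script. This is only a sketch: the helper name `word_before_nth_fish` is hypothetical, and the text is passed in as an argument rather than matched against `$_`:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns the word preceding the Nth "fish",
# or undef if there are fewer than N occurrences.
sub word_before_nth_fish {
    my ($text, $want) = @_;
    my $count = 0;
    my $found;

    # /g in scalar context resumes after the previous match each time.
    # The loop runs to completion, in line with the warning above about
    # not using `last' here.
    while ($text =~ /(\w+)\s+fish\b/gi) {
        $found = $1 if ++$count == $want;
    }
    return $found;
}

my $text = "One fish two fish red fish blue fish";

my $third = word_before_nth_fish($text, 3);
print "The third fish is a $third one.\n";

# Repetition-count form: skip two "<word> fish" pairs, capture the third word.
if ($text =~ /(?:\w+\s+fish\s+){2}(\w+)\s+fish/i) {
    print "The repeated pattern also finds: $1\n";
}
```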

    Code tags added by GrandFather

Re: Finding duplicate words in a paragraph
by ciderpunx (Vicar) on Jun 05, 2006 at 12:36 UTC
    howzabout:
    #!/usr/bin/perl -w
    use strict;

    my @para = qw|this is my paragraph. this is not all that interesting. hey ho|;
    my %hash = ();
    print "non-unique words: \n";
    ## the interesting bit ##
    map {$hash{$_}++} @para;
    for (keys %hash){print "$_ " if($hash{$_}>1)};
    #########################
    print "\n";
    gives:
    $ perl tmp.pl
    non-unique words:
    is, this,
    $
Re: Finding duplicate words in a paragraph
by japhy (Canon) on Jun 05, 2006 at 13:43 UTC
    It looks like you want to find a word that is immediately followed by itself in a paragraph, right? If so, I think the only real error in your code is $/ = " ", which should be $/ = "". That is, the empty string, not a string with a space in it.
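
A minimal sketch of that fix, assuming the goal really is adjacent repeated words. The helper name `adjacent_dups` and the sample text are hypothetical, the pattern is a slightly simplified form of the original one, and the input is read from an in-memory filehandle so the script runs on its own:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns [paragraph_number, word] for every word
# that is immediately followed by itself.
sub adjacent_dups {
    my ($text) = @_;
    my @found;

    open my $fh, '<', \$text or die "cannot open in-memory file: $!";

    # Empty string = paragraph mode: each read returns one paragraph,
    # with blank lines acting as the separator.
    local $/ = "";
    while (my $para = <$fh>) {
        # Capture a word, then require whitespace and the same word again.
        while ($para =~ /\b(\w+)\s+\1\b/ig) {
            push @found, [ $., $1 ];
        }
    }
    close $fh;
    return @found;
}

my $sample = "this is is a test\nof the the code\n\nnothing here\n\nPerl perl rocks\n";
for my $hit (adjacent_dups($sample)) {
    my ($para_no, $word) = @$hit;
    print "dup word '$word' occurs at paragraph $para_no\n";
}
```

With `$/ = " "` each read would instead return a single space-terminated chunk, so a word and its repeat would rarely end up in the same record.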

    Jeff japhy Pinyan, P.L., P.M., P.O.D, X.S.: Perl, regex, and perl hacker
    How can we ever be the sold short or the cheated, we who for every service have long ago been overpaid? ~~ Meister Eckhart