Duplicate Words

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: Duplicate Words by Anonymous Monk on Apr 20, 2001 at 02:17 UTC
`s/(homework\s*){2}/homework /;`
Re: Re: Duplicate Words by bent (Scribe) on Apr 20, 2001 at 03:51 UTC
Hmm, what about: `s/(\b\w+?\b)\s+\1/$1/g;` This will match "This is a sentence sentence." as well as "This is a sentence sentence to.". bent
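One caveat with the pattern above: nothing anchors the end of `\1`, so "the then" would also match and lose its second "the". A minimal sketch of a guarded variant follows, assuming word characters are what `\w` matches and that the comparison is case-sensitive. The helper name `collapse_repeats` is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: collapse a run of the same word into one copy.
sub collapse_repeats {
    my ($text) = @_;
    # \b(\w+)\b captures a whole word; (?:\s+\1\b)+ matches one or more
    # further copies. The trailing \b stops "the" from matching inside "then".
    $text =~ s/\b(\w+)\b(?:\s+\1\b)+/$1/g;
    return $text;
}

print collapse_repeats("This is a sentence sentence to."), "\n";
print collapse_repeats("the then"), "\n";
```

The first call should print "This is a sentence to." and the second should leave "the then" untouched.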
Re: Re: Re: Duplicate Words by Anonymous Monk on Apr 20, 2001 at 17:57 UTC
If you want to remove dup words from anywhere in the string rather than just consecutive duplicates, try: `s/\w+\s/$words{$&}++?'':$&/ge;` or to ignore case: `s/\w+\s/$words{"\L$&"}++?'':$&/ge;`
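The same "have I seen this word before?" idea can be written without `$&`, using `split` and `grep` in place of the substitution. This is only a sketch of that alternative, not the poster's method: the helper name `drop_repeated_words` is hypothetical, words are assumed to be whitespace-separated (so punctuation stays attached to its word), and runs of whitespace are collapsed to single spaces.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the first occurrence of each word,
# comparing case-insensitively.
sub drop_repeated_words {
    my ($text) = @_;
    my %seen;
    # split ' ' breaks on any whitespace; grep keeps a word only the
    # first time its lowercased form is counted in %seen.
    return join ' ', grep { !$seen{lc $_}++ } split ' ', $text;
}

print drop_repeated_words("The cat saw the other cat"), "\n";
```

Unlike the `\w+\s` pattern, this version also handles the last word of the string, which has no trailing whitespace to match.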
homework! (Re: Duplicate Words) by tye (Sage) on Apr 20, 2001 at 02:17 UTC
At least this homework assignment wasn't quite so obvious as the last time. - tye (but my friends call me "Tye")
Re: Duplicate Words by jbert (Priest) on Apr 20, 2001 at 17:13 UTC
OK. It is homework, but some general comments on coding:
You have code duplicated in both the 'if' and the 'else' part. This could and should be moved to the end of the loop.
You have two variables (i and j), one of which is always one more than the other. Your code would be cleaner if you only used 'i' and replaced 'j' with '(i+1)'.
You are using a Windows system, which pretends to prefer '\' to separate folders. The '\' character is special in many places, including inside strings marked with double quotes ("). You can either use single quotes (') to mark your string (which means you don't have to double your \\ to make them work) or (much better) you can take advantage of the fact that Windows is happy to use '/' as a file separator, i.e. `$File = "P:/K/kmartin/set1.txt";` should work fine, and has the advantage of working on Unix or Windows boxes.
Perl has lots of features to make code like this simpler. For example, reading an entire file into one variable doesn't require a loop (check the documentation for the '$/' variable in 'perlvar'), and you can loop through arrays without using 'i' or 'j' as index variables (check the documentation for 'foreach' and for 'shift', 'unshift', 'push' and 'pop').
Most importantly, Perl is really a nice language built around an excellent regular expression engine. For the kinds of text processing you want to do, check the 'perlre' documentation. It's the "right way" to do this kind of job. Good luck with the coding.
PS. Where I refer to 'perlvar', 'perlre' etc., these are some of the standard documentation which comes with Perl. On Windows with ActiveState Perl, you can often find this in HTML format under the Start button/Programs/ActivePerl, and on all systems you can type "perldoc perlvar" (or whatever) at a command prompt and get the information.
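To make the '$/' and 'foreach' suggestions concrete, here is a rough sketch. It writes its own throwaway file with File::Temp so that it runs anywhere. The file contents and the helper name `slurp` are invented for the example and have nothing to do with the original poster's set1.txt.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a small throwaway file so the sketch is self-contained.
my ($out, $path) = tempfile(UNLINK => 1);
print {$out} "first line\nsecond line\n";
close $out or die "close failed: $!";

# Hypothetical helper: read a whole file into one scalar.
sub slurp {
    my ($file) = @_;
    open my $in, '<', $file or die "Cannot open $file: $!";
    local $/;                 # undefined record separator: read to end of file
    my $contents = <$in>;
    close $in;
    return $contents;
}

my $contents = slurp($path);
my @lines = split /\n/, $contents;

# foreach visits each element without any index variable.
foreach my $line (@lines) {
    print "line: $line\n";
}
```

Using `local` keeps the change to `$/` confined to the helper, so any other code that reads line by line is unaffected.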
Re: Duplicate Words by orkysoft (Friar) on Apr 20, 2001 at 03:18 UTC
`while (<@Lines>) {` Cool, I didn't know you could do it that way as well. Still I think foreach is a lot less confusing.
Re: Re: Duplicate Words by ok (Beadle) on Apr 20, 2001 at 07:44 UTC
Would not `foreach` load all the lines into memory first?
Re: Re: Re: Duplicate Words by mullr (Sexton) on Apr 20, 2001 at 10:43 UTC
They're already in memory. (I'm pretty sure)
Re: Duplicate Words by Anonymous Monk on Apr 20, 2001 at 17:08 UTC
How about changing it to finish like this: `@Lines = split /\W+/, $Contents; while (<@Lines>) { $Contents =~ s/$_//; }` and save $Contents as your return string. This is my guess; I'm totally new to Perl. 2001-04-20 Edit by Corion: Added CODE tags
Re: Duplicate Words by Anonymous Monk on Apr 20, 2001 at 22:31 UTC
The main thing that will help you here is to realize that you can use parentheses around the part of the regular expression that matches any word, and then use "\1" in the same regular expression to test for the immediate second occurrence of that word. Actual code of course is the assignment. If you tell us when it's due, maybe we'll post solutions (probably pithier than teacher's :-) afterwards.
Re: Duplicate Words by Anonymous Monk on Apr 20, 2001 at 14:06 UTC
`#!/usr/local/bin/perl
$var="word1 word2 word word word3 someth word3";
print $var."\n";
@words=split(/ /,$var);
$len=@words;
$var="";
for ($i=0; $i<=$len-1; $i++) {
    if ($words[$i] eq $words[$i+1]) {print $words[$i]," detected\n"}
    else {$var.=@words[$i]." "}
}
print $var."\n";`
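For comparison, the same loop-based idea can be written so that it runs cleanly under `use strict` and `use warnings`, without reading one element past the end of the array. This is a sketch, not a correction of the post above: the helper name `drop_adjacent_repeats` is made up, and it compares each word with the last word kept rather than with the next word in the list.

```perl
use strict;
use warnings;

# Hypothetical helper: drop a word when it equals the word kept just before it.
sub drop_adjacent_repeats {
    my ($text) = @_;
    my @kept;
    foreach my $word (split ' ', $text) {
        # Skip the word if it repeats the most recently kept word.
        next if @kept && $kept[-1] eq $word;
        push @kept, $word;
    }
    return join ' ', @kept;
}

print drop_adjacent_repeats("word1 word2 word word word3 someth word3"), "\n";
```

The two occurrences of "word3" both survive because they are not next to each other.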
Re: Duplicate Words by chorg (Monk) on Apr 20, 2001 at 18:40 UTC
I believe that the new Camel has this problem as one of the examples, in either chapter 2 or chapter 5...
"Intelligence is a tool used to achieve goals, however goals are not always chosen wisely..."
Re: Duplicate Words by Sprad (Hermit) on Apr 20, 2001 at 19:53 UTC
One thing to consider is the fact that that particular course of action might not be correct in all cases. Note the completely proper and desired usage of "that that" in the previous sentence. But then you're getting into grammar checking, and that's probably beyond the scope of your class. --- I'm too sexy for my .sig.
Re: Re: Duplicate Words by Ay_Bee (Monk) on Apr 22, 2001 at 08:49 UTC
`For example :- Had had had "had had", had Had had had "had" Had would have been correct. The above was a comment about the grammar of an essay written by an author named Had. My memory concerns me - but I forget why !!!`