counting the number of occurrences of a word using regex

abhishes has asked for the wisdom of the Perl Monks concerning the following question:

Re: counting the number of occurrences of a word using regex by Ovid (Cardinal) on Dec 03, 2002 at 06:10 UTC
And in the spirit of TIMTOWTDI: `my $string = "The quick brown fox jumps over the lazy dog."; my $count = 0; $count++ while $string =~ /(the)/gi;` This is useful if you want to operate on a particular occurrence of a match. For example, to grab the second occurrence of a particular item in every line of a log file: `while (<IN_FILE>) { my $count = 0; MATCH: while (/(\Q$some_string\E)/gi) { $count++; if ( 2 == $count ) { do_something(); last MATCH; } } }` (here `do_something()` is only a placeholder for whatever you want to do with that match). Cheers, Ovid New address of my CGI Course. Silence is Evil (feel free to copy and distribute widely - note copyright text)
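A self-contained sketch of the "second occurrence per line" loop above. It assumes the log lines are held in an in-memory string (opened as a filehandle) instead of being read from `IN_FILE`, and that the item searched for is the hypothetical word `error`; the log text and the `@second` array are invented for illustration. It records where the second match on each line starts, using `$-[0]`, the start offset of the most recent successful match.

```perl
use strict;
use warnings;

# Hypothetical log data standing in for the IN_FILE handle.
my $log = <<'END';
ERROR disk full; ERROR retry failed
INFO all good
error one, Error two, ERROR three
END

my $some_string = 'error';    # hypothetical item to look for
my @second;                   # [line number, offset] of each second match

# Read the in-memory string line by line, as if it were a file.
open my $in, '<', \$log or die "Cannot open in-memory log: $!";
while (my $line = <$in>) {
    my $count = 0;
  MATCH: while ($line =~ /\Q$some_string\E/gi) {
        $count++;
        if (2 == $count) {
            # $-[0] is the offset at which the current match started.
            push @second, [ $., $-[0] ];
            last MATCH;    # leave only the inner loop; go on to the next line
        }
    }
}
close $in;

printf "line %d: second match at offset %d\n", @$_ for @second;
```

Lines with fewer than two matches (line 2 here) are simply skipped, because the inner loop ends before `$count` reaches 2.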
Re: counting the number of occurrences of a word using regex by Abigail-II (Bishop) on Dec 03, 2002 at 12:40 UTC
There's no point in putting parentheses around the search pattern, is there? I find it remarkable that 3 out of the 5 followups so far put parentheses around the search pattern without using `$1`. Abigail
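A small illustration of why stray parentheses can matter with the `() = ... =~ m//g` counting idiom used in this thread: with one capture group the count happens to come out the same, but in list context m//g returns every captured group for each match, so each extra group inflates the count. The pattern `(t)(he)` is a contrived example chosen only to show the effect.

```perl
use strict;
use warnings;

my $str = "The quick brown fox jumps over the lazy dog";

# No captures: m//g in list context returns each whole match.
my $plain = () = $str =~ /\bthe\b/gi;

# One capture group: returns $1 for each match, so the count is unchanged.
my $one_group = () = $str =~ /\b(the)\b/gi;

# Two capture groups: returns ($1, $2) for each match, doubling the count.
my $two_groups = () = $str =~ /\b(t)(he)\b/gi;

print "$plain $one_group $two_groups\n";    # prints "2 2 4"
```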
Re: Re: counting the number of occurrences of a word using regex by Ovid (Cardinal) on Dec 03, 2002 at 16:46 UTC
You're right. I was just a-cuttin' and a-pastin' and not paying attention. That's a very bad habit of mine. Thanks for the reminder. Cheers, Ovid
Re: counting the number of occurrences of a word using regex by BrowserUk (Patriarch) on Dec 03, 2002 at 06:06 UTC
Your problem is that tr/// is not a regex operator in the true sense: it always operates character by character. What you have actually asked perl to do with the line `my $count = ($str =~ tr/(the)//);` is to inspect the variable $str, look for any of the characters '(', 't', 'h', 'e' and ')', and, since the replacement list is empty, simply count them, returning the number of characters found in $str that were in the search list. One way to count the occurrences of a given word in a string is to use the m// operator with the /g option and force list context, as in `my $str = "The quick brown fox jumps over the lazy dog"; my $count = () = $str =~ m/(the)/ig; print $count;` which will print 2. The empty-list assignment `() =` puts the match in list context, and a list assignment evaluated in scalar context yields the number of elements on its right-hand side, which is the number of matches. However, that is still not quite right: used on the string 'There are three theatres in the town', it will print 3! This is because the regex /(the)/ also matches the first 3 characters of 'There' and 'theatres'. To ensure that you only match whole words, you can bracket the word with \b, the 'word boundary' zero-width assertion, like this: `my $str = "There are three theatres in the town"; my $count = () = $str =~ m/(\bthe\b)/ig; print $count;` which will correctly print 1, and `my $str = "The quick brown fox jumps over the lazy dog"; my $count = () = $str =~ m/(\bthe\b)/ig; print $count;` which will correctly print 2. (Note the /i modifier, which makes the match case-insensitive.) Okay you lot, get your wings on the left, halos on the right. It's one size fits all, and "No!", you can't have a different color. Pick up your cloud down the end and "Yes" if you get allocated a grey one they *are* a bit damp under foot, but someone has to get them. Get used to the wings fast 'cos it's an 8 hour day...unless the Governor calls for a cyclone or hurricane, in which case 16 hour shifts are mandatory. Just be grateful that you arrived just as the tornado season finished. Them buggers are real work.
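The whole-word technique above can be wrapped in a small helper; the name `count_word` is hypothetical, not something from the thread or a library. `\Q...\E` escapes any regex metacharacters in the word, and the `\b` anchors assume the word begins and ends with a word character (letter, digit or underscore); a word starting or ending with punctuation would need different anchors.

```perl
use strict;
use warnings;

# Hypothetical helper: count whole-word, case-insensitive matches of $word.
sub count_word {
    my ($text, $word) = @_;
    # \Q...\E escapes regex metacharacters in $word; \b assumes $word
    # starts and ends with a word character.
    my $count = () = $text =~ /\b\Q$word\E\b/gi;
    return $count;
}

my @samples = (
    "The quick brown fox jumps over the lazy dog",
    "There are three theatres in the town",
);
printf "%d: %s\n", count_word($_, 'the'), $_ for @samples;
```

This should print 2 for the first sample and 1 for the second, matching the results quoted above.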
Re: counting the number of occurrences of a word using regex by graff (Chancellor) on Dec 03, 2002 at 05:40 UTC
The "tr///" operator only works on individual characters, not on strings. To do what you want, you need to evaluate the "m//" operator in list context (here, by assigning its result to an array), and then count the elements in the resulting array, thusly: `my $str = "The quick brown fox jumped over the lazy dog"; my @the = ( $str =~ /\bthe\b/gi ); print scalar @the, $/;` That prints "2". update: It would be clearer (I hope) to say that "tr///" treats every character in the left-hand side as a member of a character class; it cannot treat any sequence of characters as a contiguous string to be matched. Only the "m//" operator (and the "s///" operator) can do that. Look carefully at the "perlop" man page for more complete descriptions of these three operators.
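To make the "character class" point concrete: `tr/(the)//` counts exactly the characters that the regex class `[()the]` would match one at a time. This sketch only checks that the two counts agree on the sample sentence; it is an illustration of what tr/// does, not a recommended way to count words.

```perl
use strict;
use warnings;

my $str = "The quick brown fox jumps over the lazy dog";

# tr/// with an empty replacement list returns how many characters it found
# from the search list; it does not change $str.
my $by_tr = ($str =~ tr/(the)//);

# The same characters matched one at a time by a regex character class.
my $by_class = () = $str =~ /[()the]/g;

print "$by_tr $by_class\n";    # prints "6 6"
```

Both counts are case-sensitive, so the capital T in "The" is counted by neither.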
Re: counting the number of occurrences of a word using regex by Enlil (Parson) on Dec 03, 2002 at 05:52 UTC
tr does not do what you think it is doing. It does not even start up the regex engine (so you are not even using a regular expression). What it does is transliteration: it exchanges each occurrence of a character in the search list with the corresponding character from the replacement list (i.e. `tr/SEARCHLIST/REPLACEMENTLIST/`). You are probably looking for something more along these lines: `use warnings; use strict; my $str = "The quick brown fox jumps over the lazy dog"; my $count = ($str =~ s/(the)/$1/gi); print "$count\n";` This works because s///g returns the number of substitutions it made (2 here), and replacing each match with itself ($1) leaves the string unchanged. Though I am sure there are more elegant ways to do this. The 6 comes from characters that occur in the search list: (1) the h in "The", (2) the e in "The", (3) the e in "over", and (4, 5, 6) the t, h and e in "the". The capital T in "The" is not counted, because tr/// is case-sensitive. -enlil
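A sketch that reproduces the breakdown of the 6 by tallying each character of the sentence that appears in the search list `(the)`, and contrasts it with the substitution count from s///gi. The `%tally` hash and the `$copy` variable are only for illustration; the substitution is run on a copy so the original string is untouched.

```perl
use strict;
use warnings;

my $str = "The quick brown fox jumps over the lazy dog";

# Tally every character that tr/(the)// would count (case-sensitive).
my %tally;
$tally{$_}++ for grep { /[()the]/ } split //, $str;
my $chars = 0;
$chars += $_ for values %tally;

# s///g returns the number of substitutions made; replacing each match
# with itself ($1) leaves the text unchanged.
my $copy  = $str;
my $words = ($copy =~ s/(the)/$1/gi);

print join(', ', map { "$_=$tally{$_}" } sort keys %tally), "\n";  # prints "e=3, h=2, t=1"
print "characters: $chars, substitutions: $words\n";  # prints "characters: 6, substitutions: 2"
```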
Re: counting the number of occurrences of a word using regex by djantzen (Priest) on Dec 03, 2002 at 05:41 UTC
Probably `tr` isn't what you want here, and there's no need to capture the word, so no parentheses. A simple solution is: `my @count = $str =~ /\bthe\b/gi; print scalar @count, "\n";` The idea is that `g` searches globally while `i` makes the search case-insensitive. Assigning the results to an array means that `@count` is populated with all of the successful matches, and evaluating the array in scalar context returns the number of matches.
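One step worth spelling out: the count comes from evaluating the array in scalar context, not from the match itself. Evaluating a //g match directly in scalar context only reports whether the next match succeeded, as this sketch shows. The scalar-context match is run on a separate copy (`$copy`, introduced here for illustration) because a scalar //g match also sets `pos()` on its target, which would affect any later //g match on that same string.

```perl
use strict;
use warnings;

my $str = "The quick brown fox jumps over the lazy dog";

# List context: every match is returned, then the array is counted.
my @count = $str =~ /\bthe\b/gi;
my $n = scalar @count;

# Scalar context: //g finds only the next match and returns 1 (true);
# it also sets pos($copy), so a later //g match would resume from there.
my $copy = $str;
my $flag = $copy =~ /\bthe\b/gi;

print "$n $flag\n";    # prints "2 1"
```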