extract sentences with certain number of a character

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Re: extract sentences with certain number of a character
by mscharrer (Hermit) on Apr 27, 2008 at 09:32 UTC

tr/ / /

tr/abc/efg/

tr/r/r/

tr/r//

E.g.:

while (my $line = <FILE>) {
   if ( ($line =~ tr/r//) == 4 ) {
       print $line;
   }
}
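A small self-contained sketch of the counting idiom above. With an empty replacement list and no /d modifier, tr/// leaves the string untouched and simply returns how many characters it matched. The helper name count_r and the sample lines are made up for illustration; they are not from the original post.

```perl
use strict;
use warnings;

# Hypothetical helper: count the lowercase 'r' characters in a string.
# tr/r// with an empty replacement list does not modify its target.
sub count_r {
    my ($text) = @_;
    return ($text =~ tr/r//);
}

my @samples = (
    'There are four "r"s in this sentence.',
    'None in this.',
    'Still, rrrr makes no sense.',
);

# Print each count next to its line.
for my $line (@samples) {
    printf "%d  %s\n", count_r($line), $line;
}
```

Comparing the returned count with == 4, as in the loop above, then selects the wanted lines.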

Re: extract sentences with certain number of a character
by linuxer (Curate) on Apr 27, 2008 at 09:23 UTC

You shouldn't match for any character (.) but for any non-r character ( [^r] ) between your wanted 'r'.

Added: Additionally, you should anchor your regex; otherwise it will match any line in which it can find 4 r's, even if that line contains more.

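To make both points concrete, here is a rough sketch comparing three patterns on a line that contains seven r's. The first pattern is only a guess at the kind of regex being warned about (the original question isn't shown here), and the variable names are invented for the example.

```perl
use strict;
use warnings;

# A sample line containing seven lowercase r's.
my $line = "But where can there be as many words with 'r's as there are here?";

# Assumed "dot" version: '.' also matches 'r', so extra r's slip through.
my $dotted     = qr/r.*r.*r.*r/;
# Negated class, but no anchors: matches any stretch that holds four r's.
my $unanchored = qr/r[^r]*r[^r]*r[^r]*r/;
# Negated class plus anchors: the whole line must hold exactly four r's.
my $anchored   = qr/^[^r]*r[^r]*r[^r]*r[^r]*r[^r]*$/;

printf "dotted:     %s\n", ($line =~ $dotted     ? 'match' : 'no match');
printf "unanchored: %s\n", ($line =~ $unanchored ? 'match' : 'no match');
printf "anchored:   %s\n", ($line =~ $anchored   ? 'match' : 'no match');
```

Only the anchored pattern rejects this line, which is the behaviour wanted when "exactly four" is meant.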

Re^2: extract sentences with certain number of a character

by olus (Curate) on Apr 27, 2008 at 10:03 UTC

code:

/^[^r]*r[^r]*r[^r]*r[^r]*r[^r]*$/


Re^3: extract sentences with certain number of a character

by Jenda (Abbot) on Apr 27, 2008 at 11:51 UTC

/^(?:[^r]*r){4}[^r]*$/

Jenda
Support Denmark!
Defend the free world!

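The quantified form lends itself to being built for any character and count. The following is only a sketch; the function name exactly_n is hypothetical and not something from this thread.

```perl
use strict;
use warnings;

# Hypothetical helper: build a regex that matches a line containing
# exactly $n occurrences of the single character $char.
sub exactly_n {
    my ($char, $n) = @_;
    my $c = quotemeta $char;    # escape characters such as '.' or ']'
    return qr/^(?:[^${c}]*${c}){${n}}[^${c}]*$/;
}

my $four_r = exactly_n('r', 4);

for my $line ('Still, rrrr makes no sense.', 'None in this.') {
    printf "%-8s %s\n", ($line =~ $four_r ? 'match' : 'no match'), $line;
}
```

For 'r' and 4 this produces the same pattern as the one-liner above.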

Re: extract sentences with certain number of a character
by ww (Archbishop) on Apr 27, 2008 at 14:20 UTC

Or (this horse ain't quite dead, yet),

#!/usr/bin/perl
use strict;
use warnings;

my(@lines, $line);

print "\n\tUsing tr\n";

@lines = <DATA>;

for $line(@lines) {
  chomp $line;
  my $line1 = $line;
  if ( ($line1 =~ tr/[r|R]/r/) == 4 ) {   # Case insensitive
        print "$line1\n";
  }
}

print "\n\t using match:\n";
for $line(@lines) {
    if ( $line =~ /
           ^          # start at beginning of line
           ([^rR]*)   # 0 or more non-r chars
           r          # match "r"
           ([^rR]*)   # negated character class: 0 or more non-r chars
           r
           ([^rR]*)   # alternately, could be ?=[^r]*
           r
           ([^rR]*) 
           r
           ([^rR]*) 
           $          # end of line
           /ix) {     # extended syntax; end of match; end of condition
        print "$line\n"
    }
}
print "\n\t using match2:\n";
for $line(@lines) {
    if ( $line =~ /^([^r]*|[^R]*)r([^r]*|[^R]*)r([^r]*|[^R]*)r([^r]*|[^R]*)r([^r]*|[^R]*)$/i) {
        print "$line\n";
    }
}
print "\n and the data is:\n";
for $line(@lines) {
    print $line . "\n";
}
print "\nDone\n";


__DATA__
There are four "r"s in this sentence. 4
This one has how many? 0
None in this. 0
But where can there be as many words with 'r's as there are here? 7
Still, rrrr makes no sense. 4
Drill for sentences with multiple instances of "are" and "were" regularly. 5
There are four "r"s in this sentence. 4
This one has how many? 0
None in this. 0
But where can there be as many words with 'r's as there are here? 7
Still, rrrr makes no sense. 4
Drill for sentences with multiple instances of "are" and "were" regularly. 5
Argh. Right you are, Randy! 4 matches if insensitive (but only 2 match when case sensitive).

output

        Using tr
There are four "r"s in this sentence. 4
Still, rrrr makes no sense. 4
There are four "r"s in this sentence. 4
Still, rrrr makes no sense. 4
Argh. right you are, randy! 4 matches if insensitive but only 2 match when case sensitive.

         using match:
There are four "r"s in this sentence. 4
Still, rrrr makes no sense. 4
There are four "r"s in this sentence. 4
Still, rrrr makes no sense. 4
Argh. Right you are, Randy! 4 matches if insensitive but only 2 match when case sensitive.

         using match2:
There are four "r"s in this sentence. 4
Still, rrrr makes no sense. 4
There are four "r"s in this sentence. 4
Still, rrrr makes no sense. 4
Argh. Right you are, Randy! 4 matches if insensitive but only 2 match when case sensitive.

 and the data is: ....

Re-updated; Found ~~copied wrong~~ right code. duh!

And, of course, this being Perl, there are many other ways, too. One could (if one wished to besmirch the virtue of laziness) capture each [rR], push them to an array, and count the elements. But bottom line: tr/r/r/ (preserving the original data, case sensitive) or tr/[r|R]/r/ (counting the upper case "R"s, but losing the original case) may be preferred.

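For completeness, the capture-and-count route mentioned above might look like the sketch below. The variable names are invented for illustration, and the second form is the common "empty list assignment" idiom rather than anything from the post itself.

```perl
use strict;
use warnings;

my $line = 'Argh. Right you are, Randy!';

# Capture every r or R into an array, then count the elements.
my @hits  = $line =~ /([rR])/g;
my $count = scalar @hits;

# Same count without keeping the captures: assigning the match list to
# an empty list in scalar context yields the number of matches.
my $count_idiom = () = $line =~ /r/gi;

print "captured: $count, idiom: $count_idiom\n";   # both are 4 here
print "line is unchanged: $line\n";
```

Neither form modifies $line, so the original case is preserved.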

Re^2: extract sentences with certain number of a character

by linuxer (Curate) on Apr 28, 2008 at 16:20 UTC

If I remember correctly (perldoc perlop):

tr/// does not support character classes, so tr/[r|R]// counts every occurrence of the characters [, r, |, R and ].

And character classes don't need | for the alternatives; so [rR] would be enough for a class matching r and R.

tr/rR// should be enough to count the occurrences of r's and R's.

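A quick way to see the difference is to count with both forms on a string that happens to contain brackets and a pipe. The sample string is made up for this sketch.

```perl
use strict;
use warnings;

# Sample text with three r's, one R, two brackets and one pipe.
my $line = 'Array [r] | R';

# The search list is taken literally: '[', 'r', '|', 'R' and ']' all count.
my $class_like = ($line =~ tr/[r|R]//);

# Listing just the two letters counts only r and R.
my $plain = ($line =~ tr/rR//);

print "tr/[r|R]// counted $class_like\n";   # 7
print "tr/rR//    counted $plain\n";        # 4
```

Both forms have an empty replacement list, so $line itself is left unchanged.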

Re: extract sentences with certain number of a character
by FunkyMonk (Bishop) on Apr 27, 2008 at 09:20 UTC

~~Why don't you show us what you've tried so far, or ask us for help on what you're stuck with?~~

Have you read perlretut?

Update: It's obviously too early for me. Sorry, AnonyMonk

Update^2: Have a look at Using character classes, particularly the bit about negated character classes.

Unless I state otherwise, my code all runs with strict and warnings


Re: extract sentences with certain number of a character
by toolic (Bishop) on Apr 27, 2008 at 18:18 UTC

When I hear "count characters", grep in scalar context is usually the first thing to enter my brain.

use warnings;
use strict;

while (<DATA>) {
    print if ((grep /r/, split //) == 4);
}

__DATA__
r 3 r r
rr
4 r r r r
nada nothing zilch
to many rr rrrr rrr 's
[download]

Not sure how this compares to others' regex and tr solutions from a performance standpoint (and I'm too Lazy to benchmark it :)

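For anyone less lazy, a comparison could be set up along these lines with the core Benchmark module. This is only a sketch: the sample line and the hash name %count_with are assumptions, and no timings are claimed here since they depend on the machine and the Perl version.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

my $line = 'There are four "r"s in this sentence.';

# Three ways of counting lowercase 'r'; each sub returns the count.
my %count_with = (
    'tr'    => sub { my $n = ($line =~ tr/r//);                    $n },
    'regex' => sub { my $n = () = $line =~ /r/g;                   $n },
    'grep'  => sub { my $n = grep { $_ eq 'r' } split //, $line;   $n },
);

# Run each sub for at least one CPU second and print a comparison table.
cmpthese(-1, \%count_with);
```

All three subs return the same count for a given line, so the table compares like with like.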