in reply to Match only certain characters?

When I first looked at this, I came up with something similar to what Cristoforo did.

    #!/usr/bin/perl -w
    use strict;

    my @strings = ('ABCAAAABBBCCCCCAAABBBCCC',
                   'ASRGRTGRT89579843rrrrrr',
                   'A98797BqrtoiquyrtoCafdgagfd');

    foreach (@strings) {
        # Here the scalar count value of tr is used.
        # tr is very fast but lacks the flexibility of regex.
        # Prints the string unless it has some char that is
        # not an A, B, C.
        print "$_\n" if !(tr/ABC//c);
    }
    # Prints: ABCAAAABBBCCCCCAAABBBCCC

This is a great idea if you know in advance that A, B and C are the allowed characters.
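
The set has to be known in advance because tr/// builds its translation table at compile time and does not interpolate variables. As a minimal sketch of the counting behaviour used above, here it is wrapped in a helper; the name only_abc is hypothetical and not part of the original code:

```perl
use strict;
use warnings;

# Hypothetical helper: true if $s contains nothing but A, B or C.
# tr/ABC//c counts the characters that are NOT in the list; with an
# empty replacement list and no /d flag the string is left unchanged.
sub only_abc {
    my ($s) = @_;
    return ($s =~ tr/ABC//c) == 0;
}

for my $s ('ABCAAAABBBCCC', 'ABCx', '') {
    printf "'%s' %s\n", $s, only_abc($s) ? 'only ABC' : 'has other chars';
}
```

Note that the empty string counts as "only ABC" here, since it contains no offending character.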

Suppose instead that the "standard" string, by which the others will be compared, is held in a variable, and that we are going to compare it against many lines. We can build a string of the unique chars in the "standard" string and use that string in a simple regex character class to see whether a line contains anything else. No fancy look-ahead required.

    my $standard = 'ABCAAAABBBCCCCCAAABBBCCC';
    my %seen;
    my @unique_letters = grep { $seen{$_}++ < 1 } split(//, $standard);
    my $unique_letters = join("", @unique_letters);
    # these above two lines could be combined, but I think it
    # reads better this way, and YES, it is completely legal in Perl
    # to have an array variable named unique_letters and a
    # string named the same thing. Update: with same name not same "thing".

    foreach (@strings) {
        print "$_\n" if (/[^$unique_letters]/);
    }
    #prints:
    #ASRGRTGRT89579843rrrrrr
    #A98797BqrtoiquyrtoCafdgagfd

    #change to "not" of these: (!/[^$unique_letters]/);
    #to get: ABCAAAABBBCCCCCAAABBBCCC
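
One caveat when the "standard" string comes from a variable: characters such as ], ^, - and \ have special meaning inside a character class, so interpolating them raw can change or break the pattern. A minimal sketch of one way around this, using quotemeta; the helper name non_standard_re is hypothetical, and a non-empty "standard" string is assumed:

```perl
use strict;
use warnings;

# Hypothetical helper: returns a compiled regex that matches any string
# containing at least one character NOT present in $standard.
# Assumes $standard is not empty (an empty class would not compile).
sub non_standard_re {
    my ($standard) = @_;
    my $class = quotemeta $standard;   # escape ], ^, -, \ and friends
    return qr/[^$class]/;
}

my $re = non_standard_re('A-C]');      # literal A, -, C and ]
for my $s ('A-C', 'AC]', 'ABC') {
    print "$s\n" if $s =~ $re;         # only strings with a stray char
}
```

With the escaping in place, the - is treated as a literal hyphen rather than as the range A through C, so 'ABC' is reported because of its B.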

Re^2: Match only certain characters?
by AnomalousMonk (Archbishop) on Nov 09, 2009 at 07:23 UTC
    The steps taken to 'uniqify' the characters of the character set are not necessary; repeated characters in a regex character set have no effect on pattern recognition. (I also think repeated characters make no difference in the execution time of the regex, but I cannot come up with a reference on this at the moment.) However, repeated characters do seem to take up space in the regex object.
        >perl -wMstrict -le
        "my $standard = 'ABCAAAABBBCCCCCAAABBBCCC';
         foreach (@ARGV) {
           my $non_standard = /[^$standard]/;
           print qq{'$_' }, $non_standard ? 'no match' : 'match';
           }
        "
        "" A B C ABC ABCABCABC xA Ax xABC ABCx
        '' match
        'A' match
        'B' match
        'C' match
        'ABC' match
        'ABCABCABC' match
        'xA' no match
        'Ax' no match
        'xABC' no match
        'ABCx' no match
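
    The same point can be checked side by side. The following is only a sketch: it compares match results for the raw and the de-duplicated character lists on a handful of inputs and says nothing about execution time or regex size. The helper name has_outsider is hypothetical.

```perl
use strict;
use warnings;

my $standard = 'ABCAAAABBBCCCCCAAABBBCCC';

# De-duplicated character list, built the same way as in the parent post.
my %seen;
my $unique = join '', grep { !$seen{$_}++ } split //, $standard;

# Hypothetical helper: 1 if $s has a character outside [$set], else 0.
sub has_outsider {
    my ($s, $set) = @_;
    return $s =~ /[^$set]/ ? 1 : 0;
}

for my $s ('', 'A', 'ABC', 'xA', 'ABCx') {
    my $raw  = has_outsider($s, $standard);
    my $uniq = has_outsider($s, $unique);
    printf "'%s' raw=%d unique=%d\n", $s, $raw, $uniq;
}
```

    For each input the raw and unique columns agree, which is what one would expect if repeated characters in the class have no effect on matching.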