
This post is seeking clarification:

Are you trying to do something like this?

my $chars = q{abcde};           # Specify the characters.
my $matchset = "[$chars]";      # Set up a character class.
my $reg_match = qr{$matchset};  # Turn it into a regexp.
my @strings = qw/ apple logs frog elmo /;
foreach my $string ( @strings ) {
    print "$string matched\n" if $string =~ $reg_match;  # $_ is not set here; the loop variable is $string.
}
__END__
apple matched
elmo matched

What I'm getting at is this: are you just looking for a way to check, in one pass, whether any one of a bunch of wanted characters is found? If so, a character class might be all you need. If your tokens are longer than one character, you could use alternation. Or you could build a list of regular expressions, held in an array, and do a ~~ smart match with the target string on the left side and the list on the right. That gives an "any" relationship (the string on the left matches any of the expressions on the right).
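
To make the alternation and "any" ideas concrete, here is a minimal sketch. The token list and the two patterns are made up for illustration. Since smart match was experimental for years and was deprecated in Perl 5.38, this sketch swaps in any() from List::Util (version 1.33 or later) rather than ~~:

```perl
use strict;
use warnings;
use List::Util qw(any);   # any() needs List::Util 1.33 or later

# Hypothetical multi-character tokens, for illustration only.
my @tokens = qw/ app og xyz /;

# Alternation: escape each token, join with |, compile once.
my $alternation = join '|', map { quotemeta($_) } @tokens;
my $token_re    = qr/$alternation/;

# A list of separate patterns, checked with any() instead of ~~.
my @patterns = ( qr/^el/, qr/gs$/ );

# True if $string matches at least one of the given regexes.
sub matches_any {
    my ( $string, @regexes ) = @_;
    return any { $string =~ $_ } @regexes;
}

foreach my $string ( qw/ apple logs frog elmo / ) {
    print "$string contains a token\n"  if $string =~ $token_re;
    print "$string matches a pattern\n" if matches_any( $string, @patterns );
}
```

The quotemeta call is there so that a token containing regexp metacharacters is matched literally.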

Or are you looking for something completely different, in which case (speaking only for myself here), I would need a little more explanation of the problem.

Update: ...Or...

Are you looking to ensure that the item on the left only contains elements of the multiset specified on the right (and nothing else)? Going back to my earlier example:


my $matchset = "[^$chars]"; # <-- Changed to a negated char class.
# ....
foreach my $string ( @strings ) {
    print "$string is pure\n" if not $string =~ $reg_match; # Disqualify strings with illegal chars.
}

Update2: Now to add the notion of uniqueness, you might do this:

package Multiset;

use strict;
use warnings;

sub new {
    my $class = shift;
    my %self;
    $self{set_chars} = shift;
    my $neg_char_class = "[^$self{set_chars}]";
    $self{set_regexp} = qr/$neg_char_class/;
    $self{entries} = {};
    bless \%self, $class;
}

sub is_valid {
    my $self = shift;
    my $string = shift;
    return 0 if $string =~ $self->{set_regexp}; # Contains bad chars.
    return 1;
}

sub _normalize {
    my $self = shift;
    my $string = shift;
    return( join '', sort split //, $string );
}

sub is_unique {
    my $self = shift;
    my $string = shift;
    my $normalized = _normalize( $self, $string );
    return 0 if exists $self->{entries}{$normalized};
    return 1;
}

sub add_entry {
    my $self = shift;
    my $string = shift;
    die "Invalid entry." unless is_valid( $self, $string );
    my $entry = _normalize( $self, $string );
    return 0 unless is_unique( $self, $entry );
    $self->{entries}{$entry}++;
}

sub remove_entry {
    my $self = shift;
    my $string = shift;
    my $entry = _normalize( $self, $string );
    die "Invalid entry." unless is_valid( $self, $entry );
    die "Entry doesn't exist." if is_unique( $self, $entry ); # is_unique is true when the entry is NOT stored.
    delete $self->{entries}{$entry};
    return 1;
}

sub list_seen {
    my $self = shift;
    return keys %{ $self->{entries} }; # Dereference explicitly; keys on a bare hashref fails on newer Perls.
}

1;

package main;

use strict;
use warnings;

my @strings = qw/ ab ac ca bb dc abc cba /;
my $set = q/abcde/;

my $multi = Multiset->new( $set );

foreach my $string ( @strings ) {
    print "$string\n";
    if( $multi->is_valid( $string ) ) {
        print "\tis valid.\n";
    } else {
        print "\tis not valid.\n";
        next;
    }
    if( $multi->is_unique( $string ) ) {
        print "\tis unique.\n";
        $multi->add_entry( $string );
    } else {
        print "\tis not unique.\n";
    }
}

my @found = $multi->list_seen();
print "Entries:\n\t@found\n";

That's a real rough draft that strives for explicitness and simplicity rather than cleverness. It takes a set of characters and turns them into a negated character class. This will be used to test if a target string contains any non-set characters. Next it normalizes the string (alphabetizes the string's characters). It will check if the alphabetized or normalized string is unique or not. If it is unique, it can add the string to its list. Entries may be removed, though my quick test script doesn't exercise that option. In the end you can list all unique strings. Their original character order isn't preserved.
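
The normalize-then-check step can also be tried on its own, outside the class. This is only a sketch: the %seen hash and the standalone normalize function are names I'm making up here, and unlike the class above it remembers the first original spelling for each key:

```perl
use strict;
use warnings;

# Sort a string's characters so that anagrams share one key.
sub normalize {
    my $string = shift;
    return join '', sort { $a cmp $b } split //, $string;
}

my %seen;   # normalized key => first original spelling encountered
foreach my $string ( qw/ ab ac ca bb dc abc cba / ) {
    my $key = normalize($string);
    if ( exists $seen{$key} ) {
        print "$string duplicates $seen{$key} (key $key)\n";
    }
    else {
        $seen{$key} = $string;
        print "$string is new (key $key)\n";
    }
}
```

Storing the original string as the hash value is one way to keep the character order that the normalized key throws away.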

The output is as follows:

ab
    is valid.
    is unique.
ac
    is valid.
    is unique.
ca
    is valid.
    is not unique.
bb
    is valid.
    is unique.
dc
    is valid.
    is unique.
abc
    is valid.
    is unique.
cba
    is valid.
    is not unique.
Entries:
    ab cd abc bb ac

I only spent a couple of minutes skimming Multiset, so hopefully what I've provided can serve as a starting point from which a more exact solution can be derived. In particular, it would be pretty easy to adapt it to handle sets of numbers (hash keys would be strings like $hash{'123,2,4'}; note the quotes, since an unquoted $hash{123,2,4} joins its subscripts with $; rather than a comma). For this to work you would also have to modify the regular expression so that it uses alternation instead of a character class. You would still need to sort the elements to normalize them, but you would insert a comma between each element.
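
A rough sketch of that numeric variation, under some assumptions of mine: the allowed numbers are made up, input arrives as comma-separated strings, and elements are sorted as strings (which is what yields a key like 123,2,4):

```perl
use strict;
use warnings;

# Hypothetical allowed elements; multi-digit numbers need alternation,
# because a character class matches only one character at a time.
my @allowed = ( 1, 2, 4, 123 );
my $alt     = join '|', map { quotemeta($_) } @allowed;

# Valid input: allowed elements separated by commas, and nothing else.
my $valid_re = qr/\A(?:$alt)(?:,(?:$alt))*\z/;

# Sort the elements (string order) and rejoin with commas to get a key.
sub normalize_numbers {
    my $string = shift;
    return join ',', sort { $a cmp $b } split /,/, $string;
}

my %entries;   # normalized key => number of times seen
foreach my $string ( '123,2,4', '4,123,2', '2,5', '1,1' ) {
    if ( $string !~ $valid_re ) {
        print "$string is not valid\n";
        next;
    }
    my $key = normalize_numbers($string);
    print "$string ", ( $entries{$key}++ ? "is not unique" : "is unique" ), " (key $key)\n";
}
```

Swapping the sort block for { $a <=> $b } would give numeric ordering instead; either works as long as it is applied consistently.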

Dave

In reply to Re: match something else than strings by davido
in thread match something else than strings by Anonymous Monk
