Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Is there a library (in any language) that provides matching on multisets the way Perl regexes work on strings? That is to say, I provide a declarative notation of what I want to match in a multiset of characters (which is unordered, as opposed to an ordered string of characters), and the engine goes and tries to find a solution without my needing to specify a concrete algorithm.

Replies are listed 'Best First'.
Re: match something else than strings
by BrowserUk (Patriarch) on Jul 21, 2011 at 15:59 UTC

    What is a "multiset of characters"? How about you give an example or two of inputs and desired outputs?


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.
Re: match something else than strings
by davido (Cardinal) on Jul 21, 2011 at 16:30 UTC

    This post is seeking clarification:

    Are you trying to do something like this?:

    my $chars     = q{abcde};        # Specify the characters.
    my $matchset  = "[$chars]";      # Set up a character class.
    my $reg_match = qr{$matchset};   # Turn it into a regexp.
    my @strings   = qw/ apple logs frog elmo /;
    foreach my $string ( @strings ) {
        print "$string matched\n" if $string =~ $reg_match;
    }
    __END__
    apple matched
    elmo matched

    What I'm getting at is: are you just looking for a way to check whether any one of a bunch of wanted characters is found, all in one pass? If that's the case, a character class might be all that is needed. If your tokens are larger than one character you could use alternation. Or you could construct a list of regular expressions, all held in an array, and then do a ~~ smart match with the target string on the left side and the match list on the right. That will give an "any" relationship (the string on the left matches any of the expressions on the right).
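
    The alternation and "any of several patterns" ideas might look roughly like the sketch below. It is only an illustration: the token list, the pattern list, and the helper names matches_any_token and matches_any_pattern are made up for the example. It uses any from List::Util (in core since Perl 5.20) rather than ~~, on the assumption that you may prefer to avoid smart match, which later Perl releases marked experimental.

```perl
use strict;
use warnings;
use List::Util qw(any);

# Hypothetical multi-character tokens. quotemeta guards against
# regex metacharacters appearing inside a token.
my @tokens      = qw( app og );
my $alternation = join '|', map { quotemeta($_) } @tokens;
my $token_re    = qr{(?:$alternation)};

# True if the string contains at least one of the tokens.
sub matches_any_token {
    my ($string) = @_;
    return $string =~ $token_re ? 1 : 0;
}

# Hypothetical list of precompiled patterns, held in an array.
my @patterns = ( qr{^a}, qr{o$} );

# True if the string matches at least one pattern in the list
# (the same "any" relationship the smart match would give).
sub matches_any_pattern {
    my ($string) = @_;
    return ( any { $string =~ $_ } @patterns ) ? 1 : 0;
}

for my $string (qw/ apple logs frog elmo /) {
    print "$string contains a token\n"  if matches_any_token($string);
    print "$string matches a pattern\n" if matches_any_pattern($string);
}
```

    With these made-up lists, apple, logs and frog contain a token, while apple and elmo match one of the patterns.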

    Or are you looking for something completely different, in which case (speaking only for myself here), I would need a little more explanation of the problem.

    Update:...Or...

    Are you looking to ensure that the item on the left only contains elements of the multiset specified on the right (and nothing else)? Going back to my earlier example:

    my $matchset = "[^$chars]";   # <-- Changed to a negated char class.
    # ....
    foreach my $string ( @strings ) {
        # Disqualify strings with illegal chars.
        print "$string is pure\n" if not $string =~ $reg_match;
    }

    Update2: Now to add the notion of uniqueness, you might do this:

    That's a real rough draft that strives for explicitness and simplicity rather than cleverness. It takes a set of characters and turns them into a negated character class. This will be used to test if a target string contains any non-set characters. Next it normalizes the string (alphabetizes the string's characters). It will check if the alphabetized or normalized string is unique or not. If it is unique, it can add the string to its list. Entries may be removed, though my quick test script doesn't exercise that option. In the end you can list all unique strings. Their original character order isn't preserved.
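
    A minimal sketch of those steps, written from the description alone and not taken from the post's own code. It assumes the allowed characters are abcde; the names normalize, add_string, remove_string and %seen are invented for the illustration.

```perl
use strict;
use warnings;

my $chars   = q{abcde};        # assumed set of allowed characters
my $illegal = qr{[^$chars]};   # matches any character outside the set

my %seen;                      # normalized key => 1

# Alphabetize a string's characters so that "bad" and "dab" share one key.
# Repeated characters are kept, so "abba" normalizes to "aabb".
sub normalize {
    my ($string) = @_;
    return join '', sort { $a cmp $b } split //, $string;
}

# Add a string if it is pure and its normalized form is new.
# Returns 1 when added, 0 otherwise.
sub add_string {
    my ($string) = @_;
    return 0 if $string =~ $illegal;      # contains a non-set character
    my $key = normalize($string);
    return 0 if exists $seen{$key};       # same multiset already stored
    $seen{$key} = 1;
    return 1;
}

# Remove an entry by any string that normalizes to its key.
# Returns 1 if something was removed, 0 otherwise.
sub remove_string {
    my ($string) = @_;
    return delete $seen{ normalize($string) } ? 1 : 0;
}

for my $string (qw/ bad dab cab frog abba /) {
    print "$string added\n" if add_string($string);
}

# Only the normalized forms are stored, so original order is lost.
print "$_\n" for sort keys %seen;
```

    In this sketch dab is rejected because it normalizes to the same key as bad, and frog is rejected because it contains characters outside the set.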

    The output is as follows:

    I only spent a couple minutes skimming Multiset, so hopefully what I've provided can serve as a starting point from which a more exacting solution may be derived. In particular, it would be pretty easy to change to handle sets of numbers (hash keys would be like $hash{123,2,4}). For this to work you would have to also modify your regular expression match such that instead of using a character class it uses alternation. You would still need to sort elements to normalize them, but would insert a comma between each element.
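
    The numeric variant might be sketched like this. Everything here is an assumption for illustration: the allowed numbers, the names set_key, add_set and %seen_sets, and the choice to test each element against an anchored alternation rather than testing one joined string.

```perl
use strict;
use warnings;

my @allowed     = ( 2, 4, 123 );   # assumed set of allowed numbers
my $alternation = join '|', map { quotemeta($_) } @allowed;
my $legal       = qr{^(?:$alternation)$};   # a whole element must be an allowed number

my %seen_sets;                     # normalized key => 1

# Sort numerically and join with commas, so that (123, 2, 4) and
# (4, 123, 2) both produce the key "2,4,123".
sub set_key {
    my (@elements) = @_;
    return join ',', sort { $a <=> $b } @elements;
}

# Add a collection if every element is allowed and its key is new.
# Returns 1 when added, 0 otherwise.
sub add_set {
    my (@elements) = @_;
    return 0 if grep { $_ !~ $legal } @elements;   # some element not allowed
    my $key = set_key(@elements);
    return 0 if exists $seen_sets{$key};           # already stored
    $seen_sets{$key} = 1;
    return 1;
}

print "added\n"     if add_set( 123, 2, 4 );
print "duplicate\n" if !add_set( 4, 123, 2 );
print "$_\n" for sort keys %seen_sets;
```

    The key is built explicitly with join because a literal $hash{123,2,4} joins its subscripts with the value of $; rather than with a comma.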


    Dave