Matching permutations with regex

QM has asked for the wisdom of the Perl Monks concerning the following question:

Today I learned that a regex can match permutations of a given string.

Update: Oops, I had the terms reversed on the regex, it should be $i =~ $x .... I've corrected it, but it doesn't really change the results.

For single characters without repeats, such as any permutation of 'abc', this seems the shortest, most readable method:

@x = glob('{a,b,c}'x3); # for testing: all permutations, plus many mor
+e

$x = '(?:([a-c])(?!.*\1)){3}'; # regex

for $i (@x) {
    $i =~ $x and say $i; # was $x =~ $i
}
[download]

which outputs:

abc
acb
bac
bca
cab
cba
[download]

For single character, with limited repeats, such as any permutation of 'aabbcc', a slightly different approach is needed. I tried the following, but it matches almost everything. I suspect there's some unexpected behavior in the backtracking?

@x = glob('{a,b,c}'x6);
$x = '(?:a()|a()|b()|b()|c()|c()){6}\1\2\3\4\5\6'; # limited repeats a
+llowed

for $i (@x) {
    $i =~ $x and say $i; # was $x =~ $i
}
[download]

which outputs 540 of the 728 strings given (at least 1 of each char is present):

aaaabc
aaaacb
aaabac
aaabbc
aaabca
...
cccbba
cccbca
ccccab
ccccba
[download]

Is there some other magic to DWIM?

For nonrepeating, multichar permutations, even when individual chars are shared between tokens, this approach works:

@x = glob('{abc,bcd,cde}'x3);
# permutations of 'abc/bcd/cde'
$x = '(?:abc()|bcd()|cde()){3}\1\2\3';

for $i (@x) {
    $i =~ $x and say $i; # was $x =~ $i
}
[download]

which outputs:

abcbcdcde
abccdebcd
bcdabccde
bcdcdeabc
cdeabcbcd
cdebcdabc
[download]

I found this here on Stack Overflow.

-QM
--
Quantum Mechanics: The dreams stuff is made of

Comment on Matching permutations with regex Select or Download Code

Replies are listed 'Best First'.
Re: Matching permutations with regex by Eily (Monsignor) on Nov 20, 2018 at 14:01 UTC
Wow, I didn't know the empty capture and backreference trick, nice! I also didn't know the trick where you put the pattern on the left side of the =~ operator and still get results somehow ;-) I suspect there's some unexpected behavior in the backtracking? I tried: `use v5.20; use Data::Dump "pp"; my @z = glob('{a,b,c}'x6); my $z = '(?:a()\|a()\|b()\|b()\|c()\|c()){6}\1\2\3\4\5\6'; for my $j (@z) { $j =~ $z and say pp {$j => \@- }; }` [download] And I got a bunch of values likes: `... { acbcaa => [0, 6, 6, 3, 3, 4, 4] } { acbcab => [0, 5, 5, 6, 6, 4, 4] } { acbcac => [0, 5, 5, 3, 3, 6, 6] } { acbcba => [0, 6, 6, 5, 5, 4, 4] } { acbcca => [0, 6, 6, 3, 3, 5, 5] } { accaab => [0, 5, 5, 6, 6, 3, 3] } { accaba => [0, 6, 6, 5, 5, 3, 3] } ... { cccbab => [0, 5, 5, 6, 6, 3, 3] } { cccbac => [0, 5, 5, 4, 4, 6, 6] } { cccbba => [0, 6, 6, 5, 5, 3, 3] } { cccbca => [0, 6, 6, 4, 4, 5, 5] } { ccccab => [0, 5, 5, 6, 6, 4, 4] } { ccccba => [0, 6, 6, 5, 5, 4, 4] }` [download] Where each pair of alternative match exactly (eg \1 and \2) at the same place, no matter what. I'd suspect that the identical branches are actually merged by the optimizer. Is there some other magic to DWIM? There's this: `my @y = glob('{a,b,c}'x6); my $y = '(?:(?!\1)a()\|(?!\2)a()\|(?!\3)b()\|(?!\4)b()\|(?!\5)c()\|(?!\6)c( +)){6}\1\2\3\4\5\6'; for my $j (@y) { $j =~ $y and say $j; }` [download] `aabbcc aabcbc aabccb aacbbc aacbcb aaccbb ababcc abacbc abaccb ... ccbaab ccbaba ccbbaa` [download] Edit: this also works actually (without \1\2\3\4\5\6 at the end): `# edit reformatted as a multiline regex for clarity my $y = qr/(?: (?!\1) a () \| (?!\2) a () \| (?!\3) b () \| (?!\4) b () \| (?!\5) c () \| (?!\6) c () ){6} /x;` [download] `So TIL, (?!\x)XXX()` is a pattern to only allow XXX to match once in the whole regex... Cool :-)	[reply] [d/l] [select]
Re^2: Matching permutations with regex by QM (Parson) on Nov 22, 2018 at 09:27 UTC
I also didn't know the trick where you put the pattern on the left side of the =~ operator and still get results somehow ;- Ugh. Major brain fart. And it still works! The `(?!\1)xxx(()` element is great, thanks. -QM -- Quantum Mechanics: The dreams stuff is made of	[reply] [d/l]
Re: Matching permutations with regex by tybalt89 (Monsignor) on Nov 20, 2018 at 12:51 UTC
`#!/usr/bin/perl -l # https://en.wikipedia.org/wiki/Permutation#Generation_in_lexicographi +c_order use strict; use warnings; $_ = 'aabbcc'; 1 while print, s/.\K # find the last (.) # char such that (.) # there is a later (latest) (.)(??{$1 ge $3 and '(F)'}) # char greater than it (.) # and get rest # swap those two chars ( $1 & $3 ) # then reverse everything after the first swapped char / $3 . reverse $2.$1.$4 /xe` [download]	[reply] [d/l]
Re^2: Matching permutations with regex by QM (Parson) on Nov 22, 2018 at 09:20 UTC
OK. That's probably pretty fast too. But I think you misunderstood? I can generate them, but I wanted to _match_ any permutation of a given string. Update: Oh, I had the `=~` backwards, and that led you astray. (Not shown in the OP is a programmatic way to generate the regex solution based on the input string, but that was left out for clarity.) -QM -- Quantum Mechanics: The dreams stuff is made of	[reply] [d/l]
Re^3: Matching permutations with regex by tybalt89 (Monsignor) on Nov 22, 2018 at 15:52 UTC
Like so ? `#!/usr/bin/perl -l # https://perlmonks.org/?node_id=1226058 use strict; use warnings; $_ = 'aabbcc'; my $len = length; my $inner = join ',', split //, tr///csr; my $glob = "{$inner}" x $len; my $re = "^(?=.{$len}\$)"; while( /./ ) { my $count = s/$&//g; $re .= "(?=(?:.*$&){$count})"; } $_ =~ $re and print for glob $glob;` [download] Outputs the 90 valid permutations.	[reply] [d/l]
Re: Matching permutations with regex by tybalt89 (Monsignor) on Nov 20, 2018 at 15:22 UTC
`#!/usr/bin/perl -l # https://perlmonks.org/?node_id=1226058 use strict; use warnings; $_ = 'abc'; /(.).*\1/ or print for glob "{@{[ s/\B/,/gr ]}}" x length;` [download]	[reply] [d/l]
Re: Matching permutations with regex by rsFalse (Chaplain) on Nov 21, 2018 at 14:10 UTC
If @x can become huge (by generating all combinations with glob), another variant is - to try to match against multicated string, joined (or prepended) by some separator, which "blocks" backtracking to previous strings. In this case, all combinations will be tested, unless to use FAIL somewhere in a middle of search. In my opinion, this approach is not truly 'regex-like', because here a simple hash for counting occurrences is used. #!/usr/bin/perl # https://www.perlmonks.org/?node_id=1226058 use warnings; use re 'eval'; $\ = $/; for my $tc ( [ 'abc', 1 ], [ 'abc', 2 ], [ 'abc', 3 ], ){ my( $string, $times ) = @{ $tc }; $_ = ( ',' . reverse $string ) x ( $times * length $string ); print; my @permutations; my %occ; my $re = ', \w(\w)\w (?: (?{ $occ{ $^N } ++ }) \| (?{ $occ{ $^N } -- }) (F) ) ' x ( $times length $string ); ; / $re $ (??{ ( grep $_ != $times, values %occ ) ? '(F)' : '' }) (?{ push @permutations, join '', grep length, $1, $2, $3, $4, $5, $6, $7, $8, $9, }) (F) /x; print for @permutations, ~~ @permutations ; } [download] Almost the same code, except that I don't use '$1, $2, $3, $4, ...', but rather use a variable for saving current permutation string: Read more... (1457 Bytes) And if I won't to check if my current generating permutation is bad, I include that test '`(??{ ( grep $_ > $times, values %occ ) ? "(F)" : "" })`' inside '$re'. Maybe in some cases it can save searching time: `my $re = ', \w(\w)\w* (?: (?{ $occ{ $^N } ++ }) \| (?{ $occ{ $^N } -- }) (F) ) (??{ ( grep $_ > $times, values %occ ) ? "(F)" : "" }) ' x ( $times * length $string ); ;` [download]	[reply] [d/l] [select]
Re^2: Matching permutations with regex by QM (Parson) on Nov 22, 2018 at 09:32 UTC
I had the regex terms backwards. And the `glob` was just for testing purposes -- I wasn't trying to generate permutations, but to match one permutation of a known string. -QM -- Quantum Mechanics: The dreams stuff is made of	[reply] [d/l]

Back to Seekers of Perl Wisdom