I spent some more time playing with this, trying out different things, like recursion, or backtracking control verbs, similar to what you did here. I also tried inverting the condition, that is, match strings where each character is repeated at least once. Sadly, test cases like "abacbc" and "abaxcbc" leave me doubting whether it is possible at all with pure regexes, although now that I've said that some regex wizard will probably prove me wrong ;-)

So despite not having achieved success yet, I thought I'd share one or two of the avenues I explored. One thing I tried was thinking about it as a prefix + single character + suffix. Defining the latter two is fairly easy, but the prefix is not, since for the test case "abaxcbc" I haven't yet found a way for either the prefix to "mark" the second b as "seen", or the suffix to look back and notice that b has been seen before. This is what I was meant regarding variable-width lookbehind, the (*SKIP)(*FAIL) trick shown by AnomalousMonk looked interesting, but does not seem applicable here.

Another, simpler problem is that in a string like "abccacb", the regex will easily misidentify the second a as the "single" character, since I couldn't find a way to look back - while I did manage to cover some of those cases with recursion, by doing stuff like / (?<r1>.)\g{r1}* (?&subpat) \g{r1}++ /, I still couldn't cover all the cases. For example, I couldn't come up with a way to match "abaxcbc" against /a.*a/, /b.*b/, and /c.*c/ simultaneously and in a way that advanced the match position in a useful manner. For example, in "abaxcbc"=~/b(.*)b/, $1 eq 'axc', and while cbc could be matched with a lookahead, how does one determine that the a has been seen before? Alternatively, one could match "abaxcbc"=~/ab(?=.*b)a/, but when you get to cbc, how do you determine that b has been seen before? (This is basically the same problem as mentioned above, no variable-width lookbehind.)

Of course, I may have painted myself into a corner with all my (over)thinking, so now I'm really curious if there is a solution :-) (Update: There is!)

I also spent some time on generating test cases, which was an interesting little problem in itself. Although I am out of time for now, maybe something in the following is useful:

#!/usr/bin/env perl use warnings; use strict; use feature 'say'; use Test::More; my $re = qr{ \A ## Reference Implementation ## (?<prefix> .* ) (?<single> (??{ length $+{prefix} ? '[^'.quotemeta($+{prefix}).']' : '.' }) ) (?<suffix> (?: (?! \g{single} ) . )* ) \z }msx; my $re_nope = qr{ \A ## one of my early failed attempts # the prefix contains only characters that repeat later on (?<prefix> (?: (?<rep> . ) (?= .* \g{rep} ) )* ) # the single character obviously can't be part of the prefix (?<single> (?! \g{rep} ) . ) # the suffix must not contain the "single" character (?<suffix> (?: (?! \g{single} ) . )* ) \z }msx; # run the regex against the test cases for my $len (1..7) { my $odo = deduped_odo( map ['a'..chr(ord('a')+$_-1)], 1..$len ); while (my @c = $odo->()) { my $str = join '', @c; my $expect = do { my %h; $h{$_}++ for @c; my %r = reverse %h; exists $r{1} }; is $str=~$re, $expect, $str.($expect?' =~':' !~').' re' # or, for regexes that match strings with no single chars: #is $str!~$re2, $expect, $str.($expect?' !~':' =~').' re2' or diag explain [ \%-, map [ $-[$_], eval "\$$_", $+[$_] ], 1..@+ ]; } } done_testing; say "FAIL: $_->{name}" # a shorter summary of failures for grep { !$_->{ok} } Test::More->builder->details; exit; # test case generation stuff follows sub odometer { # http://www.perlmonks.org/?node_id=1197785 my @w = map { [ 1, ref eq 'ARRAY' ? @$_ : $_ ] } @_; my $done; return sub { if ($done) { $done=0; return } my @cur = map {$$_[$$_[0]]} @w; for (my $i=$#w; $i>=0; $i--) { last if ++$w[$i][0]<@{$w[$i]}; $w[$i][0]=1; $done=1 unless $i; } return wantarray ? @cur : join '', @cur; } } sub deduped_odo { my $odo = odometer(@_); # this sequence can probably be generated directly... my %seen; return sub { while (1) { my @next = $odo->() or return; my $pat = do { my ($i,%h)=('A'); join '', map {$h{$_}//=$i++} @next }; if ( not $seen{$pat}++ ) { return wantarray ? @next : join '', @next } } } }

Update 2019-10-05: I've just released Algorithm::Odometer::Tiny!


In reply to Re^3: Regex: matching character which happens exactly once by haukex
in thread Regex: matching character which happens exactly once by LanX

Title:
Use:  <p> text here (a paragraph) </p>
and:  <code> code here </code>
to format your post, it's "PerlMonks-approved HTML":



  • Posts are HTML formatted. Put <p> </p> tags around your paragraphs. Put <code> </code> tags around your code and data!
  • Titles consisting of a single word are discouraged, and in most cases are disallowed outright.
  • Read Where should I post X? if you're not absolutely sure you're posting in the right place.
  • Please read these before you post! —
  • Posts may use any of the Perl Monks Approved HTML tags:
    a, abbr, b, big, blockquote, br, caption, center, col, colgroup, dd, del, details, div, dl, dt, em, font, h1, h2, h3, h4, h5, h6, hr, i, ins, li, ol, p, pre, readmore, small, span, spoiler, strike, strong, sub, summary, sup, table, tbody, td, tfoot, th, thead, tr, tt, u, ul, wbr
  • You may need to use entities for some characters, as follows. (Exception: Within code tags, you can put the characters literally.)
            For:     Use:
    & &amp;
    < &lt;
    > &gt;
    [ &#91;
    ] &#93;
  • Link using PerlMonks shortcuts! What shortcuts can I use for linking?
  • See Writeup Formatting Tips and other pages linked from there for more info.