in reply to Re: Nonrepeating characters in an RE (updated)
in thread Nonrepeating characters in an RE

I think it's not so bad doing it with a regexp if you're generating it programmatically. And dealing with it via split is easy when all letters must be distinct, but also gets more complex when you have to honour the template.

I think the direct version for the given template would look something like the below, which doesn't look too hard to generate programmatically either for this template or for other templates more generally:

m{ (.) (?!\1) (.) (?!\1|\2) (.) (?!\1|\2|\3) (.) (?!\1|\2|\3|\4) (.) (?!\1|\2|\3|\4|\5) (.) \1 }x

For your update version, thanks - I've not noticed before that we have \g-1 (though I think I'd want to add the optional braces - it doesn't look remotely atomic to me without them). But I believe your example is also for "all distinct letters", it isn't clear to me how you'd use it for BernieC's templates.

For performance: I'm assuming this is for matching something like English words in a dictionary, so the length is unlikely to be a problem - I'd look for a different approach if the templates were commonly going to have more than 10-20 distinct letters in them. There's a lot gained by the fact that matching a regexp is a single perl op: a split-based solution is likely to lose a lot more on the op overhead than it gains on the algorithmic complexity, if the target really is something like dictionary words.

Replies are listed 'Best First'.
Re^3: Nonrepeating characters in an RE (updated)
by AnomalousMonk (Archbishop) on Aug 16, 2022 at 07:27 UTC

    I think a hash approach is likely to be faster.

    If the regex approach is taken, a simpler regex would seem to be
        qr{ ([\Q$template\E]) .*? \1 }xms
    where $template is the template string. Any $string that matches against this regex has a repeated character of the template. Repeated characters in the string that are not in the template are ignored (I assume this is what BernieC wants). If there are repeated characters in the template, the repeats are effectively ignored. However, this qr// will not handle an empty-string template.


    Give a man a fish:  <%-{-{-{-<

Re^3: Nonrepeating characters in an RE (updated)
by LanX (Saint) on Aug 16, 2022 at 14:30 UTC
    > OP > For example, my "template" might look like this: "abcdefa" and I already have the code that generates (.)?????\1. I can't figure how to make the "?"s say "these guys all have to be distinct".

    > But I believe your example is also for "all distinct letters", it isn't clear to me how you'd use it for BernieC's templates.

    OK taking under assumption, that what the OP meant was to only match strings build from a "template" with an anchored substring having the constraint of (certain?) unique letters, one can easily adapt the above regex solution.

    use v5.12; use warnings; use Data::Dump; #ddx my @words = map { "..X${_}X.." } <{a,b,c,d}{a,b,c,d}{a,b,c,d}>; my $class = "[abc]"; #$class = '.'; #uncomment to match d too my $len = 3; for (@words) { say "$_" if $_ =~ / X # start anchor ( ($class) (?! .{0,$len} \g{-1} # relative backreference .*? X ) ){$len} X # end anchor /x; }

    ..XabcX.. ..XacbX.. ..XbacX.. ..XbcaX.. ..XcabX.. ..XcbaX..

    Hope that's clearer now. :)

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

    update

    added part

    .*? X # end anchor

    update

    replaced .*? with .{0,$len}

    NB: technically $len-1 would be better, but the end anchor makes this redundant

    update

    OK I agree that this is a mess. But I'd rather wait for clarifiation from the OP before suggesting better solutions.

Re^3: Nonrepeating characters in an RE (simple)
by LanX (Saint) on Aug 16, 2022 at 09:11 UTC
    > But I believe your example is also for "all distinct letters", it isn't clear to me how you'd use it for BernieC's templates.

    I find the wording "template" confusing and thought it's about constructing a complicated regex by templates, i.e. like it's done with HTML.

    If "template" is supposed to mean character class $chars = "abc.." whose chars are never repeated anywhere in the string, a negated approach is probably the simplest

    $str !~ / ([$chars]) .* \1 /x

    edit

    use v5.12; use warnings; my @words = qw"abc aab abb aba abcd abca"; my $chars = "ad"; for (@words) { say "$_" if $_ !~ / ([$chars]) .* \1 /x }

    abc abb # NB: b wasn't in chars abcd

    Cheers Rolf
    (addicted to the Perl Programming Language :)
    Wikisyntax for the Monastery

      When the OP talks about "a template that might look like abcdefa", I interpret that to represent a pattern that words should follow, as in a substitution cypher: the first and seventh letters should be the same, all other letters should be different from those and from each other. Thus it should match "suitors" and "realtor", but not "bracken" or "suffers" or "straits" or "albania".

      So the regexp translation becomes: for each letter in the template, if it has not been seen before introduce a new capture, and insist it doesn't match any of the previous captures; if it has been seen before, just match the appropriate previous capture:

      sub template_to_regexp { my($template) = @_; my $seen_count; my %seen_at; my $regexp = ''; for my $i (0 .. length($template) - 1) { my $chr = substr($template, $i, 1); my $seen = $seen_at{$chr}; if (defined $seen) { $regexp .= "\\$seen"; next; } # else it's a new template character $regexp .= sprintf '(?!%s)', join '|', map "\\$_", 1 .. $seen_coun +t if $seen_count; $seen_at{$chr} = ++$seen_count; $regexp .= '(.)'; } return qr{$regexp}; }
        I agree with your interpretation. It seems to imply that the length of the target must match the length of the 'template'. It sounds like a tool for solving Jumble by searching a dictionary for valid words that match the given pattern.
        Bill
        > I interpret that to represent a pattern that words should follow, as in a substitution cypher: the first and seventh letters should be the same,

        or he made a typo. :)

        best to wait for clarification...

        Cheers Rolf
        (addicted to the Perl Programming Language :)
        Wikisyntax for the Monastery