Interesting problem. This is the best I can do so far: it extracts all singleton characters (characters that occur exactly once) from a string. It needs Perl 5.10 regex extensions, but I think those are kosher. The conditional construct (?(condition)yes-pattern) is used with (?{ code }) as the (condition), and I'm not sure whether the stink of (?{ code }) is dispelled by its use inside a conditional regex expression. Of course, the most damning thing is the use of a hash to keep track of characters already seen, but I can't get around this (update: yet). (I'm running under 5.10, so I have to use a local our %seen hash, but I understand that 5.18+ finally supports my variables in regex code blocks.)
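To show the conditional-with-code mechanism on its own, here is a minimal sketch that uses only the "seen before" test, so it returns the first occurrence of every character rather than the singletons. The function name first_occurrences is made up for this illustration; the construct is the same one used in rx_1 below.

```perl
use 5.010;  # (*FAIL) needs 5.10
use strict;
use warnings;

# Hypothetical helper, for illustration only: returns the first
# occurrence of each character, in order of appearance.
sub first_occurrences {
    my ($string) = @_;

    local our %seen;  # fresh hash for every call

    # The code block is the condition: it is true once the captured
    # character has been counted before, and then (*FAIL) rejects
    # the match at this position.
    return [ $string =~ m{ (.) (?(?{ $seen{$^N}++ }) (*FAIL)) }xmsg ];
}

say "@{ first_occurrences('abcabc') }";  # prints: a b c
```

When the condition is false there is no no-pattern, so the conditional matches nothing and the overall match succeeds with the captured character.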

File singleton_chars_1.pl:

use 5.010;  # need regex extensions

use warnings;
use strict;

use Test::More
    'no_plan'  # safer to use done_testing()
    ;
use Test::NoWarnings;

# test datasets ####################################################

use constant TEST_VECTOR_SET_1 => (
  "each contain one or more single character",
  [ qw(a a) ], [ qw(ab a b) ], [ qw(abc a b c) ],
  [ qw(aba b) ], [ qw(abb a) ], [ qw(aab b) ],
  [ qw(cpcdeqe p d q) ], [ qw(pcdcq p d q) ], [ qw(cpdqc p d q) ],
  [ qw(aapdq p d q) ], [ qw(apadq p d q) ], [ qw(apdaq p d q) ],
  [ qw(apdqa p d q) ],
  [ qw(paadq p d q) ], [ qw(padaq p d q) ], [ qw(padqa p d q) ],
  [ qw(pdaaq p d q) ], [ qw(pdaqa p d q) ],
  [ qw(pdqaa p d q) ],

  "expected results from LanX pm#1201799",
  [ qw(aaaab b) ], [ qw(aaaba b) ], [ qw(aabaa b) ], [ qw(abaaa b) ],
  [ qw(abbbb a) ], [ qw(baaaa b) ], [ qw(babbb a) ], [ qw(bbabb a) ],
  [ qw(bbbab a) ], [ qw(bbbba a) ],

  "none of these contain any single character",
  [ 'aa' ], [ 'aaa' ], [ 'aabb' ], [ 'aaabbb' ],
  [ 'abcabc' ], [ 'abcxcbax' ], [ 'xabccbax' ], [ 'abcxxcba' ],
  [ 'abacbc' ],
  );

# functions under test #############################################

sub rx_1 {

    my ($string,
        ) = @_;

    local our %seen;

    my $singleton = qr{  # captures a singleton
      (.)                             # capture/test this char
      (?(?{ $seen{$^N}++ }) (*FAIL))  # fail if seen before
      (?(?= .*? \g{-1})     (*FAIL))  # fail if seen later in string
      }xms;

    # extract all singletons.
    return [ $string =~ m{ (?= $singleton) }xmsg ];

    }

# testing, testing... ##############################################

FUNT:
for my $ar_funt (
  # function
  # name             comment
  [ 'rx_1',          '1st try - not "pure"', ],
  ) {

  my ($func_name, $func_note) = @$ar_funt;

  *singletons = do { no strict 'refs';  *$func_name; };

  defined $func_note ? note "\n $func_name() -- $func_note \n\n"
                     : note "\n $func_name() \n\n"
                     ;

  VECTOR:
  for my $ar_vector (TEST_VECTOR_SET_1) {

    if (not ref $ar_vector) {  # comment string if not vector ref.
      note $ar_vector;
      next VECTOR;
      }

    my ($string, @expected) = @$ar_vector;

    is_deeply singletons($string), \@expected,
              qq{'$string' -> (@expected)},
              ;

    }  # end for VECTOR

  }  # end for FUNT

note "\n done testing functions \n\n";

done_testing();

exit;

# utility functions ################################################

# none

Output:

c:\@Work\Perl\monks\LanX>perl singleton_chars_1.pl
#
#  rx_1() -- 1st try - not "pure"
#
# each contain one or more single character
ok 1 - 'a' -> (a)
ok 2 - 'ab' -> (a b)
ok 3 - 'abc' -> (a b c)
ok 4 - 'aba' -> (b)
ok 5 - 'abb' -> (a)
ok 6 - 'aab' -> (b)
ok 7 - 'cpcdeqe' -> (p d q)
ok 8 - 'pcdcq' -> (p d q)
ok 9 - 'cpdqc' -> (p d q)
ok 10 - 'aapdq' -> (p d q)
ok 11 - 'apadq' -> (p d q)
ok 12 - 'apdaq' -> (p d q)
ok 13 - 'apdqa' -> (p d q)
ok 14 - 'paadq' -> (p d q)
ok 15 - 'padaq' -> (p d q)
ok 16 - 'padqa' -> (p d q)
ok 17 - 'pdaaq' -> (p d q)
ok 18 - 'pdaqa' -> (p d q)
ok 19 - 'pdqaa' -> (p d q)
# expected results from LanX pm#1201799
ok 20 - 'aaaab' -> (b)
ok 21 - 'aaaba' -> (b)
ok 22 - 'aabaa' -> (b)
ok 23 - 'abaaa' -> (b)
ok 24 - 'abbbb' -> (a)
ok 25 - 'baaaa' -> (b)
ok 26 - 'babbb' -> (a)
ok 27 - 'bbabb' -> (a)
ok 28 - 'bbbab' -> (a)
ok 29 - 'bbbba' -> (a)
# none of these contain any single character
ok 30 - 'aa' -> ()
ok 31 - 'aaa' -> ()
ok 32 - 'aabb' -> ()
ok 33 - 'aaabbb' -> ()
ok 34 - 'abcabc' -> ()
ok 35 - 'abcxcbax' -> ()
ok 36 - 'xabccbax' -> ()
ok 37 - 'abcxxcba' -> ()
ok 38 - 'abacbc' -> ()
#
#  done testing functions
#
1..38
ok 39 - no warnings
1..39
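
As a cross-check on the expected values in the test vectors, a plain hash count with no regex logic at all could look like the sketch below. This is not the regex solution the thread asks for, only a reference to compare against, and the name singletons_by_count is made up here.

```perl
use strict;
use warnings;

# Hypothetical reference implementation, for comparison only:
# count every character, then keep those counted exactly once,
# preserving their order in the string.
sub singletons_by_count {
    my ($string) = @_;

    my @chars = split //, $string;

    my %count;
    $count{$_}++ for @chars;

    return [ grep { $count{$_} == 1 } @chars ];
}

print "@{ singletons_by_count('cpcdeqe') }\n";  # prints: p d q
```

It returns an array reference just like rx_1(), so it could be added as another row in the FUNT table to run against the same vectors.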

Give a man a fish: <%-{-{-{-<

In reply to Re: Regex: matching character which happens exactly once by AnomalousMonk
in thread Regex: matching character which happens exactly once by LanX
