I really like Eily's ++approach: it's short and sweet and probably sufficiently fast for short strings and word lists. It has the acknowledged problem of doing a lot of redundant look-aheading and so may be too slow for long strings (for some definition of "long"), but I'd be willing to cross that bridge when I come to it.

One issue I don't see addressed stems from the fundamental nature of Perl 5 ordered alternations: the first match in the list of alternations is the final match. There is no consideration of "longest match", etc.; it's first come, first served.
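As a minimal sketch of that first-come, first-served behavior (the variable names are illustrative only and are not from Eily's code), the same string can be matched against the same three alternatives in two different orders:

```perl
use strict;
use warnings;

# Perl 5 tries alternatives left to right and keeps the first one that
# lets the overall match succeed; length plays no part in the choice.
my ($first_listed) = 'abcd' =~ m{ \A (a|ab|abcd) }xms;   # captures 'a'
my ($long_first)   = 'abcd' =~ m{ \A (abcd|ab|a) }xms;   # captures 'abcd'

print "$first_listed $long_first\n";   # prints "a abcd"
```

Only the order of the alternatives differs between the two patterns, yet the captures differ.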

Consider

    c:\@Work\Perl>perl -wMstrict -le
    "use Data::Dump qw(pp);
     ;;
     sub multi_cap_Eily {
       ;;
       my ($word, @words) = @_;
       ;;
       my ($dict) = map qr{ $_ }xms, join '|', @words;
       ;;
       return $word =~ m{ \G ($dict) (?= $dict* \z) }xmsg;
       }
     ;;
     pp multi_cap_Eily('abcd', qw(ab cd a b c d abcd));
    "
    ("ab", "cd")

This uses what I would call a "naive" alternation, producing the sub-strings 'ab' and 'cd' simply because they are the first alternatives to match in the list of "words" in the order it happened to be supplied. Is this result correct?

Because the word patterns being supplied are simple literal strings, it's easy to introduce a notion of shortest or longest matching. The statement
    my ($dict) = map qr{ $_ }xms, join '|', sort @words;
with a default ascending lexical sort (which places a word before any longer word that begins with it) produces the sub-strings
    ("a" .. "d")
and the descending reverse sort
    my ($dict) = map qr{ $_ }xms, join '|', reverse sort @words;
produces
    "abcd"
I would consider either of those results to be more correct than the naive result if only because they are both independent of the adventitious ordering of the list of input sub-words.
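If the intent is specifically "shortest first" or "longest first", one option is to sort on length explicitly rather than rely on lexical order. The following is only a sketch under my own assumptions: the name multi_cap_by_length and its $prefer argument are hypothetical, and the quotemeta call is an addition in case a word contains regex metacharacters.

```perl
use strict;
use warnings;

# Hypothetical variant of multi_cap_Eily; $prefer is 'longest' or 'shortest'.
sub multi_cap_by_length {
    my ($prefer, $word, @words) = @_;

    # Order by length, then lexically, so the caller's ordering is irrelevant.
    my @ordered = sort { length($a) <=> length($b) or $a cmp $b } @words;
    @ordered = reverse @ordered if $prefer eq 'longest';

    # Escape each word so it is matched literally (assumption: words are
    # plain strings, not patterns).
    my $alt  = join '|', map { quotemeta($_) } @ordered;
    my $dict = qr{ (?: $alt ) }xms;

    # Same tokenizing match as before: each piece must leave a remainder
    # that can itself be split completely into dictionary words.
    return $word =~ m{ \G ($dict) (?= $dict* \z) }xmsg;
}

my @words = qw(ab cd a b c d abcd);
print join(' ', multi_cap_by_length('longest',  'abcd', @words)), "\n";   # abcd
print join(' ', multi_cap_by_length('shortest', 'abcd', @words)), "\n";   # a b c d
```

Shuffling @words should not change either result, since the alternation order is derived entirely from the sort.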

(I'm avoiding the whole business of converting back and forth between mixed- and single-case because that just seems tangential to the issue I'm trying to address.)

This aspect of ordered alternation matching is often overlooked. Sometimes it can be easily addressed.


Give a man a fish:  <%-{-{-{-<


In reply to Re: Multi_captures needed by AnomalousMonk
in thread Multi_captures needed by rsFalse
