Dear Fellow Monks,
Let S be a given set of strings, none of which contains a newline character. I would like to "categorize" the strings in S by constructing a set R of n regular expressions r1, ..., rn of the form qr/^...$/ such that S1, ..., Sn is a partition of S, where Si denotes the subset of S consisting of the strings matched by ri (i = 1, ..., n). In other words, every string in S should be matched by exactly one ri.
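The partition condition can be checked mechanically. Below is a minimal Perl sketch; the subroutine name is_partition is hypothetical and not taken from any CPAN module. It accepts R when every string in S is matched by exactly one regex and no regex ends up with an empty block Si. Since the strings contain no newlines, `$` behaves like `\z` here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns true if the regexes in @$regexes partition the
# strings in @$strings, i.e. every string is matched by exactly one regex and
# every regex matches at least one string (no block S_i is empty).
sub is_partition {
    my ($strings, $regexes) = @_;
    my @block_size = (0) x @$regexes;    # |S_i| for each r_i
    for my $s (@$strings) {
        # Indices of all regexes that match this string.
        my @hits = grep { $s =~ $regexes->[$_] } 0 .. $#$regexes;
        return 0 unless @hits == 1;      # unmatched, or matched by two r_i
        $block_size[ $hits[0] ]++;
    }
    return 0 if grep { $_ == 0 } @block_size;    # some S_i is empty
    return 1;
}

my @S = qw(a b c aa bb ccc aaaaa bbb cccccc aaaaaaa bbbb ccccccc);
my @R = (qr/^a+$/, qr/^b+$/, qr/^c+$/);

print is_partition(\@S, \@R) ? "partition\n" : "not a partition\n";
# "a" is matched by both regexes here, so this is not a partition.
print is_partition(\@S, [qr/^a+$/, qr/^.*$/]) ? "partition\n" : "not a partition\n";
```

Note that this check alone also accepts the trivial R = { qr/^.*$/ }, which is a one-block partition; it says nothing about whether R is "minimal".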
Needless to say, I'm not interested in trivial partitions like those induced by R = { qr/^.*$/ } or R = { qr/^s$/ for s in S }. Rather, R should be minimal in a sense I'm still unsure how to define, but the following example should make it clear: for S = { a, b, c, aa, bb, ccc, aaaaa, bbb, cccccc, aaaaaaa, bbbb, ccccccc }, I would like to construct R = { qr/^a+$/, qr/^b+$/, qr/^c+$/ }.
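For this particular example, one simple heuristic reproduces the desired R: collapse every maximal run of a repeated character into "char+" and group the strings by the resulting pattern. This is only a sketch under that assumption (collapse_runs and categorize are made-up names), not an algorithm for a minimal R, since minimality is still undefined. It generalizes repetition counts only, so "ab" and "aab" share the block qr/^(?:a+b+)$/ while "ab" and "ba" do not.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical heuristic: turn each maximal run of one character into
# "<quoted char>+", e.g. "aabccc" -> "a+b+c+".
sub collapse_runs {
    my ($s) = @_;
    my $pattern = '';
    while ($s =~ /((.)\2*)/gs) {           # one maximal run per iteration
        $pattern .= quotemeta($2) . '+';
    }
    return $pattern;
}

# Groups strings by their collapsed pattern and returns a list of
# [ anchored regex r_i, [ strings in S_i ] ] pairs, sorted by pattern.
sub categorize {
    my @strings = @_;
    my %blocks;    # collapsed pattern => [strings in that block]
    push @{ $blocks{ collapse_runs($_) } }, $_ for @strings;
    return map {
        my $key = $_;
        [ qr/^(?:$key)$/, $blocks{$key} ];
    } sort keys %blocks;
}

my @S = qw(a b c aa bb ccc aaaaa bbb cccccc aaaaaaa bbbb ccccccc);
for my $block (categorize(@S)) {
    my ($re, $members) = @$block;
    print "$re => @$members\n";
}
```

Because adjacent runs in a collapsed pattern always use different characters, a string matches the regex built from its own pattern and no other, so the resulting blocks always form a partition of S.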
Do you know of a CPAN module that solves this problem, or of any research papers that deal with it? I may have been searching for the wrong terms, but Google hasn't pointed me to anything useful yet.
In reply to Partitioning a set of strings by regular expressions by Locutus