Although these modules don't directly solve the problem you're asking about, maybe they are a starting point: Regexp::Trie, Regexp::Assemble, Regex::PreSuf. Note that in your problem statement, using the `a` strings as an example, you're saying that you're looking for either 1, 2, 5, or 7 consecutive a's, but `a+` actually matches more than that: any run of one or more a's, including `aaa` or `aaaaaa`. Notice how the patterns these optimizers generate take the exact number of a's into account.
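To illustrate the point about `a+`, here is a small core-Perl sketch (no CPAN modules needed) that compares `a+` with an exact, hand-written alternation of the four `a` strings from your list. The variable names are just for illustration.

```perl
use strict;
use warnings;

# The four "a" strings from the example list.
my @a_strs = qw/a aa aaaaa aaaaaaa/;

# a+ accepts any run of one or more a's.
my $loose = qr/^a+$/;

# An anchored alternation accepts only the listed strings.
my $alt   = join '|', map { quotemeta($_) } @a_strs;
my $exact = qr/^(?:$alt)$/;

for my $n (1 .. 8) {
    my $s = 'a' x $n;
    printf "%-8s a+:%-3s exact:%s\n", $s,
        ($s =~ $loose ? 'yes' : 'no'),
        ($s =~ $exact ? 'yes' : 'no');
}
```

Every length passes `a+`, while only lengths 1, 2, 5 and 7 pass the exact pattern.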
```perl
use warnings;
use strict;

my @strs = qw/a b c aa bb ccc aaaaa bbb cccccc aaaaaaa bbbb ccccccc/;

use Regexp::Trie;
my $rt = Regexp::Trie->new;
$rt->add($_) for @strs;
print $rt->regexp, "\n";
# (?^:(?:a(?:a(?:aaa(?:aa)?)?)?|b(?:b(?:bb?)?)?|c(?:cc(?:cccc?)?)?))

use Regexp::Assemble;
my $ra = Regexp::Assemble->new;
$ra->add($_) for @strs;
print $ra->re, "\n";
# (?^:(?:a(?:a(?:(?:aa)?aaa)?)?|c(?:cc(?:c?ccc)?)?|b(?:b(?:b?b)?)?))

use Regex::PreSuf;
my $re = presuf(@strs);
print $re, "\n";
# (?:aa(?:aaa(?:aa)?)?|bb(?:bb|b)?|ccc(?:cccc?)?|[abc])
```
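For intuition about what a trie-based optimizer does, here is a rough core-Perl sketch of the idea: insert each string into a character trie, then walk the trie and emit one alternation per branch point, making the remainder optional wherever a string can end. This is an illustrative reimplementation, not code taken from Regexp::Trie; `build_trie` and `node_to_re` are hypothetical names. Tracing it by hand for this input gives the same pattern as the Regexp::Trie output shown in the comment above, minus the `(?^:...)` wrapper that `qr//` adds.

```perl
use strict;
use warnings;

# Build a nested-hash trie. The key '' marks that a whole string ends
# at this node (it can never clash with a real one-character key).
sub build_trie {
    my %trie;
    for my $str (@_) {
        my $node = \%trie;
        for my $ch (split //, $str) {
            $node = $node->{$ch} //= {};
        }
        $node->{''} = 1;
    }
    return \%trie;
}

# Turn a trie node into a regex fragment: one alternative per child
# character; if a string may also end here, the rest becomes optional.
sub node_to_re {
    my ($node) = @_;
    my @alts = map { quotemeta($_) . node_to_re($node->{$_}) }
               sort { $a cmp $b } grep { $_ ne '' } keys %$node;
    return '' unless @alts;
    my $body = join '|', @alts;
    if (exists $node->{''}) {
        # A single (possibly escaped) character needs no group: "b?".
        return $body =~ /\A\\?.\z/s ? "$body?" : "(?:$body)?";
    }
    return @alts == 1 ? $body : "(?:$body)";
}

my @strs    = qw/a b c aa bb ccc aaaaa bbb cccccc aaaaaaa bbbb ccccccc/;
my $pattern = node_to_re(build_trie(@strs));
my $re      = qr/\A$pattern\z/;
print "$pattern\n";

# Every listed string matches; lengths that were not listed do not.
for my $s (@strs, 'aaa', 'cc', 'bbbbb') {
    printf "%-8s %s\n", $s, ($s =~ $re ? 'match' : 'no match');
}
```

Because each trie node records whether a string ends there, the resulting pattern accepts exactly the listed strings, so `aaa`, `cc` and `bbbbb` are rejected, unlike with `a+`, `b+` or `c+`.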
In reply to Re: Partitioning a set of strings by regular expressions
by haukex
in thread Partitioning a set of strings by regular expressions
by Locutus