bgu has asked for the wisdom of the Perl Monks concerning the following question:

Hi guys,

This is my first post, hope you can help me out.

I'm looking for a CPAN module that would give all the possible matching strings for a relatively simple regular expression.

By simple I mean that the regexp would not contain +, *, backreferences, non-capturing groups, or anything else that would make the computation tricky. Character set selection would be a bonus, but anything would be much appreciated.

So, given a simple input such as

abc[0pz]

it would return these strings:

abc0 abcp abcz

Thanks!

Replies are listed 'Best First'.
Re: CPAN module for generating regexp matches
by toolic (Bishop) on Feb 29, 2012 at 01:44 UTC
    A CPAN search finds Regexp::Genex. It doesn't seem to support character classes, but this is close:
    use warnings;
    use strict;
    use Regexp::Genex qw(:all);

    print "$_ " for strings('abc(0|p|z)');
    print "\n";

    __END__
    abc0 abcp abcz
Re: CPAN module for generating regexp matches
by AnomalousMonk (Archbishop) on Feb 29, 2012 at 08:24 UTC

    bgu: An application that seems to almost exactly match the requirements you give is described in Dominus's book "Higher Order Perl" (freely downloadable here) in section 4.3.2 "Genomic Sequence Generator". A more general discussion of generating strings given a regex is contained in section 6.5 "Regex String Generation".
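For the restricted case in the question (literal characters plus simple character classes), the enumeration can be sketched in a few lines of plain Perl. This is not the code from the book, only a minimal illustration of the cartesian-product idea. The name expand_pattern is made up for this example, and the sketch assumes no quantifiers, groups, escapes, or negated classes.

```perl
use strict;
use warnings;

# Hypothetical helper: expand a pattern made only of literal characters
# and simple character classes such as [0pz] or [a-c].
sub expand_pattern {
    my ($pattern) = @_;
    my @results = ('');

    # Walk the pattern one token at a time: either a [...] class or one character.
    while ($pattern =~ /\G(?:\[([^\]]+)\]|(.))/gcs) {
        my @choices;
        if (defined $1) {
            my $class = $1;
            # Expand ranges such as a-c into the characters a, b, c.
            $class =~ s/(.)-(.)/join '', map { chr($_) } ord($1) .. ord($2)/ge;
            @choices = split //, $class;
        }
        else {
            @choices = ($2);
        }

        # Append every choice to every string built so far.
        @results = map { my $head = $_; map { $head . $_ } @choices } @results;
    }
    return @results;
}

print join(' ', expand_pattern('abc[0pz]')), "\n";   # abc0 abcp abcz
```

The number of results is the product of the class sizes, so a pattern with many large classes grows quickly.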

      Thanks toolic,

      I've just tried Regexp::Genex and it works fine. I will need some tweaks for the character sets, but most of the actual string generation is covered.

      AnomalousMonk, great tip with that book. I found a lot of valuable stuff in there, very neatly explained.
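One possible tweak for the character sets, as a rough sketch: rewrite each character class as an alternation before handing the pattern to strings() as in toolic's example. The helper name classes_to_alternation is made up here, and the sketch assumes each class holds only plain characters (no ranges, escapes, or negation).

```perl
use strict;
use warnings;

# Hypothetical helper: rewrite a class like [0pz] as the alternation (0|p|z).
# Assumes the class lists plain characters only.
sub classes_to_alternation {
    my ($pattern) = @_;
    $pattern =~ s{\[([^\]]+)\]}{
        my $chars = $1;
        '(' . join('|', split //, $chars) . ')';
    }ge;
    return $pattern;
}

print classes_to_alternation('abc[0pz]'), "\n";   # abc(0|p|z)
```

The result for abc[0pz] is the same string toolic passed to strings(), so the two steps should chain together for inputs of this shape.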

Re: CPAN module for generating regexp matches
by Marshall (Canon) on Feb 29, 2012 at 01:19 UTC
    #!/usr/bin/perl -w
    use strict;

    my @strings = glob "{abc}{0,p,z}";   # to enumerate the combinations
    print "@strings\n";                  # abc0 abcp abcz
    You can make a regex out of these combinations with $regex = join("|", @strings); or keep what you already had, /abc[0pz]/, which is fine.

    Note that abc or the character set 0pz could come from Perl variables ($prefix, $endingLetters, or whatever).

    my $prefix = 'abc';
    my $endingLetters = '0pz';
    # The braces keep Perl from reading $prefix[...] as an element of @prefix.
    if (/${prefix}[$endingLetters]/) {...}
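The same variables can also drive the glob enumeration. A minimal sketch, assuming the prefix and letters contain no whitespace and no glob metacharacters (*, ?, [, ], {, }, ~ or a comma), since glob would treat those specially:

```perl
use strict;
use warnings;

my $prefix        = 'abc';
my $endingLetters = '0pz';

# Turn "0pz" into "0,p,z" and wrap it in braces for glob's brace expansion.
my $alternatives = join ',', split //, $endingLetters;
my $pattern      = $prefix . '{' . $alternatives . '}';

my @strings = glob $pattern;
print "@strings\n";                      # abc0 abcp abcz

# The enumerated strings can be joined back into an anchored regex.
my $regex = join '|', @strings;
print "match\n" if 'abcp' =~ /^(?:$regex)$/;
```

Because the pattern has no wildcards, glob returns the expanded names whether or not files with those names exist.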