Based on a few lines in perltodo for 5.6, I've written a new module called Regex::English. Ominous name, no? Here's an example:
    my $time;
    my $re = Regex::English
        -> start_of_line()
        -> literal("1998/10/08")
        -> optional( whitespace() )
        -> literal("[")
        -> remember(\$time, many( or( ":", digit() ) ))
        -> non_digit();

    if ($re->match('1998/10/08 [11:10]')) {
        print "Found a match at time $time!\n";
    }
Okay, that's not a big win for much besides readability, but it could help non-regex-gurus. It was kinda neat to write, too. It's a little long to post here, but it would be great to get some feedback. I'll keep updating the module on my site.
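
For readers who want to see the general idea without the full module, here is a minimal sketch of a chained builder. It is not the actual Regex::English source: the package name Sketch::RegexEnglish is hypothetical, sub-patterns are passed as raw regex strings rather than nested calls, and only a handful of methods are shown. Each method appends a fragment and returns the object so calls can chain, and match() copies captures into the scalars passed to remember().

```perl
use strict;
use warnings;

package Sketch::RegexEnglish;   # hypothetical name, not the real module

sub new { my $class = shift; return bless { parts => [], refs => [] }, $class }

# Append a regex fragment and return the object so calls can be chained.
sub _add { my ($self, $frag) = @_; push @{ $self->{parts} }, $frag; return $self }

sub start_of_line { $_[0]->_add('^') }
sub digit         { $_[0]->_add('\d') }
sub non_digit     { $_[0]->_add('\D') }
sub literal       { $_[0]->_add(quotemeta $_[1]) }
sub optional      { $_[0]->_add("(?:$_[1])?") }

# Capture a raw sub-pattern and note which scalar should receive the text.
sub remember {
    my ($self, $ref, $pattern) = @_;
    push @{ $self->{refs} }, $ref;
    return $self->_add("($pattern)");
}

sub match {
    my ($self, $text) = @_;
    my $re = join '', @{ $self->{parts} };
    my @captures = $text =~ /$re/ or return 0;
    # Copy each capture into the scalar registered for it.
    ${ $self->{refs}[$_] } = $captures[$_] for 0 .. $#{ $self->{refs} };
    return 1;
}

package main;

my $time;
my $re = Sketch::RegexEnglish->new
    ->start_of_line
    ->literal('1998/10/08')
    ->optional('\s')
    ->literal('[')
    ->remember(\$time, '(?:[:\d])+')
    ->non_digit;

print "Found a match at time $time!\n" if $re->match('1998/10/08 [11:10]');
```

Storing references and filling them in after a successful match is one way to avoid relying on $1; the real module may well do it differently.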

Feedback, suggestions, patches, questions are all welcome. It'd be cool to have a preliminary upload to the CPAN by this weekend.

Re: Pondering Regex::English
by TheDamian (Vicar) on Aug 24, 2001 at 10:48 UTC
    Here's an alternative I've been playing with:
    package Regexp::Simple;
    use overload;
    use Parse::RecDescent;

    local $/;
    my $translate = Parse::RecDescent->new(<DATA>);

    sub import {
        overload::constant qr => sub { return $translate->pattern($_[0]); }
    }

    1;

    __DATA__

    { sub wrap { "(?:$_[0])".($_[1]||"") } }

    pattern: element(s)             { join '', @{$item{element}} }

    element: 'start_of_line'        { '^' }
           | 'whitespace'           { '\\s' }
           | 'digit'                { '\\d' }
           | 'non_digit'            { '\\D' }
           | 'literal' subpat       { wrap $item{subpat} }
           | 'optional' subpat      { wrap $item{subpat}, '?' }
           | 'many' subpat          { wrap $item{subpat}, '+' }
           | 'any' subpatlist       { wrap join "|", @{$item{subpatlist}} }
           | 'remember' subpat      { "($item{subpat})" }
           | <perl_quotelike>       { quotemeta $item[1][2] }
           # etc. as necessary

    subpat: '(' pattern ')'         { $item[2] }

    subpatlist: '(' pattern(s /,/) ')'  { $item[2] }

    Then your example would become:

    use Regexp::Simple;

    my $re = qr{ start_of_line
                 literal("1998/10/08")
                 optional( whitespace )
                 literal("[")
                 remember( many( any( ":", digit ) ) )
                 non_digit
               };

    if ('1998/10/08 [11:10]' =~ m/$re/) {
        print "Found a match at time $1!\n";
    }

    Since we all seem to be converging, I'll be happy to collaborate. (I have many more ideas for this module than those shown above, and it would be great to have someone else share the implementation effort.)

    Send me some email if either (or both) of you are interested.

    And, yes, the Regexp:: namespace is an abomination.
    But it's the standard abomination for Perl 5, so we should use it.

    Damian

Re: Pondering Regex::English
by VSarkiss (Monsignor) on Aug 24, 2001 at 06:19 UTC

    Ack! I must've read the same thing, because I wrote something along the same lines. I was going to finish writing the POD this weekend and post to CLPM, to upload to CPAN later. (My first upload to CPAN!)

    There are two main differences that I can see right now. First, I called mine Regexp::Wordy (with a P ;-) based on the idea that it was using words instead of symbols to describe regexes. Although it is in English.

    Second, I avoided using OO style. My rationale was that this would mainly be of use to newbies, so I wanted to keep things as simple as possible. If you haven't got the hang of /^\s+[a-m]*\d{3}/, you'd probably be intimidated by the $foo->Regex::Wordy stuff too. If I'm reading your example right, with this module you would render it as:

    use Regexp::Wordy qw(:all);

    if ('1998/10/08 [11:10]' =~ regexp(
            at_bol,
            clean('1998/10/08'),
            any_number(space),
            clean('['),
            remember( one_or_more( either(':', digit))),
            nondigit))
    {
        print "Found a match at time $1\n";
    }

    As you can see, some things are still clumsy. I didn't think about passing in a ref to be set instead of $1, etc.
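
    A minimal sketch of how functions like these might be written, assuming each one simply returns a pattern string and regexp() joins and compiles them. This is a guess for illustration, not the Regexp::Wordy source; the bodies below are assumptions, and only the function names come from the example above. The empty () prototypes are what allow at_bol, space, digit and nondigit to be written without parentheses.

    ```perl
    use strict;
    use warnings;

    # Zero-argument pieces; the () prototype lets them appear as bare words.
    sub at_bol()   { '^'  }
    sub space()    { '\s' }
    sub digit()    { '\d' }
    sub nondigit() { '\D' }

    # Pieces that wrap their arguments, which are already pattern strings.
    sub clean       { quotemeta $_[0] }                 # escape a literal string
    sub any_number  { "(?:$_[0])*" }
    sub one_or_more { "(?:$_[0])+" }
    sub either      { '(?:' . join('|', @_) . ')' }
    sub remember    { "($_[0])" }                       # capturing group

    # Join all pieces in order and compile them once.
    sub regexp      { my $p = join '', @_; return qr/$p/ }

    my $re = regexp(at_bol, clean('1998/10/08'), any_number(space), clean('['),
                    remember(one_or_more(either(':', digit))), nondigit);

    print "Found a match at time $1\n" if '1998/10/08 [11:10]' =~ $re;
    ```

    Because the match itself happens outside the module, a plain-function design like this has no natural place to fill in a caller's variable, which is why captures still come back through $1.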

    I'd be glad to send you what I have, but I couldn't find an address on your home node or web site. I have an obfuscated address on my home node if you'd like to email me instead.

Re (tilly) 1: Pondering Regex::English
by tilly (Archbishop) on Aug 24, 2001 at 03:47 UTC
    First of all its name should be Regexp::English to fit with the current naming structure.

    Secondly, how does it compare with Damian's Regexp::Common? I think the best way to do your module would either be as part of that, or as a wrapper around it. If the wrapper approach fits what you want, then I think an excellent "proof of concept" would be a partial translation into another language, for instance a (limited) Regexp::Francais.

      <rant>

      While I can't disagree that maintaining a standard naming structure is a Good Thing, I have to say that I agree with Jeffrey Friedl regarding "regexp": it's a blight on the language. I speak English and a bit of French and I can't imagine any speakers in either language finding "regexp" to roll off the tongue easier than "regex". In fact, I find it hard to imagine that speakers in any language would enjoy that P on the end. "regexp" sounds like something Sylvester the Cat would spit out in conversation if he were conversant with Perl.

      I don't know who started the "regexp" naming scheme on the CPAN, but I was at a talk that Damian gave recently and unless I drastically misunderstood him, he also thought that "regexP" was an abomination.

      Which is my roundabout way of saying that I get chills up my spine every time I hear someone trying to say 'regexp'.

      'Nuff said :)

      </rant>

      This node was not brought to you by the letter 'P'.

      Cheers,
      Ovid

      Vote for paco!

      Join the Perlmonks Setiathome Group or just click on the link and check out our stats.

        I've heard that it is "regexp" with a silent "p". But I still use "regex".

        _____________________________________________________
        Jeff[japhy]Pinyan: Perl, regex, and perl hacker.
        s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;

        I speak English and a bit of French and I can't imagine any speakers in either language finding "regexp" to roll off the tongue easier than "regex".

        For the unconvinced, try speaking the following aloud at a normal pace:

        Regexp regexp regexp

        Toy boat toy boat toy boat

        Regexp regexp regexp

        Pass the peanut butter.

        I speak French with a bit of English (and I try to learn a little German by listening to www.jazzradio.net). I think that regexp is a better mnemonic than regex. It is indeed a tongue twister, so the P may be better silent.

        -- stefp

      You're right about the namespace thing. "Regexp" is abominable, but it's the current namespace.

      Regexp::English has very little in common with Regexp::Common, however. One is a library of common regexes. The other is a wrapper around qr with a few handy features.

      I like the idea about providing other languages, though. It would be useful to abstract out the common features and use glob aliasing to build the appropriate sub and method names for various other languages. Supporting locales could be tricky, though.
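
      A rough sketch of the glob-aliasing idea, under stated assumptions: the package names Sketch::Regexp::Base and Sketch::Regexp::Francais, the French sub names, and the three base subs are all hypothetical, chosen only to illustrate the technique. The base package holds the real implementations, and the language package installs aliases to them by assigning code references to typeglobs.

      ```perl
      use strict;
      use warnings;

      package Sketch::Regexp::Base;       # hypothetical shared implementation

      sub digit      { '\d' }
      sub whitespace { '\s' }
      sub literal    { quotemeta $_[0] }

      package Sketch::Regexp::Francais;   # hypothetical language-specific front end

      # Map each localized name to the base sub that implements it.
      my %alias = (
          chiffre  => 'digit',
          espace   => 'whitespace',
          litteral => 'literal',
      );

      {
          no strict 'refs';               # symbolic globs are needed for aliasing
          while (my ($fr, $en) = each %alias) {
              # Assigning a code ref to a glob installs the sub under a new name.
              *{ __PACKAGE__ . "::$fr" } = \&{"Sketch::Regexp::Base::$en"};
          }
      }

      package main;

      my $pattern = Sketch::Regexp::Francais::litteral('[')
                  . Sketch::Regexp::Francais::chiffre() . '+';

      print "matched\n" if '[42]' =~ /$pattern/;
      ```

      With this layout, adding a language is mostly a matter of supplying a new name table; nothing here addresses locale-dependent character classes.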

      Supposing this did become the parent to all sorts of language-specific long regex modules, what would it be called? Regexp::Easy? Regexp::Language? Regexp::Long? Regexp::Language::Base?

        I agree with disliking the name. (Besides which, Perl's regular expressions aren't even that regular.)

        But I still think that you should consider carefully whether your module is best done as a wrapper around Regexp::Common. The point is not that you are doing something similar, but rather that users of your module will likely want to do the same tasks that Regexp::Common makes easy, and it would be nice for your module to be able to give them all of its snippets.

        This is particularly true considering how long it will be with your module to describe the common REs that Damian supports...

        As for the language hierarchy, I hate thinking up names. You could, though, just call it Regexp::English, and have all of the other ones depend on Regexp::English. A bit of a hack, but your internals are your business; users should not have hidden expectations...