Pondering Regex::English

Re: Pondering Regex::English by TheDamian (Vicar) on Aug 24, 2001 at 10:48 UTC
Here's an alternative I've been playing with:

    package Regexp::Simple;
    use overload;
    use Parse::RecDescent;

    # Slurp the grammar from the __DATA__ section and build the translator.
    local $/;
    my $translate = Parse::RecDescent->new(<DATA>);

    # Route every qr{...} constant in the calling scope through the grammar.
    sub import {
        overload::constant qr => sub { return $translate->pattern($_[0]); }
    }

    1;

    __DATA__

    { sub wrap { "(?:$_[0])".($_[1]||"") } }

    pattern: element(s)              { join '', @{$item{element}} }

    element: 'start_of_line'         { '^' }
           | 'whitespace'            { '\\s' }
           | 'digit'                 { '\\d' }
           | 'non_digit'             { '\\D' }
           | 'literal' subpat        { wrap $item{subpat} }
           | 'optional' subpat       { wrap $item{subpat}, '?' }
           | 'many' subpat           { wrap $item{subpat}, '+' }
           | 'any' subpatlist        { wrap join "|", @{$item{subpatlist}} }
           | 'remember' subpat       { "($item{subpat})" }
           | <perl_quotelike>        { quotemeta $item[1][2] }
           # etc. as necessary

    subpat: '(' pattern ')'          { $item[2] }

    subpatlist: '(' pattern(s /,/) ')'   { $item[2] }

Then your example would become:

    use Regexp::Simple;

    my $re = qr{ start_of_line
                 literal("1998/10/08")
                 optional( whitespace )
                 literal("[")
                 remember( many( any( ":", digit ) ) )
                 non_digit
               };

    if ('1998/10/08 [11:10]' =~ m/$re/) {
        print "Found a match at time $1!\n";
    }

Since we all seem to be converging, I'll be happy to collaborate. I have many more ideas for this module than those shown above, and it would be great to have someone else share the implementation effort. Send me some email if either (or both) of you are interested.

And, yes, the `Regexp::` namespace is an abomination. But it's the standard abomination for Perl 5, so we should use it.

Damian
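To make the translation concrete, here is a rough hand expansion of the example, assuming each grammar action fires exactly as written above. It does not use Parse::RecDescent or `overload` at all; it only reuses the `wrap` helper from the grammar and applies the actions manually, so treat it as a sketch of the expected output rather than the module's actual behaviour.

```perl
use strict;
use warnings;

# Same helper as in the grammar's start-up block.
sub wrap { "(?:$_[0])" . ($_[1] || "") }

# Apply the grammar's actions by hand, one element at a time.
my $any_part = wrap(join('|', quotemeta(':'), '\d'));   # any( ":", digit )

my $pattern = join '',
    '^',                               # start_of_line
    wrap(quotemeta('1998/10/08')),     # literal("1998/10/08")
    wrap('\s', '?'),                   # optional( whitespace )
    wrap(quotemeta('[')),              # literal("[")
    '(' . wrap($any_part, '+') . ')',  # remember( many( ... ) )
    '\D';                              # non_digit

print "$pattern\n";
# prints: ^(?:1998\/10\/08)(?:\s)?(?:\[)((?:(?:\:|\d))+)\D

if ('1998/10/08 [11:10]' =~ /$pattern/) {
    print "Found a match at time $1!\n";
}
```

The extra `(?:...)` groups look noisy, but they are what lets a quantifier such as `?` or `+` apply to a whole sub-pattern instead of just its last character.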
Re: Pondering Regex::English by VSarkiss (Monsignor) on Aug 24, 2001 at 06:19 UTC
Ack! I must've read the same thing, because I wrote something along the same lines. I was going to finish writing the POD this weekend and post to CLPM, to upload to CPAN later. (My first upload to CPAN!)

There are two main differences that I can see right now. First, I called mine `Regexp::Wordy` (with a P ;-) based on the idea that it was using words instead of symbols to describe regexes. Although it is in English. Second, I avoided using OO style. My rationale was that this would mainly be of use to newbies, so I wanted to keep things as simple as possible. If you haven't got the hang of `/^\s+[a-m]*\d{3}/`, you'd probably be intimidated by the `$foo->Regex::Wordy` stuff too.

If I'm reading your example right, with this module you would render it as:

    use Regexp::Wordy qw(:all);

    if ('1998/10/08 [11:10]' =~ regexp(
            at_bol,
            clean('1998/10/08'),
            any_number(space),
            clean('['),
            remember( one_or_more( either(':', digit))),
            nondigit)) {
        print "Found a match at time $1\n";
    }

As you can see, some things are still clumsy. I didn't think about passing in a ref to be set instead of `$1`, etc.

I'd be glad to send you what I have, but I couldn't find an address on your home node or web site. I have an obfuscated address on my home node if you'd like to email me instead.
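For readers wondering how a non-OO interface like this could work, here is a minimal sketch. The function names come from the example above, but every body is an assumption made for illustration and is not the actual `Regexp::Wordy` source: each helper just returns a regex fragment as a string, and `regexp` joins the fragments and compiles them.

```perl
use strict;
use warnings;

# Zero-argument atoms. The empty prototype lets them be written as barewords.
sub at_bol()   { '^'  }
sub space()    { '\s' }
sub digit()    { '\d' }
sub nondigit() { '\D' }

# Combinators: each takes fragment strings and returns a new fragment.
sub clean       { quotemeta $_[0] }                # literal text, metacharacters escaped
sub any_number  { "(?:$_[0])*" }
sub one_or_more { "(?:$_[0])+" }
sub either      { '(?:' . join('|', @_) . ')' }    # arguments are used as-is
sub remember    { "($_[0])" }

# Join all fragments and compile them into a regex object.
sub regexp {
    my $source = join '', @_;
    return qr/$source/;
}

if ('1998/10/08 [11:10]' =~ regexp(
        at_bol, clean('1998/10/08'), any_number(space), clean('['),
        remember( one_or_more( either(':', digit) ) ), nondigit)) {
    print "Found a match at time $1\n";
}
```

Note that in this sketch `either` does not escape its arguments, so a literal alternative containing a metacharacter would need to go through `clean` first.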
Re (tilly) 1: Pondering Regex::English by tilly (Archbishop) on Aug 24, 2001 at 03:47 UTC
First of all, its name should be Regexp::English to fit the current naming structure. Secondly, how does it compare with Damian's Regexp::Common? I think the best way to do your module would be either as part of that, or as a wrapper around it. If the wrapper approach fits what you want, then I think an excellent "proof of concept" would be a partial translation into another language, for instance a (limited) Regexp::Francais.
(Ovid - a blight on the language) Re(2): Pondering Regex::English by Ovid (Cardinal) on Aug 24, 2001 at 03:58 UTC
`<rant>` While I can't disagree that maintaining a standard naming structure is a Good Thing, I have to say that I agree with Jeffrey Friedl regarding "regexp": it's a blight on the language. I speak English and a bit of French and I can't imagine any speakers of either language finding "regexp" to roll off the tongue more easily than "regex". In fact, I find it hard to imagine that speakers of any language would enjoy that P on the end. "regexp" sounds like something Sylvester the Cat would spit out in conversation if he were conversant with Perl.

I don't know who started the "regexp" naming scheme on the CPAN, but I was at a talk that Damian gave recently and, unless I drastically misunderstood him, he also thought that "regexP" was an abomination. Which is my roundabout way of saying that I get chills up my spine every time I hear someone trying to say 'regexp'. 'Nuff said :) `</rant>`

This node was not brought to you by the letter 'P'.

Cheers, Ovid

Vote for paco! Join the Perlmonks Setiathome Group or just click on the link and check out our stats.
"Regexp" with a silent "p" by japhy (Canon) on Aug 24, 2001 at 04:09 UTC
I've heard that it is "regexp" with a silent "p". But I still use "regex". _____________________________________________________ Jeff`[japhy]`Pinyan: Perl, regex, and perl hacker. `s++=END;++y(;-P)}y js++=;shajsj<++y(p-q)}?print:??;`
Re: (Ovid - a blight on the language) Re(2): Pondering Regex::English by dws (Chancellor) on Aug 24, 2001 at 05:55 UTC
"I speak English and a bit of French and I can't imagine any speakers in either language finding "regexp" to roll off the tongue easier than "regex"."

For the unconvinced, try speaking the following aloud at a normal pace:

Regexp regexp regexp
Toy boat toy boat toy boat
Regexp regexp regexp

Pass the peanut butter.
Re: (Ovid - a blight on the language) Re(2): Pondering Regex::English by stefp (Vicar) on Aug 25, 2001 at 00:00 UTC
I speak French with a bit of English (and I am trying to learn a little German by listening to www.jazzradio.net). I think that regexp is a better mnemonic than regex. It is indeed a tongue twister, so the P may be better silent. -- stefp
Re: Re (tilly) 1: Pondering Regex::English by chromatic (Archbishop) on Aug 24, 2001 at 05:04 UTC
You're right about the namespace thing. "Regexp" is abominable, but it's the current namespace.

Regexp::English has very little in common with Regexp::Common, however. One is a library of common regexes. The other is a wrapper around qr with a few handy features.

I like the idea about providing other languages, though. It would be useful to abstract out the common features and use glob aliasing to build the appropriate sub and method names for various other languages. Supporting locales could be tricky, though.

Supposing this did become the parent to all sorts of language-specific long regex modules, what would it be called? Regexp::Easy? Regexp::Language? Regexp::Long? Regexp::Language::Base?
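As a rough illustration of the glob aliasing idea, here is a sketch under stated assumptions. `Regexp::Language::Base` is just one of the candidate names floated above, and `install_aliases`, the `%core` table, and the French sub names are all hypothetical, invented for this example. The point is only that a typeglob assignment can install one shared code ref under a differently named sub in another package.

```perl
use strict;
use warnings;

package Regexp::Language::Base;    # hypothetical parent name

# Language-neutral building blocks (illustrative, not a real module's API).
my %core = (
    digit   => sub { '\d' },
    space   => sub { '\s' },
    literal => sub { quotemeta $_[0] },
);

# Alias each core sub into $package under its translated name.
sub install_aliases {
    my ($package, %name_for) = @_;
    for my $core_name (sort keys %name_for) {
        my $code = $core{$core_name}
            or die "No core sub named '$core_name'";
        my $full_name = $package . '::' . $name_for{$core_name};
        no strict 'refs';
        *{$full_name} = $code;     # assigning a code ref fills only the CODE slot
    }
    return;
}

package main;

Regexp::Language::Base::install_aliases(
    'Regexp::Francais',
    digit => 'chiffre', space => 'espace', literal => 'litteral',
);

print Regexp::Francais::litteral('a.b'), Regexp::Francais::chiffre(), "\n";
```

Because every language package ends up pointing at the same code refs, a fix in the shared table would reach all translations at once; only the name mapping differs per language.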
Re (tilly) 3: Pondering Regex::English by tilly (Archbishop) on Aug 24, 2001 at 05:45 UTC
I agree with disliking the name. (Besides which, Perl's regular expressions aren't even that regular.)

But I still think that you should consider carefully whether your module is best done as a wrapper around Regexp::Common. The point is not that you are doing something similar, but rather that users of your module will likely want to do the same tasks that Regexp::Common makes easy, and it would be nice for your module to be able to give them all of its snippets. This is particularly true considering how long it would take with your module to describe the common REs that Damian supports...

As for the language hierarchy, I hate thinking up names. You could, though, just call it Regexp::English, and have all of the other ones depend on Regexp::English. A bit of a hack, but your internals are your business; users should not have hidden expectations...