Re: Re: Dependencies, or, How Common is Regexp::Common?

I wonder about Regexp::Common. I often have tasks that are regular-expression related, and when I look at what the module offers, it usually doesn't have what I need.

By "what I need" I mean two things: (a) it lacks a certain common regular expression, or (b) it lacks a certain task related to regular expressions.

By (a), what I mean is sometimes a regular expression is common, but not in that distro. For example, I was told to write something to make sure an address was valid. So, I simply made sure that the string had a number and a letter in it... and it did get a little bit of filtering done. Is there a better solution? Aren't many people having to validate addresses? How are you doing it? Also, I am not sure how open Abigail-II is to new additions to the module and I am not sure if I should use rt.cpan.org or email him. He is certainly very present here, so I could msg him. But also by (a) what I mean is that Abigail and Damian are both non-American, and so their profanity regular expressions were way off the mark. I had never even heard of some of the terms they thought were bad, and others are completely normal in an American context (e.g., "bl**dy"). So I coded Regexp::US::Profanity to do filtering with

Regarding (b), the regexp to count the number of a certain character in a string is very simple, and the task to count is also simple, but neither was readily available in the distro. And again, I was afraid to contact the author about it, so I just whipped up some lines of code to do it

Carter's compass: I know I'm on the right track when by deleting something, I'm adding functionality.

Replies are listed 'Best First'.
Re: Dependencies, or, How Common is Regexp::Common?
by Abigail-II (Bishop) on Sep 17, 2003 at 21:12 UTC

> For example, I was told to write something to make sure an address was valid. So, I simply made sure that the string had a number and a letter in it... and it did get a little bit of filtering done. Is there a better solution? Aren't many people having to validate addresses? How are you doing it?

Personally, I don't think such a thing belongs in Regexp::Common, because there are no clear rules on what is a valid address. You could make some heuristics, but they will give many false positives and false negatives. And the heuristics will differ from country to country.

> Also, I am not sure how open Abigail-II is to new additions to the module

The PODs have always suggested there are not enough regexes and have asked people to send them. In the year and a half that I've taken care of this module, I haven't had enough regexes sent in to need a second hand to count them.

As for contacting me, email is preferred (regexp-common@abigail.nl). I don't do the chatterbox, so don't waste your time messaging me.

As for the profanity regex, that's entirely Damian's work, including the nifty encoding. Had it not been there when I started maintaining the module, I would not have added it. The problem I have with it is that it's so subjective. Who am I to decide what's profanity and what isn't? You can never be complete on this one, and where do you stop?

> Regarding (b), the regexp to count the number of a certain character in a string is very simple, and the task to count is also simple, but neither was readily available in the distro.

The regexp is simple? You'd have to write something like this (assuming you want to count the occurrences of the character `c`):

    /^(?{$count = 0})[^c]*(?:c(?{$count ++})[^c]*)*/

which I don't think is simple. I wouldn't use a regex for that, I'd use

    tr/c/c/

and if you want to count the number of non-overlapping matches of a pattern, I'd use:

    $count = () = /$pat/g;

To catch that inside a single regex is really awkward.

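For anyone who wants to try the two counting idioms above, here is a minimal sketch. The subroutine names `count_char_c` and `count_matches` are made up for illustration and are not part of Regexp::Common or any other module. Note that `tr///` does not interpolate variables, so the character is hard-coded, and the list-assignment idiom assumes the pattern has at most one capturing group (with more, each match contributes several list elements and the count is inflated).

```perl
use strict;
use warnings;

# Hypothetical helper: count how often the literal character 'c' occurs.
# tr/c/c/ replaces each 'c' with itself, so the string is unchanged,
# and it returns the number of characters it matched.
sub count_char_c {
    my ($str) = @_;
    return ($str =~ tr/c/c/);
}

# Hypothetical helper: count non-overlapping matches of a pattern.
# Assigning the list of matches to the empty list () and then to a
# scalar yields the number of elements the match returned.
# Assumes $pat has no more than one capturing group.
sub count_matches {
    my ($str, $pat) = @_;
    my $count = () = $str =~ /$pat/g;
    return $count;
}

print count_char_c("occurrence"), "\n";          # 'c' appears 3 times
print count_matches("abcabcab", qr/abc/), "\n";  # "abc" matches twice
```

The `tr` version only works for a fixed set of characters known when the code is written; for anything pattern-shaped, the list-assignment form is the one to reach for.
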
Remember that Regexp::Common gives you patterns that can be interpolated in a regexp. For instance, if you want to count the number of HTTP URIs in a string, Regexp::Common doesn't give you a function to do that directly, but it does do the hard work for you: it gives you the pattern.

    $count = () = $str =~ /$RE{URI}{HTTP}/g;

Patches are more than welcome, or even suggestions on what to include.

The next version of Regexp::Common is planned to be released shortly after 5.8.1 comes out. The major addition will be ISBN numbers, checking against the latest country/publisher lists.

Abigail
