in reply to Dependencies, or, How Common is Regexp::Common?

"Never use dependencies other than what comes with the default install,"

The drawbacks are added complexity and a dependency of perhaps questionable value (for this application: notice I have no heartache using CGI :).

I think such attitudes totally miss the point of Open Source in general, and CPAN in particular. What is the point of sharing software if people balk at the slightest inconvenience and don't want to use what's available? Does everything have to be delivered to your doorstep?

It seems that certain people would prefer that Perl came with everything that's available on CPAN - and then some. That would make it impossible to ever release a version of Perl again; just look at how hard it is to release 5.8.1, which is partly due to the bloat and to wanting to serve everyone.

It's far better for packages to live on CPAN. Then at least there is the potential that they will be updated soon after a bug is revealed. Suppose Regexp::Common came with 5.8.0, and it had a bug. The earliest release that could fix the bug would be 5.8.1, which, if it came out today, would be 14 months after 5.8.0. And if you have a hard time convincing people to install a module, think how hard it's going to be to convince them to install a new version of Perl! And if there were a bug in the Regexp::Common released with 5.8.1, do you have any idea how long you would have to wait for a new release? The track to 5.10 was started in July 2002. It's now September 2003, and there isn't even any sign of a 5.9.0. You might have to wait *years* for a bugfix.

Having said that, Regexp::Common is easy to install. It's a pure Perl module, and I have no intention of ever turning it into something that isn't pure Perl. All you need to do is (recursively) copy the files in the 'lib' directory of the distribution. How hard can that be? But even if you don't want to install Regexp::Common, there is always the option to copy the code. Of course, your own license may prevent that, and you do have to do more work whenever the code you copied gets upgraded, but the license of Regexp::Common allows you to go this way.
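
A minimal sketch of using such a copied 'lib' tree without installing anything system-wide. The location (a lib/ directory next to the script) is an assumption for illustration; point it at wherever you copied the files.

```perl
use strict;
use warnings;
use FindBin qw($Bin);

# Assumption: the distribution's lib/ tree was copied to a lib/ directory
# next to this script. Adjust the path to wherever you put it.
use lib "$Bin/lib";

# Load at run time so a missing copy gives a readable message
# instead of a compile-time failure.
if (eval { require Regexp::Common; 1 }) {
    Regexp::Common->import;
    print "Regexp::Common loaded from $INC{'Regexp/Common.pm'}\n";
} else {
    print "Regexp::Common not found; is lib/ in place?\n";
}
```

In a real program you would more likely write a plain `use Regexp::Common;` after the `use lib` line; the run-time `require` here only makes the sketch safe to run before the files are in place.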

Abigail

Re: Re: Dependencies, or, How Common is Regexp::Common?
by princepawn (Parson) on Sep 17, 2003 at 16:12 UTC
    I wonder about Regexp::Common. I often have tasks that are regular-expression related, and then I look at what the module offers and usually it doesn't have what I need.

    By "what I need" I mean two things: (a) it lacks a certain common regular expression, and (b) it lacks certain tasks related to regular expressions.

    By (a), what I mean is sometimes a regular expression is common, but not in that distro. For example, I was told to write something to make sure an address was valid. So, I simply made sure that the string had a number and a letter in it... and it did get a little bit of filtering done. Is there a better solution? Aren't many people having to validate addresses? How are you doing it? Also, I am not sure how open Abigail-II is to new additions to the module and I am not sure if I should use rt.cpan.org or email him. He is certainly very present here, so I could msg him. But also by (a) what I mean is that Abigail and Damian are both non-American, and so their profanity regular expressions were way off the mark. I had never even heard of some of the terms they thought were bad, and others are completely normal in an American context (e.g., "bl**dy"). So I coded Regexp::US::Profanity to do filtering with

    Regarding (b), the regexp to count the number of a certain character in a string is very simple, and the task to count is also simple, but neither was readily available in the distro. And again, I was afraid to contact the author about it, so I just whipped up some lines of code to do it

    Carter's compass: I know I'm on the right track when by deleting something, I'm adding functionality.

      For example, I was told to write something to make sure an address was valid. So, I simply made sure that the string had a number and a letter in it... and it did get a little bit of filtering done. Is there a better solution? Aren't many people having to validate addresses? How are you doing it?

      Personally, I don't think such a thing belongs in Regexp::Common, because there are no clear rules on what is a valid address. You could make some heuristics, but they will give many false positives, and false negatives. And the heuristics will differ from country to country.

      Also, I am not sure how open Abigail-II is to new additions to the module
      The PODs have always suggested there are not enough regexes and have asked for people to send them. In the year and a half that I've taken care of this module, I haven't had enough regexes sent in to need a second hand to count them.

      As for contacting me, email is preferred (regexp-common@abigail.nl). I don't do the chatterbox, so don't waste your time messaging me.

      As for the profanity regex, that's entirely Damian's work, including the nifty encoding. Had it not been there when I started maintaining the module, I would not have added it. The problem I have with it is that it's so subjective. Who am I to decide what's profanity, and what isn't? You can never be complete on this one, and where do you stop?

      Regarding (b), the regexp to count the number of a certain character in a string is very simple, and the task to count is also simple, but neither was readily available in the distro.
      The regexp is simple? You'd have to write something like this (assuming you want to count the occurrences of the character c):
      /^(?{$count = 0})[^c]*(?:c(?{$count ++})[^c]*)*/
      which I don't think is simple. I wouldn't use a regex for that, I'd use
      tr/c/c/
      and if you want to count the number of non-overlapping matches of a pattern, I'd use:
      $count = () = /$pat/g;
      To catch that inside a single regex is really awkward. Remember that Regexp::Common gives you patterns that can be interpolated in a regexp. For instance, if you want to count the number of HTTP URIs in a string, Regexp::Common doesn't give you a function to do that directly, but it does do the hard work for you; it gives you the pattern:
      $count = () = $str =~ /$RE{URI}{HTTP}/g;
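
      The two counting idioms above, as a minimal sketch that uses only core Perl. The sample string and pattern are made up for illustration.

```perl
use strict;
use warnings;

my $str = "abracadabra";

# tr/// returns the number of characters it matched; with identical
# search and replacement lists the string itself is left unchanged.
my $chars = ($str =~ tr/a/a/);

# A /g match in list context returns every non-overlapping match.
# Assigning that list to the empty list, and the result to a scalar,
# yields the number of matches.
my $pat     = qr/abra/;
my $matches = () = $str =~ /$pat/g;

print "'a' occurs $chars times, /abra/ matches $matches times\n";
# prints: 'a' occurs 5 times, /abra/ matches 2 times
```

      Note the /g flag: without it the match in list context stops after the first hit, so the count never gets past one for a pattern without capture groups.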

      Patches are more than welcome, as are suggestions about what to include.

      The next version of Regexp::Common is planned to be released shortly after 5.8.1 comes out. The major addition will be ISBN numbers, checking against the latest country/publisher lists.

      Abigail