in reply to Re^5: Namespace/advice for new CPAN modules for Thai & Lao ( Regexp::CharProps - User Defined Character Properties )
in thread Namespace/advice for new CPAN modules for Thai & Lao

"but he meant some unicode string"

Yes. It definitely wouldn't work on an upper-ascii-type encoding such as Thai originally began with, without some form of encoding/decoding going on. I guess I put "UTF8" because that is what gets used most with Thai, and what I knew would work having developed strictly with that. I presume any Unicode type should work equally well, though I don't claim to be an expert on Unicode.

In your code example:

print "\$_ has got Thai" if m{ \p{InThai} |\p{InThaiCons} |\p{InThaiHCons} |\p{InThaiMCons} |\p{InThaiLCons} |\p{InThaiVowel} |\p{InThaiPreVowel} |\p{InThaiPostVowel} |\p{InThaiCompVowel} |\p{InThaiDigit} |\p{InThaiTone} |\p{InThaiPunct} }x;
...only the first item in the OR'ed list should ever see action. All of the subsequent categories are already "InThai", and the "InThai" token already comes standard with Perl, AFAIK (see pg. 172 of "Programming Perl, Third Edition"), so that code would do little to test additional functionality. If the first line (\p{InThai}) failed, none of the others should succeed either.
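
To make the subset argument concrete, here is a minimal sketch. The two property subs and their ranges are hypothetical stand-ins (they are not the module's actual tables); the only assumption is that each user-defined class lies inside the Thai block U+0E00..U+0E7F, so anything they match is already matched by \p{InThai}.

```perl
use strict;
use warnings;

# Hypothetical user-defined properties for illustration only.
# Each returns a tab-separated hex range inside the Thai block.
sub InThaiDigit { return "0E50\t0E59\n" }   # THAI DIGIT ZERO..NINE
sub InThaiTone  { return "0E48\t0E4B\n" }   # MAI EK..MAI CHATTAWA

# Report whether a character is in the Thai block and whether it is
# in one of the (illustrative) subset classes.
sub classify {
    my ($char) = @_;
    my $in_block  = $char =~ /\p{InThai}/                      ? 1 : 0;
    my $in_subset = $char =~ /\p{InThaiDigit}|\p{InThaiTone}/  ? 1 : 0;
    return ($in_block, $in_subset);
}

for my $char ("\x{0E51}", "\x{0E48}", "\x{0E01}", "A") {
    my ($in_block, $in_subset) = classify($char);
    printf "U+%04X block=%d subset=%d\n", ord($char), $in_block, $in_subset;
}
```

A subset class never matches where the block test fails, so in an alternation that starts with \p{InThai} the later branches are never needed.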

NOTE: I've updated my list to reflect your proposed name, but I've adapted it slightly to one that seems a better fit to me.

Blessings,

~Polyglot~


Re^7: Namespace/advice for new CPAN modules for Thai & Lao ( Regexp::CharClasses::Thai / Lingua::Thai::RegexpCharClasses )
by Anonymous Monk on Mar 24, 2015 at 08:03 UTC

    In your code example: ...only the first item in the OR'ed list should ever see action...

    SYNOPSIS only shows what's possible; it can be repetitive and incorrect as long as the syntax is valid. And when the exports are few, you might as well show them all instead of "..."

    Regexp::CharProps

    My suggestion was that you call yours Regexp::CharProps::Thai, not Regexp::CharProps. Also, distribute a helper parent module Regexp::CharProps with it, so that others can add Regexp::CharProps::AnonyRands or whatever ... a new, well-named place for these definitions to live

    Regexp::Thai::CharClasses

    So are you going to have more Thai regexps that aren't CharClasses?

    I think you got that backwards, it should be Regexp::CharClasses::Thai :)

    Or it should go into Lingua::Thai::RegexpCharClasses? In case you're going to have more Lingua::Thai things that aren't RegexpCharClasses

    Yes. It definitely wouldn't work on an upper-ascii-type encoding such as Thai originally began with, without some form of encoding/decoding going on. I guess I put "UTF8" because that is what gets used most with Thai, and what I knew would work having developed strictly with that. I presume any Unicode type should work equally well, though I don't claim to be an expert on Unicode.

    Right :) The numbers are Unicode code points, independent of encoding
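
    A small sketch of that point, assuming the core Encode module and using ISO-8859-11 (which shares the TIS-620 layout for Thai letters) as the stand-in for a legacy single-byte encoding. The byte values are for the letter KO KAI; once decoded, both inputs are the same code point and the property test no longer cares where the bytes came from.

    ```perl
    use strict;
    use warnings;
    use Encode qw(decode);

    # The same Thai letter (KO KAI) as raw bytes in two encodings.
    my $utf8_bytes   = "\xE0\xB8\x81";   # UTF-8, three bytes
    my $legacy_bytes = "\xA1";           # ISO-8859-11, one byte

    # Decoding turns bytes into characters (code points).
    my $from_utf8   = decode('UTF-8',       $utf8_bytes);
    my $from_legacy = decode('iso-8859-11', $legacy_bytes);

    printf "U+%04X U+%04X\n", ord($from_utf8), ord($from_legacy);

    print "decoded strings match\n"
        if $from_utf8 =~ /\p{InThai}/ && $from_legacy =~ /\p{InThai}/;

    # Without decoding, the legacy byte is treated as U+00A1, which is not Thai.
    print "raw legacy byte does not match\n"
        unless $legacy_bytes =~ /\p{InThai}/;
    ```

    So the character classes themselves need no encoding-specific handling; the only requirement is that input is decoded before matching.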

      SYNOPSIS only shows what's possible; it can be repetitive and incorrect as long as the syntax is valid. And when the exports are few, you might as well show them all instead of "..."

      Thank you for the clarification. I guess I misunderstood the intent of that. As is obvious, I've never submitted a module before, so I appreciate your patience with me.

      My suggestion was that you call yours Regexp::CharProps::Thai not Regexp::CharProps.

      Ok, I fixed that.

      Also, distribute a helper parent module Regexp::CharProps with it, so that others can add Regexp::CharProps::AnonyRands or whatever ... a new, well-named place for these definitions to live

      I have no idea how to do this.

      So are you going to have more Thai regexps that aren't CharClasses?
      I think you got that backwards, it should be Regexp::CharClasses::Thai :)

      Looking at that module now, perhaps it could all just go into Regexp::CharClasses, but I'm not the developer for that, and when I looked at its code, it's done in a somewhat different style which is confusing to me. I don't see any logical difference between Regexp::CharClasses::Thai and Regexp::Thai::CharClasses, except that, to my understanding, the former would be inhibited by the fact another developer has already used the Regexp::CharClasses namespace. Am I missing something here?

      Blessings,

      ~Polyglot~

        Looking at that module now, perhaps it could all just go into Regexp::CharClasses, but I'm not the developer for that, and when I looked at its code, it's done in a somewhat different style which is confusing to me. I don't see any logical difference between Regexp::CharClasses::Thai and Regexp::Thai::CharClasses, except that, to my understanding, the former would be inhibited by the fact another developer has already used the Regexp::CharClasses namespace. Am I missing something here?

        There is no real inhibition, only cooperation ... only flow :) go with it

        When I was doing my research I ignored Regexp::CharClasses because it's kind of a one-off ... unlike (from the same author) Regexp::Common, which is designed to be an extensible module and expandable namespace ... and has been extended

        Since Regexp::CharClasses already exists, and it's kind of experimental, I recommended Regexp::CharProps::Thai

        If you're going with three names, two out of three shouldn't be ends, and if using Regexp::Thai::CharClasses both Thai:: and CharClasses seem like ends

        I can imagine many many Regexp::CharClasses:: modules existing, I cannot imagine many Regexp::Thai:: modules existing

        Hey, you could even call it Regexp::ThaiCharClasses if you don't plan any kind of growth for the namespace :) Sure, it could be extended like Regexp::ThaiCharClasses::TheBestOnes, but it's not exactly likely/obvious/good :)

        You can find more discussions on naming modules in Re: RFC: Automatic logger module

        I have no idea how to do this.

        No idea how to do what? Since I posted the code, you have no idea how to release it? What filename to put it in? Something else?

        Here are some useful/related links