in reply to Re: problem with user-defined unicode character properties
in thread problem with user-defined unicode character properties

Unfortunately, that doesn't solve the problem, although it may bring us a step closer to the solution.

Without the trailing spaces, I've discovered that the last character pattern in the subroutine does get executed as necessary. For instance:

    #! usr/local/perl
    use utf8;
    sub InRussian{
    return <<'END';
    +utf8::Cyrillic
    +utf8::Punctuation
    END
    }
    print s/[\P{InRussian}]//g;

gets me only numbers and punctuation, whereas having the +utf8::Cyrillic after +utf8::Punctuation in the subroutine produces the same output as the direct application of the InCyrillic pattern:

    print s/[\P{InCyrillic}]//g;
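
For what it's worth, a self-contained check along the following lines might help isolate the problem. It is only a sketch under a few assumptions: the sample string is made up, and the property list is built with explicit "\n" instead of a here-doc so that trailing whitespace cannot be a factor. If both lines of the property are honoured, the substitution should keep only the Cyrillic letters and the punctuation.

```perl
use strict;
use warnings;
use utf8;   # the string literal below is written in UTF-8

# User-defined property (the name must start with "In" or "Is"):
# the union of Cyrillic characters and punctuation.
# Built with explicit "\n" so no stray trailing spaces can creep in.
sub InRussian {
    return "+utf8::Cyrillic\n"
         . "+utf8::Punctuation\n";
}

# Hypothetical sample text, not taken from the original post.
my $text = "Привет, world! 123";

# Delete every character that is NOT in the union.
(my $kept = $text) =~ s/\P{InRussian}//g;

binmode(STDOUT, ':encoding(UTF-8)');
print "$kept\n";
```

Running the same script with the two property lines swapped would show whether the order really matters on a given Perl version.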

Does this make any sense?

Re^3: problem with user-defined unicode character properties
by BrowserUk (Patriarch) on Jun 11, 2007 at 15:41 UTC

    Um...two guesses.

    1. You are using the negated form, \P{}, and NOT(+A +B) may not mean what you intend, e.g. 'not in A and not in B'?

      Maybe you need (something like):

      sub NotInRussian{
      return <<'END';
      !utf8::Cyrillic
      !utf8::Punctuation
      END
      }
      ...
      s/\p{NotInRussian}//g
    2. It might have something to do with this from the POD?
      A final note on the user-defined property tests and mappings: they will be used only if the scalar has been marked as having Unicode characters. Old byte-style strings will not be affected.

      Does your editor produce Unicode source files? Will Perl promote ASCII source to Unicode?
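
    One way to test the second guess is to compare an undecoded byte string with its decoded counterpart. The following is a rough sketch, not a diagnosis: the input bytes and the helper count_russian are made up for illustration, and the property is written without a here-doc to keep whitespace out of the picture.

```perl
use strict;
use warnings;
use Encode qw(decode);

# Same union property as in the question, built without a here-doc.
sub InRussian {
    return "+utf8::Cyrillic\n"
         . "+utf8::Punctuation\n";
}

# Hypothetical helper: count the characters of a string that match the property.
sub count_russian {
    my ($str) = @_;
    my $n = () = $str =~ /\p{InRussian}/g;
    return $n;
}

# Hypothetical input: the raw UTF-8 bytes for two Cyrillic letters
# followed by " ok", as they would arrive from an undecoded filehandle.
my $bytes = "\xD0\x9F\xD1\x80 ok";
my $chars = decode('UTF-8', $bytes);   # now a character string

printf "bytes: utf8 flag=%d, matches=%d\n",
    utf8::is_utf8($bytes) ? 1 : 0, count_russian($bytes);
printf "chars: utf8 flag=%d, matches=%d\n",
    utf8::is_utf8($chars) ? 1 : 0, count_russian($chars);
```

    If the two counts differ, the data is probably reaching the regex as bytes. In that case opening the input with an encoding layer, e.g. open(my $fh, '<:encoding(UTF-8)', $path), may be all that is needed.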


    Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.
    "Science is about questioning the status quo. Questioning authority".
    In the absence of evidence, opinion is indistinguishable from prejudice.