Re^2: problem with user-defined unicode character properties

Unfortunately, that doesn't solve the problem, although it may bring us a step closer to the solution.

Without the trailing spaces, I've discovered that the last character pattern in the subroutine does get executed as necessary. For instance:

#!/usr/local/perl
use utf8;

sub InRussian{
    return <<'END'; 
+utf8::Cyrillic
+utf8::Punctuation 
END
}

print s/[\P{InRussian}]//g;

gets me only numbers and punctuation, whereas putting +utf8::Cyrillic after +utf8::Punctuation in the subroutine produces the same output as applying the InCyrillic pattern directly: print s/[\P{InCyrillic}]//g;

Does this make any sense?


Replies are listed 'Best First'.

Re^3: problem with user-defined unicode character properties
by BrowserUk (Patriarch) on Jun 11, 2007 at 15:41 UTC

Um...two guesses.

You are using the negation \P{}, and NOT(+A +B) may not mean what you intend, e.g. 'not in A and not in B'?
Maybe you need (something like):
```
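# User-defined property names must begin with "In" or "Is".
sub InNotRussian {
    return <<'END';
!utf8::Cyrillic
-utf8::Punctuation
END
}
# "!" adds the complement of Cyrillic; "-" then removes punctuation,
# leaving only characters that are in neither set.
...
s/\p{InNotRussian}//g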
```
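One way to check the set logic without touching the real data is to run both spellings against the same string and compare. The following is only a sketch under a few assumptions: the sample string and the `strip` helper are made up for illustration, the complement sub is named `InNotRussian` because perlunicode requires user-defined property names to begin with `In` or `Is`, and its second line uses `-` (remove) rather than a second `!`, since each `!` line adds a complement to the set and two of them together would cover nearly every character.

```perl
use strict;
use warnings;
use utf8;   # this source file contains a Cyrillic literal

# Union of the two sets: Cyrillic characters plus punctuation.
sub InRussian {
    return <<'END';
+utf8::Cyrillic
+utf8::Punctuation
END
}

# The complement built directly: everything that is not Cyrillic,
# with punctuation then removed from that set.
sub InNotRussian {
    return <<'END';
!utf8::Cyrillic
-utf8::Punctuation
END
}

# Hypothetical helper: delete every character matching the pattern.
sub strip {
    my ($text, $re) = @_;
    $text =~ s/$re//g;
    return $text;
}

my $sample = "Привет, world 123!";   # made-up test data

my $via_negation   = strip($sample, qr/\P{InRussian}/);
my $via_complement = strip($sample, qr/\p{InNotRussian}/);

binmode STDOUT, ':encoding(UTF-8)';
print "$via_negation\n$via_complement\n";
print $via_negation eq $via_complement ? "same\n" : "different\n";
```

If the two results differ on your Perl, that would point at how the property definition is being parsed rather than at the negation itself.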
It might have something to do with this from the POD?
A final note on the user-defined property tests and mappings: they will be used only if the scalar has been marked as having Unicode characters. Old byte-style strings will not be affected.

Does your editor produce Unicode source files? Will Perl promote ASCII source to Unicode?
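To see whether the POD note applies, it may help to compare the same text as raw bytes and as a decoded character string. This is a rough sketch, not a diagnosis of the original script: the byte string is a hand-written UTF-8 encoding of the word "Привет", standing in for input read without an encoding layer, and `InRussian` is copied from the post above.

```perl
use strict;
use warnings;
use Encode qw(decode);

sub InRussian {
    return <<'END';
+utf8::Cyrillic
+utf8::Punctuation
END
}

# UTF-8 encoded bytes of a six-letter Cyrillic word, as they would
# arrive from a filehandle opened without an encoding layer.
my $bytes = "\xD0\x9F\xD1\x80\xD0\xB8\xD0\xB2\xD0\xB5\xD1\x82";

# Decoding turns the bytes into characters and marks the scalar as
# holding Unicode characters, which is what the POD note refers to.
my $chars = decode('UTF-8', $bytes);

# Count matches in list context for each form of the string.
my $hits_bytes = () = $bytes =~ /\p{InRussian}/g;
my $hits_chars = () = $chars =~ /\p{InRussian}/g;

printf "flagged: bytes=%d chars=%d\n",
    (utf8::is_utf8($bytes) ? 1 : 0), (utf8::is_utf8($chars) ? 1 : 0);
printf "matches: bytes=%d chars=%d\n", $hits_bytes, $hits_chars;
```

The decoded string should report one match per letter, while the undecoded bytes should not match as Cyrillic at all. If your input behaves like the byte string, adding an encoding layer when reading it (for example `binmode STDIN, ':encoding(UTF-8)'`) is one thing to try.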

Examine what is said, not who speaks -- Silence betokens consent -- Love the truth but pardon error.

"Science is about questioning the status quo. Questioning authority".

In the absence of evidence, opinion is indistinguishable from prejudice.

"Too many [] have been sedated by an oppressive environment of political correctness and risk aversion."
