brian_d_foy has asked for the wisdom of the Perl Monks concerning the following question:

Over the holiday I tried to clean out my perlfaq updates backlog, but I have a couple of stubborn answers I need help with (much like in Help me update stubborn perlfaq answers!).

6.9 What is "/o" really for?

Update: ikegami has the answer, but does anyone remember the patch or thread on p5p that made it official?

I've been told, but haven't been able to verify when this change took place, that the latest versions of Perl won't recompile a regular expression using interpolation if the variable hasn't changed. I don't see any notes in the perldelta about it. Any information there? If I can't say something like "Since perl5.6, ....", then I won't change the answer.
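For context, whatever the caching behavior turns out to be, /o still has one visible effect: it freezes the pattern at its first compilation, so later changes to the interpolated variable are ignored. A minimal sketch of that difference (the variable names are mine, purely for illustration):

```perl
use strict;
use warnings;

my (@with_o, @without_o);

for my $pat (qw(cat dog)) {
    # With /o the pattern is compiled once, with $pat eq 'cat',
    # and the second pass still tests against /cat/.
    push @with_o, $pat if 'dog' =~ /$pat/o;

    # Without /o the current value of $pat is used on each pass.
    push @without_o, $pat if 'dog' =~ /$pat/;
}

print scalar(@with_o), " ", scalar(@without_o), "\n";   # prints "0 1"
```

So /o is not just a speed hint: it changes what gets matched whenever the variable does change.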

6.6 How can I make "\w" match national character sets?

Note: doc references are for the docs as of Perl 5.8.8

I asked about this one in Help me update stubborn perlfaq answers! and got part of an answer about \p{} in Re: Help me update stubborn perlfaq answers!. I don't have experience with that sorta stuff, so maybe someone can give me some good examples along with the text to explain them.
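As a starting point for discussion, here is a rough sketch of how \p{} properties compare with \w on a string that mixes scripts. The sample text and variable names are my own assumptions, not something from the earlier thread:

```perl
use strict;
use warnings;

# "naive" with a diaeresis, a Greek word, a Cyrillic word, and a number.
# The code points above 0xFF make Perl store the string as characters.
my $text = "na\x{EF}ve \x{3B1}\x{3B2}\x{3B3} \x{436}\x{443}\x{43A} 42";

my @letter_runs = $text =~ /(\p{L}+)/g;       # letters from any script
my @greek_runs  = $text =~ /(\p{Greek}+)/g;   # Greek script only
my @word_runs   = $text =~ /(\w+)/g;          # letters, digits, underscore

print scalar(@letter_runs), " ",
      scalar(@greek_runs),  " ",
      scalar(@word_runs),   "\n";             # prints "3 1 4"
```

\p{L} picks up the three words but not "42", \p{Greek} picks up only the Greek one, and \w takes all four runs.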

Also, Alan Flavell pointed out to me that although the answer says to see perllocale, perllocale itself says in its "Unicode and UTF-8" section:

Usually locale settings and Unicode do not affect each other, but there are exceptions, see "Locales" in perlunicode for examples.

And over in perlunicode, the BUGS section says:

Use of locales with Unicode data may lead to odd results. Currently, Perl attempts to attach 8-bit locale info to characters in the range 0..255, but this technique is demonstrably incorrect for locales that use characters above that range when mapped into Unicode. Perl's Unicode support will also tend to run slower. Use of locales with Unicode is discouraged.

Not only that, but in Security Implications of Unicode in the same document, it says:

...unlike most locales, which are specific to a language and country pair, Unicode classifies all the characters that are letters somewhere as \w.

Dr. Ruud sent me a nice little program to find out just how many things show up in \w, and it's a lot more than just a national character set.
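This is not Dr. Ruud's program, just a minimal sketch of the same idea under my own assumptions: walk a range of code points and count how many match \w. The helper name count_word_chars is made up for this example, and the total for the larger range depends on the Unicode version your perl ships with, so I don't quote one.

```perl
use strict;
use warnings;
no warnings 'utf8';   # the range includes non-character code points

# Hypothetical helper: count code points in 0..$max that match \w.
sub count_word_chars {
    my ($max) = @_;
    my $count = 0;
    for my $cp (0 .. $max) {
        next if $cp >= 0xD800 && $cp <= 0xDFFF;   # skip surrogates
        my $ch = chr $cp;
        utf8::upgrade($ch);   # character semantics for 0x80..0xFF too
        $count++ if $ch =~ /\w/;
    }
    return $count;
}

my $ascii = count_word_chars(0x7F);     # a-z, A-Z, 0-9 and "_": 63
my $bmp   = count_word_chars(0xFFFF);   # Basic Multilingual Plane

print "ASCII: $ascii\n";
print "BMP:   $bmp\n";
```

The gap between the two numbers is the point: with Unicode semantics, \w covers far more than any one national character set.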

--
brian d foy <brian@stonehenge.com>
Subscribe to The Perl Review

Re: Year end FAQ stubborn answer help, 2006 edition
by ikegami (Patriarch) on Nov 28, 2006 at 02:24 UTC

    I've been told, but haven't been able to verify when this change took place, that the latest versions of Perl won't recompile a regular expression using interpolation if the variable hasn't changed.

    I didn't find any mention of this in perl56delta, perl561delta, the Changes file for 5.6.0, or the Changes file for 5.6.1.

    However, it's easy to verify whether a given version has the optimization or not. Since at least 5.6.0,

    use re 'debug';

    for (1..2) {
        $re_str = $_;
        print("***** $_ *****\n");
        /$re_str/;
    }

    $re_str = "\\d";
    for (3..4) {
        print("***** $_ *****\n");
        /$re_str/;
    }

    outputs

    ***** 1 *****
    Compiling REx `1'
    Guessing start of match, REx "1" against "1"...
    ***** 2 *****
    Compiling REx `2'
    Guessing start of match, REx "2" against "2"...
    ***** 3 *****
    Compiling REx `\d'
    Matching REx "\d" against "3"
    ***** 4 *****
    Matching REx "\d" against "4"    <--- No "Compiling" before this.

    (irrelevant lines omitted).

    I've personally tested this with 5.6.0, 5.6.1, 5.8.0 and 5.8.8.

Re: Year end FAQ stubborn answer help, 2006 edition
by greatshots (Pilgrim) on Nov 28, 2006 at 01:15 UTC