in reply to Odd problems with UTF-8, regexps, and newer Perl versions

For me the problem goes away when I comment out the use encoding 'utf8' line (tested with 5.10.1).

Why do you think you need it? The use utf8 pragma already tells Perl that the script source is in UTF-8, and you can always use binmode to change the layers on STDIN and STDOUT.
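Here is a minimal sketch of that combination. It assumes a UTF-8 terminal. The test string 'Böck' and the regex are hypothetical stand-ins, because the original test.pl is not shown in this thread.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                              # this file's source text is UTF-8
binmode STDIN,  ':encoding(UTF-8)';    # decode bytes read from STDIN
binmode STDOUT, ':encoding(UTF-8)';    # encode characters written to STDOUT

# Hypothetical test string: under 'use utf8' this literal is 4 characters,
# not the 5 bytes it occupies in the source file.
my $name = 'Böck';

# '.' matches exactly one character, so this only succeeds if ö is a
# single character rather than two separate UTF-8 bytes.
my $matched = $name =~ /^B.ck$/ ? 1 : 0;

print $matched ? "success with $name\n" : "no match for $name\n";
print "length: ", length($name), "\n";
```

If you drop the use utf8 line, the same literal is read as five bytes, length reports 5, and the /^B.ck$/ match fails.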

Re^2: Odd problems with UTF-8, regexps, and newer Perl versions
by choroba (Cardinal) on Jun 04, 2010 at 21:33 UTC
    I can replicate the problem in perl 5.10.0 too, but not in 5.8.8. almut's solution solves it.
Re^2: Odd problems with UTF-8, regexps, and newer Perl versions
by ablegrape (Initiate) on Jun 04, 2010 at 23:36 UTC

    Thanks for the quick reply. I tried that, too, and while the regexp then works, the behavior changes.

    With only 'use utf8':
    % ./test.pl
    yep, is UTF8
    success with B?ck
    

    I see: "use encoding" also sets binmode on STDIN and STDOUT, so when I use only 'use utf8' I need to add the binmode explicitly.

    With use utf8 plus "binmode STDOUT ':utf8'":

    % ./test.pl
    yep, is UTF8
    success with Böck
    

    (My, Perl's Unicode handling is complicated.) Now to see whether I can apply this successfully to the original application, which is far more complex...

      I see: "use encoding" also sets binmode on STDIN and STDOUT, so when I use only 'use utf8' I need to add the binmode explicitly.

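      You can also use the open pragma for that. With its :std option it sets the layers on STDIN, STDOUT and STDERR, and it also sets the default layers for later calls to open in the same lexical scope.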
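Here is a minimal sketch of the open pragma approach. It assumes a UTF-8 terminal and a writable temp directory. The round trip through a File::Temp file only illustrates that plain open calls pick up the default layer.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use utf8;                              # this file's source text is UTF-8
use open qw(:std :encoding(UTF-8));    # layers for STD* handles and for open()
use File::Temp qw(tempfile);

# We only need a throwaway file name, so close the handle tempfile() returns.
my ($fh, $path) = tempfile();
close $fh;

# No layer given here: the default from 'use open' encodes to UTF-8.
open(my $out, '>', $path) or die "Cannot write $path: $!";
print {$out} "Böck\n";
close $out;

# Reading back decodes UTF-8 into characters with the same default layer.
open(my $in, '<', $path) or die "Cannot read $path: $!";
my $line = <$in>;
close $in;
chomp $line;

my $size = -s $path;    # bytes on disk: ö takes two bytes in UTF-8
unlink $path;

# STDOUT got the layer via :std, so ö prints correctly.
print "read back: $line (", length($line), " characters, $size bytes on disk)\n";
```

On a Unix-like system the file holds six bytes (B, two bytes for ö, c, k, newline), while $line holds four characters. No binmode call is needed anywhere.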
