
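How do you fetch and process the file? Your original code example has no use utf8; and does not UTF-8-encode its output. You get your original string back only because two errors cancel out. The input is never decoded from UTF-8, so Perl treats each byte as a separate character. The output is never encoded, so those same bytes are written back unchanged. In between, though, the string holds bytes rather than characters, so \w does not see the accented letters as word characters.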
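Here is a small self-contained sketch of that cancellation. It uses a hypothetical sample string ("café") in place of your file and compares the undecoded bytes with the decoded characters. It uses the core Encode module and the /u regex modifier (Perl 5.14+) so that \w follows Unicode rules in both cases.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: the UTF-8 bytes for "café", as they arrive from a
# file opened without an :encoding layer (no decoding on input).
my $bytes = "caf\xC3\xA9";

# Undecoded, Perl sees one character per byte: 5 "characters".
my $byte_len = length $bytes;

# Even under Unicode rules (/u) the byte string fails /^\w+$/, because the
# byte 0xA9 is taken as the character U+00A9, which is not a word character.
my $bytes_match = $bytes =~ /^\w+$/u ? 1 : 0;

# Decoded, Perl sees 4 characters, the last one being U+00E9.
my $chars = decode('UTF-8', $bytes);
my $char_len = length $chars;
my $chars_match = $chars =~ /^\w+$/u ? 1 : 0;

# Printing the undecoded bytes unencoded would write the input bytes back
# unchanged (the "cancellation"). Decoding, then encoding on output, yields
# the same bytes, but with correct character semantics in between.
my $round_trip_ok = encode('UTF-8', $chars) eq $bytes ? 1 : 0;

printf "bytes: length %d, \\w+ match %d\n", $byte_len, $bytes_match;
printf "chars: length %d, \\w+ match %d\n", $char_len, $chars_match;
```

Both versions print identically to a terminal. Only the decoded one behaves correctly in the match.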

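Perl does not assume UTF-8 by default. Data read without a decoding layer is just a sequence of bytes. If you read the file and decode it from UTF-8 (for example with an :encoding(UTF-8) layer on open), \w should work as expected. Just remember to encode the output again. If you fetch with LWP, you can either print $response->content (the raw bytes, without encoding them) or encode $response->decoded_content (a decoded character string) before printing.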
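For the file case, a minimal sketch might look like the following. It writes hypothetical sample text to a temporary file (File::Temp, a core module) so that it runs on its own. It then reads the file back through an :encoding(UTF-8) layer, collects \w+ matches, and encodes them again on STDOUT.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical input: write the UTF-8 bytes for "naïve café" to a temp file.
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;                          # raw bytes, no encoding layer
print {$out} "na\xC3\xAFve caf\xC3\xA9\n";
close $out or die "close $path: $!";

# Decode on input: each line arrives as a character string.
open my $in, '<:encoding(UTF-8)', $path or die "open $path: $!";
my @words;
while (my $line = <$in>) {
    chomp $line;
    push @words, $line =~ /(\w+)/gu;   # \w now treats the accented letters as letters
}
close $in;

# Encode on output so the terminal receives UTF-8 bytes again.
binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @words;
```

Without the :encoding(UTF-8) layer, the same loop would split each accented letter's bytes away from the rest of the word. For LWP, decoded_content plays the role of the input layer here.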
