How do you fetch and process the file? Your original code example has no `use utf8;` and does not UTF-8-encode the output. You get your original string back only because two errors cancel each other out:
- Your file is UTF-8-encoded, but you don't declare this to Perl. Perl therefore reads the individual bytes of the UTF-8 encoding, which are not word characters and thus don't match `\w`.
- You just print the bytes. If you are using a UTF-8 terminal, this "works" because the terminal decodes your bytes.
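A minimal sketch of that cancellation. It assumes a hypothetical sample word, "Grüße", written as byte escapes so the script itself stays plain ASCII, and Perl's default regex semantics (no `use v5.12` or `use feature 'unicode_strings'`):

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical sample: the UTF-8 bytes of "Grüße", written as escapes so this
# script itself is pure ASCII (no `use utf8;` needed).
my $bytes = "Gr\xC3\xBC\xC3\x9Fe";

# Without decoding, Perl sees 7 separate bytes. Under the default semantics the
# bytes above 0x7F are not word characters, so \w+ stops after "Gr".
my ($raw_word) = $bytes =~ /^(\w+)/;

# After decoding, Perl sees 5 characters, all of which are word characters.
my $chars = decode('UTF-8', $bytes);
my ($decoded_word) = $chars =~ /^(\w+)/;

printf "raw:     %d units, \\w+ matched %d\n", length $bytes, length $raw_word;
printf "decoded: %d units, \\w+ matched %d\n", length $chars, length $decoded_word;

# Encode the characters back to UTF-8 bytes before printing to a raw STDOUT.
print encode('UTF-8', $decoded_word), "\n";
```

On a UTF-8 terminal this should print `raw:     7 units, \w+ matched 2`, then `decoded: 5 units, \w+ matched 5`, then `Grüße`.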
Perl does not decode or encode UTF-8 by default. If you read the file and decode it from UTF-8, you should be fine. If you fetch it with LWP, you can either print `$response->content` (the raw bytes, without encoding them) or encode `$response->decoded_content` before printing.
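A sketch of the decode-on-input, encode-on-output approach for a local file. The temporary file and its contents ("Grüße aus Köln") are hypothetical stand-ins for your input; `File::Temp` is used only to keep the example self-contained:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical input: write a small UTF-8 file ("Grüße aus Köln") as raw bytes.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
binmode $tmp_fh;
print {$tmp_fh} "Gr\xC3\xBC\xC3\x9Fe aus K\xC3\xB6ln\n";
close $tmp_fh or die "close $path: $!";

# Decode on input: the :encoding(UTF-8) layer turns the bytes into characters,
# so \w matches the non-ASCII letters too.
open my $in, '<:encoding(UTF-8)', $path or die "open $path: $!";
my @words;
while (my $line = <$in>) {
    push @words, $line =~ /(\w+)/g;
}
close $in;

# Encode on output: without this layer, characters up to U+00FF would be
# written as single Latin-1 bytes and higher ones would warn "Wide character".
binmode STDOUT, ':encoding(UTF-8)';
print "$_\n" for @words;
```

Each of the three words should be printed on its own line, correctly displayed on a UTF-8 terminal.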