http://qs1969.pair.com?node_id=738846


in reply to Re: Modern best practices for multilingual regexp alphabetical character matching?
in thread Modern best practices for multilingual regexp alphabetical character matching?

Dear Monks,

Sorry to introduce myself by hijacking an old thread, but I have some related questions. I am a complete beginner, and this topic confuses me more than any other. I didn't notice the problem until I started using capturing parentheses and the automatic match variables ($` $& $'). The output encoding, which had been fine until then, broke. Following your advice, and after some trial and error, I found that adding:

while (<>) { $_ = Encode::decode_utf8( $_ ); binmode STDOUT, ":utf8";
to the input loop corrects the encoding. It is strange that without these lines everything "looks" fine unless I use parentheses or the automatic match variables. Is the encoding wrong all along and somehow corrected on output? Or is it correct, and the automatic variables and parentheses break it? Given that I work only with UTF-8 files, should I make a habit of adding these lines whenever I read input?
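
To make the pattern in question concrete, here is a minimal sketch of "decode on input, encode on output". It is only an illustration under assumptions: the helper name first_word, the sample string, and the \p{L}+ regex are made up for this example, and the sample line is hard-coded as raw bytes instead of being read with <>. The output layer is written as :encoding(UTF-8) here, whereas the snippet above uses :utf8.

```perl
use strict;
use warnings;
use Encode ();

# Hypothetical helper (not from the thread): decode one line of UTF-8
# bytes into a character string, then capture the first run of letters.
sub first_word {
    my ($bytes) = @_;
    my $chars = Encode::decode_utf8($bytes);    # bytes -> characters
    return $chars =~ /(\p{L}+)/ ? $1 : undef;   # capture works on characters
}

# Raw UTF-8 bytes, as they would arrive from a filehandle with no
# decoding layer. The text is the Czech phrase "žluťoučký kůň".
my $raw = "\xC5\xBElu\xC5\xA5ou\xC4\x8Dk\xC3\xBD k\xC5\xAF\xC5\x88\n";

# Set the output layer once, before any printing, so that character
# strings are encoded back to UTF-8 bytes on the way out.
binmode STDOUT, ':encoding(UTF-8)';

my $word = first_word($raw);
print "bytes in line:      ", length($raw),  "\n";
print "characters in word: ", length($word), "\n";
print "first word:         $word\n";
```

In this sketch binmode is called a single time outside any loop; calling it once per input line, as in the snippet above, is not needed for the layer to stay in effect.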

Best regards,

Martin