in reply to Re^5: XML::Twig and file encoding
in thread XML::Twig and file encoding

Also this is what's printed to STDOUT:

That is only half -- is that the infile or the outfile?

You're using two-arg open and you forgot to binmode; see perlunitut: Unicode in Perl#I/O flow (the actual 5 minute tutorial) and Is there a way to automatically decode or encode?
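To make that concrete, here is a minimal sketch of three-arg open with an explicit encoding layer, plus binmode for a handle that is already open. The temp file and the sample text are placeholders for illustration, not taken from your script.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical output file and sample text, for illustration only.
my ($tmp_fh, $outfile) = tempfile(UNLINK => 1);
close $tmp_fh;

my $text = "0\x{2013}9";    # "0", EN DASH, "9" as Perl characters

# Three-arg open with an explicit layer: characters are encoded to UTF-8 on output.
open(my $out, '>:encoding(UTF-8)', $outfile) or die "Cannot write $outfile: $!";
print {$out} $text;
close $out or die "Cannot close $outfile: $!";

# The same layer on input decodes the bytes back into characters.
open(my $in, '<:encoding(UTF-8)', $outfile) or die "Cannot read $outfile: $!";
my $roundtrip = do { local $/; <$in> };
close $in;

# For a handle that is already open, such as STDOUT, binmode sets the layer.
binmode(STDOUT, ':encoding(UTF-8)');
print "round trip ok\n" if $roundtrip eq $text;
```

The point is that the layer goes on every handle that carries text: decode on the way in, encode on the way out, and work with characters in between.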

Re^7: XML::Twig and file encoding
by slugger415 (Monk) on Aug 02, 2014 at 16:09 UTC

    I see! This was the bit I was missing:

    open(NEW, '>:encoding(UTF-8)', $outfile)

    That seems to do the trick. Thanks for the tips! Really appreciate it.

Re^7: XML::Twig and file encoding
by slugger415 (Monk) on Sep 18, 2014 at 14:19 UTC

    Hello, sorry to keep harping on this, but I continue to experience character mishaps when using XML::Twig.

    The files are UTF-8. They contain things like smart quotes, en dashes, special spaces, etc. When I resave them I set UTF-8:

       open(NEW,'>:encoding(UTF-8)', $outfile)

    But then those special characters turn to gibberish. So "0-9" becomes "0–9"

    My Twig setup:

    my $twig= XML::Twig->new( comments => 'keep', keep_encoding => 1, pretty_print => 'indented', twig_handlers => { ...} );

    Sorry to be slow here, but I'm at a loss as to how to do this properly. (I did read the suggested topics, but they seem to recommend encoding/decoding and then say not to do it if you don't want the data encoded.)

    Thanks, Scott

      Hm, I seem to be getting better results by removing 'keep_encoding'... That doesn't make sense to me, so I'm clearly not understanding how it's supposed to work.
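
A likely explanation, as far as I understand the XML::Twig docs: with keep_encoding set, Twig hands back the original undecoded bytes, so printing them through an ':encoding(UTF-8)' layer encodes them a second time. The sketch below reproduces that double encoding with the core Encode module alone; it does not call XML::Twig, and the sample string is a made-up stand-in for your data.

```perl
use strict;
use warnings;
use Encode qw(encode decode);

my $chars = "0\x{2013}9";             # 3 characters: "0", EN DASH, "9"
my $bytes = encode('UTF-8', $chars);  # 5 bytes: 30 E2 80 93 39

# Assumption: this is roughly what happens when undecoded bytes are printed
# through an ':encoding(UTF-8)' layer. Each byte is treated as a separate
# character and encoded again, which produces the gibberish.
my $twice = encode('UTF-8', $bytes);

printf "once:  %s\n", unpack('H*', $bytes);
printf "twice: %s\n", unpack('H*', $twice);

# Decoding the bytes first (or writing them to a handle with no encoding
# layer) avoids the second round of encoding.
my $fixed = decode('UTF-8', $bytes);
print "decoded back to the original characters\n" if $fixed eq $chars;
```

So the two consistent combinations would be: drop keep_encoding and keep the encoding layer on the output handle, or keep keep_encoding and write to a handle without an encoding layer.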
