in reply to XML::Twig, HTML::Table, and wide characters

Hmm... I am a bit busy at the moment, so I haven't had much time to look deeply into this. Encoding problems are often a pain.

My guess is that the problem is with parse_html. It uses HTML::TreeBuilder to turn the HTML into XML, as does set_inner_html, BTW. At this point I am not sure what encoding the various strings in the twig are in.

The only thing I can think of: could you try doing the decode before parsing the HTML? That way you would be working in UTF-8 from the start.
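
A minimal sketch of the "decode first" idea, using only the core Encode module. The input string and the variable names are made up for illustration, and it assumes the HTML arrives as ISO-8859-1 bytes; whether that holds for your data is exactly what would need checking.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical input: the raw bytes of an ISO-8859-1 encoded HTML fragment.
# "\xE9" is the single byte for e-acute in ISO-8859-1.
my $bytes = "<td>caf\xE9</td>";

# Decode once, up front, so everything downstream sees characters, not bytes.
my $chars = decode('iso-8859-1', $bytes);

# ... hand $chars to the parser / build the twig here ...

# Encode once, on the way out.
my $utf8 = encode('UTF-8', $chars);

printf "%d characters, %d UTF-8 bytes\n", length $chars, length $utf8;
```

The point is only the ordering: bytes become characters before any parsing happens, and become bytes again only at output.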

Re^2: XML::Twig, HTML::Table, and wide characters
by eff_i_g (Curate) on Jan 16, 2009 at 15:58 UTC
    Like this? I get the same result:
    my $new_elt = XML::Twig::Elt->new('div');
    my $table   = HTML::Table->new([[1,2]]);
    my $content = decode('iso-8859-1', $table->getTable());
    $new_elt->set_inner_html($content);
    $new_elt->paste(before => $_);
    Something else to note: If I remove the "TM" from the original XML, everything works; however, as long as it is present, it upsets set_inner_(?:ht|x)ml.
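
One way to see why that character might behave differently, sketched with the core Encode module only. This assumes the "TM" is the trademark sign, U+2122, which is a guess on my part; the variable names are illustrative.

```perl
use strict;
use warnings;
use Encode qw(encode);

# Assumption: the "TM" in the XML is U+2122 TRADE MARK SIGN.
my $tm = "\x{2122}";

# Its code point is above 0xFF, which is what Perl calls a "wide character".
printf "U+%04X is %s 0xFF\n", ord $tm, ord($tm) > 0xFF ? 'above' : 'at or below';

# ISO-8859-1 has no slot for it; with the default CHECK, encode() substitutes "?".
my $latin1 = encode('iso-8859-1', $tm);

# UTF-8 can represent it, as three bytes.
my $utf8 = encode('UTF-8', $tm);

printf "latin1: %s, utf8 length: %d\n", $latin1, length $utf8;
```

If that assumption holds, the character cannot round-trip through ISO-8859-1 at all, so any step that treats the string as Latin-1 would be a candidate for where things go wrong.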