in reply to HTML::Treebuilder Special characters

Looks like you ran into a UTF-8 encoding problem.

Are the characters HTML-encoded, or are they written as plain characters?
HTML-encoded: &uuml;
Plain: ü

You should also check the charset setting of your HTML page.

Re^2: HTML::Treebuilder Special characters
by jai_dgl (Beadle) on Sep 08, 2009 at 14:59 UTC
    When I fetch the page, the content comes through correctly and the
    character shows up as plain text: ü
    But when the HTML content is parsed with HTML::TreeBuilder, the
    plain-text characters are converted into HTML entities.

    Thanks
    Jey
      This function helped me solve the issue:
      sub encode_entities_decimal {
          my $text = shift;
          # Replace each non-ASCII character with its decimal entity, e.g. &#252;
          $text =~ s{([^\0-\x7f])}{sprintf("&#%d;", ord($1))}ge;
          return $text;
      }
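
One caveat worth checking, since this thread started with a suspected encoding problem: the substitution works per character, so it only gives the expected entities if the string has already been decoded from UTF-8. A minimal sketch, assuming the input arrives as raw UTF-8 bytes (the sample string and the variable names are made up for illustration; only the core Encode module is used):

```perl
use strict;
use warnings;
use Encode qw(decode);

# Same substitution as the function posted above: every non-ASCII
# character becomes a decimal numeric character reference.
sub encode_entities_decimal {
    my $text = shift;
    $text =~ s{([^\0-\x7f])}{sprintf("&#%d;", ord($1))}ge;
    return $text;
}

# Hypothetical input: the UTF-8 bytes for "Grüße", as read from a file
# or socket without any decoding layer.
my $bytes = "Gr\xC3\xBC\xC3\x9Fe";

# Decoded first: one entity per character.
my $chars      = decode('UTF-8', $bytes);
my $from_chars = encode_entities_decimal($chars);   # Gr&#252;&#223;e

# Not decoded: one entity per byte, which a browser renders as mojibake.
my $from_bytes = encode_entities_decimal($bytes);   # Gr&#195;&#188;&#195;&#159;e

print "$from_chars\n$from_bytes\n";
```

If the output shows pairs like &#195;&#188; instead of &#252;, the content is probably still undecoded bytes at the point where the function is called.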