Thanks, everybody.
It came down to some gradual, one-step-at-a-time debugging combined with your advice above.
The wrong code which caused the problem:
    my $xhtml = HTML::TreeBuilder::XPath->new;
    $xhtml->implicit_tags(1);
    $xhtml->parse_file($file)
        or die("Could not parse '$file' : $!\n");
The code which prevented the mutilation of the data:
    . . .
    use open qw/:std :utf8/;
    . . .
    my $xhtml = HTML::TreeBuilder::XPath->new;
    $xhtml->implicit_tags(1);
    my $filehandle;
    open ($filehandle, "<", $file)
        or die("Could not open file '$file' : error: $!\n");
    $xhtml->parse_file($filehandle)
        or die("Could not parse file handle for '$file' : $!\n");
So, if I guess right, opening the file handle myself, within the scope of the use open qw/:std :utf8/; pragma, forced the data going into HTML::TreeBuilder::XPath to be read as UTF-8?
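To check that guess for myself, here is a minimal sketch using only core modules. The open pragma is lexically scoped, so it only affects open calls in my own file; my assumption (not verified against the module source) is that when parse_file is given a file name, the open happens inside the module and the data arrives as raw bytes. The helper slurp_with_layer and the scratch file are made up for this illustration, and the layer is written out explicitly instead of relying on the pragma.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write the two UTF-8 bytes for "é" (U+00E9), 0xC3 0xA9, to a scratch file.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
binmode($tmp_fh, ':raw');
print {$tmp_fh} "\xC3\xA9";
close($tmp_fh) or die "Could not close '$tmp_name' : $!\n";

# Hypothetical helper: read the whole file through the given PerlIO layer.
sub slurp_with_layer {
    my ($layer, $path) = @_;
    open(my $fh, "<$layer", $path)
        or die "Could not open '$path' : $!\n";
    local $/;    # slurp mode: read everything in one go
    my $content = <$fh>;
    close($fh);
    return $content;
}

# Without a decoding layer the data is two separate bytes;
# with one, it is a single decoded character.
my $as_bytes = slurp_with_layer(':raw', $tmp_name);
my $as_chars = slurp_with_layer(':encoding(UTF-8)', $tmp_name);

printf "raw: %d, decoded: %d\n", length($as_bytes), length($as_chars);
```

If that reading is right, writing open ($filehandle, "<:encoding(UTF-8)", $file) would have the same effect as my fix without depending on the pragma being in scope.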
In reply to Re^3: Difficulty with UTF-8 and file contents
by mldvx4
in thread Difficulty with UTF-8 and file contents
by mldvx4