Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Help, please :) The script below dies with the error shown after __END__.

#!/usr/bin/perl --
use strict;
use warnings;
use HTML::HTML5::Parser;
use HTML::HTML5::ToText;

binmode STDIN;
binmode STDOUT, ':encoding(UTF-8)';
my $dom = HTML::HTML5::Parser->load_html(IO => \*STDIN);
print HTML::HTML5::ToText
    ->with_traits(qw/ShowLinks ShowImages RenderTables/)
    ->new()
    ->process($dom);
__END__
Can't locate object method "#COMMENT" via package "MooseX::Traits::__ANON__::SERIAL::1" at HTML/HTML5/ToText.pm line 129, <STDIN> line 3016.

Replies are listed 'Best First'.
Re: Can't locate object method "#COMMENT" via package "MooseX::Traits::__ANON__::SERIAL::1" at HTML/HTML5/ToText.pm line 129, <STDIN> line 3016.
by Anonymous Monk on Apr 17, 2013 at 05:20 UTC

    A workaround that needs more work (the included test still fails):

    diff -ruN HTML-HTML5-ToText-0.002/lib/HTML/HTML5/ToText.pm HTML-HTML5-ToText-0.00201/lib/HTML/HTML5/ToText.pm
    --- HTML-HTML5-ToText-0.002/lib/HTML/HTML5/ToText.pm 2012-01-31 01:41:12.000000000 -0800
    +++ HTML-HTML5-ToText-0.00201/lib/HTML/HTML5/ToText.pm 2013-04-16 22:08:08.296875000 -0700
    @@ -126,7 +126,10 @@
             else
             {
                 my $elem = uc $kid->nodeName;
    -            my $str = $self->$elem($kid, %args);
    +            $elem =~ s/[^A-Z_0-9]//g;
    +            local $@;
    +            my $str = eval { $self->$elem($kid, %args) };
    +            $@ and next ;
     
                 if ($str =~ m{^\n} and not $kid->previousSibling)
                 {
    diff -ruN HTML-HTML5-ToText-0.002/MANIFEST HTML-HTML5-ToText-0.00201/MANIFEST
    --- HTML-HTML5-ToText-0.002/MANIFEST 2012-01-31 01:44:50.000000000 -0800
    +++ HTML-HTML5-ToText-0.00201/MANIFEST 2013-04-16 22:17:30.125000000 -0700
    @@ -47,4 +47,5 @@
     t/01basic.t
     t/02simple.t
     t/03tables.t
    +t/04comment.t
     SIGNATURE Public-key signature (added by MakeMaker)
    diff -ruN HTML-HTML5-ToText-0.002/t/04comment.t HTML-HTML5-ToText-0.00201/t/04comment.t
    --- HTML-HTML5-ToText-0.002/t/04comment.t 1969-12-31 16:00:00.000000000 -0800
    +++ HTML-HTML5-ToText-0.00201/t/04comment.t 2013-04-16 22:19:34.828125000 -0700
    @@ -0,0 +1,26 @@
    +use Test::More tests => 1;
    +use HTML::HTML5::Parser;
    +use HTML::HTML5::ToText;
    +
    +my $dom = HTML::HTML5::Parser->load_html(IO => \*DATA);
    +my $str = HTML::HTML5::ToText->with_traits(qw/TextFormatting ShowLinks ShowImages/)->process($dom);
    +
    +my $output = <<'OUTPUT';
    +Foo
    +LINK: <style.css> (stylesheet)
    +
    +*Hello world <http://example.com>*
    +
    +_how_are
    +[IMG:_you]?_
    +OUTPUT
    +
    +#~ use Data::Dump qw/ dd pp /; warn pp([ $str, $output] );
    +is $str, $output;
    +
    +__DATA__
    +<!doctype html>
    +<title>Foo</title>
    +<link rel=stylesheet href=style.css>
    +<!-- a comment here --><p><!-- a comment here --><b><!-- a comment here -->Hello <a href="http://example.com">world</a></b><!-- a comment here --></p>
    +<!-- a comment here --><p><!-- a comment here --><i>how are<br><img src=you.jpeg alt=you>?</i><!-- a comment here --></p>
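One drawback of wrapping the call in eval is that it also hides genuine errors raised inside a handler. A narrower alternative might be to look the handler up with can() and skip only the node types that have none. The sketch below uses a made-up ToyRenderer class and plain hashes as nodes; neither is part of HTML::HTML5::ToText, and it assumes handlers are ordinary methods (it would not work as-is if the real module resolves them some other way, e.g. via AUTOLOAD).

```perl
use strict;
use warnings;

package ToyRenderer;   # hypothetical stand-in, NOT HTML::HTML5::ToText

sub new { bless {}, shift }

# One handler per element name, called with the node
sub P { my ($self, $node) = @_; return "$node->{text}\n" }
sub B { my ($self, $node) = @_; return "*$node->{text}*" }

# Render a list of nodes, skipping node types that have no handler
sub render {
    my ($self, @nodes) = @_;
    my $out = '';
    for my $node (@nodes) {
        my $name = uc $node->{name};
        # "#COMMENT" is not a usable method name: skip it instead of dying
        next unless $name =~ /\A[A-Z_][A-Z_0-9]*\z/;
        # can() returns a code ref, or false when no such method exists
        my $method = $self->can($name) or next;
        $out .= $self->$method($node);
    }
    return $out;
}

package main;

my $text = ToyRenderer->new->render(
    { name => '#comment', text => 'a comment here' },
    { name => 'b',        text => 'Hello' },
    { name => 'p',        text => ' world' },
    { name => 'blink',    text => 'no handler for this one' },
);
print $text;   # prints "*Hello* world" followed by a newline
```

Unlike the eval version, an exception thrown inside P or B still propagates here, so real bugs in a handler are not silently skipped.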

      Thanks - I've added a link to this thread to the bugtracker.

      package Cow { use Moo; has name => (is => 'lazy', default => sub { 'Mooington' }) } say Cow->new->name
Re: Can't locate object method "#COMMENT" via package "MooseX::Traits::__ANON__::SERIAL::1" at HTML/HTML5/ToText.pm line 129, <STDIN> line 3016. (encoding)
by Anonymous Monk on Apr 17, 2013 at 05:41 UTC

    And here is an encoding-related bug in either HTML::HTML5::ToText or HTML::HTML5::Parser:

    #!/usr/bin/perl --
    use Test::More tests => 1;
    use HTML::HTML5::Parser;
    use HTML::HTML5::ToText;

    my $input = "<!DOCTYPE html PUBLIC \"-//W3C//DTD XHTML 1.0 Transitional//EN\" \"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd\">\r\n<head>\r\n<meta http-equiv=\"Content-Type\" content=\"text/html; charset=ISO-8859-1\" />\r\n<title> \x93literal smart quotes\x94 </title>\r\n<body><p> num-ent apostrophe &#8217; </p>\n<p> num-ent double-dash &#8211; </p></body>\r\n</html>";
    my $dom = HTML::HTML5::Parser->load_html( string => \$input );
    my $str = HTML::HTML5::ToText->with_traits(qw/TextFormatting ShowLinks ShowImages/)->process($dom);

    #~ use Data::Dump qw/ pp /; warn pp($str);
    #~ "\x{201C}literal smart quotes\x{201D}\n\nnum-ent apostrophe \xE2\x80\x99\n\nnum-ent double-dash \xE2\x80\x93\n";
    my $expected = "\x{201C}literal smart quotes\x{201D}\n\nnum-ent apostrophe \N{U+2019}\n\nnum-ent double-dash \N{U+2013}\n";
    is $str, $expected;

    \xE2\x80\x93 is the UTF-8 encoding of \N{U+2013}, but it arrives as a byte string and is appended to a Perl character (UTF-8 flagged) string, so the result is corrupted.
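The effect can be reproduced with core Perl alone, without either module. This is only a minimal sketch of the general mechanism, not a trace of what the parser does internally; the variable names are made up for illustration. When undecoded UTF-8 bytes are concatenated onto a character string, Perl treats each byte as a Latin-1 character, so the three bytes become three wrong characters instead of one en dash.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

my $chars = "\x{201C}quoted\x{201D} ";   # character string (code points above 255)
my $bytes = "\xE2\x80\x93";              # UTF-8 *bytes* for U+2013, not yet decoded

# Each byte is upgraded as if it were a Latin-1 character: three junk characters
my $broken = $chars . $bytes;

# Decode the bytes into characters first, then concatenate: one en dash
my $fixed = $chars . decode('UTF-8', $bytes);

binmode STDOUT, ':encoding(UTF-8)';
printf "broken: %d characters\n", length $broken;   # 12
printf "fixed:  %d characters\n", length $fixed;    # 10
```

When $broken is later written out as UTF-8, the tail comes out as the double-encoded sequence \xC3\xA2\xC2\x80\xC2\x93 rather than \xE2\x80\x93.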

      Dumping the parsed DOM with  use Data::Dump qw/ dd pp /; die pp $dom->textContent;  shows it's an HTML::HTML5::Parser issue: the text is already wrong before ToText sees it.