
I get an error on this line inside the conditional in sub DeWordifyHTML():
 $result .= _html_tag($tagname, $attr, $attrseq);
Perl says:
 Undefined subroutine &main::_html_tag called at ...
To run it as a script, I replaced the "sub" line at the top and the closing curly brace at the bottom, so it now starts with:
 #!/usr/bin/perl
 use HTML::Parser ();
 open(my $html, "<", $ARGV[0])...
Neither Google nor a look through perl5/5.20/HTML/Parser.pm helped. I'm assuming the _html_tag() sub has been replaced by something else.

Do you have an update for this? I realize it's 9-year-old code, but I thought I'd ask.

Thanks, Art

Re^3: Cleaning up HTML
by Jenda (Abbot) on Mar 02, 2016 at 15:50 UTC

    It's incomplete, sorry: the code I posted calls two helper subs that I left out. Here they are:

    sub _html_tag {
        my ($tag, $attr, $attrseq) = @_;
        my $html = "<$tag";

        # Use HTML::Parser's attrseq (original attribute order) when available,
        # otherwise fall back to the attribute hash (order not preserved).
        my @keys;
        if ($attrseq and ref($attrseq) eq 'ARRAY') {
            @keys = @$attrseq;
        } elsif ($attr and ref($attr)) {
            @keys = keys %$attr;
        }

        foreach my $key (@keys) {
            if (defined $attr->{$key}) {
                $html .= " $key=" . _arg_escape($attr->{$key});
            } else {
                $html .= " $key";    # attribute without a value
            }
        }
        return "$html>";
    }

    # Quote an attribute value with whichever quote character it doesn't contain;
    # if it contains both, escape the double quotes as &quot;.
    sub _arg_escape {
        my $arg = shift;
        return qq{"$arg"} if $arg !~ /"/;
        return qq{'$arg'} if $arg !~ /'/;
        $arg =~ s/"/&quot;/g;
        return qq{"$arg"};
    }
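    To check the two helpers without the rest of DeWordifyHTML(), you can call _html_tag directly with arguments shaped like the ones an HTML::Parser start handler receives when its argspec includes "tagname, attr, attrseq" (a hash ref of attribute values plus an array ref of the attribute names in their original order). This is a minimal, self-contained sketch that repeats both helpers so it runs on its own; the sample tag and attribute values are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Rebuild a start tag from HTML::Parser-style arguments:
# $attr is a hash ref (name => value), $attrseq an array ref giving the original order.
sub _html_tag {
    my ($tag, $attr, $attrseq) = @_;
    my $html = "<$tag";
    my @keys;
    if ($attrseq and ref($attrseq) eq 'ARRAY') {
        @keys = @$attrseq;            # keep the original attribute order
    } elsif ($attr and ref($attr)) {
        @keys = keys %$attr;          # fallback: hash order is arbitrary
    }
    foreach my $key (@keys) {
        if (defined $attr->{$key}) {
            $html .= " $key=" . _arg_escape($attr->{$key});
        } else {
            $html .= " $key";         # valueless attribute
        }
    }
    return "$html>";
}

# Quote an attribute value, picking whichever quote character it does not contain.
sub _arg_escape {
    my $arg = shift;
    return qq{"$arg"} if $arg !~ /"/;
    return qq{'$arg'} if $arg !~ /'/;
    $arg =~ s/"/&quot;/g;             # contains both: escape the double quotes
    return qq{"$arg"};
}

# Hypothetical sample input, shaped like HTML::Parser's attr/attrseq arguments.
my %attr = (href => 'page.html', title => q{say "hi"}, nowrap => undef);
my @seq  = qw(href title nowrap);

print _html_tag('a', \%attr, \@seq), "\n";
# prints: <a href="page.html" title='say "hi"' nowrap>
print _arg_escape(q{it's "quoted"}), "\n";
# prints: "it's &quot;quoted&quot;"
```

    Passing attrseq matters because Perl hash order is arbitrary, so without it the rebuilt tag may list its attributes in a different order than the source HTML.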

    I'm not sure it's complete like this; I never got around to releasing it as a module. If something is still missing, either download the Jenda.Rex zip from http://jenda.krynicky.cz/#Jenda.Rex, extract the .pm and dissect it (remove all references to Win32::OLE, which is needed only when the module is wrapped as a COM DLL for use from VB(Script), and to Win32::Registry, which is used only to find the system's code page; remove the whole JendaRex::CSVParser package; ...), or send me a message with your email address and I'll send you the module.

    Jenda
    Enoch was right!
    Enjoy the last years of Rome.