Re: Parsing and converting HTML

Try writing a recursive function using the content_list method of HTML::Element.

For example,

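use HTML::TreeBuilder;

# $text is assumed to hold the whole HTML document as one string.
my $html = HTML::TreeBuilder->new_from_content($text);

# Walk the tree recursively and print every text node.
sub to_text {
 my ($node) = @_;
 # isa() also matches the HTML::TreeBuilder root object, which a plain
 # ref() comparison against "HTML::Element" would miss.
 if (ref $node && $node->isa("HTML::Element")) {
  foreach my $sub_element ($node->content_list) {
   to_text($sub_element);
  }
 } else {
  print qq{text="$node"\n};
 }
}
to_text($html);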

Sorry if my advice was wrong.


Replies are listed 'Best First'.
Re^2: Parsing and converting HTML by tevolo (Novice) on Jul 26, 2012 at 23:54 UTC
Hello, thanks but for some reason this did not seem to work. Though it is probably something I am doing wrong. Here is my code:

#!c:/strawberry/perl/bin/perl.exe
use HTML::TokeParser;
use HTML::Element;
use HTML::TreeBuilder;
use warnings;
open(MYINPUTFILE, '<C:\acs\SA\content\acs\meetings\expositions\CNBP_028491');
while(<MYINPUTFILE>) {
 my $text = $_;
 my $html = HTML::TreeBuilder->new_from_content("$text") || die "$@\n";
 sub to_text {
  if (ref $_[0] eq "HTML::Element") {
   foreach my $sub_element ($_[0]->content_list) {
    &to_text($sub_element);
   }
  } else {
   print qq{text="$_[0]"};
  }
 }
 &to_text($html);
}

Any other thoughts, or did I miss something? Thanks again
Re^3: Parsing and converting HTML by aitap (Curate) on Jul 27, 2012 at 07:28 UTC
You are parsing your file line by line: for every line, a new tree object is created from that single line and then destroyed. Use the `new_from_file` method of HTML::TreeBuilder instead, so the whole file is parsed at once. Sorry if my advice was wrong.
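A minimal sketch of that suggestion, not the original poster's script: it parses a whole file once with `new_from_file` and then walks the tree recursively. The temporary file, its sample markup, and the helper name `collect_text` are assumptions made up for illustration; substitute your own file path.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::TreeBuilder;

# Hypothetical sample input, written to a temporary file so that the
# sketch is self-contained. Replace $filename with your real file.
my ($fh, $filename) = tempfile(SUFFIX => '.html', UNLINK => 1);
print {$fh} "<html><body><p>Hello <b>world</b></p>\n<p>Second line</p></body></html>\n";
close $fh or die "Cannot close $filename: $!";

# Return every text node below $node, in document order.
sub collect_text {
    my ($node) = @_;
    # content_list yields plain strings for text and objects for elements.
    return $node unless ref $node;
    return map { collect_text($_) } $node->content_list;
}

# Parse the whole file once instead of once per line.
my $tree  = HTML::TreeBuilder->new_from_file($filename);
my @texts = collect_text($tree);
print qq{text="$_"\n} for @texts;

# Free the tree explicitly when done with it.
$tree->delete;
```

Testing with `ref` alone, rather than comparing against the class name "HTML::Element", means the root object (which is an HTML::TreeBuilder) is descended into like any other element.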
Re^3: Parsing and converting HTML by Anonymous Monk on Jul 26, 2012 at 23:59 UTC
my $tree = HTML::TreeBuilder->new;
$tree->parse_file( $filename );
...
Re^2: Parsing and converting HTML by tevolo (Novice) on Jul 26, 2012 at 19:15 UTC
Thanks!!! I will give it a try and report back.