Re: HTML::Treebuilder look_down not working with <header>, <article> etc

It's not working when we refer to <header> or <article> etc. What could be the issue?

We wonder too:

What could be the code that causes this "issue"?
What could be the meaning of "not working"?

Re^2: HTML::Treebuilder look_down not working with <header>, <article> etc by Anonymous Monk on May 07, 2014 at 11:38 UTC
To clarify: I'm not the original poster, but I have the same problem.

`my $tree = HTML::TreeBuilder->new_from_content($webcrawler->content()); if (my $div = $tree->look_down(_tag => "article")) { print $div->as_text(), "\n"; } else { print "Not found"; }`

This piece of code gives "Not found" on this article, although the page contains an article tag: http://www.sueddeutsche.de/politik/thailand-regierungschefin-yingluck-verliert-ihr-amt-1.1953299

To test the code, I changed it to grab an element inside the article tag instead:

`if (my $div = $tree->look_down(_tag => "p", class => "article entry-summary")) { print $div->as_text(), "\n"; } else { print "Not found"; }`

It worked as expected and printed "Das höchste Gericht in Thailand hat entschieden: Regierungschefin Yingluck Shinawatra ist des Verfassungsbruchs schuldig. Sie wurde sofort ihres Amtes enthoben. "

So the parsing and look_down themselves work; it is only the article tag that I can't grab. Since article is an HTML5 tag, that might be the problem, but how can I solve this another way?
Re^3: HTML::Treebuilder look_down not working with <header>, <article> etc by Anonymous Monk on May 07, 2014 at 12:45 UTC
And you're not telling TreeBuilder to keep unknown tags because?
Re^4: HTML::Treebuilder look_down not working with <header>, <article> etc by Anonymous Monk on May 07, 2014 at 14:36 UTC
Because I didn't know about this feature =). I'm just copying scripts together and tweaking them; I'm not a real programmer.

For everyone who is struggling with the same problem, here is the piece of code. new_from_file and new_from_content parse immediately, so I have to create the tree with new() first, then set ignore_unknown to false, and only then parse the content:

`my $tree = HTML::TreeBuilder->new(); $tree->ignore_unknown(0); $tree->parse_content($webcrawler->content()); print CONTENT $tree; if (my $div = $tree->look_down(_tag => "article", class => "article hentry")) { print $div->as_text(), "\n"; } else { print "Not found"; } $tree->delete();`

This gives me the expected result. Thanks ^_^ I didn't know there was an ignore_unknown option, which is true by default.
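For anyone who wants to try the fix without crawling a live page, here is a minimal sketch that parses a small inline document twice, once with default settings and once with ignore_unknown(0). The sample markup and the article_text helper are made up for illustration and are not from the posts above. What the default run prints may depend on the installed HTML::TreeBuilder and HTML::Tagset versions, since the set of "known" tags comes from HTML::Tagset, so no output is claimed for it here.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup standing in for the crawled page.
my $html = <<'HTML';
<html><body>
<article class="article hentry"><p>Hello from the article.</p></article>
</body></html>
HTML

# Hypothetical helper: returns the text of the first <article> element,
# or undef if look_down finds none.
sub article_text {
    my ($content, $keep_unknown) = @_;

    # Start with an empty tree so options can be set before parsing;
    # new_from_content would parse right away.
    my $tree = HTML::TreeBuilder->new;
    $tree->ignore_unknown(0) if $keep_unknown;
    $tree->parse_content($content);

    my $node = $tree->look_down(_tag => 'article');
    my $text = $node ? $node->as_text : undef;

    $tree->delete;    # free the tree, as in the post above
    return $text;
}

my $default = article_text($html, 0);
my $kept    = article_text($html, 1);

print 'default settings:  ', (defined $default ? $default : 'Not found'), "\n";
print 'ignore_unknown(0): ', (defined $kept    ? $kept    : 'Not found'), "\n";
```

The second line should show the article text. If the first line also shows it, the installed tag tables probably already recognize article, and the ignore_unknown(0) call is harmless but unnecessary for that tag.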