Re: Perl HTML confusion...

by trippledubs (Deacon)
on Sep 17, 2013 at 20:30 UTC


in reply to Perl HTML confusion...

If you want to get all the divs on a page, build a tree out of the full page, use look_down to collect all the div elements, and then call the as_text method on an element to print out just its text.

    use HTML::TreeBuilder;
    my $tree = HTML::TreeBuilder->new_from_file('test.html');
    my @divs = $tree->look_down(_tag => 'div');
    print $divs[0]->as_text();

I saved this node to test.html, and so it outputs your first post.

Output:

I'm having trouble with using Perl to parse an HTML file I have, where I'm trying to grab all <a>...

I'm not going to repost it all, but the full text of your first post is there.
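
The snippet above prints only the first div. If you want the text of every div, a minimal sketch could look like the following. The inline HTML string is a made-up stand-in for test.html so the example runs on its own; new_from_content is used here instead of new_from_file for the same reason.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical stand-in for test.html so the sketch is self-contained.
my $html = '<html><body><div>first post</div><div>second post</div></body></html>';

my $tree = HTML::TreeBuilder->new_from_content($html);

# In list context, look_down returns every matching element.
my @divs = $tree->look_down(_tag => 'div');

# Collect the text of each div, then print one per line.
my @texts = map { $_->as_text } @divs;
print "$_\n" for @texts;

# Free the tree once the text has been extracted.
$tree->delete;
```

Note that if divs are nested, look_down returns the outer and the inner elements, so the inner text would show up more than once.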

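When you want to match with regular expressions, look_down accepts a compiled regex (qr/.../) as the value for an attribute; for anything more involved, such as matching on an element's text, you have to pass a sub ref to look_down. There is an example in HTML::Element. Also, here is a quick intro: HTML::Tree(Builder) in 6 minutes. And a more thorough article: HTML::Tree::Scanning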
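
Here is a rough sketch of regex matching with look_down, in two forms as I understand them from the HTML::Element documentation: a qr// object as an attribute's value, and a sub ref that receives each element. The sample markup, URLs, and variable names are invented for illustration.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical sample markup, used only for illustration.
my $html = <<'HTML';
<html><body>
<div class="post"><a href="http://example.com/perl">Perl link</a></div>
<div class="footer"><a href="http://example.com/other">Other link</a></div>
</body></html>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Criterion as (attribute => qr/.../): the attribute's value must match the regex.
my @by_attr = $tree->look_down(_tag => 'a', href => qr/perl/);

# Criterion as a sub ref: called with the element, returns true to accept it.
# This form is needed to test something that is not an attribute, such as the text.
my @by_text = $tree->look_down(_tag => 'div', sub { $_[0]->as_text =~ /Other/ });

# Pull out plain strings before the tree is freed.
my @hrefs = map { $_->attr('href') } @by_attr;
my @texts = map { $_->as_text } @by_text;

print "$_\n" for @hrefs, @texts;

$tree->delete;
```

All criteria in one look_down call must hold for an element to be returned, so the tag test and the regex test combine as an AND.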

Replies are listed 'Best First'.
Re^2: Perl HTML confusion...
by AI Cowboy (Beadle) on Sep 17, 2013 at 22:29 UTC
    Many thanks for your post and help! This is great :)
