This was extremely helpful, thank you! I wasn't aware of look_down; had I known it existed from the start, it would have made life much easier.
I'm having some trouble grabbing content from the aside section, however. This could be caused by the website's HTML structure, but I'm hoping I'm wrong and that a workaround exists. I'm trying to grab the "Interviewers" list, which lies inside a div with the ID "innercontent". The problem is that even when I add a string match, the loop still returns everything inside the innercontent div. Here's what I have:
```perl
#!/usr/bin/perl -w
use strict;
use WWW::Mechanize;
use WWW::Mechanize::TreeBuilder;
use feature qw/ say /;
use Data::Dumper;

my $mech = WWW::Mechanize->new();
WWW::Mechanize::TreeBuilder->meta->apply($mech);

# The page URL was cut off in the original post; substitute the full address.
$mech->get("http://millercenter.org/president/clinton/oralhistory/madeleine-k-a...");

# introduction
for ( $mech->look_down(_tag => "div", id => 'introduction') ) {
    next unless $_->as_trimmed_text =~ m/Publicly released transcripts/;
    say $_->as_HTML;
}

# interviewers
for ( $mech->look_down(_tag => "div", id => 'innercontent') ) {
    next unless $_->as_trimmed_text =~ m/Interviewers:/;
    say $_->as_HTML;
}

# interview
my @list = $mech->find('dl');
foreach ( @list ) {
    print $_->as_HTML();
}
```
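Why the whole div comes back, as far as I can tell: look_down returns the matching element itself, and as_trimmed_text on a div includes the text of every descendant, so the /Interviewers:/ test succeeds on the innercontent div as a whole and as_HTML then prints all of it. The sketch that follows shows one way to narrow the search: look inside the div for the element whose own text is the "Interviewers:" label, then take the list that follows it. It is only a minimal sketch under assumptions: the markup (an h3 label followed by a ul) is made up for illustration and the real page may use different tags, and it parses an inline string with HTML::TreeBuilder instead of fetching the page. The same look_down calls should also work on $mech once the TreeBuilder role is applied.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical markup: the real structure of the page's "innercontent" div
# is an assumption, invented here only to illustrate how look_down behaves.
my $html = <<'HTML';
<div id="innercontent">
  <h3>Interviewers:</h3>
  <ul><li>Interviewer One</li><li>Interviewer Two</li></ul>
  <p>Other aside content that should not be printed.</p>
</div>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# This matches the div itself; its as_trimmed_text contains all descendant
# text, which is why a regex test on it returns the entire div.
my $inner = $tree->look_down(_tag => 'div', id => 'innercontent')
    or die "innercontent div not found\n";

# Search *inside* the div for the element whose own trimmed text is exactly
# the label, so the outer div (whose text is much longer) is not matched.
my $label = $inner->look_down(
    sub { $_[0]->as_trimmed_text eq 'Interviewers:' }
) or die "Interviewers label not found\n";

# right() in list context returns all following siblings; text nodes are
# plain strings, so keep only element objects and pick the first <ul>.
my ($list) = grep { ref $_ && $_->tag eq 'ul' } $label->right;
die "Interviewers list not found\n" unless $list;

# Collect just the interviewer names from the list items.
my @names = map { $_->as_trimmed_text } $list->look_down(_tag => 'li');
print "$_\n" for @names;

# Free the parse tree (HTML::Element trees need explicit cleanup).
$tree->delete;
```

If the real page puts the label and names in some other arrangement (for example a single paragraph), the coderef passed to look_down is the part to adapt.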
In reply to Re^2: Extracting specific <p>content</p> using WWW::Mechanize
by mserino
in thread Extracting specific <p>content</p> using WWW::Mechanize
by mserino