This was extremely helpful, thank you! I wasn't aware of look_down, and had I known about it from the start, it would have made life so much easier.

I'm having trouble grabbing some of the aside content, however. This could be caused by the HTML structure of the website, but I'm hoping I'm wrong and that a workaround exists. I'm trying to grab the "Interviewers" list, which lies inside a div with the ID "innercontent". The problem is that even when I filter on a string match, look_down returns everything inside the innercontent div. Here's what I have.


#!/usr/bin/perl -w

use strict;
use WWW::Mechanize;
use WWW::Mechanize::TreeBuilder;
use feature qw/ say /;
use Data::Dumper;

my $mech = WWW::Mechanize->new();
WWW::Mechanize::TreeBuilder->meta->apply($mech);
$mech->get("http://millercenter.org/president/clinton/oralhistory/madeleine-k-a$

# introduction
for ( $mech->look_down(_tag => "div", id => 'introduction') ) {
  next unless $_->as_trimmed_text =~ m/Publicly released transcripts/;
  say $_->as_HTML;
}

# interviewers
for ( $mech->look_down(_tag => "div", id => 'innercontent') ) {
  next unless $_->as_trimmed_text =~ m/Interviewers:/;
  say $_->as_HTML;
}

# interview
my @list = $mech->find('dl');
foreach ( @list ) {
  print $_->as_HTML();
}
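
One possible workaround, sketched here under assumptions: since the string match succeeds on the whole innercontent div (its text includes everything nested inside it), look_down could be given a code-ref criterion that accepts only the innermost element containing "Interviewers:", and the list could then be taken from that element's siblings. The markup below is hypothetical and only stands in for the real page, whose structure I can't confirm; the class names and the interviewer names are made up. It uses HTML::TreeBuilder directly so it runs without fetching anything.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature qw/ say /;
use HTML::TreeBuilder;

# Hypothetical markup standing in for the real page (an assumption).
my $html = <<'HTML';
<div id="innercontent">
<div class="bio"><p>Some unrelated text.</p></div>
<div class="aside"><h4>Interviewers:</h4><ul><li>Jane Doe</li><li>John Roe</li></ul></div>
</div>
HTML

my $tree  = HTML::TreeBuilder->new_from_content($html);
my $outer = $tree->look_down(_tag => 'div', id => 'innercontent');

# Accept an element only if its text matches and none of its
# child elements match, i.e. keep the innermost match.
my ($heading) = $outer->look_down(
    sub {
        my $el = shift;
        return 0 unless $el->as_trimmed_text =~ /Interviewers:/;
        return !grep { ref $_ && $_->as_trimmed_text =~ /Interviewers:/ }
                $el->content_list;
    }
);

# In list context, right() returns all following siblings;
# take the first <ul> among them (assumed layout).
my ($list) = grep { ref $_ && $_->tag eq 'ul' } $heading->right;

my @names = map { $_->as_trimmed_text } $list->look_down(_tag => 'li');
say for @names;

$tree->delete;
```

With WWW::Mechanize::TreeBuilder applied, the same look_down call should work on $mech in place of $tree, though where the list sits relative to the heading depends on the actual page.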

In reply to Re^2: Extracting specific content using WWW::Mechanize by mserino
in thread Extracting specific content using WWW::Mechanize by mserino
