This was extremely helpful, thank you! I wasn't aware of look_down; had I known about it from the start, it would have made life so much easier.

However, I'm having trouble grabbing some of the aside content. This could be caused by the HTML structure of the website, but I'm hoping I'm wrong and that a workaround exists. I'm trying to grab the "Interviewers" list, which lies inside a div with the ID "innercontent". The problem is that even when I give it a string match, it returns everything inside the innercontent div. Here's what I have.

#!/usr/bin/perl -w
use strict;
use WWW::Mechanize;
use WWW::Mechanize::TreeBuilder;
use feature qw/ say /;
use Data::Dumper;

my $mech = WWW::Mechanize->new();
WWW::Mechanize::TreeBuilder->meta->apply($mech);

$mech->get("http://millercenter.org/president/clinton/oralhistory/madeleine-k-a$

# introduction
for ( $mech->look_down(_tag => "div", id => 'introduction') ) {
    next unless $_->as_trimmed_text =~ m/Publicly released transcripts/;
    say $_->as_HTML;
}

# interviewers
for ( $mech->look_down(_tag => "div", id => 'innercontent') ) {
    next unless $_->as_trimmed_text =~ m/Interviewers:/;
    say $_->as_HTML;
}

# interview
my @list = $mech->find('dl');
foreach ( @list ) {
    print $_->as_HTML();
}
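
The interviewers loop above tests the text of the whole innercontent div and then prints that same div with as_HTML, so a successful match always prints everything inside it. One possible way to narrow it down is sketched below. It assumes the list sits in a <p> inside the div; the markup, the names in it, and the choice of <p> are all made up for illustration, since I can't confirm the real page structure. It uses HTML::TreeBuilder directly on a string so it runs without fetching anything.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use feature qw/ say /;
use HTML::TreeBuilder;

# Hypothetical markup standing in for the real page; the actual structure
# of the site may well differ.
my $html = <<'HTML';
<div id="innercontent">
  <h2>About this interview</h2>
  <p>Some unrelated text.</p>
  <p><strong>Interviewers:</strong> Jane Doe, John Roe</p>
</div>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Locate the container first (scalar context returns the first match),
# then search only inside it.
my $container = $tree->look_down( _tag => 'div', id => 'innercontent' );

# Assumption: the list lives in a <p>. The coderef receives each candidate
# element as $_[0] and keeps only paragraphs whose text starts with the label.
my @hits = $container
    ? $container->look_down(
          _tag => 'p',
          sub { $_[0]->as_trimmed_text =~ /^Interviewers:/ },
      )
    : ();

# Copy the text out before the tree is deleted.
my @texts = map { $_->as_trimmed_text } @hits;
say for @texts;

$tree->delete;
```

With WWW::Mechanize::TreeBuilder applied, the same look_down calls should work on $mech in place of $tree. If the list turns out to be in a different element, only the _tag value and the regex would need to change.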

In reply to Re^2: Extracting specific <p>content</p> using WWW::Mechanize by mserino
in thread Extracting specific <p>content</p> using WWW::Mechanize by mserino
