In an attempt at brevity, here's what I'm using to grab text on a page between any given set of tags (<ul> being the example in this case):
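#!/usr/bin/perl -w
use strict;
use WWW::Mechanize;
use HTML::TokeParser;

# Fetch the page.
my $agent = WWW::Mechanize->new();
$agent->get("http://www.perlmonks.com/");

# TokeParser takes a reference to a string of HTML; use the content()
# accessor instead of reaching into the object's hash.
my $html   = $agent->content;
my $stream = HTML::TokeParser->new(\$html);

# Advance the parser to the first <ul> start tag.
$stream->get_tag("ul");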
For all I know, that works just fine. Since I want to verify that before I move on, how do I view the contents of $stream? Printing it simply gives me "HTML::TokeParser=HASH(0x22dcd84)", as I suspected it would. Past experience has taught me that "as_text" or one of a variety of similar methods will sometimes format an object the way I expect, but I've been unable to find the equivalent for TokeParser. Can someone illustrate what's happening and how I can deal with this in the future? I'm sure my next problem will be cookie handling, but I'll leave that for another time.
(Yes, I've RTFM, but as I'm new, the answer has so far escaped me.)
Thanks in advance!
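One way to see what the parser is holding, offered as a minimal sketch and not the definitive answer: $stream is a parser object, not text, so printing it only shows its address. get_tag returns an array reference describing the tag it stopped on, and get_trimmed_text collects the text up to a given end tag. The helper name text_inside and the sample HTML string below are made up for illustration; the sketch parses a literal string so it runs without a network connection.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use HTML::TokeParser;
use Data::Dumper;

# Hypothetical helper: return the text between the first <$tag> and the
# next </$tag>, or undef if <$tag> never appears.
# Note: it stops at the first closing tag, so nested tags of the same
# name would cut the text short.
sub text_inside {
    my ($html, $tag) = @_;
    my $stream = HTML::TokeParser->new(\$html)
        or die "Cannot create parser";
    $stream->get_tag($tag) or return;
    return $stream->get_trimmed_text("/$tag");
}

# Made-up sample document standing in for the fetched page.
my $sample = '<p>intro</p><ul><li>one</li><li>two</li></ul><p>outro</p>';

# get_tag returns an array ref for a start tag:
# [tag name, attribute hash, attribute order, original tag text]
my $stream = HTML::TokeParser->new(\$sample) or die "Cannot create parser";
my $found  = $stream->get_tag("ul");
print Dumper($found);

# Print the text found inside the list, if there was one.
my $text = text_inside($sample, "ul");
print defined $text ? "$text\n" : "no <ul> found\n";
```

With a live page, the same helper could be given the string returned by $agent->content. Data::Dumper is also a handy general answer to "what is in this reference?" for any Perl data structure.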