I hate to ask this, but I have been reading and trying to do this for a day. I read about and installed the following Perl modules: HTML::Strip, HTML::Parser, HTML::TreeBuilder, and HTML::Element, and I still cannot figure out how to get the data that I need.

I want to grab the following fields from an HTML page.

Harry Jones Wood Shop, 56789904,-938882991, Smith Rd New York, NY 14254, (154)555-1234

<div class=\042mytitle maximumtitle\042 id=\042idtitle\042> Harry Jones <b>Wood </b> &amp; Shop</div> latlng=56789904,-938882991,3132132133321 &amp; <div class=\042address\042 id=\042idaddr\042>737373 Smith Rd<br/>New York, NY 14254<br/></div><div class=\042 </div><div class=\042phone\042>(154) 555-1234&nbsp;-&nbsp;<span style=\042display:none\042 class=\042my_hide\042>
I am able to get the web page into the program by doing this:
use strict;
use warnings;
use LWP::UserAgent;

my $url = 'http://www.somepage.com';

my $browser = LWP::UserAgent->new;
# $browser->cookie_jar({});   # use if the site requires cookies

# Headers that make the request look like it comes from a browser
my @ns_headers = (
    'User-Agent'      => 'Mozilla/4.76 [en] (Win98; U)',
    'Accept'          => 'image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*',
    'Accept-Charset'  => 'iso-8859-1,*,utf-8',
    'Accept-Language' => 'en-US',
);

my $response = $browser->get($url, @ns_headers);
die "Can't get $url -- ", $response->status_line
    unless $response->is_success;
die "Hey, I was expecting HTML, not ", $response->content_type
    unless $response->content_type eq 'text/html';
I have tried methods such as find->tag and others, and I am not getting anywhere. I also found a post on PerlMonks about parsing, edited a line from that post, and tried this:
@addr = $response->content =~ /<div class=\042mytitle maximumtitle\042 id=\042idtitle\042>"([^ "]+)"/gi;
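One possible reason the pattern above matches nothing: inside a Perl regex, \042 is an octal escape for the double-quote character, so it will not match a literal backslash followed by 042 if that is what the page really contains. The pattern also expects a quote right after the closing >, which the sample does not have. Below is a minimal sketch using only core Perl. It assumes the page holds the literal \042 sequences shown in the sample; the clean_text helper and the variable names are made up for illustration, and regexes like these are brittle if the markup changes.

```perl
use strict;
use warnings;

# Sample data from the post. Single quotes keep each \042 as the literal
# four characters backslash-0-4-2 (an assumption about what the page holds).
my $page =
    '<div class=\042mytitle maximumtitle\042 id=\042idtitle\042> Harry Jones <b>Wood </b> &amp; Shop</div>'
  . ' latlng=56789904,-938882991,3132132133321 &amp; '
  . '<div class=\042address\042 id=\042idaddr\042>737373 Smith Rd<br/>New York, NY 14254<br/></div>'
  . '<div class=\042phone\042>(154) 555-1234&nbsp;-&nbsp;<span style=\042display:none\042 class=\042my_hide\042>';

# Turn the escaped quotes into real ones so the later patterns stay readable.
(my $html = $page) =~ s/\\042/"/g;

# Hypothetical helper: strip tags, decode the two entities seen in the
# sample, and collapse whitespace.
sub clean_text {
    my ($s) = @_;
    $s =~ s/<br\s*\/?>/, /gi;          # line breaks become field separators
    $s =~ s/<[^>]+>//g;                # drop any remaining tags
    $s =~ s/&nbsp;/ /g;
    $s =~ s/&amp;/&/g;
    $s =~ s/\s+/ /g;
    $s =~ s/^[\s,\-]+|[\s,\-]+$//g;    # trim stray separators at both ends
    return $s;
}

# Non-greedy captures of the text inside each div of interest.
my ($title) = $html =~ m{<div[^>]*\bid="idtitle"[^>]*>(.*?)</div>}is;
my ($addr)  = $html =~ m{<div[^>]*\bid="idaddr"[^>]*>(.*?)</div>}is;
my ($phone) = $html =~ m{<div[^>]*\bclass="phone"[^>]*>(.*?)<span}is;

# Only the first two numbers after latlng= are wanted.
my ($lat, $lng) = $html =~ /latlng=(-?\d+),(-?\d+)/;

print join(' | ', clean_text($title), "$lat,$lng",
                  clean_text($addr), clean_text($phone)), "\n";
# prints: Harry Jones Wood & Shop | 56789904,-938882991 | 737373 Smith Rd, New York, NY 14254 | (154) 555-1234
```

With a real page, $response->content would take the place of $page.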
Can you please help? Thanks
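
Since HTML::TreeBuilder is already installed, here is a rough sketch of how its look_down and as_text methods might be used for the same fields. It assumes a cleaned-up copy of the sample (real double quotes, the truncated div and the hidden span left out), and the %field hash is a name made up for illustration. Note that as_text simply concatenates the text under an element, so the two address lines separated by br tags come out run together and would need extra handling.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Assumption: the sample from the post with real double quotes; the
# truncated div and the hidden span are omitted so the fragment is complete.
my $html = <<'HTML';
<div class="mytitle maximumtitle" id="idtitle"> Harry Jones <b>Wood </b> &amp; Shop</div>
<div class="address" id="idaddr">737373 Smith Rd<br/>New York, NY 14254<br/></div>
<div class="phone">(154) 555-1234&nbsp;-&nbsp;</div>
HTML

my $tree = HTML::TreeBuilder->new_from_content($html);

# Each entry: field name, then the attribute and value that identify its div.
my @specs = (
    [ title   => id    => 'idtitle' ],
    [ address => id    => 'idaddr'  ],
    [ phone   => class => 'phone'   ],
);

my %field;
for my $spec (@specs) {
    my ($name, $attr, $value) = @$spec;

    # In scalar context look_down() returns the first matching element.
    my $div = $tree->look_down(_tag => 'div', $attr => $value);
    next unless $div;

    # as_text() returns the element's text with entities already decoded.
    (my $text = $div->as_text) =~ s/\s+/ /g;
    $field{$name} = $text;
}

print "$_: $field{$_}\n" for sort keys %field;

$tree->delete;    # free the tree once the text has been copied out
```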

In reply to HTML parsing OR capturing text from a string within tags by kevyt
