I'm using LWP to fetch a page, and that part works. My trouble is matching lines of the HTML. The script works when I fetch the page with LWP, but when I switch the input to the DATA section it finds nothing. I've included the DATA section only so you can see a sample of what I'm trying to read.
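One difference between the two inputs that may matter here: $response->content returns the whole page as a single string, so @page_data = $response->content; gives a one-element array, while @page_data = <DATA>; gives one element per line. A minimal sketch, assuming you want both sources to look the same to the matching code, is to slurp the filehandle into one string. The helper name slurp_handle is only for illustration:

```perl
use strict;
use warnings;

# Read everything from a filehandle into ONE string, the same shape
# that $response->content returns, instead of one element per line.
sub slurp_handle {
    my ($fh) = @_;
    local $/;               # undefine the input record separator: read it all
    my $text = <$fh>;
    return defined $text ? $text : '';
}
```

Usage: my $html = slurp_handle(\*DATA); or, when fetching, my $html = $response->content; so the code after that point sees the same kind of input either way.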

The main problem is that the script only finds the first match (Pam Anderson). I started out just trying to match the name in one table cell (<td>) of each row, but what I actually want is to record the contents of all the cells in each row, so that my log reads:

Pam Anderson|Stacked|nc|#1|images/hof.gif|295
Paris Hilton|Nicole Out|nc|#2|images/hof.gif|65
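To get every field of a row into one log line like the ones above, one option is to match whole <tr> rows and then pull the cells out of each row. The sketch below assumes every row has exactly the six-cell layout of the sample data (name and title in the second cell, image in the fifth). row_to_record and page_to_records are hypothetical names, and a real HTML parser (for example HTML::TableExtract from CPAN) would be more robust than regexes if the page layout varies:

```perl
use strict;
use warnings;

# Hypothetical helper: turn the inner HTML of one <tr>...</tr> row, laid
# out like the sample data, into "Name|Title|nc|#1|images/hof.gif|295".
# Regex sketch only; assumes six <td> cells per row.
sub row_to_record {
    my ($row_html) = @_;

    # Capture the inner HTML of every <td>...</td> in the row.
    my @cells = $row_html =~ m{<td[^>]*>(.*?)</td>}gis;
    return unless @cells == 6;

    # Cell 2 looks like: <a ...>Name</a><br>Title
    my ($name, $title) = $cells[1] =~ m{<a[^>]*>([^<]+)</a>\s*<br>\s*([^<]+)}i
        or return;

    # Cell 5 looks like: <img src="..." ...>
    my ($img) = $cells[4] =~ m{<img[^>]*\bsrc="([^"]+)"}i
        or return;

    # Cells 3, 4 and 6 are plain text; trim surrounding whitespace.
    my @plain = map { my $t = $_; $t =~ s/^\s+|\s+$//g; $t } @cells[2, 3, 5];

    return join '|', $name, $title, $plain[0], $plain[1], $img, $plain[2];
}

# Split a whole page (one string) into rows and convert each row.
# Rows that don't fit the expected layout return an empty list and are skipped.
sub page_to_records {
    my ($html) = @_;
    return map { row_to_record($_) } $html =~ m{<tr[^>]*>(.*?)</tr>}gis;
}
```

Usage: print "$_\n" for page_to_records($html); where $html holds the whole page as a single string.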

```perl
#!/perl/bin/perl

use LWP::UserAgent;
use LWP::Simple;

$getURL = "http://localhost/~owner/50.htm";
$script_name = "Fetcher";

print "Content-type: text/html\n\n";

$page_data;
$content_type;
$request;
$response;

$fiftypath = "data/fifty.dat";

$ua = LWP::UserAgent->new;
$ua->timeout(10);
$ua->agent("$script_name");
$request = HTTP::Request->new('GET',$getURL);
$response = $ua->request($request);

#@page_data = $response->content;
@page_data = <DATA>;

foreach $line (@page_data) {
    if ($line =~ /class=\"fifty\">([^<]+)<\/a>/) {
        open (LOG,">>$fiftypath");
        flock(LOG, 2);
        print LOG "$1";
        flock(LOG, 8);
        close(LOG);
        print "$1";
    }
}

__DATA__
<table>
<tr>
<td valign="middle"><b><font size="2">1</font></b> </td>
<td valign="middle"><a href="http://localhost/~owner/pam.htm" class="title">Pam Anderson</a><br>Stacked</td>
<td align="center" width="50">nc</td>
<td align="center" width="50">#1</td>
<td valign="middle" width="50%"><img src="images/hof.gif" width="15" height="14" border="0" hspace="2"></td>
<td valign="middle" width="50%">295</td>
</tr>
<tr>
<td valign="middle"><b><font size="2">1</font></b> </td>
<td valign="middle"><a href="http://localhost/~owner/paris.htm" class="title">Paris Hilton</a><br>Nicole Out</td>
<td align="center" width="50">nc</td>
<td align="center" width="50">#2</td>
<td valign="middle" width="50%"><img src="images/hof.gif" width="15" height="14" border="0" hspace="2"></td>
<td valign="middle" width="50%">65</td>
</tr>
</table>
```
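On only the first match being found: with @page_data = $response->content; the array holds a single element (the whole page), so the foreach runs once, and if ($line =~ /.../) without /g stops at the first match in that string. A sketch of collecting every match with /g in list context and appending them to the log in one go, assuming one record per line is wanted. The sub names link_texts and append_log are hypothetical, and the class name is a parameter because the sample data uses class="title" while the script's pattern looks for class="fifty":

```perl
use strict;
use warnings;
use Fcntl qw(:flock);   # LOCK_EX etc. instead of the magic numbers 2 and 8

# Return every link text whose <a> tag ends with class="$class">.
# m//g in list context returns ALL captures in the string, not just
# the first one that a plain if ($line =~ /.../) would see.
sub link_texts {
    my ($html, $class) = @_;
    my $q = quotemeta $class;
    return $html =~ m{class="$q">([^<]+)</a>}g;
}

# Append each record to the log as its own line, holding an exclusive
# lock while writing. The file is opened once, not once per match.
sub append_log {
    my ($path, @records) = @_;
    open my $log, '>>', $path or die "Cannot open $path: $!";
    flock $log, LOCK_EX or die "Cannot lock $path: $!";
    print {$log} "$_\n" for @records;
    close $log or die "Cannot close $path: $!";   # close also releases the lock
}
```

Usage: append_log($fiftypath, link_texts($response->content, 'title'));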

In reply to Searching HTML by Anonymous Monk