I know someone will say you shouldn't use regular expressions on HTML, but if I had used LinkExtractor I'd have ended up with links I didn't want and would have had to use regexes anyway.
while ($google_results =~ m|<p\sclass=g><a href=(http://www\..+)\sonmousedown|gs) {
    push(@links_found, $1);
}
For some reason, it isn't matching this data:
<p class=g><a href=http://www.ets.org/toefl/ onmousedown="return clk(this,'res',1)">Welcome to TOEFL: The <b>Test</b> of English as a Foreign Language</a><br><font size=-1>Information about the TOEFL tests and services are available online. Try the<br> TOEFL practice questions.<br><font color=#008000>www.ets.org/toefl/ - 18k - </font><nobr> <a class=fl href="http://64.233.161.104/search?q=cache:Gq-KV5uuj6YJ:www.ets.org/toefl/+test&hl=en">Cached</a> - <a class=fl href="/search?hl=en&lr=&safe=off&q=related:www.ets.org/toefl/">Similar&nbsp;pages</a></nobr></font>
<blockquote class=g><p class=g><a href=http://www.ets.org/testcoll/ onmousedown="return clk(this,'res',2)">The ETS <b>Test</b> Collection includes an extensive library of 20000 <b>...</b></a><br><font size=-1>The ETS <b>Test</b> Collection includes an extensive library of 20000 tests and other<br> measurement devices from the early 1900s to the present.&lt;/<br><font color=#008000>www.ets.org/<b>test</b>coll/ - 11k - </font><nobr> <a class=fl href="http://64.233.161.104/search?q=cache:mY1iJUWuYoEJ:www.ets.org/testcoll/+test&hl=en">Cached</a> - <a class=fl href="/search?hl=en&lr=&safe=off&q=related:www.ets.org/testcoll/">Similar&nbsp;pages</a></nobr></font>
</blockquote><p class=g><a href=http://www.test.com/ onmousedown="return clk(this,'res',3)">
As you can see, there's a lot of junk to wade through just to get the URLs from the search page. Can someone help me tweak my regex?
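A minimal sketch of one possible fix, assuming the result links stay unquoted (`href=http://...`) as in the sample above. The likely culprit is `.+`: it is greedy, and with `/s` it can run across newlines all the way to the last `onmousedown` on the page, so a single match swallows several results at once. Restricting the capture to characters that are neither whitespace nor `>` keeps each match to one URL. The sample string here is a trimmed copy of the question's data; `@links_found` and `$google_results` are the names from the question.

```perl
use strict;
use warnings;

# Trimmed copy of the search-results HTML from the question.
my $google_results = <<'HTML';
<p class=g><a href=http://www.ets.org/toefl/ onmousedown="return clk(this,'res',1)">Welcome to TOEFL</a>
<a class=fl href="/search?hl=en&lr=&safe=off&q=related:www.ets.org/toefl/">Similar&nbsp;pages</a>
<blockquote class=g><p class=g><a href=http://www.ets.org/testcoll/ onmousedown="return clk(this,'res',2)">The ETS Test Collection</a>
</blockquote><p class=g><a href=http://www.test.com/ onmousedown="return clk(this,'res',3)">
HTML

my @links_found;

# [^\s>]+ stops at the first whitespace or '>', so each match captures
# exactly one URL instead of running to the last "onmousedown" on the page.
# /s is no longer needed because the capture can't span lines anyway.
while ($google_results =~ m{<p\s+class=g><a\s+href=(http://www\.[^\s>]+)\s+onmousedown}g) {
    push @links_found, $1;
}

print "$_\n" for @links_found;
```

Only the links that follow `<p class=g>` are picked up, so the "Cached" and "Similar pages" links are skipped. If Google ever quotes the `href` values, the pattern would need an optional `"` on each side of the capture.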

In reply to page parsing regex by coldfingertips
