Hi Monks,
I've been using this code for a while to pull some news onto my site from Google News, but it seems that they have changed their <a href> tags. I am trying to fix it, but either my parsing or my regular expression is wrong. Could someone please look at my code and help me? I can't see the problem anymore.
use strict and use warnings have been commented out for testing.
First, here is how the output should look when everything works:
<p><table border=0 width=90%><tr><th colspan=2 width=100%><font color="#336600"> </font></th></tr><tr><td height=2><img src="../images/clearspace.gif" width="4" height="2" border="0"></td><td></tr><tr><td>&nbsp;</td><td> <a href="http://www.upi.com/view.cfm?StoryID=20040429-111305-1663r" class="blu" target=_news>Bush, Cheney meet Sept. 11 panel</a> - United Press International
<tr><td height=2><img src="../images/clearspace.gif" width="4" height="2" border="0"></td><td></tr><tr><td>&nbsp;</td><td> <a href="http://www.baltimoresun.com/news/nationworld/bal-te.court29apr29,0,2515928.story?coll=bal-nationworld-headlines" class="blu" target=_news>US detention tests scope of antiterror law</a> - Baltimore Sun
<tr><td height=2><img src="../images/clearspace.gif" width="4" height="2" border="0"></td><td></tr><tr><td>&nbsp;</td><td> <a href="http://quote.bloomberg.com/apps/news?pid=10000080%26sid%3DagEm.asPqLU0%26refer%3Dasia" class="blu" target=_news>China Welcomes US Decision to Reject Unions&#39; Trade Complaint</a> - Bloomberg
<tr><td height=2><img src="../images/clearspace.gif" width="4" height="2" border="0"></td><td></tr><tr><td>&nbsp;</td><td> <a href="http://www.forbes.com/home/newswire/2004/04/29/rtr1353074.html" class="blu" target=_news>Brazil says rich nations face many subsidy cases</a> - Forbes
<tr><td height=2><img src="../images/clearspace.gif" width="4" height="2" border="0"></td><td></tr><tr><td>&nbsp;</td><td> <a href="http://www.indystar.com/articles/3/142201-4853-009.html" class="blu" target=_news>Bayh: Humvees needed</a> - Indianapolis Star
<tr><td height=2><img src="../images/clearspace.gif" width="4" height="2" border="0"></td><td></tr><tr><td>&nbsp;</td><td> <a href="http://seattlepi.nwsource.com/national/apus_story.asp?category=1110%26slug%3DLA%2520Threat" class="blu" target=_news>Officials investigate LA mall threat</a> - Seattle Post Intelligencer
</table></p>

And here is the wrong result I get now:
<p><table border=0 width=90%><tr><th colspan=2 width=100%><font color="#336600"> </font></th></tr><tr><td height=2><img src="../images/clearspace.gif" width="4" height="2" border="0"></td><td></tr><tr><td>&nbsp;</td><td> <a href="http://news.google.com/news?ned=us&hl=en&ncl=http://www.sacbee.com/content/politics/story/10576570p-11495493c.html" class="blu" target=_news><nobr><b>all 2,276 related&nbsp;&raquo;</b></nobr></a> - <b>Voice of America
<tr><td height=2><img src="../images/clearspace.gif" width="4" height="2" border="0"></td><td></tr><tr><td>&nbsp;</td><td> <a href="http://news.google.com/news?ned=us&hl=en&ncl=http://www.voanews.com/article.cfm?objectID=BD99AA44-85D4-4534-8556F1813B4FB0C8%26title%3DSoldier%2520Testifies%2520Against%2520Private%2520England%2520in%2520Iraqi%2520Prisoner%2520Abuse%26catOID%3D45C9C78F-88AD-11D4-A57200A0CC5EE46C%26categoryname%3DUSA" class="blu" target=_news><nobr><b>all 376 related&nbsp;&raquo;</b></nobr></a> - <b>Washington Post
<tr><td height=2><img src="../images/clearspace.gif" width="4" height="2" border="0"></td><td></tr><tr><td>&nbsp;</td><td> <a href="http://news.google.com/news?ned=us&hl=en&ncl=http://www.washingtonpost.com/wp-dyn/articles/A47521-2004Aug30.html" class="blu" target=_news><nobr><b>all 1,138 related&nbsp;&raquo;</b></nobr></a> - <b>Seattle Times
<tr><td height=2><img src="../images/clearspace.gif" width="4" height="2" border="0"></td><td></tr><tr><td>&nbsp;</td><td> <a href="http://news.google.com/news?ned=us&hl=en&ncl=http://seattletimes.nwsource.com/html/localnews/2002020943_courtmartial31m.html" class="blu" target=_news><nobr><b>all 350 related&nbsp;&raquo;</b></nobr></a> - <b>Guardian
<tr><td height=2><img src="../images/clearspace.gif" width="4" height="2" border="0"></td><td></tr><tr><td>&nbsp;</td><td> <a href="http://news.google.com/news?ned=us&hl=en&ncl=http://www.guardian.co.uk/uslatest/story/0,1282,-4463197,00.html" class="blu" target=_news><nobr><b>all 300 related&nbsp;&raquo;</b></nobr></a> - <b>ABC News
<tr><td height=2><img src="../images/clearspace.gif" width="4" height="2" border="0"></td><td></tr><tr><td>&nbsp;</td><td> <a href="http://news.google.com/news?ned=us&hl=en&ncl=http://abcnews.go.com/wire/Politics/ap20040831_145.html" class="blu" target=_news><nobr><b>all 71 related&nbsp;&raquo;</b></nobr></a> - <b>Chicago Sun Times
</table></p>
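Every link in the wrong result is a Google News wrapper of the form http://news.google.com/news?ned=us&hl=en&ncl=<article URL>, with the real article URL appended after ncl= and still carrying escapes such as %26 and %3D. Note also that s/\%3F/\?/ and s/\%3D/\=/ in the script have no /g flag, so each one replaces only its first match. Below is a minimal core-Perl sketch of unwrapping such a link. article_url is a hypothetical helper, and it assumes, based only on the samples above, that ncl= is the last parameter of the wrapper.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of LWP or CGI): unwrap a Google News link of
# the form http://news.google.com/news?...&ncl=<article URL> and decode one
# layer of %XX escapes. Assumes ncl= is the last parameter, as in the samples.
sub article_url {
    my ($link) = @_;

    # Everything after the first "?ncl=" or "&ncl=" is the article URL.
    my ($target) = $link =~ /[?&]ncl=(.*)\z/s
        or return;    # not a wrapper link: return undef

    # Decode every %XX escape; /g replaces all of them, not just the first
    # %3F or %3D, and /e evaluates chr(hex(...)) for each one.
    $target =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $target;
}

# Usage with a shortened link from the wrong result above.
my $wrapped = 'http://news.google.com/news?ned=us&hl=en&ncl='
            . 'http://www.voanews.com/article.cfm?objectID=BD99AA44%26title%3DSoldier%2520Testifies';
my $url = article_url($wrapped);
print "$url\n";
# prints http://www.voanews.com/article.cfm?objectID=BD99AA44&title=Soldier%20Testifies
```

Only one layer is decoded on purpose: %2520 becomes %20, which is still a valid escaped space inside the article URL.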

Now, here is the code:
#!/perl/bin/perl
#use strict;
#use warnings;
use LWP::UserAgent;
use CGI qw(:header);
use CGI::Carp qw(fatalsToBrowser);
use CGI qw/:standard/;
#print header();
my ($ua,$req,$res);
my $pre="<tr><td height=2><img src=\"../images/clearspace.gif\" width=\"4\" height=\"2\" border=\"0\"></td><td></tr><tr><td>&nbsp;</td><td>";
my $post="</td></tr>";
my $x=0;
$ua = LWP::UserAgent->new;
$ua->agent("$0/0.1 " . $ua->agent);
$ua->agent("Mozilla/8.0"); # pretend we are very capable browser
$req = HTTP::Request->new(GET => 'http://news.google.com/news?ned=us&topic=n');
$req->header('Accept' => 'text/html');
# send request
$res = $ua->request($req);
# my $outfile = "news_test.txt";
# check the outcome
#####''
my ($content) = $res->content;
open(OUT,">news_test.txt") || die("Cannot Open File $outfile");
print OUT $content;
close(OUT);
if ($content) {
    $DATA ="<p><table border=0 width=90%><tr><th colspan=2 width=100%><font color=\"\#336600\"> </font></th></tr>";
    @news_page=split(/<a class/,$content);
    foreach $line(@news_page){
        if ($line=~/=p href=\"(.*?)\">(.*?)<\/a>(.*?)6f6f6f>(.*?)&/) {
            $edit_url=$1; #added
            $edit_url2=$2; #added
            $edit_url3=$3; #added
            $edit_url4=$4; #added
            open(OUT,">c:\\progra~1\\apache~1\\apache2\\htdocs\\acar\\line.txt") || die("Cannot Open File");
            print OUT $line;
            close(OUT);
            # If news doesn't work check this line first
            #if ($edit_url=~/(url\?q=)(.*?)/){$edit_url =~ s/(.*?)url\?q=$2/$2/gi;}#else{print "No match- $re_url";exit;} #added
            if ($edit_url=~/(url\?ntc=)(.*?)(\bq=\b)/){$edit_url =~ s/(.*?)url\?ntc=(.*?)(\bq=\b)//gi;} #else{print "No match- $edit_url";exit;} #added
            #http://news.google.com/url?ntc=02SF0&q=
            #http://www.chicagotribune.com/news/local/chi-0401090381jan09,1,1855420.story%3Fcoll%3Dchi-news-hed
            if($edit_url=~s/\%3F/\?/) {
                if($edit_url=~s/\%3D/\=/){
                }
            }
            unless ($x > 5 ) {
                #$DATA=$DATA."$pre <a href=\"$1\" class=\"blu\" target=_blank>$2</a> - $4\n";
                $DATA=$DATA."$pre <a href=\"$edit_url\" class=\"blu\" target=_news>$edit_url2</a> - $edit_url4\n"; #added
                $x++;
            }
        }
    }
    $DATA=$DATA."</table></p>";
    if ($DATA=~/<a/) {
        my $r_news = "news11.txt";
        open(DATA_IN, ">$r_news") || die "Couldn't get it!"; #|| &open_error;
        print DATA_IN $DATA;
        close DATA_IN;
    }
    else {
        print "<br>L80- ERROR 1 <br>";
        #&parse_error;
    }
}
else {
    print "<br>L=86 - ERROR 2 <br>";
    #&http_error;
}
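The script finds headlines by splitting the page on "<a class" and then matching =p href=\"...\">, so it only works while Google emits exactly that attribute order. Below is a minimal core-Perl sketch of a less order-dependent extractor. extract_links is a hypothetical name, it is still regex-based (a ">" inside an attribute value will confuse it), and the filter in the usage example assumes, from the wrong result above, that the unwanted links are the ones whose text starts with "all N related". For real use, a proper parser such as HTML::TokeParser or HTML::LinkExtor (both from the HTML-Parser distribution) is the sturdier choice.

```perl
use strict;
use warnings;

# Hypothetical helper: return a list of [href, text] pairs for every
# <a ...>...</a> in $html, without assuming attribute order or an
# "<a class" prefix. Regex-based, so unusual markup can still defeat it.
sub extract_links {
    my ($html) = @_;
    my @links;

    # $1 = the tag's attributes, $2 = the anchor's inner HTML.
    while ($html =~ m{<a\b([^>]*)>(.*?)</a>}gis) {
        my ($attrs, $inner) = ($1, $2);

        # href may be double-quoted, single-quoted or unquoted.
        next unless $attrs =~ /\bhref\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"']+))/i;
        my $href = defined $1 ? $1 : defined $2 ? $2 : $3;

        # Strip nested tags such as <nobr> and <b>, then trim whitespace.
        (my $text = $inner) =~ s/<[^>]*>//g;
        $text =~ s/^\s+|\s+$//g;

        push @links, [ $href, $text ];
    }
    return @links;
}

# Usage: skip the "all N related" cluster links seen in the wrong result.
my $page = '<a class=y href="http://a.example/story">Headline</a>'
         . ' <a href="http://news.google.com/news?ncl=x"><nobr><b>all 71 related&nbsp;&raquo;</b></nobr></a>';
for my $link (extract_links($page)) {
    my ($href, $text) = @$link;
    next if $text =~ /^all [\d,]+ related/;
    print "$href\t$text\n";    # prints the Headline link only
}
```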


Thank you very much!

Janitored by davido: Added readmore tags


In reply to Broken News- Reg. Exp. by Anonymous Monk
