
Aidan: Thanks for the reply. I'm currently trying the regex approach and have run into a little bump. The pattern for grabbing the link's text description...
m|<a href=(.*?)>(.*?)</a>|
...captures the text of every link except the ones labeled "STF3A" and "STF1A". Given:
Browse Tiger <a href="http://tiger.census.gov/cgi-bin/mapbrowse-tbl?lat=36.12000&lon=-95.94135&wid=0.75&ht=0.75&mlat=36.12000&mlon=-95.94135&msym=redpin&off=CITIES&mlabel=Tulsa+County,+OK">Map</a> of area.<br>
$text holds "Map", but when given:
Lookup 1990 Census <a href=http://venus.census.gov/cdrom/lookup/CMD=TABLES/DB=C90STF1A/F0=FIPS.STATE/F1=FIPS.COUNTY90/F2=STUB.GEO/LEV=COUNTY90/SEL=40,143,Tulsa+County>STF1A</a>
$text is empty. I've tried tweaking the pattern, but I'm even more of a newbie with regexes than I am with Perl. Any suggestions?
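For comparison, here is a minimal sketch that runs the same capture against both sample links. It assumes, hypothetically, that the fetched page sits in a single string `$html`; the `/g` flag walks every link in turn, and `/s` lets `.*?` cross line breaks. Both links match here (note that `$1` keeps the quotes around a quoted href), which suggests the empty `$text` comes from the surrounding code rather than from the pattern itself.

```perl
use strict;
use warnings;

# Hypothetical input: the two sample lines from the post, joined into one
# string as if the whole page had been fetched into $html.
my $html = <<'HTML';
Browse Tiger <a href="http://tiger.census.gov/cgi-bin/mapbrowse-tbl?lat=36.12000&lon=-95.94135&wid=0.75&ht=0.75&mlat=36.12000&mlon=-95.94135&msym=redpin&off=CITIES&mlabel=Tulsa+County,+OK">Map</a> of area.<br>
Lookup 1990 Census <a href=http://venus.census.gov/cdrom/lookup/CMD=TABLES/DB=C90STF1A/F0=FIPS.STATE/F1=FIPS.COUNTY90/F2=STUB.GEO/LEV=COUNTY90/SEL=40,143,Tulsa+County>STF1A</a>
HTML

# The pattern from the post, with /g to visit every link and /s so that
# .*? may also cross newlines. Each match yields an [href, text] pair.
my @links;
while ($html =~ m|<a href=(.*?)>(.*?)</a>|gs) {
    push @links, [ $1, $2 ];
}

# Print each link's text next to its URL.
for my $link (@links) {
    my ($url, $text) = @$link;
    print "$text => $url\n";
}
```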

Re: Re: Re: Retreive, modify, & display webpage
by AidanLee (Chaplain) on Jan 04, 2002 at 00:21 UTC
    If STF1A and STF3A are the only two strings you'll ever want to match, you might consider changing it to this:
    m{<a href=(.*?)>(STF1A|STF3A)</a>}
    But that won't necessarily explain why it isn't matching. If the URLs you're parsing span multiple lines, you'll need to add the /s modifier so that .*? matches newlines as well:
    m{<a href=(.*?)>(STF1A|STF3A)</a>}s
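    A small sketch of the difference /s makes, assuming a hypothetical link whose href wraps onto a second line. It uses braces as the match delimiter, because with | as the delimiter the | inside (STF1A|STF3A) would end the pattern early.

```perl
use strict;
use warnings;

# Hypothetical input: a link whose href wraps onto a second line,
# as can happen in raw HTML.
my $html = "<a href=http://venus.census.gov/cdrom/lookup/CMD=TABLES/\nDB=C90STF1A>STF1A</a>";

# Without /s, '.' does not match "\n", so the lazy (.*?) cannot reach '>'.
my $without_s = $html =~ m{<a href=(.*?)>(STF1A|STF3A)</a>} ? $2 : '';

# With /s, '.' matches "\n" too, so the same pattern succeeds.
my $with_s = $html =~ m{<a href=(.*?)>(STF1A|STF3A)</a>}s ? $2 : '';

print "without /s: '$without_s'\n";   # without /s: ''
print "with /s:    '$with_s'\n";      # with /s:    'STF1A'
```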
    HTH
      I figured it out, actually, though I don't understand why. If you move the
      $url =~ s/CMD=TABLES/CMD=RET/;
      line into the if block, it works fine; if you don't, it only grabs the "Map" links.
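      The original script isn't shown in the thread, so the following is only a hypothetical sketch of the structure being described: the CMD=TABLES to CMD=RET rewrite sits inside the if block, so it touches only URLs whose link text is STF1A or STF3A, while other links such as "Map" pass through unchanged. The names @links and @rewritten and the shortened sample URLs are illustrative assumptions.

```perl
use strict;
use warnings;

# Hypothetical [text, url] pairs (shortened sample URLs), as a loop over
# m{<a href=(.*?)>(.*?)</a>}gs on the fetched page might produce.
my @links = (
    [ 'Map',   'http://tiger.census.gov/cgi-bin/mapbrowse-tbl?lat=36.12000' ],
    [ 'STF1A', 'http://venus.census.gov/cdrom/lookup/CMD=TABLES/DB=C90STF1A' ],
);

my @rewritten;
for my $link (@links) {
    my ($text, $url) = @$link;    # $url is a copy; @links is left untouched
    if ($text eq 'STF1A' or $text eq 'STF3A') {
        # The substitution lives inside the if block, so it only
        # changes the census lookup URLs it is meant for.
        $url =~ s/CMD=TABLES/CMD=RET/;
    }
    push @rewritten, [ $text, $url ];
}

print "$_->[0] => $_->[1]\n" for @rewritten;
```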