duelafn has asked for the wisdom of the Perl Monks concerning the following question:

Good Day,

I was looking at adding US ZIP code and city lookup to a program I'm writing, so I'm trying a quick test. I got the hard part to work (as far as I was concerned), but I'm having trouble with the simple stuff.

The following code simply passes the entered information to usps.com and asks for the matches. It can handle either a five-digit ZIP code or a "City, State" pair.
    use strict;
    use warnings;
    use LWP::UserAgent;

    print 'Enter Search: ';
    chomp( my $x = <STDIN> );

    # Create a user agent object
    my $ua = new LWP::UserAgent;
    $ua->agent("AgentName/0.1 " . $ua->agent);

    # Create a request
    my $req = new HTTP::Request POST => 'http://www.usps.com/cgi-bin/zip4/ctystzip2';
    $req->content_type('application/x-www-form-urlencoded');
    $req->content('ctystzip='.$x);

    my $res = $ua->request($req);

    my @matches;
    my $data = $res->content;
    $data =~ /----------</g;
    if ($x =~ /^\s*\d{5}\s*/) {
        while ($data =~ /\G(br>((\w+\s)+\s*\w\w ))/ig) {
            push @matches, $2;
        }
    } else {
        while ($data =~ /\G(br>(\d{5}))/ig) {
            push @matches, $2;
        }
    }

    print 'Found ',$#matches+1,' matches:',"\n ", join("\n ", @matches), "\n";
Here are the results of a few test cases:

    [duelafn@XYZZY bbdb]$ perl States.pl
    Enter Search: Mishawaka, IN
    Found 0 matches:

    [duelafn@XYZZY bbdb]$ perl States.pl
    Enter Search: Chicago, IL
    Found 0 matches:

    [duelafn@XYZZY bbdb]$ perl States.pl
    Enter Search: 84102
    Found 1 matches:
     SALT LAKE CITY UT

    [duelafn@XYZZY bbdb]$ perl States.pl
    Enter Search: 12345
    Found 1 matches:
     SCHENECTADY NY
The tests tell me that what's wrong is in the "else" case, but when I put a "print pos $data" in there it prints a reasonable position.

Thanks,
    Dean

Re: specific /\G/g troubles
by chipmunk (Parson) on Dec 13, 2001 at 09:38 UTC
    \G forces the regular expression to match at the point where the last match left off, or not at all. Looking at your sample data, the regex does not match where the last match left off; it matches later in the string. You could fix it so the end of each match coincides with the beginning of the next match, but it would be simpler to just remove the \G. I don't think it's useful in this case.
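A minimal sketch of the behaviour described above, using a made-up input string rather than the USPS data. The two helper names are hypothetical; the only difference between them is the leading \G.

```perl
use strict;
use warnings;

# Collect digit runs, letting each match start anywhere after the previous one.
sub scan_unanchored {
    my ($text) = @_;
    my @found;
    while ($text =~ /(\d+)/g) {
        push @found, $1;
    }
    return @found;
}

# Collect digit runs, but require each match to begin exactly at pos(),
# i.e. where the previous match ended.
sub scan_anchored {
    my ($text) = @_;
    my @found;
    while ($text =~ /\G(\d+)/g) {
        push @found, $1;
    }
    return @found;
}

my $text = '12 34 56';    # made-up input
print join(',', scan_unanchored($text)), "\n";    # prints 12,34,56
print join(',', scan_anchored($text)),   "\n";    # prints 12
```

The anchored loop stops after the first number because the next character at pos() is a space, so \G(\d+) cannot match there and the regex is not allowed to skip ahead.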
Re: specific /\G/g troubles
by dws (Chancellor) on Dec 13, 2001 at 10:45 UTC
    In the case of passing a city and state, the HTML you're getting back isn't matched by the sequence of regular expressions you're using. After first matching
        $data =~ /----------</g;
    you try for
        while ($data =~ /\G(br>(\d{5}))/ig) {
    which won't ever match, at least not until the USPS changes the format of the data they're returning. Take a very careful look at the return data to see why not.

    One easy way to fix this is to change that to
        while ($data =~ /\G(.*?<br>)(\d{5})/ig) {
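A rough sketch comparing the original loop, the \G(.*?<br>) variant, and a version with \G removed. The sample string is modelled on the Mishawaka output duelafn posted in this thread, with the header abbreviated. It assumes the tags are separated by single spaces; the real response may use newlines instead. The helper zips_with is hypothetical.

```perl
use strict;
use warnings;

# Sample data modelled on the posted output (header shortened).
# Assumption: single spaces between the <BR> tags, not newlines.
my $data = '<b>MISHAWAKA IN</b><br>header text<br>' . ('-' x 61) . '<BR> '
         . '<BR>46544 ACCEPTABLE (DEFAULT) STANDARD<BR> '
         . '<BR>46545 ACCEPTABLE (DEFAULT) STANDARD<BR> '
         . '<BR>46546 ACCEPTABLE (DEFAULT) STANDARD<BR></PRE>';

# Position pos() just after "----------<" as the original script does,
# then collect the second capture group of every match of $re.
sub zips_with {
    my ($text, $re) = @_;
    my @zips;
    return @zips unless $text =~ /----------</g;
    while ($text =~ /$re/g) {
        push @zips, $2;
    }
    return @zips;
}

# Original loop: after "BR>" comes " <BR>", not five digits, so nothing matches.
my @original = zips_with($data, qr/\G(br>(\d{5}))/i);

# \G kept, but the gap up to the next <br> is consumed by .*?
my @dws = zips_with($data, qr/\G(.*?<br>)(\d{5})/i);

# \G dropped, so each match may start anywhere after the previous one.
my @plain = zips_with($data, qr/(<br>)(\d{5})/i);

print scalar(@original), " match(es) with the original pattern\n";
print "with .*?:   @dws\n";
print "without \\G: @plain\n";
```

If the real page puts newlines between the rows, `.` will not cross them unless the /s modifier is added, whereas the version without \G does not care what sits between the rows.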

Re: specific /\G/g troubles
by duelafn (Parson) on Dec 13, 2001 at 09:28 UTC
    Oops, I forgot to mention that the USPS does actually find matches for the test cities; the results look like:
    <PRE>
    <b>MISHAWAKA IN</b><br>is associated with the For these ZIP Codes, ZIP Code<br>following ZIP Codes: the city name is: Type<br>-------------------------------------------------------------<BR>
    <BR>46544 ACCEPTABLE (DEFAULT) STANDARD<BR>
    <BR>46545 ACCEPTABLE (DEFAULT) STANDARD<BR>
    <BR>46546 ACCEPTABLE (DEFAULT) STANDARD<BR></PRE>
    Good Day
Re: specific /\G/g troubles
by Fastolfe (Vicar) on Dec 13, 2001 at 22:07 UTC
    Just a thought: Keep in mind that there are actually published machine-readable versions of this data. If your application is going to be making heavy use of this, it might be beneficial to download this type of thing and store it in a local database (or DB file). I found a nice CSV file with every zip code, locality name and latitude/longitude after a few minutes of Google surfing, though I don't know how recent it is.
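A minimal sketch of the local-lookup idea, assuming a hypothetical CSV layout of zip,city,state; the columns of any real downloaded file may differ. The rows below are taken from the results quoted in this thread, and the lookup function is an illustration only.

```perl
use strict;
use warnings;

# Hypothetical CSV layout: zip,city,state (a real file's columns may differ).
my $csv = <<'END';
46544,MISHAWAKA,IN
46545,MISHAWAKA,IN
84102,SALT LAKE CITY,UT
12345,SCHENECTADY,NY
END

# Build one index per lookup direction.
my (%city_for_zip, %zips_for_city);
for my $line (split /\n/, $csv) {
    my ($zip, $city, $state) = split /,/, $line;
    $city_for_zip{$zip} = "$city $state";
    push @{ $zips_for_city{"$city, $state"} }, $zip;
}

# Accept either a five-digit ZIP code or a "City, State" pair.
sub lookup {
    my ($query) = @_;
    $query =~ s/^\s+|\s+$//g;    # trim surrounding whitespace
    if ($query =~ /^\d{5}$/) {
        return exists $city_for_zip{$query} ? ($city_for_zip{$query}) : ();
    }
    my $key = uc $query;         # "Mishawaka, IN" -> "MISHAWAKA, IN"
    $key =~ s/\s*,\s*/, /;       # normalise spacing around the comma
    return @{ $zips_for_city{$key} || [] };
}

print join(' ', lookup('Mishawaka, IN')), "\n";    # prints 46544 46545
print join(' ', lookup('84102')),         "\n";    # prints SALT LAKE CITY UT
```

Splitting on commas breaks on quoted fields, so a real file is better parsed with a CSV module such as Text::CSV from CPAN. For heavy use, the two hashes could be replaced by a database or DB file as suggested above.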