PerlMonks  

Grab zip codes out of an HTML page

by Falkkin (Chaplain)
on Feb 24, 2001 at 08:56 UTC ( [id://60611]=CUFP )

Takes in an HTML page and extracts everything that looks like a ZIP code (that is, 5 digits in a row). Prints them out, separated by commas. I'm sure there's a better way to do this, but thought I'd share. :)
cat zips.html | perl -ne 's/(\d{5,})//g; print "$1," if $1'

Replies are listed 'Best First'.
Re: Grab zip codes out of an HTML page
by damian1301 (Curate) on Feb 24, 2001 at 09:31 UTC
    Wouldn't s/(\d{5,})//g; match five or more consecutive digits? So if a web page contains 123456778990, it would match the whole number and return it in $1. A better solution would be to drop the comma from the quantifier, which gives this:

    s/(\d{5})//g;

    But since you're not actually making a substitution in that code, you should just use a match.

    m/(\d{5})(-\d{4})?/g;

    That way you don't have to needlessly delete anything, and it's much tidier :). Also, now you can catch the full ZIP+4 code for better accuracy (e.g. 12345-1234). Hope I helped.

    UPDATE: Thanks albannach for pointing out my typing mistake and for suggesting the (-\d{4})? part :)

    Almost a Perl hacker.
    Dave AKA damian

    I encourage you to email me
