Grab zip codes out of an HTML page

Takes in an HTML page and extracts everything that looks like a ZIP code (that is, five digits in a row), then prints the matches separated by commas. I'm sure there's a better way to do this, but thought I'd share. :)

cat zips.html | perl -ne 's/(\d{5,})//g; print "$1," if $1'
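
The one-liner above prints at most one match per input line: after a global substitution, `$1` holds only the capture from the last match, so a line containing two ZIP codes yields just the second. A minimal sketch that keeps the same `\d{5,}` pattern but collects every match on each line by running the match in list context (the sample lines are hypothetical stand-ins for zips.html):

```perl
use strict;
use warnings;

# Hypothetical sample input standing in for the lines of zips.html.
my @lines = (
    '<td>Beverly Hills 90210</td><td>Cupertino 95014</td>',
    '<p>No zip here</p>',
    '<p>Boston 02134</p>',
);

my @zips;
for my $line (@lines) {
    # A global match in list context returns every capture on the line,
    # whereas s/(\d{5,})//g leaves only the last one in $1.
    push @zips, $line =~ /(\d{5,})/g;
}

print join(',', @zips), "\n";
```

This prints `90210,95014,02134`; the original loop would have printed only `95014` for the first line.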

Re: Grab zip codes out of an HTML page by damian1301 (Curate) on Feb 24, 2001 at 09:31 UTC
Wouldn't `s/(\d{5,})//g;` match five or more consecutive digits? So if a web page contains 123456778990, it would match all of those digits and return them in `$1`. A better solution would be to drop the comma, giving `s/(\d{5})//g;`. But since you're not actually making a substitution in that code, you should just use a match: `m/(\d{5})(-\d{4})?/g;`. That way you don't have to needlessly delete anything, and it's much tidier :). Also, now you can catch the full ZIP+4 code for better accuracy (e.g. 12345-1234). Hope I helped.

UPDATE: Thanks albannach for pointing out my typing mistake and for suggesting the `(-\d{4})?` part :)

Almost a Perl hacker.
Dave AKA damian

I encourage you to email me
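
Dropping the comma alone still lets `\d{5}` carve bogus matches (12345, 67789) out of a longer run such as 123456778990. A minimal sketch of the ZIP / ZIP+4 match suggested above, with two assumed tweaks: `\b` word boundaries so a match cannot start or end inside a longer digit run, and a non-capturing `(?:-\d{4})?` so that list context returns one item per ZIP instead of two captures each. The sample string is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical sample text; in practice this would be read from zips.html.
my $html = '<p>Order 123456778990 ships to 12345-1234, or 54321.</p>';

# \b rejects digits embedded in a longer number, and (?:...) groups the
# optional -NNNN suffix without creating a second capture.
my @zips = $html =~ /\b(\d{5}(?:-\d{4})?)\b/g;

print join(',', @zips), "\n";
```

This prints `12345-1234,54321`. The same idea as a one-liner that reads the file directly and prints a comma-separated list:

    perl -ne 'push @z, /\b(\d{5}(?:-\d{4})?)\b/g; END { print join(",", @z), "\n" }' zips.html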
