Takes in an HTML page and extracts everything that looks like a ZIP code (that is, 5 digits in a row.) Prints them out, separated by commas.
I'm sure there's an better way to do this, but thought I'd share. :)
cat zips.html | perl -ne 's/(\d{5,})//g; print "$1," if $1'
Wouldn't the s/(\d{5,})//g; match 5 or more instances of consecutive numbers? So if, in a webpage, there is 123456778990, it would match all of the numbers and return them in $1 A better solution to this would be to omit the comma and result with this
s/(\d{5})//g;
But, since your not actually making a substitution in that code, you should just use a match.
m/(\d{5})(-\d{4})?/g;
That way you don't have to falsely delete anything and its much more tidier :). Also, now you can catch the full zip code for better accuracy (eg. 12345-1234). Hope I helped.
UPDATE:Thanks albannach for pointing out my typing mistake and for suggesting the (-\d{4})? part :)