Popcorn Dave has asked for the wisdom of the Perl Monks concerning the following question:
I'm writing a program to look up sales tax rates for California. Unfortunately there is more than one rate for the state; it's determined by county. That's not a problem insofar as there's a PDF file with all the information, and Adobe is kind enough to provide a conversion service that turns a PDF file into HTML here. So far so good.
I'm running into trouble when I try to parse the file looking for a city. The HTML source is 3243 lines long, but when I read the file in, I only get 231 lines before it quits.
My thinking is that when the file was converted, there is some character that is acting as an EOF. I'm not sure about that though, because the file displays fine in a browser.
So, I'm wondering, is there a simple way to strip out anything below ASCII 32? I've tried using things like
next if $line =~ /\W/;
next if $line != /^[A-Z]/;
next if $line =~ /[^A-Z]/;
to no avail.
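For reference, here is a minimal sketch of what stripping everything below ASCII 32 could look like. It rests on an assumption that is not confirmed anywhere in this thread: that the early stop comes from a control character such as Ctrl-Z (0x1A), which a text-mode read on Windows can treat as end of file. Reading with binmode avoids that translation, and tr///d then deletes the control characters. The helper name read_clean_lines and the file name in the usage line are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: read a file in binary mode and return its lines
# with control characters below ASCII 32 removed (tab is kept).
sub read_clean_lines {
    my ($path) = @_;

    open my $fh, '<', $path or die "Cannot open $path: $!";

    # Assumption: a byte such as 0x1A (Ctrl-Z) ends a text-mode read early.
    # binmode turns off text-mode translation so the whole file is read.
    binmode $fh;

    my @lines;
    while ( my $line = <$fh> ) {
        $line =~ s/\r?\n\z//;                 # drop the LF or CRLF line ending
        $line =~ tr/\x00-\x08\x0B-\x1F//d;    # delete control chars, keep tab (\x09)
        push @lines, $line;
    }
    close $fh;

    return @lines;
}
```

It would be called as my @lines = read_clean_lines('taxrates.html'); and the number of lines returned can then be compared against the 3243 lines in the HTML source to see whether the whole file is now being read.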
The information I'm actually after starts probably halfway into the file and is laid out as: city, tax rate, county. So looking for a capital letter at the beginning of the line, I thought, would work.
Thanks in advance!
There is no emoticon for what I'm feeling now.
Re: Stripping non alphanumeric characters and leaving punctuation characters from a file
by BrowserUk (Patriarch) on Jun 06, 2003 at 19:27 UTC
by Popcorn Dave (Abbot) on Jun 06, 2003 at 19:40 UTC
Re: Stripping non alphanumeric characters and leaving punctuation characters from a file
by Zaxo (Archbishop) on Jun 06, 2003 at 19:29 UTC
by Popcorn Dave (Abbot) on Jun 06, 2003 at 19:32 UTC