in reply to Stripping non alphanumeric characters and leaving punctuation characters from a file

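To answer the stated problem: Perl has character classes for control characters. With Unicode properties, use s/\p{IsCntrl}//g; (control characters only) or the broader s/\p{IsC}//g; (the whole "Other" category, which also removes format, private-use and unassigned code points). With POSIX classes, use s/[[:cntrl:]]//g;. None of these touch punctuation.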
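A minimal sketch of those substitutions wrapped as subs. The sub names and the sample string are made up for illustration. It shows that punctuation such as ":", "%" and "()" survives while tabs, CR/LF, NUL and DEL are removed:

```perl
use strict;
use warnings;

# POSIX class: control characters such as \t, \r, \n, NUL and DEL.
sub strip_cntrl_posix {
    my ($s) = @_;
    $s =~ s/[[:cntrl:]]//g;
    return $s;
}

# Unicode property: control characters only (general category Cc).
sub strip_cntrl_unicode {
    my ($s) = @_;
    $s =~ s/\p{IsCntrl}//g;
    return $s;
}

# Unicode property: the whole "Other" category (Cc, Cf, Co, Cn, Cs),
# so format characters like U+200B ZERO WIDTH SPACE go too.
sub strip_other_unicode {
    my ($s) = @_;
    $s =~ s/\p{IsC}//g;
    return $s;
}

my $raw = "Tax rate:\t7.25%\r\n\x00(state)\x7F";
print strip_cntrl_posix($raw), "\n";    # prints "Tax rate:7.25%(state)"
```

Pick \p{IsCntrl} or [[:cntrl:]] if you only want to drop control characters; \p{IsC} is more aggressive.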

As for the real problem, parsing that information out of a PDF file on every run sounds very clunky. Why not extract it once and store it in a small database or flat file? [Update] The same objection holds for parsing it from HTML each time. Store the tax rate data, and only the tax rate data, somewhere you can get at it easily.
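One way to do the "extract once" idea, as a sketch under assumptions: the cache is a tab-separated flat file of STATE/RATE pairs, and $extract stands in for whatever slow PDF or HTML parsing you already have. Both the file format and the load_rates sub are hypothetical.

```perl
use strict;
use warnings;

# Return a hashref of rates. Parse the source only if no cache file exists.
# $extract is a hypothetical code ref that does the expensive PDF/HTML parse.
sub load_rates {
    my ($cache_file, $extract) = @_;

    if (-e $cache_file) {
        # Fast path: read the small flat file written on an earlier run.
        open my $in, '<', $cache_file or die "Can't read $cache_file: $!";
        my %rates;
        while (my $line = <$in>) {
            chomp $line;
            my ($state, $rate) = split /\t/, $line, 2;
            $rates{$state} = $rate if defined $rate;
        }
        close $in;
        return \%rates;
    }

    # First run: do the slow extraction once, then save only the rate data.
    my $rates = $extract->();
    open my $out, '>', $cache_file or die "Can't write $cache_file: $!";
    print {$out} "$_\t$rates->{$_}\n" for sort keys %$rates;
    close $out or die "Can't close $cache_file: $!";
    return $rates;
}
```

Usage would look like my $rates = load_rates('tax_rates.txt', sub { ... }); where the sub returns something like { CA => '7.25' }. Delete the cache file when the source data changes.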

Perl's binmode function may help with your file-reading problem, since a PDF is binary data and should be read without any newline translation.
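A minimal sketch of reading a file as raw bytes with binmode. The slurp_binary name is made up for illustration. On Windows this stops "\r\n" from being turned into "\n", so the PDF bytes arrive unchanged.

```perl
use strict;
use warnings;

# Read a whole file as raw bytes.
sub slurp_binary {
    my ($path) = @_;
    open my $fh, '<', $path or die "Can't open $path: $!";
    binmode $fh;          # no layer given: same as binmode($fh, ':raw')
    local $/;             # undefined record separator: read the whole file
    my $data = <$fh>;
    close $fh;
    return defined $data ? $data : '';
}
```

Opening with '<:raw' instead of calling binmode afterwards has the same effect.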

After Compline,
Zaxo

Re: Re: Stripping non alphanumeric characters and leaving punctuation characters from a file
by Popcorn Dave (Abbot) on Jun 06, 2003 at 19:32 UTC
    Sorry, I forgot to mention that.

    I'm checking to see if the file exists on the local machine, and only downloading and processing it if it's not there. I don't want to be hitting Adobe's server every time this thing is run.
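The check-then-download approach described above can be sketched like this. It assumes HTTP::Tiny (a core module since Perl 5.14) for the download; the fetch_if_missing sub, the optional $fetch hook and the example URL are hypothetical, added so the logic can be tried without touching the network.

```perl
use strict;
use warnings;
use HTTP::Tiny;

# Download $url to $path only if $path is not already on disk.
# Returns 1 if a download happened, 0 if the local copy was used.
sub fetch_if_missing {
    my ($url, $path, $fetch) = @_;
    return 0 if -e $path;    # local copy exists: no request to the server

    # Default fetcher: HTTP::Tiny's mirror() writes the response to $path.
    $fetch ||= sub {
        my ($u, $p) = @_;
        my $res = HTTP::Tiny->new->mirror($u, $p);
        die "Download of $u failed: $res->{status} $res->{reason}\n"
            unless $res->{success};
    };
    $fetch->($url, $path);
    return 1;
}

# Hypothetical URL and file name, for illustration only:
# fetch_if_missing('http://example.com/taxrates.pdf', 'taxrates.pdf');
```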

    There is no emoticon for what I'm feeling now.