PerlMonks
Simplify parsing a file
by Anonymous Monk on Apr 02, 2007 at 17:28 UTC ( [id://607885]=perlquestion )
Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:
Hello Masters of the Script-fu!
I am a noob with Perl, and wrote the following code to parse an HTML or JSP file and remove all code blocks and extraneous whitespace, leaving just the text of the page. I know there are probably easier ways to do this on a one-off basis, but I had a couple hundred files to process. I also needed an excuse to start learning Perl, and I wrote this using PerlMonks and the Camel book. This script works, but man is it ugly.
I have the following issues that I can't seem to get a clear explanation of:
1. Passing more than one file as arguments produces new files ending in .txt, as expected, but they are all blank.
2. I had an issue, which I resolved with the 'onceonly' var, where the output file would contain multiple copies of the text, somewhere around 50-60 of them, but never the same number twice. I checked this with line counts, and yes, I cleared the files between each test.
3. I know I can probably do this with modules, but as I'm still green with Perl, I wanted to do this in basic syntax without module interactions clouding the problems. Rewriting this script in module form is the next project.
4. I had originally tried using a regex for this, but I couldn't get one to handle nested tags.
Thanks for any suggestions!