I've run some tests with the updated version of this code against
"debug the error!!", a 500-line monstrosity posted without code tags, and
"Slashdot Headline Grabber for *nix", which was posted with <CODE> tags, and it appears to run fine on both.
To properly parse code that was posted without code tags, the last regex in the foreach loop needs to be replaced with the following:
s/<a href="[^"]+">([^<]+)<\/a>/[$1]/g;
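As a quick illustration of what that substitution does, here is a minimal sketch. It assumes the damage being undone is that bracketed text posted outside code tags was turned into links, so something like $x[1] arrives as $x<a href="...">1</a>. The sub name restore_brackets and the sample line are made up for the example and are not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical helper: turn each <a href="...">text</a> back into [text].
sub restore_brackets {
    my ($text) = @_;
    $text =~ s/<a href="[^"]+">([^<]+)<\/a>/[$1]/g;
    return $text;
}

# Made-up sample of code that was posted without code tags.
my $line = 'print <a href="?node=foo">foo</a>, $x<a href="?node=1">1</a>;';

print restore_brackets($line), "\n";   # prints: print [foo], $x[1];
```

Note that [^<]+ means a link whose text itself contains markup will be left alone, which seems like the safer failure mode here.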
Because I can't resist tinkering, I've decided to add the following enhancements (for a program that really has limited use!), reported as a summary at the end of the script run:
- Flag a possible code fragment (no shebang '#!').
- Verify balanced parens, quotes, curlies, etc. This will require making sure that escaped characters such as \" are not counted.
- Expand the %charcodes hash to be more inclusive?
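For the first two items, a rough sketch of how the checks might look. The sub names and messages are hypothetical, and this is only a heuristic: it drops backslash-escaped characters before counting, but it knows nothing about brackets inside strings, regexes, comments, or heredocs, so it can report false positives on perfectly valid code.

```perl
use strict;
use warnings;

# Hypothetical check: true if the code does not start with a shebang line.
sub looks_like_fragment {
    my ($code) = @_;
    return $code !~ /\A#!/;
}

# Hypothetical check: returns a list of complaints about unbalanced
# parens, brackets, curlies, and double quotes (empty list if none found).
sub unbalanced_delimiters {
    my ($code) = @_;
    my %closer_for = ('(' => ')', '[' => ']', '{' => '}');
    my %is_closer  = map { $_ => 1 } values %closer_for;

    # Drop backslash-escaped characters such as \" or \( before counting.
    (my $stripped = $code) =~ s/\\.//gs;

    my (@stack, @problems);
    for my $ch (split //, $stripped) {
        if (exists $closer_for{$ch}) {
            push @stack, $ch;                  # remember the opener
        }
        elsif ($is_closer{$ch}) {
            my $open = pop @stack;
            if (!defined $open) {
                push @problems, "unexpected '$ch'";
            }
            elsif ($closer_for{$open} ne $ch) {
                push @problems, "'$open' closed by '$ch'";
            }
        }
    }
    push @problems, "unclosed '$_'" for @stack;

    # Double quotes have no open/close form, so just check for an even count.
    my $quotes = () = $stripped =~ /"/g;
    push @problems, 'odd number of double quotes' if $quotes % 2;

    return @problems;
}

# Made-up input: no shebang, and the opening curly is never closed.
my $sample = 'if ($x) { print "say \"hi\"";';
print "Possible code fragment (no shebang)\n" if looks_like_fragment($sample);
print "$_\n" for unbalanced_delimiters($sample);
```

Both subs return plain values, so their results could be collected during the run and printed together in the end-of-run summary.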
Other suggestions would be welcome.