PerlMonks
First, change your @pages array to a hash. Then you can step through it with a simple loop over the hash rather than the cumbersome and obfuscated for(){} loop above.

Second, a lot of your regexes don't need the /s modifier. See perldoc perlre for info about that.

Third, use strict.

And now for the code errors: I don't see where you set $keeperlength before using it in your nested for(){} loop. Incidentally, your conversion of <tag> to {{{tag}}} doesn't account for things like <br />. That's a minor nitpick, though. Other than that, I can't see why it would "revert" to the original $html variable.

Wanna fix the things I've pointed out (or point out the flaws in my thinking, as the case may be =]) and try it again? If it still doesn't work, point us to some pages that work and some that don't, and we'll continue hammering.

Good luck!

-marius

In reply to Re: Harvesting and Parsing HTML from other sites by marius
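For anyone following along, here is a minimal sketch of the suggestions in the post, not the original poster's code. The contents of %pages, the example.com URLs, and the helper name brace_tags are all made up for illustration, since the script under discussion isn't shown here. It assumes the hash maps a short name to a URL, and that the tag conversion only needs to wrap whatever sits between < and >.

```perl
#!/usr/bin/perl
use strict;      # the "third" point: catches undeclared variables
use warnings;    # would also flag an unset $keeperlength when it is used

# Hypothetical data: the real @pages contents are not shown in the thread.
my %pages = (
    example_one => 'http://www.example.com/one.html',
    example_two => 'http://www.example.com/two.html',
);

# Step through the hash by key instead of a C-style for(;;) loop.
# Sorting the keys just makes the order predictable.
foreach my $name (sort keys %pages) {
    my $url = $pages{$name};
    print "$name => $url\n";
}

# /s only changes what "." matches (it lets "." match a newline),
# so a regex with no "." in it gains nothing from /s.
my $text = "a\nb";
print "without /s: ", ($text =~ /a.b/  ? "match" : "no match"), "\n";
print "with /s:    ", ($text =~ /a.b/s ? "match" : "no match"), "\n";

# Hypothetical helper: turn <tag> into {{{tag}}}, including tags with
# spaces, attributes, or a trailing slash such as <br />.
sub brace_tags {
    my ($html) = @_;
    $html =~ s/<([^<>]+)>/'{{{' . $1 . '}}}'/ge;
    return $html;
}

print brace_tags('<b>hi</b><br />'), "\n";
```

The [^<>]+ character class is one way to cover <br />; it is a loose match and will also wrap things like comments or doctype declarations, so a real HTML parser may be the better tool if the pages get messy.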