PerlMonks
Re: Harvesting and Parsing HTML from other sites
by marius (Hermit) on Mar 28, 2001 at 09:31 UTC ( [id://67753] )
First, change your @pages array to a hash. Then you can step through it with a simple loop over the hash rather than the cumbersome and obfuscated for(){} loop above. Second, a lot of your regexes don't need the /s modifier. See perldoc perlre for info about that. Third, use strict.

And now for code error issues: I don't see where you set $keeperlength before using it in your nested for(){} loop. Incidentally, your changing of <tag> to {{{tag}}} doesn't account for things like <br />. That's a minor nitpick, though. Other than that, I can't see why it would "revert" back to the original $html variable.

Want to fix the things I've pointed out (or point out the flaws in my thinking, as the case may be =]) and try it? If it still doesn't work, point us to some pages that work and some that don't, and we'll continue hammering. Good luck!

-marius
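For what it's worth, here is a rough sketch of the first two suggestions, not the original poster's code. The %pages contents, the example.com URLs, and the $html string are all made up for illustration; the only real claims are that foreach over keys walks a hash and that /s only changes what "." matches.

```perl
use strict;
use warnings;

# Hypothetical data: page names mapped to URLs (not from the original post).
my %pages = (
    news    => 'http://example.com/news.html',
    weather => 'http://example.com/weather.html',
);

# Step through the hash by key; sort gives a stable order.
my @visited;
foreach my $name (sort keys %pages) {
    my $url = $pages{$name};
    push @visited, "$name => $url";
}
print "$_\n" for @visited;

# /s only affects ".": with it, "." also matches a newline.
# A regex with no "." in it gains nothing from /s.
my $html = "<b>first\nsecond</b>";
my $without_s = $html =~ /<b>(.*)<\/b>/  ? 1 : 0;   # "." stops at the newline
my $with_s    = $html =~ /<b>(.*)<\/b>/s ? 1 : 0;   # "." crosses the newline
print "without /s: $without_s, with /s: $with_s\n";
```

So /s is worth keeping only on patterns where "." really has to span line breaks; everywhere else it is just noise.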