Re: Harvesting and Parsing HTML from other sites


	PerlMonks


by marius (Hermit)

on Mar 28, 2001 at 09:31 UTC (id://67753)


in reply to Harvesting and Parsing HTML from other sites

First, change your @pages array to a hash. Then you can step through it with:

foreach my $page (keys %pages) {
    # $page is the key; $pages{$page} is its value
}

rather than the cumbersome and obfuscated for(){} loop above.
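
Here is a minimal sketch of that hash-based loop. Your @pages isn't shown in full, so the keys and URLs below are placeholders I made up, not your data:

```perl
use strict;
use warnings;

# Hypothetical data: these names and URLs are placeholders only.
my %pages = (
    news    => 'http://www.example.com/news.html',
    weather => 'http://www.example.com/weather.html',
);

# keys() returns the keys in no particular order; sort them when a
# stable order matters.
foreach my $page (sort keys %pages) {
    print "$page => $pages{$page}\n";
}
```

If the order of the pages matters to you, keep a separate array of keys or sort as above, since a hash does not remember insertion order.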

Second, a lot of your regexes don't need the /s modifier. All /s does is let . match a newline, so it has no effect on a pattern that contains no dot. See perldoc perlre for details.
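
To see what /s actually changes, here is a small sketch. The sample string and patterns are made up for illustration and are not taken from your code:

```perl
use strict;
use warnings;

my $html = "<b>bold\ntext</b>";

# Without /s, "." does not match "\n", so this fails across the line break.
my $without_s = $html =~ /<b>(.*)<\/b>/ ? 1 : 0;

# With /s, "." also matches "\n", so the match succeeds.
my $with_s = $html =~ /<b>(.*)<\/b>/s ? 1 : 0;

# No "." in this pattern, so /s changes nothing; a negated character
# class such as [^<] matches "\n" with or without it.
my $no_dot = $html =~ /<b>[^<]*<\/b>/s ? 1 : 0;

print "without /s: $without_s, with /s: $with_s, no dot: $no_dot\n";
# prints "without /s: 0, with /s: 1, no dot: 1"
```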

Third, use strict.
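
As a rough illustration of what strict buys you: a variable that is used without ever being declared becomes a compile-time error instead of a silent undef. The string eval below is only there so the script can print the error rather than die, and $keeper_length is a name I invented for the demonstration:

```perl
use strict;
use warnings;

# $keeper_length is declared, but the next statement uses the undeclared
# $keeperlength. Under strict that is a compile-time error.
my $code = 'use strict; my $keeper_length = 5; $keeperlength + 1;';

my $result = eval $code;
print "strict caught: $@" if $@;
```

Without strict, the same code would quietly treat $keeperlength as an empty global and carry on.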

And now for the actual code errors: I don't see where you set $keeperlength before using it in your nested for(){} loop. Incidentally, your conversion of <tag> to {{{tag}}} doesn't account for things like <br />. That's a minor nitpick, though. Other than that, I can't see why it would "revert" to the original $html variable. Want to fix the things I've pointed out (or point out the flaws in my thinking, as the case may be =]) and try again? If it still doesn't work, point us to some pages that work and some that don't, and we'll keep hammering.
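
On the <br /> nitpick, here is one possible sketch of the <tag> to {{{tag}}} step. It assumes the intent is simply to swap the angle brackets for triple braces; the sample $html and the $marked variable are mine, not from your script:

```perl
use strict;
use warnings;

my $html = 'one<br />two<br>three<a href="x.html">link</a>';

# [^>]+ accepts everything up to the closing ">", so attributes and the
# trailing " /" of <br /> are carried along too. It will still trip over
# a literal ">" inside an attribute value.
(my $marked = $html) =~ s/<([^>]+)>/{{{$1}}}/g;

print "$marked\n";
# prints: one{{{br /}}}two{{{br}}}three{{{a href="x.html"}}}link{{{/a}}}
```

Working on a copy ($marked) rather than on $html itself also makes it easier to tell whether anything is being "reverted" later on.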

Good luck!

-marius
