Re: How would you extract *content* from websites?

	PerlMonks

by kirbyk (Friar)

on Jun 17, 2005 at 18:22 UTC

in reply to How would you extract *content* from websites?

One tip: many news sites these days offer RSS feeds, either directly or through an aggregator like Yahoo. I'm sure you can get your Reuters content that way. An RSS feed is exactly what you want: content without layout.
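To show what "content without layout" looks like in practice, here is a minimal sketch in Python (not the only way to do it, and not tied to any particular site). It assumes a plain RSS 2.0 feed; `SAMPLE_RSS` and `extract_items` are made-up names for illustration, and a real script would fetch the XML from the site's feed URL instead of using an inline string.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed; a real one would be fetched from the site's feed URL.
SAMPLE_RSS = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example News</title>
    <item>
      <title>First headline</title>
      <link>http://example.com/1</link>
      <description>Summary of the first story.</description>
    </item>
    <item>
      <title>Second headline</title>
      <link>http://example.com/2</link>
      <description>Summary of the second story.</description>
    </item>
  </channel>
</rss>"""


def extract_items(rss_xml):
    """Return a list of dicts (title, link, description), one per <item>."""
    root = ET.fromstring(rss_xml)
    items = []
    for item in root.iter("item"):
        items.append({
            "title": item.findtext("title", default="").strip(),
            "link": item.findtext("link", default="").strip(),
            "description": item.findtext("description", default="").strip(),
        })
    return items


if __name__ == "__main__":
    for entry in extract_items(SAMPLE_RSS):
        print(entry["title"], "-", entry["link"])
```

Nothing here depends on the site's page design, which is the whole appeal: the same loop should work on any feed that follows the RSS 2.0 item structure.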

For anything else, a solution is going to be specific to each site, and it will only work until they change their design. That's a lot of work, and I don't see a way around it.
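For contrast, a site-specific extractor might look something like the sketch below. The `article-body` class name, `ArticleBodyParser`, and `SAMPLE_HTML` are all assumptions invented for the example; every real site would need its own rule, and the rule breaks as soon as the markup changes.

```python
from html.parser import HTMLParser


class ArticleBodyParser(HTMLParser):
    """Collect text inside <div class="article-body"> (a hypothetical layout)."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # how many <div>s deep we are inside the target div
        self.chunks = []  # text fragments found inside the target div

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        if self.depth > 0:
            self.depth += 1          # nested div inside the article body
        elif ("class", "article-body") in attrs:
            self.depth = 1           # entering the article body

    def handle_endtag(self, tag):
        if tag == "div" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0 and data.strip():
            self.chunks.append(data.strip())


def extract_article(html):
    """Return the article text, dropping navigation, footers, and so on."""
    parser = ArticleBodyParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page; the class names are made up for this example.
SAMPLE_HTML = """
<html><body>
<div class="nav">Home | World | Sports</div>
<div class="article-body"><p>Story text.</p><div class="quote">A quote.</div></div>
<div class="footer">Copyright notice</div>
</body></html>
"""

if __name__ == "__main__":
    print(extract_article(SAMPLE_HTML))
```

Rename that one class in the site's template and the function quietly returns an empty string, which is exactly the maintenance problem.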

Good luck!

-- Kirby, WhitePages.com

In Section Meditations
