
Re: How do I extract text from an HTML page?

by CountZero (Bishop)
on Aug 03, 2003 at 16:06 UTC (#280447)

in reply to How do I extract text from an HTML page?

Well, whatever you do, the one way not to go is to regex the HTML yourself. That works only for the simplest, most regular HTML, and it will break before you know it.
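To make that concrete, here is a minimal sketch of the parser route, written in Python with only the standard library's html.parser (in Perl, a CPAN module such as HTML::Parser fills the same role). The class name TextExtractor and the helper extract_text are made up for illustration, and skipping script and style contents is my assumption about what "text" should mean here.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}  # assumption: these never count as page text

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that sits outside a skipped element
        if self.skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def extract_text(html):
    """Return the text chunks of an HTML string, joined by single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)
```

Calling extract_text on a page string hands back the text without the markup. The point is that the parser, not your pattern, decides where tags begin and end, so nested tags, entities, and markup-looking strings inside scripts stop being your problem.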

Another approach is to go to the source of the data behind the web page. Assuming the page is generated from some database, can't you go directly to that database and query the data from there?
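If the database is reachable, the whole job can shrink to a single query. The sketch below uses Python's sqlite3 module with an in-memory database purely as a stand-in. The articles table, its columns, the sample rows, and the fetch_titles helper are all hypothetical, since the real schema and database engine are unknown.

```python
import sqlite3


def fetch_titles(conn, author):
    """Return the titles by one author, sorted alphabetically."""
    cur = conn.execute(
        "SELECT title FROM articles WHERE author = ? ORDER BY title",
        (author,),  # placeholder binding, never string interpolation
    )
    return [row[0] for row in cur.fetchall()]


# Hypothetical stand-in for the database behind the web page
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (title TEXT, author TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [("Parsing HTML", "alice"), ("Regex pitfalls", "bob"), ("Ask the source", "alice")],
)

titles = fetch_titles(conn, "alice")
```

No markup is involved at all this way, so there is nothing to break when the page layout changes.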


"If you have four groups working on a compiler, you'll get a 4-pass compiler." - Conway's Law
