Re: Re: Re: HTML content extractor

My sincere apologies.

When I read the description of your code, I assumed you had written yet-another-html-pseudo-parser, which you have not. That will teach me to answer posts when I am tired (and too fast).

Once I started actually reading I found that your code _is_ valuable. I also tried (of course!) to write something similar but simpler, and haven't succeeded so far (man, this CNN page is Hell!).

What I have managed, though, is to find a bug in XML::PYX and one in XML::Twig, so I did not lose my time ;--)

Oh, and of course I upvoted the rest of your comments on the thread.

Sorry...

Re: Re: Re: Re: HTML content extractor
by Nooks (Monk) on Feb 12, 2001 at 02:01 UTC

Once I started actually reading I found that your code _is_ valuable. I also tried (of course!) to write something similar but simpler, and haven't succeeded so far (man, this CNN page is Hell!).

Heh, yeah, those pages can be a right pain in the ass. Don't forget, once you have it working on CNN's news pages, it has to work on Slashdot and LWN (and maybe even one day PerlMonks, not that I've tried it myself).

Don't worry about bruised egos. I can see now that the code probably wasn't ready to be posted, and certainly not without a much better explanation of what it does and why (which I originally cut out to make the node shorter).