in reply to Re: Re: Remove HTML tags from document
in thread Remove HTML tags from document
I would certainly agree with you that it isn't Perl. But even the one-liner below does what the OP requested. I realize that lynx isn't Perl, but neither are a lot of GNU/Linux and Unix system calls that are easier and shorter than the alternative "pure" Perl methods. I use whatever lets me get the job done in the shortest amount of time with the least trouble. For stripping HTML tags out of pages, lynx works better and quicker than any regex I've seen so far. Then, if formatting changes are needed once the tags are stripped out, you can use Perl to modify the document as needed.

lynx -dump htmlDocument.html > htmlDocument.txt
As I said in my original post - TIMTOWTDI ;-)
Daeve