Re: Re: Remove HTML tags from document

Re: Re: Re: Remove HTML tags from document by daeve (Deacon) on Aug 04, 2003 at 14:17 UTC
But it is Perl, or at least the calling structure is Perl. Now, if I had just posted `lynx -dump htmlDocument.html > htmlDocument.txt` I would certainly agree with you that it isn't Perl, although even that does what the OP asked for. I realize lynx isn't Perl, but neither are a lot of GNU/Linux and Unix utilities that are easier and shorter to call than the alternative "pure" Perl methods. I use whatever gets the job done in the least time with the least trouble. For stripping HTML tags out of pages, lynx works better and faster than any regex I've seen so far. Once the tags are stripped, you can use Perl to make whatever formatting changes the document still needs. As I said in my original post: TIMTOWTDI ;-)

Daeve
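For anyone wanting to try the "call lynx from Perl, then post-process in Perl" approach, here is a minimal sketch. It assumes lynx is installed and on the PATH. The sub names `html_to_text` and `tidy_text` are hypothetical, and the cleanup done in `tidy_text` (trimming trailing whitespace, collapsing blank lines) is only an illustrative example of post-processing, not something from the post above. The list form of `open` with `'-|'` runs lynx directly rather than through a shell, which sidesteps quoting problems with odd file names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run `lynx -dump` on an HTML file and return the rendered plain text.
# Assumption: lynx is installed and on PATH.
sub html_to_text {
    my ($html_file) = @_;
    # List-form pipe open: no shell involved, so no quoting issues.
    open(my $lynx, '-|', 'lynx', '-dump', $html_file)
        or die "Cannot start lynx: $!";
    my $text = do { local $/; <$lynx> };    # slurp all output
    close($lynx) or die "lynx failed (status $?)";
    return defined $text ? $text : '';
}

# Hypothetical post-processing step: strip trailing spaces/tabs from
# each line, then collapse runs of blank lines into a single blank line.
sub tidy_text {
    my ($text) = @_;
    $text =~ s/[ \t]+$//mg;
    $text =~ s/\n{3,}/\n\n/g;
    return $text;
}

# Run only when executed directly as a script with arguments.
if (!caller() && @ARGV) {
    die "usage: $0 input.html output.txt\n" unless @ARGV == 2;
    my ($in, $out) = @ARGV;
    my $text = tidy_text(html_to_text($in));
    open(my $fh, '>', $out) or die "Cannot write $out: $!";
    print {$fh} $text;
    close($fh) or die "Cannot close $out: $!";
}
```

Usage would be something like `perl strip_html.pl htmlDocument.html htmlDocument.txt`. Keeping the lynx call and the cleanup in separate subs means the Perl side can be changed freely without touching the external tool.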
Re^3: Remove HTML tags from document by Aristotle (Chancellor) on Aug 04, 2003 at 07:24 UTC
Oh? Using a module and calling a function versus spawning a utility: where's the difference? You're using a black box either way.

Makeshifts last the longest.