Re: Re: Re: Remove HTML tags from document
by daeve (Deacon) on Aug 04, 2003 at 14:17 UTC
    But it is perl. Or the calling structure is perl. Now if I had just posted

    lynx -dump htmlDocument.html > htmlDocument.txt
    I would certainly agree with you that it isn't perl. But even that does what the OP requested. I realize that lynx isn't perl, but neither are a lot of GNU/Linux and Unix system calls that are easier and shorter than the alternative "pure" perl methods. I use whatever lets me get the job done in the shortest time with the least trouble. For stripping HTML tags out of pages, lynx works better and quicker than any regex I've seen so far. Then, if formatting changes are needed once the tags are stripped out, you can use perl to modify the document.

    As I said in my original post - TIMTOWTDI ;-)

    Daeve

Re^3: Remove HTML tags from document
by Aristotle (Chancellor) on Aug 04, 2003 at 07:24 UTC
    Oh? Using a module and calling a function versus spawning a utility: where's the difference? You're using a black box either way.

    Makeshifts last the longest.