Re: HTML <=> Text convertion

by Aragorn (Curate)
on Dec 10, 2003 at 11:32 UTC

in reply to HTML <=> Text convertion

You can use an external text-browser like lynx to do the hard work for you. Open a pipe to lynx -dump <url> and read the resulting text-rendered page.
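
A minimal sketch of the pipe approach, assuming lynx is installed and in PATH. The helper names run_and_capture and html_to_text are made up for illustration, and the list form of open needs Perl 5.8 or later:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Run an external command and return everything it prints on stdout.
# The list form of open bypasses the shell, so arguments need no quoting.
sub run_and_capture {
    my @cmd = @_;
    open(my $pipe, '-|', @cmd)
        or die "Cannot run $cmd[0]: $!";
    my $output = do { local $/; <$pipe> };   # slurp the whole output
    close($pipe)
        or die "$cmd[0] exited with status $?";
    return $output;
}

# Render a URL (or local HTML file) as plain text via `lynx -dump`.
sub html_to_text {
    my ($url) = @_;
    return run_and_capture('lynx', '-dump', $url);
}

# Hypothetical usage: perl html2text.pl http://example.com/
print html_to_text($ARGV[0]) if @ARGV;
```

Keeping the command runner separate from the lynx-specific wrapper makes it easy to swap in another text browser later.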


Re: Re: HTML <=> Text convertion
on Dec 10, 2003 at 11:51 UTC
      Well, if it's your last resort, you are wasting a huge amount of your time and effort. Lazy Programmers -- and you do aspire to be one -- always use the quickest solution first.

      I prefer w3m -dump over lynx for generating plain text from HTML. It handles tables properly, and it can run CGI scripts locally, which helps when testing HTML output.

      If you want text that you can reformat easily, use the -cols option, which sets the width of the rendered output. It's your friend for stripping markup.
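
A rough sketch of calling w3m with -dump and -cols from Perl, assuming w3m is installed. The function names w3m_command and w3m_dump and the default width of 72 are my own choices for illustration, not anything w3m prescribes:

```perl
use strict;
use warnings;

# Build the w3m argument list. Kept separate so it can be inspected
# without actually running w3m.
sub w3m_command {
    my ($source, $cols) = @_;
    $cols = 72 unless defined $cols;   # assumed default width
    return ('w3m', '-dump', '-cols', $cols, $source);
}

# Render a URL or local HTML file and return the text as a list of lines.
sub w3m_dump {
    my ($source, $cols) = @_;
    my @cmd = w3m_command($source, $cols);
    open(my $w3m, '-|', @cmd)
        or die "Cannot run w3m: $!";
    my @lines = <$w3m>;
    close($w3m)
        or die "w3m exited with status $?";
    chomp @lines;
    return @lines;
}
```

Something like `my @lines = w3m_dump('page.html', 60);` would then give you the page wrapped at 60 columns, one array element per line.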

        There was a reason I wanted to do it the "Perl-way". I am not the only root on the system, but I am pretty much the only one doing Perl there. Therefore, nothing Perl-related changes without my knowledge on that machine. Lynx/links/w3m, though, can be removed or upgraded without me noticing. Easy to fix, I know, but good enough reason for me to try to find something else as a solution. :)
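
One possible guard against a browser quietly disappearing, as a sketch only: look the program up in PATH before relying on it. The name first_available and the candidate order are assumptions for illustration; File::Spec ships with Perl.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of the first candidate program found in PATH,
# or undef if none of them is installed.
sub first_available {
    my @candidates = @_;
    for my $prog (@candidates) {
        for my $dir (File::Spec->path) {
            my $full = File::Spec->catfile($dir, $prog);
            return $full if -f $full && -x _;   # regular file and executable
        }
    }
    return;
}

my $renderer = first_available(qw(w3m lynx links));
warn "No text browser found in PATH; HTML rendering unavailable\n"
    unless defined $renderer;
```

This only detects removal, not an upgrade that changes behaviour, so it narrows the problem rather than solving it.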

