Re: HTML <=> Text convertion

by Aragorn (Curate)
on Dec 10, 2003 at 11:32 UTC

in reply to HTML <=> Text convertion

You can use an external text-browser like lynx to do the hard work for you. Open a pipe to lynx -dump <url> and read the resulting text-rendered page.
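A minimal sketch of that pipe in Perl, for illustration only. The helper name dump_text and the example URL are assumptions, not part of the original suggestion; the list form of open() is used so the URL never passes through a shell.

```perl
use strict;
use warnings;

# Run a command and return everything it writes to standard output.
# The list form of open() bypasses the shell, so a URL containing
# characters such as & or ; is handed to the program untouched.
sub dump_text {
    my @cmd = @_;
    open(my $pipe, '-|', @cmd)
        or die "Cannot start '$cmd[0]': $!";
    local $/;                      # slurp mode: read all output at once
    my $text = <$pipe>;
    close($pipe)
        or die "'$cmd[0]' exited with status $?";
    return defined $text ? $text : '';
}

# Hypothetical usage; requires lynx on the PATH:
# print dump_text('lynx', '-dump', 'http://www.example.com/');
```

Checking the return value of close() matters here, because that is where a non-zero exit status from the external program shows up.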


Re: Re: HTML <=> Text convertion
by TVSET (Chaplain) on Dec 10, 2003 at 11:51 UTC
      Well, if it's your last resort, you are wasting a huge amount of your time and effort. Lazy Programmers -- and you do aspire to be one -- always use the quickest solution first.

      I prefer w3m -dump over lynx for generating plain text from HTML. It handles tables properly. It runs CGI locally for testing HTML output.

      If you want text you can reformat easily, use the -cols option, which sets the line width of the dumped text.
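      A rough sketch of calling w3m this way from Perl. The helper names w3m_command and w3m_text, the width of 72, and the example URL are all assumptions made for illustration; only the -dump and -cols flags come from the post above.

```perl
use strict;
use warnings;

# Build the argument list for a w3m text dump.
# $cols is the desired line width; it is optional.
sub w3m_command {
    my ($source, $cols) = @_;
    my @cmd = ('w3m', '-dump');
    push @cmd, '-cols', $cols if defined $cols;
    return (@cmd, $source);
}

# Run the command and return the rendered text (requires w3m on the PATH).
sub w3m_text {
    my @cmd = w3m_command(@_);
    open(my $pipe, '-|', @cmd) or die "Cannot start w3m: $!";
    local $/;                      # slurp mode
    my $text = <$pipe>;
    close($pipe) or die "w3m exited with status $?";
    return $text;
}

# Hypothetical usage: render a page at 72 columns.
# print w3m_text('http://www.example.com/', 72);
```

      Keeping the command construction separate from the pipe makes it easy to inspect the exact arguments before anything is executed.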

      bowling trophy thieves, die!

        There was a reason I wanted to do it the "Perl-way". I am not the only root on the system, but I am pretty much the only one doing Perl there. Therefore, nothing Perl-related changes on that machine without my knowledge. Lynx/links/w3m, though, can be removed or upgraded without me noticing. Easy to fix, I know, but a good enough reason for me to look for another solution. :)
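        One possible shape for the "easy fix" mentioned above is to check at startup which text browser is still installed, so a removed tool is noticed immediately. This is only a sketch; the helper name first_available and the order of preference are assumptions.

```perl
use strict;
use warnings;
use File::Spec;

# Return the full path of the first named program found on the PATH,
# or undef (empty list in list context) if none of them is installed.
sub first_available {
    my @names = @_;
    for my $name (@names) {
        for my $dir (File::Spec->path) {
            my $candidate = File::Spec->catfile($dir, $name);
            return $candidate if -f $candidate && -x _;
        }
    }
    return;
}

# Hypothetical usage: fail loudly if every text browser has been removed.
# my $browser = first_available(qw(w3m lynx links))
#     or die "No text browser found on PATH\n";
```

        This detects removal but not an upgrade that changes behaviour, so it only addresses part of the concern.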
