PerlMonks  

Re: HTML <=> Text convertion

by Aragorn (Curate)
on Dec 10, 2003 at 11:32 UTC ( [id://313690]=note )


in reply to HTML <=> Text convertion

You can use an external text browser such as lynx to do the hard work for you. Open a pipe to lynx -dump <url> and read the text-rendered page from it.
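
A minimal sketch of that pipe in Perl, assuming lynx is installed and on the PATH. The helper name dump_text is made up for illustration; the list form of open() is used so the URL is never interpreted by a shell.

```perl
use strict;
use warnings;

# Hypothetical helper: run an external command and return everything it
# writes to standard output as a single string.
sub dump_text {
    my @cmd = @_;
    # List-form pipe open: no shell is involved, so the arguments
    # (including the URL) are passed to the program verbatim.
    open(my $fh, '-|', @cmd) or die "Cannot run $cmd[0]: $!";
    local $/;                     # slurp mode: read all output at once
    my $text = <$fh>;
    close($fh) or die "$cmd[0] failed (exit status $?)";
    return $text;
}

# Usage: perl script.pl <url>
if (@ARGV) {
    print dump_text('lynx', '-dump', $ARGV[0]);
}
```

Checking the return value of close() matters here, because that is where a non-zero exit status from the external program shows up.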

Arjen

Replies are listed 'Best First'.
Re: Re: HTML <=> Text convertion
by TVSET (Chaplain) on Dec 10, 2003 at 11:51 UTC
      Well, if it's your last resort, you are wasting a huge amount of your time and effort. Lazy Programmers -- and you do aspire to be one -- always use the quickest solution first.

      I prefer w3m -dump over lynx for generating plain text from HTML. It handles tables properly. It runs CGI locally for testing HTML output.

      If you want text that you can reformat easily, use the -cols option to set the output width. w3m is your friend for stripping markup.
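
      A rough sketch of driving w3m -dump with -cols from Perl, assuming w3m is installed. The function names w3m_command and html_to_text are invented for this example, and the HTML is written to a temporary file so that w3m can read it as an ordinary file.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical helper: build the w3m argument list for a plain-text dump
# of $file, wrapped at $cols columns. -T forces the input type to HTML.
sub w3m_command {
    my ($file, $cols) = @_;
    return ('w3m', '-dump', '-T', 'text/html', '-cols', $cols, $file);
}

# Hypothetical helper: render an HTML string to plain text via w3m.
sub html_to_text {
    my ($html, $cols) = @_;
    # The temporary file is removed automatically when the program exits.
    my ($tmp, $name) = tempfile(SUFFIX => '.html', UNLINK => 1);
    print {$tmp} $html;
    close($tmp) or die "Cannot write $name: $!";

    open(my $fh, '-|', w3m_command($name, $cols))
        or die "Cannot run w3m: $!";
    local $/;                     # slurp mode: read all output at once
    my $text = <$fh>;
    close($fh) or die "w3m failed (exit status $?)";
    return $text;
}

# Example call (requires w3m):
# print html_to_text('<table><tr><td>a</td><td>b</td></tr></table>', 72);
```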

      --
      bowling trophy thieves, die!

        There was a reason I wanted to do it the "Perl-way". I am not the only root on the system, but I am pretty much the only one doing Perl there. Therefore, nothing Perl-related changes on that machine without my knowledge. Lynx/links/w3m, though, can be removed or upgraded without me noticing. Easy to fix, I know, but a good enough reason for me to look for a different solution. :)
