Gosh! You didn't even take a look at what lynx -dump produces, did you?

He didn't claim it would produce the same output, or even comparable output. He just pointed out that it has a method for outputting plain text, which it does. Indeed, I think it more or less amounts to the as_text() of the whole parse tree of the wanted page. Lynx and its variants are full-fledged browsers, so it is natural that they go beyond the capabilities of a simple parser and aim at presentation-friendly output.

You may hack/roll your own by inserting horizontal and vertical whitespace suitably around individual elements before printing them with as_text(). Needless to say, doing this properly is going to be quite a lot of work, but maybe just inserting a newline after every single element will already make everything clearer. Oh, and at the very least take care of paragraphs and breaks. If you also want line wrap, that's a whole other story. (A call for Text::Wrap, most probably.)
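
Just to give an idea of the "roll your own" route, here is a minimal sketch, not a replacement for lynx -dump. It assumes HTML::TreeBuilder (from CPAN) and the core Text::Wrap are available. The sub name html_to_text, the 72-column default and the list of tags treated as blocks are all my own picks for illustration, and it relies on TreeBuilder's default whitespace compacting when parsing.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;
use Text::Wrap qw(wrap);

# Hypothetical helper: render an HTML string as wrapped plain text.
sub html_to_text {
    my ($html, $columns) = @_;
    local $Text::Wrap::columns = $columns || 72;   # assumed default width

    my $tree = HTML::TreeBuilder->new_from_content($html);
    my $body = $tree->look_down(_tag => 'body') || $tree;

    # Breaks: turn each <br> into a literal newline.
    $_->replace_with("\n") for $body->look_down(_tag => 'br');

    # Vertical whitespace: put a blank line after each block-ish element.
    # The tag list is an assumption; extend it to taste.
    $_->postinsert("\n\n")
        for $body->look_down(_tag => qr/^(?:p|div|h[1-6]|li|pre|blockquote)$/);

    my $text = $body->as_text;
    $tree->delete;                                 # free the parse tree

    my @paragraphs;
    for my $para (split /\n{2,}/, $text) {
        my @lines;
        for my $line (split /\n/, $para) {
            $line =~ s/\s+/ /g;                    # collapse runs of whitespace
            $line =~ s/^ | $//g;                   # trim both ends
            push @lines, wrap('', '', $line) if length $line;
        }
        push @paragraphs, join("\n", @lines) if @lines;
    }
    return join("\n\n", @paragraphs) . "\n";
}

print html_to_text(
    '<html><body><h1>Title</h1>'
  . '<p>First line<br>second line</p>'
  . '<p>Another paragraph</p></body></html>'
);
```

Tables, nested lists, links and the like are not handled at all, which is exactly where the real work would start.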

OTOH did you look at the outcome of your post (as is recommended)?!? It screwed up the whole view for this thread. Use <code> tags around the stuff you pasted, even though it's not strictly code. At least that gives you smart line wrap...

Update: the post has been fixed, hence the above comment does not apply any more.

Ciao


In reply to Re^3: Any pure-perl html to text? (Or: missing a perl equivalent to 'lynx -dump') by blazar
in thread Any pure-perl html to text? (Or: missing a perl equivalent to 'lynx -dump') by bronto
