in reply to Re: How to extract untouched content of html tag with HTML::Parser
in thread How to extract untouched content of html tag with HTML::Parser

I wish it was that simple :) But it isn't :(

Re^3: How to extract untouched content of html tag with HTML::Parser
by Anonymous Monk on Nov 28, 2010 at 17:26 UTC
    It is that easy. You have a logic error. Your start handler, which you call start_handler, does no printing. Your text handler does printing, but as documented, the text handler handles text, not start tags. Also, your end handler does no printing.
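    A minimal sketch of what that fix might look like, assuming the goal is to collect everything inside one element exactly as it appears in the source. The input string, the target tag name, and the handler names (on_start, on_end, on_text) are made up for illustration and are not taken from the original poster's code. The point is only that all three handlers have to emit the "text" argspec, which for start and end events is the tag as written in the source.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical input and target tag, chosen only for illustration.
my $html   = '<p>intro</p><div id="x">Hello <b>bold</b> &amp; more</div><p>outro</p>';
my $target = 'div';

my $depth  = 0;     # greater than 0 while we are inside the target element
my $output = '';    # raw source collected from inside the target element

my $p = HTML::Parser->new(
    api_version => 3,
    # "text" in an argspec is the original source text of the event,
    # so for start/end events it is the tag exactly as written.
    start_h => [ \&on_start, 'tagname, text' ],
    end_h   => [ \&on_end,   'tagname, text' ],
    text_h  => [ \&on_text,  'text' ],
);

sub on_start {
    my ($tagname, $raw) = @_;
    $output .= $raw if $depth;              # nested start tags are content too
    $depth++ if $tagname eq $target;        # entering the target (or a nested one)
}

sub on_end {
    my ($tagname, $raw) = @_;
    $depth-- if $tagname eq $target && $depth;
    $output .= $raw if $depth;              # skip the outermost closing tag
}

sub on_text {
    my ($raw) = @_;
    $output .= $raw if $depth;              # undecoded text, entities left as-is
}

$p->parse($html);
$p->eof;

print "$output\n";
```

    With the sample input above this should print the inner content of the div with the nested b tags and the &amp; entity left intact. If only the text handler appended to $output, the b tags would be missing, which is the symptom described in this thread.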
      OMG!!! I can't believe I was that blind! Thank you very much! :))
        I can believe it, it happens to me every day, usually in between naps and coffee breaks
Re^3: How to extract untouched content of html tag with HTML::Parser
by roboticus (Chancellor) on Nov 28, 2010 at 16:40 UTC

    OK, then, did you look at the hstrip example in the distribution? The documentation (at the end of the EXAMPLES section) indicates that you can modify it to do what you want:

    More examples are found in the eg/ directory of the HTML-Parser distribution: the program hrefsub shows how you can edit all links found in a document; the program htextsub shows how to edit the text only; the program hstrip shows how you can strip out certain tags/elements and/or attributes; and the program htext shows how to obtain the plain text, but not any script/style content.

    ...roboticus

      Yes, I did examine all the examples and played with them a lot, but I still can't get what I need. I can't understand why using 'text' instead of 'dtext' produces the same result: plain text, rather than the untouched content of that HTML tag...
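      For what it's worth, a small sketch of how the two argspecs differ, using a made-up input string. In a text handler, 'text' is the source text as written and 'dtext' is the same text with entities decoded. Neither one contains tags, because tags are reported as separate start and end events, so on input without entities the two look identical.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical input containing entities, so the two argspecs differ.
my $html = '<p>Fish &amp; chips &lt;fresh&gt;</p>';

my ($raw, $decoded) = ('', '');

my $p = HTML::Parser->new(
    api_version => 3,
    text_h => [
        sub {
            my ($text, $dtext) = @_;
            $raw     .= $text;     # source text, entities untouched
            $decoded .= $dtext;    # same text with entities decoded
        },
        'text, dtext',
    ],
);

$p->parse($html);
$p->eof;

print "text:  $raw\n";        # Fish &amp; chips &lt;fresh&gt;
print "dtext: $decoded\n";    # Fish & chips <fresh>
```

      Note that the p tags appear in neither string. Getting the untouched markup back requires the start and end handlers to contribute their own 'text' as well.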