Re: How to extract untouched content of html tag with HTML::Parser

in reply to How to extract untouched content of html tag with HTML::Parser

I've not used it in a while, but as I read the documentation, I'd suggest passing "text" rather than "dtext" to the handler specification so it can print the original text rather than the decoded text.
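A minimal sketch of the difference between the two argspecs, assuming HTML::Parser's version 3 API. The sample markup and variable names are made up for illustration. Both parsers see the same input; only the argspec passed to the text handler differs.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical sample input containing a character entity.
my $html = '<p>Fish &amp; chips</p>';

my ($raw, $decoded) = ('', '');

# "text" hands the handler the source text exactly as it appeared.
my $raw_parser = HTML::Parser->new(
    api_version => 3,
    text_h      => [ sub { $raw .= shift }, 'text' ],
);
$raw_parser->unbroken_text(1);   # deliver each text run in one piece
$raw_parser->parse($html);
$raw_parser->eof;

# "dtext" hands the handler the same text with entities decoded.
my $decoded_parser = HTML::Parser->new(
    api_version => 3,
    text_h      => [ sub { $decoded .= shift }, 'dtext' ],
);
$decoded_parser->unbroken_text(1);
$decoded_parser->parse($html);
$decoded_parser->eof;

print "text:  $raw\n";
print "dtext: $decoded\n";
```

With this input the "text" run should keep the `&amp;` entity while the "dtext" run should contain a bare `&`. Neither includes the `<p>` tags, because a text handler only ever sees text events.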

Re^2: How to extract untouched content of html tag with HTML::Parser by Lana (Beadle) on Nov 28, 2010 at 16:11 UTC
I wish it was that simple :) But it isn't :(
Re^3: How to extract untouched content of html tag with HTML::Parser by Anonymous Monk on Nov 28, 2010 at 17:26 UTC
It is that easy. You have a logic error. Your start handler, which you call start_handler, does no printing. Your text handler does printing, but as documented, the text handler handles text, not start tags. Also, your end handler does no printing.
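The point above can be sketched as follows: to get the untouched content of an element, the start, end, and text handlers all have to emit the original "text" while the parser is inside that element. This is only an illustration, not the original poster's code; the sample markup, the `$target` tag name, and the `$depth`/`$inner` variables are assumptions.

```perl
use strict;
use warnings;
use HTML::Parser;

# Hypothetical input and target element, chosen for illustration.
my $html   = '<div id="a">Tom &amp; <b>Jerry</b></div><p>skip</p>';
my $target = 'div';

my $depth = 0;    # how many open $target elements we are inside
my $inner = '';   # untouched source text collected so far

my $p = HTML::Parser->new(
    api_version => 3,

    # Start tags: copy nested tags verbatim, then track the target.
    start_h => [ sub {
        my ($tag, $text) = @_;
        $inner .= $text if $depth;
        $depth++ if $tag eq $target;
    }, 'tagname, text' ],

    # End tags: leave the target first, so its own closing tag is skipped.
    end_h => [ sub {
        my ($tag, $text) = @_;
        $depth-- if $tag eq $target && $depth;
        $inner .= $text if $depth;
    }, 'tagname, text' ],

    # Text: "text" (not "dtext") keeps entities such as &amp; as written.
    text_h => [ sub { $inner .= shift if $depth }, 'text' ],
);

$p->parse($html);
$p->eof;

print "$inner\n";
```

Comments, declarations, and processing instructions inside the element are not covered by these three handlers; a default handler with the "text" argspec would be one way to catch them as well.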
Re^4: How to extract untouched content of html tag with HTML::Parser by Lana (Beadle) on Nov 28, 2010 at 17:33 UTC
OMG!!! I can't believe I was that blind! Thank you very much! :))
Re^5: How to extract untouched content of html tag with HTML::Parser by Anonymous Monk on Nov 28, 2010 at 17:36 UTC
Re^3: How to extract untouched content of html tag with HTML::Parser by roboticus (Chancellor) on Nov 28, 2010 at 16:40 UTC
OK, then, did you look at the `hstrip` example in the distribution? The documentation (at the end of the EXAMPLES section) indicates that you can modify it to do what you want:

"More examples are found in the eg/ directory of the HTML-Parser distribution: the program hrefsub shows how you can edit all links found in a document; the program htextsub shows how to edit the text only; the program hstrip shows how you can strip out certain tags/elements and/or attributes; and the program htext show how to obtain the plain text, but not any script/style content."

...roboticus
Re^4: How to extract untouched content of html tag with HTML::Parser by Lana (Beadle) on Nov 28, 2010 at 17:22 UTC
Yes, I examined all the examples and played with them a lot, but I still can't get what I need. I can't understand why using 'text' instead of 'dtext' produces the same result: plain text instead of the untouched content of that HTML tag...
