Getting parsed text from an HTML::Parser object

Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I feel like an idiot as I'm sure this is basic, but I'm stuck.
I want to convert an HTML file to plain text (i.e. strip out all the HTML stuff) -- I think this will do it:

use HTML::Parser (); 

# Create parser object 
$p = HTML::Parser->new( ); 

$p->parse_file("default.htm");

Now what? I think the results are in "$p" but how do I get at them?
Thanks, and sorry for the basic question.

-- Deck

2005-01-02 Edited by Arunbear: Changed title from 'how do I read the output of a method?', as per Monastery guidelines

Replies are listed 'Best First'.
•Re: Getting parsed text from an HTML::Parser object by merlyn (Sage) on Jan 01, 2005 at 03:58 UTC
The results aren't "in" $p. You created a parser that does nothing with the results, so it parsed the file and did nothing, just as you asked. You need to configure the parser, telling it what to do when it sees text, tags, comments, and so on. There are examples on the HTML::Parser manpage, although if you're just trying to get at the text, there's HTML::Filter that does it all at once.
-- Randal L. Schwartz, Perl hacker
Be sure to read my standard disclaimer if this is a reply.
update: `HTML::Parser->new(text_h => [sub { print shift }, "text"])->parse_file("some_file");`
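For anyone who wants the text in a variable rather than printed, here is a minimal sketch along the same lines as the one-liner above. The helper name `html_to_text` is made up for illustration, and skipping `script` and `style` elements is an assumption about what "plain text" should mean here. It uses the "dtext" argspec so that entities such as `&amp;` come back decoded.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper (not from the thread): returns the text content
# of an HTML string, with tags removed and entities decoded.
sub html_to_text {
    my ($html) = @_;
    my $text = '';

    my $p = HTML::Parser->new(
        api_version => 3,
        # The handler is called for each run of text; "dtext" means
        # the text is passed with entities already decoded.
        text_h => [ sub { $text .= shift }, 'dtext' ],
    );

    # Assumption: script and style bodies are not wanted in the output.
    $p->ignore_elements(qw(script style));

    $p->parse($html);
    $p->eof;    # signal end of input so any buffered text is flushed

    return $text;
}

print html_to_text('<p>Hello, <b>world</b> &amp; friends</p>'), "\n";
```

To read from a file as in the original question, `$p->parse_file("default.htm")` can stand in for the `parse`/`eof` pair. Whitespace from the source markup is kept as-is, so the result may need some tidying afterwards.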
Re: Getting parsed text from an HTML::Parser object by jbrugger (Parson) on Jan 01, 2005 at 10:13 UTC
Also have a look at Strip HTML tags again; the answer to this question has already been given there (also by merlyn). It might be an idea to search a bit more before asking (most answers have already been given, and you learn a lot from searching). Good luck, and a happy new year!
Re^2: Getting parsed text from an HTML::Parser object by Mr. Muskrat (Canon) on Jan 01, 2005 at 17:22 UTC
If you post using `[id://178374]`, you get Strip HTML tags again. See What shortcuts can I use for linking to other information? for more linking goodness.
Re^3: Getting parsed text from an HTML::Parser object by jbrugger (Parson) on Jan 01, 2005 at 19:10 UTC
Thanks! I'll do that. (Don't we all need to read :) )