Anonymous Monk has asked for the wisdom of the Perl Monks concerning the following question:

Hi all, I feel like an idiot because I'm sure this is basic, but I'm stuck.
I want to convert an HTML file to plain text (i.e. strip out all the HTML markup). I think this will do it:
use HTML::Parser ();

# Create parser object
my $p = HTML::Parser->new();
$p->parse_file("default.htm");
Now what? I think the results are in $p, but how do I get at them?
Thanks, and sorry for the basic question.

-- Deck

2005-01-02 Edited by Arunbear: Changed title from 'how do I read the output of a method?', as per Monastery guidelines

Re: Getting parsed text from an HTML::Parser object
by merlyn (Sage) on Jan 01, 2005 at 03:58 UTC
    The results aren't "in" $p. You created a parser that does nothing with the results, so it parsed it and did nothing, just as you asked.

    You need to configure the parser, telling it what to do when it sees text, tags, comments, and so on. There are examples on the HTML::Parser manpage, although if you're just trying to get at the text, there's HTML::Filter that does it all at once.
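As a rough illustration of "configuring the parser", here is a minimal sketch using the version 3 handler interface documented in the HTML::Parser manpage. It registers a text handler that appends to a buffer, plus start and end handlers that skip the contents of `<script>` and `<style>` (HTML::Parser reports those contents as text events). The helper name `html_to_text` and the sample HTML are hypothetical, chosen only for illustration; comments are dropped simply because no comment handler is registered.

```perl
use strict;
use warnings;
use HTML::Parser ();

# Hypothetical helper: collect the visible text of an HTML string.
sub html_to_text {
    my ($html) = @_;
    my $text = '';
    my $skip = 0;    # greater than 0 while inside <script> or <style>

    my $p = HTML::Parser->new(
        api_version => 3,
        # Called for every start tag; tag names are lowercased by default.
        start_h => [ sub { $skip++ if $_[0] eq 'script' || $_[0] eq 'style' }, 'tagname' ],
        # Called for every end tag; leave the skipped region when it closes.
        end_h   => [ sub { $skip-- if ( $_[0] eq 'script' || $_[0] eq 'style' ) && $skip > 0 }, 'tagname' ],
        # Called for each run of text; 'dtext' decodes entities such as &amp;.
        text_h  => [ sub { $text .= $_[0] unless $skip }, 'dtext' ],
        # No comment_h is registered, so comments are silently ignored.
    );
    $p->parse($html);
    $p->eof;    # flush any text the parser is still buffering
    return $text;
}

print html_to_text('<p>Fish &amp; chips</p><script>var x = 1;</script><!-- note -->'), "\n";
# prints "Fish & chips"
```

Because text can arrive in several chunks, the handler appends to `$text` rather than assuming one callback per element.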

    -- Randal L. Schwartz, Perl hacker
    Be sure to read my standard disclaimer if this is a reply.


    update:
    HTML::Parser->new(text_h => [sub { print shift }, "text"])->parse_file("some_file");
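A slightly expanded sketch of the one-liner above, assuming you want the text collected into a string rather than printed. It swaps the `"text"` argspec for `"dtext"` so entities like `&amp;` are decoded, and calls `ignore_elements` so `<script>` and `<style>` contents are not reported as text. The helper name `file_to_text` and the temporary sample file are hypothetical, used only to keep the example self-contained.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);
use HTML::Parser ();

# Hypothetical helper expanding the one-liner: return the text of an HTML file.
sub file_to_text {
    my ($file) = @_;
    my $out = '';
    my $p = HTML::Parser->new(
        api_version => 3,
        # 'dtext' decodes entities (&amp; becomes &); 'text' passes them through raw.
        text_h => [ sub { $out .= shift }, 'dtext' ],
    );
    # Suppress <script> and <style> contents, which would otherwise arrive as text.
    $p->ignore_elements(qw(script style));
    # parse_file() returns undef if the file cannot be opened; it calls eof itself.
    $p->parse_file($file) or die "cannot parse $file: $!\n";
    return $out;
}

# Usage: write a small sample file and extract its text.
my ($fh, $path) = tempfile(SUFFIX => '.htm', UNLINK => 1);
print {$fh} '<html><head><style>p {}</style></head><body><p>Tom &amp; Jerry</p></body></html>';
close $fh;
print file_to_text($path), "\n";    # prints "Tom & Jerry"
```

Note that the output keeps whatever whitespace sits between tags in the source file, so real pages usually need some whitespace cleanup afterwards.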
Re: Getting parsed text from an HTML::Parser object
by jbrugger (Parson) on Jan 01, 2005 at 10:13 UTC
    Also have a look at Strip HTML tags again; this question has already been answered there (also by merlyn). It might be an idea to search a bit more before asking: most questions have already been answered, and you learn a lot from reading the answers.
    Good luck, and a happy new year!
        Thanks! I'll do that. (Don't we all need to read? :) )