in reply to Re^2: Parsing HTML question
in thread Parsing HTML question

Now you got us confused!

Do you actually know how to get a webpage into a Perl program? If not, I suggest you look into the LWP family of modules, and more specifically LWP::Simple, whose get function fetches a page in one call:

use LWP::Simple;
my $webpage = get("http://www.perlmonks.org");

You then put $webpage through HTML::Strip to get at the contents stripped of tags.
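
A minimal sketch of that stripping step, assuming HTML::Strip is installed from CPAN. The sample markup in $html is made up for illustration and stands in for $webpage, so the snippet runs without a network connection:

```perl
use strict;
use warnings;
use HTML::Strip;

# Hypothetical sample markup; in practice this would be the string
# returned by get(), or the contents of a local file read into a scalar.
my $html = '<html><body><h1>Title</h1><p>Hello <b>world</b></p></body></html>';

my $hs   = HTML::Strip->new();
my $text = $hs->parse($html);   # returns the text with the tags removed
$hs->eof;                       # reset parser state before reusing $hs

print "$text\n";
```

The same parse call works on any string, so an HTML file on disk can be read into a scalar and passed in; no URL is needed. Exact whitespace in the result depends on the module's defaults.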

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re^4: Parsing HTML question
by vit (Friar) on Jun 24, 2008 at 19:27 UTC
    Sorry for the confusion. I know LWP. My task is to strip an HTML file, and I may not know its URL. I checked the example with HTML::Strip and it does not look very good to me.
    Ideally I would like to have something like this:
    http://www.zubrag.com/tools/html-tags-stripper.php
    Try it. It works very well for me. I do not think it is possible to strip that well using HTML::Strip, or am I missing something?
      http://www.zubrag.com/tools/html-tags-stripper.php Try it. It works very well for me

      Our notions of "well" might differ. I tried it, and the first thing I noticed was that it broke all non-ASCII characters on my page.

      Anyway, I don't think anybody can help you unless you describe in what way the output of HTML::Strip isn't fit for your purpose.