Re^3: Parsing HTML question

Now you've got us confused!

Do you actually know how to get a webpage into a Perl program? If not, I suggest you look into the LWP family of modules, and more specifically LWP::Simple, whose get function does exactly that:

use LWP::Simple;    # exports get()
my $webpage = get("http://www.perlmonks.org");

You then put $webpage through HTML::Strip to get at the contents stripped of tags.
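
A minimal sketch of that second step, assuming HTML::Strip is installed from CPAN. The strip_html helper and the page.html file name are made up for illustration; only new, parse and eof come from the module itself. The HTML does not have to come from a URL: any string will do, so a local file works just as well.

```perl
use strict;
use warnings;
use HTML::Strip;

# Hypothetical helper (not part of any module): strip the tags from an
# HTML string using HTML::Strip's new/parse/eof interface.
sub strip_html {
    my ($html) = @_;
    my $hs   = HTML::Strip->new();
    my $text = $hs->parse($html);
    $hs->eof;    # reset parser state so the object could be reused
    return $text;
}

# The HTML can come from anywhere. From a URL, via LWP::Simple:
#   use LWP::Simple;
#   my $webpage = get("http://www.perlmonks.org");
# or from a local file (the file name is an assumption):
#   open my $fh, '<', 'page.html' or die "Cannot open page.html: $!";
#   my $webpage = do { local $/; <$fh> };

my $webpage = '<html><body><h1>Title</h1><p>Some <b>bold</b> text.</p></body></html>';
print strip_html($webpage), "\n";
```

How whitespace and entities come out depends on the HTML::Strip version and its options, so check the result against your own pages before relying on it.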

CountZero

"A program should be light and agile, its subroutines connected like a string of pearls. The spirit and intent of the program should be retained throughout. There should be neither too little nor too much, neither needless loops nor useless variables, neither lack of structure nor overwhelming rigidity." - The Tao of Programming, 4.1 - Geoffrey James

Re^4: Parsing HTML question by vit (Friar) on Jun 24, 2008 at 19:27 UTC
Sorry for the confusion. I know LWP. My task is to strip an HTML file, and I may not know its URL. I checked the example with HTML::Strip and it does not look very good to me. Ideally I would like to have something like this: http://www.zubrag.com/tools/html-tags-stripper.php Try it. It works very well for me. I do not think it is possible to strip that well using HTML::Strip, or am I missing something?
Re^5: Parsing HTML question by moritz (Cardinal) on Jun 24, 2008 at 19:39 UTC
"http://www.zubrag.com/tools/html-tags-stripper.php Try it. It works very well for me"

Our notions of "well" might differ. I tried it, and the first thing I noticed was that it broke all non-ASCII characters on my page. Anyway, I don't think anybody can help you unless you describe in what way the output of HTML::Strip isn't fit for your purpose.