in reply to Getting the text of the html page
If I understand your question properly, you want to strip out the HTML tags. If so, the following ought to do the trick:
#!/usr/bin/perl -w
use strict;

# A CGI header must be followed by a blank line
print "Content-type: text/html\r\n\r\n";

my $file = "path/to/page.html";
open(my $fp, '<', $file) or die "Couldn't open file: $!";
while ( my $output = <$fp> ) {
    $output =~ s/<[^>]*?>//g;   # strip tags
    $output =~ s/&quot;/"/g;
    $output =~ s/&lt;/</g;
    $output =~ s/&gt;/>/g;
    $output =~ s/&amp;/&/g;     # last, to avoid double-decoding
    print $output . "\n";
}
close($fp);
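One thing to watch with the line-by-line loop: a tag that is split across two lines never matches, because each line is processed on its own. Here is a rough sketch of one way around that, assuming the whole page fits in memory as a single string. The `strip_html` sub and the sample string are made up for illustration and are not part of the code above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: strip tags from a whole document held in one
# string, so that tags spanning several lines are still matched.
sub strip_html {
    my ($html) = @_;
    $html =~ s/<[^>]*>//g;     # [^>] also matches newlines
    $html =~ s/&quot;/"/g;     # decode entities only after the tags are gone
    $html =~ s/&lt;/</g;
    $html =~ s/&gt;/>/g;
    $html =~ s/&amp;/&/g;      # last, to avoid double-decoding
    return $html;
}

# Made-up sample with a tag broken over two lines
my $sample = "<p class=\"x\"\n   id=\"y\">Fish &amp; chips &lt;hot&gt;</p>\n";
print strip_html($sample);     # prints: Fish & chips <hot>
```

To use it on a file, read the file into one scalar first (for example with `local $/;` in effect before the read) and pass that to the sub. This is still only a regex approach, so comments, script blocks and a `>` inside an attribute value will trip it up. For anything beyond simple pages, a real parser from CPAN such as HTML::Parser is likely the safer route.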
Replies are listed 'Best First'.
Re^2: Getting the text of the html document
by bradcathey (Prior) on Jun 20, 2005 at 12:34 UTC
Re^2: Getting the text of the html document
by CountZero (Bishop) on Jun 20, 2005 at 13:12 UTC
by dyer85 (Acolyte) on Jul 19, 2005 at 09:26 UTC
by davorg (Chancellor) on Jul 19, 2005 at 09:36 UTC