If I understand your question correctly, you want to strip out the HTML tags. If so, the following ought to do the trick:
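    #!/usr/bin/perl -w
    use strict;

    # A blank line must end the CGI header block.
    print "Content-type: text/html\r\n\r\n";

    my $file = "path/to/page.html";
    open(my $fp, '<', $file) or die "Couldn't open file: $!";
    while ( my $output = <$fp> ) {
        $output =~ s/<[^>]*?>//g;   # strip tags that open and close on this line
        $output =~ s/&quot;/"/g;    # decode the common entities
        $output =~ s/&lt;/</g;
        $output =~ s/&gt;/>/g;
        $output =~ s/&amp;/&/g;     # last, so &amp;lt; is not decoded twice
        print $output;              # the line already ends in a newline
    }
    close($fp);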
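One caveat: the loop reads the file a line at a time, so a tag that wraps across two lines will not match the pattern. As a rough sketch of one way around that, the same substitutions can be applied to the whole document held in a single string. The helper name `html_to_text` and the sample string are made up for illustration, and this is still a regex approximation (comments, `<script>` bodies, and attribute values containing `>` will trip it up), not a real HTML parser.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not part of the original post: strip tags from a
# whole document held in one string, then decode four common entities.
sub html_to_text {
    my ($html) = @_;
    $html =~ s/<[^>]*>//g;    # [^>] also matches newlines, so wrapped tags go too
    $html =~ s/&quot;/"/g;
    $html =~ s/&lt;/</g;
    $html =~ s/&gt;/>/g;
    $html =~ s/&amp;/&/g;     # last, so "&amp;lt;" ends up as "&lt;", not "<"
    return $html;
}

# Made-up sample input with a tag that spans two lines.
my $sample = "<p class=\"a\"\n   id=\"b\">Fish &amp; chips &lt;hot&gt;</p>\n";
print html_to_text($sample);
```

To run it on a file instead, read the file into one string first (for example with `local $/;` before the read) and pass that string to the helper.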
In reply to Re: Getting the text of the html document
by dyer85
in thread Getting the text of the html page
by agynr