As other monks suggested in another thread, why not fetch the page with the text-mode browser `lynx` and let it strip out the HTML junk for you?
```perl
# Quote the URL so a '&' in a query string isn't interpreted by the shell
my $text = `lynx -dump '$url'`;
```

...every application I have ever worked on is a glorified munger...
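The backtick version still passes `$url` through a shell. Here is a minimal sketch that avoids the shell entirely. It assumes a Unix-like system with `lynx` on the `PATH`. The list form of a piped `open` runs the program directly, so nothing in the URL gets interpreted. `capture_output` and `page_text` are hypothetical helper names. The `-nolist` flag is optional: it suppresses the numbered list of link URLs that `lynx -dump` otherwise appends, which you probably don't want in an index.

```perl
use strict;
use warnings;

# Run a command without going through a shell and return its stdout.
# With more than one element in @cmd, the list form of the piped open
# passes the arguments straight to the program, so characters such as
# '&', ';' or spaces are never seen by a shell.
sub capture_output {
    my @cmd = @_;
    open(my $fh, '-|', @cmd) or die "Cannot run $cmd[0]: $!";
    my $out = do { local $/; <$fh> };    # slurp all of the output
    close($fh) or warn "$cmd[0] exited with wait status $?\n";
    return defined $out ? $out : '';
}

# Hypothetical helper: fetch a page as plain text via lynx.
# -dump renders the page and prints it to stdout;
# -nolist drops the trailing list of link URLs.
sub page_text {
    my ($url) = @_;
    return capture_output('lynx', '-dump', '-nolist', $url);
}
```

You would then write `my $text = page_text($url);` in place of the backticks.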
In reply to Re: Cleaning up text for indexing in DB by t'mo,
in thread Cleaning up text for indexing in DB by TVSET