in reply to Cleaning up text for indexing in DB
One thing that might help with the possibly invalid HTML is a project called HTML Tidy (http://tidy.sourceforge.net/), and it comes as a library as well.
You could call it, pass it the HTML, and let it clean the markup up before you remove the tags. That way, if someone entered invalid HTML or forgot to close a tag, it would not hose up the regexen.
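If you can't pull Tidy in, a related option is to skip the regexes and let a tolerant HTML parser do the tag removal. This is not Tidy, just a stdlib-only Python sketch of that alternative. Python's `html.parser.HTMLParser` does not require tags to be closed, so an unclosed `<b>` won't swallow the rest of the text. The function name `html_to_text`, the `BLOCK_TAGS` set, and dropping `<script>`/`<style>` contents are illustrative assumptions, not anything from the original thread.

```python
import re
from html.parser import HTMLParser

# Hypothetical set of tags treated as word boundaries, so "<p>a</p><p>b</p>"
# becomes "a b" rather than "ab". Adjust for your own content.
BLOCK_TAGS = {"p", "br", "div", "li", "td", "th", "tr", "h1", "h2", "h3"}


class TextExtractor(HTMLParser):
    """Collects text content and ignores markup, even if tags are unbalanced."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag in BLOCK_TAGS:
            self._chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            if self._skip_depth:
                self._skip_depth -= 1
        elif tag in BLOCK_TAGS:
            self._chunks.append(" ")

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse runs of whitespace so the indexed text is tidy.
        return re.sub(r"\s+", " ", "".join(self._chunks)).strip()


def html_to_text(html):
    """Strip tags from possibly invalid HTML and return plain text for indexing."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.text()


if __name__ == "__main__":
    # Unclosed <b> and <p>: the text still comes through intact.
    print(html_to_text("<p>Hello <b>world"))
```

The parser has to see the whole document before `close()`. Compared with a regex like `<[^>]*>`, it will not treat stray `<` or `>` characters as tag boundaries and eat the text between them.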
Ed