Ditto on what everyone else said about using existing tools.
One problem with what you are doing (I've done such things myself and discovered the error of my ways) is that you'll get subjects containing v1agra, vi@gra, v_iagra, v i a g r a,
v~iagra, etc. It's really tough to match everything of that sort without producing a lot of false positives; one can come close, but then it becomes a full-time job.
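
To make that concrete, here's a rough sketch in Python (picked only for illustration; the same idea works in any regex flavor). The LOOKALIKES table, the separator set, and the obfuscation_pattern helper are all made up for this example and are nowhere near exhaustive.

```python
import re

# Hypothetical look-alike table: characters commonly swapped in for a
# letter. Illustrative only; real spam uses many more substitutions.
LOOKALIKES = {
    "a": "a@4",
    "i": "i1!|",
}

# Hypothetical filler set: at most one of these between letters.
SEPARATOR = r"[\s_~.*-]?"


def obfuscation_pattern(word):
    """Build a case-insensitive regex that matches `word` with look-alike
    characters and an optional filler character between letters."""
    classes = []
    for ch in word.lower():
        chars = LOOKALIKES.get(ch, ch)
        classes.append("[" + re.escape(chars) + "]")
    return re.compile(SEPARATOR.join(classes), re.IGNORECASE)


pattern = obfuscation_pattern("viagra")

# The variants mentioned above are all caught...
for subject in ["v1agra", "vi@gra", "v_iagra", "v i a g r a", "v~iagra"]:
    print(subject, bool(pattern.search(subject)))

# ...but so is an innocent subject, because a space counts as filler.
print(bool(pattern.search("Train via Grand Central delayed")))
```

Tightening the separator set gets rid of that particular false positive but lets "v i a g r a" through again, and that trade-off is exactly what turns this into a full-time job.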
chas