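This will give you a hash of all words, with %count mapping each word to the number of times it occurs.

    $_ = join('', <INFILE>);
    s/\s+/ /g;        # collapse runs of whitespace
    s/<[^>]*>/ /g;    # replace HTML-like tags with a space so adjacent words don't merge
    s/[^a-z]/ /gi;    # replace everything but letters with a space
    ++$count{$_} for split;    # split on whitespace and count each word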
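
For anyone who wants to try this without an open INFILE, here is a minimal self-contained sketch of the same idea. The sub name count_words and the sample string are made up for illustration, and lowercasing the words is my own assumption (the snippet in the post counts "The" and "the" separately); drop the lc if that is not what you want.

```perl
use strict;
use warnings;

# Hypothetical helper, not from the original post: returns a hash ref
# mapping each lowercased word to the number of times it occurs.
sub count_words {
    my ($text) = @_;
    $text =~ s/<[^>]*>/ /g;    # replace HTML-like tags with a space
    $text =~ s/[^a-z]/ /gi;    # replace everything but letters with a space
    my %count;
    # split ' ' skips leading whitespace and splits on runs of whitespace
    $count{ lc $_ }++ for split ' ', $text;
    return \%count;
}

my $counts = count_words("<p>The cat</p><p>the dog, the bird.</p>");
print "$_ => $counts->{$_}\n" for sort keys %$counts;
# prints:
# bird => 1
# cat => 1
# dog => 1
# the => 3
```

Passing the text in as an argument instead of working on $_ keeps the sub from clobbering the caller's $_, which matters once this is called from inside a loop.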
In reply to Re: Cleaning up text for indexing in DB
by Skeeve
in thread Cleaning up text for indexing in DB
by TVSET