in reply to Cleaning up text for indexing in DB
This will give you a hash of all words.

$_ = join('', <INFILE>);
s/\s+/ /g;       # clean all whitespace
s/<[^>]*>//g;    # clean all HTML-like tags
s/[^a-z]/ /gi;   # remove all but letters
grep ++$count{$_} && undef, split;
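For anyone wanting to try the idea without an open INFILE handle, here is a minimal, self-contained sketch of the same approach. The helper name count_words and the sample string are hypothetical, not part of the post above. The sketch also makes two assumptions the post does not: it replaces each tag with a space instead of deleting it (deleting turns "cat<br>saw" into "catsaw"), and it lowercases words so "The" and "the" are counted together. A plain loop stands in for the grep in void context.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post):
# returns a hash ref mapping each word to its number of occurrences.
sub count_words {
    my ($text) = @_;
    my %count;

    # Assumption: swap tags for a space so neighbouring words stay separate.
    $text =~ s/<[^>]*>/ /g;

    # Collapse every run of non-letters (whitespace included) into one space.
    $text =~ s/[^a-z]+/ /gi;

    # split ' ' skips leading whitespace; lowercasing is an assumption.
    $count{ lc $_ }++ for split ' ', $text;

    return \%count;
}

my $counts = count_words("<p>The cat<br>saw the   other cat.</p>");
printf "%s: %d\n", $_, $counts->{$_} for sort keys %$counts;
# prints:
# cat: 2
# other: 1
# saw: 1
# the: 2
```

If case matters for the index, dropping the lc call keeps the behaviour of the original one-liner in that respect.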

Replies are listed 'Best First'.

Re: Re: Cleaning up text for indexing in DB
by halley (Prior) on Jul 16, 2003 at 13:55 UTC
by Cody Pendant (Prior) on Jul 17, 2003 at 10:09 UTC
by Skeeve (Parson) on Jul 17, 2003 at 05:48 UTC

Re: Re: Cleaning up text for indexing in DB
by TVSET (Chaplain) on Jul 16, 2003 at 16:26 UTC