As for your second query, what you want to remove are known as "stop words"; use that phrase as googlefodder, and see Lingua::StopWords.
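For what it's worth, here is a minimal sketch of filtering stop words before counting, assuming English text. `word_counts` is a hypothetical helper name, and the short fallback list is only an illustration for the case where Lingua::StopWords is not installed; the module's `getStopWords('en')` returns a hash reference keyed by stop word.

```perl
use strict;
use warnings;

# Prefer the CPAN module; if it is missing, fall back to a tiny
# hand-picked list (an assumption, for illustration only).
my $stop = eval {
    require Lingua::StopWords;
    Lingua::StopWords::getStopWords('en');    # hashref: word => 1
} || +{ map { $_ => 1 } qw(a an and in is of the to) };

# Hypothetical helper: count the words in a string, skipping stop words.
sub word_counts {
    my ($text) = @_;
    my %count;
    for my $word ( lc($text) =~ /[a-z']+/g ) {
        next if $stop->{$word};
        $count{$word}++;
    }
    return \%count;
}

my $counts = word_counts('The tag cloud and the phrase cloud');
print "$_: $counts->{$_}\n" for sort keys %$counts;
```

The resulting counts can then be fed to whatever builds the cloud; "the" and "and" never make it into the hash.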
In reply to Re: TagCloud and phrase frequency by Fletch, in thread TagCloud and phrase frequency by johnnywang