in reply to term frequency and mutual info

I think you will want to take a look at Ted Pedersen's Ngram Statistics and SenseClusters packages.

Additional search terms that may help would be concordance, collocation, and alignment.

HTH,

planetscape

Re^2: term frequency and mutual info
by perl_lover_always (Acolyte) on Oct 22, 2010 at 08:31 UTC
    Well, I'm very familiar with those; however, they have some limitations and restrictions. Since I have a parallel corpus, I need the line numbers to be indexed. Moreover, it is not efficient to change their code and packages wholesale, although the work is clean and interesting!
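
For illustration, here is a minimal sketch of what a line-indexed approach to a parallel corpus might look like. It is written in Python purely as an example and is not how the Ngram Statistics or SenseClusters packages work. It assumes the two corpora are aligned line by line, counts a term once per line (line frequency rather than raw term frequency), and uses whitespace tokenization. The names build_line_index and pmi, and the sample sentences, are hypothetical.

```python
import math
from collections import defaultdict


def build_line_index(lines):
    """Map each term to the set of 1-based line numbers it occurs on."""
    index = defaultdict(set)
    for lineno, line in enumerate(lines, start=1):
        # Assumption: lowercase + whitespace split is good enough as a tokenizer.
        for term in line.lower().split():
            index[term].add(lineno)
    return index


def pmi(src_index, tgt_index, src_term, tgt_term, n_lines):
    """Pointwise mutual information (in bits) of two terms over aligned lines.

    P(x) and P(y) are estimated as the fraction of lines containing the term,
    and P(x, y) as the fraction of aligned line pairs containing both.
    """
    src_lines = src_index.get(src_term, set())
    tgt_lines = tgt_index.get(tgt_term, set())
    joint = len(src_lines & tgt_lines)
    if joint == 0:
        # The terms never co-occur on an aligned line pair.
        return float("-inf")
    return math.log2(joint * n_lines / (len(src_lines) * len(tgt_lines)))


# Hypothetical toy corpus: line i of `source` is aligned with line i of `target`.
source = ["the cat sleeps", "the dog barks", "a cat eats"]
target = ["le chat dort", "le chien aboie", "un chat mange"]

src_index = build_line_index(source)
tgt_index = build_line_index(target)

print(sorted(src_index["cat"]))                           # line numbers for "cat"
print(pmi(src_index, tgt_index, "cat", "chat", len(source)))
```

Keeping the line numbers in the index means the co-occurrence count is just a set intersection, so the same index serves both frequency lookups and cross-language association scores.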