Thanks, I'm trying your suggested method! The first step created an 18 GB file, and sorting it took a long time. I finally got it sorted and am now moving on to the third step, which is creating the final file of $ngram: @line_number entries. After that I'll see how I can access it using Search::Dict.
My main use case is to build two big files this way and then calculate some statistics, such as Mutual Information, from them. So as long as I have the line numbers of each n-gram for both files, I'll try to see how to handle the lookups with Search::Dict.
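For reference, here is a rough sketch of how such a lookup might go. It assumes each line of the final file looks like "ngram: 3 17 42" and that the file was sorted in plain byte order (for example with LC_ALL=C), since look() compares with Perl's lt by default. The helper name line_numbers_for and the tiny demo file are made up for illustration and are not part of the suggested method.

```perl
use strict;
use warnings;
use Search::Dict;               # core module; exports look()
use File::Temp qw(tempfile);    # only needed for the demo file below

# Hypothetical helper: return the line numbers recorded for $ngram in an
# index file whose lines look like "ngram: 3 17 42" (assumed format) and
# are sorted in byte order.
sub line_numbers_for {
    my ($fh, $ngram) = @_;
    my $key = "$ngram:";
    my $pos = look($fh, $key);  # binary search; seeks to first line ge $key
    return if $pos < 0;         # look() returns -1 on error
    my $line = <$fh>;
    # An exact hit must start with "ngram:"; otherwise the n-gram is absent.
    return unless defined $line && index($line, $key) == 0;
    chomp $line;
    return split ' ', substr($line, length $key);
}

# Tiny demo index standing in for the real multi-gigabyte file.
my ($tmp, $path) = tempfile(UNLINK => 1);
print {$tmp} "a b c: 2\n", "a b: 1 5\n", "b c: 7 9 12\n";
close $tmp;

open my $fh, '<', $path or die "Cannot open $path: $!";
my @hits = line_numbers_for($fh, 'b c');
print "b c -> @hits\n";
my @none = line_numbers_for($fh, 'zzz');
print "zzz -> ", scalar(@none), " hits\n";
close $fh;
```

The same helper could be called on one filehandle per big file, so the line numbers for an n-gram can be fetched from both files without loading either into memory. If an n-gram can itself contain a colon, a different separator (a tab, say) would be safer.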