in reply to Using Word Tokens as Features

The faster way to do this is to store each word as a key in a hash whose value is an array that you append to as you go along (a hash of arrays). This saves you the time of building a 2D matrix up front. If you split each line on a delimiter, you get a collection of tokens (strings) that you can easily add to the hash of arrays. The catch is that you have to keep a counter of how many lines you have processed, so that when a word shows up for the first time its new array is initialized to the appropriate length, with a placeholder for every earlier line where the word did not occur.
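Here is a rough sketch of the idea, written in Python (a dict of lists standing in for the hash of arrays). The function name `build_token_features`, the use of per-line counts as the stored value, and 0 as the placeholder are my own assumptions for illustration, not something from the original question.

```python
def build_token_features(lines, delimiter=None):
    """Build a dict of lists: one list per token, one slot per input line.

    Assumption: the value stored per line is the token's count on that
    line, and 0 marks lines where the token did not occur.
    """
    features = {}   # token -> list of per-line counts
    rows_seen = 0   # the counter: how many lines are already processed

    for line in lines:
        # delimiter=None splits on any run of whitespace
        for token in line.split(delimiter):
            column = features.setdefault(token, [])
            # Pad with zeros so the list reaches the current line;
            # this covers brand-new tokens and tokens that skipped lines.
            column.extend([0] * (rows_seen + 1 - len(column)))
            column[rows_seen] += 1
        rows_seen += 1

    # Pad tokens that were missing from the last lines.
    for column in features.values():
        column.extend([0] * (rows_seen - len(column)))

    return features


if __name__ == "__main__":
    table = build_token_features(["a b a", "b c", "a"])
    for token, counts in sorted(table.items()):
        print(token, counts)
```

Padding lazily (only when a token is actually seen, plus once at the end) avoids touching every array on every line, which matters when the vocabulary is large.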
Bioinformatics