Since the descriptions contain whitespace too, splitting on whitespace won't work cleanly. Why not split the whole data chunk with a regex from the start? Assuming your entries are always formatted as chr, start, end, and description, you could do something like:
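Here is a minimal sketch of that idea in Python. It assumes one entry per line, a chromosome name starting with `chr`, integer start and end coordinates, and a free-text description running to the end of the line. The sample data and the names `ENTRY` and `parse_entries` are made up for illustration, so adjust the pattern to your actual format.

```python
import re

# Hypothetical sample data; the real layout of your file is an assumption here.
data = """chr1 100 200 first gene description
chr2 300 400 another description with spaces
chrX 5000 6000 last one"""

# One entry per line: chromosome, start, end, then everything else as the description.
# [ \t]+ is used instead of \s+ so a field separator can never swallow a newline.
ENTRY = re.compile(r"^(chr\w+)[ \t]+(\d+)[ \t]+(\d+)[ \t]+(.*)$", re.MULTILINE)

def parse_entries(text):
    """Return a list of (chr, start, end, description) tuples found in text."""
    return [
        (m.group(1), int(m.group(2)), int(m.group(3)), m.group(4).strip())
        for m in ENTRY.finditer(text)
    ]

entries = parse_entries(data)
for chrom, start, end, description in entries:
    print(chrom, start, end, description, sep="\t")
```

Because the description is captured as "the rest of the line", any spaces inside it are kept intact, and lines that don't match the pattern are simply skipped.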