Reading the OP, I'm pretty sure pulling out a scalar is what ant wants to do. It says first that the file is large, about a gigabyte, so you would most likely process the file a line at a time rather than slurp it into memory. It goes on to say

each line has <CS_REFCLT>12526489</CS_REFCLT> in it some where

which I take to mean just one occurrence of the string per line, not multiples. Note also that ant says "I can't use substr to get at it." That strongly reinforces my interpretation.
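
For what it's worth, here is a rough sketch of what I have in mind, assuming I've read the OP correctly: read a line at a time and capture the one value per line into a scalar. The sub name extract_refclt and the sample data are made up for illustration, and the in-memory filehandle is only a stand-in for ant's real file.

```perl
use strict;
use warnings;

# Hypothetical helper: return the CS_REFCLT value found in one line,
# or undef if the line has no such element.
sub extract_refclt {
    my ($line) = @_;
    # The parentheses around $value put the match in list context,
    # so $value receives the captured text rather than a match count.
    my ($value) = $line =~ m{<CS_REFCLT>(.*?)</CS_REFCLT>};
    return $value;
}

# Made-up sample data standing in for the gigabyte file.
my $sample = "foo <CS_REFCLT>12526489</CS_REFCLT> bar\n"
           . "<X>1</X><CS_REFCLT>42</CS_REFCLT>\n";
open my $fh, '<', \$sample or die "Cannot open sample: $!";

my @found;
# Read one line at a time instead of slurping the whole file into memory.
while ( my $line = <$fh> ) {
    my $value = extract_refclt($line);
    push @found, $value if defined $value;
}
close $fh;

print "$_\n" for @found;
```

With a real file you would open it by name instead of by reference to $sample, and you would probably handle each value as you go rather than collect them all in @found. Because the regex finds the element wherever it sits in the line, there is no need to know its offset, which is where substr falls down.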

Of course, my interpretation could be totally wrong :)