If this is an authorized activity, why don't you slurp it from the actual data source (e.g. the database or text files)? That is generally much easier, because HTML scrapers are difficult to write and tend to break easily with minor variations in the HTML.
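
To illustrate the fragility, here is a minimal sketch. Everything in it is made up for the example: the `products` table, the `price` column, the HTML snippets, and the helper names `scrape_prices` and `query_prices`. It uses an in-memory SQLite database as a stand-in for whatever the real data source is.

```python
import re
import sqlite3

# Hypothetical scraper: pulls prices out of table cells with a regex.
PRICE_CELL = re.compile(r'<td class="price">([\d.]+)</td>')


def scrape_prices(html):
    """Return every price found in the page markup."""
    return [float(value) for value in PRICE_CELL.findall(html)]


def query_prices(conn):
    """Return the same prices straight from the (assumed) products table."""
    rows = conn.execute("SELECT price FROM products ORDER BY id").fetchall()
    return [row[0] for row in rows]


# Made-up page markup as it looked when the scraper was written.
original_html = (
    '<tr><td class="price">9.99</td></tr>'
    '<tr><td class="price">4.50</td></tr>'
)

# Same data after a minor template change: an extra attribute, single quotes.
changed_html = (
    "<tr><td id='p1' class='price'>9.99</td></tr>"
    "<tr><td id='p2' class='price'>4.50</td></tr>"
)

# In-memory stand-in for the real data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL)")
conn.executemany("INSERT INTO products (price) VALUES (?)", [(9.99,), (4.50,)])

print(scrape_prices(original_html))  # finds both prices
print(scrape_prices(changed_html))   # finds nothing, and raises no error
print(query_prices(conn))            # unaffected by the markup change
```

The scraper does not even fail loudly here; it just silently returns an empty list once the markup shifts, while the query keeps working. A real HTML parser would cope with this particular change better than a regex, but it is still tied to the page layout in a way the direct query is not.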