Thanks. The pages I'm working with have a lot of really badly formatted HTML that ends up being quite long, so this is probably a good workaround that saves me from loading it all into memory. I also like your trick for getting rid of $1; I hate using the regex variables.