I'm only requesting HTML documents, so I added a handler that skips downloading the response content when the content type isn't text/*. I didn't think to monitor the size, though, so I'll set max_size now. Even so, I still think I need to move to something that scales better. I was hoping something already existed, but I'm up for hacking on an AnyEvent or POE solution that incrementally parses the HTML, either as it comes in or from a file, with HTML::Parser or XML::LibXML.
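
For reference, here is a minimal sketch of the setup described above. It assumes LWP::UserAgent is the client (the post only mentions a handler and max_size) and uses HTML::Parser for the incremental part. The names make_link_parser, fetch_links and $max_bytes are made up for illustration, and collecting href values is just a stand-in for whatever the real parse needs to do.

```perl
use strict;
use warnings;
use HTML::Parser ();
use LWP::UserAgent ();

# Build a parser that records href values from <a> tags as chunks arrive.
# (Hypothetical helper; swap the handler for whatever you actually extract.)
sub make_link_parser {
    my ($links) = @_;
    return HTML::Parser->new(
        api_version => 3,
        start_h     => [
            sub {
                my ($tag, $attr) = @_;
                push @$links, $attr->{href}
                    if $tag eq 'a' && defined $attr->{href};
            },
            'tagname, attr',
        ],
    );
}

# Fetch $url, skip non-text bodies, and parse the HTML chunk by chunk
# instead of buffering the whole document in memory.
sub fetch_links {
    my ($url, $max_bytes) = @_;
    my @links;
    my $parser = make_link_parser(\@links);

    my $ua = LWP::UserAgent->new(timeout => 30);
    $ua->max_size($max_bytes);    # assumed cap; tune for your crawl

    # Runs after the headers arrive but before any body data is read.
    # Dying here aborts the request, so the body is never downloaded.
    $ua->add_handler(response_header => sub {
        my ($response) = @_;
        die "skipping non-text content\n"
            unless $response->content_type =~ m{^text/};
        return;
    });

    # The content callback receives each chunk as it comes off the wire.
    my $response = $ua->get($url, ':content_cb' => sub {
        my ($chunk) = @_;
        $parser->parse($chunk);
    });
    $parser->eof;    # flush anything the parser is still holding

    return ($response, \@links);
}
```

Something like my ($res, $links) = fetch_links($url, 512 * 1024) would then hand back the response object plus whatever links were seen before the request finished or was cut off. This is still one blocking request at a time, so it doesn't address the scaling concern by itself, but the same parser object could be fed from an AnyEvent or POE body callback in the same way.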