If you are scraping a web page, then what you get back will be HTML. Or are you trying to parse the output of a web service that responds in a format such as XML or JSON? There are modules to handle each of these scenarios, but it is important to know first what you are dealing with. Can you be more precise, and maybe give a URL?
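
If it helps to find out which case you are in, here is a rough sketch of one way to check. It is written in Python only as an assumption, since you have not said which language you are using, and `guess_format` is a made-up helper name, not part of any library. It looks at the `Content-Type` header first and, failing that, tries to parse the body with the standard `json` and `xml.etree.ElementTree` modules.

```python
import json
import xml.etree.ElementTree as ET


def guess_format(body, content_type=""):
    """Return 'json', 'xml', 'html', or 'unknown' for a response body.

    Hypothetical helper for illustration; a heuristic, not a validator.
    """
    # Drop parameters such as "; charset=utf-8" from the header value.
    ctype = content_type.split(";")[0].strip().lower()

    # Trust the Content-Type header first, when the server sends a useful one.
    if ctype == "application/json" or ctype.endswith("+json"):
        return "json"
    if ctype in ("text/html", "application/xhtml+xml"):
        return "html"
    if ctype in ("application/xml", "text/xml") or ctype.endswith("+xml"):
        return "xml"

    # Otherwise fall back to inspecting the body itself.
    text = body.lstrip()
    try:
        json.loads(text)
        return "json"
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        pass

    # Real-world HTML is often not well-formed XML, so check for it by marker.
    lowered = text[:200].lower()
    if lowered.startswith("<!doctype html") or "<html" in lowered:
        return "html"

    try:
        ET.fromstring(text)
        return "xml"
    except ET.ParseError:
        pass

    return "unknown"


print(guess_format('{"status": "ok"}'))
print(guess_format("<root><item/></root>"))
print(guess_format("<!DOCTYPE html><html><body><p>hi</body></html>"))
```

Once you know the format, you can hand the body to the matching parser (for example `json.loads` for JSON or `xml.etree.ElementTree` for XML) instead of picking it apart by hand.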