in reply to What to test in a new module
Ship sample pages with the dist that your code should handle correctly. Include URIs that your code is not supposed to gather, tricky URIs, etc. When you fix a bug later, add a test for it so the bug can't come back unnoticed.
Refactor your code so you can call and test the 'logic' without grabbing a remote page.
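
A rough sketch of what that split might look like. It is written in Python purely for illustration, since the thread doesn't show the module itself, and every name here (extract_links, gather, SAMPLE_PAGE) is made up. The idea is that extract_links holds the 'logic' and takes the page text as an argument, while gather is the only piece that touches the network. In a real dist the sample page would live in a file next to the tests rather than inline.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen


class _LinkParser(HTMLParser):
    """Collects the raw href values of <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def extract_links(html, base_url):
    """The 'logic': pure function, no network access (hypothetical name)."""
    parser = _LinkParser()
    parser.feed(html)
    links = []
    for href in parser.hrefs:
        absolute = urljoin(base_url, href.strip())
        # Assumption for this sketch: only http(s) URIs should be gathered.
        if urlparse(absolute).scheme in ("http", "https"):
            links.append(absolute)
    return links


def gather(url):
    """Thin wrapper that does the remote fetch, then delegates."""
    with urlopen(url) as response:
        html = response.read().decode("utf-8", errors="replace")
    return extract_links(html, url)


# Sample page: two URIs that should be gathered, three things that should not.
SAMPLE_PAGE = """
<a href="/docs/intro.html">relative link</a>
<a href="https://example.org/page">absolute link</a>
<a href="mailto:someone@example.com">should be skipped</a>
<a href="javascript:void(0)">should be skipped</a>
<a name="top">anchor with no href</a>
"""

found = extract_links(SAMPLE_PAGE, "http://example.com/index.html")
assert found == [
    "http://example.com/docs/intro.html",
    "https://example.org/page",
]
```

The test never calls gather, so it runs offline and gives the same result every time. Each bug fixed later can become one more line in the sample page plus one more expected (or deliberately absent) entry in the list.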