in reply to testing externally controlled data sources

It is a good idea to create your own data, as you cannot rely on the data you get from the spam filter. When you test, you want to cover as many different cases as possible, and within any limited amount of time there is no guarantee that the spam filter will give you enough of them, so you have to create more cases yourself. You want your tests to cover not only what has happened, but also what could happen.

But at the same time, you should also take samples from what the spam filter created.

So what I am saying is that you should use two types of test data: real samples from the spam filter, and cases you create yourself to fill the holes.

Another thing you want to do is keep all the test data, together with the expected result for each case. If you have time (you don't always, especially when you have a deadline ;-), you should write yourself a small tool that runs each test case automatically, one by one, compares the result with the expected result, and creates a report telling you which cases failed and which passed. This is for the future.
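To make the idea of such a tool concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the Case record, the run_cases and report helpers, and the toy classify function standing in for your real program are all hypothetical names, and results are assumed to be simple strings that can be compared with ==.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Case:
    name: str      # label shown in the report
    source: str    # "real" (sampled from the filter) or "handmade"
    data: str      # input fed to the program under test
    expected: str  # result recorded when the case was created


def run_cases(func: Callable[[str], str],
              cases: List[Case]) -> Tuple[List[str], List[str]]:
    """Run every case one by one; return (passed names, failed names)."""
    passed: List[str] = []
    failed: List[str] = []
    for case in cases:
        try:
            actual = func(case.data)
        except Exception as exc:
            # A crash is recorded as a result, so it shows up as a failure
            # instead of stopping the whole run.
            actual = f"raised {type(exc).__name__}"
        (passed if actual == case.expected else failed).append(case.name)
    return passed, failed


def report(passed: List[str], failed: List[str]) -> str:
    """Build a short text report: a summary line, then each failed case."""
    lines = [f"{len(passed)} passed, {len(failed)} failed"]
    lines += [f"FAIL {name}" for name in failed]
    return "\n".join(lines)


# Hypothetical stand-in for the program being tested.
def classify(message: str) -> str:
    return "spam" if "viagra" in message.lower() else "ham"


if __name__ == "__main__":
    cases = [
        Case("real-1", "real", "Buy VIAGRA now", "spam"),
        Case("hand-empty", "handmade", "", "ham"),
    ]
    print(report(*run_cases(classify, cases)))
```

The source field is only there so that real samples and handmade cases can live in the same collection and still be told apart later. In practice you would probably load the cases from files rather than write them inline.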

Don't expect that you will never modify your program later, and don't expect that it has no bugs. The test data you create this time is a very precious resource: keep it, and reuse it in the future.