It is a good idea to create your own data, as you cannot rely only on the data you get from the spam filter. When you test, you want to cover as many different cases as possible within a limited amount of time, and there is no guarantee you would be able to gather enough data from the spam filter, so you have to create more cases yourself. You want your program to cover not just what has happened, but what could happen.
But at the same time, you should also take samples from what the spam filter created.
So what I am saying is that you should use two types of test data: the real ones from the spam filter, and the ones you created to fill the holes.
One more thing you want to do is keep all the test data, along with the expected result for each case. If you have time (you don't always have time, that's always a problem, especially when you have a deadline ;-), create a small tool for yourself that runs each test case automatically, compares the actual result with the expected result, and produces a report telling you which cases passed and which failed. This is an investment for the future.
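That "small tool" does not have to be fancy. Here is a minimal sketch in Perl; check_message() and the t/data/*.msg paths are invented placeholders, so substitute whatever your real filter wrapper and saved messages are called:

#!/usr/bin/perl
use strict;
use warnings;

# check_message() is a stand-in for whatever actually runs your filter
# on one saved message; replace the dummy logic with a call into your code.
sub check_message {
    my ($file) = @_;
    return $file =~ /spam/ ? 'spam' : 'ham';   # dummy logic for the sketch
}

# Each case: a name, a saved input message, and the expected verdict.
my @cases = (
    { name => 'plain newsletter', input => 't/data/newsletter.msg', expect => 'ham'  },
    { name => 'obvious spam',     input => 't/data/spam-pills.msg', expect => 'spam' },
    { name => 'empty body',       input => 't/data/empty.msg',      expect => 'ham'  },
);

my ($pass, $fail) = (0, 0);
for my $case (@cases) {
    my $got = check_message( $case->{input} );
    if ( $got eq $case->{expect} ) {
        $pass++;
        print "PASS: $case->{name}\n";
    }
    else {
        $fail++;
        print "FAIL: $case->{name} (expected $case->{expect}, got $got)\n";
    }
}
print "\n$pass passed, $fail failed\n";

Once something like this exists, rerunning every old case after each change costs you nothing.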
Don't assume you will never modify your program later, and don't assume there are no bugs. The test data you create now is a very precious resource: keep it, and reuse it in the future. | [reply] |
I have done this type of testing, and I only know of two approaches. One is to keep a special database for testing. The other is to continuously fix the tests as the database changes.
Depending on the nature of your tests, you may need to do some of each. To test new data, keep the code constant. To test new code, keep the database constant. The trick is to test one dimension at a time: don't test new code and new data at the same time.
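For example, the "special database for testing" can be a tiny fixture you rebuild from scratch before each run, so the data dimension stays fixed while the code changes. A rough sketch using DBI with DBD::SQLite (the file name, schema, and verdicts are all invented for illustration):

#!/usr/bin/perl
use strict;
use warnings;
use DBI;

# Rebuild a small fixture database with known contents on every run.
my $dbfile = 'test_corpus.db';
unlink $dbfile;   # start from a clean slate

my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile", '', '',
                       { RaiseError => 1, AutoCommit => 1 });

$dbh->do('CREATE TABLE messages (id INTEGER PRIMARY KEY, subject TEXT, verdict TEXT)');

my $sth = $dbh->prepare('INSERT INTO messages (subject, verdict) VALUES (?, ?)');
$sth->execute('Meeting minutes for Tuesday', 'ham');
$sth->execute('CHEAP PILLS NOW!!!',          'spam');
$sth->execute('',                            'ham');   # boundary case

$dbh->disconnect;
print "Wrote fixture database $dbfile\n";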
If it is impractical to keep a test database, you may be able to separate the code changes from the data changes using statistics. I use R for this type of statistics.
It should work perfectly the first time! - toma | [reply] |
To follow on from what pg has mentioned, what you're after is control. You need to build your own data and seed it with the results you expect to see for each type of test case: positive, negative, and any boundary conditions that may apply. As you know what the data is, you can safely expect to see it in your output.
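In Perl that kind of seeded case table maps naturally onto Test::More. A small sketch, where classify() and its threshold are made up purely to have something to test against:

use strict;
use warnings;
use Test::More;

# classify() stands in for whatever routine you are really testing.
sub classify {
    my ($score) = @_;
    return $score >= 5 ? 'spam' : 'ham';
}

# One table of cases: positive, negative, and boundary conditions,
# each seeded with the result we expect to see.
my @cases = (
    [ 'clearly spam (positive)',  9, 'spam' ],
    [ 'clearly ham (negative)',   0, 'ham'  ],
    [ 'just below the threshold', 4, 'ham'  ],
    [ 'exactly at the threshold', 5, 'spam' ],
);

for my $case (@cases) {
    my ($name, $score, $expected) = @$case;
    is( classify($score), $expected, $name );
}

done_testing();

Because the expected verdicts live next to the inputs, a failing case tells you immediately which condition broke.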
After the unit tests are successful (and documented, being careful to retain the data and to make sure all the tests are reproducible), it is generally a good idea to grab some production data and perform another series of tests...
From here you can submit your system to UAT (User Acceptance Testing) and final QA and sign-off by the owners of the process/procedure or whatever.
Depending on your environment, unit testing that is well documented may be enough; however, keep in mind that your goal is to prove your program works as advertised. The required level of proof will differ from site to site, but IMO one should at the very minimum have fully documented test cases at the unit-test level.
Also having someone test your system who has not been involved in the design/coding is generally a good idea as they bring a fresh set of eyes and ideas that you may not have considered. | [reply] |
Thanks to Ryszard, pg, and toma for your replies.
Considering these modules are being created solely for my own consumption, I think the UAT, QA, and final sign-off might not necessarily be appropriate :) ... but that's fine advice for any commercial endeavours, and I thank you for it.
These tests were really intended as a sort of soft introduction to testing based on some simple code I'd just recently written. With that in mind, I think a simple test .db and basic unit tests will probably suffice for now.
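For the record, the "simple test .db plus basic unit tests" I have in mind is roughly this shape; it assumes a fixture database like the one sketched above, and the table and column names are the same invented ones:

# t/basic.t -- minimal unit tests against a small test .db
use strict;
use warnings;
use DBI;
use Test::More;

my $dbfile = 'test_corpus.db';
my $dbh = DBI->connect("dbi:SQLite:dbname=$dbfile", '', '', { RaiseError => 1 });

# Sanity-check the fixture itself before testing anything built on it.
my ($count) = $dbh->selectrow_array('SELECT COUNT(*) FROM messages');
is( $count, 3, 'fixture database holds the expected number of messages' );

my ($verdict) = $dbh->selectrow_array(
    'SELECT verdict FROM messages WHERE subject = ?', undef, 'CHEAP PILLS NOW!!!' );
is( $verdict, 'spam', 'known spam message is stored with a spam verdict' );

$dbh->disconnect;
done_testing();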
| [reply] |