Expanding on the idea of multiple data sets with something I forgot earlier:
Traditionally, when you're teaching a program to do something, you use two data sets: a training set, which is properly marked ("this should match", "this shouldn't", etc), and a test set, which is also marked. You don't want to train the program on all the data at once, because you run the risk of overfitting (i.e. you get a program that does really well at matching the training data set, but is so specific to the training data that it fails on real-world data).
--In reply to Re(4): generating regexes?
by FoxtrotUniform
in thread generating regexes?
by mortis
| For: | Use: | ||
| & | & | ||
| < | < | ||
| > | > | ||
| [ | [ | ||
| ] | ] |