in reply to Re: Re: (FoxUni) Re: generating regexes?
in thread generating regexes?

Expanding on the idea of multiple data sets with something I forgot earlier:

Traditionally, when you're teaching a program to do something, you use two data sets: a training set, which is properly marked ("this should match", "this shouldn't", etc.), and a test set, which is marked the same way but held back from training. The program learns from the training set only, and you then score it on the test set to see how it handles data it has never seen. You don't want to train the program on all the data at once, because you run the risk of overfitting (i.e. you get a program that does really well at matching the training data set, but is so specific to the training data that it fails on real-world data), and with nothing held back you would have no way to notice.
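
Here's a rough sketch of that split for the regex case, in Python with only the standard library. Every name in it (split_labeled, accuracy, memorize) is made up for illustration, and the sample strings are invented; memorize is a deliberately bad "learner" that just ORs together the positive training strings, so it stands in for an overfit result rather than any real generation method.

```python
import random
import re


def split_labeled(examples, test_fraction=0.25, seed=0):
    """Shuffle (text, should_match) pairs and hold back a fraction for testing."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]  # (training set, test set)


def accuracy(pattern, examples):
    """Fraction of examples where the regex's verdict agrees with the label."""
    regex = re.compile(pattern)
    hits = sum(
        1
        for text, should_match in examples
        if bool(regex.fullmatch(text)) == should_match
    )
    return hits / len(examples)


def memorize(training):
    """Hypothetical overfit 'learner': an alternation of the positive examples."""
    positives = [re.escape(text) for text, should_match in training if should_match]
    # "(?!)" never matches; use it when there are no positives to memorize.
    return "|".join(positives) if positives else "(?!)"


if __name__ == "__main__":
    # Invented sample data: the label says whether the string is all digits.
    data = [
        ("123", True), ("42", True), ("7", True), ("2024", True),
        ("abc", False), ("4x", False), ("", False), ("x1", False),
    ]
    training, test = split_labeled(data)
    for name, pattern in [("memorized", memorize(training)), ("general", r"\d+")]:
        print(name, accuracy(pattern, training), accuracy(pattern, test))
```

The memorized pattern always scores perfectly on the training set, since it was built from it. Only the held-back test set can show whether it generalizes, which is the whole reason for not training on everything.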

--
:wq