in reply to Re: NNflex problems (win32)
in thread NNflex problems (win32)

Ah! I have very little neural net experience and didn't realize the learning process was not strictly deterministic but had a random element. I see that now by running xor and xor_minimal several times sequentially: each run had a different number of learning epochs but produced the same result.

There's something that troubles me, though: the learning process occasionally seems to go on indefinitely. For example, my last xor_minimal run went for 67,481 epochs before I decided to stop it. I ran xor about 8 times, all with epoch counts under 256, but the last time I ran it I let it go for a few minutes, up to epoch 37892, before I killed it -- it got stuck at an error value of 5.30893529204799. Is it possible that, depending on the random initialization, the learning process will never reach an error level below .001 even for small nets?

I was awed by the implications of the ex_add example in the Mesh package. If it's possible to "teach" a system to add, subtract, or do other basic math by example with reasonable accuracy, then I have pretty high expectations for my data. So, right now, I'm trying to prove to myself that I can "teach" a net to do basic operations -- things that I can verify independently and easily. When I set it loose on my data, after the initial tweaking and verifying, it's going to get more and more expensive (in terms of manual labor) for me to verify all of the results so I'd like some confidence that I understand what it's doing up front.

Your suggested modification, removing atanh as the errorfunction, seems to work on Linux. It's on Epoch 400 with an error around .35, which has steadily been dropping from the initial error level of about 200. I'll let it run a little longer.
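
For reference, here's roughly what that change looks like in my constructor call -- I'm assuming "removing atanh" just means dropping the errorfunction parameter and letting the module use its default:

my $network = AI::NNFlex::Backprop->new(
    learningrate    => .2,
    bias            => 1,
    fahlmanconstant => 0.1,
    momentum        => 0.6,
    # errorfunction => "atanh",   # removed, per your suggestion
    round           => 1 );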

With that said, I've taken your xor example and modified it with the ex_add example from the Mesh package. Here's the resulting code:

use AI::NNFlex::Backprop;
use AI::NNFlex::Dataset;

my $network = AI::NNFlex::Backprop->new(
    learningrate    => .2,
    bias            => 1,
    fahlmanconstant => 0.1,
    momentum        => 0.6,
    round           => 1);

$network->add_layer( nodes => 2, activationfunction => "tanh" );
$network->add_layer( nodes => 2, activationfunction => "tanh" );
$network->add_layer( nodes => 1, activationfunction => "linear" );

$network->init();

# Taken from Mesh ex_add.pl
my $dataset = AI::NNFlex::Dataset->new([
    [ 1, 1 ],     [ 2 ],
    [ 1, 2 ],     [ 3 ],
    [ 2, 2 ],     [ 4 ],
    [ 20, 20 ],   [ 40 ],
    [ 50, 50 ],   [ 100 ],
    [ 60, 40 ],   [ 100 ],
    [ 100, 100 ], [ 200 ],
    [ 150, 150 ], [ 300 ],
    [ 500, 500 ], [ 1000 ],
    [ 10, 10 ],   [ 20 ],
    [ 15, 15 ],   [ 30 ],
    [ 12, 8 ],    [ 20 ],
]);

my $err = 10;
# Stop after 4096 epochs -- don't want to wait more than that
for ( my $i = 0; ($err > 0.001) && ($i < 4096); $i++ ) {
    $err = $dataset->learn($network);
    print "Epoch = $i error = $err\n";
}

foreach (@{$dataset->run($network)}) {
    foreach (@$_) { print $_ }
    print "\n";
}

print "this should be 1 - ".@{$network->run([0,1])}."\n";

# foreach my $a ( 1..10 ) {
#     foreach my $b ( 1..10 ) {
#         my($ans)   = $a + $b;
#         my($nnans) = @{$network->run([$a,$b])};
#         print "[$a] [$b] ans=$ans but nnans=$nnans\n" unless $ans == $nnans;
#     }
# }

Re^3: NNflex problems (win32)
by QM (Parson) on May 13, 2005 at 22:10 UTC
    To evaluate the learning progress of a network, you need to know a little more about how neural nets find solutions.

    Imagine the space of all possible solutions (from perfect to awful) as a 3D landscape. The altitude is the error value, the coordinates are internal and external values in the NN sim. Training the neural net is something like a marble rolling along this terrain, tending to roll downhill. Sometimes the marble will stop at the bottom of a sinkhole on a mesa, nowhere near the global minimum error value.

    The odd random kick may send the search out of the local minimum toward a better solution. The size of the random kick may be changed over time, such that later kicks tend to be smaller. Multiple training sessions may be run, and the "mean training time" to a certain error limit computed. (Neural nets also benefit from having noisy connections, even after training is complete.)

    A single training run may fail to meet the error spec, even if run forever.

    Some problem spaces may also have a fractal or chaotic solution space -- slight changes to the starting conditions can drastically alter the solution found.
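
    To make the "multiple training sessions" idea concrete, here is a rough Perl sketch using the XOR setup discussed above. It assumes that building and init()-ing a fresh network gives a new random weight set each time; the run_once() helper and the particular parameter values are only illustrative.

    use AI::NNFlex::Backprop;
    use AI::NNFlex::Dataset;

    # XOR data, as in the xor examples discussed above
    my $dataset = AI::NNFlex::Dataset->new([
        [ 0, 0 ], [ 0 ],
        [ 0, 1 ], [ 1 ],
        [ 1, 0 ], [ 1 ],
        [ 1, 1 ], [ 0 ],
    ]);

    # One training session from a fresh random starting point. Returns the
    # trained network and the epoch count if it converged, or nothing if it
    # got stuck.
    sub run_once {
        my ( $data, $max_epochs, $target ) = @_;
        my $network = AI::NNFlex::Backprop->new(
            learningrate => .2, bias => 1, momentum => 0.6 );
        $network->add_layer( nodes => 2, activationfunction => "tanh" );
        $network->add_layer( nodes => 2, activationfunction => "tanh" );
        $network->add_layer( nodes => 1, activationfunction => "linear" );
        $network->init();    # assumed to pick new random weights each time
        for my $epoch ( 1 .. $max_epochs ) {
            my $err = $data->learn($network);
            return ( $network, $epoch ) if $err < $target;
        }
        return;              # this run never met the error target
    }

    # Run several sessions, then report how many converged and the mean epochs.
    my @epochs_needed;
    for my $attempt ( 1 .. 10 ) {
        my ( $net, $epochs ) = run_once( $dataset, 5000, 0.001 );
        push @epochs_needed, $epochs if defined $epochs;
    }
    if (@epochs_needed) {
        my $sum = 0;
        $sum += $_ for @epochs_needed;
        printf "%d of 10 runs converged, mean epochs = %.1f\n",
            scalar(@epochs_needed), $sum / @epochs_needed;
    }
    else {
        print "no run converged within 5000 epochs\n";
    }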

    -QM
    --
    Quantum Mechanics: The dreams stuff is made of

Re^3: NNflex problems (win32)
by g0n (Priest) on May 15, 2005 at 17:58 UTC
    A step in the right direction:

    By tinkering a little with your code, I've got it learning the data set at least some of the time, as follows:

    Epoch = 40094 error = 0.0188258375344725
    Epoch = 40095 error = 0.0188239027473993
    1.99620990178657
    999.999223878912
    2.99174991748112
    3.99621594182089
    39.9963246624386
    99.9965058634682
    100.085765949921
    199.996807865184
    299.9971098669
    19.9962642620954
    29.9962944622671
    20.0141162793859

    (Compare with the code below).

    What have I changed? Here's the code as it stands at the moment:

    use AI::NNFlex::Backprop;
    use AI::NNFlex::Dataset;

    my $network = AI::NNFlex::Backprop->new(
        learningrate    => .00000001,
        fahlmanconstant => 0,
        momentum        => 0.4,
        bias            => 1);

    $network->add_layer( nodes => 2, activationfunction => "linear" );
    $network->add_layer( nodes => 2, activationfunction => "linear" );
    $network->add_layer( nodes => 1, activationfunction => "linear" );

    $network->init();

    # Taken from Mesh ex_add.pl
    my $dataset = AI::NNFlex::Dataset->new([
        [ 1, 1 ],     [ 2 ],
        [ 500, 500 ], [ 1000 ],
        [ 1, 2 ],     [ 3 ],
        [ 2, 2 ],     [ 4 ],
        [ 20, 20 ],   [ 40 ],
        [ 50, 50 ],   [ 100 ],
        [ 60, 40 ],   [ 100 ],
        [ 100, 100 ], [ 200 ],
        [ 150, 150 ], [ 300 ],
        [ 10, 10 ],   [ 20 ],
        [ 15, 15 ],   [ 30 ],
        [ 12, 8 ],    [ 20 ],
    ]);

    my $err = 10;
    # Stop after 40096 epochs -- don't want to wait more than that
    for ( my $i = 0; ($err > 0.001) && ($i < 40096); $i++ ) {
        $err = $dataset->learn($network);
        print "Epoch = $i error = $err\n";
    }

    foreach (@{$dataset->run($network)}) {
        foreach (@$_) { print $_ }
        print "\n";
    }

    # foreach my $a ( 1..10 ) {
    #     foreach my $b ( 1..10 ) {
    #         my($ans)   = $a + $b;
    #         my($nnans) = @{$network->run([$a,$b])};
    #         print "[$a] [$b] ans=$ans but nnans=$nnans\n" unless $ans == $nnans;
    #     }
    # }

    The alterations are:

    • The tanh activation function doesn't play nicely with numbers > 1, so I've changed all the layers to a linear activation function.
    • The numbers are quite large, so I've set the learning rate very very small.
    • I've taken out the fahlman constant - it's difficult to say what that will do with linear activation, but I'd be surprised if it was anything good.
    • I've changed the order in which the data items are presented, putting 500+500 directly after 1+1. That was a bit of a hunch, but it seemed to improve matters -- possibly because the large weight changes then happen together rather than being scattered through the epoch, where they would make things unstable.
    • I changed the max number of epochs to 40096, because it was trending towards a solution but not reaching it in time.
    I'll carry on looking at this - I've never really used this code for data that isn't binary encoded, so there will almost certainly be improvements that can be made. Looking at NeuralNet-Mesh, it learns this data set very quickly, so there may be something I can derive from looking at that code. But at least you can now derive and save a weight set that will do additions (although you might have to interrupt and restart a few times to get a good, quick run).
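
    For reference, once you do get a converged run, something along these lines should let you save the weights and reuse them rather than retraining (check the AI::NNFlex docs for dump_state/load_state in your version; the filename is just an example):

    # Save the trained weights once a run has converged ...
    $network->dump_state( filename => 'add.wts' );

    # ... and later, after constructing an identically shaped network,
    # reload them instead of retraining.
    $network->load_state( filename => 'add.wts' );
    my ($sum) = @{ $network->run( [ 12, 8 ] ) };
    print "12 + 8 => $sum\n";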

    It's likely that it can be improved by (see the sketch after this list):

    • Altering the range of starting positions (with randomweights=>MAXIMUM STARTING VALUE, perhaps set to 20 to start)
    • Experimenting a little more with (probably smaller) values for learningrate & momentum
    • Changing the order of the dataset to orders of magnitude (answer = 10, answer=20, answer=100 etc)
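
    Here's an untested sketch of what those tweaks might look like together. The randomweights parameter is the module's option mentioned above; the particular learningrate and momentum values are just guesses to start experimenting from.

    use AI::NNFlex::Backprop;
    use AI::NNFlex::Dataset;

    # Untested sketch of the suggested tweaks: cap the random starting
    # weights, try smaller learningrate/momentum, and present the data in
    # increasing order of magnitude. The values are starting points only.
    my $network = AI::NNFlex::Backprop->new(
        learningrate    => .000000005,   # smaller again -- a guess
        momentum        => 0.2,          # likewise a guess
        fahlmanconstant => 0,
        bias            => 1,
        randomweights   => 20 );         # maximum starting weight value

    $network->add_layer( nodes => 2, activationfunction => "linear" );
    $network->add_layer( nodes => 2, activationfunction => "linear" );
    $network->add_layer( nodes => 1, activationfunction => "linear" );
    $network->init();

    # Dataset reordered so the answers run in increasing orders of magnitude
    my $dataset = AI::NNFlex::Dataset->new([
        [ 1, 1 ],     [ 2 ],
        [ 1, 2 ],     [ 3 ],
        [ 2, 2 ],     [ 4 ],
        [ 10, 10 ],   [ 20 ],
        [ 12, 8 ],    [ 20 ],
        [ 15, 15 ],   [ 30 ],
        [ 20, 20 ],   [ 40 ],
        [ 50, 50 ],   [ 100 ],
        [ 60, 40 ],   [ 100 ],
        [ 100, 100 ], [ 200 ],
        [ 150, 150 ], [ 300 ],
        [ 500, 500 ], [ 1000 ],
    ]);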

    I'll post again on this thread if I find anything really useful.

    Update: Gah! The fahlman constant is applied by default. I've amended the code above to set it explicitly to 0; that seems to work better.

    --------------------------------------------------------------

    g0n, backpropagated monk