perlchris has asked for the wisdom of the Perl Monks concerning the following question:

#!/usr/bin/perl -w
use strict;
use warnings;

my $numberofinputnodes  = 2;
my $numberofhiddennodes = 2;
my $learningrate        = .1;
my $maxepochs           = 10;
my $errormargin         = .1;

# index 0 of each weight list is treated as the bias weight
my @weight       = ( .6, .6, .6 );
my @hiddenweight = ( .6, .6, .6 );

# each row: bias input (-1), two inputs, target (XOR)
my @data = (
    [ "-1", "0.000000", "0.000000", "0" ],
    [ "-1", "0.000000", "1.000000", "1" ],
    [ "-1", "1.000000", "0.000000", "1" ],
    [ "-1", "1.000000", "1.000000", "0" ],
);

my $numberofepochs = 1;
my $net     = 0;
my $netk    = 0;
my $avgrmse = 0;
my @outputh = ();
my @outputk = ();
my $deltak  = 0;
my $deltah  = 0;
my @rmse    = ();
my $maxrmse = 0;

while (   ( $numberofepochs <= $maxepochs )
       && ( ( $avgrmse == 0 ) || ( $avgrmse >= $errormargin ) ) ) {
    $avgrmse = 0;

    # training pass: forward propagate each example, then update weights
    for ( my $j = 0 ; $j <= $#data ; $j++ ) {

        # hidden layer: weighted sum of bias and inputs through a sigmoid
        for ( my $k = 0 ; $k < $#{ $data[0] } ; $k++ ) {
            $net = $net + $weight[$k] * $data[$j][$k];
        }
        $outputh[$j] = 1 / ( 1 + exp( -$net ) );
        $net = 0;

        # output unit: weighted sum of the hidden output(s)
        for ( my $o = 0 ; $o <= $numberofhiddennodes ; $o++ ) {
            if ( $o == 0 ) {
                $netk = -$hiddenweight[0];
            }
            $netk = $netk + $hiddenweight[$o] * $outputh[$j];
        }
        $outputk[$j] = 1 / ( 1 + exp( -$netk ) );
        $netk = 0;

        # backpropagate: delta for the output unit, then the hidden units
        $deltak = $outputk[$j] * ( 1 - $outputk[$j] )
                * ( $data[$j][-1] - $outputk[$j] );
        my @deltah;
        for ( my $o = 0 ; $o < $#{ $data[0] } ; $o++ ) {
            $deltah[$o] = $outputh[$j] * ( 1 - $outputh[$j] )
                        * $weight[$o] * $deltak;
        }

        # weight updates
        for ( my $m = 0 ; $m < $#{ $data[0] } ; $m++ ) {
            my $update = $learningrate * $deltah[$m] * $data[$j][$m];
            $weight[$m] = $weight[$m] + $update;
        }
        for ( my $n = 0 ; $n <= $numberofhiddennodes ; $n++ ) {
            my $update = $learningrate * $deltak * $outputk[$j];
            $hiddenweight[$n] = $hiddenweight[$n] + $update;
        }
        $deltak = 0;
        @deltah = ();
    }

    # evaluation pass: recompute the outputs with the updated weights
    for ( my $j = 0 ; $j <= $#data ; $j++ ) {
        for ( my $k = 0 ; $k < $#{ $data[0] } ; $k++ ) {
            $net = $net + $weight[$k] * $data[$j][$k];
        }
        $outputh[$j] = 1 / ( 1 + exp( -$net ) );
        for ( my $o = 1 ; $o < $numberofhiddennodes ; $o++ ) {
            $netk = $netk + $hiddenweight[$o] * $outputh[$j];
        }
        $netk = $netk - $hiddenweight[0];
        $outputk[$j] = 1 / ( 1 + exp( -$netk ) );
        $net  = 0;
        $netk = 0;
    }

    # per-example RMSE, plus the average and maximum over the epoch
    my $totalrmse = 0;
    for my $q ( 0 .. $#data ) {
        $rmse[$q] = sqrt( ( $data[$q][-1] - $outputk[$q] )**2 );
        $totalrmse = $totalrmse + $rmse[$q];
    }
    $avgrmse = $totalrmse / ( $#data + 1 );
    @rmse    = sort { $a <=> $b } @rmse;
    $maxrmse = $rmse[-1];

    if ( $numberofepochs == $maxepochs ) {
        print "\n", "***** Epoch $numberofepochs *****", "\n";
        print "Maximum RMSE: $maxrmse", "\n";
        print "Average RMSE: $avgrmse", "\n";
    }
    @rmse      = ();
    $totalrmse = 0;
    $numberofepochs++;
}
exit;

The final output looks like this:

***** Epoch 10 *****
Maximum RMSE: 0.570149294921984
Average RMSE: 0.500071929555965

The output should be this:

***** Epoch 10 *****
Maximum RMSE: 0.5432215742673802
Average RMSE: 0.4999911891911738

This shows it's behaving with some precision, but it isn't accurate, and I have no idea why. The values move in the right direction, but something seems wrong with exactly how much they change.

I'm trying to implement backpropagation with stochastic gradient descent (this example just uses fixed initial weights instead of random ones) for a Machine Learning course.
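For reference, the update rules I'm working from are the standard stochastic gradient descent rules for sigmoid units; in Mitchell's notation (as best I can reproduce it):

$$\delta_k = o_k(1 - o_k)(t_k - o_k), \qquad \delta_h = o_h(1 - o_h)\sum_{k \in outputs} w_{kh}\,\delta_k, \qquad \Delta w_{ji} = \eta\,\delta_j\,x_{ji}$$

where $o$ is a unit's sigmoid output, $t_k$ is the target, $\eta$ is the learning rate, and $x_{ji}$ is the $i$-th input to unit $j$.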

I've been able to get the final output incrementally closer to the correct answer over several revisions, but I haven't been able to improve it any further, so I think an additional set of eyes might be useful.

I'd like to think this code shows I understand how the algorithm works and that the problem boils down to some sort of scoping issue or accumulated error that I'm not seeing, but I don't know. Thus I humbly seek Perl Wisdom from the Monks! Thank you in advance for any suggestions.

Replies are listed 'Best First'.
Re: backpropagation accuracy issue
by GrandFather (Saint) on Feb 19, 2011 at 02:57 UTC

    As a quick test I added use bignum; (see bignum) to the test script and obtained essentially the same erroneous values to greater precision so I'd guess the errors are unlikely to be due to rounding issues or accumulated errors. That implies that the algorithm has an error somewhere. How did you obtain the "correct" values? Can you provide a reference to a public online resource that describe the algorithm?
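    For anyone who wants to repeat the experiment, it amounts to one extra line near the top of the script, something like:

    use strict;
    use warnings;
    use bignum;    # promotes numeric literals and arithmetic to Math::BigFloat;
                   # bignum also overrides exp() and log()

    Everything else runs unchanged, just much more slowly.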

    True laziness is hard work

      The correct values were provided and are posted here

      The algorithm is described in Tom M. Mitchell's "Machine Learning" (p. 98); there are also PDF slides here (slides 88-92, on pages 4 and 5 respectively).

      It's unfortunate that it's not a rounding error; I will comb through more closely and hopefully find where the problem is. If you have any suggestions, they are certainly appreciated. Thank you for your help so far.

        @hiddenweight after one epoch:
        $VAR1 = [ '0.596062038434604', '0.596062038434604', '0.596062038434604' ];

        After two:

        $VAR1 = [ '0.592154093746109', '0.592154093746109', '0.592154093746109' ];

        Shouldn't they be different? (Just guessing here.)
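
        (For the record, the dumps above came from something along these lines, run at the end of each training pass:)

        use Data::Dumper;
        print Dumper( \@hiddenweight );

        Looking at the posted update loop, identical elements may be no accident: every $hiddenweight[$n] gets the same $learningrate*$deltak*$outputk[$j] added to it, so weights that start equal stay equal.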

        Doesn't even start the same.

        Him:

        ***** Epoch 1 *****
        Maximum RMSE: 0.5435466682137927
        Average RMSE: 0.4999991292217466

        You:

        ***** Epoch 1 *****
        Maximum RMSE: 0.574001103043358
        Average RMSE: 0.50006970432383
      Further support that this is not a floating point issue: the max is correct to only one digit and the average to about four digits, but doubles carry roughly 17 significant digits (53 bits) of precision.
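
      A quick way to see how much precision is actually available (assuming a perl built with ordinary 64-bit doubles):

      perl -e 'printf "%.20g\n", 1/3'    # prints 0.33333333333333331483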

        If I could get both to 6 digits (or perhaps more) it'd be acceptable. The program that produced the output I'm testing against is not written in Perl, though. I don't know if the casts are implemented significantly differently in different languages, but I believe the other program was written in Java.
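
        For what it's worth, the quoted training values ("-1", "0.000000", "1.000000") are all exactly representable as IEEE 754 doubles, so Perl's string-to-number conversion shouldn't be where the two programs diverge. A quick sanity check:

        printf "%.17g\n", "1.000000" + 0;    # prints 1
        printf "%.17g\n", "-1" + 0;          # prints -1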

        We were allowed to write in any language and I thought Perl would be a good way to do it since that's what I've been reading on as of late.

        Again thank you for the help and suggestions. They are very appreciated. I will continue to look through and see if I can identify the problem with my implementation.