What effect does the neural transfer function have on the ‘interestingness’ of circuit dynamics? And what counts as ‘interesting’ dynamics? I present here one very simple and very preliminary study suggesting that this is worth looking into further.
In continuous-time recurrent neural networks (CTRNNs), the most common transfer function is the logistic. My impression is that the second most used is the hyperbolic tangent. However, there has never been any substantial claim (at least that I know of) of any large, fundamental difference between the two. Indeed, there isn’t any justification (as far as I’m aware) for using anything other than the logistic function, which is arguably the most biologically natural. Furthermore, as Lionel pointed out to me, the tanh is merely a re-mapping of the logistic, so in theory there are no differences between them.
Even if there are no theoretical differences, there may still be quite important practical differences, and that is what this post is concerned with. In particular, given the same volume of parameter space, qualitatively different dynamics could be represented in different proportions. This would make a substantial difference, particularly for evolutionary-robotics-style methodologies. For example, it could be that the volume of parameter space that generates CTRNNs whose phase portraits contain only stable point attractors is larger when using the logistic transfer function than when using the tanh. This would mean that, in practice, more ‘interesting’ dynamics could be obtained (keeping everything else the same) with the latter.
So, what could we mean by ‘interesting’ dynamics? There could be several layers of ‘interestingness’ which I won’t talk about now. Let’s focus instead on the most minimal form of ‘interestingness’. The simplest behaviour a dynamical system can settle into is a point attractor. For the experiment in this post, I will define ‘minimal interestingness’ as anything but a point attractor. More precisely, I will explore randomly chosen networks from the following ranges: weights in [-10, 10], biases in [-10, 10], and time-constants in [e^0, e^5]. I will do this for different network sizes, N in [2, 20], and for four different transfer functions, which I explain below.
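As a concrete sketch, here is how such a random network could be sampled in Python with NumPy (the function name is mine; the post does not specify an implementation):

```python
import numpy as np

def sample_ctrnn(n, rng):
    """Sample one random CTRNN within the ranges used in the post.

    Weights and biases are drawn uniformly from [-10, 10]; time-constants
    are drawn as e^u with u uniform in [0, 5], so tau lies in [e^0, e^5].
    """
    weights = rng.uniform(-10.0, 10.0, size=(n, n))
    biases = rng.uniform(-10.0, 10.0, size=n)
    taus = np.exp(rng.uniform(0.0, 5.0, size=n))
    return weights, biases, taus
```

Drawing the exponent uniformly (rather than the time-constant itself) spreads the samples evenly over the orders of magnitude between 1 and about 148.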
Each randomly chosen network is integrated (only once) for 100 units of time using Euler integration with a 0.01 time-step. During the last 50 units of time, the changes in activation of all neurons in the circuit are added up. The idea is that if the network has settled into a point attractor, this accumulated change will be very small (in practice around 10^-12); if, on the other hand, it settles into anything else (e.g. a limit cycle, a chaotic attractor, etc.), it will register as a larger accumulated change (usually greater than 10). Note that the point of this exercise is not to check whether the randomly chosen circuit has ANY point attractor, but rather to measure the proportion of circuits that fall into one by chance, as opposed to falling into anything else. This is why each randomly chosen circuit is integrated only once. At the start of each test, the activations of the neurons are set to random values in [0, 1].
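A minimal sketch of this test, assuming the standard Beer-style CTRNN equation tau_i dy_i/dt = -y_i + sum_j w_ji f(y_j + b_j) (the post does not state the exact model form, so that equation, along with the function name and signature, is my assumption):

```python
import numpy as np

def accumulated_change(weights, biases, taus, f, rng,
                       dt=0.01, t_total=100.0, t_measure=50.0):
    """Integrate one CTRNN once with Euler steps and accumulate the
    absolute state change over the final t_measure time units.

    A tiny return value (~1e-12) indicates the network settled into a
    point attractor; a large one (usually > 10) indicates anything else.
    weights[j, i] is the connection from neuron j to neuron i.
    """
    n = len(biases)
    y = rng.uniform(0.0, 1.0, size=n)        # random initial activations in [0, 1]
    steps = int(t_total / dt)
    measure_from = int((t_total - t_measure) / dt)
    total = 0.0
    for step in range(steps):
        dy = (-y + weights.T @ f(y + biases)) / taus
        y = y + dt * dy
        if step >= measure_from:
            total += np.sum(np.abs(dt * dy))  # accumulate change of all neurons
    return total
```

A circuit would then count as ‘minimally interesting’ when this accumulated change is large (e.g. above some threshold between 10^-12 and 10).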
The four transfer functions tested are the following:
(1) f(x) = 1 / (1 + e^-x)
(2) f(x) = tanh(x)
(3) f(x) = 0.75*tanh(x) + 0.25*sin(x)
(4) f(x) = a*tanh(x) + (1-a)*sin(x), where a is a random number in [0, 1], drawn independently for each neuron in the circuit.
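In Python, the four transfer functions could be sketched as follows (the names f1–f4, and the use of a closure to hold the per-neuron mixing coefficients for (4), are my own choices):

```python
import numpy as np

def f1(x):
    """(1) Logistic."""
    return 1.0 / (1.0 + np.exp(-x))

def f2(x):
    """(2) Hyperbolic tangent."""
    return np.tanh(x)

def f3(x):
    """(3) Fixed tanh/sine mixture (non-monotonic)."""
    return 0.75 * np.tanh(x) + 0.25 * np.sin(x)

def make_f4(n, rng):
    """(4) Per-neuron random mixture a_i*tanh + (1-a_i)*sin,
    with a_i drawn once, uniformly from [0, 1], per neuron."""
    a = rng.uniform(0.0, 1.0, size=n)
    return lambda x: a * np.tanh(x) + (1.0 - a) * np.sin(x)
```

Note that (4) is built once per circuit, so each neuron keeps its own fixed mixing coefficient a_i throughout the integration.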
Each point in the figure represents the proportion of circuits that are ‘minimally interesting’ (as defined above) out of 500 randomly chosen circuits. The effect seems to be a big one, particularly between the monotonic transfer functions (1 and 2) and the non-monotonic ones (3 and 4). Although not as pronounced, there also seems to be some difference between the logistic and the tanh, which is very much unexpected. The difference between (3) and (4) is even less pronounced, if there is one at all. I suspect that raising the threshold of what counts as ‘interesting’ would make the difference between these last two grow; this is speculation based merely on eye-balling example dynamics.
What do you think? Why is this happening? Do you think this is worth looking at in more depth?
Note: as you may know already, the idea for the non-monotonic transfer function comes originally from Ollie Bown’s work.