What effect does the neural transfer function have on the ‘interestingness’ of circuit dynamics? And what counts as ‘interesting’ dynamics? I will present here one very simple, very preliminary study that suggests this is worth looking into further.

In continuous-time recurrent neural networks (CTRNNs), the most common transfer function is the logistic. My impression is that the second most used is the hyperbolic tangent. However, there has never been any substantial claim (at least that I know of) of large, fundamental differences between the two. Indeed, there isn’t any justification (as far as I’m aware) to use anything other than the logistic function – which is arguably the most biologically natural. Furthermore, as Lionel has pointed out to me, in the case of the *tanh* the change is merely a mapping. Thus, in theory, there are no differences at all.

Even if there are no theoretical differences, there may still be quite important practical differences. This is what this post is concerned with. In particular, given the same volume of parameter space, qualitatively different dynamics could be represented in different proportions. This would make for a substantial difference, particularly for evolutionary-robotics-style methodologies. For example, it could be the case that the volume of parameter space generating CTRNNs whose phase portraits contain only stable point attractor(s) is larger when using the logistic transfer function than when using *tanh*. This would mean that, in practice, more ‘interesting’ dynamics could be obtained (keeping everything else the same) with the latter.

So, what could we mean by ‘interesting’ dynamics? There could be several layers of ‘interestingness’ which I won’t talk about now. Let’s focus instead on the most minimal form of ‘interestingness’. The simplest behaviour one can obtain in a dynamical system is the point attractor. For the experiment in this post, I will define ‘minimal interestingness’ as anything *but* a point attractor. More precisely, I will explore networks randomly chosen from the following ranges: weights [-10,10], biases [-10,10], and time-constants [e^0,e^5]. I will do this for networks of different sizes, N in [2, 20], and 4 different transfer functions which I explain below.

Each randomly chosen network is integrated (only once) for 100 units of time using Euler integration with a 0.01 time-step. During the last 50 units of time, the changes of all neurons in the circuit are added up. The idea is that if the circuit has settled into a point attractor, this accumulated change will be very small (in practice around 10^-12); if, on the other hand, it settles into anything else (e.g. a limit cycle, a chaotic attractor, etc.), it will register as a larger accumulated change (usually greater than 10). Note that the point of this exercise is not to check whether the randomly chosen circuit has ANY point attractor, but rather the proportion of circuits that fall into one by chance – as opposed to falling into anything else. Thus, each randomly chosen circuit is integrated only once. At the start of each test the activations of the neurons are set to random values in [0,1].
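The procedure above can be sketched in code. The post does not give the exact CTRNN equations, so the sketch below assumes the standard form τ_i ẏ_i = −y_i + Σ_j w_ij f(y_j + b_j); the function names are mine, and the parameter ranges are the ones stated above.

```python
import numpy as np

def random_ctrnn(n, rng):
    """Sample a random CTRNN using the ranges from the post:
    weights and biases in [-10, 10], time-constants in [e^0, e^5]."""
    W = rng.uniform(-10, 10, (n, n))
    b = rng.uniform(-10, 10, n)
    tau = np.exp(rng.uniform(0, 5, n))
    return W, b, tau

def interestingness_test(W, b, tau, f=np.tanh, dt=0.01, T=100.0, rng=None):
    """Euler-integrate for T time units from a random initial state in
    [0, 1]; accumulate the per-step change of all neurons over the last
    T/2 units. A tiny total indicates a point attractor; anything larger
    counts as 'minimally interesting'."""
    rng = rng or np.random.default_rng()
    y = rng.uniform(0, 1, len(b))      # random initial activations in [0, 1]
    steps = int(T / dt)
    accumulated = 0.0
    for step in range(steps):
        dy = dt / tau * (-y + W @ f(y + b))
        y = y + dy
        if step >= steps // 2:         # last 50 units of time
            accumulated += np.abs(dy).sum()
    return accumulated
```

The proportion of ‘minimally interesting’ circuits is then just the fraction of sampled networks whose accumulated change exceeds some threshold (say, 1).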

The four transfer functions tested are the following:

(1) *f(x) = 1 / (1 + e^-x)*

(2) *f(x) = tanh(x)*

(3) *f(x) = 0.75·tanh(x) + 0.25·sin(x)*

(4) *f(x) = a·tanh(x) + (1-a)·sin(x)*, where *a* is a random number in [0,1], chosen separately for each neuron in the circuit.
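For concreteness, the four transfer functions can be written out as follows (the function names are mine; for (4), the per-neuron mixing coefficients are sampled once when the circuit is created):

```python
import numpy as np

def logistic(x):                       # (1) the standard logistic sigmoid
    return 1.0 / (1.0 + np.exp(-x))

def tanh_tf(x):                        # (2) hyperbolic tangent
    return np.tanh(x)

def tanh_sin_fixed(x):                 # (3) fixed tanh/sine mixture
    return 0.75 * np.tanh(x) + 0.25 * np.sin(x)

def make_tanh_sin_random(n, rng):      # (4) per-neuron random mixture
    a = rng.uniform(0, 1, n)           # one coefficient per neuron
    def f(x):
        return a * np.tanh(x) + (1 - a) * np.sin(x)
    return f
```

Note that (1) is monotonic and bounded in (0, 1), (2) is monotonic and bounded in (-1, 1), while (3) and (4) are non-monotonic: the sine term makes the output fold back on itself for large |x|.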

Each point in the figure represents the proportion of circuits that are ‘minimally interesting’ (as defined above) out of 500 randomly chosen circuits. The effect seems to be a big one, particularly between the monotonic transfer functions (1 and 2) and the non-monotonic ones (3 and 4). Although not as pronounced, there also seems to be some difference between the *logistic* and *tanh* – which was very much unexpected. The difference between (3) and (4) is even less pronounced, if there is one at all. I suspect that raising the bar for what counts as ‘interesting’ would make the difference between these last two grow; I speculate this merely from eye-balling example dynamics.

What do you think? Why is this happening? Do you think this is worth looking at in more depth?

Note: as you may know already, the idea for the non-monotonic transfer function comes originally from Ollie Bown’s work.

Wow.

I only barely understand this, but the results are pretty amazing nonetheless. I wonder, though, how transfer functions 3 and 4 were derived? Are their terms precisely arranged to give the effect shown?

Yes and no.

No, because with non-linear dynamical systems such as these it is hard to engineer solutions with particular properties. Even small 2-node fully connected circuits have around 16 qualitatively different dynamics, and adding just one node makes the space of possible dynamics hard to study systematically. Functions 3 and 4 combine a sigmoid with a sine wave; predicting their potential dynamics would not be trivial. I started studying them only because of Ollie Bown’s work. He came up with the idea because he was trying to generate music out of these dynamical systems – basically, he needed something that would go ‘crazy’ more often. For that reason, yes: I guess he tried several until he found one that often proved interesting. For more info on the transfer function, the motivation, and some examples, read this post.

Personally, I like to use the hyperbolic tangent just because tanh(0) = 0.

The fact that logistic(0) = 0.5 has caused some unnecessary confusion for me in the past when doing manual calculations.

Perhaps some people prefer the logistic function over tanh because e^x is cheaper to differentiate?

The only annoying thing about tanh is that you have to take negative values into consideration. With e.g. roulette-wheel selection, you have to convert each fitness value x to x = x + |n| + 1, where n is the smallest fitness value.
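The fitness shift described above is a one-liner; a minimal sketch (the function name is mine):

```python
def shift_fitness(fitnesses):
    """Make all fitness values strictly positive for roulette-wheel
    selection via x -> x + |n| + 1, where n is the smallest fitness."""
    n = min(fitnesses)
    return [x + abs(n) + 1 for x in fitnesses]
```

With this shift the worst individual ends up with fitness 1 (if n was negative), so it retains a small but nonzero selection probability.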