Here is the continuation of the preliminary comparison of evolutionary performance between the three transfer functions in 3-node circuit CTRNNs, over 20 evolutionary runs each. The green trajectory is the average of the best fitness for the logistic sigmoid activation function; the red corresponds to f(x) = tanh(x); and the blue corresponds to f(x) = alpha*tanh(x) + (1-alpha)*sin(x), where each node has its own evolved alpha parameter. The triangles at the end of the evolutionary runs mark the standard deviation of the 20 best fitnesses, colored accordingly.
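For reference, here are the three transfer functions as a minimal Python sketch (the function names are mine, and I'm assuming the evolved alpha is constrained to [0, 1]; the actual simulation code may differ):

```python
import numpy as np

def logistic(x):
    # Standard logistic sigmoid, mapping to (0, 1)
    return 1.0 / (1.0 + np.exp(-x))

def tanh_act(x):
    # Hyperbolic tangent, mapping to (-1, 1)
    return np.tanh(x)

def tanh_sin_mix(x, alpha):
    # Per-node mix of tanh and sine; alpha is evolved for each node
    # (assumed here to lie in [0, 1], making the mix a convex combination)
    return alpha * np.tanh(x) + (1.0 - alpha) * np.sin(x)
```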
If, instead of looking at the full detail of the trajectories, we concentrate on the broad areas of the 1D space the agent finds itself in over time, we get a higher-level view of its behavior: concentrating not on particular trajectories but on the overall performance. Imagine we color-code each region as follows:
One peak is blue, the other red, the in-between region is neutral white, and the outer regions are darker blue and red accordingly. We start the agent from 400 different positions, uniformly spaced over the range [-10, 10], all with the same internal starting state, in an entirely deterministic simulation (no noise except for the inevitable floating-point rounding errors). The integration time-step is two orders of magnitude smaller than the smallest time-constant, to avoid gross time-integration errors (timestep = 0.01, smallest time-constant parameter = 1; the time-step actually used during evolution is 0.1), and each run lasts the regular 500 units of time. The trajectory of the individual is color-coded according to the previous figure.
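For concreteness, here is a rough Python sketch of that sweep. It assumes a standard CTRNN under Euler integration; the network parameters, the sensor/motor coupling, and the environment are stand-ins of mine (the real ones come from the evolved individual and the task):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 3

# Placeholder parameters standing in for the evolved genome -- these are
# random, NOT the actual individual's values.
W     = rng.normal(0.0, 1.0, (N, N))   # connection weights
theta = rng.normal(0.0, 1.0, N)        # biases
tau   = np.ones(N)                     # time-constants (smallest = 1, as above)
alpha = rng.uniform(0.0, 1.0, N)       # per-node mixing parameters

DT, T_END = 0.01, 500.0                # analysis dt, two orders below min(tau)
starts = np.linspace(-10.0, 10.0, 400) # the 400 uniform starting positions

def f(y):
    # The mixed transfer function, applied per node.
    return alpha * np.tanh(y) + (1.0 - alpha) * np.sin(y)

def run_agent(x0):
    # Integrate one run from position x0, same internal state every time.
    # The sensor/motor coupling is my guess: node 0 senses the position,
    # the last node drives the agent's velocity.
    x = x0
    y = np.zeros(N)
    trace = np.empty(int(T_END / DT))
    for k in range(trace.size):
        I = np.zeros(N)
        I[0] = x                                       # assumed sensor input
        y += DT * (-y + W @ f(y + theta) + I) / tau    # Euler step of the CTRNN
        x += DT * np.tanh(y[-1])                       # assumed motor output
        trace[k] = x
    return trace

traces = np.stack([run_agent(s) for s in starts])      # shape (400, 50000)
```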
At the start the agent is not very chaotic: similar environmental situations produce similar behaviors. But as time passes, the agent's trajectory becomes more and more unpredictable.
What does this mean exactly? I'm still figuring this out in the broader picture of what I would like to say about this task and agent. In the more detailed, technical sense, however, I definitely feel this is evidence that the system is sensitive to initial configurations. It also suggests that the decision of what the system does next depends (most of the time?) on its internal state more than on its environment. What I would like to find out is how to tell when (or even whether) it is also being influenced by the environment. My intuition is that this is, of course, happening when, for example, the agent has to find the actual 'hot' regions. Ideally, I need to figure out how to present a map that can tell me when the agent is following the environment and when it is following its internal dynamics (but I've said that many times before already).
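One first step in that direction (a sketch only, reusing the `traces` array and `DT` from the sweep above): track how quickly trajectories launched from adjacent starting positions separate. This is a finite-time divergence measure, not a proper Lyapunov exponent, but a roughly linear stretch of growth in the log-separation would indicate the exponential sensitivity suggested above, and its onset marks when predictability breaks down:

```python
import numpy as np
import matplotlib.pyplot as plt

# Distance between trajectories from adjacent starting positions at
# every time step: shape (399, n_steps).
sep = np.abs(np.diff(traces, axis=0))

# Mean log-separation across all neighbor pairs; the small constant
# avoids log(0) when two trajectories coincide early on.
mean_log_sep = np.log(sep + 1e-12).mean(axis=0)

t = np.arange(traces.shape[1]) * DT
plt.plot(t, mean_log_sep)
plt.xlabel("time")
plt.ylabel("mean log separation of adjacent starts")
plt.show()
```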