Goal-directed behavior of single cells Part 2 (Subjective nature of motivation (a single neuron can want))

Interrupting pairing for 5-10 minutes after acquisition, or for 20 minutes after extinction, increases the response to the first CS+ after the break (Fig. 3.4, top). Arrows between the last values during acquisition and the first values during extinction (or between the last value during extinction and the first value during reacquisition) indicate the significance of the difference between these values. The increase in response to the CS+ after acquisition was selective and was not observed for responses to the CS~. This means that reorganization of selective excitability continues slowly (over minutes) after the end of learning, and this interval corresponds to the period of reorganization of metabolic pathways during injury and the initiation of motivation.

Fig. 3.4. Neuronal activity during acquisition, extinction and reacquisition of classical conditioning. Top: the number of APs in the responses to the conditioned (closed symbols) and discriminated (open symbols) stimuli vs. trial number. Medians and the significance of the difference between responses to the conditioned and discriminated stimuli are shown (Mann-Whitney U test, *p < 0.05; **p < 0.01; ***p < 0.001). Bottom: mean membrane potential and confidence intervals (p < 0.05). In all figures the ordinate for membrane potential is, by convention, plotted in reverse, since the membrane potential is always negative. Methods. The discriminated stimulus CS~ was never paired with the US and was presented once every one to four combinations of CS+ and US over the course of training. The training procedure consisted of acquisition (25-35 combinations of the conditioned and unconditioned stimuli), an extinction series (15-20 presentations of the isolated conditioned stimulus after a 5-10 min break), and a second acquisition stage consisting of repeated development of the conditioned reflex following a 20-minute break. Fig. 3.4 was redrawn from the data in [1257].

Possibly, the depolarizing influences of the harmful US were homeostatically compensated. This compensation supported the membrane potential during 2-4 consecutive learning trials, then shifted the membrane potential to another quasi-stable level, and then shifted it again (Fig. 3.3). However, the average membrane potential during acquisition stayed stable (Fig. 3.4, bottom). The reason for this regular modification in the strategy of maintaining the membrane potential during acquisition is not entirely clear. Probably, it is somehow connected with the brain's strategy of searching for the right response by "trial and error", which is better displayed during instrumental conditioning (see further).

The weak average shifts of the membrane potential during habituation and acquisition of classical conditioning are in accordance with previously reported data [48, 549, 1258, 1259, 865]. However, after intensive classical defensive conditioning in the snail (160 pairings), the membrane potential in identified neurons controlling pneumostome closure was reported to remain decreased for 40 days and to recover only by day 52 after conditioning [444]. Evidently, such a protracted session of unavoidable pain leads to profound homeostatic reorganization of ion equilibrium in the vicinity of a new set-point, as we have discussed earlier (similar events are observed in drug abusers). Alternatively, prolonged depolarization may be a false result of improper measurement. In the aforementioned experiments, activity was recorded from an identified neuron in animals that had already been intensively trained. It cannot be excluded that after strong defensive conditioning the defensive neurons were sensitive to damage and, after impalement by the microelectrode, were depolarized because of physical injury to the membrane. In similar circumstances, recording from pyramidal neurons in hippocampal slices prepared from rabbits after exhaustive eyeblink conditioning revealed an unspecific increase in excitability (excitability recovered after 7 days), although no learning-related effects were observed on AP amplitude or duration, or on resting membrane potential [865]. An unspecific change in excitability (without a change in the membrane potential) may also be observed in the cerebellum after stressful learning. In Purkinje cells in slices prepared from animals trained in classical eyeblink conditioning, an increase in unspecific excitability (measured as the effect of direct current stimulation) was still present after 30 days.
Although membrane potential, input resistance and AP amplitude did not differ between cells from paired and control animals, a potassium channel-mediated transient hyperpolarization was smaller in cells from trained animals [1100]. These effects were observed because, in the trained animals, some of the cells had a low threshold and might be more susceptible to impalement by an electrode.

Values of responses in our experiments depended on the contingency between appearance of these stimuli (CS+ and CS~) and a painful US. During an extinction session, this contingency changes and responses to the signals change, too. During the classical paradigm, the conditioned response increased, but it decreased during extinction sessions and recovered rapidly during reacquisition (Fig. 3.4, top). This was not the case for responses to the discriminated stimulus.

Therefore, we investigated how a completely compensated membrane potential behaves during classical conditioning in a Helix mollusk after cancellation of the unconditioned painful stimulation. For this purpose, it is important to follow the membrane potential over a whole training session, as the contingency alters.

Depolarization is an immediate result of negative reinforcement. Punishment evokes excitation and a long residual depolarization. However, this depolarization did not accumulate, since the membrane potential, on average, stays the same during acquisition of the conditioned reflex. Hence, during training, a homeostatic protection counteracts the deleterious influences of punishment. The existence of such compensation is revealed after a break in training. Homeostasis is an inertial system and it temporarily continues to operate after the stressor is withdrawn. Interruption of stimulation after acquisition and after extinction disturbs the homeostatic equilibrium that had reached a steady state during training. Homeostasis, evidently, completely compensates for excitations evoked by unconditioned stimuli by means of prolonged compensatory hyperpolarization. Since homeostatic compensation carries on after acquisition is interrupted, this compensation influences the membrane potential. Depolarization of the membrane through the action of unconditioned stimuli can affect ion homeostasis and disturb excitability, but it is compensated by some unknown homeostatic process that becomes visible after a pause in conditioning (arrow in Fig. 3.4, bottom). After completion of the acquisition session and cancellation of the unconditioned stimulus, the compensatory mechanism continues to hyperpolarize neurons and the value of the membrane potential increases, as can be seen from the voltage shift at the onset of extinction. The unconditioned stimuli were omitted during the extinction session and the membrane potential then recovered, though the response to the conditioned stimulus decreased. The abolition of the unconditioned stimulus evokes rapid hyperpolarization followed by slow recovery of the membrane potential.
A 5-10 minute break between acquisition and extinction approximately corresponds to a period of stability of the membrane potential during acquisition (Fig. 3.4). During extinction, the compensation gradually decreases and the membrane potential recovers. Although extinction outwardly resembles habituation, the membrane potential of neurons does not change during habituation, but it does change during extinction. The main difference between extinction and habituation is that reinforcement is abolished during extinction, whereas it is absent throughout habituation.

Under motivational tension, classical conditioning evidently increases vulnerability to damage, although compensation restores the normal level of membrane potential. Complete compensation of background neuronal activity is also observed after depotentiation of LTP. During habituation, during classical conditioning and after completion of tetanic stimulation, the appearance of external impacts is predictable and homeostasis leads to recovery of the initial conditions. During classical conditioning, the conditioned stimulus is always followed by the unconditioned stimulus, independently of the generation or failure of an AP in response to the conditioned stimulus. However, homeostasis cannot recover the initial conditions when the appearance of punishment or reward is unpredictable, since homeostatic compensation is a slow process that needs some minutes or more to develop. Therefore, homeostasis fails to recover normal conditions if the environment changes faster. This phenomenon is well illustrated by experiments with an irregular appearance of salient irritations. For example, abrupt cancellation of narcotic doses in drug addicts leads to neuronal overexcitation and evokes withdrawal syndromes. At the same time, regular consumption of drugs is accompanied by compensation and even overcompensation (if, of course, one does not take the remote consequences into consideration). Thus, long-term (3-month) ingestion of high doses of ethanol causes a decrease in the spontaneous firing of Purkinje cells and impairment of motor coordination in mice, if the experiments are carried out without withdrawal. These alterations in Purkinje cell firing did not affect the ability to learn or to recall a motor coordination task [1118].

The temporary increase in the neuronal response to the CS+ at the beginning of extinction was not connected with the recovery of excitability by means of the protective hyperpolarization that was observed at the same time. Such background depolarization is unspecific and cannot increase the response to the CS+ while the response to the CS~ did not change. The recovering influence of protective inhibition during damage to excitability evidently arises as an anticipatory homeostatic reaction to the appearance of the CS+, which is causally connected with punishment. This is the main meaning of the selective change in excitability and, probably, of selective protection. At any given moment, the excitability of a neuron is its general property. However, excitability may be transiently changed in response to the current stimulus. This change depends on experience and, perhaps, on the presence at that moment of some homeostatic protection.

Thus, we have considered how a neuron’s properties change during habituation and classical conditioning in connection with motivational behavior. The level of motivation in these cases depends upon the presence of a US in the environment, that is, upon a reward or punishment, which protects or injures neurons, and this is displayed as a modification of the state of the excitable membrane. Besides, during these simple forms of learning, expectation of a US changes during the training session: it decreases during habituation and during extinction of classical conditioning, but it increases during acquisition. At the end of acquisition, expectation of a US after a CS+ reaches a maximum, while expectation of a US after the CS~ tends to decrease to a minimum. The value of expectation is important for determining the value of homeostatic compensation. Compensation that runs far ahead of the impending damage can be properly generated only if expectation is high enough. In this case, damage and protection may counterbalance each other and the membrane potential will acquire a stable character, as, for instance, during classical conditioning. Abolition of the US leads to uncertainty of expectation and to a loss of the established equilibrium. Such evaluation of impending reward or punishment is also essential during elaboration of instrumental conditioning, but here there are some peculiarities. In classical conditioning, the properties of the environment are very simple and do not depend on the animal’s actions. In such cases, neuronal homeostasis can compensate for the harmful effects of an unconditioned stimulus, while instrumental conditioning is a much more complex process. Generation of voluntary actions is especially tightly connected with evaluation of coming events, but evaluation of expectation is a much more difficult task, since it additionally depends on the correspondence between the animal’s own actions and future events.
We will consider the problem of goal-directed behavior in the next topic, but now let us consider how the properties of the cellular membrane are altered during instrumental conditioning. During instrumental conditioning, if an experimental (trained) neuron did not respond to the conditioned stimulus, the animal received a shock, while firing of the control neuron remained unrelated to the shock. Thus, the meaning of the originally neutral stimulus was changed for the trained neuron, but not for the control neuron. The neural system had to determine which neuron was responsible for avoidance of the punishment. Fig. 3.5 portrays the results of a representative experiment. The dynamics of neuronal activity were not monotonic. Fig. 3.5 demonstrates both correct and incorrect responses, depending on the reaction of the trained neuron to the CS+. The US was presented only when the trained neuron RPa3 (top curve in each frame) failed to generate an AP. The appearance of the US did not depend on the activity of the control neuron LPa3 (bottom curve in each frame) or on responses to the CS~. This neuronal analog of instrumental conditioning satisfied all the well-known properties of instrumental behavior.
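The contingency used in this neuronal analog of instrumental conditioning can be written out as a small decision rule. The following is only an illustrative sketch: the function name, argument names and the three example trials are hypothetical and are not taken from the experiments; only the rule itself (a US follows a CS+ when the trained neuron fails to generate an AP, regardless of the control neuron and of responses to the CS~) comes from the text.

```python
def us_delivered(stimulus, trained_spiked, control_spiked):
    """Decide whether a US follows a stimulus under the described contingency.

    stimulus: "CS+" or "CS~"; the two *_spiked flags say whether each neuron
    generated an AP in its response. All names here are illustrative.
    """
    if stimulus != "CS+":
        # Responses to the discriminated stimulus never lead to a US.
        return False
    # control_spiked is intentionally unused: the activity of the control
    # neuron has no bearing on delivery of the US.
    return not trained_spiked


# Hypothetical trials: (stimulus, trained neuron spiked, control neuron spiked)
trials = [
    ("CS+", False, True),   # trained neuron fails: US, despite the control spike
    ("CS+", True, False),   # trained neuron fires: no US, despite control failure
    ("CS~", False, False),  # discriminated stimulus: never followed by a US
]
outcomes = [us_delivered(*t) for t in trials]
print(outcomes)
```

The rule ignores the control neuron by construction, which is what makes the comparison between the trained and control neurons informative about output selectivity.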

The response of the trained neuron to an originally neutral conditioned stimulus decreased at the beginning of the learning session but, at the end, an instrumental reaction was selectively generated by the trained neuron (Fig. 3.5). Only the responses of the trained neuron to the conditioned stimulus did not decrease significantly after training. During elaboration of the instrumental reflex, the responses of the trained neuron to the conditioned stimulus decreased during the first trials, which gave rise to a harmful unconditioned stimulus (Fig. 3.5, trials N 11, 14, conditioned stimulus), but were later restored (Fig. 3.5, trial N 38, conditioned stimulus).

On average, at the beginning of training, up to the 25th trial, responses of the trained neuron to the CS+ decreased, and the animal received more and more punishments. Thereafter, the trained neuron began to generate an AP in response to the CS+ and the number of punishments decreased, although they did not cease (Fig. 3.6).

Fig. 3.7 demonstrates average data for the change in the responses of the trained and control neurons during instrumental conditioning. The learning curves had different shapes for the responses of the trained and control neurons to the conditioned stimulus (Fig. 3.7, top), but similar shapes for responses to the discriminated stimulus (Figs. 1.8, 1.9). Responses of the trained neuron to an originally neutral CS+ decreased at the beginning of a session (Fig. 3.7, top) and gave rise to the emergence of a harmful US. Then the neuron increased the response that served as the instrumental reaction and the number of harmful stimuli decreased. The responses of the control neurons to the CS+ decreased after the response passed a second maximum (Fig. 3.7, top, trials 20-30). In this period of training the control neuron generates an erroneous instrumental reaction, instead of the correct instrumental reaction of the trained neuron [1271]. At the end of the learning session, an instrumental reaction was selectively generated by the trained neuron AP.

Fig. 3.5. Representative intracellular recordings of neuronal responses during elaboration of a local instrumental conditioned reflex with the trained neuron RPa3 (top in each frame) and the control neuron LPa3 (bottom) in the Helix mollusk. The number of the CS+ is indicated for each exposure. For the responses to the CS~, the number of the preceding CS+ is indicated. Failure of the spike in the responses to CS+ numbers 1, 6, 11, 22, 30, 38 (circle) in the trained neuron results in a US (triangles). Spike generation in the control neuron does not prevent delivery of a US (trials 1, 6, 22) and spike failure does not result in a US (trials 16, 41). Responses to the CS~ are shown with a shorter exposure (squares). Arrows at trials CS+ 16 and CS+ 27 indicate reactions that arose in the trained and control neurons, respectively, before the responses to the US (probably a reaction to time). Mean membrane potential during recording was -58.8 mV in the trained neuron and -60.2 mV in the control neuron. Dotted lines indicate the -65 mV level. Calibrations are shown on the plot.

Fig. 3.6. Share of incorrect responses of the trained neuron to the CS+ during instrumental conditioning. Average for all recorded neurons. Standard errors are indicated. Abscissa: number of trials. Ordinate: probability of failure of action potential generation in the trained neurons after presentation of the CS+.
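As a rough illustration of how a curve like the one in Fig. 3.6 could be computed, the sketch below estimates, for each CS+ trial, the share of trained neurons that failed to generate an AP. The data layout (one list of spike/no-spike flags per neuron), the function name and the toy records are assumptions made for this example only; they are not the recordings behind the figure.

```python
def failure_probability(records):
    """Per-trial share of neurons that failed to spike in response to the CS+.

    records: one list per trained neuron; each entry is True if the neuron
    generated an AP on that CS+ trial and False if it failed.
    All lists are assumed to cover the same number of trials.
    """
    n_neurons = len(records)
    n_trials = len(records[0])
    shares = []
    for trial in range(n_trials):
        failures = sum(1 for neuron in records if not neuron[trial])
        shares.append(failures / n_neurons)
    return shares


# Hypothetical records for three trained neurons over four CS+ trials.
records = [
    [True, False, False, True],
    [True, True, False, True],
    [False, False, False, True],
]
shares = failure_probability(records)
print(shares)
```

A failure on a CS+ trial is also a trial on which the US is delivered, so under this contingency the same curve tracks the share of punished trials.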

The neural system acquired information about the instrumental action through several stages, with varying intermediate "conclusions" about the salient features of the ways to avoid punishment (see topic 1). The response of the control neuron to the conditioned stimulus decreased in two phases, on either side of a transient increase (erroneous instrumental reaction, trials 20-30; Fig. 3.7). When the control neuron generated an erroneous instrumental reaction, the trained neuron decreased its participation in the reaction. Thereafter, the trained neuron began to generate an instrumental action potential in its response to the conditioned stimulus. Therefore, during instrumental conditioning, the dynamics of neuronal responses to a conditioned tactile stimulus were much more complex and consisted of several phases, which points to the fact that neurons have no time in which to develop a strategy to compensate for the harmful influence of punishments. Thus, predictability of environmental alterations appears essential for the development of compensation, and there appears to be a relationship between a compensational change in the membrane potential and the predictability of the behavioral outcome. Homeostasis can compensate for steady damage but is unable to compensate for influences that vary unpredictably in time.

Fig. 3.7. Behavior of neuronal activity during the acquisition of instrumental conditioning. Top: the number of APs in the responses to the conditioned stimulus (medians, Mann-Whitney U test). Bottom: change in the average membrane potential during training, mV. Confidence intervals are shown, p < 0.05. Solid symbols represent the data for the control neurons and open symbols the data for the trained neurons. The interval of trials 20-30 is indicated.

At the end of the training session, an instrumental reaction was selectively represented by the trained neuron AP, i.e. the learning process demonstrated output selectivity (Fig. 3.7, top). Responses of both the trained and the control neuron to the CS~ decreased as a result of training (Fig. 3.5, cf. trials N 1, CS~ and N 38, CS~). Therefore, the learning process also displayed input selectivity. Although the initial excitability of the control neuron with respect to the conditioned stimulus in our experiments was usually larger than that of the trained neuron, this difference in excitability is not responsible for the difference in the behavior of neuronal activity between the control and trained neurons, as was previously demonstrated for habituation and instrumental conditioning [1270, 1271].

There is little doubt that we had a satisfactory neuronal model of instrumental conditioning. The only difference between the neuronal analog of instrumental conditioning and behavioral instrumental conditioning is the use of an AP in an identified neuron instead of a motor action. During training, the animal learned to determine which neuronal discharge was essential for the avoidance of punishment. Instrumental conditioning was selective with respect to both input and output. Therefore, although we observed only neuronal activity, we may consider this neuronal activation as the instrumental action of the entire mollusk. Such behavior-like activity is called a neuronal analog of learning, since only the electrical reactions of neurons are recorded instead of behavioral reactions.

During instrumental conditioning, the maintenance of membrane potential is disturbed, or at least changed in a complex manner (Fig. 3.7, bottom). At the beginning of training, the animal has not yet received information about the particulars of the environment, the regularities of US appearance are not clear to it, and neurons have no time in which to develop a strategy to compensate for the harmful influence of punishments. The membrane potential of both neurons increased slightly at the beginning of learning. The control neuron apparently overcompensates for harmful influences and hyperpolarizes in the vicinity of a local maximum of the output response (Fig. 3.7, top); hyperpolarization of the control neuron then continued to increase and reached a maximum (-68 mV compared with -62 mV at the beginning) approximately in the middle of the session (Fig. 3.7, bottom). Later, the membrane potential of the control neuron recovered. The membrane potential of the trained neuron decreased at roughly the same time as the control neuron was hyperpolarized. Thus, during instrumental conditioning, the depolarizing influences of a painful US were not compensated for by the protective impact of homeostasis. Just when the instrumental reaction began, the trained neuron was depolarized, its membrane potential reached a minimum (in trials 28-33), and it then rapidly repolarized. The changes in membrane potential during neuronal damage (Fig. 2.1), described in topic 2, and during instrumental conditioning (Fig. 3.7, bottom) are similar.

Alterations in the membrane potential were not the only immediate reasons for the change in the responses. For example, Fig. 3.5 demonstrates that the control neuron generated a maximal response to the US when it was hyperpolarized (response to CS+ 22) and a minimal response to the US when its membrane potential decreased (response to CS+ 38). Similarly, the trained neuron did not generate an AP in response to the CS+ when it was depolarized (responses CS+ 30 and CS+ 38), while it did generate a response when its membrane potential increased (CS+ 41). Averaged data (Fig. 3.7) also demonstrated the absence of an obvious connection between the membrane potential and the value of neuronal reactions.

During instrumental conditioning, the membrane potentials in the trained and control neurons also displayed coordinated changes, but cross-correlation revealed asymmetric significant peaks (Fig. 3.3). During instrumental conditioning, trained and control neurons failed to demonstrate the highly synchronous alterations of membrane potential (Fig. 3.8) that we found for neurons during classical conditioning.

The correlation between the neuronal responses cannot be explained by similar changes during training, since the cross-correlation between the membrane potentials, first averaged across neurons, was negative and the mean membrane potentials in the trained and control neurons changed in opposite directions (Fig. 3.7, bottom).

Fig. 3.8. Time delay between the factors of neuronal activity during instrumental conditioning. Ordinate: cross-correlation between membrane potentials of the trained neuron (input) and of the control neuron (output). Cross-correlation (bars) was calculated for pairs of neurons in each experiment and the significance of the mean value (across neurons, p < 0.05, asterisk) for each lag was evaluated (Mann-Whitney U test). The interval corresponding to 15 minutes is indicated.

Within an experiment, changes in the membrane potential of the control neuron were 3-5 trials, i.e. approximately 10-15 min, ahead of the corresponding changes in the trained neuron. The latencies of membrane potential changes during conditioning are too long for the spread of an electrical signal through neural tissue, but are sufficient for cardinal reorganization of biochemical processes in the tissue and are similar both to the latency of action of motivationally relevant substances and to the latency of necrotic changes in neuronal electrical activity.
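A lead of several trials between two membrane potential series can be estimated with a lagged cross-correlation. The sketch below is a minimal illustration under stated assumptions, not the procedure actually used for Fig. 3.8: the membrane potential values are invented, the sign convention (positive lag means the control neuron leads) is chosen for this example, and the 3 min per trial figure is inferred only from the statement that 3-5 trials correspond to approximately 10-15 min.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def lagged_correlation(trained, control, max_lag):
    """Correlate control[t] with trained[t + lag] for each lag in trials.

    A peak at a positive lag means that changes in the control neuron
    precede the corresponding changes in the trained neuron.
    """
    n = len(trained)
    result = {}
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            x, y = control[:n - lag], trained[lag:]
        else:
            x, y = control[-lag:], trained[:n + lag]
        result[lag] = pearson(x, y)
    return result


# Invented per-trial membrane potentials (mV); the trained neuron repeats
# the course of the control neuron four trials later.
base = [-62.0, -61.5, -63.0, -64.5, -66.0, -68.0, -67.0, -65.0,
        -63.5, -62.0, -61.0, -60.5, -61.5, -62.5, -61.0, -60.0,
        -59.5, -61.0, -62.0, -61.5, -60.5, -61.0, -62.5, -62.0]
control = base[4:]
trained = base[:-4]

corr = lagged_correlation(trained, control, max_lag=6)
best_lag = max(corr, key=corr.get)

MINUTES_PER_TRIAL = 3  # assumption inferred from "3-5 trials, approximately 10-15 min"
lead_minutes = best_lag * MINUTES_PER_TRIAL
print(best_lag, lead_minutes)
```

In the experiments the correlation was computed for each pair of neurons and the significance of the mean across neurons was then tested for each lag; the sketch covers only the first of these steps for a single pair.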

We must remember that all the dependencies of responses on membrane potential considered here were obtained from spontaneous changes in the state of the neuron. Incidentally, direct artificial hyperpolarization of neurons by a current pulse can change only unspecific excitability. However, there are fundamental difficulties in examining this suggestion experimentally, since neurons accept a current pulse as a signal and this procedure changes the quality of the learning. And since we studied only alterations of membrane potential that arose spontaneously, it was not clear what causal connection exists between changes in membrane potential and neuronal responses. We do not rule out the possibility of parallel development of these alterations or the existence of one common cause. Therefore, an increase in specific responses against a background of hyperpolarization may not be determined by the influence of the hyperpolarization itself. It is, of course, known that hyperpolarization affects responses differently during habituation and classical conditioning. Nevertheless, as we have already discussed, hyperpolarization in both cases promotes learning-evoked reorganization of responses and thus protects the organism. This is in agreement with the belief that inhibition has a protective function, though inhibition may be directed not only at electrical events but also at metabolic processes, for example by means of inhibitory Gi proteins. Therefore, it is clear that the first approximation for the paradoxical generation of a neuronal reaction by means of protective hyperpolarization, which was considered in topic 2, is deficient in the given case. Indeed, the beneficial reorganization of neuronal plasticity during learning can be explained by protective amelioration of the training-induced deterioration of membrane properties only in particular cases.
In some cases learning decreases excitability (habituation), whereas in other cases (classical conditioning) excitability increases, and both modifications are beneficial for the animal. In both cases, hyperpolarization was protective (since it promotes learning-related plasticity), but this protective influence was inhibitory for habituation, while it was excitatory for classical conditioning. Not only hyperpolarization but also postsynaptic chemical influences may produce ambivalent effects.
