How we change what others think, feel, believe and do
Positive reinforcement guides learning about which stimuli and actions lead to a desired reward. It creates and strengthens mental links between action and reward, so that in future the target takes specific actions in a predictable way when given intentional cues.
It can be used to link a command with an action by ensuring the reinforcement includes both command and reward. In time, the reward can be given less and less frequently as the command takes over as the primary stimulus. This is a far more effective way of training than using negative reinforcement.
To use positive reinforcement, first catch the subject doing something you want them to do, even in a small way, then reward them. Dogs often respond to food. People often respond well to praise. Then add a command if needed. Then gradually reduce the frequency and intensity of the reward. Then gradually reduce the size of the command. Eventually you will be able to create the desired action with a simple look or small movement.
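The gradual reduction of reward described above can be sketched as a simple schedule. This is only an illustration: the function names, the linear fade and the numbers (10 fully rewarded trials, a 40-trial fade, a 10% floor) are assumptions chosen for the example, not values from any training method.

```python
import random


def reward_probability(trial, fade_start=10, fade_length=40, floor=0.1):
    """Chance of rewarding a correct response on a given trial (0-based).

    Hypothetical schedule: always reward at first, then fade linearly
    down to a small floor so that the reward never disappears entirely.
    """
    if trial < fade_start:
        return 1.0
    # Fraction of the fade completed, capped at 1.0 once the fade is over.
    progress = min(1.0, (trial - fade_start) / fade_length)
    return floor + (1.0 - floor) * (1.0 - progress)


def should_reward(trial, rng):
    """Decide whether to reward this trial, given a random.Random instance."""
    return rng.random() < reward_probability(trial)


if __name__ == "__main__":
    rng = random.Random(0)
    for trial in (0, 10, 30, 50, 80):
        print(trial, round(reward_probability(trial), 2), should_reward(trial, rng))
```

Under these assumed settings every correct response is rewarded for the first ten trials, and from trial 50 onward roughly one in ten is. Keeping a non-zero floor reflects the idea that an occasional reward helps sustain the action.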
Important principles include:
Common rewards/reinforcers include food, touch, praise, attention and play. Experiment to find what works and in which situations.
A dog is given a bit of food every time it sits when the word 'sit' is said.
A child is praised when they eat all of their dinner.
You stop for a drink after exercising in the gym.
We naturally learn through positive reinforcement: we do more of what makes us feel good, which means learning what we have to do to get the things that make us feel good. When a dog lies down in front of a fire, it likes the warm feeling it gets, so in future it seeks out fires, especially when it is cold. When a child first eats chocolate, it tastes good, so they seek more chocolate in order to re-stimulate the pleasure of the smooth, sweet taste. Life is full of such positive reinforcement, even when the result is unhelpful for the person (such as the use of narcotics) or for other people (such as the bully who enjoys the feeling of power).
The sequence of basic reinforcement is:
Need --> Seeking --> Stimulus --> Perceived reward --> Desire --> Action --> Reward --> Learning (--> Seeking...)
We start with a general need, such as to eat. This leads us either to actively seek out ways to satisfy the need or to maintain a level of monitoring that will arouse us when the need may be met. At some point, we sense something external (the stimulus). On examination, we conclude that the stimulus includes a reward, which is something that satisfies the need to some extent. This creates the tension of desire that leads us to act in order to gain the reward and satisfy the need. We learn from this pleasure, remembering all we can about the stimulus and the action that gained the reward, so next time we will be more alert and able to satisfy our desires and meet our needs.
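The learning step at the end of this loop can be pictured as a number that grows each time the action is rewarded. The sketch below uses a simple incremental update (a delta rule of the kind found in basic conditioning models), which is an assumption brought in for illustration and not something the text prescribes; `update_strength`, `run_loop` and the learning rate of 0.2 are hypothetical.

```python
def update_strength(strength, reward, learning_rate=0.2):
    """Move the stimulus-action link a step toward the reward just received.

    strength and reward are on a 0.0 to 1.0 scale (an assumed convention).
    """
    return strength + learning_rate * (reward - strength)


def run_loop(rewards, learning_rate=0.2):
    """Go round the loop once per entry in rewards and record the link strength."""
    strength = 0.0
    history = []
    for reward in rewards:
        strength = update_strength(strength, reward, learning_rate)
        history.append(strength)
    return history


if __name__ == "__main__":
    # Five rewarded repetitions followed by two unrewarded ones.
    for step, value in enumerate(run_loop([1, 1, 1, 1, 1, 0, 0]), start=1):
        print(step, round(value, 3))
```

In this toy model each rewarded pass strengthens the link by a little less than the one before, and unrewarded passes weaken it, which matches the broad shape of the loop above without claiming to model real behavior.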
Needs and rewards include:
Stimulus and consequent desire can depend on the situation, and particularly on how well the need is currently satisfied. For example, when a dog has just eaten, a food stimulus may well be less effective. Distractions can also be a problem, for example where a dog is more interested in another dog than in coming to you for a bit of food or praise.
Repetition is important in learning and conditioning, and the more we go through a reinforcement loop, the more we learn exactly what does and does not gain us the desired reward. Initially, the stimulus can be anything in the environment, from what the reward provider is wearing to what they say, as well as more direct things such as an item of food. The process of learning includes working out what is necessary for the reward, what is unimportant and how likely the reward is to be given. A dog, for example, may learn that when its owner says 'time for dinner', or even 'five o'clock', then it is about to be fed.
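One way to picture this sorting of relevant from irrelevant stimuli is as simple bookkeeping: for each cue, how often was it followed by a reward? The sketch below is a deliberately crude illustration; the cue names and the trial data are made up, and `reward_rate_by_cue` is a hypothetical helper, not a model of how a dog actually learns.

```python
from collections import defaultdict


def reward_rate_by_cue(trials):
    """Return, for each cue, the fraction of its appearances that were rewarded.

    trials is an iterable of (cues, rewarded) pairs, where cues is a set of
    cue names present on that occasion and rewarded is True or False.
    """
    seen = defaultdict(int)
    rewarded_count = defaultdict(int)
    for cues, rewarded in trials:
        for cue in cues:
            seen[cue] += 1
            if rewarded:
                rewarded_count[cue] += 1
    return {cue: rewarded_count[cue] / seen[cue] for cue in seen}


# Invented observations: the phrase always precedes food, the coats do not.
trials = [
    ({"dinner phrase", "red coat"}, True),
    ({"dinner phrase"}, True),
    ({"red coat"}, False),
    ({"dinner phrase", "blue coat"}, True),
    ({"blue coat"}, False),
]

if __name__ == "__main__":
    for cue, rate in sorted(reward_rate_by_cue(trials).items()):
        print(cue, rate)
```

With more repetitions the rates for incidental cues such as clothing drift away from 1.0, while the cue that really predicts the reward stays high, which is the working-out process described above.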
Rewards should be as small as possible while remaining effective. When using food with a dog, small rewards let you keep training longer before the dog is sated. A reward which is not sought is not currently a reward (though it may be at another time). With people, if the reward is big, it becomes extrinsic and the person will become focused on the reward rather than the intrinsic value of the action. Small rewards lead people to explain their actions as 'I wanted to do it', as the reward is too small to use as a reason for their effort. Using a range of different rewards can stop them focusing on a single reward and give you the flexibility to reward in ways suitable to the situation.
When you are working with a dog, it can be useful to have a small supply of food with you at all times. The dog will smell this and learn about it, which helps ensure that it continues to pay attention to you and is constantly ready to obey your commands.
Dogs that scrounge are training you, not the other way around. They provide a stimulus of dribbling or 'poor little doggie' expressions, which leads you to feed them, which the dog then rewards by wagging its tail. Children do this too when they nag (stimulus) then are grateful (reward).
What we call 'intelligence' includes being able to quickly work out the pattern of important stimuli and required action. A less intelligent subject will often need more repetitions to work out what is needed. From an evolutionary perspective, it seems an important survival skill to have enough intelligence to be able to learn about the stimuli and rewards in one's environment. If the environment is stable, then less intelligence is needed. In a changing and complex environment, more intelligence becomes important.
Reinforcement occurs when a stimulus that has previously led to a reward happens again, such that the stimulus leads to an increased perception of reward and consequently to more rapid and reliable action. The stimulus is now a reinforcer, as it strengthens the pattern of stimulus to action.
The point about uncertainty of reward is important, as hope is often enough to cause an intended action. This is used by dog trainers who gradually remove food (the reward) but leave a command. This eventually conditions the dog to obey the command rather than seek food. A pleasurable positive reinforcement may still be retained, such as praise, to help sustain the action.
Subjects may try to take actions that gain rewards by responding to unintended stimuli, such as dogs which scrounge when you sit down to eat (especially when you have previously given them a morsel). Just holding up a bit of food can lead them to go through a whole repertoire of sitting, lying down, and so on as they try to pre-empt your command. Do not reward them until you have a complete sequence of command and action. Early and inconsistent reward reinforces partial action (for example a dog half-sitting).