Learning

Human Choice and Reinforcement Learning (3)

Goals of Parameter Estimation

When estimating parameters for a given model, we typically aim to make an inference about an individual’s underlying decision process. We may be inferring a variety of different factors, such as the rate at which someone updates their expectations, the way that someone subjectively values an outcome, or the amount of exploration versus exploitation that someone engages in. Once we estimate an individual’s parameters, we can compare them to those of other people or even other groups of people.

Answer to post 1

In the previous post, I reviewed the Rescorla-Wagner updating (Delta) rule and its contemporary instantiation. At the end, I asked the following question: How should you change the learning rate so that the expected win rate is always the average of all past outcomes? We will go over the answer to this question before progressing to the use of the Delta rule in modeling human choice.
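As a quick numerical check of the question above, here is a minimal Python sketch. It assumes the Delta rule takes the common form expectation_new = expectation_old + learning_rate * (outcome - expectation_old); the previous post's exact notation is not shown here, so the function and variable names below are hypothetical. Under that assumption, setting the learning rate on trial t to 1/t makes the expectation equal the running average of all outcomes seen so far.

```python
def delta_rule_running_mean(outcomes, initial_expectation=0.0):
    """Apply the Delta rule with a learning rate of 1/t on trial t.

    Assumes the update: new = old + learning_rate * (outcome - old).
    Returns the expectation after each trial.
    """
    expectation = initial_expectation
    history = []
    for trial, outcome in enumerate(outcomes, start=1):
        learning_rate = 1.0 / trial  # shrinks as more outcomes are observed
        expectation = expectation + learning_rate * (outcome - expectation)
        history.append(expectation)
    return history


# Hypothetical win (1) / loss (0) sequence, for illustration only.
outcomes = [1, 0, 1, 1, 0]
expectations = delta_rule_running_mean(outcomes)

# Running average of all outcomes up to and including each trial.
running_means = [sum(outcomes[:t]) / t for t in range(1, len(outcomes) + 1)]

for trial, (v, m) in enumerate(zip(expectations, running_means), start=1):
    print(f"trial {trial}: expectation={v:.4f}, running mean={m:.4f}")
```

Because the learning rate is 1 on the first trial, the initial expectation is overwritten immediately, so the starting value does not affect the result in this sketch.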

Short history

In 1972, Robert Rescorla and Allan Wagner developed a formal theory of associative learning, the process through which multiple stimuli become associated with one another. The most widely used example of associative learning (Fig. 1) comes straight from Psychology 101: Pavlov’s dog.

Figure 1

The idea is simple, and it is something that we experience quite often in everyday life. In the same way that Pavlov’s dog begins to drool after hearing a bell, certain cognitive and/or biological processes are triggered when we encounter stimuli that we have been exposed to in the past.