Human Choice and Reinforcement Learning (3)

Goals of Parameter Estimation

When estimating parameters for a given model, we typically aim to make an inference about an individual’s underlying decision process. We may be inferring a variety of different factors, such as the rate at which someone updates their expectations, the way that someone subjectively values an outcome, or the amount of exploration versus exploitation that someone engages in. Once we estimate an individual’s parameters, we can compare them to those of other people or even other groups of people. Further, we can compare parameters within subjects after an experimental manipulation (e.g., does drug X affect a person’s learning rate?).

Below, we will explore multiple parameter estimation methods. Specifically, we will use: (1) maximum likelihood estimation, (2) maximum a posteriori estimation, and (3) Bayesian estimation. First, we will simulate data from the models described in the previous post on a simple 2-armed bandit task. Importantly, we will simulate from known parameter values, which we will then try to estimate from the simulated data alone. We will refer to the known parameters as the true parameters.


For our simulation, we will simulate choices from a model using delta-rule learning and softmax choice. To keep things simple, the learning rate will be the only free parameter in the model. The task has two choices: choice 1 has a mean payoff of 1 and choice 2 has a mean payoff of -1. Therefore, a learning agent should be able to learn that choice 1 is optimal and make selections accordingly. However, we will add noise to each choice payoff (sigma below) to make things more realistic.
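
For reference, the model can be written out explicitly. The notation is an assumption chosen to mirror the variable names used in the R code: EV_k(t) is the expected value of choice k on trial t (ev), c_t is the choice made on trial t, r_t is the resulting outcome, and \beta is the learning rate (beta). The softmax here has no separate inverse-temperature parameter, since the learning rate is the only free parameter.

```latex
\begin{align}
\Pr(c_t = k) &= \frac{\exp\{EV_k(t)\}}{\sum_{j=1}^{2} \exp\{EV_j(t)\}} \\
EV_{c_t}(t+1) &= EV_{c_t}(t) + \beta \, \bigl(r_t - EV_{c_t}(t)\bigr)
\end{align}
```

As a worked example, with a learning rate of 0.1, a current expected value of 0, and an outcome of 4, the updated expected value is 0 + 0.1 * (4 - 0) = 0.4. The expected value of the unchosen option is left unchanged.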

The following R code simulates 100 trials using the model and task described above:

# For pretty plots and the foreach loop
library(ggplot2)
library(foreach)

# Simulation parameters
mu    <- c(1, -1)  # Mean payoff for choices 1 and 2 
sigma <- 5         # SD of payoff distributions
n_tr  <- 100       # Number of trials 
beta  <- 0.1       # True learning rate

# Initial expected value
ev <- c(0, 0) 

# Softmax choice function
logsumexp <- function (x) {
  y <- max(x)
  y + log(sum(exp(x - y)))
}
softmax <- function (x) {
  exp(x - logsumexp(x))
}

# Simulate data
sim_dat <- foreach(t=1:n_tr, .combine = "rbind") %do% {
  # Generate choice probability with softmax
  pr <- softmax(ev)
  # Use choice probability to sample choice
  choice <- sample(c(1,2), size = 1, prob = pr)
  # Generate outcome based on choice
  outcome <- rnorm(1, mean = mu[choice], sd = sigma)
  # Delta-rule learning
  ev[choice] <- ev[choice] + beta * (outcome - ev[choice])
  # Save data
  data.frame(trial   = t,
             ev_1    = ev[1],
             ev_2    = ev[2],
             choice  = choice,
             outcome = outcome)
}

# Change in expected values across trials
ggplot(sim_dat, aes(x = trial)) +
  geom_line(aes(y = ev_1), color = "red") +
  geom_line(aes(y = ev_2), color = "blue") +
  ggtitle("Red = Choice 1\nBlue = Choice 2") +
  xlab("Trial") +
  ylab("Expected Value") +
  theme_minimal(base_size = 20)

Maximum Likelihood
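
As a rough illustration of the idea, the sketch below shows one way the likelihood for this model could be set up in base R. It is a minimal sketch under stated assumptions rather than a definitive implementation: the helper names simulate_bandit and neg_log_lik are hypothetical, the data are re-simulated inside the block so that it runs on its own, and the one-dimensional optimizer optimize() is an assumed choice of fitting routine.

```r
# Sketch only: simulate_bandit() and neg_log_lik() are hypothetical helper names.
set.seed(1)

# Numerically stable softmax (subtracting the max avoids overflow)
softmax <- function(x) {
  z <- exp(x - max(x))
  z / sum(z)
}

# Simulate choices and outcomes from the delta-rule/softmax model
simulate_bandit <- function(n_tr, beta, mu = c(1, -1), sigma = 5) {
  ev      <- c(0, 0)
  choice  <- integer(n_tr)
  outcome <- numeric(n_tr)
  for (t in seq_len(n_tr)) {
    pr         <- softmax(ev)
    choice[t]  <- sample(1:2, size = 1, prob = pr)
    outcome[t] <- rnorm(1, mean = mu[choice[t]], sd = sigma)
    ev[choice[t]] <- ev[choice[t]] + beta * (outcome[t] - ev[choice[t]])
  }
  data.frame(trial = seq_len(n_tr), choice = choice, outcome = outcome)
}

# Negative log-likelihood of the observed choices for a candidate learning rate.
# The expected values are replayed trial by trial from the observed data.
neg_log_lik <- function(beta, dat) {
  ev <- c(0, 0)
  ll <- 0
  for (t in seq_len(nrow(dat))) {
    pr <- softmax(ev)
    ch <- dat$choice[t]
    ll <- ll + log(pr[ch])
    ev[ch] <- ev[ch] + beta * (dat$outcome[t] - ev[ch])
  }
  -ll
}

dat <- simulate_bandit(n_tr = 100, beta = 0.1)

# Search the unit interval for the learning rate that minimizes the
# negative log-likelihood
fit <- optimize(neg_log_lik, interval = c(0, 1), dat = dat)
fit$minimum    # estimated learning rate
fit$objective  # negative log-likelihood at the estimate
```

Because only choices enter the likelihood, a learning rate of 0 leaves both expected values at 0 and gives every choice probability 0.5, which makes a convenient sanity check on neg_log_lik. With only 100 noisy trials, the estimate in fit$minimum may sit some distance from the true value of 0.1.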

