# Human Choice and Reinforcement Learning (3)

## Goals of Parameter Estimation

When estimating parameters for a given model, we typically aim to make inferences about an individual's underlying decision process. We may be inferring a variety of different factors, such as the rate at which someone updates their expectations, the way that someone subjectively values an outcome, or the amount of exploration versus exploitation that someone engages in. Once we estimate an individual's parameters, we can compare them to those of other people or even other groups of people. Further, we can compare parameters within subjects after an experimental manipulation (e.g., does drug X affect a person's learning rate?).

Below, we will explore multiple parameter estimation methods. Specifically, we will use: (1) maximum likelihood estimation, (2) maximum a posteriori estimation, and (3) Bayesian estimation. First, we will simulate data from models described in the previous post on a simple 2-armed bandit task. Importantly, we will simulate from known parameter values, which we will then try to estimate from the simulated data alone. We will refer to the known parameters as the true parameters.

## Simulation

For our simulation, we will simulate choice from a model using delta-rule learning and softmax choice. To keep things simple, the learning rate will be the only free parameter in the model. Additionally, we will simulate choices in a task with two options, where choice 1 has a mean payoff of 1 and choice 2 has a mean payoff of -1. Therefore, a learning agent should be able to learn that choice 1 is optimal and make selections accordingly. However, we will add noise to each choice payoff (sigma below) to make things more realistic.

The following R code simulates 100 trials using the model and task described above:

```r
# For pretty plots
library(ggplot2)
library(foreach)

# Simulation parameters
mu    <- c(1, -1)  # Mean payoff for choices 1 and 2
sigma <- 5         # SD of payoff distributions
n_tr  <- 100       # Number of trials
beta  <- 0.1       # True learning rate

# Initial expected values
ev <- c(0, 0)

# Numerically stable softmax choice function
logsumexp <- function(x) {
  y <- max(x)
  y + log(sum(exp(x - y)))
}
softmax <- function(x) {
  exp(x - logsumexp(x))
}

# Simulate data
sim_dat <- foreach(t = 1:n_tr, .combine = "rbind") %do% {
  # Generate choice probabilities with softmax
  pr <- softmax(ev)

  # Use choice probabilities to sample a choice
  choice <- sample(c(1, 2), size = 1, prob = pr)

  # Generate outcome based on choice
  outcome <- rnorm(1, mean = mu[choice], sd = sigma)

  # Delta-rule learning
  ev[choice] <- ev[choice] + beta * (outcome - ev[choice])

  # Save data
  data.frame(trial   = t,
             ev_1    = ev[1],
             ev_2    = ev[2],
             choice  = choice,
             outcome = outcome)
}

# Change in expected values across trials
qplot(data = sim_dat, x = trial, y = ev_1, geom = "line", color = I("red")) +
  geom_line(aes(y = ev_2), color = I("blue")) +
  ggtitle("Red = Choice 1\nBlue = Choice 2") +
  xlab("Trial") +
  ylab("Expected Value") +
  theme_minimal(base_size = 20)
```
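
With simulated data in hand, the first estimation method, maximum likelihood, can be sketched right away: write the negative log-likelihood of the observed choices as a function of a candidate learning rate, then minimize it. The sketch below is illustrative, not the post's definitive implementation; the function name `nll` and the choices it makes (looping over trials, a log-softmax computed inline for stability) are my own.

```r
# Negative log-likelihood of the delta-rule + softmax model
# for a candidate learning rate beta in (0, 1), given observed
# choices (1 or 2) and outcomes
nll <- function(beta, choice, outcome) {
  ev <- c(0, 0)  # Initial expected values, as in the simulation
  ll <- 0
  for (t in seq_along(choice)) {
    # Log softmax probability of the observed choice
    m <- max(ev)
    ll <- ll + ev[choice[t]] - (m + log(sum(exp(ev - m))))

    # Delta-rule update of the chosen option's expected value
    ev[choice[t]] <- ev[choice[t]] + beta * (outcome[t] - ev[choice[t]])
  }
  -ll
}
```

Applied to the simulated data above, something like `optim(par = 0.5, fn = nll, choice = sim_dat$choice, outcome = sim_dat$outcome, method = "Brent", lower = 0, upper = 1)` minimizes the negative log-likelihood over the learning rate; with enough trials, the estimate should land in the neighborhood of the true value of 0.1.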