, 2009 and Moustafa et al., 2008). In this task, participants observe a clock hand make a clockwise rotation about a clock face over a 5 s interval (Figure 1A). Participants press a button on a keypad to stop the rotation and win points. The probability and magnitude of rewards varied as a function of response time (RT), such that the expected value increased, decreased, or stayed constant for different levels of RT (Figures 1C and 1D). For a given function, participants can learn the optimal style of responding (e.g., fast or slow) to maximize their reward.

Individual subject performance on the task was fit using a previously developed mathematical model (Frank et al., 2009) that allows trial-by-trial estimates of several key components of exploratory and exploitative choices. In this model, different mechanisms advance these contradictory drives in an attempt to maximize total reward. In what follows, we discuss the key components of the model relevant to the current fMRI study (full model details are given in the Supplemental Experimental Procedures, available online). We also conducted a number of simulations using simplified and alternative models in order to assess the robustness of the effect of relative uncertainty in RLPFC and its sensitivity to the specific model instantiation. These alternative models are described fully below and in the Supplemental Information, though we refer to them briefly here.

Both exploitation of the RTs producing the highest rewards and exploration for even better rewards are driven by errors of prediction in tracking expected reward value V. Specifically, the expected reward value on trial t is:

\[ V(t) = V(t-1) + \alpha\,\delta(t-1) \tag{1} \]

where α is the rate at which new outcomes are integrated into the evaluation V and δ is the reward prediction error [RPE; Reward(t − 1) − V(t − 1)] conveyed by midbrain dopamine neurons (Montague et al., 1996).

A strategic exploitation component tracks the reward structure associated with distinct response classes (categorized as “fast” or “slow”). This component is intended to capture how participants track the reward structure for alternative actions, allowing them to continuously adjust RTs in proportion to their relative value differences. The motivation for this modeling choice was that participants were told at the outset that sometimes it would be better to respond faster and sometimes slower. Given that the reward functions are monotonic, all the learner needs to do is track the relative values of fast and slow responses and proportionately adjust RTs toward the larger value.
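As an illustration of the value update in Equation 1, the following minimal Python sketch applies the rule trial by trial. It is not the authors' fitting code: the function names, the learning rate, the initial value, and the reward sequence are all hypothetical choices made for this example only.

```python
def update_value(v_prev, reward_prev, alpha):
    """One step of Equation 1: V(t) = V(t-1) + alpha * delta(t-1).

    delta(t-1) is the reward prediction error, Reward(t-1) - V(t-1).
    Returns the updated value together with the prediction error.
    """
    delta = reward_prev - v_prev
    return v_prev + alpha * delta, delta


def track_value(rewards, alpha=0.1, v0=0.0):
    """Apply the update across a sequence of trial rewards.

    alpha and v0 are illustrative defaults (assumptions), not fitted
    parameters from the study. Returns [V(0), V(1), ..., V(n)].
    """
    values = [v0]
    for reward in rewards:
        v_next, _ = update_value(values[-1], reward, alpha)
        values.append(v_next)
    return values


if __name__ == "__main__":
    # Hypothetical reward sequence; V moves a fraction alpha of the
    # remaining distance toward each observed reward.
    print(track_value([10.0, 10.0], alpha=0.5))
```

With a constant reward, each update closes a fraction α of the remaining gap between V and that reward, so a larger α gives faster but noisier tracking of recent outcomes.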
