QuickTopic (SM) free message boards
Topic: Sequential cost-sensitive decision-making with reinforcement learning
Dana Dahlstrom (message 5)
04-04-2002 03:03 PM ET (US)
I think Kristin's objection to the notation in Section
2.1 is right; the indexing appears to be inconsistent:

In paragraph 3 the sequences of actions and rewards
begin with $a_1$ and $r_1$. The state $s_0$ is
mentioned, but not $a_0$ or $r_0$. This leads me to
suppose $r_i$ is a function of $s_{i-1}$ and $a_i$.

Equation 1 is consistent with this supposition: $r_1$
is discounted by $\gamma^{0} = 1$, so it is treated as
an immediate reward. But in Equation 2, $r_1$ is
discounted by $\gamma$, and an $a_0$ suddenly appears.
I think this can be remedied by changing $\gamma^t$ to
$\gamma^{t-1}$ and $a_0$ to $a_1$ in Equation 2.
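A minimal Python sketch of the two weightings, assuming the rewards are stored as a list $[r_1, \dots, r_T]$ and the return is truncated to a finite horizon $T$. The function names and the numbers are hypothetical, not taken from the paper. With $\gamma^{t-1}$ the first reward counts at full value, as Equation 1 implies; with $\gamma^t$ every term, including $r_1$, picks up one extra factor of $\gamma$.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma^(t-1) * r_t for rewards = [r_1, ..., r_T].

    enumerate(..., start=1) recovers the 1-based index t used in the text,
    so r_1 is weighted by gamma^0 = 1 (an immediate reward).
    """
    return sum(gamma ** (t - 1) * r for t, r in enumerate(rewards, start=1))


def discounted_return_shifted(rewards, gamma):
    """Sum of gamma^t * r_t: the same rewards, but r_1 is discounted by gamma."""
    return sum(gamma ** t * r for t, r in enumerate(rewards, start=1))


rewards = [1.0, 0.0, 2.0]  # hypothetical r_1, r_2, r_3
gamma = 0.5
print(discounted_return(rewards, gamma))          # 1.5  (1 + 0 + 0.25 * 2)
print(discounted_return_shifted(rewards, gamma))  # 0.75 (0.5 + 0 + 0.125 * 2)
```

The shifted version is exactly $\gamma$ times the unshifted one, so the two conventions differ by a constant factor rather than by which rewards are counted.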

Equation 3 is consistent with the others if the pair
$(s,a)$ corresponds to the pair $(s_t,a_{t+1})$ and $r$
corresponds to $r_{t+1}$ in the sequences. That is,
action $a_{t+1}$ is performed in state $s_t$, and
reward $r_{t+1}$ results.
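The index alignment for Equation 3 can be made concrete with a short Python sketch, assuming a trajectory is stored as parallel lists $[s_0, \dots, s_T]$, $[a_1, \dots, a_T]$ and $[r_1, \dots, r_T]$. The function name `transitions` is hypothetical; it only shows which elements would be fed to a one-step update over $(s, a, r)$ and does not reproduce Equation 3 itself.

```python
from typing import Hashable, List, Sequence, Tuple

# (s_t, a_{t+1}, r_{t+1}, s_{t+1})
Transition = Tuple[Hashable, Hashable, float, Hashable]


def transitions(states: Sequence[Hashable],
                actions: Sequence[Hashable],
                rewards: Sequence[float]) -> List[Transition]:
    """Pair up a trajectory indexed as in the post.

    states  = [s_0, s_1, ..., s_T]
    actions = [a_1, ..., a_T]   (actions[t] holds a_{t+1})
    rewards = [r_1, ..., r_T]   (rewards[t] holds r_{t+1})

    Returns (s_t, a_{t+1}, r_{t+1}, s_{t+1}) for t = 0..T-1: action a_{t+1}
    is performed in state s_t, and reward r_{t+1} results.
    """
    if not (len(actions) == len(rewards) == len(states) - 1):
        raise ValueError("expected T+1 states and T actions and rewards")
    return [(states[t], actions[t], rewards[t], states[t + 1])
            for t in range(len(actions))]


print(transitions(["s0", "s1", "s2"], ["a1", "a2"], [0.0, 1.0]))
# [('s0', 'a1', 0.0, 's1'), ('s1', 'a2', 1.0, 's2')]
```

Because Python lists are zero-based, `actions[t]` holds $a_{t+1}$; this off-by-one is the same shift the post describes between $(s,a)$ and $(s_t, a_{t+1})$.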

Perhaps I overlooked something. Does this sound right?