QuickTopic (SM) free message boards
Topic: Policy invariance under reward transformations: ... reward shaping
Kristin Branson  21
04-09-2002 09:46 PM ET (US)
In response to a question asked during my presentation: the experiments with the foraging robots used Q-learning. The correctness measure shown in the graph was the fraction of condition-behavior pairs in the learned policy that matched the optimal policy, which was calculated by hand. Since there are only a few actions (4), the policy learned without shaping was no better than random.
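
To make the correctness measure concrete, here is a minimal sketch of how such a fraction could be computed. It is my own illustration, not code from the experiments: the condition and behavior names (c0, b0, ...) are hypothetical placeholders, and it assumes each condition has exactly one correct behavior.

```python
def policy_correctness(learned, optimal):
    """Fraction of conditions whose learned behavior matches the optimal one.

    Both arguments map a condition to a behavior. `optimal` plays the role
    of the hand-calculated policy; all names here are hypothetical.
    """
    if not optimal:
        raise ValueError("optimal policy must contain at least one condition")
    matches = sum(1 for cond, beh in optimal.items() if learned.get(cond) == beh)
    return matches / len(optimal)


# Hypothetical hand-calculated optimal policy over four conditions.
optimal_policy = {"c0": "b0", "c1": "b1", "c2": "b2", "c3": "b3"}

# Hypothetical learned policy that agrees on c0 and c2 only.
learned_policy = {"c0": "b0", "c1": "b0", "c2": "b2", "c3": "b1"}

score = policy_correctness(learned_policy, optimal_policy)
print(score)
```

Under the same assumption, a policy that picks uniformly among 4 behaviors matches the optimal one in a given condition with probability 1/4, so a score near 0.25 would be what "no better than random" looks like on this measure.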

My slides are posted at http://www.cs.ucsd.edu/~kbranson/rewardshaping.ppt

Finally, the top of column two on page five clears up the confusion I had about the sufficiency proof. If a policy maximizes Q(M'), then it maximizes Q(M) - phi(s), because the optimal Q-function of M' equals that of M minus phi(s). Since phi(s) does not depend on the action taken in state s, the same policy also maximizes Q(M). Therefore, an optimal policy in M' is an optimal policy in M.
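
The argument can also be checked numerically on a toy problem. The following is a minimal sketch under my own assumptions, not anything taken from the paper or the slides: a small random MDP with deterministic transitions, a discount of 0.9, an arbitrary random potential phi, and shaping rewards of the form gamma * phi(s') - phi(s). All function and variable names are made up for illustration.

```python
import random

GAMMA = 0.9  # assumed discount factor


def optimal_q(next_state, reward, gamma=GAMMA, sweeps=500):
    """Q-value iteration for an MDP with deterministic transitions."""
    n_s, n_a = len(next_state), len(next_state[0])
    q = [[0.0] * n_a for _ in range(n_s)]
    for _ in range(sweeps):
        # Synchronous backup: the new table is built from the old one.
        q = [
            [reward[s][a] + gamma * max(q[next_state[s][a]]) for a in range(n_a)]
            for s in range(n_s)
        ]
    return q


def shape_rewards(next_state, reward, phi, gamma=GAMMA):
    """Add the potential-based term gamma * phi(s') - phi(s) to each reward."""
    return [
        [r + gamma * phi[next_state[s][a]] - phi[s] for a, r in enumerate(row)]
        for s, row in enumerate(reward)
    ]


def greedy(q):
    """Greedy action in each state."""
    return [max(range(len(row)), key=row.__getitem__) for row in q]


# Hypothetical random MDP M: 5 states, 4 actions.
rng = random.Random(0)
N_STATES, N_ACTIONS = 5, 4
next_state = [[rng.randrange(N_STATES) for _ in range(N_ACTIONS)]
              for _ in range(N_STATES)]
reward = [[rng.uniform(-1.0, 1.0) for _ in range(N_ACTIONS)]
          for _ in range(N_STATES)]
phi = [rng.uniform(-5.0, 5.0) for _ in range(N_STATES)]  # arbitrary potential

q_orig = optimal_q(next_state, reward)                                    # Q(M)
q_shaped = optimal_q(next_state, shape_rewards(next_state, reward, phi))  # Q(M')

# Largest deviation of Q(M') from Q(M) - phi(s) over all state-action pairs.
max_gap = max(
    abs(q_shaped[s][a] - (q_orig[s][a] - phi[s]))
    for s in range(N_STATES) for a in range(N_ACTIONS)
)
print("max |Q(M') - (Q(M) - phi)| =", max_gap)
print("same greedy policy:", greedy(q_orig) == greedy(q_shaped))
```

Within each state, every action's Q-value is shifted by the same amount phi(s), so the ranking of actions, and with it the greedy policy, does not change.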