QuickTopic (SM) free message boards
Topic: Policy invariance under reward transformations: ... reward shaping
Degui Zhi  23
04-10-2002 06:09 PM ET (US)
I was also trying to understand the potential-based shaping reward with a discount. Consider a cycle s1->s2->s1:
the shaping reward for s1->s2 is gamma*phi(s2) - phi(s1)
the shaping reward for s2->s1 is gamma*phi(s1) - phi(s2)
Adding these up (without discounting the second step), the total shaping reward around the cycle is
-(1-gamma)*(phi(s1)+phi(s2))
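
A quick numerical check of the cycle sum above, as a minimal sketch. The function names, gamma = 0.9, and the potential values are illustrative assumptions, not taken from the paper; the two steps are added without discounting the second one, as in the post.

```python
def shaping_reward(phi_s, phi_next, gamma):
    """Potential-based shaping reward for a transition s -> s'."""
    return gamma * phi_next - phi_s


def cycle_total(phi1, phi2, gamma):
    """Undiscounted sum of the shaping rewards around s1 -> s2 -> s1."""
    return shaping_reward(phi1, phi2, gamma) + shaping_reward(phi2, phi1, gamma)


gamma = 0.9  # assumed discount factor
# One positive and one negative potential (hypothetical values).
for phi1, phi2 in [(1.0, 2.0), (-1.0, -2.0)]:
    total = cycle_total(phi1, phi2, gamma)
    closed_form = -(1 - gamma) * (phi1 + phi2)
    print(phi1, phi2, total, closed_form)
```

The two printed totals agree for each pair, and the sign of the total flips with the sign of phi, which is what the question below is about.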

So the question is whether phi(.) > 0 or phi(.) < 0, since the sign of this total depends on the sign of phi. However, I think learning the optimal policy should be invariant with respect to the sign of the phi function...

I assume the authors have just relaxed the definition of a discounted potential-based function and assumed phi(.) > 0, so that the total shaping reward around a cycle is always negative. (However, in their experiments they used a negative potential, which is confusing to me.)
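
One way to probe the sign question is to discount each step by gamma^t, as the return does. The sum of gamma^t * (gamma*phi(s_{t+1}) - phi(s_t)) over T steps then telescopes to gamma^T*phi(s_T) - phi(s_0), whatever the sign of phi. The sketch below checks this on the same two-state cycle; the helper name, the path, and the potential values are assumptions for illustration only.

```python
def discounted_shaping_return(phi, path, gamma):
    """Sum of gamma**t * F(s_t, s_{t+1}) along path, with F = gamma*phi(s') - phi(s)."""
    total = 0.0
    for t in range(len(path) - 1):
        s, s_next = path[t], path[t + 1]
        total += gamma ** t * (gamma * phi[s_next] - phi[s])
    return total


gamma = 0.9  # assumed discount factor
path = ["s1", "s2"] * 3 + ["s1"]  # three laps around the cycle, 6 transitions
T = len(path) - 1

# Hypothetical potentials: one positive, one negative.
for phi in ({"s1": 1.0, "s2": 2.0}, {"s1": -1.0, "s2": -2.0}):
    summed = discounted_shaping_return(phi, path, gamma)
    telescoped = gamma ** T * phi[path[-1]] - phi[path[0]]
    print(summed, telescoped)
```

In both cases the discounted sum depends only on the start and end states, not on how many times the cycle was traversed in between.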