QuickTopic (SM) free message boards
Topic: PolicyBlocks: An Algorithm for Creating Useful Macro-Actions in RL
Branched from topic: Cluster validation by prediction strength
tonybattery  14
07-01-2008 10:49 PM ET (US)
Messages 13-12 deleted by topic administrator between 05-16-2008 02:21 AM and 07-22-2006 09:26 AM
Deanne  11
07-21-2006 04:42 PM ET (US)
Hello, excellent design and nice discussion, best wishes from me! actos webpage devoted to actos. buy acne medication on line webpage devoted to buy acne medication on line. is also very good.
 
Messages 10-9 deleted by topic administrator 07-21-2006 08:57 AM
Dustin Boswell  8
04-15-2003 01:18 AM ET (US)
I'm curious how well PolicyBlocks would apply to games. In the intro, the authors allude to chess as an example, where opening moves (and endgames and general strategies?) would be macros for that domain. Clearly, chess has too large a state space to attack the problem this way, but is this the only problem?

One thing I'd like to know is why gamma (the discount factor) has to be strictly less than 1. Algebraically, I guess the infinite sum won't converge otherwise, but in a situation like chess (or other games) there isn't any inherent discounting, is there? (That is, I don't care how many moves it takes me to win, as long as I do.) In situations like these, do you just let gamma=1 and ignore the requirement, or do you set gamma=0.99999 and feel better? :)
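To make the gamma question concrete, here is a minimal Python sketch. It is not from the paper; the reward sequences are made up (0 on every move, +1 on the winning move). It suggests that gamma=1 causes no trouble when every episode ends, because the sum is finite, and that the convergence problem only shows up for reward streams that never terminate.

```python
def discounted_return(rewards, gamma):
    """Sum of gamma**t * r_t over a finite reward sequence."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total


# Hypothetical episodic game: reward 0 on every move, +1 on the winning move.
short_win = [0.0] * 9 + [1.0]    # win on move 10
long_win = [0.0] * 49 + [1.0]    # win on move 50

# gamma = 1: both wins are worth the same, so game length does not matter.
print(discounted_return(short_win, 1.0), discounted_return(long_win, 1.0))

# gamma < 1: the quicker win is worth more.
print(discounted_return(short_win, 0.99), discounted_return(long_win, 0.99))

# A constant reward of 1 per step: with gamma = 1 the partial sums grow
# without bound as the horizon grows; with gamma = 0.9 they stay below
# 1 / (1 - 0.9) = 10.
for horizon in (10, 100, 1000):
    stream = [1.0] * horizon
    print(horizon, discounted_return(stream, 1.0), discounted_return(stream, 0.9))
```

So in this toy setting gamma=1 expresses "I don't care how many moves it takes", while any gamma below 1 builds in a preference for faster wins.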
Eric  7
04-14-2003 07:05 PM ET (US)
For Douglas' question:
  Reuse creates one option that is a random combination of all the previous optimal policies. Depending on how long the agent follows this policy until it chooses another action, the agent may wind up running around in circles for a long time.

For Dave:
  My guess is that many, if not all, of the options are sub-optimal for the test problems. This is OK because there are primitive actions to fall back on. The overfitting comes in when exploration is considered. If the options are formed from only a few sample problems, there may be several options that essentially direct the agent to a particular area of the state space. This is the bias that the paper talks about.
Andrew Smith  6
04-14-2003 07:02 PM ET (US)
If I am interpreting this correctly, REUSE outputs a stochastic policy that is _expected_ to take the action that the majority of the input policies (the set L) take. (so if 2 say go left and 1 says go right, it goes left with prob 2/3 and right with prob 1/3).

Looking at the top-left "room" of the Grid-world (8 cells big), you can see how the example PolicyBlocks option directs the agent through the "doorway." This is to be expected -- those actions for that room should be common to most input policies, since the goal cell for each of those policies is probably not in that room. By this reasoning we should expect the REUSE policy to be similar in that room.

However, for the "doorway" in the middle (10 down, 3 from right) we would expect about half the policies to direct the agent up and half to direct the agent down so the REUSE policy would just wander randomly.

The Reservoir problem makes better use of REUSE because most of the input policies are similar (the probability that two differ on any given state is low). So there are few states analogous to the "doorway" example above in which the REUSE policy isn't sure what action to take.
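A minimal Python sketch of the voting interpretation described in this post, not the paper's actual definition of REUSE. The dict-based policy representation, the state names, and the function names are all assumptions made for illustration.

```python
import random
from collections import Counter


def reuse_action_probs(policies, state):
    """Each input policy casts one vote for its action at `state`.

    Assumes every policy is a dict mapping state -> action and is
    defined at `state` (full policies, as in the set L).
    """
    votes = Counter(policy[state] for policy in policies)
    total = sum(votes.values())
    return {action: count / total for action, count in votes.items()}


def reuse_sample(policies, state, rng=random):
    """Sample one action in proportion to the votes."""
    probs = reuse_action_probs(policies, state)
    actions = list(probs)
    weights = [probs[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


# Hypothetical case from the post: two policies say left, one says right.
room = [{"cell": "left"}, {"cell": "left"}, {"cell": "right"}]
print(reuse_action_probs(room, "cell"))

# Hypothetical "doorway" state: half the policies go up, half go down,
# so the combined policy is a coin flip there.
doorway = [{"door": "up"}, {"door": "up"}, {"door": "down"}, {"door": "down"}]
print(reuse_action_probs(doorway, "door"))
print(reuse_sample(doorway, "door"))
```

Under this reading, states where the input policies mostly agree give a nearly deterministic action, and states like the middle doorway give something close to a random walk.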
Douglas Turnbull  5
04-14-2003 05:46 PM ET (US)
It is a little unclear why Bernstein's Reuse rule is a weakness in the Grid-world task and a strength in the Reservoir problem. It is discussed in the fourth paragraph of the discussion section.
Dave Kauchak  4
04-14-2003 02:17 PM ET (US)
I found this paper very interesting. I think one of the best things that they did was try to introduce a more formal definition of their solution as an algebra, rather than just an ad hoc approach. What are the advantages and disadvantages of their creation of partial policies?

The authors don't really discuss this, but the shared information between optimal policies Li is important to how well this method will work. If the shared information is too high, then the learned partial policies will simply approximate the Li. If the shared information is too low, then when intersecting policies no partial policies will result. It would be interesting to investigate how this shared information affects performance.
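A minimal Python sketch of the two extremes, assuming a policy is a dict from state to action and that intersecting two policies keeps only the states where they pick the same action. This representation and the names below are hypothetical, and the paper's actual merge operator may differ.

```python
def intersect(policy_a, policy_b):
    """Partial policy defined only where both policies choose the same action."""
    return {s: a for s, a in policy_a.items() if policy_b.get(s) == a}


def overlap(policy_a, policy_b):
    """Fraction of policy_a's states on which the two policies agree."""
    if not policy_a:
        return 0.0
    return len(intersect(policy_a, policy_b)) / len(policy_a)


# Hypothetical optimal policies for three different goals on a 3-state problem.
goal_a = {"s1": "right", "s2": "right", "s3": "down"}
goal_b = {"s1": "right", "s2": "right", "s3": "up"}
goal_c = {"s1": "left", "s2": "up", "s3": "left"}

print(intersect(goal_a, goal_a))  # full agreement: just reproduces the input policy
print(intersect(goal_a, goal_b))  # partial agreement: a proper partial policy
print(intersect(goal_a, goal_c))  # no agreement: nothing survives the intersection
print(overlap(goal_a, goal_b))
```

Something like `overlap` could serve as a rough measure of the shared information when studying how it affects performance.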

Is it possible to have sub-optimal partial policies? Does this relate at all to the "overfitting" mentioned in the discussion?

In the discussion, they mention that the main difference between their method and SKILLS is its use of PerformanceLoss and a usage function instead of the subtraction operation. It would be interesting to implement either the subtraction method for SKILLS or PerformanceLoss for PolicyBlocks and see how these perform. This would be a nice study to confirm their speculation that the benefit comes from the subtraction method.

I wonder how this algorithm will work as the problems become more complex. In real-world problems, the size of the state space may be much, much larger than 18x18. This may make finding good partial policies with the intersection operation expensive.
Eric  3
04-14-2003 01:36 PM ET (US)
There's a typo on page 2 of the paper. In section two, the equation for accumulated discounted reward should read (in LaTeX):

\sum_{t=0}^{\infty} \gamma^t r_t

PS. ask some questions!
Charles Elkan  2
04-01-2003 04:28 PM ET (US)
New topic: Rational Kernels
Charles Elkan  1
03-31-2003 05:24 PM ET (US)