| Who | When |
| Charles Elkan (#137) | 12-08-2008 12:09 PM ET (US) |
What if we stop when the policy doesn't change and the change in Q is smaller than some tunable threshold that we find by experimenting?
Sounds reasonable. In your report, explain the reasoning and experiments that justify your procedure.
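The stopping rule proposed above can be sketched as follows. This is a minimal sketch, assuming a hypothetical Q table stored as a dict keyed by (state, action); the names `greedy_policy`, `should_stop`, and `epsilon` are illustrative, and picking `epsilon` is exactly the experimentation the answer asks you to justify in the report.

```python
from typing import Dict, Hashable, Tuple

# Hypothetical representation: Q maps (state, action) pairs to floats.
QTable = Dict[Tuple[Hashable, Hashable], float]


def greedy_policy(q: QTable) -> Dict[Hashable, Hashable]:
    """Return the action with the highest Q value in each state (ties: first seen)."""
    best: Dict[Hashable, Tuple[Hashable, float]] = {}
    for (s, a), value in q.items():
        if s not in best or value > best[s][1]:
            best[s] = (a, value)
    return {s: a for s, (a, _) in best.items()}


def should_stop(q_old: QTable, q_new: QTable, epsilon: float) -> bool:
    """Stop when the greedy policy is unchanged AND no Q value moved by epsilon or more."""
    if greedy_policy(q_old) != greedy_policy(q_new):
        return False
    keys = set(q_old) | set(q_new)
    # Entries missing from one snapshot are treated as 0.0 (an assumption).
    max_change = max((abs(q_new.get(k, 0.0) - q_old.get(k, 0.0)) for k in keys), default=0.0)
    return max_change < epsilon
```

Comparing snapshots taken every few episodes, and requiring the condition to hold for several consecutive checks, is one way to make the rule less sensitive to noise; whether that helps on a given problem is something to test and report.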
| Meir Schwarz (#138) | 12-08-2008 04:03 PM ET (US) |
I showed up to discussion section at 11:05 but there was no one in the room.
| matt (#139) | 12-08-2008 04:30 PM ET (US) |
Yeah, I figured that since the TA didn't respond to email, there wasn't a section.
As for the final, is it possible for us to start at 8am if we want? I think I need every minute I can get to work on the test, even if it's made shorter.
| Meir Schwarz (#140) | 12-08-2008 10:06 PM ET (US) |
/m137: This doesn't seem to work very consistently. Is there any sort of convention on when to stop?
| Charles Elkan (#141) | 12-08-2008 11:16 PM ET (US) |
/m138, /m139: I apologize for the lack of communication. The TA told me earlier that no one came to the section. I'll investigate.
| Charles Elkan (#142) | 12-08-2008 11:18 PM ET (US) |
/m137, /m140: I don't know any convention about when to stop Q-learning. If you have interesting findings, discuss them in your report. (We never penalize anyone for bad results, just occasionally for bad design decisions or bad explanations.)
| Meir Schwarz (#143) | 12-09-2008 02:08 PM ET (US) |
Can you give us a breakdown of what the final is going to look like (e.g. one question on each of these topics, 10 true/false questions on these...)? Any direction to help focus what we should study most would be appreciated.
| Charles Elkan (#144) | 12-09-2008 03:56 PM ET (US) |
/m143: The final covers all topics from the first nine weeks of the quarter, and from the assignments. The material in the last week's lectures by Dr. Noto is not included. The exam will be similar to the midterm in format, but maybe 50% longer. The instructions will be the same: open book, bring calculator, etc. There will be multipart regular questions (more than half of all points) and also true/false questions (less than half).
| Mike Rose (#145) | 12-09-2008 05:25 PM ET (US) |
How can we find V(start) from the Q values Q(start, a)? Is there a good way to estimate it? It is supposed to be one of the metrics to report.
| Charles Elkan (#146) | 12-09-2008 06:20 PM ET (US) |
You could use the definition V(start) = Q(start, a), where a is the action recommended by the final learned policy.
However, the final Q values may not be perfectly accurate, so it is better to do what the project description says: use policy evaluation to measure the quality of the final learned policy.
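To compare the two ways of getting V(start) described above, here is a minimal sketch under a hypothetical model format (the project's actual state, action, and reward representation is not given here). `v_from_q` simply reads the learned Q table at the policy's action; `evaluate_policy` does iterative policy evaluation, repeatedly applying the Bellman backup for the fixed policy until the values stop changing. Solving the corresponding linear system directly is the other standard way to do policy evaluation.

```python
from typing import Dict, List, Tuple

# Hypothetical model format: model[s][a] = (expected_reward, [(prob, next_state), ...]).
# Terminal states are absent from `model` and are treated as having value 0.
Model = Dict[str, Dict[str, Tuple[float, List[Tuple[float, str]]]]]


def v_from_q(q: Dict[Tuple[str, str], float], state: str, policy: Dict[str, str]) -> float:
    """V(state) = Q(state, a), where a is the action the learned policy recommends."""
    return q[(state, policy[state])]


def evaluate_policy(model: Model, policy: Dict[str, str], gamma: float,
                    tol: float = 1e-10, max_sweeps: int = 100_000) -> Dict[str, float]:
    """Iterative policy evaluation with in-place updates:
    V(s) <- R(s, pi(s)) + gamma * sum_{s'} P(s' | s, pi(s)) * V(s')."""
    v = {s: 0.0 for s in model}
    for _ in range(max_sweeps):
        delta = 0.0
        for s in model:
            reward, transitions = model[s][policy[s]]
            new_v = reward + gamma * sum(p * v.get(s2, 0.0) for p, s2 in transitions)
            delta = max(delta, abs(new_v - v[s]))
            v[s] = new_v
        if delta < tol:
            return v
    raise RuntimeError("policy evaluation did not converge; the policy may never terminate")
```

For example, with one state where action "go" reaches the goal (reward 1) with probability 0.8 and otherwise stays put, the expected reward is 0.8 and, with gamma = 0.9, the evaluated value is 0.8 / (1 - 0.9 * 0.2), about 0.976. A learned Q(start, "go") can differ from this number, which is why the evaluated value is the better metric.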
| Meir Schwarz (#147) | 12-09-2008 09:54 PM ET (US) |
What exactly are we looking for in the V values when evaluating the policy? We get the optimal policy more than 9 times out of 10 without looking at the Vs. Do we need to change something?
| Meir Schwarz (#148) | 12-09-2008 10:28 PM ET (US) |
Out of curiosity, I added code that runs policy iteration every 50 passes of the learning loop and stops if V(1) > 0.7. When it works, it ends up with the optimal policy after 50 passes. Most of the time, though, it causes an infinite loop, and sometimes it even prints "matrix is singular to working precision" repeatedly.
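One plausible reading of the "matrix is singular" message, which looks like MATLAB's warning for a numerically singular linear solve: exact policy evaluation solves (I - gamma * P_pi) V = R_pi, and when gamma = 1 and the policy has a set of non-terminal states it can never leave, that matrix is singular. The same kind of policy would also explain an infinite loop if an episode is simulated without a step limit. The sketch below (pure Python; the function name `solve_policy_values` and the matrix layout are hypothetical) does the solve by Gaussian elimination with partial pivoting and reports the singular case instead of returning meaningless numbers.

```python
from typing import List


def solve_policy_values(p_pi: List[List[float]], r_pi: List[float], gamma: float,
                        eps: float = 1e-12) -> List[float]:
    """Solve (I - gamma * P_pi) V = R_pi by Gaussian elimination with partial pivoting.

    p_pi[i][j] is the probability of moving from non-terminal state i to
    non-terminal state j under the policy. Probability mass that goes to a
    terminal state is left out, so a row may sum to less than 1.
    """
    n = len(r_pi)
    # Augmented matrix [I - gamma * P | R].
    a = [[(1.0 if i == j else 0.0) - gamma * p_pi[i][j] for j in range(n)] + [r_pi[i]]
         for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        if abs(a[pivot][col]) < eps:
            raise ValueError("I - gamma*P is (numerically) singular; with gamma = 1 this "
                             "usually means some states never reach a terminal state")
        a[col], a[pivot] = a[pivot], a[col]
        # Eliminate this column from the rows below the pivot.
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution on the upper-triangular system.
    v = [0.0] * n
    for i in reversed(range(n)):
        v[i] = (a[i][n] - sum(a[i][j] * v[j] for j in range(i + 1, n))) / a[i][i]
    return v
```

With two states that send the agent back and forth forever, gamma = 1 raises the error, while gamma = 0.5 gives a finite answer; a policy that reaches a terminal state with some probability from every state stays solvable even at gamma = 1.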
| Charles Elkan (#149) | 12-09-2008 11:07 PM ET (US) |
/m147, /m148: Evaluating the policy cannot be part of the agent's learning process. It is simply a way for you, the programmer, to measure the success of the learning process you design. Since the agent could never evaluate a policy with policy iteration (PI), the learning process cannot decide to stop based on this. However, after the agent stops learning using a heuristic, you, the programmer, can run PI. Your goal is to invent a heuristic that the agent can use to stop as quickly as possible with a policy that is as good as possible.
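The division of labor described above can be made concrete with a toy example. In this sketch (a hypothetical two-state MDP, not the course project), `learn` runs Q-learning and decides when to stop using only sampled transitions and its own Q table, with a heuristic in the spirit of /m137; `evaluate` uses the known model and is called by the programmer only after learning has stopped. The values of `check_every`, `patience`, `tol`, and the other parameters are arbitrary choices for illustration.

```python
import random
from typing import Dict, Tuple

# Hypothetical toy MDP: states "A" and "B", terminal state "G".
# STEP[(state, action)] = (next_state, reward); transitions are deterministic here.
STEP = {
    ("A", "left"): ("A", 0.0), ("A", "right"): ("B", 0.0),
    ("B", "left"): ("A", 0.0), ("B", "right"): ("G", 1.0),
}
STATES, ACTIONS, TERMINAL, GAMMA = ["A", "B"], ["left", "right"], "G", 0.9


def greedy(q: Dict[Tuple[str, str], float]) -> Dict[str, str]:
    # max() returns the first maximal action, so ties go to ACTIONS[0].
    return {s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in STATES}


def learn(seed: int = 0, alpha: float = 0.5, explore: float = 0.2, check_every: int = 20,
          patience: int = 5, tol: float = 1e-3, max_episodes: int = 5000, max_steps: int = 100):
    """Q-learning whose stopping rule looks only at the agent's own Q table."""
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in STATES for a in ACTIONS}
    snapshot, stable_checks = dict(q), 0
    for episode in range(1, max_episodes + 1):
        s = "A"
        for _ in range(max_steps):  # step cap so a looping policy cannot hang the run
            a = rng.choice(ACTIONS) if rng.random() < explore else greedy(q)[s]
            s2, r = STEP[(s, a)]
            target = r if s2 == TERMINAL else r + GAMMA * max(q[(s2, b)] for b in ACTIONS)
            q[(s, a)] += alpha * (target - q[(s, a)])
            if s2 == TERMINAL:
                break
            s = s2
        if episode % check_every == 0:
            # Heuristic: greedy policy unchanged and Q nearly still since the last check.
            change = max(abs(q[k] - snapshot[k]) for k in q)
            same = greedy(q) == greedy(snapshot)
            stable_checks = stable_checks + 1 if (same and change < tol) else 0
            snapshot = dict(q)
            if stable_checks >= patience:
                break
    return greedy(q), episode


def evaluate(policy: Dict[str, str], sweeps: int = 1000) -> Dict[str, float]:
    """Programmer-side policy evaluation using the known model; never called by learn()."""
    v = {s: 0.0 for s in STATES}
    for _ in range(sweeps):
        for s in STATES:
            s2, r = STEP[(s, policy[s])]
            v[s] = r + (0.0 if s2 == TERMINAL else GAMMA * v[s2])
    return v
```

Typical use is `policy, episodes = learn()` followed by `evaluate(policy)["A"]`. Because `evaluate` never feeds back into `learn`, the number it returns (here V("A"), playing the role of V(1)) measures the learning process without influencing when it stops.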
| Meir Schwarz (#150) | 12-09-2008 11:25 PM ET (US) |
Edited by author 12-10-2008 02:08 AM
/m149: OK, that makes more sense. So V(1) is just a metric that tells us how well a particular run did (the higher, the better)?
| Charles Elkan (#151) | 12-10-2008 01:10 PM ET (US) |
So V(1) is just a metric that tells us how well a particular run did (the higher the better)? Yes, where "run" means "learning process."
| Charles Elkan (#152) | 12-10-2008 01:15 PM ET (US) |
The final exam will start at 8am tomorrow (Thursday).
Some people prefer the full three hours, which is quite reasonable and is the official UCSD policy, so we will follow this norm and go from 8am to 11am.