Greg Hamerly
04-23-2002 02:43 PM ET (US)
One thing I liked about this paper is the experiment in figures 3 & 4, in which the mentor gives misleading information. Clearly, a training signal that is better than random (from a mentor with a correct policy) will improve performance; but what about evil mentors, or error-prone mentors? This experiment speaks to those cases.
Obviously, the constraint that the mentor's state space be a subset of the observer's is a strict one. Can anyone comment on how well this restriction can be overcome?