| Who | When |
Messages | |
|
|
|
| Douglas Turnbull | 1 | 04-28-2003 06:13 PM ET (US) |
The paper I will be presenting involves an intelligent social agent in a text-based chat environment called LambdaMOO. You can check out the virtual environment by going to: telnet://lambda.moo.mud.org:8888
A couple of minutes of interaction with the environment will put the Cobot paper into context.
|
| Douglas Turnbull | 2 | 04-29-2003 07:07 PM ET (US) |
Sutton and Barto have a good on-line reference to reinforcement learning. The site is http://www-anw.cs.umass.edu/~rich/book/the-book.html
I will be going over sections 3.5 to 3.8 on Markov decision processes and value functions. There will also be material presented from 8.1 to 8.3 on linear function approximation.
|
| Dustin Boswell | 3 | 04-30-2003 03:27 AM ET (US) |
It would have been nice to see some examples of interactions between Cobot and active users. In the summary they say that Cobot "learned non-trivial preferences for a number of users" and they also state a goal of learning "useful, interesting, and entertaining" actions. But the paper didn't show any of that. The results they show were two strange plots with no connection to whether anything was successful in any meaningful sense. I don't mean to be so critical, but with a project like this, people will read the paper expecting "cool results".
I think the biggest problem with their setup (one they acknowledge) is that users become complacent once Cobot is marginally trained. I think they should focus more on incorporating "implicit" rewards rather than the explicit REWARD and PUNISH commands. For example, things like how much response (and what type) is generated from a topic suggestion should be direct rewards if the goal is to create a system that stimulates interesting conversation.
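To make the "implicit reward" idea concrete, here is a minimal sketch of how response volume could be turned into a reward signal. It is only an illustration of the suggestion above, not anything from the paper: the `Utterance` record, the 60-second window, and the cap of 5 repliers are all assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Utterance:
    """Hypothetical record of one chat line seen after an agent action."""
    speaker: str
    seconds_after_action: float
    text: str


def implicit_reward(responses, window=60.0, cap=5):
    """Score an action by how many distinct users replied within `window` seconds.

    The result is scaled to [0, 1]; `window` and `cap` are made-up defaults.
    """
    repliers = {
        u.speaker
        for u in responses
        if 0 <= u.seconds_after_action <= window
    }
    return min(len(repliers), cap) / cap
```

Counting distinct speakers rather than raw lines keeps one chatty user from dominating the signal. Scoring the type of response as well would need some form of text analysis on top of this.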
|
| Douglas Turnbull | 4 | 04-30-2003 11:57 AM ET (US) |
Dustin, you should be critical, because these are major weaknesses in the paper. The second graph (1b) shows that Cobot learns one preference for one user. In the whole paper, the authors illustrate only a few learned preferences. Part of the reason is that not enough data was collected to learn more preferences effectively. The authors give the first graph (1a) to show that, even with enough time, Cobot might not gather enough data because of the decreasing average reward problem.
We will go over both graphs and talk about the feedback mechanisms in today's talk.
|
| Andrew Smith | 5 | 04-30-2003 01:50 PM ET (US) |
I agree. It would have been interesting to see if the agent could interpret user responses as rewards and punishments, as opposed to a set of pre-defined feedback verbs (reward, punish, hug, spank).
Something like:
Cobot> User_A, meet User_B, B, meet A.
Cobot> It sure is quiet in here!
Cobot> ROLL CALL: WHO LIKES COOKIES?
User_B> Uggh! Shut up already!
User_A> ROLL CALL: WHO THINKS COBOT TALKS TOO MUCH?
or
Cobot> ROLL CALL: WHO THINKS SADDAM IS STILL ALIVE?
[followed by fast and relevant responses]
But I guess that brings up lots of other issues about natural text processing. Or even more difficult, IRC text processing...
|
| Neil Jones | 6 | 04-30-2003 02:51 PM ET (US) |
I'm not sure what the objective of this paper is, partly because the problem they pose is so ill-defined. On the one hand, their stated objective was to build an agent that could take unprompted (yet useful) actions without requiring the specific encoding of rules. On the other hand, they point out a number of issues that distinguish this application from traditional machine learning problems.
Regarding their stated objective, I do not see how they can conclude that Cobot's results are compelling, given:
- "many of the standard assumptions [of RL] are violated"
- They cannot measure performance
- They only presented data for 4 out of 4,836 users
For the specific engineering problem they are trying to solve, I think there might be other simpler approaches that achieve the goal better. For example, couldn't a news ticker be considered to take unprompted, yet meaningful action?
Perhaps the strength of this paper is that it points out difficulties that need to be addressed when applying ML techniques to human-computer interaction? I know very little about HCI, so I can't tell if they raised valid concerns. This paper would have been more informative, in my opinion, if they had concentrated on this point, possibly examining alternatives to the Average Reward curve.
|
| Anjum Gupta | 7 | 05-01-2003 04:05 AM ET (US) |
It appears to me that the parameters used to train Cobot are very basic and by no means seem sufficient to achieve their stated goals. The state space consists of 6 variables!! And the feedback is very poor. As Andrew mentioned, Cobot should at least be able to get implicit feedback from the sentences and shouldn't have to wait for an emote. It could also look at a pair or a vector of feedback to take away some of the ambiguity, e.g. Spanking followed by Giggling should definitely be taken as positive feedback. My concern is how they are really trying to achieve their goals using just 6 state variables and very limited explicit feedback.
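As a rough sketch of the "pair of feedback" idea, the snippet below scores a user's feedback verbs in time order and flips a negative verb to positive when a playful one follows it. The verb sets are assumptions for illustration: `reward`, `punish`, `hug`, `spank`, and `giggle` appear in this thread, while `laugh` and `grin` are added here as hypothetical examples.

```python
POSITIVE = {"reward", "hug"}
NEGATIVE = {"punish", "spank"}
# Hypothetical "softening" verbs; only "giggle" is mentioned in the thread.
SOFTENERS = {"giggle", "laugh", "grin"}


def score_feedback(events):
    """Sum feedback over a time-ordered list of verbs from one user.

    A negative verb immediately followed by a softener counts as +1
    instead of -1; softeners on their own contribute nothing.
    """
    total = 0
    for i, verb in enumerate(events):
        following = events[i + 1] if i + 1 < len(events) else None
        if verb in POSITIVE:
            total += 1
        elif verb in NEGATIVE:
            total += 1 if following in SOFTENERS else -1
    return total
```

For example, `score_feedback(["spank", "giggle"])` gives 1, while `score_feedback(["spank"])` gives -1. Looking only one event ahead is the simplest choice; a longer window or per-user learned weights would be natural extensions.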
|
| Douglas Turnbull | 8 | 05-03-2003 07:47 PM ET (US) |
Anjum, there are not just 6 state variables; there are 6 vectors of state variables. They never mention the exact size, but Table 2 suggests that there are 4 dimensions in the state vector, 8 or so in the mood vector,... My best guess is that Cobot keeps track of about 20 or so stats. However, what really is the true number of state variables necessary to capture a social setting?
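To illustrate how several small feature vectors add up to one state description, here is a minimal sketch in the spirit of the linear function approximation mentioned earlier in this thread. The function names, vector sizes, and step size are assumptions for the example; this is not the paper's actual feature set or update rule.

```python
def concat_features(*vectors):
    """Flatten several feature vectors into one state feature list."""
    return [x for vector in vectors for x in vector]


def linear_value(weights, features):
    """Estimated value of a state: the dot product of weights and features."""
    return sum(w * x for w, x in zip(weights, features))


def update(weights, features, reward, step=0.1):
    """One gradient step moving the estimate toward the observed reward."""
    error = reward - linear_value(weights, features)
    return [w + step * error * x for w, x in zip(weights, features)]
```

With, say, a 2-dimensional vector and a 3-dimensional vector, `concat_features` yields 5 features, so the learner needs only 5 weights. At roughly 20 features in total, the model stays small, which is part of why so little training data can still move it.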
|
| |
Messages 9-10 deleted by topic administrator 07-22-2006 09:26 AM |