| Who | When |
Messages | |
(not accepting new messages)
|
|
| Charles Elkan
|
120
|
 |
|
12-07-2008 06:38 PM ET (US)
|
|
"What is a good way to prevent the soft-max policy from picking illegal actions?"
You can exclude these from the softmax calculation, i.e. don't let them have a numerator (or contribute to the denominator). Or you can give them a Q-value of negative infinity.
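A minimal sketch of both suggestions, assuming the Q-values for one state are held in a plain list and legality is given as a parallel list of booleans. The function name `softmax_policy` and the `temperature` parameter are illustrative, not taken from the assignment.

```python
import math

def softmax_policy(q_values, legal, temperature=1.0):
    """Return soft-max action probabilities; illegal actions get probability 0."""
    legal_q = [q for q, ok in zip(q_values, legal) if ok and q != float("-inf")]
    if not legal_q:
        raise ValueError("no legal action available in this state")
    q_max = max(legal_q)  # subtract the max for numerical stability
    # Illegal actions contribute no term at all (weight 0.0).
    weights = [
        math.exp((q - q_max) / temperature) if ok else 0.0
        for q, ok in zip(q_values, legal)
    ]
    total = sum(weights)
    return [w / total for w in weights]

# Option 1: exclude the illegal action (index 1) via the mask.
masked = softmax_policy([1.0, 5.0, 1.0], [True, False, True])

# Option 2: give the illegal action a Q-value of negative infinity.
neg_inf = softmax_policy([1.0, float("-inf"), 1.0], [True, True, True])
```

Both options yield the same distribution here, because `math.exp` of negative infinity is 0.0. The mask has the small advantage that the stored Q-values stay untouched.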
|
| Meir Schwarz
|
119
|
 |
|
12-07-2008 05:42 PM ET (US)
|
|
Will there be a review session for the final?
|
| Mike Rose
|
118
|
 |
|
12-07-2008 03:11 PM ET (US)
|
|
Our Policy Iteration algorithm produces the correct policy, but the values seem to be off by about .05. Is this acceptable, or does it hint that we have a slight error in our algorithm? If it is an error, any idea where we should start debugging?
|
| Mike Rose
|
117
|
 |
|
12-07-2008 03:04 PM ET (US)
|
|
What is a good way to prevent the soft-max policy from picking illegal actions?
|
| matt
|
116
|
 |
|
12-07-2008 02:57 AM ET (US)
|
|
How is it possible for our algorithm to find the optimal policy but not the alternative optimal policies? (It basically finds the opposite of those policies when values in the range are entered.)
|
| Chris
|
115
|
 |
|
12-06-2008 06:21 PM ET (US)
|
|
Deleted by author 12-06-2008 06:21 PM
|
| Charles Elkan
|
114
|
 |
|
12-05-2008 07:02 AM ET (US)
|
|
Edited by author 12-05-2008 07:04 AM
"When computing the expected total reward it can grow to almost twice the maximum (2). For example, when the proposed action is to move to the goal field (reward = 1) then r(s, pi(s)) = 1. But in the sum this reward is again taken into account since the goal field is included in s'. V(goal) = 1 and therefore V(goal) * 0.8 * gamma is again added to the total expected reward, right? So I used r(s) instead. I am confused since this is the reward when moving into the current state and the expected total reward starting in this state should not take this r(s) into account, right?"
You are right that there is a problem here. The fundamental issue is that we must avoid double-counting. Since you have understood the problem, you can solve it (in more than one way) quite easily.
"The rewards computed match exactly the values on page 8 of the slide except (1,3) and (1,4) which are slightly different (0.59 instead of 0.61, and 0.37 instead of 0.388). Is it possible that the values in the slides were computed in some other way?"
It's possible that something is unspecified in the slides and slightly different in your code. This discrepancy is unlikely to indicate any algorithm bug, so you don't need to track down its cause.
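To make the double-counting concrete, here is a small sketch for a single state next to the goal. It assumes a discount factor of 0.9 (not stated in the thread), reuses the 0.8 success probability mentioned above, and assumes a failed move leaves the agent where it is. The function `evaluate_neighbor` and its parameters are illustrative only, not the assignment's interface.

```python
GAMMA = 0.9       # assumed discount factor (not given in the thread)
P_SUCCESS = 0.8   # probability the intended move succeeds, as in the thread

def evaluate_neighbor(entry_reward, goal_value, sweeps=200):
    """Iterate V(s) for one state next to the goal under 'move to the goal'.

    With probability P_SUCCESS the agent enters the goal, collecting
    entry_reward plus the discounted goal_value; otherwise it stays put
    (an assumption made to keep the example tiny).
    """
    v = 0.0
    for _ in range(sweeps):
        v = (P_SUCCESS * (entry_reward + GAMMA * goal_value)
             + (1 - P_SUCCESS) * GAMMA * v)
    return v

# Convention A: the reward is paid on entering the goal, and V(goal) = 0.
reward_on_entry = evaluate_neighbor(entry_reward=1.0, goal_value=0.0)

# Convention B: the reward lives in the goal's value, V(goal) = 1.
reward_in_goal_value = evaluate_neighbor(entry_reward=0.0, goal_value=1.0)

# Mixing both conventions counts the goal reward twice.
double_counted = evaluate_neighbor(entry_reward=1.0, goal_value=1.0)
```

Either convention keeps the value at or below 1, while the mixed version approaches 2. The two consistent conventions differ by a factor of gamma, which is one way small discrepancies against published values can arise.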
|
| Tobias
|
113
|
 |
|
12-05-2008 01:33 AM ET (US)
|
|
Edited by author 12-05-2008 01:40 AM
V(s) = r(s, pi(s)) + sum_s' p(s' | s, pi(s)) gamma V(s')
- When computing the expected total reward it can grow to almost twice the maximum (2). For example, when the proposed action is to move to the goal field (reward = 1) then r(s, pi(s)) = 1. But in the sum this reward is again taken into account since the goal field is included in s'. V(goal) = 1 and therefore V(goal) * 0.8 * gamma is again added to the total expected reward, right? So I used r(s) instead. I am confused since this is the reward when moving into the current state and the expected total reward starting in this state should not take this r(s) into account, right?
- The rewards computed match exactly the values on page 8 of the slide except (1,3) and (1,4) which are slightly different (0.59 instead of 0.61, and 0.37 instead of 0.388). Is it possible that the values in the slides were computed in some other way?
|
| Decision Tree Notes
|
112
|
 |
|
12-04-2008 05:55 PM ET (US)
|
|
|
| Charles Elkan
|
111
|
 |
|
11-30-2008 01:40 AM ET (US)
|
|
I'll be at a conference this coming week. The lectures on Tuesday and Thursday will be given by Dr. Keith Noto, http://www.cs.ucsd.edu/~knoto/. Section will happen on Monday as usual. I'll try to answer questions here on the message board as usual also. Please ask questions about MDPs!
|
| Charles Elkan
|
110
|
 |
|
11-24-2008 07:12 PM ET (US)
|
|
|
| Charles Elkan
|
109
|
 |
|
11-21-2008 11:58 AM ET (US)
|
|
If you like CSE 151, the following courses are highly recommended as further learning.
Announcing COGS 118A and 118B (Natural Computation I and II)
Are you interested in graduate school in machine learning, computational neuroscience, or computational modeling? Interested in lucrative jobs using machine learning to solve practical problems? Consider taking one or both of COGS 118A and 118B (Natural Computation I and II). These courses give a rigorous background in machine learning theory and algorithms designed to provide the background we want our machine learning graduate students to have. A strong math background is required (Math 20E, 20F, 180A and a prior course in computer programming are prerequisites) but you may discuss your individual situation with the instructors. Please note that these courses can be taken in either order. You do not need to take 118A before 118B. 118A will be offered in Winter 2009 (Professor Angela Yu) and 118B in Spring 2009 (Professor Virginia de Sa).
118A. Natural Computation I (4) This course is an introduction to computational modeling of biological intelligence, focusing on neural networks and related approaches to SUPERVISED learning. Topics include estimation, filtering, optimization, neural networks, support vector machines, Gaussian Processes, Bayes nets. Prerequisites: Cognitive Science 109 or equivalent, Mathematics 20E, Mathematics 20F, and Mathematics 180A or consent of instructor.
118B. Natural Computation II (4) This course is an introduction to computational modeling of biological intelligence, focusing on neural networks and related approaches to UNSUPERVISED learning. Topics include density estimation, clustering, self-organizing maps, principal component analysis, kernel methods, and information theoretic models. Prerequisites: Cognitive Science 109 or equivalent, Mathematics 20E, Mathematics 20F, and Mathematics 180A or consent of instructor.
Virginia de Sa, desa@ucsd.edu
Department of Cognitive Science, 9500 Gilman Dr., La Jolla, CA 92093-0515
ph: 858-822-5095, 858-822-2402
fax: 858-534-1128
|
| Charles Elkan
|
108
|
 |
|
11-18-2008 09:58 AM ET (US)
|
|
REPRESENTATION, SEARCH, AND THE WEB
Professor Richard K. Belew
CogSci 188 - Winter, 2009
Tuesday, Thursday 9:30-10:50a
Warren Lecture Hall 2113
SectID#?? - (4 units)
http://abbey.ucsd.edu:8080/cogs188
Recent estimates suggest that our species is producing five exabytes (5 billion gigabytes) of "content" each year! But the more content we produce, the harder it often becomes to find anything, let alone anything "relevant." This course will present a survey of computational techniques designed to represent content, search through it, and use the WWW to connect the new producers of content with their new audiences. The central focus will be on probabilistic techniques for inferring textual documents' "meaning" from word occurrence statistics. Graph analysis techniques applied to bibliographic citations and the Web, Web crawling, and Web 2.0 techniques will also be discussed.
Students will analyze these algorithms mathematically and experiment with their implementation. Students with and without programming backgrounds will be accommodated. Both graduate and undergraduate students are welcome.
|
| Charles Elkan
|
107
|
 |
|
11-18-2008 01:10 AM ET (US)
|
|
/m106: Likelihoods are positive numbers very close to zero. They should be increasing with each iteration of EM.
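As a rough illustration of why the two quantities look so different, the sketch below uses made-up per-document likelihoods. The numbers are hypothetical and not from the assignment data.

```python
import math

# Hypothetical per-document likelihoods: positive and very close to zero.
doc_likelihoods = [1e-200, 1e-180]

# Multiplying raw likelihoods underflows double precision to exactly 0.0.
product = 1.0
for p in doc_likelihoods:
    product *= p

# Summing logs stays representable: a large negative number.
log_total = sum(math.log(p) for p in doc_likelihoods)
```

A likelihood is a small positive number and its log is a large negative one, so a printed value that is positive and tiny is a raw likelihood. Because log is monotonic, an increase in one is an increase in the other.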
|
| Meir Schwarz
|
106
|
 |
|
11-18-2008 12:49 AM ET (US)
|
|
If I'm seeing positive numbers that are progressively getting smaller, am I computing likelihood instead of log likelihood?
|
| Charles Elkan
|
105
|
 |
|
11-18-2008 12:46 AM ET (US)
|
|
/m102: Yes, there is a different likelihood for each document. Because this is based on a pmf (not pdf) it is a number between 0 and 1 that is very close to zero in practice. The sum of all the log likelihoods is not one. It is a large negative number that should be increasing, i.e. getting less negative. For how to check if you have maximized the sum of log likelihoods, see /m104.
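One simple sanity check along these lines is to record the sum of log likelihoods after every EM iteration and confirm it never goes down. The helper name `first_decrease` and the sample numbers below are hypothetical.

```python
def first_decrease(log_likelihood_sums, tol=1e-9):
    """Return the index of the first iteration whose sum dropped, else None.

    tol absorbs floating-point noise once EM has nearly converged.
    """
    for i in range(1, len(log_likelihood_sums)):
        if log_likelihood_sums[i] < log_likelihood_sums[i - 1] - tol:
            return i
    return None

# Hypothetical traces of the summed log likelihood over EM iterations.
healthy = [-5000.0, -4200.5, -4100.0, -4099.9]   # getting less negative
suspicious = [-5000.0, -5100.0, -5050.0]         # drops at iteration 1

healthy_result = first_decrease(healthy)
suspicious_result = first_decrease(suspicious)
```

A drop larger than rounding noise usually points to a bug in the E-step or M-step, since EM should not decrease the likelihood of the training data.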
|