| Who | When |
Messages | |
|
|
|
|
153
|
 |
|
07-15-2009 07:55 PM ET (US)
|
|
Deleted by topic administrator 07-16-2009 12:03 PM
|
| Charles Elkan
|
152
|
 |
|
12-10-2008 01:15 PM ET (US)
|
|
Final exam will start at 8am tomorrow (Thursday)
Some people prefer the full three hours, which is quite reasonable and the official UCSD policy. So we will follow this norm and go from 8am to 11am.
|
| Charles Elkan
|
151
|
 |
|
12-10-2008 01:10 PM ET (US)
|
|
So V(1) is just a metric that tells us how well a particular run did (the higher the better)?
Yes, where "run" means "learning process."
|
| Meir Schwarz
|
150
|
 |
|
12-09-2008 11:25 PM ET (US)
|
|
Edited by author 12-10-2008 02:08 AM
/m149: OK, That makes more sense. So V(1) is just a metric that tells us how well a particular run did (the higher the better)?
|
| Charles Elkan
|
149
|
 |
|
12-09-2008 11:07 PM ET (US)
|
|
/m147, /m148: Evaluating the policy cannot be part of the agent's learning process. It is simply a way for you, the programmer, to measure the success of the learning process you design. Since the agent could never evaluate a policy with policy iteration (PI), the learning process cannot decide to stop based on this. However, after the agent stops learning using a heuristic, you the programmer can run PI. Your goal is to invent a heuristic that the agent can use to stop as quickly as possible with a policy that is as good as possible.
|
| Meir Schwarz
|
148
|
 |
|
12-09-2008 10:28 PM ET (US)
|
|
So out of curiosity I coded in running the Policy Iteration every 50 times and stopping if V(1)>0.7. When it works it ends up with the optimal policy after 50 times. Most of the time it causes an infinite loop and sometimes it even prints "matrix is singular to working precision" repeatedly.
|
| Meir Schwarz
|
147
|
 |
|
12-09-2008 09:54 PM ET (US)
|
|
What exactly are we looking for in the V values for evaluating the policy? We get the optimal policy more than 9/10 times without looking at the Vs. Do we need to change something?
|
| Charles Elkan
|
146
|
 |
|
12-09-2008 06:20 PM ET (US)
|
|
You could use the definition V(start) = Q(start,a) where a is the action recommended by the final learned policy.
However, the final Q values may not be perfectly accurate, so it is better to do what the project description says: use policy evaluation to measure the goodness of the final learned policy.
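A minimal sketch of what measuring a fixed policy with policy evaluation can look like, for readers who want something concrete. The two-state problem, the function name `evaluate_policy`, and the data layout are all hypothetical; this is not the project's gridworld or required interface.

```python
def evaluate_policy(transitions, rewards, terminal, gamma=1.0, tol=1e-10):
    """Iterative policy evaluation for one fixed policy.

    transitions[s] lists (probability, next_state) pairs under the action
    the policy picks in s. rewards[s] is the reward for being in s.
    Terminal states simply keep their reward as their value.
    """
    values = {s: (rewards[s] if s in terminal else 0.0) for s in rewards}
    while True:
        biggest_change = 0.0
        for s in rewards:
            if s in terminal:
                continue
            new_value = rewards[s] + gamma * sum(
                p * values[s2] for p, s2 in transitions[s]
            )
            biggest_change = max(biggest_change, abs(new_value - values[s]))
            values[s] = new_value
        if biggest_change < tol:
            return values


# Hypothetical toy problem: from "start" the policy's action reaches "goal"
# with probability 0.8 and otherwise leaves the agent where it is.
rewards = {"start": -0.04, "goal": 1.0}
transitions = {"start": [(0.8, "goal"), (0.2, "start")]}
values = evaluate_policy(transitions, rewards, terminal={"goal"})
print(round(values["start"], 4))
```

The point of the sketch is that V(start) comes from the policy and the true model, not from the learned Q table, so it stays meaningful even when the Q values are inaccurate.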
|
| Mike Rose
|
145
|
 |
|
12-09-2008 05:25 PM ET (US)
|
|
How can we find V(start) from Q(start)? Is there a good way to estimate this? It is supposed to be one of the metrics to report.
|
| Charles Elkan
|
144
|
 |
|
12-09-2008 03:56 PM ET (US)
|
|
/m143: The final covers all topics from the first nine weeks of the quarter, and from the assignments. The material in the last week's lectures by Dr. Noto is not included. The exam will be similar to the midterm in format, but maybe 50% longer. The instructions will be the same: open book, bring calculator, etc. There will be multipart regular questions (more than half of all points) and also true/false questions (less than half).
|
| Meir Schwarz
|
143
|
 |
|
12-09-2008 02:08 PM ET (US)
|
|
Can you give us a breakdown of what the final is going to look like (e.g. 1 question on each of these topics, 10 true/falses over these...)? Any direction to help focus what we should study the most would be appreciated.
|
| Charles Elkan
|
142
|
 |
|
12-08-2008 11:18 PM ET (US)
|
|
/m137, /m140: I don't know any convention about when to stop Q-learning. If you have interesting findings, discuss them in your report. (We never penalize anyone for bad results, just occasionally for bad design decisions or bad explanations.)
|
| Charles Elkan
|
141
|
 |
|
12-08-2008 11:16 PM ET (US)
|
|
/m138, /m139: I apologize for the lack of communication. The TA told me earlier that no one came to the section. I'll investigate.
|
| Meir Schwarz
|
140
|
 |
|
12-08-2008 10:06 PM ET (US)
|
|
/m137: This doesn't seem to work very consistently. Is there any sort of convention on when to stop?
|
| matt
|
139
|
 |
|
12-08-2008 04:30 PM ET (US)
|
|
Yeah, I figured that since the TA didn't respond to email there wasn't a section.
As for the final, is it possible for us to start at 8am if we want? I think I need every minute I can get to work on the test, even if it's made shorter.
|
| Meir Schwarz
|
138
|
 |
|
12-08-2008 04:03 PM ET (US)
|
|
I showed up to discussion section at 11:05 but there was no one in the room.
|
| Charles Elkan
|
137
|
 |
|
12-08-2008 12:09 PM ET (US)
|
|
What if we stop when the policy doesn't change and the change in Q is smaller than some tunable variable which we find by experimenting?
Sounds reasonable. In your report, explain your reasoning and experimentation that justify your procedure.
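One possible reading of this stopping rule, written out as a sketch. The Q-table layout (state to action to value), the function names, and the tolerance values are assumptions for illustration only.

```python
def greedy_policy(q):
    """Map each state to its highest-valued action."""
    return {s: max(actions, key=actions.get) for s, actions in q.items()}


def should_stop(q_before, q_after, tolerance):
    """Stop when the greedy policy is unchanged and no Q value moved by
    more than `tolerance` over the last trial."""
    same_policy = greedy_policy(q_before) == greedy_policy(q_after)
    largest_change = max(
        abs(q_after[s][a] - q_before[s][a]) for s in q_after for a in q_after[s]
    )
    return same_policy and largest_change < tolerance


# Hypothetical Q tables from before and after one trial.
q_before = {"s0": {"up": 0.50, "left": 0.40}}
q_after = {"s0": {"up": 0.5004, "left": 0.4001}}
stop_loose = should_stop(q_before, q_after, tolerance=1e-3)
stop_tight = should_stop(q_before, q_after, tolerance=1e-4)
print(stop_loose, stop_tight)
```

This remains a heuristic: nothing here proves the policy will never change again, which is why the tolerance has to be justified by experiment.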
|
| Charles Elkan
|
136
|
 |
|
12-08-2008 12:08 PM ET (US)
|
|
Final exam Thursday this week
The final exam is scheduled from 8am to 11am on Thursday this week. If there is a consensus, I would be happy to make it shorter and start later, say starting at 8:45am. But there would have to be a consensus. Any opinions?
|
| Meir Schwarz
|
135
|
 |
|
12-08-2008 02:02 AM ET (US)
|
|
/m133: What if we stop when the policy doesn't change and the change in Q is smaller than some tunable variable which we find by experimenting?
|
| Charles Elkan
|
134
|
 |
|
12-08-2008 01:36 AM ET (US)
|
|
/m130: Remember, a policy doesn't choose the next state. Instead, it chooses an action and then the next state is random, but influenced by the action. Slide 8 says "maximize the expected utility of the immediate successors." So, the optimal action is not necessarily in the direction of the highest-value state. Different gamma values certainly lead to different value functions and different optimal policies. Slide 6 shows different policies caused by different penalties. You could draw a similar slide based on different discount factors.
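A worked check of this point, using the utilities quoted in this thread for the square in question (.655 to the left, .66 above, .388 to the right, .61 for the square itself). The motion model (0.8 intended direction, 0.1 for each sideways slip) and the wall below the square are assumptions based on the example equation from slide 10.

```python
# Utilities as quoted in this thread from slide 8.
here, left, up, right = 0.61, 0.655, 0.660, 0.388
down = here  # assumed wall below: a downward move leaves the agent in place

# Assumed motion model: 0.8 intended direction, 0.1 per perpendicular slip.
expected = {
    "left": 0.8 * left + 0.1 * up + 0.1 * down,
    "up": 0.8 * up + 0.1 * left + 0.1 * right,
    "right": 0.8 * right + 0.1 * up + 0.1 * down,
    "down": 0.8 * down + 0.1 * left + 0.1 * right,
}
best_action = max(expected, key=expected.get)
for action, value in expected.items():
    print(f"{action:5s} {value:.4f}")
print("best:", best_action)
```

Under these assumptions "left" has the highest expected utility even though the square above has the highest value, because an "up" action risks slipping toward the low-valued square on the right.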
|
| Charles Elkan
|
133
|
 |
|
12-08-2008 01:24 AM ET (US)
|
|
/m131, /m126: I mean, how can you be sure that continuing Q-learning for more trials will not lead to any further change in the policy? A trial is one "lifetime" for the agent, from the initial state until one of the two terminal states.
|
| Charles Elkan
|
132
|
 |
|
12-08-2008 01:21 AM ET (US)
|
|
/m129, /m127: Tobias, thank you for the correct answer.
|
| Meir Schwarz
|
131
|
 |
|
12-08-2008 12:35 AM ET (US)
|
|
But you need to ask the question, how can you tell for sure that pi will never change any more? Do you mean like make sure PI doesn't change for an entire game?
|
| matt
|
130
|
 |
|
12-07-2008 11:42 PM ET (US)
|
|
Edited by author 12-08-2008 12:31 AM
new question:
OK, on slide 8 there is something wrong. The picture of the policy does not match the values given, if the policy from (1,3) is maximizing the utility. The utility for going left is .655, the utility for going up is .66, and the utility for going right is .38. It is supposed to choose the max utility, but the picture has the policy going left, while the max utility would be going north. So either the picture of the given optimal policy is wrong, or the actual values on slide 8 for positions (1,2) and (2,3) are swapped. Please clarify.
Also, are we assuming that gamma is 1, because gamma does affect the values you get when running policy iteration?
|
| Tobias
|
129
|
 |
|
12-07-2008 11:16 PM ET (US)
|
|
@matt: It is not possible to go left or down. It is just there for completeness, which can make computations more convenient since you need no exceptions for the goal/trap cells. So for (1,1): If you go left then the robot would bump into the wall. If it would slide to the left (and therefore go down) then it would bump into the wall too. Therefore in these two cases the robot would remain in its position; that's the 0.9U(1,1). The slide to the right side would result in going upwards, therefore 0.1U(1,2). The same applies for the 'down' part of the equation.
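The bumping rule described above can be sketched as a small transition function. The 4-by-3 layout, the blocked cell at (2, 2), the (column, row) coordinate convention, and all names are assumptions; the slides may define the domain differently.

```python
# Assumed layout: 4 columns by 3 rows, coordinates (column, row) starting at
# (1, 1) in the lower-left corner, with one blocked cell at (2, 2).
COLUMNS, ROWS, BLOCKED = 4, 3, {(2, 2)}
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
SLIPS = {"up": ("left", "right"), "down": ("left", "right"),
         "left": ("up", "down"), "right": ("up", "down")}


def step(state, direction):
    """Move one cell, or stay put when the move hits a wall or the blocked cell."""
    dx, dy = MOVES[direction]
    target = (state[0] + dx, state[1] + dy)
    inside = 1 <= target[0] <= COLUMNS and 1 <= target[1] <= ROWS
    return target if inside and target not in BLOCKED else state


def successors(state, action):
    """Return {next_state: probability}: 0.8 intended, 0.1 for each side slip."""
    distribution = {}
    outcomes = [(action, 0.8), (SLIPS[action][0], 0.1), (SLIPS[action][1], 0.1)]
    for direction, p in outcomes:
        nxt = step(state, direction)
        distribution[nxt] = distribution.get(nxt, 0.0) + p
    return distribution


for action in MOVES:
    print(action, successors((1, 1), action))
```

For (1, 1) this reproduces the coefficients in the slide-10 example: "left" and "down" each keep the agent in place with probability 0.9, so all four actions can be treated uniformly.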
|
| Mike Rose
|
128
|
 |
|
12-07-2008 09:46 PM ET (US)
|
|
Deleted by author 12-07-2008 10:59 PM
|
| matt
|
127
|
 |
|
12-07-2008 09:40 PM ET (US)
|
|
OK, maybe we are misunderstanding the algorithm. On slide 10 of the notes there is this algorithm (see slides for proper format):
U(s) = R(s) + γ max_a sum_s′ T(s, a, s′) U(s′)
with the example
U(1, 1) = −0.04 + γ max{ 0.8U(1, 2) + 0.1U(2, 1) + 0.1U(1, 1) (up),
 0.9U(1, 1) + 0.1U(1, 2) (left),
 0.9U(1, 1) + 0.1U(2, 1) (down),
 0.8U(2, 1) + 0.1U(1, 2) + 0.1U(1, 1) (right) }
My question is: if you are at spot (1,1), how is it possible to go in all 4 directions? As far as I know there is no wraparound, so why are all directions possible?
|
| Charles Elkan
|
126
|
 |
|
12-07-2008 09:39 PM ET (US)
|
|
Are we allowed to change epsilon or the constant in softmax as learning progresses?
Sure. It would be interesting to do experiments to find the best schedule for these values, similar to the schedule for T in deterministic annealing.
When do we stop iterating for Q learning? Is it when PI stops changing like policy iteration (doesn't seem right to me) or is it when Q stops changing?
Since the real goal is always to learn an optimal policy, stopping when the policy pi stops changing seems sensible. But you need to ask the question, how can you tell for sure that pi will never change any more?
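As one illustration of a schedule, here is a sketch of an exploration rate that decays with the trial number. The geometric form, the starting value, the floor, and the decay rate are all assumptions; finding a good schedule is exactly the experiment suggested above.

```python
import random


def epsilon_at(trial, start=1.0, floor=0.05, decay=0.99):
    """Exploration rate that shrinks geometrically with the trial number."""
    return max(floor, start * decay ** trial)


def epsilon_greedy(q_values, epsilon, rng):
    """Pick a random action with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.choice(list(q_values))
    return max(q_values, key=q_values.get)


rng = random.Random(0)
q_values = {"up": 0.2, "left": 0.5}  # hypothetical Q(s, a) for one state
for trial in (0, 100, 1000):
    eps = epsilon_at(trial)
    print(trial, round(eps, 3), epsilon_greedy(q_values, eps, rng))
```

The same pattern would apply to the softmax constant: replace `epsilon_at` with a function that returns the temperature for a given trial.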
|
| Meir Schwarz
|
125
|
 |
|
12-07-2008 09:16 PM ET (US)
|
|
Another question. When do we stop iterating for Q learning? Is it when PI stops changing like policy iteration (doesn't seem right to me) or is it when Q stops changing?
|
| Meir Schwarz
|
124
|
 |
|
12-07-2008 08:50 PM ET (US)
|
|
Are we allowed to change epsilon or the constant in softmax as learning progresses?
|
| Charles Elkan
|
123
|
 |
|
12-07-2008 06:46 PM ET (US)
|
|
How is it possible for our algorithm to work to find the optimal policy, but not be able to find the alternative optimal policies (it basically finds the opposite of those policies when values in range are entered)?
It does sound like you have one (or more!) bugs. For debugging suggestions see /m122.
|
| Charles Elkan
|
122
|
 |
|
12-07-2008 06:45 PM ET (US)
|
|
Our Policy Iteration algorithm produces the correct policy but the values seem to be off by about .05. Is this acceptable or does it hint that we have a slight error in our algorithm? If not, any idea where we should start debugging?
Many different value functions can lead to the same policy, so this is a hint that there is an error somewhere. The error may be in your code, but it may also be in your understanding of the gridworld domain. For example, the exact definition of when an action has an unintended effect (e.g. moving left on an "up" action) is not clear.
For debugging, it may be easiest to start with the goal state (upper right) and figure out whether its learned value is correct. Then do the same for states next to the goal state, then one away, and so on.
|
| Charles Elkan
|
121
|
 |
|
12-07-2008 06:40 PM ET (US)
|
|
Will there be a review session for the final? In the section tomorrow (Monday) you can ask any and all questions relevant to the final. Or you can ask questions here.
If anyone feels an additional in-person review session would be useful, please email me personally.
Remember, a review session is only useful if you come prepared with specific questions.
|
| Charles Elkan
|
120
|
 |
|
12-07-2008 06:38 PM ET (US)
|
|
What is a good way to prevent the soft-max policy from picking illegal actions?
You can exclude these from the softmax calculation, i.e. don't let them have a numerator. Or you can make them have negative infinity Q-value.
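A sketch of the first option, where illegal actions get no numerator. The function name, the temperature parameter, and the example Q values are hypothetical.

```python
import math


def softmax_policy(q_values, legal_actions, temperature=1.0):
    """Softmax over legal actions only; illegal actions get probability 0."""
    legal = [a for a in q_values if a in legal_actions]
    top = max(q_values[a] for a in legal)  # subtract the max for numerical safety
    weights = {a: math.exp((q_values[a] - top) / temperature) for a in legal}
    total = sum(weights.values())
    return {a: weights.get(a, 0.0) / total for a in q_values}


# Hypothetical Q values in a state where "left" and "down" are not allowed.
q_values = {"up": 0.3, "right": 0.1, "left": 0.9, "down": 0.9}
probabilities = softmax_policy(q_values, legal_actions={"up", "right"})
print(probabilities)
```

The second option gives the same result in Python, since `math.exp(float("-inf"))` is 0.0, as long as at least one action is legal.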
|
| Meir Schwarz
|
119
|
 |
|
12-07-2008 05:42 PM ET (US)
|
|
Will there be a review session for the final?
|
| Mike Rose
|
118
|
 |
|
12-07-2008 03:11 PM ET (US)
|
|
Our Policy Iteration algorithm produces the correct policy but the values seem to be off by about .05. Is this acceptable or does it hint that we have a slight error in our algorithm? If not, any idea where we should start debugging?
|
| Mike Rose
|
117
|
 |
|
12-07-2008 03:04 PM ET (US)
|
|
What is a good way to prevent the soft-max policy from picking illegal actions?
|
| matt
|
116
|
 |
|
12-07-2008 02:57 AM ET (US)
|
|
how is it possible for our algorithm to work to find the optimal policy, but not be able to find the alternative optimal policies(it basically finds the opposite of those policies when values in range are entered)?
|
| Chris
|
115
|
 |
|
12-06-2008 06:21 PM ET (US)
|
|
Deleted by author 12-06-2008 06:21 PM
|
| Charles Elkan
|
114
|
 |
|
12-05-2008 07:02 AM ET (US)
|
|
Edited by author 12-05-2008 07:04 AM
When computing the expected total reward it can grow to almost twice the maximum (2). For example, when the proposed action is to move to the goal field (reward = 1) then r(s, pi(s)) = 1. But in the sum this reward is again taken into account since the goal field is included in s'. V(goal) = 1 and therefore V(goal) * 0.8 * gamma is again added to the total expected reward, right? So I used r(s) instead. I am confused since this is the reward when moving into the current state and the expected total reward starting in this state should not take this r(s) into account, right?
You are right, there is a problem here. The fundamental issue is that we must avoid double-counting. Since you have understood the problem, you can solve it (in more than one way) quite easily.
The rewards computed match exactly the values on page 8 of the slide except (1,3) and (1,4) which are slightly different (0.59 instead of 0.61, and 0.37 instead of 0.388). Is it possible that the values in the slides were computed in some other way?
It's possible that something is unspecified in the slides and slightly different in your code. This discrepancy is unlikely to indicate any algorithm bug, so you don't need to track down its cause.
|
| Tobias
|
113
|
 |
|
12-05-2008 01:33 AM ET (US)
|
|
Edited by author 12-05-2008 01:40 AM
V(s) = r(s, pi(s)) + sum_s' p(s'|s,pi(s)) gamma V(s')
- When computing the expected total reward it can grow to almost twice the maximum (2). For example, when the proposed action is to move to the goal field (reward = 1) then r(s, pi(s)) = 1. But in the sum this reward is again taken into account since the goal field is included in s'. V(goal) = 1 and therefore V(goal) * 0.8 * gamma is again added to the total expected reward, right? So I used r(s) instead. I am confused since this is the reward when moving into the current state and the expected total reward starting in this state should not take this r(s) into account, right?
- The rewards computed match exactly the values on page 8 of the slide except (1,3) and (1,4) which are slightly different (0.59 instead of 0.61, and 0.37 instead of 0.388). Is it possible that the values in the slides were computed in some other way?
|
| Decision Tree Notes
|
112
|
 |
|
12-04-2008 05:55 PM ET (US)
|
|
|
| Charles Elkan
|
111
|
 |
|
11-30-2008 01:40 AM ET (US)
|
|
I'll be at a conference this coming week. The lectures on Tuesday and Thursday will be given by Dr. Keith Noto, http://www.cs.ucsd.edu/~knoto/
Section will happen on Monday as usual. I'll try to answer questions here on the message board as usual also. Please ask questions about MDPs!
|
| Charles Elkan
|
110
|
 |
|
11-24-2008 07:12 PM ET (US)
|
|
|
| Charles Elkan
|
109
|
 |
|
11-21-2008 11:58 AM ET (US)
|
|
If you like CSE 151, the following courses are highly recommended as further learning.
Announcing COGS 118A and 118B (Natural Computation I and II)
Are you interested in graduate school in machine learning, computational neuroscience, or computational modeling? Interested in lucrative jobs using machine learning to solve practical problems? Consider taking one or both of COGS 118A and 118B (Natural Computation I and II). These courses give a rigorous background in machine learning theory and algorithms designed to provide the background we want our machine learning graduate students to have. A strong math background is required (Math 20E, 20F, 180A and a prior course in computer programming are prerequisites) but you may discuss your individual situation with the instructors. Please note that these courses can be taken in either order. You do not need to take 118A before 118B. 118A will be offered in Winter 2009 (Professor Angela Yu) and 118B in Spring 2009 (Professor Virginia de Sa).
118A. Natural Computation I (4) This course is an introduction to computational modeling of biological intelligence, focusing on neural networks and related approaches to SUPERVISED learning. Topics include estimation, filtering, optimization, neural networks, support vector machines, Gaussian Processes, Bayes nets. Prerequisites: Cognitive Science 109 or equivalent, Mathematics 20E, Mathematics 20F, and Mathematics 180A or consent of instructor.
118B. Natural Computation II (4) This course is an introduction to computational modeling of biological intelligence, focusing on neural networks and related approaches to UNSUPERVISED learning. Topics include density estimation, clustering, self-organizing maps, principal component analysis, kernel methods, and information theoretic models. Prerequisites: Cognitive Science 109 or equivalent, Mathematics 20E, Mathematics 20F, and Mathematics 180A or consent of instructor.
-----------------------------------------------------------------
Virginia de Sa                      desa@ucsd.edu
Department of Cognitive Science     ph: 858-822-5095
9500 Gilman Dr.                         858-822-2402
La Jolla, CA 92093-0515             fax: 858-534-1128
-----------------------------------------------------------------
|
| Charles Elkan
|
108
|
 |
|
11-18-2008 09:58 AM ET (US)
|
|
REPRESENTATION, SEARCH, AND THE WEB
Professor Richard K. Belew
CogSci 188 - Winter, 2009
Tuesday, Thursday 9:30-10:50a
Warren Lecture Hall 2113
SectID#?? - (4 units)
http://abbey.ucsd.edu:8080/cogs188
Recent estimates suggest that our species is producing five exabytes (5 billion gigabytes) of "content" each year! But the more content we produce, the harder it often becomes to find anything, let alone anything "relevant." This course will present a survey of computational techniques designed to represent content, search through it, and use the WWW to connect the new producers of content with their new audiences. The central focus will be on probabilistic techniques for inferring textual documents' "meaning" from word occurrence statistics. Graph analysis techniques applied to bibliographic citations and the Web, Web crawling, and Web2.0 techniques will also be discussed.
Students will analyze these algorithms mathematically and experiment with their implementation. Students with and without programming backgrounds will be accommodated. Both graduate and undergraduate students are welcome.
|
| Charles Elkan
|
107
|
 |
|
11-18-2008 01:10 AM ET (US)
|
|
/m106: Likelihoods are positive numbers very close to zero. They should be increasing with each iteration of EM.
|
| Meir Schwarz
|
106
|
 |
|
11-18-2008 12:49 AM ET (US)
|
|
If I'm seeing positive numbers that are progressively getting smaller am I computing likelihood instead of log likelihood?
|
| Charles Elkan
|
105
|
 |
|
11-18-2008 12:46 AM ET (US)
|
|
/m102: Yes, there is a different likelihood for each document. Because this is based on a pmf (not pdf) it is a number between 0 and 1 that is very close to zero in practice. The sum of all the log likelihoods is not one. It is a large negative number that should be increasing, i.e. getting less negative. For how to check if you have maximized the sum of log likelihoods, see /m104.
|
| Charles Elkan
|
104
|
 |
|
11-18-2008 12:43 AM ET (US)
|
|
/m100: No, you don't need to compute derivatives. Just compute the sum of log likelihoods after each E step and check that this sum is always increasing, but more and more slowly.
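A small sketch of this check, assuming the sum of log likelihoods has been recorded after every E step. The function name, the tolerance, and the example numbers are hypothetical.

```python
def check_em_progress(log_likelihoods, tolerance=1e-6):
    """Return (is_monotone, has_converged) for a history of summed log likelihoods.

    is_monotone: no iteration made the sum smaller (beyond rounding slack).
    has_converged: the most recent improvement is below `tolerance`.
    """
    steps = [b - a for a, b in zip(log_likelihoods, log_likelihoods[1:])]
    is_monotone = all(step >= -1e-9 for step in steps)
    has_converged = bool(steps) and abs(steps[-1]) < tolerance
    return is_monotone, has_converged


# Hypothetical history: large negative sums that rise more and more slowly.
history = [-52310.4, -50122.9, -49980.2, -49975.6, -49975.6]
print(check_em_progress(history))
print(check_em_progress([-500.0, -480.0, -490.0]))  # a drop suggests a bug
```

A decrease at any iteration is a useful debugging signal on its own, independent of the final accuracy.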
|
| Charles Elkan
|
103
|
 |
|
11-18-2008 12:41 AM ET (US)
|
|
/m99: Present your results without annealing. However, annealing should just add one line of code. Note you can raise to a power 1/t by dividing the logarithm by t. You can still use the same constant c idea to avoid overflow.
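To make the "one line" concrete, here is a sketch of annealed weights for a single document. It assumes `log_joint[k]` holds log(alpha_k) + log p(x_i | theta_k); whether the mixing proportion belongs inside the tempered term is a design choice this sketch does not settle, and the numbers are made up.

```python
import math


def annealed_weights(log_joint, t):
    """Weights proportional to exp(log_joint[k] / t), computed stably.

    Dividing a logarithm by t is the same as raising the original
    quantity to the power 1/t.
    """
    scaled = [v / t for v in log_joint]  # the one extra line for annealing
    c = max(scaled)                      # constant c chosen to avoid overflow
    unnormalised = [math.exp(v - c) for v in scaled]
    total = sum(unnormalised)
    return [u / total for u in unnormalised]


log_joint = [-1200.0, -1210.0, -1250.0]  # hypothetical values for one document
for t in (128, 8, 1):
    print(t, [round(w, 4) for w in annealed_weights(log_joint, t)])
```

At large t the weights are close to uniform, and at t = 1 they reduce to the ordinary E-step weights.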
|
| Meir Schwarz
|
102
|
 |
|
11-18-2008 12:36 AM ET (US)
|
|
Just to make sure I have this right there is a different log likelihood for each document and it is just 1 number between 0 and 1. The sum of all the log likelihoods is 1.
How can we check if we have maximized the log likelihood efficiently?
|
| Albert Park
|
101
|
 |
|
11-17-2008 08:02 PM ET (US)
|
|
My apologies to those of you who went to section and waited today. Unfortunately there was a problem with my flight back to the US from Korea, and I couldn't make it to section. I will contact Charles about a make up section if one is wanted.
|
| Meir Schwarz
|
100
|
 |
|
11-17-2008 06:49 PM ET (US)
|
|
To determine if we correctly maximize the log likelihood do we have to take the derivative at each theta and make sure it is 0 or is there some easier way?
|
| Peter Faymonville
|
99
|
 |
|
11-17-2008 06:48 PM ET (US)
|
|
What shall we do if we have a bug in our implementation and cannot find it? Before we use deterministic annealing, we get sensible results in the range of 40 to 80 percent accuracy, but after we put in the code for annealing we seem to run into a bunch of underflows for the document weights. After spending half a dozen hours trying to find the error, how should we proceed?
|
| Charles Elkan
|
98
|
 |
|
11-17-2008 06:20 PM ET (US)
|
|
/m97: There is no useful theory about how many iterations until convergence that I know of. Call this number J and let it be a parameter of your big-O analysis.
|
| Meir Schwarz
|
97
|
 |
|
11-17-2008 06:17 PM ET (US)
|
|
How are we supposed to determine the big-O time complexity for our algorithm? Is there a theoretical maximum to the number of times it has to loop before convergence?
|
| Charles Elkan
|
96
|
 |
|
11-17-2008 04:42 PM ET (US)
|
|
/m94: I suggest computing one multinomial for the whole dataset. Then, initialize the multinomial for each component to be a small random fluctuation around this central multinomial. Another choice is to initialize the w_ik randomly instead, and to start the EM algorithm with an M step.
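A sketch of the first suggestion. The add-one smoothing, the noise level, the uniform noise distribution, and the function name are all assumptions added for illustration; the post itself does not specify them.

```python
import random


def init_components(counts, k, noise=0.05, seed=0):
    """Build k multinomials as small random fluctuations around the overall
    word distribution of the whole dataset.

    counts[i][d] is the count of word d in document i.
    """
    rng = random.Random(seed)
    vocabulary = len(counts[0])
    # Adding 1 to every total keeps all probabilities strictly positive.
    totals = [sum(doc[d] for doc in counts) + 1.0 for d in range(vocabulary)]
    grand_total = sum(totals)
    central = [t / grand_total for t in totals]
    thetas = []
    for _ in range(k):
        jittered = [p * (1.0 + noise * rng.uniform(-1.0, 1.0)) for p in central]
        norm = sum(jittered)
        thetas.append([p / norm for p in jittered])
    return thetas


counts = [[3, 0, 1], [0, 2, 2], [1, 1, 0]]  # hypothetical 3 documents, 3 words
thetas = init_components(counts, k=3)
for theta in thetas:
    print([round(p, 3) for p in theta])
```

The components must differ at least slightly, since identical starting multinomials would receive identical updates and never separate.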
|
| Charles Elkan
|
95
|
 |
|
11-17-2008 04:40 PM ET (US)
|
|
We will return the midterms in class on Tuesday November 18. The mean score was 35, with standard deviation 9.
|
| matt
|
94
|
 |
|
11-17-2008 01:48 PM ET (US)
|
|
Edited by author 11-17-2008 02:23 PM
what do we use as our initialization for theta k. Do we evenly distribute to start?
|
| Charles Elkan
|
93
|
 |
|
11-16-2008 11:45 PM ET (US)
|
|
/m91: Log likelihood is simply the sum over all data points x_i of log p(x_i). You already compute p(x_i) as the denominator of the w_ik expressions in the E step.
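A sketch of computing this sum while staying in the log domain, assuming `log_joint[i][k]` holds log(alpha_k) + log p(x_i | theta_k). The helper names and the numbers are hypothetical.

```python
import math


def log_sum_exp(values):
    """Compute log(sum(exp(v))) without overflow or underflow."""
    c = max(values)
    return c + math.log(sum(math.exp(v - c) for v in values))


def total_log_likelihood(log_joint):
    """Sum over documents of log p(x_i), where p(x_i) is the mixture
    probability sum_k alpha_k p(x_i | theta_k)."""
    return sum(log_sum_exp(row) for row in log_joint)


# Hypothetical values for two documents and three components.
log_joint = [[-1200.0, -1210.0, -1250.0], [-980.0, -975.0, -990.0]]
total = total_log_likelihood(log_joint)
print(total)
```

The per-document quantity inside `log_sum_exp` is the logarithm of the denominator of the w_ik expression, so no extra pass over the data is needed.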
|
| Charles Elkan
|
92
|
 |
|
11-16-2008 11:44 PM ET (US)
|
|
/m90: At first sight your code looks correct. The computation r ./ (p'*q) can give 0/0 for some i, j, which will yield NaN. When computing MI, you can define that 0/0 equals 0 instead of NaN.
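A sketch of the same metric with the zero convention built in, written in Python rather than Matlab so it can be run on its own. Function and variable names are hypothetical.

```python
import math
from collections import Counter


def mutual_information(labels, true_labels):
    """Mutual information (in nats) between two labelings of the same items.

    Only cells with a nonzero joint count are visited, which amounts to
    defining the contribution of every empty cell to be 0.
    """
    n = len(labels)
    joint = Counter(zip(labels, true_labels))
    p = Counter(labels)
    q = Counter(true_labels)
    return sum(
        (c / n) * math.log((c / n) / ((p[i] / n) * (q[j] / n)))
        for (i, j), c in joint.items()
    )


mi_same = mutual_information([1, 1, 2, 2], [1, 1, 2, 2])   # identical clusterings
mi_indep = mutual_information([1, 2, 1, 2], [1, 1, 2, 2])  # independent clusterings
print(mi_same, mi_indep)
```

Skipping empty cells also covers the case where r is 0 but p and q are not, where the elementwise formula would otherwise produce 0 times log(0).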
|
| Meir Schwarz
|
91
|
 |
|
11-16-2008 11:30 PM ET (US)
|
|
/m88: How do we calculate achieved likelihood?
|
| Meir Schwarz
|
90
|
 |
|
11-16-2008 06:26 PM ET (US)
|
|
I am sometimes getting NAN for my mutual information metric. Am I calculating it correctly:
m = zeros(1,clusterCount);
n = zeros(1,clusterCount);
c = zeros(clusterCount);
D = size(bagsOfWords,1);
for i = 1:clusterCount
    m(i) = sum(labels==i);
    n(i) = sum(trueLabels==i);
    for j = 1:clusterCount
        c(i,j) = sum(((labels==i)+(trueLabels==j))==2);
    end
end
p = m/D;
q = n/D;
r = c/D;
disp('Mutual Information Metric:')
disp( sum(sum(r .* log(r ./ (p'*q)))) )
|
| Meir Schwarz
|
89
|
 |
|
11-16-2008 06:00 PM ET (US)
|
|
Deleted by author 11-16-2008 06:19 PM
|
| Charles Elkan
|
88
|
 |
|
11-16-2008 01:38 PM ET (US)
|
|
/m87: Your results are plausible. You don't necessarily have a bug, or bad initialization. Now you can try different schedules for t, and different initialization ideas, to see what works best. A very important question: Can you compute something about your results (e.g. the achieved likelihood) that is predictive of accuracy? In a real application, you want to maximize accuracy (or MI) but you don't know the true classes. So can you find something that you could compute in a real application, that is positively correlated with accuracy? Also important: Verify that using deterministic annealing really is better than not using it. Design a careful experiment to show this.
|
| Meir Schwarz
|
87
|
 |
|
11-16-2008 05:41 AM ET (US)
|
|
Edited by author 11-16-2008 06:31 AM
I'm getting an accuracy of 84.75% after running to convergence with t=10 and then t=1 (where convergence is defined as a change of < .000001 in the theta values). Should I add more steps to the changes in t to improve accuracy? Thanks
EDIT: Actually the results can vary a lot. I got as low as 40% with the above technique. I switched to the t=128 then t = t/2 approach and I still don't get results consistently in the 90s (though it got as high as 96). Does this imply a bug or poor initialization?
|
| Charles Elkan
|
86
|
 |
|
11-16-2008 02:27 AM ET (US)
|
|
/m85: "Cluster" and "component" are two different names for the same thing. A pmf is a "probability mass function" i.e. a discrete probability distribution. The pmf you should use for this project is the multinomial. alpha has dimensionality K where K is the number of clusters. Each theta vector has dimensionality D where D is the size of the vocabulary. There are K different theta vectors, one for each cluster. Each theta vector defines one multinomial distribution.
|
| Oscar Ngai
|
85
|
 |
|
11-16-2008 02:07 AM ET (US)
|
|
Edited by author 11-16-2008 02:15 AM
Was a pmf given in class? Is it something we're supposed to derive somehow? I'm stuck on the E-step. What are the dimensions of the weights? just making sure, Theta is (number_clusters) by (number_components) alpha is (number_clusters)
is this correct?
|
| Charles Elkan
|
84
|
 |
|
11-16-2008 12:29 AM ET (US)
|
|
/m81, /m83: It should work to make the c value be the max of the v_i values. This will make the biggest exponential be 1.0. If some v_i is much smaller than the max v_i, then the exponential of that v_i will underflow to zero, which is ok.
|
| Meir Schwarz
|
83
|
 |
|
11-15-2008 11:33 PM ET (US)
|
|
/m81: Do you have any suggestions for the c value. I tried making it the max of the 3 MNs for each document but somehow that ended up making all of the documents have the same MNs (I don't understand how this happens). Thanks
|
| Oscar Ngai
|
82
|
 |
|
11-15-2008 10:19 PM ET (US)
|
|
Deleted by author 11-15-2008 10:24 PM
|
| Charles Elkan
|
81
|
 |
|
11-15-2008 11:24 AM ET (US)
|
|
/m80: I think your math is correct. The problem is the "exp" calculation. Try to do all later calculations using logarithms also, without ever doing an explicit "exponential" calculation. To calculate the weights w_ik you do need to calculate exponentials. Use this fact:
exp(v_i) / sum_j exp(v_j)
is equal to
exp(v_i - c) / sum_j exp(v_j - c)
for any number c. Choose a value for c to avoid overflow and underflow.
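A short demonstration of why the constant matters, using made-up log values for one document. Taking c to be the largest value is the choice suggested elsewhere in this thread; the function name is hypothetical.

```python
import math


def normalised_exponentials(v, c):
    """Return exp(v_i - c) / sum_j exp(v_j - c) for every i."""
    shifted = [math.exp(x - c) for x in v]
    total = sum(shifted)
    return [s / total for s in shifted]


v = [-1200.0, -1201.0, -1203.0]  # hypothetical log values for one document

# With c = 0 every exponential underflows to 0.0 and the ratio becomes 0/0.
try:
    normalised_exponentials(v, c=0.0)
    naive_worked = True
except ZeroDivisionError:
    naive_worked = False

weights = normalised_exponentials(v, c=max(v))
print(naive_worked, [round(w, 4) for w in weights])
```

With c equal to the maximum, the largest exponential is exactly 1.0, so the denominator can never be zero.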
|
| Meir Schwarz
|
80
|
 |
|
11-15-2008 05:43 AM ET (US)
|
|
I'm trying to calculate the pmf as: exp( gammaln(size(classic400,1) + 1) - sum( gammaln(full(classic400)+1), 2 ))
but it overflows to infinity. Is my math incorrect or is there a trick I should use?
|
| Charles Elkan
|
79
|
 |
|
11-15-2008 01:20 AM ET (US)
|
|
/m77: I don't think you would encounter any major difficulties using a different language. You would have to write additional code to input the data and to output results in ways that let you check that they are reasonable. With Matlab you have a large library of functions that you can call interactively to display results in interpretable ways.
|
| Charles Elkan
|
78
|
 |
|
11-15-2008 01:17 AM ET (US)
|
|
/m76: The notation I used (mainly) was alpha_k and theta_d. Alpha_k is the proportion of each group, k=1 to k=K where K = 3 for example. theta_d is the probability of each word from the dictionary, d=1 to d=D where D = 6000 for example. theta_1 to theta_D are the parameters of one multinomial. If you have a mixture, then each component is a multinomial. So you can use the notation theta_kd to mean the dth parameter of the kth multinomial.
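With this notation, the log probability of one document under component k can be sketched as follows. The function name and the tiny example are hypothetical; `math.lgamma` plays the role that `gammaln` plays in Matlab.

```python
import math


def log_multinomial_pmf(x, theta):
    """log p(x | theta) for word counts x and word probabilities theta.

    Uses lgamma for the multinomial coefficient so nothing overflows.
    Words with a zero count contribute nothing to the probability term.
    """
    n = sum(x)
    log_coefficient = math.lgamma(n + 1) - sum(math.lgamma(c + 1) for c in x)
    log_probability = sum(c * math.log(t) for c, t in zip(x, theta) if c > 0)
    return log_coefficient + log_probability


x = [2, 1, 0]              # hypothetical counts for a 3-word vocabulary
theta_k = [0.5, 0.3, 0.2]  # hypothetical parameters theta_kd of component k
pmf_value = math.exp(log_multinomial_pmf(x, theta_k))
print(pmf_value)
```

For real documents the value should stay in log form, since exponentiating it is where overflow and underflow come from.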
|
| Mike Rose
|
77
|
 |
|
11-14-2008 06:37 PM ET (US)
|
|
If we were considering using a language besides Matlab for assignment 4, what extra difficulties would you foresee that we would have to overcome?
|
| Chris
|
76
|
 |
|
11-14-2008 04:39 PM ET (US)
|
|
For the EM assignment, what is the difference between theta_k and theta_d, where k is the number of clusters and d is the number of dimensions? You've mentioned both and I'm wondering if they are referring to different values. I was also thinking there might be theta_k_d where you could look at theta values for either k or d or both.
|
| Charles Elkan
|
75
|
 |
|
11-10-2008 10:26 PM ET (US)
|
|
/m74: You are correct. There are 400 documents with a vocabulary size of 6205.
|
| Chris
|
74
|
 |
|
11-10-2008 04:25 PM ET (US)
|
|
Edited by author 11-10-2008 04:31 PM
Is there an official description (.names file) for the Classic400 dataset?
Here is my interpretation of the data:
truelabels: each column is the true human-defined group for each document matching the column number
classic400: (document#, word#) word#count
|
| Charles Elkan
|
73
|
 |
|
11-10-2008 02:50 PM ET (US)
|
|
Sample solution available for the midterm
See here: http://www.cs.ucsd.edu/users/elkan/151/midtermsoln.pdf
Please study this, and ask questions here about any part of the sample solution that you don't understand. We'll hand back the graded midterms on Tuesday next week, November 18.
|
| Charles Elkan
|
72
|
 |
|
11-04-2008 11:36 AM ET (US)
|
|
The Center for Human Development Presents
Anna Dornhaus University of Arizona
Department of Ecology and Evolutionary Biology
Friday, November 7th 12-1pm (discussion 1-1:40pm) room 003 in the Cognitive Science Building
"Collective and Individual Problem-Solving in Insects: Algorithmic Parallels to Human Brains and Societies"
Insects are regarded by many non-scientists as reflex automata, with hardwired and inflexible behaviour. However, they may serve as model systems for the study of cognition at two different levels: as individuals and collectively. First, research on individual insect behavior has shown clearly that arthropods can display great flexibility in their behavior. This includes not only simple conditioning to stimuli associated with food, but also more complex learning, attention, planning, the use of cognitive maps, of tools, speed-accuracy trade-offs in decision-making, and abilities such as counting. Second, the integration of colonies of social insects is such that the collective problem-solving algorithms can be compared to those of networks of neurons. For example, current models for decision-making in the human brain are essentially identical to those of collective decision-making in ant colonies. A particular strength of using insects as a model system is that the evolution of cognitive abilities, as well as the mechanisms that create them, can be investigated. For example, we can measure costs and benefits of particular mechanisms of problem-solving; this is necessary to understand why and how cognitive abilities evolved in particular species, but also why brain architecture may have evolved in the way it has.
Everyone is welcome.
|
| Charles Elkan
|
71
|
 |
|
11-02-2008 04:47 PM ET (US)
|
|
/m70: Sejnowski's talk is actually in room 1202 in the CSE building.
|
| Charles Elkan
|
70
|
 |
|
11-02-2008 03:56 PM ET (US)
|
|
Google Brain
Terrence J. Sejnowski
Monday (11/3) 11am in EBU3B 2112.
Howard Hughes Medical Institute Salk Institute for Biological Studies
University of California, San Diego
The brain is not just a computing device: It is also a powerful communication network, with the total bandwidth of signaling between neurons greater than that of the entire World Wide Web. How is all the traffic between brain areas regulated? How does the brain store and retrieve information? The answers to these questions are being sought in the temporal coherence of brain signals on a global scale. Curiously, brain states with the highest coherence are found during sleep.
Terrence Sejnowski is an Investigator with the Howard Hughes Medical Institute, and a Professor of Biology and Neurosciences at the University of California, San Diego, where he is Director of the Institute for Neural Computation and the co-director of the NSF Temporal Dynamics of Learning Center. He is an adjunct professor in the Departments of Computer Science and Engineering, Psychology, Neurosciences and Cognitive Science. He is also the Francis Crick Professor at The Salk Institute for Biological Studies where he directs the Computational Neurobiology Laboratory and is the director of the Crick-Jacobs Center for Theoretical and Computational Biology.
Dr. Sejnowski uses computational models to understand the principles that link brain to behavior. He has published over 300 scientific papers and 12 books, including The Computational Brain, with Patricia Churchland. He received the Wright Prize for Interdisciplinary research in 1996, the Hebb Prize from the International Neural Network Society in 1999, and the IEEE Neural Network Pioneer Award in 2002. He was elected an IEEE Fellow in 2000, an AAAS Fellow in 2006 and to the Institute of Medicine of the National Academies in 2008.
|
| Charles Elkan
|
69
|
 |
|
11-02-2008 03:55 PM ET (US)
|
|
AI Seminar Lecture Series, Fall Quarter 2008
Shaplets, Motifs and Discords: A Set of Primitives for Mining Massive Time Series and Image Archives
Associate Professor Eamonn Keogh, UC Riverside
Host: Charles Elkan
When: Monday, November 3, 2008
Where: Conference Room 1202, CSE Building
Time: 2:00 PM to 3:00 PM
Abstract: The past decade has seen tremendous interest in mining of time series and shape datasets, as such data can be found in domains as diverse as entertainment, finance, medicine and astronomy. However, much of this work has focused on toy problems, with a few thousand objects. In recent years, our research group has made an effort to address the problems of classification, clustering, query-by-content, motif discovery, and outlier detection on truly massive datasets, with 100 million+ objects. In this talk we will summarize our research findings over the last two years, and show that a small set of primitives, shaplets, motifs and discords, allow us to solve essentially all problems in shape/time series data mining with efficient, effective and interpretable results. We will demonstrate the utility of our ideas, with case studies in anthropology, astronomy, entomology, historical manuscript annotation and medicine.
For more information on the 2008 Fall Quarter AI Seminar Lecture Series, see http://www.cs.ucsd.edu/users/elkan/259/
|
| Charles Elkan
|
68
|
 |
|
10-30-2008 12:35 AM ET (US)
|
|
|
| Charles Elkan
|
67
|
 |
|
10-30-2008 12:33 AM ET (US)
|
|
/m65: Duplicate examples should be neither ignored nor removed. It is possible for training examples with identical feature values to occur, and they can have the same or different labels.
|
| Meir Schwarz
|
66
|
 |
|
10-30-2008 12:26 AM ET (US)
|
|
Is there any past accuracy data provided for the chess data set?
|
| Meir Schwarz
|
65
|
 |
|
10-29-2008 02:42 PM ET (US)
|
|
There are duplicate examples in the house votes data. Are we supposed to ignore them or remove them?
|
| Mike Rose
|
64
|
 |
|
10-29-2008 02:38 PM ET (US)
|
|
I was unsure of what was meant by:
How does the accuracy you achieve compare to what is reported in the dataset descriptions?
in the Assignment 3 description. Does it mean compared to the baserate? Or is there somewhere that states accuracy of previous learning attempts?
|
| Charles Elkan
|
63
|
 |
|
10-28-2008 05:41 PM ET (US)
|
|
The grading outline for Assignment 3 will be analogous to the one for the first project. I'm not sure if we'll be able to provide a specific outline in advance. We don't intend to require anything general that is different: lessons learned from Assignment 1 all apply to later assignments.
|
| Dan McKee
|
62
|
 |
|
10-28-2008 03:25 PM ET (US)
|
|
What is the exact grading outline for assignment 3?
Thanks
|
| Charles Elkan
|
61
|
 |
|
10-28-2008 02:24 PM ET (US)
|
|
/m56: The accuracy results you get are quite believable. Are you sure that 100% is measured with cross-validation? It is much easier to get 100% on training data than on test data!
|
| Charles Elkan
|
60
|
 |
|
10-28-2008 02:20 PM ET (US)
|
|
/m55: Both your suggested strategies are sensible. You can pick either one.
|
| Charles Elkan
|
59
|
 |
|
10-28-2008 02:19 PM ET (US)
|
|
/m51: The sentence "A modern observation is that the number of updates until convergence does not depend on the dimensionality of the data" means "the theoretical maximum number of updates" so it is not directly relevant to the T/F statement which refers just to "one epoch". The answer is that training a perceptron classifier, for one epoch, will be about equally fast for D and D'. The reason is that the total cost of all dot-products required will be essentially the same, i.e. O(nd). Doing updates for misclassified examples does not change this big-O complexity.
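To make the O(nd) argument concrete, here is a minimal Python sketch (the course itself uses Matlab). The function name, the all-ones data, and the sizes are hypothetical, and the sizes are deliberately chosen so that n times d is the same for D and D'. It counts only the multiply-adds spent on dot products; updates add at most O(d) per example, so they do not change the total order.

```python
def perceptron_epoch(examples, labels, w):
    """Run one epoch of perceptron training.

    Returns the updated weights and the number of multiply-adds
    spent on dot products (len(x) per example).
    """
    mult_adds = 0
    for x, y in zip(examples, labels):  # each y is +1 or -1
        activation = sum(wi * xi for wi, xi in zip(w, x))
        mult_adds += len(x)
        if y * activation <= 0:  # misclassified: update the weights
            w = [wi + y * xi for wi, xi in zip(w, x)]
    return w, mult_adds


# Hypothetical sizes chosen so that n * d is identical for both datasets.
D = [[1.0] * 1000 for _ in range(10)]        # few examples, high dimensionality
D_prime = [[1.0] * 10 for _ in range(1000)]  # many examples, low dimensionality
labels_D = [1 if i % 2 == 0 else -1 for i in range(10)]
labels_D_prime = [1 if i % 2 == 0 else -1 for i in range(1000)]

_, cost_D = perceptron_epoch(D, labels_D, [0.0] * 1000)
_, cost_D_prime = perceptron_epoch(D_prime, labels_D_prime, [0.0] * 10)
print(cost_D, cost_D_prime)  # both are n * d = 10000
```

If n times d differed between the two datasets, the epoch costs would differ in the same proportion.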
|
| Charles Elkan
|
58
|
 |
|
10-28-2008 02:15 PM ET (US)
|
|
/m52: Adding pseudocounts correctly in your naive Bayes code is important. You cannot assume in real datasets that all counts are nonzero!
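As an illustration only, here is a small Python sketch of pseudocounts (the course uses Matlab). It assumes add-one smoothing over a known list of possible values, which may not be exactly the scheme in the lecture notes; the function name and the data are made up.

```python
from collections import Counter


def smoothed_probs(values, domain, pseudocount=1.0):
    """Estimate P(value) for one feature within one class.

    Every value in `domain` receives `pseudocount` extra counts,
    so a value never seen in training still gets a nonzero probability.
    """
    counts = Counter(values)
    total = len(values) + pseudocount * len(domain)
    return {v: (counts[v] + pseudocount) / total for v in domain}


# Hypothetical training values of one feature for one class; "?" never occurs.
observed = ["y", "y", "n", "y"]
probs = smoothed_probs(observed, domain=["y", "n", "?"])
print(probs)
```

Without the pseudocount, the estimate for "?" would be zero, and a single zero factor wipes out the whole product for that class.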
|
| Charles Elkan
|
57
|
 |
|
10-28-2008 02:13 PM ET (US)
|
|
|
| Meir Schwarz
|
56
|
 |
|
10-28-2008 12:09 AM ET (US)
|
|
Edited by author 10-28-2008 02:25 AM
For the house votes data is it reasonable to get 100% accuracy? For the cancer data set I'm getting 47% precision, 54% recall, and 72% accuracy. For the chess data I'm getting 86% precision, 88% recall, and 88% accuracy.
|
| Tony
|
55
|
 |
|
10-27-2008 11:53 PM ET (US)
|
|
Regarding training the second stage of the Naive Bayes' classifier, when encountering "?" values for an attribute, should we:
1: treat this as a value for that variable. So, we create a count and probability for it. Also, for a test sample, if a "?" is encountered, we look up the probability (count) that the "?" occurred for that variable in the training data.
2: skip over this value, without creating a count or probability for it. When encountering a "?" in a test sample, ignore that random variable and figure out the most likely class label based on all the other non-"?" variables.
Thanks
|
| Matt
|
54
|
 |
|
10-27-2008 06:05 PM ET (US)
|
|
|
| Matt
|
53
|
 |
|
10-27-2008 06:04 PM ET (US)
|
|
|
| Matt
|
52
|
 |
|
10-27-2008 05:08 PM ET (US)
|
|
also for assignment 3 do we need to do the pseudo-counting shown in the lecture notes? As in, can we assume that every value of a feature will be observed at some point, as in no zero probabilities will occur for a given value for a feature.
|
| Matt
|
51
|
 |
|
10-27-2008 04:53 PM ET (US)
|
|
For this question from assignment 2: "4. Training a perceptron classifier, for one epoch, will be faster for D than for D'." Here D has high dimensionality and few examples; D' has low dimensionality and many examples.
The answer is false, right? Because of this line from lecture notes 2: "A modern observation is that the number of updates until convergence does not depend on the dimensionality of the data. This suggests that perceptron learning will be useful for very high-dimensional data such as images."
|
| Charles Elkan
|
50
|
 |
|
10-27-2008 11:01 AM ET (US)
|
|
/m48: Generally this is not a good assumption. It is often false, so you should randomly reorder the data before cross-validation.
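One possible way to do the reordering, sketched in Python rather than Matlab. The function name, the fixed seed, and the round-robin split are my own choices for illustration, not a required design.

```python
import random


def shuffled_folds(n_examples, k, seed=0):
    """Shuffle the example indices, then deal them into k folds.

    Shuffling first matters when the file is sorted, e.g. by class label.
    """
    indices = list(range(n_examples))
    random.Random(seed).shuffle(indices)
    return [indices[i::k] for i in range(k)]


folds = shuffled_folds(10, 3)
for held_out, test_idx in enumerate(folds):
    # Train on every fold except the held-out one, then test on test_idx.
    train_idx = [i for j, f in enumerate(folds) if j != held_out for i in f]
    print(held_out, len(train_idx), len(test_idx))
```

Fixing the seed makes the split reproducible, which helps when comparing classifiers on the same folds.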
|
| Charles Elkan
|
49
|
 |
|
10-27-2008 10:59 AM ET (US)
|
|
/m47: You can modify the data files so that the class label is always in a fixed column, e.g. the first column. Or, you can write your code so that the user provides the column number of the labels as an input.
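The second option could look roughly like this in Python (your assignment code will be in Matlab; the function name and the example rows are hypothetical):

```python
def split_features_and_label(row, label_col):
    """Separate one parsed data row into (features, label).

    `label_col` is the zero-based column that holds the class label,
    supplied by the user for each dataset.
    """
    features = row[:label_col] + row[label_col + 1:]
    return features, row[label_col]


# Made-up rows: one with the label in the first column, one with it in the last.
print(split_features_and_label(["republican", "y", "n"], 0))
print(split_features_and_label(["f", "t", "won"], 2))
```

With this approach the same classifier code runs on all three datasets and the data files stay untouched.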
|
| Meir Schwarz
|
48
|
 |
|
10-27-2008 04:19 AM ET (US)
|
|
Is it OK to assume that the datasets given for HW3 are in random order? That is, for the purpose of cross validation do we have to reorder the data before dividing it?
|
| Matt
|
47
|
 |
|
10-26-2008 08:40 PM ET (US)
|
|
Given that the location of the class names varies for the dataset, do we have to use the exact same code for all 3 datasets, and if so, are we allowed to modify the data file? Otherwise we do not know how to get the class info out of the data file.
|
| Charles Elkan
|
46
|
 |
|
10-26-2008 12:39 AM ET (US)
|
|
/m45: You should understand the proofs, but you won't be asked to do new proofs. You should understand the statements of the theorems (what is asserted and what is not asserted) carefully. There may be questions on slight variants of algorithms seen in class. You won't have to invent new algorithms.
|
| Dan McKee
|
45
|
 |
|
10-25-2008 07:43 PM ET (US)
|
|
To what extent must we understand the proofs for the exam?
Do we mostly care about the results or relation of the proof?
Must we prepare for unique or modified versions of the algorithms discussed?
|
| Charles Elkan
|
44
|
 |
|
10-25-2008 04:17 PM ET (US)
|
|
Instructions for the midterm
Here are the instructions that will be on the exam: "Look through the whole exam and answer the questions that you find easiest first. Answer each question in the space below the question, using the backs of the pages for extra space as necessary. If necessary, you may make assumptions that are reasonable, and that do not make a question trivial. If you do make an assumption, state it clearly. This exam is open-book. You may use a calculator."
The true/false question instructions are "For each statement below, clearly write ``True'' if it is mostly true, or ``False'' if it is mostly false. Then in the space below, write one or two sentences explaining why or how the statement is true or false. The maximum score for each answer is three points."
|
| Charles Elkan
|
43
|
 |
|
10-24-2008 02:15 PM ET (US)
|
|
|
| Charles Elkan
|
42
|
 |
|
10-24-2008 02:04 PM ET (US)
|
|
/m40: "Maybe" is not a class, so perhaps there is a misunderstanding I need to clear up? If two different classes do have identical highest probability, then you can just choose randomly between the classes. If this happens often, then most likely there is a bug in your code.
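A small Python sketch of random tie-breaking, for illustration (the function name and the scores are made up):

```python
import random


def predict(class_scores, rng=random):
    """Return the class with the highest score, choosing randomly among ties."""
    best = max(class_scores.values())
    tied = [c for c, s in class_scores.items() if s == best]
    return rng.choice(tied)


print(predict({"yes": 0.7, "no": 0.3}))  # always "yes"
print(predict({"yes": 0.5, "no": 0.5}))  # "yes" or "no", chosen at random
```

Counting how often the list of tied classes has more than one entry is a cheap way to check for the bug mentioned above.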
|
| Charles Elkan
|
41
|
 |
|
10-24-2008 02:01 PM ET (US)
|
|
/m39: The probabilities should not all be zero. I can think of two reasons why this might seem to happen: (1) Bug(s) in your code or your mathematical understanding. (2) If you multiply together hundreds of numbers smaller than one, the result may "underflow" and become zero. This happens when the result falls below approximately 10^-300. In this case you have two options. One option is to use fewer features. The better option is to rewrite the mathematics and your code to use logarithms. Instead of multiplying numbers, add their logarithms.
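Here is a minimal Python sketch of the logarithm idea (the course uses Matlab, where the same approach applies). The function names and the numbers are hypothetical: 400 features, each with likelihood 0.1 under one class and 0.05 under the other.

```python
import math


def product_score(prior, likelihoods):
    """Multiply the prior by every per-feature likelihood (can underflow)."""
    score = prior
    for p in likelihoods:
        score *= p
    return score


def log_score(prior, likelihoods):
    """Add logarithms instead of multiplying probabilities."""
    return math.log(prior) + sum(math.log(p) for p in likelihoods)


# Made-up per-feature likelihoods for two classes.
likes_yes = [0.1] * 400
likes_no = [0.05] * 400

print(product_score(0.5, likes_yes), product_score(0.5, likes_no))  # 0.0 0.0
print(log_score(0.5, likes_yes), log_score(0.5, likes_no))
```

Both products underflow to zero, so they cannot be compared, while the log scores are ordinary negative numbers and the larger one still identifies the more probable class. Note that the logarithm of zero is undefined, so this relies on every probability being nonzero.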
|
| Matt
|
40
|
 |
|
10-23-2008 07:04 PM ET (US)
|
|
What happens if we have a tie between the highest probability classes, like if yes is .5 and no is .5, and maybe is 0, would it be yes or no? (how do we classify the test data)
|
| Chris
|
39
|
 |
|
10-23-2008 06:51 PM ET (US)
|
|
If we calculate the probabilities for each classifier and they are all 0%, should we try to remove features until we get some non-zero possibility for a classifier, or should we just state that it is not classifiable?
|
| Charles Elkan
|
38
|
 |
|
10-23-2008 12:12 PM ET (US)
|
|
|
| Charles Elkan
|
37
|
 |
|
10-22-2008 11:49 AM ET (US)
|
|
The meanings of the chess features are explained in Figure 2 of the paper "Consciousness as an engineering issue. Part 2" by Donald Michie, Journal of Consciousness Studies, Volume 2, Number 1, 1995, pp. 52-66(15). I have saved a copy of this paper at http://www.cs.ucsd.edu/users/elkan/151/michie.pdf
The paper itself is interesting but hard to read. It attempts to provide a theory of human problem-solving based on research in AI. The author, Donald Michie, was one of the pioneers of AI, after working with Alan Turing during World War II. See http://en.wikipedia.org/wiki/Donald_Michie
The Journal of Consciousness Studies is interesting also. Its editor-in-chief was Joseph Goguen, who was a professor in CSE at UCSD. See http://en.wikipedia.org/wiki/Joseph_Goguen
|
| Tobias
|
36
|
 |
|
10-21-2008 11:40 PM ET (US)
|
|
The description for the Chess dataset is a bit sparse and the referenced book is not available in the library. So here are some things I would like to know (though most of them are just the result of my curiosity and not really required for doing the exercise):
What are the possible feature values and what do they mean? In the description "f", "t", "n" and "l" are listed.
What do the feature descriptions mean? If you have the book maybe you could copy the page with the descriptions for us?
|
| Charles Elkan
|
35
|
 |
|
10-17-2008 11:25 AM ET (US)
|
|
/m34: A linear separator is more likely to exist for the dataset that has higher dimensionality and fewer points. Fewer points means that the points are less likely to be in overlapping regions that cannot be separated. Higher dimensionality means that there is more flexibility for the separator to "find a path" between the two regions.
|
| Jacob
|
34
|
 |
|
10-16-2008 07:23 PM ET (US)
|
|
I understand what a linear separator is, but the question asks if a linear separator is more likely to exist for D or D', based on the fact that one has higher dimensionality and one has more training points. I can't find anything about that in the notes, and I don't remember it being covered in lecture either.
|
| Charles Elkan
|
33
|
 |
|
10-16-2008 04:17 PM ET (US)
|
|
|
| Charles Elkan
|
32
|
 |
|
10-16-2008 11:00 AM ET (US)
|
|
/m29, /m30, /m31: You are right, my printed notes are not very explicit about what a linear separator is. In summary: The simplest way to distinguish between two classes in p-dimensional space is a hyperplane. The points on one side are in one class, while the points on the other side are in the other class.
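In code, a hyperplane can be written as a weight vector w and an offset b, and the class of a point x is the sign of w·x + b. A minimal Python sketch, where the names and numbers are made up for illustration:

```python
def side_of_hyperplane(w, b, x):
    """Return +1 or -1 depending on which side of the hyperplane x lies.

    The hyperplane is the set of points where w . x + b equals zero;
    points exactly on it are assigned to -1 here by convention.
    """
    activation = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1 if activation > 0 else -1


# Hypothetical separator in 2 dimensions: the line x1 + x2 = 1.
w, b = [1.0, 1.0], -1.0
print(side_of_hyperplane(w, b, [2.0, 2.0]))  # 1
print(side_of_hyperplane(w, b, [0.0, 0.0]))  # -1
```

A dataset is linearly separable if some choice of w and b puts every example of one class on the +1 side and every example of the other class on the -1 side.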
|
| Jacob
|
31
|
 |
|
10-16-2008 12:13 AM ET (US)
|
|
I've read those notes, and I can't find anything that clearly addresses the question.
|
| Meir Schwarz
|
30
|
 |
|
10-15-2008 10:25 PM ET (US)
|
|
|
| Jacob
|
29
|
 |
|
10-15-2008 09:56 PM ET (US)
|
|
Homework 2, question 2, part 7, about linear separability.
Did we ever cover this in class? I cannot find anything about this topic in the notes, and I don't remember it being discussed. I think I know the answer after doing my own research, but I was wondering when we covered it.
|
| Charles Elkan
|
28
|
 |
|
10-15-2008 01:27 AM ET (US)
|
|
"X is a binary event" means that either X or not-X occurs. If A and B are both binary events then there are four possibilities in total: A occurs and B occurs A occurs and not-B occurs not-A and B not-A and not-B.
|
| Loc
|
27
|
 |
|
10-14-2008 10:23 PM ET (US)
|
|
for question 2 part 2. Does binary event mean that B can't occur when the outcome is A and vice versa...so basically XOR?
|
| Charles Elkan
|
26
|
 |
|
10-14-2008 12:38 AM ET (US)
|
|
/m24: The lawyer doesn't agree that smoking causes cancer. Instead, he says that the gene causes smoking, and that the gene causes cancer separately. The numbers x, y, and z are probabilities. The goal is to find values between 0.0 and 1.0 for these numbers such that the laws of probability theory are satisfied, and the probabilities stated before part (a) are consistent with the x, y, and z probabilities.
|
| Dan
|
25
|
 |
|
10-13-2008 09:43 PM ET (US)
|
|
Looking for a partner for the next programming assignment. My original partner dropped.
eekcmnad@gmail.com (714) 488-8692
|
| matthew davis
|
24
|
 |
|
10-13-2008 07:19 PM ET (US)
|
|
Can someone explain what is expected for part b of question 3 of assignment 2? I am quite confused as to how I can get numerical values from the information given before part b. I'm not even sure what the lawyer's argument is. Does this relate back to what we used in part a?
|
| Charles Elkan
|
23
|
 |
|
10-12-2008 07:49 PM ET (US)
|
|
Reminder: The current written assignment is here: http://www-cse.ucsd.edu/users/elkan/151/assignment2.pdf
Your answers are due in class on Thursday October 16. The questions on this assignment are in the same style as questions on the midterm and final for 151. Don't be surprised by this style!
|
| Charles Elkan
|
22
|
 |
|
10-08-2008 05:54 PM ET (US)
|
|
Make quantitative observations in your report. Make your observations systematic. For example, if you report the confusion rate for one pair of digits, you should probably report it for all pairs of digits. Don't make the reader confused with lots of numbers in text. Instead, provide tables and/or figures.
|
| Oscar Ngai
|
21
|
 |
|
10-08-2008 04:32 PM ET (US)
|
|
So I edited my program to save and display misclassified pairs in addition to the # correct/wrong labels. Do I have to use some sort of numerical analysis on this data, or should I just make observations in my report?
|
| Charles Elkan
|
20
|
 |
|
10-08-2008 03:27 PM ET (US)
|
|
You will not be able to achieve 100% accuracy, or even close.
The data come originally from some researchers in Turkey, so I don't think they made a connection with the US :-)
|
| Dan McKee
|
19
|
 |
|
10-08-2008 02:12 PM ET (US)
|
|
Are we supposed to be 100% accurate with the results of the program? We made a test function and right now we get 6.9% error out of all the 1779 tests.
btw, was 1779 testdata records made up because of the revolutionary war?
|
| Charles Elkan
|
18
|
 |
|
10-08-2008 10:06 AM ET (US)
|
|
You can use pretty much any Matlab function except one from a library that solves most of the project for you. If you are not sure about a particular function, just mention its name here and I'll reply.
|
| Patrick Lai
|
17
|
 |
|
10-07-2008 10:55 PM ET (US)
|
|
Are we free to use any available function in MATLAB?
|
| Charles Elkan
|
16
|
 |
|
10-06-2008 09:00 PM ET (US)
|
|
Once you have results, analyze them in more detail in order to answer questions about the classifier and suggest ideas for improving it. For example:
- which pairs of digits are confused the most often?
- which number k of nearest neighbors is best?
- are the mistakes about the same for different k?
- or do the mistakes concern different subsets of samples?
- how does accuracy change when you change the number of training samples?
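For the question about confused pairs of digits, the counting could look like this Python sketch (your project code will be in Matlab; the function name and the label lists are made up):

```python
from collections import Counter


def confusion_pairs(true_labels, predicted_labels):
    """Count each (true digit, predicted digit) pair among the mistakes."""
    pairs = Counter()
    for t, p in zip(true_labels, predicted_labels):
        if t != p:
            pairs[(t, p)] += 1
    return pairs


# Hypothetical labels for five test samples.
true_labels = [3, 5, 3, 8, 3]
predicted_labels = [3, 3, 8, 8, 8]
pairs = confusion_pairs(true_labels, predicted_labels)
print(pairs.most_common(1))  # [((3, 8), 2)]
```

Running the same count for several values of k, and comparing which test samples are misclassified each time, addresses the questions about whether the mistakes stay the same.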
|
| Oscar Ngai
|
15
|
 |
|
10-06-2008 05:58 PM ET (US)
|
|
What kind of results should our program give? so far, all I have is the number of correctly and wrongly classified test examples.
|
| Charles Elkan
|
14
|
 |
|
10-03-2008 05:35 PM ET (US)
|
|
Having a partner is very highly recommended, but not mandatory. Doing the project alone is preferable to not doing it!
|
| Benjamin Boynton
|
13
|
 |
|
10-03-2008 02:24 AM ET (US)
|
|
If we are unable to find a partner, can we still post our assignments by ourselves? If this is possible, this might be nicer for me. Thanks!
|
| Benjamin Boynton
|
12
|
 |
|
10-02-2008 08:46 AM ET (US)
|
|
Hello!
I'm also looking for a partner. I have some prior experience with Matlab and algorithms. I have completed or am currently taking almost all of the required classes for a BS/CS with the exception of CSE120 (and a bunch of Revelle GEs).
Since I'm not working this quarter (usually full time), I have lots of time to devote to this class. My classes are on Tue/Thr/Fri this quarter. I'm used to staying up all night two or three times a week from when I was working and am used to getting things done despite the cost. This quarter should be a lot easier.
Thanks for your consideration and good luck everyone!
~Benjamin Boynton bboynton@ucsd.edu
|
| Matt
|
11
|
 |
|
10-01-2008 11:51 PM ET (US)
|
|
My name is Matt, and I need a partner for the assignments. Tuesdays and Thursdays are heavy for me, but I am free the rest of the week. Email: mbrewer@ucsd.edu
|
| Dan
|
10
|
 |
|
10-01-2008 09:54 PM ET (US)
|
|
Hey, I'm Dan, and I'm looking for a partner. I am a senior and have good debugging and analysis skills. I am learning Matlab and have a full version copy if anyone needs it.
I really want to have a partner who is passionate about learning and works hard.
my email is eekcmnad@gmail.com
my schedule is heavy on tuesday and thursday.
:-)
|
| Charles Elkan
|
9
|
 |
|
10-01-2008 06:09 PM ET (US)
|
|
|
| Charles Elkan
|
8
|
 |
|
10-01-2008 05:56 PM ET (US)
|
|
|
| Elnur Emrah
|
7
|
 |
|
10-01-2008 03:07 PM ET (US)
|
|
Hi,
My name is Elnur. I am looking for a partner for the assignments. To introduce myself briefly, I am from Turkey. I am a senior student at CS department in Bilkent University in Turkey. I came to UCSD as an EAP exchange student. This is my first quarter in UCSD. I am currently taking three courses. My classes finish at 5 pm on Tuesdays and Thursdays. I have one-hour sections on other weekdays. I enjoy studying mathematics and algorithms. My GPA in my home university is 3.9. I am very interested in this AI course too and plan to work hard. I have little experience in Matlab but I believe I can learn it quickly. I am generally an easy-going person and like to work in teams. I do not mind working at nights or on weekends. If interested, please e-mail me as soon as possible. My e-mail address is eemrah@ucsd.edu. Thanks.
|
| Forrest Baker
|
6
|
 |
|
10-01-2008 12:41 PM ET (US)
|
|
Deleted by author 10-01-2008 12:41 PM
|
| Charles Elkan
|
5
|
 |
|
09-30-2008 11:32 AM ET (US)
|
|
/m4 answer: Thanks for the reminder. Yes, we will schedule office hours. Meanwhile, feel free to send email to me or to Albert Park to arrange a time. If anyone has suggestions for what format would be best for office hours (in a lab or not, sign up for individual times or not, etc.) please post the suggestions here.
|
| Patrick Lai
|
4
|
 |
|
09-30-2008 12:28 AM ET (US)
|
|
Will office hours be held?
|
| Charles Elkan
|
3
|
 |
|
09-26-2008 04:58 PM ET (US)
|
|
Yahoo UCSD "hack week", October 13 to 17
Famous software engineers from Yahoo are coming to UCSD for a week. There will be a lot of events with free food, and a programming contest. Graduate and undergraduate students are encouraged to participate. The programming contest is open-ended, and you can use your machine learning skills in it. You can start planning a project for the contest now. UCSD is one of only six top universities selected by Yahoo for this event. For more information see http://www.cs.ucsd.edu/~ricko/YahooHackDay2008.pdf and http://developer.yahoo.com/hacku/
|
| Charles Elkan
|
2
|
 |
|
09-26-2008 04:41 PM ET (US)
|
|
First section meeting for CSE 151
The first section will happen at the time and place listed in the Schedule of Classes: 11am to 11:50am on Monday (September 29) in Center Hall, room 217B.
The TA, Albert Park, will lead the section. He will give an introduction to programming in Matlab. This material is essential for the first assignment. All 151 students should attend.
|
Charles Elkan
|
1
|
 |
|
07-14-2008 04:11 PM ET (US)
|
|
This is the place to ask all questions that are of general interest concerning CSE 151.
|