| Who | When |
Messages | |
|
|
|
|
|
| Charles Elkan
|
109
|
 |
|
11-21-2008 11:58 AM ET (US)
|
|
If you like CSE 151, the following courses are highly recommended for further study.
Announcing COGS 118A and 118B (Natural Computation I and II)
Are you interested in graduate school in machine learning, computational neuroscience, or computational modeling? Interested in lucrative jobs using machine learning to solve practical problems? Consider taking one or both of COGS 118A and 118B (Natural Computation I and II). These courses give a rigorous grounding in machine learning theory and algorithms, and are designed to provide the background we want our machine learning graduate students to have. A strong math background is required (Math 20E, 20F, 180A and a prior course in computer programming are prerequisites), but you may discuss your individual situation with the instructors. Please note that these courses can be taken in either order: you do not need to take 118A before 118B. 118A will be offered in Winter 2009 (Professor Angela Yu) and 118B in Spring 2009 (Professor Virginia de Sa).
118A. Natural Computation I (4) This course is an introduction to computational modeling of biological intelligence, focusing on neural networks and related approaches to SUPERVISED learning. Topics include estimation, filtering, optimization, neural networks, support vector machines, Gaussian processes, and Bayes nets. Prerequisites: Cognitive Science 109 or equivalent, Mathematics 20E, Mathematics 20F, and Mathematics 180A, or consent of instructor.
118B. Natural Computation II (4) This course is an introduction to computational modeling of biological intelligence, focusing on neural networks and related approaches to UNSUPERVISED learning. Topics include density estimation, clustering, self-organizing maps, principal component analysis, kernel methods, and information-theoretic models. Prerequisites: Cognitive Science 109 or equivalent, Mathematics 20E, Mathematics 20F, and Mathematics 180A, or consent of instructor.
-----------------------------------------------------------------
Virginia de Sa                    desa@ucsd.edu
Department of Cognitive Science   ph: 858-822-5095
9500 Gilman Dr.                       858-822-2402
La Jolla, CA 92093-0515           fax: 858-534-1128
-----------------------------------------------------------------
|
| Charles Elkan
|
108
|
 |
|
11-18-2008 09:58 AM ET (US)
|
|
REPRESENTATION, SEARCH, AND THE WEB
Professor Richard K. Belew
CogSci 188 - Winter, 2009
Tuesday, Thursday 9:30-10:50a
Warren Lecture Hall 2113
SectID#?? - (4 units)
http://abbey.ucsd.edu:8080/cogs188
Recent estimates suggest that our species is producing five exabytes (5 billion gigabytes) of "content" each year! But the more content we produce, the harder it often becomes to find anything, let alone anything "relevant." This course will present a survey of computational techniques designed to represent content, search through it, and use the WWW to connect the new producers of content with their new audiences. The central focus will be on probabilistic techniques for inferring textual documents' "meaning" from word occurrence statistics. Graph analysis techniques applied to bibliographic citations and the Web, Web crawling, and Web 2.0 techniques will also be discussed.
Students will analyze these algorithms mathematically and experiment with their implementation. Students with and without programming backgrounds will be accommodated. Both graduate and undergraduate students are welcome.
|
| Charles Elkan
|
107
|
 |
|
11-18-2008 01:10 AM ET (US)
|
|
/m106: Likelihoods are positive numbers very close to zero. They should be increasing with each iteration of EM.
|
| Meir Schwarz
|
106
|
 |
|
11-18-2008 12:49 AM ET (US)
|
|
If I'm seeing positive numbers that are progressively getting smaller, am I computing likelihood instead of log likelihood?
|
| Charles Elkan
|
105
|
 |
|
11-18-2008 12:46 AM ET (US)
|
|
/m102: Yes, there is a different likelihood for each document. Because this is based on a pmf (not a pdf), it is a number between 0 and 1 that is very close to zero in practice. The sum of all the log likelihoods is not one. It is a large negative number that should be increasing, i.e., getting less negative. For how to check if you have maximized the sum of log likelihoods, see /m104.
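To illustrate why the per-document likelihood is tiny while its logarithm stays manageable, here is a minimal sketch in plain Python. It assumes a mixture of multinomials with word-count vectors as documents; the function names and argument layout are hypothetical and not taken from the assignment. The multinomial coefficient is left out because it does not depend on the parameters.

```python
import math

def log_sum_exp(values):
    """Return log(sum(exp(v) for v in values)) without underflow."""
    c = max(values)  # shift by the largest term so one exponent is exp(0)
    return c + math.log(sum(math.exp(v - c) for v in values))

def doc_log_likelihood(counts, log_priors, log_thetas):
    """Log likelihood of one document under a mixture of multinomials.

    counts:     word counts x_j for the document (assumed layout)
    log_priors: log mixing weight of each component k
    log_thetas: log_thetas[k][j] = log probability of word j in component k
    """
    per_component = [
        lp + sum(x * lt for x, lt in zip(counts, row))
        for lp, row in zip(log_priors, log_thetas)
    ]
    return log_sum_exp(per_component)

def total_log_likelihood(docs, log_priors, log_thetas):
    """Sum of per-document log likelihoods: a large negative number."""
    return sum(doc_log_likelihood(d, log_priors, log_thetas) for d in docs)
```

Each per-document value is the log of a number between 0 and 1, so it is negative, and the total over many documents is a large negative number, as described above.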
|
| Charles Elkan
|
104
|
 |
|
11-18-2008 12:43 AM ET (US)
|
|
/m100: No, you don't need to compute derivatives. Just compute the sum of log likelihoods after each E step and check that this sum is always increasing, but more and more slowly.
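A small sketch of this check, assuming you append the sum of log likelihoods to a list after every E step. The function name, the status strings, and the tolerance `tol` are hypothetical choices for illustration.

```python
def check_em_progress(history, tol=1e-6):
    """Inspect the sums of log likelihoods recorded after each E step.

    Returns "bug" if the sum ever decreases (beyond rounding error),
    "converged" if the latest improvement is smaller than tol,
    and "running" otherwise.
    """
    for prev, curr in zip(history, history[1:]):
        # Allow a tiny relative slack for floating-point rounding.
        if curr < prev - 1e-9 * max(1.0, abs(prev)):
            return "bug"
    if len(history) >= 2 and history[-1] - history[-2] < tol:
        return "converged"
    return "running"
```

For example, a history such as `[-5000.0, -4200.0, -4100.0]` is still improving, while `[-5000.0, -4200.0, -4300.0]` decreases at the last step and points to an implementation error.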
|
| Charles Elkan
|
103
|
 |
|
11-18-2008 12:41 AM ET (US)
|
|
/m99: Present your results without annealing. However, annealing should just add one line of code. Note that you can raise a number to the power 1/t by dividing its logarithm by t. You can still use the same constant c idea to avoid overflow.
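As a hedged sketch of what that one extra line might look like, the function below computes the E step weights for a single document in log space. It assumes the tempered quantity is the whole unnormalized weight (prior times likelihood); your assignment may temper a different term. The function name and the argument `log_joint` are hypothetical.

```python
import math

def annealed_responsibilities(log_joint, t=1.0):
    """E step weights for one document at temperature t.

    log_joint[k] = log(prior_k) + sum_j x_j * log(theta_kj)  (assumed input).
    Dividing by t raises each unnormalized weight to the power 1/t.
    Subtracting the constant c (the largest scaled value) before
    exponentiating makes the largest term exp(0) = 1, so the sum
    cannot underflow to zero.
    """
    scaled = [v / t for v in log_joint]  # the one extra line for annealing
    c = max(scaled)
    unnorm = [math.exp(v - c) for v in scaled]
    z = sum(unnorm)
    return [u / z for u in unnorm]
```

With `t=1.0` this reduces to the ordinary E step. Larger values of `t` flatten the weights toward uniform.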
|
| Meir Schwarz
|
102
|
 |
|
11-18-2008 12:36 AM ET (US)
|
|
Just to make sure I have this right: there is a different log likelihood for each document, and it is just one number between 0 and 1. The sum of all the log likelihoods is 1.
How can we check if we have maximized the log likelihood efficiently?
|
| Albert Park
|
101
|
 |
|
11-17-2008 08:02 PM ET (US)
|
|
My apologies to those of you who went to section and waited today. Unfortunately there was a problem with my flight back to the US from Korea, and I couldn't make it to section. I will contact Charles about a make-up section if one is wanted.
|
| Meir Schwarz
|
100
|
 |
|
11-17-2008 06:49 PM ET (US)
|
|
To determine whether we have correctly maximized the log likelihood, do we have to take the derivative at each theta and make sure it is 0, or is there some easier way?
|
| Peter Faymonville
|
99
|
 |
|
11-17-2008 06:48 PM ET (US)
|
|
What should we do if we have a bug in our implementation and cannot find it? Before we use deterministic annealing, we get sensible results in the range of 40 to 80 percent accuracy, but after we put in the code for annealing we seem to run into a bunch of underflows for the document weights. After spending half a dozen hours trying to find the error, how should we proceed?
|
| Charles Elkan
|
98
|
 |
|
11-17-2008 06:20 PM ET (US)
|
|
/m97: As far as I know, there is no useful theory about how many iterations are needed until convergence. Call this number J and let it be a parameter of your big-O analysis.
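As a rough illustration of how J enters the analysis, assume n documents, K mixture components, and a vocabulary of m words stored as dense count vectors (n, K, and m are notation introduced here, not from the assignment). Each E step then evaluates every document against every component, and each M step re-estimates every component from every document:

```latex
T_{\text{E step}} = O(nKm), \qquad
T_{\text{M step}} = O(nKm), \qquad
T_{\text{total}} = O(J \cdot nKm)
```

If the counts are stored sparsely, the factor nm can be replaced by the total number of nonzero counts in the dataset.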
|
| Meir Schwarz
|
97
|
 |
|
11-17-2008 06:17 PM ET (US)
|
|
How are we supposed to determine the big-O time complexity for our algorithm? Is there a theoretical maximum to the number of times it has to loop before convergence?
|
| Charles Elkan
|
96
|
 |
|
11-17-2008 04:42 PM ET (US)
|
|
/m94: I suggest computing one multinomial for the whole dataset. Then, initialize the multinomial for each component to be a small random fluctuation around this central multinomial. Another choice is to initialize the w_ik randomly instead, and to start the EM algorithm with an M step.
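A minimal sketch of the first suggestion, assuming documents are lists of word counts over a shared vocabulary. The function names, the additive `smoothing`, the perturbation size `noise`, and the `seed` are all hypothetical choices made for illustration.

```python
import random

def central_multinomial(docs, smoothing=1.0):
    """Word distribution of the whole dataset, with additive smoothing
    so that no word gets probability zero."""
    m = len(docs[0])
    totals = [smoothing + sum(d[j] for d in docs) for j in range(m)]
    z = sum(totals)
    return [v / z for v in totals]

def init_components(docs, k, noise=0.05, seed=0):
    """One multinomial per component: the central one, slightly perturbed.

    Each probability is multiplied by a random factor in
    [1 - noise, 1 + noise] and the vector is renormalized.
    """
    rng = random.Random(seed)
    center = central_multinomial(docs)
    thetas = []
    for _ in range(k):
        perturbed = [p * (1.0 + noise * rng.uniform(-1.0, 1.0)) for p in center]
        z = sum(perturbed)
        thetas.append([p / z for p in perturbed])
    return thetas
```

The small perturbation matters: if every component started from exactly the same multinomial, the E step would give identical weights to all components and EM could not separate them.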
|
| Charles Elkan
|
95
|
 |
|
11-17-2008 04:40 PM ET (US)
|
|
We will return the midterms in class on Tuesday November 18. The mean score was 35, with standard deviation 9.
|
| matt
|
94
|
 |
|
11-17-2008 01:48 PM ET (US)
|
|
Edited by author 11-17-2008 02:23 PM
What do we use as our initialization for theta_k? Do we distribute evenly to start?
|