| Who | When |
Messages | |
|
|
|
Lingyun
|
1
|
 |
|
03-03-2004 01:06 AM ET (US)
|
|
Deleted by author 03-03-2004 01:08 AM
|
| Doug Turnbull
|
2
|
 |
|
03-03-2004 01:08 AM ET (US)
|
|
In Silvey 1a, how do we verify the large-sample theory? Is it enough to show that the distribution of the MLE can be approximated by a normal distribution with mean theta and variance 1/(n * Fisher information for a single x_i) as n approaches infinity?
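One way to sanity-check that approximation numerically is by simulation. This is only a minimal sketch under an assumed model: Exponential data with rate theta, chosen purely for illustration (the model in Silvey 1a may differ). For that model the MLE is 1/mean(x) and the Fisher information for a single observation is 1/theta^2, so sqrt(n * I(theta)) * (theta_hat - theta) should look roughly standard normal for large n. The function names are hypothetical.

```python
import math
import random
import statistics


def simulate_standardized_mle(theta, n, reps, seed=0):
    """Draw `reps` samples of size n from Exponential(rate=theta) and return
    the standardized MLEs sqrt(n * I(theta)) * (theta_hat - theta)."""
    rng = random.Random(seed)
    fisher_info = 1.0 / theta ** 2            # I(theta) for one observation
    scale = math.sqrt(n * fisher_info)
    z_values = []
    for _ in range(reps):
        sample = [rng.expovariate(theta) for _ in range(n)]
        theta_hat = 1.0 / statistics.fmean(sample)   # MLE of the rate
        z_values.append(scale * (theta_hat - theta))
    return z_values


def summarize(z_values):
    """Return (mean, variance, fraction with |z| <= 1.96) of the z-values."""
    coverage = sum(abs(z) <= 1.96 for z in z_values) / len(z_values)
    return statistics.fmean(z_values), statistics.pvariance(z_values), coverage


if __name__ == "__main__":
    mean, var, cov = summarize(simulate_standardized_mle(2.0, 400, 2000, seed=1))
    print(f"mean={mean:.3f} variance={var:.3f} coverage={cov:.3f}")
```

If the normal approximation holds, the mean should be near 0, the variance near 1, and the coverage near 0.95. A simulation like this only illustrates the result; it does not replace the analytic argument.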
|
| Anjum Gupta
|
3
|
 |
|
03-05-2004 06:13 AM ET (US)
|
|
Edited by author 03-05-2004 06:17 AM
I have a few questions about the dataset for the 4th problem.
Question 1. What is the difference between "ODATEW" and "FISTDATE"?
Question 2. It looks like the following variables should have binary values, but they don't. Why not?
MALEMILI % Males active in the Military
MALEVET % Males Veterans
VIETVETS % Vietnam Vets
WWIIVETS % WWII Vets
LOCALGOV % Employed by Local Gov
STATEGOV % Employed by State Gov
FEDGOV % Employed by Fed Gov
Question 3. Does anyone have a good idea about how to deal with the missing values?
By the way, the README explaining all the variables is at http://kdd.ics.uci.edu/databases/kddcup98/...mirror/cup98dic.txt
Thanks.
|
| Anjum Gupta
|
4
|
 |
|
03-07-2004 03:34 AM ET (US)
|
|
Edited by author 03-07-2004 03:35 AM
I think I have the answer to my own question 3 in /m3. I suppose we will be using the method discussed in class to deal with missing data; we went over an example of elevator waiting times with some missing data. However, the answer to the first two questions is still eluding me. Thanks.
|
| Jay
|
5
|
 |
|
03-07-2004 12:27 PM ET (US)
|
|
Edited by author 03-07-2004 01:53 PM
Anjum: /m3, question 2: I think that for those variables, 0 means missing. The class index starts at 1. My question is whether we should treat symbolic variables such as ZIP, GENDER, and FEDGOV as regular regression variables. My guess is that we usually use binary data as an indicator to combine with other variables. For example, assume we have a variable x_i taking real values and a variable x_j taking binary values. Then there might be a term in the regression of the form \beta * <x_i,x_j>, where <,> denotes the inner product. Just my two cents.
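The indicator idea above can be sketched as follows. This is a minimal illustration, assuming scalar x_i and x_j so that the inner product is just the product x_i * x_j. The function names and the coefficient layout (intercept, x_real, x_bin, interaction) are hypothetical, not something specified in the assignment.

```python
def design_row(x_real, x_bin):
    """Return the features [1, x_real, x_bin, x_real * x_bin] for one observation.

    x_bin is a 0/1 indicator; the last entry is the interaction term, which
    lets the slope on x_real differ between the two groups.
    """
    if x_bin not in (0, 1):
        raise ValueError("x_bin must be 0 or 1")
    return [1.0, float(x_real), float(x_bin), float(x_real) * x_bin]


def predict(beta, x_real, x_bin):
    """Linear prediction beta . design_row(x_real, x_bin)."""
    return sum(b * f for b, f in zip(beta, design_row(x_real, x_bin)))
```

With this encoding the slope on x_real is beta[1] when x_bin = 0 and beta[1] + beta[3] when x_bin = 1, so the indicator switches the extra slope on and off.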
|
| Jay
|
6
|
 |
|
03-07-2004 02:05 PM ET (US)
|
|
/m3, sorry Anjum, I was wrong. I guess the following variables are the probabilities that the donor falls in each field:
MALEMILI % Males active in the Military
MALEVET % Males Veterans
VIETVETS % Vietnam Vets
WWIIVETS % WWII Vets
LOCALGOV % Employed by Local Gov
STATEGOV % Employed by State Gov
FEDGOV % Employed by Fed Gov
|
| Nuno
|
7
|
 |
|
03-07-2004 11:37 PM ET (US)
|
|
Edited by author 03-08-2004 12:06 AM
(contributions to other questions at the end)
1 - RFA_2A has 7 categories but only values 1-4. Should we really assume that there were no donations above $10?
2 - Is there a description somewhere of what the values mean? In the dictionary many of the values are said to be nominal/symbolic, but we only get numerical values, with no information about the conversion process.
/m6 Jay, I added all the values in these columns and at least one added up to more than 200, so I don't think these values are percentages/probabilities. One possibility that could make sense is that the donation may have been sent by some other association and these numbers reflect membership statistics. There does not seem to be any requirement that the donations were made by individuals.
/m4 Anjum, I believe the example in class was different from this one. In the elevator case we had some waiting time information and then quit - this is 'incomplete' information, not exactly missing information. A commonly used approach, with arguable merits, is to replace the missing values with the mean of the known values.
About the difference between "ODATEW" and "FISTDATE", I just noticed that there are some fields about promotion history and giving history, but nothing in these mentions PVA. Maybe these are more general history fields reflecting the overall contribution history and not just PVA donations?
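The mean-replacement approach mentioned above can be sketched like this. It is a minimal illustration that assumes missing entries have already been decoded to None. How missing values are actually coded in the dataset is still an open question in this thread, and the function name is hypothetical.

```python
import statistics


def impute_mean(column):
    """Replace None entries in a numeric column with the mean of the known values."""
    known = [v for v in column if v is not None]
    if not known:
        raise ValueError("no known values to average")
    mean = statistics.fmean(known)
    return [mean if v is None else float(v) for v in column]
```

For example, impute_mean([1, None, 3]) fills the gap with 2.0. This keeps the column mean unchanged but shrinks its variance, which is one reason the merits are arguable.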
|
| Michael Green
|
8
|
 |
|
03-08-2004 07:57 PM ET (US)
|
|
I have a question about problem 3:
part1: I'm having trouble writing an expression for the likelihood of a set of observations, given an assumed number of coefficients. I believe I understand the intuition (below), but I'm not sure how to express it as a formula. If my intuition is correct, I could live without a nice formula, but it would be a heck of a lot easier to implement if I had one.
part2: Intuition: Treat the Y_i's as Gaussian with different means (depending on the X_i's) but the same variance. Use the least squares solution as the MLE for a model containing a specific set of variables. Use the resulting RSS to estimate the variance of Y. Given (1) the expected Y (treated as the mean), (2) the actual Y, and (3) the estimated variance of Y, we have enough to determine the (maximum) likelihood of the observations, assuming our model has the right parameters. Does this make sense?
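Under the assumptions in that intuition (independent Gaussian errors with a common variance), the log-likelihood is log L = -(n/2) * log(2*pi*sigma^2) - RSS / (2*sigma^2). Plugging in the MLE sigma^2 = RSS/n gives the closed form -(n/2) * (log(2*pi*RSS/n) + 1). The sketch below checks that closed form against a direct sum of Gaussian log-densities. It uses a single predictor only to keep the example short, and the function names are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Least-squares intercept and slope for one predictor."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def max_log_likelihood(ys, fitted):
    """Maximized Gaussian log-likelihood, using sigma^2 = RSS / n."""
    n = len(ys)
    rss = sum((y - f) ** 2 for y, f in zip(ys, fitted))
    sigma2 = rss / n          # MLE of the variance (divides by n, not n - p)
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1.0)


def log_likelihood_direct(ys, fitted, sigma2):
    """Sum of Gaussian log-densities of each y around its fitted mean."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma2) - (y - f) ** 2 / (2 * sigma2)
        for y, f in zip(ys, fitted)
    )
```

The same closed form applies to any number of coefficients, since only n and the RSS of the fitted model enter it.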
|
| RTHORNE
|
9
|
 |
|
04-02-2004 06:27 AM ET (US)
|
|
|
|
|
10
|
 |
|
07-22-2006 03:32 AM ET (US)
|
|
Deleted by topic administrator 07-23-2006 02:06 AM
|