Topic: CSE 291 Assignment 5, Winter 2005
Hyun Min Kang  25
03-15-2005 08:12 PM ET (US)
Edited by author 03-15-2005 08:13 PM
/m23 As far as I know, the Wilcoxon rank-sum test is a nonparametric test of whether two samples come from the same distribution (a nonparametric counterpart of the t-test). I think that is quite different from testing whether two medians are the same. For example, if a = [0 0 0 0 0 1 2 3 4] and b = [-4 -3 -2 -1 0 0 0 0 0], then the rank-sum test would report a 'significant' result, even though the two medians are equal. Shouldn't we use a different test?
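
The example above can be checked numerically. The following is a minimal sketch, assuming the large-sample normal approximation with a tie correction and no continuity correction; with only 9 values per group the p-value is approximate. The function name `rank_sum_test` is made up for this illustration.

```python
import math
from statistics import median

def rank_sum_test(x, y):
    """Two-sided Wilcoxon rank-sum test (normal approximation, tie-corrected)."""
    pooled = sorted(x + y)
    n1, n2 = len(x), len(y)
    n = n1 + n2
    avg_rank = {}    # value -> average rank of its tie group
    tie_term = 0.0   # sum of t^3 - t over tie groups of size t
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        t = j - i
        avg_rank[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_term += t**3 - t
        i = j
    w = sum(avg_rank[v] for v in x)  # rank sum of the first sample
    mean_w = n1 * (n + 1) / 2
    var_w = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    z = (w - mean_w) / math.sqrt(var_w)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail
    return w, z, p

a = [0, 0, 0, 0, 0, 1, 2, 3, 4]
b = [-4, -3, -2, -1, 0, 0, 0, 0, 0]
w, z, p = rank_sum_test(a, b)
print("medians:", median(a), median(b))
print("rank sum of a:", w, "z:", round(z, 3), "p:", round(p, 4))
```

Both medians are 0, yet the approximate p-value falls below 0.05, which is the behavior the post describes.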
Charles Elkan  24
03-15-2005 06:00 PM ET (US)
/m18 answer: You are right, I mean use the bootstrap method to investigate the distribution of the sample median.
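
A minimal sketch of what investigating the distribution of the sample median by bootstrap could look like. The sample here is synthetic (50 standard normal draws), and the names `bootstrap_medians`, `std_error`, and `ci_95` are illustrative, not taken from the assignment.

```python
import random
from statistics import median, stdev

def bootstrap_medians(sample, n_boot=2000, seed=0):
    """Medians of n_boot resamples drawn with replacement from the sample."""
    rng = random.Random(seed)
    n = len(sample)
    return [median(rng.choices(sample, k=n)) for _ in range(n_boot)]

# Synthetic data, only to have something to resample.
data_rng = random.Random(1)
sample = [data_rng.gauss(0.0, 1.0) for _ in range(50)]

meds = sorted(bootstrap_medians(sample))
std_error = stdev(meds)  # bootstrap estimate of the median's standard error
# Percentile interval from the sorted bootstrap medians.
ci_95 = (meds[int(0.025 * len(meds))], meds[int(0.975 * len(meds)) - 1])
print("sample median:", median(sample))
print("bootstrap std. error:", std_error, "95% interval:", ci_95)
```

The spread of `meds` is the quantity of interest: it approximates how much the sample median would vary from one sample to another.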
Charles Elkan  23
03-15-2005 05:55 PM ET (US)
/m17 answer: Using the difference between the two medians as your test statistic is reasonable, but only if the distributions have similar spread.

Otherwise, you may want to be inspired by a known test for whether two medians are equal, e.g. the Wilcoxon rank sum test. See http://www.mathworks.com/access/helpdesk/h.../stats/ranksum.html
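
One way to turn the difference between two medians into a test is a permutation test. This is a sketch of one possible approach, not necessarily the method the assignment intends. Its null hypothesis is that both samples come from the same distribution, which is why the similar-spread caveat matters. The function name is hypothetical.

```python
import random
from statistics import median

def median_diff_permutation_test(x, y, n_perm=5000, seed=0):
    """Permutation p-value for the statistic |median(x) - median(y)|."""
    rng = random.Random(seed)
    observed = abs(median(x) - median(y))
    pooled = list(x) + list(y)
    n1 = len(x)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # random relabeling of the pooled values
        stat = abs(median(pooled[:n1]) - median(pooled[n1:]))
        if stat >= observed:
            count += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (count + 1) / (n_perm + 1)

# The two samples from message 25: equal medians, so the statistic is 0.
a = [0, 0, 0, 0, 0, 1, 2, 3, 4]
b = [-4, -3, -2, -1, 0, 0, 0, 0, 0]
obs_ab, p_ab = median_diff_permutation_test(a, b)

# Two clearly shifted samples, for contrast.
x = list(range(10))
y = [v + 10 for v in x]
obs_xy, p_xy = median_diff_permutation_test(x, y)
print(obs_ab, p_ab, obs_xy, p_xy)
```

For the samples from message 25 the observed statistic is 0, so every permutation is at least as extreme and the p-value is 1.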
Charles Elkan  22
03-15-2005 05:50 PM ET (US)
/m16, /m19: Thanks for the explanation, Jan.
Charles Elkan  21
03-15-2005 05:49 PM ET (US)
/m20 answer: The Laplace has fatter tails than the Gaussian because its density decays as exp(-|x|), as opposed to exp(-x^2) for the Gaussian.
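
A small numerical check of this, assuming both distributions have mean 0 and variance 1 (for the Laplace, variance = 2 * scale^2). The helper names are illustrative. The sampler uses the fact that the difference of two independent exponentials with the same mean is Laplace distributed.

```python
import math
import random

def laplace_tail(t, variance=1.0):
    """P(|X| > t) for a zero-mean Laplace with the given variance."""
    scale = math.sqrt(variance / 2)
    return math.exp(-t / scale)

def gaussian_tail(t, variance=1.0):
    """P(|X| > t) for a zero-mean Gaussian with the given variance."""
    return math.erfc(t / math.sqrt(2 * variance))

def sample_laplace(rng, variance=1.0):
    """One zero-mean Laplace draw, as a difference of two exponentials."""
    scale = math.sqrt(variance / 2)
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

for t in (1, 2, 3, 4, 5):
    print(t, laplace_tail(t), gaussian_tail(t))

rng = random.Random(0)
draws = [sample_laplace(rng) for _ in range(20000)]
sample_var = sum(d * d for d in draws) / len(draws)  # true mean is 0
print("sample variance:", sample_var)
```

At unit variance the Gaussian actually has more mass beyond t = 1, which may be why the Laplace can look skinnier near the center. From about t = 2 outward the Laplace tail is larger, and the gap widens quickly.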
Taylor Sittler  20
03-14-2005 02:21 PM ET (US)
Re: Laplace

The Laplace (double-exponential) distribution seems to have tails that are skinnier than the Gaussian's. Is there a way to shape it so that it has heavy tails (ideally with variance = 1)?
Jan Schellenberger  19
03-14-2005 02:56 AM ET (US)
/m16 The scaling is important to give each feature an equal chance at contributing.

Let's say feature 1 is an excellent predictor of the output, but its variance is tiny. Then, in order to fit a good model, the b coefficient of this feature has to be huge. However, ridge regression also tries to shrink b as it fits the model, so a ridge-fitted model may ignore feature 1 in favor of other features that have a bigger variance even though they are worse predictors. Normalizing each feature eliminates this problem by making the 'average' b for each feature about the same.

-Jan
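
The effect described above can be reproduced on a toy data set. This is a sketch under made-up assumptions: two centered predictors, no intercept, lambda = 1, and five hand-picked data points in which x1 predicts y perfectly but has a tiny scale, while x2 is noisier but large. The function `ridge_two_features` is hypothetical and solves the 2x2 ridge normal equations directly.

```python
from statistics import stdev

def ridge_two_features(x1, x2, y, lam):
    """Ridge coefficients for two centered predictors (no intercept)."""
    s11 = sum(a * a for a in x1) + lam
    s22 = sum(b * b for b in x2) + lam
    s12 = sum(a * b for a, b in zip(x1, x2))
    c1 = sum(a * v for a, v in zip(x1, y))
    c2 = sum(b * v for b, v in zip(x2, y))
    det = s11 * s22 - s12 * s12
    # Cramer's rule for [[s11, s12], [s12, s22]] @ [b1, b2] = [c1, c2].
    return (c1 * s22 - s12 * c2) / det, (s11 * c2 - s12 * c1) / det

y = [-2.0, -1.0, 0.0, 1.0, 2.0]
x1 = [0.001 * v for v in y]          # perfect predictor, tiny variance
x2 = [-10.0, -20.0, 0.0, 0.0, 30.0]  # noisy predictor, large variance
lam = 1.0

# Raw features: compare each feature's contribution, coefficient times spread.
raw_b1, raw_b2 = ridge_two_features(x1, x2, y, lam)
sd1, sd2 = stdev(x1), stdev(x2)
raw_share = (raw_b1 * sd1, raw_b2 * sd2)

# Standardized features: coefficients are directly comparable.
z1 = [v / sd1 for v in x1]
z2 = [v / sd2 for v in x2]
std_b1, std_b2 = ridge_two_features(z1, z2, y, lam)

print("raw contributions:", raw_share)
print("standardized coefficients:", (std_b1, std_b2))
```

On the raw features the fit leans almost entirely on x2. After scaling, x1 receives the larger coefficient.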
Jan Schellenberger  18
03-14-2005 02:34 AM ET (US)
For Problem 2a:

What does it mean to estimate the median of a distribution using bootstrapping? I can see how you can estimate the median from a sample, but I don't see how bootstrapping helps. Bootstrapping may be useful for figuring out the distribution of the sample median. Is that the question?

-Jan
Banu Dost  17
03-12-2005 03:48 AM ET (US)
For problem 2 part b, is using the absolute value of the difference between the two medians as our test statistic a good idea? Or should it be something more complicated?

Banu
Banu Dost  16
03-12-2005 03:43 AM ET (US)
In ridge regression, I do not see the point of standardizing the data by shifting and scaling. If we shift the data, the b values do not change, except b0. If we scale a column by 1/std, isn't its b value just scaled by the column's std? Then we still have the same predicted y vector. So what do we gain by standardizing?
Charles Elkan  15
03-10-2005 07:59 PM ET (US)
/m11 answer: The Laplace distribution has finite variance, and tails that are heavier than the Gaussian's.

The Pareto distribution has even heavier tails. See pages 623 and 625 of Casella and Berger.
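
A short sketch for experimenting with the Pareto, assuming the standard form with minimum value 1 and shape alpha, which is what Python's `random.paretovariate` draws from. Its variance is finite only when alpha > 2. The helper `pareto_moments` is illustrative.

```python
import math
import random

def pareto_moments(alpha):
    """Theoretical mean and variance of a Pareto with minimum 1 and shape alpha."""
    mean = alpha / (alpha - 1) if alpha > 1 else math.inf
    var = alpha / ((alpha - 1) ** 2 * (alpha - 2)) if alpha > 2 else math.inf
    return mean, var

alpha = 3.0  # any alpha > 2 gives heavy tails with finite variance
rng = random.Random(0)
draws = [rng.paretovariate(alpha) for _ in range(20000)]
sample_mean = sum(draws) / len(draws)
print("theory:", pareto_moments(alpha), "sample mean:", sample_mean)
```

The tail here decays like a power of x rather than exponentially, so it is heavier than both the Gaussian and the Laplace tails.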
Charles Elkan  14
03-10-2005 07:53 PM ET (US)
/m12 answer: What's important is the yhat predictions for test examples. I think these will be the same whether you scale the test data, or you rescale back the training data, because the scaling is a linear transformation.

If you rescale back the training data, there is a simple formula for rescaling the coefficients b.
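
The formula can be sketched as follows, assuming each feature was standardized as z_j = (x_j - mean_j) / sd_j and the model was fit as yhat = a0 + SUM_j a_j*z_j. Then b_j = a_j / sd_j and b0 = a0 - SUM_j b_j*mean_j. The numbers below are arbitrary and only check that both routes give the same prediction.

```python
def unscale_coefficients(a0, a, means, sds):
    """Map coefficients fit on standardized features back to the raw scale."""
    b = [aj / sj for aj, sj in zip(a, sds)]
    b0 = a0 - sum(bj * mj for bj, mj in zip(b, means))
    return b0, b

def predict(b0, b, x):
    """Linear prediction b0 + sum_j b_j * x_j."""
    return b0 + sum(bj * xj for bj, xj in zip(b, x))

# Arbitrary illustrative values (training means and sds, fitted a0 and a).
means, sds = [10.0, -3.0], [2.0, 0.5]
a0, a = 1.5, [4.0, -1.0]
x_new = [12.0, -2.0]

# Route 1: scale the test point with the training means and sds.
z_new = [(xj - mj) / sj for xj, mj, sj in zip(x_new, means, sds)]
yhat_scaled = predict(a0, a, z_new)

# Route 2: rescale the coefficients and use the raw test point.
b0, b = unscale_coefficients(a0, a, means, sds)
yhat_raw = predict(b0, b, x_new)
print(yhat_scaled, yhat_raw)
```

Note that the test point is scaled with the training means and standard deviations, not its own.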
Charles Elkan  13
03-10-2005 07:50 PM ET (US)
/m10 answer: I don't think you can say that any value for lambda is intrinsically large or small.

What's more meaningful is how large SUM_i (y_i - b0 - SUM_j x_ij*b_j)^2 is relative to lambda*SUM_j>=1 b_j^2.

If the latter is smaller than the former, that says the ridge regression is not changing the answer much. This is what I would expect if all the predictors are useful, and the sample size is large.
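
A minimal sketch of that comparison: compute the residual sum of squares and the penalty term separately for a fitted model. The function name and the tiny data set are made up for illustration.

```python
def ridge_objective_terms(X, y, b0, b, lam):
    """Return (residual sum of squares, lam * sum of squared coefficients)."""
    rss = sum(
        (yi - b0 - sum(xj * bj for xj, bj in zip(row, b))) ** 2
        for row, yi in zip(X, y)
    )
    penalty = lam * sum(bj ** 2 for bj in b)  # b0 is not penalized
    return rss, penalty

# Made-up example: three observations, two predictors.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = [1.0, 2.0, 4.0]
rss, penalty = ridge_objective_terms(X, y, b0=0.0, b=[1.0, 2.0], lam=10.0)
print("RSS:", rss, "penalty:", penalty)
```

In this toy case the penalty dominates the RSS, so the choice of lambda would be driving the fit.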
Stephen Krotosky  12
03-09-2005 06:41 PM ET (US)
Also for Problem 1,

Once we generate the best set b, do we rescale back to the original means and variances, or do we scale our new test data using the same scaling factors that we used for our training data? The test data won't have exactly zero mean and unit variance, but if the training and test data each represent the underlying distributions well, it seems like the error would be small. Also, if we rescale back, how does this affect our b values?

Thanks
Stephen Krotosky  11
03-09-2005 06:39 PM ET (US)
Question for Problem 2:

I've figured out how to generate the other distributions, but what is an example of a distribution with heavy tails and finite variance?
Stephen Krotosky  10
03-09-2005 06:38 PM ET (US)
Edited by author 03-09-2005 06:38 PM
Some more questions on problem 1:

After doing forward selection with Sidak, I find 7 significant X parameters. I then scale and shift those values to give zero-mean and unit-variance. Next, I try to perform ridge regression using 10-fold cross validation to find the best lambda.

My question is that I get MSE on the order of about 2700 and I get optimal lambda values between 1000 and 1600, roughly, depending on how I randomly permute my X data into 10 sections. Does this seem like a reasonable value? I was thinking that lambda would be of considerably less magnitude.

Also, how do I resolve the problem that I get such a wide range of lambdas depending on how I divide up the data? I suppose I could do repeated attempts and take an average, but that seems computationally expensive, since I would have to range over hundreds of possible lambdas.

Thanks
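
The repeat-and-average idea could be sketched as below. This is a simplified illustration, not the assignment's setup: it assumes a single predictor so that ridge has a closed form, uses synthetic data, and tries a short log-spaced lambda grid, which keeps the cost down compared with scanning hundreds of values. All names are hypothetical.

```python
import random
from statistics import mean

def ridge_1d_fit(xs, ys, lam):
    """Ridge fit with one predictor; returns (intercept, slope)."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / (sxx + lam)
    return my - slope * mx, slope

def make_folds(n, k, rng):
    """Random partition of indices 0..n-1 into k folds."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cv_mse(xs, ys, lam, folds):
    """Cross-validated mean squared error for one lambda and one partition."""
    sq_err = 0.0
    for fold in folds:
        held = set(fold)
        train = [i for i in range(len(xs)) if i not in held]
        b0, b1 = ridge_1d_fit([xs[i] for i in train], [ys[i] for i in train], lam)
        sq_err += sum((ys[i] - b0 - b1 * xs[i]) ** 2 for i in fold)
    return sq_err / len(xs)

def repeated_cv(xs, ys, lambdas, k=10, repeats=20, seed=0):
    """Average the CV error curve over several random partitions."""
    rng = random.Random(seed)
    avg = {lam: 0.0 for lam in lambdas}
    for _ in range(repeats):
        folds = make_folds(len(xs), k, rng)  # same folds for every lambda
        for lam in lambdas:
            avg[lam] += cv_mse(xs, ys, lam, folds) / repeats
    return min(avg, key=avg.get), avg

# Synthetic data: y = 2x + noise.
data_rng = random.Random(42)
xs = [data_rng.gauss(0.0, 1.0) for _ in range(60)]
ys = [2.0 * x + data_rng.gauss(0.0, 1.0) for x in xs]
lambdas = [0.0, 0.1, 1.0, 10.0, 100.0]
best_lam, avg_mse = repeated_cv(xs, ys, lambdas)
print("best lambda:", best_lam)
```

Using the same folds for every lambda within a repeat means the lambdas are compared on identical splits, and averaging over repeats smooths out the dependence on any single partition.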