QuickTopic (SM) free message boards
Topic: CSE 291 Assignment 3, Winter 2005
Charles Elkan  1
02-01-2005 03:06 PM ET (US)
Please ask questions here about the third assignment for "Statistical Learning".
Stephen Krotosky  2
02-04-2005 06:59 PM ET (US)
I have some questions on problem 1.

1) When you say repeat part c, do you mean that we should simulate using the Cauchy distribution and compare with our theoretical answers for the Gaussian, since the Cauchy integrals blow up?

2) To generate Cauchy distributed samples, I am executing the following code:

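    % inverse-CDF sampling: tan of a uniform angle in (-pi/2, pi/2) is standard Cauchy
    temp = rand(samples,1)*pi - pi/2;
    % scale by the spread g (times sqrt(n)) and shift by the median d (times n)
    xcauchy = tan(temp).*g.*sqrt(n) + d*n;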

where g is the "spread" of the Cauchy distribution and d is the median return. I multiply them by sqrt(n) and n respectively to account for how the distribution changes after n years. I've plotted these against Gaussians and get pdfs that look similar in terms of "mean" and "variance".
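A minimal Python sketch of the same inverse-CDF idea, outside MATLAB: if U is uniform on (-pi/2, pi/2), tan(U) is standard Cauchy, and median + spread*tan(U) is Cauchy with that median and spread. The helper names and the values 0.07 and 0.15 are hypothetical, chosen only for illustration. Because a Cauchy variable has no mean or variance, the check uses quartiles, which for a Cauchy distribution sit at median - spread and median + spread.

```python
import math
import random

def sample_cauchy(median, spread, size, rng):
    """Draw `size` Cauchy(median, spread) values by inverse-CDF sampling.

    If U is uniform on (-pi/2, pi/2), tan(U) is standard Cauchy; scaling by
    `spread` and shifting by `median` gives Cauchy(median, spread).
    """
    return [median + spread * math.tan(math.pi * (rng.random() - 0.5))
            for _ in range(size)]

def quantile(values, q):
    """Empirical q-quantile by sorting (simple nearest-rank rule)."""
    ordered = sorted(values)
    return ordered[min(len(ordered) - 1, int(q * len(ordered)))]

rng = random.Random(0)
# Hypothetical median return 0.07 and spread 0.15.
draws = sample_cauchy(median=0.07, spread=0.15, size=50_000, rng=rng)
# For Cauchy(median, spread) the quartiles are median -/+ spread.
print(quantile(draws, 0.25), quantile(draws, 0.5), quantile(draws, 0.75))
```

The quartile check sidesteps the sample mean and variance, which never settle down for Cauchy draws.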

However, I'm unsure how to actually use this to get valid results. In the Gaussian case, I can simply generate Gaussian samples, replace the ones that fall below the cash-investment return with that return, and compute the new mean:

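    % n-year return: mean d*n, standard deviation s*sqrt(n)
    xnormal = randn(samples,1).*s.*sqrt(n) + d*n;
    % floor every outcome below the cash return n*c at n*c
    xnormal = max(xnormal, n*c);
    % add d for each of the remaining N-n years
    avg_ret_normal(n) = mean(xnormal) + (N-n)*d;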
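For the Gaussian case the floor-then-average step also has a closed form, which gives a check on the simulation. If X ~ N(mu, sigma^2) and a is the floor, then E[max(X, a)] = a*Phi(alpha) + mu*(1 - Phi(alpha)) + sigma*phi(alpha), with alpha = (a - mu)/sigma and Phi, phi the standard normal CDF and pdf. The Python sketch below compares the two; the mapping mu = d*n, sigma = s*sqrt(n), a = n*c follows the MATLAB code above, and the numeric inputs are hypothetical.

```python
import math
import random

def normal_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def expected_floored_normal(mu, sigma, floor):
    """Closed-form E[max(X, floor)] for X ~ N(mu, sigma^2)."""
    alpha = (floor - mu) / sigma
    return (floor * normal_cdf(alpha)
            + mu * (1.0 - normal_cdf(alpha))
            + sigma * normal_pdf(alpha))

def simulated_floored_normal(mu, sigma, floor, samples, rng):
    """Monte Carlo estimate: draw, raise anything below `floor` to `floor`, average."""
    return sum(max(rng.gauss(mu, sigma), floor) for _ in range(samples)) / samples

# Hypothetical inputs: n = 5 years, d = 0.07, s = 0.2, cash return c = 0.04 per year.
n, d, s, c = 5, 0.07, 0.2, 0.04
mu, sigma, floor = d * n, s * math.sqrt(n), c * n
exact = expected_floored_normal(mu, sigma, floor)
approx = simulated_floored_normal(mu, sigma, floor, 200_000, random.Random(1))
print(exact, approx)
```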

However, if I do that for the Cauchy, the mean blows up. If I take the median instead of the mean, I just get a value close to n*d, since truncating the lower tail and replacing it with cash-investment values doesn't change the median.
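A short way to see why the floored Cauchy mean never settles: take X ~ Cauchy(m, gamma) (median m, spread gamma) and floor a, and split the expectation at a. The floor only changes the lower tail, while the upper-tail integral alone is already infinite, so the quantity the simulation is trying to estimate does not exist. The median, by contrast, stays at m whenever a < m.

```latex
\mathbb{E}[\max(X,a)] = a\,P(X<a) + \int_a^{\infty} x\,f(x)\,dx,
\qquad
f(x) = \frac{\gamma}{\pi\left[\gamma^{2} + (x-m)^{2}\right]}

% the integrand decays only like 1/x
x\,f(x) = \frac{\gamma x}{\pi\left[\gamma^{2} + (x-m)^{2}\right]} \sim \frac{\gamma}{\pi x}
\quad (x \to \infty),
\qquad
\int^{\infty} \frac{dx}{x} = \infty
\;\Longrightarrow\;
\mathbb{E}[\max(X,a)] = +\infty
```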

One option I just thought of is to find a new median of only the values > n*c, but I'm not sure whether this is valid, or whether it would be a fair comparison with the Gaussian case.

Any thoughts would be appreciated.
Charles Elkan  3
02-05-2005 01:04 PM ET (US)
/m2 answer: What you write is thoughtful, but I will need some more time to understand it.

(1) answer: Yes, simulate with Cauchys and see if the answer based on Gaussians is still useful.

(2) answer: To keep the simulation simple, I suggest simulating just one year at a time. While the sum of k Gaussians is itself a Gaussian, this does not hold for distributions in general. So, just simulate each year separately.
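A minimal sketch of the one-year-at-a-time approach, in Python rather than MATLAB: each year's return is drawn separately and the n yearly returns are added up. One fact worth keeping in mind when comparing with the scaled single draw in /m2: the Cauchy, like the Gaussian, is closed under addition of independent copies, and the sum of n iid Cauchy(d, g) returns is Cauchy(n*d, n*g), so its spread grows like n rather than sqrt(n). The check estimates the spread of the n-year total from its interquartile range (a Cauchy with spread s has quartiles at median -/+ s). The helper names and parameter values are hypothetical.

```python
import math
import random

def yearly_cauchy(median, spread, rng):
    """One year's return drawn from Cauchy(median, spread) by inverse-CDF sampling."""
    return median + spread * math.tan(math.pi * (rng.random() - 0.5))

def n_year_total(n, median, spread, rng):
    """Simulate n years one at a time and add up the yearly returns."""
    return sum(yearly_cauchy(median, spread, rng) for _ in range(n))

def interquartile_range(values):
    """Upper quartile minus lower quartile of the sample."""
    ordered = sorted(values)
    k = len(ordered)
    return ordered[(3 * k) // 4] - ordered[k // 4]

rng = random.Random(2)
n, d, g = 9, 0.07, 0.15   # hypothetical: 9 years, median 0.07, spread 0.15
totals = [n_year_total(n, d, g, rng) for _ in range(40_000)]
# A Cauchy(m, s) has IQR = 2 * s, so half the IQR estimates the spread.
iqr = interquartile_range(totals)
print(iqr / 2, n * g, math.sqrt(n) * g)
```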
samory  4
02-08-2005 02:46 AM ET (US)
Hi,
About Question 4:

I don't understand how the organelles can have "equal" radius r if at the same time we observe n different radii. Could you please clarify?

Thanks,

Samory
samory  5
02-08-2005 03:03 AM ET (US)
OK, never mind ...
r is the radius of the spheres, and the xi's are radii of circles cut out of those spheres.


Samory.
Charles Elkan  6
02-08-2005 02:25 PM ET (US)
/m5 answer: Yes!
Ryan Kelley  7
02-09-2005 01:44 AM ET (US)
/m2
I think I am running into a similar problem: the expected gain is undefined if the stock-market distribution is Cauchy (even for a large number of trials, the value does not converge). Supposing this is the case, should we just report this as evidence that our answer in the previous case is incorrect?
Michael Sanders  8
02-09-2005 02:38 PM ET (US)
How do you calculate the hypergeometric probability for a population of N with m tagged, and a sample of size n with r tagged, where n, r << N? Using the formula in Casella causes overflow in Matlab.
Jan Schellenberger  9
02-09-2005 06:50 PM ET (US)
/m8
You can use stirling's approximation.

ln (n!) is about n ln(n) - n.

 http://mathworld.wolfram.com/StirlingsApproximation.html

Then just calculate ln(probability) and take e^ans at the end. That should prevent any overflow.

Hope that helps.
-Jan
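A sketch of the log-space idea in Python. Instead of Stirling's approximation it uses math.lgamma, which returns the log of the Gamma function, so lgamma(k + 1) = ln(k!) to floating-point accuracy and nothing overflows even for very large N (MATLAB's gammaln computes the same quantity). The argument names follow /m8 (N animals, m tagged, sample of n, r tagged); the example sizes are hypothetical.

```python
import math

def log_choose(a, b):
    """log of the binomial coefficient C(a, b), via log-gamma (no overflow)."""
    return math.lgamma(a + 1) - math.lgamma(b + 1) - math.lgamma(a - b + 1)

def hypergeom_pmf(r, N, m, n):
    """P(r tagged in a sample of n) when m of the N animals are tagged."""
    if r < max(0, n - (N - m)) or r > min(n, m):
        return 0.0
    log_p = log_choose(m, r) + log_choose(N - m, n - r) - log_choose(N, n)
    return math.exp(log_p)

# Hypothetical sizes far beyond what direct factorials can handle in floating point.
N, m, n = 10**6, 1_000, 500
total = sum(hypergeom_pmf(r, N, m, n) for r in range(n + 1))
print(total)  # close to 1, up to rounding
```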
Stephen Krotosky  10
02-09-2005 07:42 PM ET (US)
Edited by author 02-09-2005 07:42 PM
/m8 /m9: This has confused me.

My initial logic was that the number of tagged animals found, r, would follow a binomial distribution in our hypothesis test. If we initially believe that there are N animals, then p = m/N. We also know that E[r] = n*p and var[r] = n*p*(1-p).

I was thinking that in simulation we could find the observed r and then see if it exceeds a certain threshold. We would only be concerned with exceeding it, since if the actual N is less than the claimed N, we would expect to see more tagged animals.

Is this reasoning correct and if so, is it similar or identical to the hypergeometric reasoning?

Thanks
Jan Schellenberger  11
02-09-2005 07:47 PM ET (US)
/m10

That seems reasonable. The hypergeometric distribution is the without-replacement counterpart of the binomial, and if N were large you could use the binomial.
However, if N is small then your samples are no longer iid (finding a tagged animal decreases the chance that the next animal observed will be tagged). The hypergeometric distribution takes this into account.

-Jan
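A small Python comparison of the two models (illustrative; the helper names and the 10% tagging rate are hypothetical). It computes both pmfs with math.comb (Python 3.8+), whose integer results cannot overflow, and reports the largest pointwise gap as N grows with the tagged fraction and the sample size held fixed. The gap shrinks as N grows, which is the sense in which the binomial is a large-N approximation.

```python
import math

def hypergeom_pmf(r, N, m, n):
    """Exact P(R = r) when n animals are drawn without replacement."""
    return math.comb(m, r) * math.comb(N - m, n - r) / math.comb(N, n)

def binom_pmf(r, n, p):
    """P(R = r) for n independent draws, each tagged with probability p."""
    return math.comb(n, r) * p**r * (1 - p)**(n - r)

def max_pmf_gap(N, m, n):
    """Largest |hypergeometric - binomial| over all r, with p = m / N."""
    p = m / N
    return max(abs(hypergeom_pmf(r, N, m, n) - binom_pmf(r, n, p))
               for r in range(n + 1))

for N in (100, 1_000, 100_000):
    m = N // 10  # hypothetical: 10% of the population tagged
    print(N, max_pmf_gap(N, m, 20))
```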
Stephen Krotosky  12
02-09-2005 07:50 PM ET (US)
/m11 Oh I see. Thanks. I didn't think about the fact that I was assuming iid. That makes perfect sense.
Charles Elkan  13
02-10-2005 11:45 AM ET (US)
/m7 answer: I discussed this with Ryan, and I think he is correct.

This does not mean that the Gaussian-based answer is incorrect, just that it may be unreliable in the real world. It points out the need to investigate which distribution(s) model real-world returns well.
Charles Elkan  14
02-10-2005 11:52 AM ET (US)
/m10 answer: I think the reasoning with the binomial is correct, and the hypergeometric is not needed for this problem.

Be careful with this claim: "if the actual N is less than the claimed N, we would expect to see more tagged animals." I'm not saying it's false (or true) just that it requires careful thought to be sure.
Charles Elkan  15
02-10-2005 11:55 AM ET (US)
/m10, /m11 answer: Yes, the binomial assumes N is large. You may assume this; I should have mentioned it in the problem statement. No need to do the more difficult hypergeometric calculations.
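Under the large-N binomial model, the test in /m10 can be sketched as an upper-tail p-value: the probability of seeing r_obs or more tagged animals in a sample of n when the claimed population size is N and m animals are tagged, so p = m/N. This follows /m10's one-sided reasoning, which /m14 cautions needs careful justification; a two-sided variant would also count the lower tail. The function names and numbers are hypothetical.

```python
import math

def binom_upper_tail(r_obs, n, p):
    """P(R >= r_obs) for R ~ Binomial(n, p), summed term by term."""
    return sum(math.comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(r_obs, n + 1))

def tagged_count_p_value(r_obs, n, m, N_claimed):
    """One-sided p-value for seeing r_obs or more tagged animals in a sample
    of n, if the population really has N_claimed animals of which m are tagged."""
    return binom_upper_tail(r_obs, n, m / N_claimed)

# Hypothetical numbers: 100 tagged out of a claimed 1000, sample of 50, 10 tagged seen.
# Under the claim we expect n * p = 5 tagged animals.
print(tagged_count_p_value(10, 50, 100, 1000))
```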
Hyun Min Kang  16
02-11-2005 03:03 AM ET (US)
Where can I get the formal definitions of Dirichlet distributions, power-law distributions, and Zipf distributions?
Banu Dost  17
02-11-2005 05:19 AM ET (US)
from wikipedia.com
Banu
Hyun Min Kang  18
02-11-2005 12:33 PM ET (US)
In part 3(c), "Explain carefully whether or not the theorem relies on the exponential family being described using its natural parameters." I don't get that part. What does that mean, and what I am supposed to explain?
Hyun Min Kang  19
02-11-2005 12:37 PM ET (US)
/m17 Thanks for the information. I found the Dirichlet there, but the Zipf distribution is not stated clearly, and there is no power-law distribution. Where can I find this information?
Hyun Min Kang  20
02-11-2005 09:44 PM ET (US)
Edited by author 02-11-2005 09:44 PM
In problem 4, I don't get the last sentence. What did you mean by "distribution of estimate"? For example, if avg(x1,..,xn) is the MLE of \mu in Gaussian, what is the distribution of the estimate?
Ryan Kelley  21
02-11-2005 11:40 PM ET (US)
/m20 Just as a random variable has a distribution, so does any function of that variable. Since an estimator is just a function of the data, it has a distribution as well. In your example, if x1,...,xn are iid Gaussian(\mu,\sigma^2), then avg(x1,...,xn) follows a Gaussian distribution with mean \mu and variance \frac{\sigma^2}{n}.
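A quick simulation sketch of this point (illustrative values mu = 2, sigma = 3, n = 25; the helper name is hypothetical): repeating the experiment many times and recording the sample mean each time traces out the estimator's own distribution, whose mean should be close to mu and whose variance should be close to sigma^2/n = 0.36.

```python
import random
import statistics

def sample_means(mu, sigma, n, trials, rng):
    """Run `trials` experiments of n iid N(mu, sigma^2) draws; return each sample mean."""
    return [statistics.fmean([rng.gauss(mu, sigma) for _ in range(n)])
            for _ in range(trials)]

mu, sigma, n = 2.0, 3.0, 25
means = sample_means(mu, sigma, n, 10_000, random.Random(3))
print(statistics.fmean(means), statistics.variance(means), sigma**2 / n)
```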
Charles Elkan  22
02-12-2005 12:56 AM ET (US)
/m19 answer: One lesson to learn from this problem is that widely used terms may not have a universally shared definition. You should make the definition you use clear.

Wikipedia is an excellent reference. Here is a paper that explicitly discusses the confusion:
http://www.hpl.hp.com/research/idl/papers/ranking/ranking.html

For the contemporary importance of power laws see
http://www.sciencemag.org/cgi/content/full/294/5548/1849
Charles Elkan  23
02-12-2005 01:02 AM ET (US)
/m18 answer: Natural parameters are explained here: http://www-cse.ucsd.edu/users/elkan/291/lect08.pdf

The issue is, when you look for an open rectangle, do you have to look in the space of natural parameters, or can you look in the space of original parameters?
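For reference, one common way to write a k-parameter exponential family and its natural-parameter form, in the notation of Casella and Berger (an assumption; the lecture notes may use different symbols). The normal example shows how the original parameters (mu, sigma^2) map to the natural parameters.

```latex
% Exponential family in terms of the original parameter \theta
f(x \mid \theta) = h(x)\, c(\theta)\, \exp\!\Big(\sum_{i=1}^{k} w_i(\theta)\, t_i(x)\Big)

% Natural parameterization: \eta_i = w_i(\theta)
f(x \mid \eta) = h(x)\, c^{*}(\eta)\, \exp\!\Big(\sum_{i=1}^{k} \eta_i\, t_i(x)\Big)

% Example: N(\mu, \sigma^2) with t_1(x) = x, t_2(x) = x^2, h(x) = 1
f(x \mid \mu, \sigma^2)
  = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\mu^2/(2\sigma^2)}
    \exp\!\Big(\frac{\mu}{\sigma^2}\, x - \frac{1}{2\sigma^2}\, x^2\Big),
\qquad
(\eta_1, \eta_2) = \Big(\frac{\mu}{\sigma^2},\, -\frac{1}{2\sigma^2}\Big)
```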
Charles Elkan  24
02-12-2005 01:07 AM ET (US)
/m21 answer: Yes.
Hyun Min Kang  25
02-12-2005 12:13 PM ET (US)
/m21 /m24 Thanks, I understand what you mean. One more question: can we assume that the number of cross-sections is always n? If the organelles are distributed randomly, the number of cross-sections will actually vary, so the distribution should also involve the probability distribution of n. But since the problem does not describe the density of the organelles, I cannot compute that exact distribution (and the problem becomes much more complicated).
Charles Elkan  26
02-12-2005 03:47 PM ET (US)
/m25 answer: Yes, you may assume that the number of cross-sections n is fixed.
Stephen Krotosky  27
02-13-2005 01:55 AM ET (US)
For problem 4,

I've figured out what the distribution looks like, and I believe I can define it mathematically, but am having trouble doing so. Basically what I would like to do is transform a uniform distribution to the desired one. Here is my logic.

If we look at a cross section of an organelle of radius r, the cross section can be thought of as cutting the organelle at a point x that is generated uniformly from -r..r. This corresponds to the front and back of the sphere relative to the cutting plane.

If we know where the sphere is cut, we can compute the observed radius, Z = sqrt(r^2 - x^2). My question is: how can I use this expression for Z, together with the fact that x is uniform, to find the pdf (likelihood function) of Z?

I know this is possible but am having difficulty actually doing it. Thanks.
Jan Schellenberger  28
02-13-2005 05:34 AM ET (US)
/m27
There is a trick.

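You can relate the CDF of the observed radius to the CDF of the uniform cut position: the observed radius is small exactly when the cut is far from the center, so P(Z < z) corresponds to P(|X| > sqrt(r^2 - z^2)). Once you have that, the PDF is the derivative of the CDF.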
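The CDF trick worked out under the model in /m27 (cut position X uniform on (-r, r), observed radius Z = sqrt(r^2 - X^2)). Since |X| is uniform on (0, r), P(|X| >= t) = 1 - t/r, and the observed radius is at most z exactly when |X| >= sqrt(r^2 - z^2):

```latex
P(Z \le z) = P\big(|X| \ge \sqrt{r^2 - z^2}\,\big) = 1 - \frac{\sqrt{r^2 - z^2}}{r},
\qquad 0 \le z \le r

p(z; r) = \frac{d}{dz}\, P(Z \le z) = \frac{z}{r\sqrt{r^2 - z^2}},
\qquad 0 < z < r
```

This agrees with the density stated in /m29.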
Stephen Krotosky  29
02-13-2005 02:24 PM ET (US)
Thanks, I had figured that out.

Now I'm having trouble finding the MLE, because I'm having trouble breaking up the sum in the total score function.

I know that each x_i ~ p*(x_i,r) = x_i / (r sqrt(r^2 - x_i^2))

s(x_i,r) = -1/r - r/(r^2-x_i)^2

I need to find s(x,r) = \sum s(x_i,r)

When I do that, I can't seem to separate it into something solvable. Does anyone have tips on manipulating the sum so that, in the score function, the sum runs only over terms in the x_i's?

Thanks,

Stephen


Stephen Krotosky  30
02-13-2005 02:25 PM ET (US)
/m29 sorry that should read:

s(x_i,r) = -1/r - r/(r^2-x_i^2)
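A small numerical check of the corrected score (an illustrative sketch; the test point x = 0.6, r = 1.3 and the helper names are arbitrary). With log p(x; r) = log x - log r - (1/2) log(r^2 - x^2), a central finite difference in r should match -1/r - r/(r^2 - x^2), and the total score of an iid sample is just the sum of the per-observation scores.

```python
import math

def log_density(x, r):
    """log p(x; r) for p(x; r) = x / (r * sqrt(r^2 - x^2)), valid for 0 < x < r."""
    return math.log(x) - math.log(r) - 0.5 * math.log(r * r - x * x)

def score(x, r):
    """Per-observation score d/dr log p(x; r) = -1/r - r / (r^2 - x^2)."""
    return -1.0 / r - r / (r * r - x * x)

def total_score(xs, r):
    """Score of an iid sample: the sum of per-observation scores."""
    return sum(score(x, r) for x in xs)

def finite_difference_score(x, r, h=1e-6):
    """Central-difference approximation to d/dr log p(x; r)."""
    return (log_density(x, r + h) - log_density(x, r - h)) / (2.0 * h)

print(score(0.6, 1.3), finite_difference_score(0.6, 1.3))
```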
Banu Dost  31
02-13-2005 02:27 PM ET (US)
For problem 4, which variable is uniformly distributed? Is it the distance of a cross-section, or the radius of a sphere itself? How can we decide this? The two cases give similar pdfs, but not the same.
Charles Elkan  32
02-13-2005 02:51 PM ET (US)
/m31 answer: Your question is perhaps the part of the problem that is conceptually the least clear. I think the answer is that the point at which each organelle is sliced is uniformly distributed, hence the observed radius of the disk obtained from the organelle is not uniformly distributed.
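A quick simulation of this reading (illustrative; r = 2 and the helper name are hypothetical). Slicing at a uniform offset x in (-r, r) gives observed radii whose mean is (1/(2r)) times the area of a half-disk of radius r, i.e. pi*r/4, about 0.785r, rather than the r/2 a uniformly distributed observed radius would give. The empirical CDF also matches 1 - sqrt(r^2 - z^2)/r.

```python
import math
import random

def observed_radii(r, count, rng):
    """Slice a sphere of radius r at a uniform offset x in (-r, r), `count` times,
    and return the radius sqrt(r^2 - x^2) of each resulting disk."""
    radii = []
    for _ in range(count):
        x = rng.uniform(-r, r)
        radii.append(math.sqrt(r * r - x * x))
    return radii

r = 2.0
zs = observed_radii(r, 100_000, random.Random(4))
mean_z = sum(zs) / len(zs)
# Empirical CDF at z = 1 versus the formula 1 - sqrt(r^2 - z^2) / r.
frac_below_1 = sum(z <= 1.0 for z in zs) / len(zs)
print(mean_z, math.pi * r / 4, frac_below_1, 1 - math.sqrt(r * r - 1.0) / r)
```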
Charles Elkan  33
02-13-2005 02:54 PM ET (US)
/m29 answer: Remember, you can maximize f(x) by setting its derivative to zero only when the optimal x is in the interior of the allowed range for x.
 