Who | # | When
Charles Elkan | 1 | 02-01-2005 03:06 PM ET (US)
Please ask questions here about the third assignment for "Statistical Learning".

Stephen Krotosky | 2 | 02-04-2005 06:59 PM ET (US)
I have some questions on problem 1.
1) When you say repeat part c, do you mean that we simulate using Cauchy and compare to our theoretical answers for Gaussian, since Cauchy integrals blow up?
2) To generate Cauchy-distributed samples, I am executing the following code:
    temp = rand(samples,1)*pi - pi/2;
    xcauchy = tan(temp).*g.*sqrt(n) + d*n;
where g is the "spread" of the Cauchy distribution, and d is the median return value. I multiply these by sqrt(n) and n respectively to account for the change in distribution each year n. I've plotted these against Gaussians and am able to get pdfs that look similar in terms of "mean" and "variance".
However, I'm unsure how to actually use this to get valid results. In the Gaussian case, I can simply generate a Gaussian, find the samples that are lower than the cash investment, replace those values, and find the new mean as shown:
    xnormal = randn(samples,1).*s.*sqrt(n) + d*n;
    I = find(xnormal < n*c);
    xnormal(I) = n*c;
    avg_ret_normal(n) = mean(xnormal) + (N-n)*d;
However, if I do that for the Cauchy, the mean blows up. If I try to take the median instead of the mean, this would just give me a value close to n*d, since the median wouldn't change if I just truncate the tail and replace it with cash investment values.
One solution I just thought of trying would be to find a new median associated with just the values > n*c, but I'm not sure if this is valid, or if it would be a valid comparison to the Gaussian case.
Any thoughts would be appreciated.

Charles Elkan | 3 | 02-05-2005 01:04 PM ET (US)
/m2 answer: What you write is thoughtful, but I will need some more time to understand it.
(1) answer: Yes, simulate with Cauchys and see if the answer based on Gaussians is still useful.
(2) answer: To keep the simulation simple, I suggest simulating just one year at a time. While the sum of k Gaussians is itself a Gaussian, this is not true of distributions in general. So, just generate each simulation one year at a time.
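
The year-at-a-time suggestion could be sketched roughly as below. This is only an illustration under stated assumptions: yearly returns are treated as additive (as in the code in the question), and the values of samples, N, d, and g are made up here, not taken from the assignment. Summing yearly draws also sidesteps the question of how the spread scales with n: for independent Cauchy draws the scale of the sum grows like n, not sqrt(n).

```matlab
% Hypothetical parameters for illustration; the assignment's values may differ.
samples = 100000;  % number of simulated histories
N = 10;            % number of years simulated
d = 0.08;          % median yearly return (assumed)
g = 0.05;          % Cauchy scale ("spread") of the yearly return (assumed)

% One Cauchy draw per year per history, by inverting the Cauchy CDF:
% if U ~ Uniform(0,1), then d + g*tan(pi*(U - 0.5)) is Cauchy(d, g).
yearly = d + g .* tan(pi .* (rand(samples, N) - 0.5));

% Cumulative (additive) return after each year, built up one year at a time.
total = cumsum(yearly, 2);

% The sample mean is dominated by a few extreme draws and need not settle
% down as samples grows, so the median is computed alongside it.
med_by_year  = median(total, 1);
mean_by_year = mean(total, 1);
```

Rerunning the script a few times and comparing mean_by_year with med_by_year is one way to see whether the mean is a usable summary in the Cauchy case.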

samory | 4 | 02-08-2005 02:46 AM ET (US)
Hi, about Question 4:
I don't understand how the organelles can have "equal" radius r if we can at the same time observe n different radii. Could you please clarify?
Thanks,
Samory

samory | 5 | 02-08-2005 03:03 AM ET (US)
OK, never mind ... r is the radius of the spheres, and the xi's are radii of circles cut out of those spheres.
Samory.

Charles Elkan | 6 | 02-08-2005 02:25 PM ET (US)

Ryan Kelley | 7 | 02-09-2005 01:44 AM ET (US)
/m2 I think I am experiencing a similar problem: the expected gain is undefined if the stock market distribution is Cauchy (even for a large number of trials, the value does not converge). Supposing this is the case, should we just report this as evidence that our answer in the previous case is incorrect?

Michael Sanders | 8 | 02-09-2005 02:38 PM ET (US)
How do you calculate the probability of a hypergeometric for a population N with m tags and a sample size n with r tags, where n, r << N? Using the formula in Casella causes overflow in Matlab.

Jan Schellenberger | 9 | 02-09-2005 06:50 PM ET (US)

Stephen Krotosky | 10 | 02-09-2005 07:42 PM ET (US)
Edited by author 02-09-2005 07:42 PM
/m8 /m9: This has confused me. My initial logic was that the number of animals found tagged, r, would follow a binomial distribution in our hypothesis test. Assuming that we initially believe that there are N animals, we can take p = m/N. We also know that E[r] = n*p and var[r] = n*p*(1-p). I was thinking that in simulation we could find the observed r and then see if it exceeds a certain threshold. We would only be concerned with exceeding it, since if the actual N is less than the claimed N, we would expect to see more tagged animals. Is this reasoning correct and, if so, is it similar or identical to the hypergeometric reasoning? Thanks.

Jan Schellenberger | 11 | 02-09-2005 07:47 PM ET (US)
/m10 That seems reasonable. The hypergeometric distribution is the without-replacement counterpart of the binomial, and if N is large relative to n you could use the binomial. However, if N is small then your samples are no longer iid (finding a tagged animal decreases the chance that the next animal observed will be tagged). The hypergeometric distribution takes this into account. -Jan
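
For the overflow raised in /m8, one common workaround is to evaluate the binomial coefficients in log space with gammaln instead of forming them directly. The sketch below assumes the usual hypergeometric pmf, P(R = r) = C(m, r) C(N-m, n-r) / C(N, n); the values of N, m, n, and r are hypothetical and chosen only for illustration, and the helper names lognck, loghyge, and logbino are made up here.

```matlab
% Hypothetical values for illustration only (not taken from the assignment).
N = 10000;   % claimed population size
m = 500;     % tagged animals in the population
n = 200;     % sample size
r = 18;      % tagged animals observed in the sample

% Log of the binomial coefficient C(a, b), computed with gammaln so that
% no large factorial is ever formed explicitly.
lognck = @(a, b) gammaln(a + 1) - gammaln(b + 1) - gammaln(a - b + 1);

% log P(R = k) under the hypergeometric model; works elementwise on vectors k.
loghyge = @(k) lognck(m, k) + lognck(N - m, n - k) - lognck(N, n);

p_exact = exp(loghyge(r));        % P(R = r)
k = r:min(n, m);                  % upper tail: r or more tagged animals
p_tail = sum(exp(loghyge(k)));    % P(R >= r)

% Binomial approximation with p = m/N, for comparison with the exact tail.
logbino = @(k) lognck(n, k) + k .* log(m / N) + (n - k) .* log(1 - m / N);
p_tail_binom = sum(exp(logbino(k)));
```

Comparing p_tail with p_tail_binom for a given N shows how much the iid assumption matters: the two should be close when n is small relative to N and drift apart as n/N grows.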

Stephen Krotosky | 12 | 02-09-2005 07:50 PM ET (US)
/m11 Oh I see. Thanks. I didn't think about the fact that I was assuming iid. That makes perfect sense.