QuickTopic (SM) free message boards
Topic: CSE 291 Winter 2004 Assignment 3 questions
Views: 1800, Unique: 800 
Subscribers: 1
Messages 27-24 deleted by topic administrator between 07-21-2006 08:59 AM and 07-23-2006 02:05 AM
Andrew Smith  23
02-19-2004 09:27 AM ET (US)
Anjum:

To find E[x|x>0] you need to integrate:

x * p(x | x>0) dx

which is:

x * p(x, x>0) / p(x>0) dx

Since your integration runs from 0 to +inf, x>0 always holds there, so the joint density in the numerator reduces to p(x):

x * p(x) / p(x>0) dx

For a zero-mean Gaussian, p(x>0) = 1/2. You're forgetting the denominator, which is the factor of 2 you're missing.

-Andrew
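A quick numerical check of this point, as a sketch: for a zero-mean Gaussian with standard deviation sigma (sigma = 3.0 is an arbitrary choice), the integral of x * p(x) over (0, +inf) is sigma/sqrt(2*pi), and dividing by p(x>0) = 1/2 gives E[x|x>0] = sigma*sqrt(2/pi). The function name and sample size below are illustrative, not from the assignment.

```python
import math
import random


def conditional_mean_positive(sigma: float, n: int = 200_000, seed: int = 0) -> float:
    """Monte Carlo estimate of E[x | x > 0] for x ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    draws = (rng.gauss(0.0, sigma) for _ in range(n))
    positives = [x for x in draws if x > 0]
    # Averaging only the positive draws divides by their count, which is
    # about n * p(x > 0): this is the denominator of the conditional density.
    return sum(positives) / len(positives)


sigma = 3.0
numerator = sigma / math.sqrt(2.0 * math.pi)   # integral of x * p(x) over (0, +inf)
closed_form = numerator / 0.5                  # divide by p(x > 0) = 1/2
print(numerator, closed_form, conditional_mean_positive(sigma))
```

The simulated conditional mean lands near closed_form, not near numerator, which is exactly the missing factor of 2.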
Michael Green  22
02-18-2004 07:15 PM ET (US)
Anjum: Sorry for the confusion. You want to find another distribution that has the same mean and variance as the SAMPLE you took. For instance, imagine that your sample happens to have come from a chi-squared distribution (just for illustration). If you assume Ho is true, you follow parts a and b to get locations for c1 and c2 and the distribution of r(x). Compare that to what you would get if you had calculated the locations for c1 and c2 (and therefore r(x)) assuming the same points had come from a chi-squared distribution. Is that helpful?
Anjum Gupta  21
02-18-2004 01:18 PM ET (US)
Edited by author 02-18-2004 01:37 PM
Michael: Thanks for your response to my message /m18. Just to clarify: you are saying that I need to find another distribution with the same mean and variance and compare its value of E(r(x)) to the one obtained under H*. So if I want to consider a normal distribution with N = 100, then under H* I will have to use x ~ N(50.5, 833.25) to get the same mean and variance?
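The numbers in this question can be checked directly: 50.5 and 833.25 are exactly the mean and population variance of a discrete uniform distribution on {1, ..., 100}, i.e. (N+1)/2 and (N^2-1)/12. A minimal sketch using Python's statistics module; note that N(mu, sigma^2) is written with the variance, while most samplers take the standard deviation.

```python
import math
from statistics import fmean, pvariance

N = 100
values = list(range(1, N + 1))     # discrete uniform on {1, ..., N}
mean = fmean(values)               # (N + 1) / 2
variance = pvariance(values)       # (N**2 - 1) / 12, population variance
sd = math.sqrt(variance)           # standard deviation to pass to a Gaussian sampler
print(mean, variance, sd)
```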

To respond to /m20: Noah, I did the exact same integration and came up with the same conflict between my theoretical and experimental results. As far as the integration is concerned, I am pretty confident that it is right. I think we are missing some other factor or concept that makes our experimental results different. BTW, in your integration expression you forgot to write the constant factor 1/(sigma*(2*pi)^0.5).
Noah  20
02-18-2004 12:30 PM ET (US)
I am trying to do the symbolic integration from 0 to infinity of

x*e^(-x^2/sigma^2) dx

to find the average value of x on the right side of the Gaussian. I get sigma/(2*(2*pi)^.5), but my experimental results give close to 1 minus that value, and I am confident they are correct, much more confident than in my very rusty integration-by-substitution skills. Does anybody know how to integrate the above correctly?
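A numerical sketch of this integral using a plain trapezoidal rule (the step count, the cutoff at 12*sigma, and sigma = 2.0 are arbitrary illustrative choices). With the correctly normalized N(0, sigma^2) density, whose exponent is -x^2/(2*sigma^2), the integral over (0, +inf) is sigma/sqrt(2*pi). Using an exponent of -x^2/sigma^2 (missing the factor 2) with the same constant gives sigma/(2*sqrt(2*pi)), the value reported above. Dividing the correct integral by p(x>0) = 1/2 gives E[x | x>0] = sigma*sqrt(2/pi).

```python
import math


def integrate(f, a, b, n=100_000):
    """Composite trapezoidal rule for f on [a, b] with n panels."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for k in range(1, n):
        total += f(a + k * h)
    return total * h


sigma = 2.0
const = 1.0 / (sigma * math.sqrt(2.0 * math.pi))   # N(0, sigma^2) normalizing constant
upper = 12 * sigma                                  # the tail beyond this is negligible

# Correct density: exponent -x^2 / (2 sigma^2).
correct = integrate(lambda x: x * const * math.exp(-x**2 / (2 * sigma**2)), 0.0, upper)
# Exponent without the factor 2.
no_two = integrate(lambda x: x * const * math.exp(-x**2 / sigma**2), 0.0, upper)

print(correct, sigma / math.sqrt(2 * math.pi))         # integral over x > 0
print(no_two, sigma / (2 * math.sqrt(2 * math.pi)))    # half of the correct value
print(correct / 0.5, sigma * math.sqrt(2 / math.pi))   # E[x | x > 0]
```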
Michael Green  19
02-18-2004 11:54 AM ET (US)
Edited by author 02-18-2004 11:56 AM
Anjum: In both C and D, assume that you are working with the same set of points (i.e., the same mean and variance) as the sample for A and B. Find other non-Gaussian distributions such that the expected c1 and c2 locations under those distributions produce an r(x) less than (part C) and greater than (part D) the r(x) for the Gaussian distribution [for the same observation].
Anjum Gupta  18
02-17-2004 09:03 PM ET (US)
A question on parts C) and D) of problem 2:
It asks us to find a distribution for which E(r(x)) is less than or greater than the one under H* (the null hypothesis). But wouldn't E(r(x)) depend on the parameters of a distribution? For example, E(r(x)) will be very big under H* if x ~ N(200,500) but not that big if x ~ N(0,1). So a uniform distribution on [0,1] may have a smaller E(r(x)) than the latter case of H* but may have a larger one than the N(200,500) case.
Am I reading the problem wrong?
Thanks.
Jay  17
02-17-2004 03:22 AM ET (US)
Anjum:

My bad. It is so obvious. SUM(sigma_i) is not merely expected to be 0. IT IS ZERO.

Sometimes you just get lost after you stare at it for too long. :)
Anjum Gupta  16
02-17-2004 01:56 AM ET (US)
Comment on /m15: Jay, Sum(sigma_i) is assumed to be 0, since it is Sum(O_i - e_i), which has expected value of zero. I think that is the assumption at work here.
Jay  15
02-17-2004 12:10 AM ET (US)
When we derive the Pearson Chi^2 test from the LRT Chi^2, we have:
2*SUM[(e_i + sigma_i)*(sigma_i/e_i - sigma_i^2/(2*e_i^2))]
= 2*SUM[sigma_i - sigma_i^2/(2*e_i) + sigma_i^2/e_i - ...]
~ SUM[(o_i - e_i)^2/e_i].

Somehow we got rid of the sigma_i term. It seems we assume SUM(sigma_i) = 0. Why? Am I missing some assumption here?
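A small numerical sketch of why the linear term drops out: with e_i = n*p_i and probabilities summing to 1, SUM e_i = n = SUM o_i, so SUM sigma_i = SUM (o_i - e_i) is exactly zero for every sample, not just in expectation. The counts and null probabilities below are hypothetical.

```python
import math

observed = [18, 22, 16, 25, 19]            # hypothetical counts, n = 100
n = sum(observed)
probs = [0.2] * 5                            # hypothetical null probabilities (sum to 1)
expected = [n * p for p in probs]            # e_i = n * p_i, so SUM e_i = n = SUM o_i
delta = [o - e for o, e in zip(observed, expected)]   # sigma_i = o_i - e_i

linear_term = 2 * sum(delta)                 # the 2*SUM(sigma_i) term vanishes
pearson = sum(d * d / e for d, e in zip(delta, expected))
lrt = 2 * sum(o * math.log(o / e) for o, e in zip(observed, expected))
print(linear_term, pearson, lrt)
```

With these counts the Pearson statistic is 2.5 and the LRT statistic is close to it, as the second-order expansion predicts.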
Charles Elkan  14
02-16-2004 11:48 PM ET (US)
Answer to /m12: Your reasoning about 4 degrees of freedom seems right to me. Have you repeated your experiment often enough to be sure that the empirical d.f. is 5?
Charles Elkan  13
02-16-2004 11:45 PM ET (US)
Answer to /m11: Yes, your understanding is right.
Anjum Gupta  12
02-16-2004 10:28 AM ET (US)
Edited by author 02-16-2004 02:11 PM
Thanks for all your responses.
One more question regarding degrees of freedom --
i) I am generating multinomial data by rolling a fair die. My null hypothesis is that the die is fair, meaning I have one parameter to estimate with a true value of 1/6. The alternative hypothesis is that each number has a different probability of coming up, so a total of (6-1) parameters to estimate. From Prof. Weber's notes, the degrees of freedom of the chi-squared statistic should be (k-1)-p, that is (6-1)-1 = 4. But my statistic follows a chi-squared distribution with d.f. 5. What is wrong? I am using expression (2) from Prof. Weber's notes to compute the value of my statistic.
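One way to settle this is to repeat the experiment many times and compare the sample mean and variance of the statistic with those of a chi-squared distribution, which has mean k and variance 2k for k d.f. A simulation sketch, assuming the null hypothesis fixes every face probability at 1/6; under the usual convention that p counts parameters estimated from the data, such a null estimates none, which would give (6-1)-0 = 5. The roll and trial counts are arbitrary.

```python
import random


def pearson_statistic(counts, probs):
    """Pearson X^2 = SUM (O_i - e_i)^2 / e_i with e_i = n * p_i."""
    n = sum(counts)
    return sum((o - n * p) ** 2 / (n * p) for o, p in zip(counts, probs))


def simulate_fair_die(rolls=300, trials=4_000, seed=1):
    """Return X^2 values from repeated experiments with a fair six-sided die."""
    rng = random.Random(seed)
    probs = [1 / 6] * 6
    stats = []
    for _ in range(trials):
        counts = [0] * 6
        for face in rng.choices(range(6), k=rolls):
            counts[face] += 1
        stats.append(pearson_statistic(counts, probs))
    return stats


stats = simulate_fair_die()
mean = sum(stats) / len(stats)
var = sum((s - mean) ** 2 for s in stats) / (len(stats) - 1)
# A chi-squared variable with k d.f. has mean k and variance 2k.
print(mean, var)
```

A sample mean near 5 and variance near 10 point to 5 d.f.; 4 d.f. would give values near 4 and 8.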


BTW, I came across the following readable and relevant paper: http://www.ecb.int/pub/wp/ecbwp083.pdf
Michael  11
02-16-2004 12:31 AM ET (US)
I'm having a little trouble being sure I understand the charts from the thesis. Any clarification is appreciated:

1. Charts 5.4 and 5.5 merely state the relative significance of Pearson's chi-squared statistic and the LRT under justified and unjustified normal approximation scenarios, respectively. These charts do not necessarily show how well either statistic follows a chi-squared distribution.

2. Chart 5.6 shows how well the LRT stat follows the theoretical chi-square distribution. It does not show how well the Pearson's stat follows the chi-squared distribution.

Is my understanding correct? I just want to make sure that I am not missing something or reading too much out of the examples. Thanks.
Charles Elkan  10
02-15-2004 09:39 PM ET (US)
Answers to /m6:

i) When O_i is zero, assume that 0*log 0 is zero. We discussed this question in class on Thursday.

ii) Tests of independence are really tests of whether observed data were generated by a particular multinomial.

Note that different null hypotheses give different tests, but they all have the same test statistic.

Because the tests we use are LRT tests, they involve the chi-squared distribution, so they are called chi-squared tests.

The Pearson test is derived from the LRT test by taking the Taylor approximation of the log function.

Dunning says that the Pearson test can be derived from the LRT test by using a Gaussian to approximate a binomial. However, he does not explain this in detail.

iii) SUM (2O_i + e_i) = 3n because n = SUM e_i and also n = SUM O_i.
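A minimal sketch of answers (i) and (iii): the LRT statistic skips empty cells (the 0*log 0 = 0 convention), and SUM (2O_i + e_i) equals 3n. The counts and the uniform null are hypothetical; with cells this small, the LRT and Pearson values can differ noticeably.

```python
import math


def g_statistic(observed, expected):
    """LRT statistic 2 * SUM O_i * ln(O_i / e_i), using the convention 0 * log 0 = 0."""
    return 2.0 * sum(o * math.log(o / e) for o, e in zip(observed, expected) if o > 0)


def pearson_statistic(observed, expected):
    """Pearson X^2 = SUM (O_i - e_i)^2 / e_i."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [0, 7, 13, 20]        # hypothetical counts; the first cell is empty
n = sum(observed)                # 40
expected = [n / 4] * 4           # hypothetical uniform null: 10 per cell

print(g_statistic(observed, expected), pearson_statistic(observed, expected))
print(sum(2 * o + e for o, e in zip(observed, expected)), 3 * n)
```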
Charles Elkan  9
02-15-2004 09:30 PM ET (US)
Answer to /m4: We can calculate r(x) for any distribution, with centers that depend on that distribution.

The question is: find a non-Gaussian distribution that appears even more spread-out than a Gaussian with same mean and variance.
Nuno  8
02-15-2004 09:24 PM ET (US)
Jay:

Thanks for your reply, but the problem was that I did not understand the question itself. Re-reading it now, it makes more sense if I interpret it as "Find a non-Gaussian distribution _hypothesis_ ...". I was only looking for a non-Gaussian distribution of the sample points.

Anjum:

i) We saw in class that you should ignore the terms 0log(0)

ii) Dunning takes care to clearly separate Pearson's chi-square statistic from any other chi-square statistic. In short, Pearson's formulation is an approximation to the LRT score when the binomial can be approximated by a normal distribution and you're testing independent binomials vs. a generic multinomial. I do recommend chapters 4 and 5 of Dunning's thesis; they are very clear and readable.

Nuno
Jay  7
02-15-2004 02:28 PM ET (US)
Anjum:

I think the tests in the two articles are different.

In the note, we are testing if given samples are from a binomial random process with certain p.

In the thesis, we are testing if two samples are from two independent binomial random processes.
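For the two-sample case, one common way to compute the test (a sketch, not necessarily Dunning's exact formulation) is to put the two samples in a 2x2 table of successes and failures, take expected counts from the row and column totals, and compute G^2 = 2 SUM O ln(O/E), which is compared to a chi-squared distribution with (2-1)(2-1) = 1 d.f. The function name and table values are hypothetical.

```python
import math


def g_test_2x2(table):
    """G^2 = 2 * SUM O * ln(O / E) for a 2x2 table, with E from row and column totals."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n   # expected count under independence
            if o > 0:                                # convention: 0 * log 0 = 0
                g += o * math.log(o / e)
    return 2.0 * g


# Hypothetical data: rows are the two samples, columns are successes / failures.
table = [[30, 70],
         [45, 55]]
print(g_test_2x2(table))  # compare to a chi-squared distribution with 1 d.f.
```

When both samples have the same success proportion, every observed count equals its expected count and G^2 is 0.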
Anjum Gupta  6
02-15-2004 11:34 AM ET (US)
Edited by author 02-15-2004 12:42 PM
I have a few questions --
i) How do we deal with the cases when O_i is zero (observed number of events is zero). This is the same question asked by Jay.

ii) I think in general I am a bit confused about the relationship between the chi-squared test-of-independence statistic and the Pearson chi-squared statistic. Any one-line clarification of this confusion? Also, am I right in thinking that if you use the normal approximation to a binomial distribution, the LRT statistic boils down to the Pearson chi-squared statistic?

iii) In Prof. Weber's notes, I am not clear how expression (3) is derived. How is it that 2*O_i + e_i is replaced by 2n + n?

Thanks.
Jay  5
02-14-2004 02:35 AM ET (US)
Don't just think about one or two Gaussians. Think more.
Nuno  4
02-13-2004 08:55 PM ET (US)
I don't quite understand question 2d). Under H_0 we have a predetermined positioning of the centers c1 & c2 that gives us the smallest r(x)'s - part a).

But under H_1 (any distribution for X) we can place the centers anywhere, including the optimal positions that minimize r(X).

Therefore how can r(X) ever be larger under H_1 than under H_0?

Maybe I'm missing something?
Jay  3
02-11-2004 12:22 AM ET (US)
I guess we just ignore it, since we assume d << e. That's why using X^2 to approximate the LRT only holds when d << e.
Andrew  2
02-11-2004 12:13 AM ET (US)
In the derivation of the Pearson X^2 statistic, we used the Taylor expansion of ln(1+x), and we are left with

Sum_{i=1}^{n} [ d^2/e - d^3/e^2 ]

but the Pearson X^2 statistic is

Sum_{i=1}^{n} [ d^2/e ]

My question is: where did the d^3/e^2 term go?

Since it is sometimes positive and sometimes negative, did we assume/hope it would cancel itself out?
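A per-cell numerical sketch of the neglected terms (e = 100 and the d values are arbitrary). It compares the exact contribution 2*(e+d)*ln(1+d/e), with the 2*d part removed since SUM d = 0, against the Pearson term d^2/e. Keeping the full series for ln(1+x), the leading difference is -d^3/(3*e^2): the Pearson term overestimates cells with d > 0 and underestimates cells with d < 0, and the relative error is roughly d/(3e), which is small only when |d| << e.

```python
import math


def exact_term(d, e):
    """Per-cell LRT contribution 2*(e+d)*ln(1+d/e), minus the 2*d part that sums to zero."""
    return 2.0 * (e + d) * math.log(1.0 + d / e) - 2.0 * d


def pearson_term(d, e):
    """Per-cell Pearson contribution d^2 / e."""
    return d * d / e


e = 100.0
for d in (-50.0, -10.0, -1.0, 1.0, 10.0, 50.0):
    print(d, exact_term(d, e), pearson_term(d, e))
```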
Jay  1
02-11-2004 12:07 AM ET (US)
In the LRT, -2Log(L(H0)/L(H1)) = 2Sum_i(o_i * Log(o_i/e_i)).
How should we handle cases where o_i equals zero?
Copyright ©1999-2008 Internicity Inc. All rights reserved.