Gyozo Gidofavi
9
09-27-2001 12:32 PM ET (US)
Edited by author 09-27-2001 12:53 PM
This is a reply to the general question asked by Dave Kauchak about the number "n". As far as I know, the general rule of thumb is this: if you want to cover X% of the variance in the data set, select the smallest "n" such that (sum(i=1 to n) eigen-value(i)) / (sum of all eigen-values) >= X/100, with the eigen-values sorted in decreasing order. In other words, keep the first "n" eigen-vector components such that the normalized cumulative sum of the corresponding eigen-values reaches X/100. A typical choice is X = 90, but this can be really problem specific and in most cases is determined experimentally. Finally, I'm not certain that a larger number of eigen-vector components always gives better results; in fact, to my understanding, after a certain point it can severely degrade performance. Once again, let me repeat that the method for selecting "n" described above is only a general rule of thumb.
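For anyone who wants to try the rule of thumb, here is a minimal sketch in plain Python. The function name `choose_n`, the `x_percent` parameter, and the sample eigen-values are made up for illustration; it assumes you already have the eigen-values from your own decomposition and that they are non-negative.

```python
def choose_n(eigenvalues, x_percent=90.0):
    """Return the smallest n whose top-n eigenvalues cover x_percent of the total."""
    # Sort in descending order so the largest-variance components come first.
    values = sorted(eigenvalues, reverse=True)
    total = sum(values)
    if total <= 0:
        raise ValueError("eigenvalues must sum to a positive number")
    target = x_percent / 100.0
    cumulative = 0.0
    for n, value in enumerate(values, start=1):
        cumulative += value
        # Small tolerance guards against floating-point round-off at the threshold.
        if cumulative / total >= target - 1e-12:
            return n
    return len(values)


# Hypothetical eigen-values, not taken from any real data set.
eigenvalues = [4.0, 3.0, 2.0, 0.5, 0.5]
n = choose_n(eigenvalues, x_percent=90.0)
print(n)  # prints 3, since (4 + 3 + 2) / 10 = 0.9
```

Since the right X is problem specific, it is probably worth calling this with a few different values of `x_percent` and checking each resulting "n" experimentally rather than trusting 90 blindly.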