| Kristin Branson
|
3
|
 |
|
04-30-2002 03:06 PM ET (US)
|
|
Dave, from my understanding, a reference partition is just a partition of the data into its classes. So S_1 is the data with label 1 and S_C is the data with label C.
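To make that concrete, here is a minimal Python sketch of how I read the definition. The function name `reference_partition` and the toy data are made up for illustration and are not from the paper.

```python
from collections import defaultdict

def reference_partition(samples, labels):
    """Group samples by class label: returns {label: [samples with that label]}."""
    parts = defaultdict(list)
    for x, y in zip(samples, labels):
        parts[y].append(x)
    return dict(parts)

# Toy data (hypothetical): four samples, two classes.
samples = ["a", "b", "c", "d"]
labels = [1, 2, 1, 2]
partition = reference_partition(samples, labels)
# partition[1] plays the role of S_1, partition[2] the role of S_2.
```

With C classes, the keys would run over all C labels, and the groups together cover every sample exactly once.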
I think the authors did use cross-validation on the training data to determine the optimal number of features to select, as shown in Figure 4. As for using cross-validation to test the algorithm, I can only assume that this is what they did. In particular, cross-validation is the only way to go if the number of training samples is limited, and the precision of their error measures suggests that they tested on a large number of samples. You can tell that they didn't test on the training set because of the dotted line graphs in Figure 3.
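For anyone who wants to see the general idea, here is a rough Python sketch of choosing the number of features by k-fold cross-validation. Everything in it is my own assumption and not the authors' procedure: the ranking by difference of class means is a stand-in for their filters, the nearest-centroid classifier is a stand-in for their classifiers, and the data is synthetic.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle 0..n-1 and deal the indices into k disjoint folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def rank_features(X, y):
    """Rank features by absolute difference of class means (stand-in filter)."""
    scores = []
    for j in range(len(X[0])):
        a = [x[j] for x, t in zip(X, y) if t == 0]
        b = [x[j] for x, t in zip(X, y) if t == 1]
        scores.append(abs(sum(a) / len(a) - sum(b) / len(b)))
    return sorted(range(len(scores)), key=lambda j: -scores[j])

def nearest_centroid_error(Xtr, ytr, Xte, yte, feats):
    """Error rate of a nearest-centroid classifier restricted to feats."""
    cents = {}
    for c in (0, 1):
        rows = [x for x, t in zip(Xtr, ytr) if t == c]
        cents[c] = [sum(r[j] for r in rows) / len(rows) for j in feats]
    wrong = 0
    for x, t in zip(Xte, yte):
        dist = {c: sum((x[j] - m) ** 2 for j, m in zip(feats, cents[c]))
                for c in cents}
        wrong += min(dist, key=dist.get) != t
    return wrong / len(Xte)

def cv_error(X, y, n_features, k=5):
    """Average held-out error over k folds when keeping n_features features."""
    errs = []
    for fold in k_fold_indices(len(X), k):
        held = set(fold)
        tr = [i for i in range(len(X)) if i not in held]
        Xtr, ytr = [X[i] for i in tr], [y[i] for i in tr]
        Xte, yte = [X[i] for i in fold], [y[i] for i in fold]
        # Features are ranked on the training part of the fold only.
        feats = rank_features(Xtr, ytr)[:n_features]
        errs.append(nearest_centroid_error(Xtr, ytr, Xte, yte, feats))
    return sum(errs) / len(errs)

# Synthetic data (hypothetical): feature 0 is informative, the other four are noise.
rng = random.Random(1)
y = [i % 2 for i in range(40)]
X = [[3 * t + rng.gauss(0, 1)] + [rng.gauss(0, 1) for _ in range(4)] for t in y]
best_n = min(range(1, 6), key=lambda m: cv_error(X, y, m))
```

The one design point I would stress is that the feature ranking is redone inside each fold using only that fold's training samples. Selecting features on all the data first and then cross-validating would make the error estimate look better than it should.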
I am looking forward to today's presentation to clarify the different feature selection algorithms used. I'm not familiar with these algorithms, and am interested in learning about them. I'm not sure how creative the authors were in deciding to pile all these algorithms on top of each other (it seems a bit ad hoc), but their justifications for the order in which they were applied are somewhat satisfying (at least for the Markov Blanket Filtering). I agree with Dave that the paper was very dense and full of information, making it hard for a novice like me to understand the algorithms.
I am confused about the specifics of the gene application, and was wondering if someone could explain it to me. From the introduction, I am guessing that a feature is a gene, and that there are 7130 genes collected from a single sample. What is a sample? Is it one human, for example? And what is an expression level?
Looking at Figure 3, it seems weird to me that under 10 features give optimal performance for logistic regression, while over 40 are required for KNN. Is there a property of KNN or logistic regression that explains this?
Overall, I think this is a good paper to present for this class, as it goes over a number of interesting algorithms. I think the paper would have benefited from dropping the section on regularization methods, as it doesn't really fit with the rest of the paper. Then there would have been more room for explanation.
I thought the experimental section was actually pretty good -- it had all the comparisons I wanted to see, though I agree that it was a bit vague about the details of the experiments. I also like that the authors tried different classification algorithms, as the interaction between classification algorithm and feature selection method may be hard to predict.
|