Dave Kauchak | 3 | 04-16-2001 03:15 AM ET (US)

I thought the paper did a relatively good job of selecting and analyzing a number of diverse situations. It not only tries to show that TSVMs are better for the problem at hand, but also examines a number of different variables that might be varied.
More explicitly, to show that TSVMs were better, the paper presented test averages with training sets ranging from 7 to 120 examples and test sets from 3,299 to 10,000 examples. The paper then went on to show the effect of varying the sizes of these two sets.
However, there were a number of factors with respect to testing that I felt the paper could have explored better. First, all of the examples that were chosen represent tests containing a relatively small set of categories. In fact, many of the categories were actually trimmed for experimentation (for example, in the case of the Reuters database, there were 135 potential categories, but only the 10 most frequent were chosen). I would like to have seen more justification for this size reduction, and also an analysis of the effect that the size of the category set has on the effectiveness of the algorithm.
Second, the paper mentions that there are a number of parameters that the user inputs into the algorithm. These seem to be mentioned only in the algorithm description section; there is no mention of these tuning parameters in the actual experimentation section. I would be curious what effect these parameters have on the effectiveness of the TSVM, and also how difficult it is to fine-tune them.