QuickTopic (SM) free message boards
Topic: Shrinking Trees
Branched from topic: Adaptive Probabilistic Networks with Hidden Variables
Neil Jones  9
04-28-2003 05:41 PM ET (US)
There appear to be only 4 citations to this paper in CiteSeer, so I suspect the idea hasn't been developed further, even though the paper hints in a couple of places that someone should do so.

I think the authors draw a parallel between a nonrecursive shrinking procedure (which is not what they propose) and lasso regression. The connection between recursive shrinking and lasso regression is harder to see, in my opinion. The paper points out that the lasso approach differs from the recursive approach because, in the former, an effect is shrunk by the same amount for two leaves at different depths:
        O
       /   <- effect "j"
      O
     / \
    *   O
    1  / \
      *   *
      2   3

  "*" -- leaves    "O" -- splits

In nonrecursive shrinking, effect j gets the same weight in the predictions for leaves 1, 2, and 3. In recursive shrinking, effect j gets a larger weight for leaves 2 and 3 than for leaf 1.
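To make the depth dependence concrete, here is a small Python sketch. It assumes the recursive rule has the form ŷ(t) = θ·ȳ(t) + (1 − θ)·ŷ(parent(t)) with a single θ, and takes "nonrecursive" shrinking to mean scaling every effect by the same θ. Both forms are assumptions for illustration, and the function names are hypothetical.

```python
def recursive_leaf(path_means, theta):
    """Shrunken leaf value under the assumed recursive rule.

    path_means[0] is the root's raw mean, path_means[-1] the leaf's.
    """
    yhat = path_means[0]  # the root is not shrunk
    for ybar in path_means[1:]:
        # Assumed rule: yhat(t) = theta * ybar(t) + (1 - theta) * yhat(parent)
        yhat = theta * ybar + (1 - theta) * yhat
    return yhat


def nonrecursive_leaf(path_means, theta):
    """Assumed nonrecursive version: every effect is scaled by the same theta."""
    root = path_means[0]
    return root + theta * (path_means[-1] - root)


def effect_weight(shrink, effect_depth, leaf_depth, theta):
    """Weight that `shrink` puts on a single unit effect.

    Raw means are 0 above the effect and 1 from the effect's lower node down
    to the leaf, so the shrunken leaf value equals the weight on that effect.
    """
    means = [0.0 if d < effect_depth else 1.0 for d in range(leaf_depth + 1)]
    return shrink(means, theta)


# Effect "j" sits on the edge from the root (depth 0) to the node at depth 1.
# Leaf 1 is at depth 2; leaves 2 and 3 are at depth 3.
theta = 0.5
for leaf, depth in [(1, 2), (2, 3), (3, 3)]:
    print(leaf,
          effect_weight(recursive_leaf, 1, depth, theta),
          effect_weight(nonrecursive_leaf, 1, depth, theta))
# prints:
# 1 0.75 0.5
# 2 0.875 0.5
# 3 0.875 0.5
```

Under these assumptions the recursive weight on an effect is 1 − (1 − θ)^m, where m counts the nodes from the effect's lower node down to the leaf (inclusive), so it grows with depth, while the nonrecursive weight stays at θ.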

..Neil

Degui Zhi  8
04-28-2003 05:14 PM ET (US)
Edited by author 04-28-2003 05:18 PM
/m6
They mentioned that their shrinkage method for trees is analogous to shrinkage methods for linear models (ridge, lasso, et cetera). In fact, many of their derivations rely on the same algebra as for linear models.

I wonder what follow-up work there has been.
Neil Jones  7
04-28-2003 02:17 PM ET (US)
Unfortunately, there are some peculiarities with those figures that I can't readily explain. For example, the legend to the figure on page 3 says "the numbers below the terminal or leaf nodes are standard errors of the estimates." What about the numbers beneath the non-terminal nodes? (Presumably that's a typo.)

Without knowing specifically where the numbers beneath the nodes came from, I speculate that "the standard error of the estimate" at node t is the standard deviation of the mean calculated for node t (the sample mean is the estimate of the true mean), as measured through 10-fold cross-validation. That is, you calculate the node mean using one training set and one test set, then repeat this with different training and test sets; this produces a distribution of means, and the number beneath the node is the standard deviation of that distribution. This "standard error" increases down the tree, as you'd expect, because the sample size used to compute the estimate gets smaller. I do not think that the numbers beneath the nodes are related to how much each node contributes to the training error, R(T).
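The speculated procedure can be sketched in Python for a single node. This is only one reading of the figure: it assumes the reported number is the standard deviation of the node's training-set mean across k cross-validation folds, with observation i held out in fold i mod k. The function name `cv_node_se` is hypothetical.

```python
from statistics import mean, pstdev


def cv_node_se(values, k=10):
    """Spread of a node's mean across k cross-validation training sets.

    Hypothetical reading of the figure: the number under a node is the
    standard deviation of the node mean computed on each training set.
    """
    if not 2 <= k <= len(values):
        raise ValueError("need 2 <= k <= number of observations in the node")
    fold_means = []
    for fold in range(k):
        # Observation i is held out in fold i % k; the rest form the training set.
        train = [v for i, v in enumerate(values) if i % k != fold]
        fold_means.append(mean(train))
    return pstdev(fold_means)


# A parent node and one of its children (half the observations, same mix).
parent = [0, 0, 0, 4, 0, 0, 0, 4]
child = [0, 0, 0, 4]
print(cv_node_se(parent, k=len(parent)))  # leave-one-out, about 0.247
print(cv_node_se(child, k=len(child)))    # leave-one-out, about 0.577
```

With the same mix of values, the smaller child node gives the larger spread, which is consistent with the "standard error" growing down the tree as node sample sizes shrink.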

But like I said, this is speculation on my part.
Coleman Mosley  6
04-28-2003 01:23 PM ET (US)
From the equations and description given, this seems very similar to ridge regression applied to the tree. Is this a reasonable approximation?
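One way to see how far the ridge analogy goes is a one-level tree. If each leaf is coded as a 0/1 indicator, the response is centered at the parent mean, and the intercept is held there (a simplification that is exact for equal-sized leaves), the indicator columns are orthogonal. The ridge normal equations (XᵀX + λI)β = Xᵀ(y − ȳ) then decouple into β_k = n_k(ȳ_k − ȳ)/(n_k + λ), which is the same as shrinking each leaf mean toward its parent with a leaf-specific factor n_k/(n_k + λ). The Python sketch below (hypothetical names) checks this numerically; it says nothing about deeper trees, where recursive shrinking compounds across levels.

```python
def ridge_leaf_fits(leaves, lam):
    """Ridge fit of a one-level tree with one 0/1 indicator per leaf.

    leaves: list of lists of responses, one list per leaf. The intercept is
    held at the parent (grand) mean and only the leaf effects are penalized.
    The indicator columns are orthogonal, so X^T X is diagonal with entries
    n_k and each effect solves (n_k + lam) * beta_k = sum_i (y_i - ybar).
    """
    all_y = [y for leaf in leaves for y in leaf]
    ybar = sum(all_y) / len(all_y)
    fits = []
    for leaf in leaves:
        beta = sum(y - ybar for y in leaf) / (len(leaf) + lam)
        fits.append(ybar + beta)
    return fits


def shrink_toward_parent(leaves, lam):
    """The same fits written as shrinking with theta_k = n_k / (n_k + lam)."""
    all_y = [y for leaf in leaves for y in leaf]
    ybar = sum(all_y) / len(all_y)
    out = []
    for leaf in leaves:
        theta = len(leaf) / (len(leaf) + lam)
        out.append(theta * (sum(leaf) / len(leaf)) + (1 - theta) * ybar)
    return out


leaves = [[1, 3], [5, 7]]
print(ridge_leaf_fits(leaves, lam=2.0))       # [3.0, 5.0]
print(shrink_toward_parent(leaves, lam=2.0))  # [3.0, 5.0]
```

Note that the shrinkage factor here depends on leaf size through λ, whereas a single-θ recursive rule uses the same factor for every node.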
Ari Frank  5
04-28-2003 03:10 AM ET (US)
It's been a while since I played with trees, so can anybody refresh my memory as to what exactly the standard errors of the estimates are (the numbers below the terminal or leaf nodes in the figures)? In particular, why do they decrease as we go down the tree? (This seems counterintuitive to me.)
Thanks,
Ari.
Neil Jones  4
04-25-2003 04:58 PM ET (US)
I cannot explain why the authors chose to put this graphic in that place. Perhaps the fact that this is an unpublished tech memo means that it didn't go through extensive revisions (there are some typos, for example). Regardless, the figure does show a few important things:
   - a shrunken tree has the same topology as a non-shrunken
     one
   - as you move down in a shrunken tree, the nodes get more
     and more homogeneous
   - the standard errors increase as you move down in a
     tree (shrunken or otherwise), but the shrunken versions
     have smaller standard errors at a given leaf node
It would have been particularly nice to have the number of observations in each node. There is no way to reconstruct that information from this figure or the paper.

The "size" of a shrunken tree is defined in section 4 of the paper, and is related to a linear regression model that one can create *given* both the training data and the tree. I think this measure of size does not have any particular justification behind it other than a demonstration that the boundary conditions (non-shrunken tree has size of $K$, and a fully-shrunken tree has size $1$) hold. Basically, they are asserting that the size of the tree is the size of the linear regression model that represents the tree.
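As a sketch of one construction with those boundary conditions (an assumption, not checked against the definition in section 4): for a fixed tree and θ, recursive shrinking makes each fitted value a linear combination of the responses, so one can take the trace of the map y ↦ ŷ as the tree's size. Assuming the rule ŷ(t) = θ·ȳ(t) + (1 − θ)·ŷ(parent(t)), this trace is K at θ = 1 (no shrinking, K leaves) and 1 at θ = 0 (everything shrunk to the root mean). The function name and tree encoding below are hypothetical.

```python
def shrunken_tree_size(leaf_paths, theta):
    """Hypothetical 'size' of a recursively shrunken tree: the trace of the
    linear map y -> yhat (an effective number of parameters).

    leaf_paths: one list per leaf giving the node sample sizes on the path
    from the root down to that leaf, e.g. [n_root, n_child, ..., n_leaf].
    """
    size = 0.0
    for path in leaf_paths:
        depth = len(path) - 1
        # Unrolling yhat(t) = theta*ybar(t) + (1-theta)*yhat(parent) gives weight
        # (1-theta)**depth on the root mean and theta*(1-theta)**(depth-j) on the
        # mean of the path node at depth j >= 1. The weights sum to 1.
        weights = [(1 - theta) ** depth] + [
            theta * (1 - theta) ** (depth - j) for j in range(1, depth + 1)
        ]
        # d yhat_i / d y_i for one observation i in this leaf: y_i enters each
        # node mean on its path with coefficient 1 / n_node.
        per_obs = sum(w / n for w, n in zip(weights, path))
        size += path[-1] * per_obs  # n_leaf observations share this value
    return size


# Root with 4 observations split into two leaves of 2.
tree = [[4, 2], [4, 2]]
for theta in (1.0, 0.5, 0.0):
    print(theta, shrunken_tree_size(tree, theta))  # sizes 2.0, 1.5, 1.0
```

Intermediate θ values give a size strictly between 1 and K, which matches the intuition that a partly shrunken tree behaves like a smaller model.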
Bryant Forsgren  3
04-25-2003 04:20 PM ET (US)
I just noticed that the number of effective leaf nodes of a shrunken tree is defined later in the paper. The definition does not resemble my intuitive idea of what it would be, so I presume that four is the correct number for the tree. Still, it is confusing to make use of a concept such as this and leave it undefined for several pages.

-Bryant
Bryant Forsgren  2
04-25-2003 03:00 PM ET (US)
I haven't fully digested the paper yet, but I think figure 3 is unnecessarily confusing and distracting.

First off, on page 6 $\hat{y}(\text{node};\Theta)$ is given as a function of a node's "unshrunken" value and its parent's "shrunken" value. But it is obvious that no single value of $\Theta$ could have produced the tree in figure 3. Not until several sections later will the reader realize that this tree was probably generated using the "optimal shrinking" procedure, in which a different $\Theta$ may be used for each node. It would have been a simple matter to either include an image which is consistent with the simple definition of recursive shrinking, or (even easier) make a note that the image was generated using a more sophisticated procedure.
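The point about a single Θ can be checked with a small Python sketch. It assumes the page-6 rule has the form ŷ(t) = θ·ȳ(t) + (1 − θ)·ŷ(parent(t)) and lets θ be either one number or a per-node table; the tree encoding and names are hypothetical. With one θ strictly between 0 and 1, a child ends up equal to its parent only if its own mean happens to equal the parent's shrunken value, whereas per-node values (here simply 0 or 1, not the paper's optimal values) can collapse a whole subtree onto one value.

```python
def shrink_tree(node, theta, parent_yhat=None):
    """Recursively shrink a tree and return {node name: shrunken value}.

    node:  dict with 'name', 'mean' (the unshrunken node mean) and optional
           'children' (a list of nodes).
    theta: one float for every node, or a dict {node name: theta}.
    """
    if parent_yhat is None:
        yhat = node["mean"]  # the root keeps its own mean
    else:
        t = theta[node["name"]] if isinstance(theta, dict) else theta
        # Assumed rule: yhat(t) = theta_t * ybar(t) + (1 - theta_t) * yhat(parent)
        yhat = t * node["mean"] + (1 - t) * parent_yhat
    fitted = {node["name"]: yhat}
    for child in node.get("children", []):
        fitted.update(shrink_tree(child, theta, yhat))
    return fitted


tree = {"name": "root", "mean": 4.0, "children": [
    {"name": "L", "mean": 2.0},
    {"name": "R", "mean": 6.0, "children": [
        {"name": "R1", "mean": 4.5},
        {"name": "R2", "mean": 7.5},
    ]},
]}
print(shrink_tree(tree, 0.5))
# {'root': 4.0, 'L': 3.0, 'R': 5.0, 'R1': 4.75, 'R2': 6.25}
print(shrink_tree(tree, {"L": 1.0, "R": 1.0, "R1": 0.0, "R2": 0.0}))
# {'root': 4.0, 'L': 2.0, 'R': 6.0, 'R1': 6.0, 'R2': 6.0}
```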

The other confusing thing about this figure is that the caption claims there are four effective leaf nodes. The word "effective" is not defined anywhere in the paper. The most intuitive definition (at least to me) would be that an effective leaf node is any node (not necessarily a leaf) whose descendants all share its value, and which has no ancestors which are effective leaf nodes. According to this definition, there are six effective leaf nodes in figure 3, not four.
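Bryant's proposed definition can be written as a top-down recursion: a node counts as one effective leaf if every node in its subtree has the same (shrunken) value, and otherwise its children are examined. Below is a Python sketch with a hypothetical tree encoding; comparing values with `math.isclose` is an implementation choice for floating-point values, not part of the definition.

```python
import math


def subtree_uniform(node, value):
    """True if this node and all of its descendants have (numerically) `value`."""
    return math.isclose(node["value"], value) and all(
        subtree_uniform(child, value) for child in node.get("children", [])
    )


def count_effective_leaves(node):
    """Count nodes whose whole subtree shares their value and that have no
    ancestor already counted (the definition proposed above)."""
    if subtree_uniform(node, node["value"]):
        return 1  # this node is an effective leaf; do not look further down
    # A non-uniform node always has children, since a bare leaf is uniform.
    return sum(count_effective_leaves(child) for child in node["children"])


tree = {"value": 4.0, "children": [
    {"value": 2.0, "children": [{"value": 2.0}, {"value": 2.0}]},
    {"value": 6.0, "children": [{"value": 6.0}, {"value": 6.5}]},
]}
print(count_effective_leaves(tree))  # 3 (four actual leaves)
```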

-Bryant
Neil Jones  1
04-21-2003 01:59 AM ET (US)
This paper makes frequent reference to Breiman et al. (1984). That is a book, and it is not available online. I believe I have the only copy from the library; if you want to see it, please let me know.

(The Breiman book does not contain any information critical to the shrinkage approach.)
Copyright ©1999-2006 Internicity Inc. All rights reserved.