Kristin Branson
04-04-2002 04:00 AM ET (US)
In response to Greg's question, I think the iteration number in Figures 6 and 7 is the value-iteration step (described in equation (4)). Iteration 0 is the initial estimate Q_0 of the optimal value function Q*, iteration 1 is the next estimate Q_1, and so on.
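To make the iteration index concrete, here is a minimal sketch of tabular Q-value iteration. It assumes equation (4) has the standard form Q_{k+1}(s, a) = E[r + gamma * max_{a'} Q_k(s', a')]; the two-state MDP, the discount 0.9, and all names (`q_value_iteration`, `TOY_MDP`) are hypothetical illustrations, not the paper's setup, which may well use a learned model instead of a table.

```python
from typing import Dict, List, Tuple

# Hypothetical MDP format: transitions[s][a] = list of (probability, next_state, reward)
Transitions = Dict[int, Dict[int, List[Tuple[float, int, float]]]]

# Hypothetical toy MDP: action 0 = "do nothing", action 1 = "mail the customer"
TOY_MDP: Transitions = {
    0: {0: [(1.0, 0, 0.0)], 1: [(0.5, 1, 1.0), (0.5, 0, -0.1)]},
    1: {0: [(1.0, 1, 0.5)], 1: [(1.0, 1, 0.8)]},
}


def q_value_iteration(
    transitions: Transitions, gamma: float, n_iters: int
) -> List[Dict[Tuple[int, int], float]]:
    """Return [Q_0, Q_1, ..., Q_n]: Q_0 is all zeros, and each step applies
    Q_{k+1}(s, a) = sum_{s'} P(s' | s, a) * (r + gamma * max_{a'} Q_k(s', a'))."""
    q = {(s, a): 0.0 for s in transitions for a in transitions[s]}
    history = [dict(q)]  # "iteration 0" is the initial estimate
    for _ in range(n_iters):
        new_q = {}
        for s in transitions:
            for a, outcomes in transitions[s].items():
                # Expected immediate reward plus discounted best value under Q_k
                new_q[(s, a)] = sum(
                    p * (r + gamma * max(q[(s2, a2)] for a2 in transitions[s2]))
                    for p, s2, r in outcomes
                )
        q = new_q
        history.append(dict(q))
    return history


if __name__ == "__main__":
    for k, q_k in enumerate(q_value_iteration(TOY_MDP, gamma=0.9, n_iters=3)):
        print(f"iteration {k}: {q_k}")
```

Under this reading, plotting some quantity derived from each entry of `history` against its index would give curves like those in Figures 6 and 7, with the x-axis being k.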
I thought this paper presented an interesting approach to the problem thoroughly. Not being very familiar with the targeted marketing problem, I would have appreciated a more in-depth motivation for the algorithm (beyond the results), e.g. some demonstration of how harmful the assumptions made in single-event targeted marketing can be. Still, I am impressed that Bianca was able to do all this in just a few months; applying reinforcement learning to this application seems to require a lot of thought and attention to detail.
I am having trouble figuring out exactly what constitutes a single state. Given the assumption of a finite number of states, together with continuous-valued functions for the probability and amount of donation, and a large number of customer profiles that change over time in different ways depending on which actions are taken, it seems that the number of states needed to represent all this information would be huge. There is probably some simplification I missed that explains how the number of states is kept tractable. An example of a state would be very helpful to me.
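The worry about state-space size can be made concrete with a little arithmetic. Suppose, hypothetically, that a state were a tuple of d customer-profile attributes, each discretized into b levels (this scheme and the function name `num_discrete_states` are assumptions for illustration, not the paper's representation). A tabular method would then need b^d distinct states:

```python
def num_discrete_states(bins_per_feature: int, num_features: int) -> int:
    """Count distinct states when each of num_features profile attributes
    is discretized into bins_per_feature levels (hypothetical scheme)."""
    return bins_per_feature ** num_features


if __name__ == "__main__":
    # With 10 levels per attribute, the table grows exponentially in the
    # number of attributes.
    for d in (2, 5, 10, 20):
        print(f"{d} attributes -> {num_discrete_states(10, d)} states")
```

One common way around this blow-up in reinforcement learning generally is to treat the profile attributes as a feature vector and fit Q with a regression model rather than a table, so the number of parameters grows with d instead of b^d; whether the paper does exactly this would need checking against its text.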
Finally, I have a qualm with the introduction to reinforcement learning and MDPs (Section 2.1). I think equation (2) is missing a term for the reward received for performing action a in state s at the initial time, if it is to be consistent with equation (3). I also think Section 2.1 should mention, or emphasize, the Markov assumption underlying MDPs: the state and reward at time t+1 depend only on the state and action at time t.
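For reference, here is one standard way to write the action-value function with the initial reward made explicit, along with the Markov assumption. This uses generic notation (r_t is the reward for taking a_t in s_t, gamma is the discount factor, pi is the policy followed after the first step); the paper's equations (2) and (3) may use different symbols or indexing.

```latex
Q^{\pi}(s, a) = \mathbb{E}\left[ r_0 + \sum_{t=1}^{\infty} \gamma^{t} r_t \;\middle|\; s_0 = s,\ a_0 = a,\ a_t = \pi(s_t)\ \text{for } t \ge 1 \right]
              = \mathbb{E}\left[ r_0 + \gamma\, Q^{\pi}\big(s_1, \pi(s_1)\big) \;\middle|\; s_0 = s,\ a_0 = a \right]

\Pr\left(s_{t+1}, r_t \mid s_t, a_t, s_{t-1}, a_{t-1}, \ldots, s_0, a_0\right) = \Pr\left(s_{t+1}, r_t \mid s_t, a_t\right)
```

The r_0 term is the initial reward that the first form needs in order to match the recursive second form, and the last line is the Markov property that makes that recursion valid.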