| Nikola Jevtic
|
1
|
 |
|
05-11-2002 05:43 AM ET (US)
|
|
I will try to be short but precise. The results mentioned in the paper are not at all surprising. If a problem is defined as compressing an iid source of unknown distribution, and as a performance measure is used the worst redundancy (extra # of bits on top of the the entropy of the source) when the adversary gets to choose both x^n and the source parameters, then: provably the smallest achievable redundancy is: (k-1)/2*log(n/2pi)+C(k)+o(1). As far as "add 1/2" rule is concerned, if all the symbol counts (ni) grow linearly with n, redundancy of add 1/2 is asymptotically optimal (is equal to the above). However at the vertex points (sun always rises), it is slightly worse than the optimal coding scheme, and the redundancy is: (k-1)/2*log(n/pi)+C(k)+o(1). What Ristad is saying is that it is (close to) optimal when the number of symbol is k, but if it were smaller maybe we are not comparing it to the right "best possible performance". It is absolutely correct, but perhaps trivial. by applying his scheme to add 1/2 rule we could : 1) send log(k) bits to say how many symbols we saw in x^n 2) send log(choose(k,q)) bits to describe the active alphabet 3) apply add 1/2 rule to the reduced alphabet, and this time have redundancy be bounded by : (q-1)/2*log(n/pi)+C(k)+o(1) (+log k + log choose(k,q))
If someone thinks it happens rarely that alphabet is smaller and that we pay too much extra (log k) if there is nothing to be saved we can just send 1 bit to say if savings is possible and if so, then send log k ...
Performance analysis: If we compare probabilities he assigns to sequences and those that we do by using modified add 1/2 rule it can be shown that if all symbols among the active q grow linearly with n, there is a constant factor between the two methods. However if any of the {ni} grows slower then linear in n, modified add 1/2 rule beats his scheme polynomially in n. (Note that it is impossible to give higher probability to all the sequences, as they all need to add up to 1.)
Experiments: It has not been (perhaps) said clear enough that experimental results may only be vaguely correlated with the theoretical analysis, as block probabilities and derived conditional probabilities may differ at will. (More precisely original block probabilities that he calls priors, and block probabilities for x^n computed as a sequence of conditional probabilites). Or at least no reason to believe differently was provided. This said it would be interesting to see how well would 'modified add 1/2 rule' perform because it could go either way.
Philosophical aspects: In the paper it was not explicitly said what we expect of the method. (E.g. to learn iid source) which then allows for bogus examples like broken random number generator. In reality, one can not hope to be able to learn 'any' source. It is only possible to restrict to a certain set of sources (e.g. iid, markov...) and try to be close to the best source in the closed set. If the set is unrestricted the worst case redundancy is always infinite.
|