
Here is another view, which also explains where the logarithm comes from.

The KL divergence between two distributions of a random variable, say KL[p|q], answers this question: if you built a perfect compression algorithm for samples from distribution q, how many extra bits (or nats) would you expect to need, on average, to encode samples that actually come from p? The logarithm appears because an ideal code spends -log q(x) bits (log base 2) or nats (natural log) on a sample x.
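
A rough sketch of that reading in Python, for discrete distributions only. The function names and the example distributions p and q are made up for illustration, and it assumes q(x) > 0 wherever p(x) > 0. The expected code length under a code tuned for q is the cross-entropy; the best possible for p is its entropy; the gap is the KL.

```python
import math

def entropy_bits(p):
    # Expected bits per sample with a code that is ideal for p itself.
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy_bits(p, q):
    # Expected bits per sample when samples come from p but the code is
    # ideal for q, i.e. it spends -log2 q(x) bits on outcome x.
    # Assumes q(x) > 0 wherever p(x) > 0.
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_bits(p, q):
    # Extra bits per sample paid for using the wrong code.
    return cross_entropy_bits(p, q) - entropy_bits(p)

# Hypothetical example distributions over three outcomes.
p = [0.5, 0.25, 0.25]
q = [0.25, 0.25, 0.5]

print(entropy_bits(p))           # best achievable for samples from p
print(cross_entropy_bits(p, q))  # cost when using the code built for q
print(kl_bits(p, q))             # the overhead, i.e. KL[p|q] in bits
```

Swapping math.log2 for math.log gives the same quantities in nats.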

And compression is all about keeping only the true information that is encoded in a sample.


