Relevance model 3: probability for a set of words - Victor Lavrenko - AI深度探索:從機器學習到NLP精通 - Cupoy
How do we estimate a joint probability of observing a set of words? We cannot count the frequency (i...
How do we estimate a joint probability of observing a set of words? We cannot count the frequency (it'll be zero for a large set), and should not assume that the words are independent (pointless). Instead, we assume the words are exchangeable (order-invariant), and apply the de-Finetti theorem, which provides a convenient form for the joint distribution.