nltk :: probability :: GoodTuringProbDist :: Class GoodTuringProbDist

Class GoodTuringProbDist

object --+    
         |    
 ProbDistI --+
             |
            GoodTuringProbDist

The Good-Turing estimate of a probability distribution. This method calculates the probability mass to assign to events with zero or low counts based on the number of events with higher counts. It does so by using the smoothed count c*:

    c* = (c + 1) N(c + 1) / N(c)    for c >= 1

where c is the original count and N(i) is the number of event types observed with count i. Unseen events (c = 0) are treated as sharing the total count of the events observed exactly once, N(1) (see Jurafsky & Martin, 2nd Edition, p. 101).
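
The smoothed count can be illustrated with a minimal sketch, assuming the standard Good-Turing form c* = (c + 1) N(c + 1) / N(c). The function name good_turing_counts and the toy data are hypothetical and use only the standard library; this is not the class's own implementation.

```python
from collections import Counter

def good_turing_counts(counts):
    """Map each observed count c to c* = (c + 1) * N(c + 1) / N(c).

    Hypothetical helper for illustration only.
    """
    # N(i): number of event types observed exactly i times
    n = Counter(counts.values())
    smoothed = {}
    for c in sorted(n):
        # A Counter returns 0 for a missing key, so N(c + 1) is 0 when
        # no event type was observed c + 1 times.
        smoothed[c] = (c + 1) * n[c + 1] / n[c]
    return smoothed

# Toy data: a occurs 3 times, b and c twice, d, e and f once,
# so N(1) = 3, N(2) = 2, N(3) = 1.
counts = Counter("a a a b b c c d e f".split())
smoothed = good_turing_counts(counts)
```

In this toy data the largest count (3) is smoothed to 0 because N(4) = 0. That is a known weakness of the unsmoothed N(i) values on sparse data.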

Instance Methods
 
Returns   Method                                Description
          __init__(self, freqdist, bins=None)   x.__init__(...) initializes x; see help(type(x)) for signature
string    __repr__(self)                        Returns: A string representation of this ProbDist.
float     discount(self)                        Returns: The probability mass transferred from the seen samples to the unseen samples.
          freqdist(self)
any       max(self)                             Returns: the sample with the greatest probability.
float     prob(self, sample)                    Returns: the probability for a given sample.
list      samples(self)                         Returns: A list of all samples that have nonzero probabilities.

Inherited from ProbDistI: generate, logprob

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__, __str__, __subclasshook__

Class Variables

Inherited from ProbDistI: SUM_TO_ONE

Properties

Inherited from object: __class__

Method Details

__init__(self, freqdist, bins=None)
(Constructor)

 

x.__init__(...) initializes x; see help(type(x)) for signature

Parameters:
  • freqdist (FreqDist) - The frequency counts upon which to base the estimation.
  • bins (int) - The number of possible event types. This must be at least as large as the number of bins in the freqdist. If None, it is assumed to equal the number of bins in the freqdist.
Overrides: object.__init__
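
The rule for bins described above can be sketched as follows. The helper resolve_bins is hypothetical, and raising ValueError is an assumption made for the example; the class itself may signal an invalid value differently.

```python
from collections import Counter

def resolve_bins(counts, bins=None):
    """Return the number of possible event types to use.

    Hypothetical helper: `counts` maps each sample to its count, so
    len(counts) plays the role of the number of bins in the freqdist.
    """
    seen_types = len(counts)
    if bins is None:
        # Default: no event types beyond those already observed
        return seen_types
    if bins < seen_types:
        # Assumed error type, chosen for this sketch
        raise ValueError("bins must be at least the number of observed event types")
    return bins

counts = Counter("a a a b b c c d e f".split())
default_bins = resolve_bins(counts)                        # 6 observed types
unseen_types = resolve_bins(counts, bins=9) - len(counts)  # 3 types never observed
```

With bins left at None there are no unseen event types, so no room remains for samples outside the freqdist.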

__repr__(self)
(Representation operator)

 

repr(x)

Returns: string
A string representation of this ProbDist.
Overrides: object.__repr__

discount(self)

 
Returns: float
The probability mass transferred from the seen samples to the unseen samples.
Overrides: ProbDistI.discount
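
As a rough illustration of the transferred mass, the sketch below assumes the usual Good-Turing estimate N(1) / N, where N is the total number of observations. The helpers unseen_mass and seen_mass are hypothetical and do not reproduce the method's actual code.

```python
from collections import Counter

def unseen_mass(counts):
    """Good-Turing estimate of the mass moved to unseen samples: N(1) / N."""
    total = sum(counts.values())  # N: total number of observations
    if total == 0:
        return 0.0
    n1 = sum(1 for c in counts.values() if c == 1)  # N(1)
    return n1 / total

def seen_mass(counts):
    """Mass kept by seen samples: sum over c of c* * N(c) / N."""
    total = sum(counts.values())
    n = Counter(counts.values())  # N(i) for each observed count i
    # c* * N(c) simplifies to (c + 1) * N(c + 1)
    return sum((c + 1) * n[c + 1] for c in list(n)) / total

# Toy data with counts 3, 2, 2, 1, 1, 1 (N = 10, N(1) = 3)
counts = Counter("a a a b b c c d e f".split())
discount = unseen_mass(counts)
remaining = seen_mass(counts)
```

For this toy data the two quantities add up to 1. That holds only when every count between 1 and the maximum is observed at least once; with gaps in the counts, the unsmoothed formula loses additional mass.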

max(self)

 
Returns: any
the sample with the greatest probability. If two or more samples have the same probability, return one of them; which sample is returned is undefined.
Overrides: ProbDistI.max
(inherited documentation)

prob(self, sample)

 
Parameters:
  • sample - The sample whose probability should be returned.
Returns: float
the probability for a given sample. Probabilities are always real numbers in the range [0, 1].
Overrides: ProbDistI.prob
(inherited documentation)
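
A per-sample probability consistent with the class description might look like the sketch below. It assumes that a seen sample gets c* / N and that the unseen mass N(1) / N is split evenly over the bins not yet observed; both the even split and the function name good_turing_prob are assumptions for illustration, not the documented behaviour.

```python
from collections import Counter

def good_turing_prob(counts, sample, bins=None):
    """Sketch of a Good-Turing probability for one sample (hypothetical)."""
    seen_types = len(counts)
    bins = seen_types if bins is None else bins
    total = sum(counts.values())  # N: total number of observations
    if total == 0:
        return 0.0
    n = Counter(counts.values())  # N(i) for each observed count i
    c = counts.get(sample, 0)
    if c == 0:
        unseen_types = bins - seen_types
        if unseen_types <= 0:
            return 0.0  # no room for unseen event types
        # Assumption: unseen mass N(1) / N shared equally by unseen types
        return n[1] / total / unseen_types
    # Seen sample: c* / N with c* = (c + 1) * N(c + 1) / N(c)
    return (c + 1) * n[c + 1] / (n[c] * total)

counts = Counter("a a a b b c c d e f".split())
p_d = good_turing_prob(counts, "d", bins=9)    # sample seen once
p_new = good_turing_prob(counts, "z", bins=9)  # sample never seen
```

Every value returned by this sketch lies in [0, 1], matching the range stated above.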

samples(self)

 
Returns: list
A list of all samples that have nonzero probabilities. Use prob to find the probability of each sample.
Overrides: ProbDistI.samples
(inherited documentation)