nltk.probability.FreqDist

Class FreqDist

object --+    
         |    
      dict --+
             |
            FreqDist
Known Subclasses:

A frequency distribution for the outcomes of an experiment. A frequency distribution records the number of times each outcome of an experiment has occurred. For example, a frequency distribution could be used to record the frequency of each word type in a document. Formally, a frequency distribution can be defined as a function mapping from each sample to the number of times that sample occurred as an outcome.

Frequency distributions are generally constructed by running a number of experiments, and incrementing the count for a sample every time it is an outcome of an experiment. For example, the following code will produce a frequency distribution that encodes how often each word occurs in a text:

>>> from nltk.probability import FreqDist
>>> sent = 'The cat sat on the mat'
>>> fdist = FreqDist()
>>> for word in sent.split():
...     fdist.inc(word.lower())

An equivalent way to do this is to pass the samples directly to the constructor:

>>> fdist = FreqDist(word.lower() for word in sent.split())
Instance Methods
int
B(self)
Returns: The total number of sample values (or bins) that have counts greater than zero.
int
N(self)
Returns: The total number of sample outcomes that have been recorded by this FreqDist.
int
Nr(self, r, bins=None)
Returns: The number of samples with count r.
 
__add__(self, other)
 
__eq__(self, other)
x==y
 
__ge__(self, other)
x>=y
 
__getitem__(self, sample)
x[y]
 
__gt__(self, other)
x>y
None
__init__(self, samples=None)
Construct a new frequency distribution.
iter
__iter__(self)
Return the samples sorted in decreasing order of frequency.
 
__le__(self, other)
x<=y
 
__lt__(self, other)
x<y
 
__ne__(self, other)
x!=y
string
__repr__(self)
Returns: A string representation of this FreqDist.
None
__setitem__(self, sample, value)
Set this FreqDist's count for the given sample.
string
__str__(self)
Returns: A string representation of this FreqDist.
 
_cache_Nr_values(self)
list of float
_cumulative_frequencies(self, samples=None)
Return the cumulative frequencies of the specified samples.
 
_reset_caches(self)
 
_sort_keys_by_value(self)
None
clear(self)
Remove all samples and their counts from this FreqDist.
FreqDist
copy(self)
Create a copy of this frequency distribution.
int
count(self, sample)
Return the count of a given sample.
float
freq(self, sample)
Return the frequency of a given sample.
list
hapaxes(self)
Returns: A list of all samples that occur once (hapax legomena)
None
inc(self, sample, count=1)
Increment this FreqDist's count for the given sample.
list of tuple
items(self)
Return the items sorted in decreasing order of frequency.
iter of any
iteritems(self)
Return the items sorted in decreasing order of frequency.
iter
iterkeys(self)
Return the samples sorted in decreasing order of frequency.
iter
itervalues(self)
Return the values sorted in decreasing order.
list of any
keys(self)
Return the samples sorted in decreasing order of frequency.
any or None
max(self)
Return the sample with the greatest number of outcomes in this frequency distribution.
 
plot(self, *args, **kwargs)
Plot samples from the frequency distribution displaying the most frequent sample first.
any
pop(self, other)
Remove the specified key and return the corresponding value; raise KeyError if the key is not found and no default is given.
tuple
popitem(self, other)
Remove and return some (key, value) pair as a 2-tuple; raise KeyError if the FreqDist is empty.
list
samples(self)
Returns: A list of all samples that have been recorded as outcomes by this frequency distribution.
 
sorted(self)
 
sorted_samples(self)
 
tabulate(self, *args, **kwargs)
Tabulate the given samples from the frequency distribution (cumulative), displaying the most frequent sample first.
None
update(self, samples)
Update the frequency distribution with the provided list of samples.
list of any
values(self)
Return the counts (values) sorted in decreasing order.

Inherited from dict: __cmp__, __contains__, __delitem__, __getattribute__, __len__, __new__, __sizeof__, fromkeys, get, has_key, setdefault, viewitems, viewkeys, viewvalues

Inherited from object: __delattr__, __format__, __reduce__, __reduce_ex__, __setattr__, __subclasshook__

Class Variables

Inherited from dict: __hash__

Properties

Inherited from object: __class__

Method Details

B(self)

 
Returns: int
The total number of sample values (or bins) that have counts greater than zero. For the total number of sample outcomes recorded, use FreqDist.N(). (For a FreqDist fdist, fdist.B() is the same as len(fdist).)

N(self)

 
Returns: int
The total number of sample outcomes that have been recorded by this FreqDist. For the number of unique sample values (or bins) with counts greater than zero, use FreqDist.B().
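The following sketch makes the B()/N() distinction concrete. It uses Python 3's collections.Counter as a stand-in for a FreqDist. This is an assumption for illustration only and is not NLTK's implementation. B corresponds to the number of distinct keys, and N corresponds to the sum of all counts.

```python
from collections import Counter

# Counter stands in for FreqDist here: it maps each sample to its count.
words = "the cat sat on the mat the end".split()
counts = Counter(words)

B = len(counts)            # distinct samples (bins) with a count > 0
N = sum(counts.values())   # total number of recorded outcomes

print(B, N)  # 6 8
```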

Nr(self, r, bins=None)

 
Parameters:
  • r (int) - A sample count.
  • bins (int) - The number of possible sample outcomes. bins is used to calculate Nr(0). In particular, Nr(0) is bins-self.B(). If bins is not specified, it defaults to self.B() (so Nr(0) will be 0).
Returns: int
The number of samples with count r.
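The definition of Nr, including the special case r = 0, can be sketched as a standalone function over a collections.Counter. The function name and its `counts` argument are hypothetical. Only the formula Nr(0) = bins - B and the default for bins come from the description above.

```python
from collections import Counter


def Nr(counts, r, bins=None):
    """Hypothetical helper: number of samples whose count is exactly r.

    For r == 0 the answer depends on how many bins are possible:
    Nr(0) = bins - B, where B is the number of samples actually seen.
    """
    B = len(counts)
    if bins is None:
        bins = B  # default: no unseen bins, so Nr(0) == 0
    if r == 0:
        return bins - B
    return sum(1 for c in counts.values() if c == r)


counts = Counter("abracadabra")  # a:5, b:2, r:2, c:1, d:1
print(Nr(counts, 1))             # 2
print(Nr(counts, 0))             # 0
print(Nr(counts, 0, bins=26))    # 21
```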

__eq__(self, other)
(Equality operator)

 

x==y

Overrides: dict.__eq__
(inherited documentation)

__ge__(self, other)
(Greater-than-or-equals operator)

 

x>=y

Overrides: dict.__ge__
(inherited documentation)

__getitem__(self, sample)
(Indexing operator)

 

x[y]

Overrides: dict.__getitem__
(inherited documentation)

__gt__(self, other)
(Greater-than operator)

 

x>y

Overrides: dict.__gt__
(inherited documentation)

__init__(self, samples=None)
(Constructor)

 

Construct a new frequency distribution. If samples is given, then the frequency distribution will be initialized with the count of each object in samples; otherwise, it will be initialized to be empty.

In particular, FreqDist() returns an empty frequency distribution; and FreqDist(samples) first creates an empty frequency distribution, and then calls update with the list samples.

Parameters:
  • samples (Sequence) - The samples to initialize the frequency distribution with.

Overrides: object.__init__

__iter__(self)

 

Return the samples sorted in decreasing order of frequency.

Returns: iter
An iterator over the samples, in sorted order
Overrides: dict.__iter__
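Here is a minimal sketch of the decreasing-frequency ordering, again using collections.Counter in place of FreqDist (an assumption). This page does not document FreqDist's own tie-breaking. In this sketch, tied samples keep their first-seen order only because sorted() is stable.

```python
from collections import Counter

counts = Counter("the cat sat on the mat the cat".split())

# Sort samples by count, largest first. Ties keep first-seen order
# here only because sorted() is stable, even with reverse=True.
samples_by_freq = sorted(counts, key=counts.get, reverse=True)
print(samples_by_freq[:2])  # ['the', 'cat']
```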

__le__(self, other)
(Less-than-or-equals operator)

 

x<=y

Overrides: dict.__le__
(inherited documentation)

__lt__(self, other)
(Less-than operator)

 

x<y

Overrides: dict.__lt__
(inherited documentation)

__ne__(self, other)

 

x!=y

Overrides: dict.__ne__
(inherited documentation)

__repr__(self)
(Representation operator)

 

repr(x)

Returns: string
A string representation of this FreqDist.
Overrides: object.__repr__

__setitem__(self, sample, value)
(Index assignment operator)

 

Set this FreqDist's count for the given sample.

Parameters:
  • sample (any hashable object) - The sample whose count should be set.
  • value (int) - The new value for the sample's count.
Returns: None
Raises:
  • TypeError - If sample is not a supported sample type.
Overrides: dict.__setitem__

__str__(self)
(Informal representation operator)

 

str(x)

Returns: string
A string representation of this FreqDist.
Overrides: object.__str__

_cumulative_frequencies(self, samples=None)

 

Return the cumulative frequencies of the specified samples. If no samples are specified, all counts are returned, starting with the largest.

Parameters:
  • samples (list) - The samples whose cumulative frequencies should be returned.
Returns: list of float
The cumulative frequencies of the given samples.
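Cumulative frequencies are running totals of the counts taken in decreasing order, so the last value equals N. The sketch below uses collections.Counter and itertools.accumulate rather than NLTK. It returns integer running totals, whereas the method above is documented as returning floats.

```python
from collections import Counter
from itertools import accumulate

counts = Counter("abracadabra")

# Counts of all samples, largest first: [5, 2, 2, 1, 1].
ordered = sorted(counts.values(), reverse=True)

# Running totals: each entry is the sum of the counts up to that point.
cumulative = list(accumulate(ordered))
print(cumulative)  # [5, 7, 9, 10, 11]
```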

clear(self)

 

Remove all samples and their counts from this FreqDist.

Returns: None
Overrides: dict.clear
(inherited documentation)

copy(self)

 

Create a copy of this frequency distribution.

Returns: FreqDist
A copy of this frequency distribution object.
Overrides: dict.copy

count(self, sample)

 

Return the count of a given sample. The count of a sample is defined as the number of times that sample outcome was recorded by this FreqDist. Counts are non-negative integers. This method has been replaced by conventional dictionary indexing; use fd[item] instead of fd.count(item).

Parameters:
  • sample (any) - The sample whose count should be returned.
Returns: int
The count of a given sample.

freq(self, sample)

 

Return the frequency of a given sample. The frequency of a sample is defined as the count of that sample divided by the total number of sample outcomes that have been recorded by this FreqDist. The count of a sample is defined as the number of times that sample outcome was recorded by this FreqDist. Frequencies are always real numbers in the range [0, 1].

Parameters:
  • sample (any) - the sample whose frequency should be returned.
Returns: float
The frequency of a given sample.
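The definition freq(sample) = count(sample) / N translates directly into code. This sketch is written for Python 3, where / is true division; under Python 2, dividing two integers would truncate. The helper name `freq` and the choice to return 0.0 for an empty distribution are assumptions, not documented behavior.

```python
from collections import Counter


def freq(counts, sample):
    """Hypothetical helper: relative frequency of `sample` (count / N)."""
    N = sum(counts.values())
    if N == 0:
        return 0.0  # assumption: an empty distribution reports 0
    # Counter returns 0 for unseen samples, so their frequency is 0.
    return counts[sample] / N


counts = Counter("abracadabra")
print(freq(counts, "a"))  # 5/11
```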

hapaxes(self)

 
Returns: list
A list of all samples that occur once (hapax legomena)
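Finding hapax legomena amounts to keeping only the samples whose count is exactly 1. Here is a sketch with collections.Counter standing in for FreqDist:

```python
from collections import Counter

counts = Counter("the cat sat on the mat".split())

# Hapax legomena: samples that were recorded exactly once.
hapaxes = [sample for sample, c in counts.items() if c == 1]
print(hapaxes)  # ['cat', 'sat', 'on', 'mat']
```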

inc(self, sample, count=1)

 

Increment this FreqDist's count for the given sample.

Parameters:
  • sample (any) - The sample whose count should be incremented.
  • count (int) - The amount to increment the sample's count by.
Returns: None
Raises:
  • NotImplementedError - If sample is not a supported sample type.
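The effect of inc() with and without the count argument can be mimicked with a collections.Counter, where missing samples start at 0. This is an illustrative stand-in, not FreqDist itself:

```python
from collections import Counter

counts = Counter()

# Like fdist.inc("cat"): add one to the sample's count.
counts["cat"] += 1
# Like fdist.inc("cat", count=3): add an arbitrary amount.
counts["cat"] += 3
counts["dog"] += 1

print(counts["cat"], counts["dog"])  # 4 1
```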

items(self)

 

Return the items sorted in decreasing order of frequency.

Returns: list of tuple
A list of items, in sorted order
Overrides: dict.items

iteritems(self)

 

Return the items sorted in decreasing order of frequency.

Returns: iter of any
An iterator over the items, in sorted order
Overrides: dict.iteritems

iterkeys(self)

 

Return the samples sorted in decreasing order of frequency.

Returns: iter
An iterator over the samples, in sorted order
Overrides: dict.iterkeys

itervalues(self)

 

Return the values sorted in decreasing order.

Returns: iter
An iterator over the values, in sorted order
Overrides: dict.itervalues

keys(self)

 

Return the samples sorted in decreasing order of frequency.

Returns: list of any
A list of samples, in sorted order
Overrides: dict.keys

max(self)

 

Return the sample with the greatest number of outcomes in this frequency distribution. If two or more samples have the same number of outcomes, return one of them; which sample is returned is undefined. If no outcomes have occurred in this frequency distribution, return None.

Returns: any or None
The sample with the maximum number of outcomes in this frequency distribution.
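The sketch below implements max() over a collections.Counter; the helper `fd_max` is hypothetical. As with the method described above, no particular sample is guaranteed when several share the maximum count. Here the result simply depends on iteration order.

```python
from collections import Counter


def fd_max(counts):
    """Hypothetical helper: sample with the largest count, or None."""
    if not counts:
        return None
    # max() returns the first maximal element it meets; with ties, which
    # sample that is depends on iteration order, so treat it as arbitrary.
    return max(counts, key=counts.get)


print(fd_max(Counter("abracadabra")))  # a
print(fd_max(Counter()))               # None
```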

plot(self, *args, **kwargs)

 

Plot samples from the frequency distribution displaying the most frequent sample first. If an integer parameter is supplied, stop after this many samples have been plotted. If two integer parameters m, n are supplied, plot a subset of the samples, beginning with m and stopping at n-1. For a cumulative plot, specify cumulative=True. (Requires Matplotlib to be installed.)

Parameters:
  • title (str) - The title for the graph
  • cumulative - A flag to specify whether the plot is cumulative (default = False)
  • num (int) - The maximum number of samples to plot (default=50). Specify num=0 to get all samples (slow).
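The positional arguments described above select a slice of the samples in decreasing order of frequency. This sketch assumes the selection behaves like itertools.islice, based on the "beginning with m and stopping at n-1" description rather than on NLTK's source. It therefore works without Matplotlib. The helper `select` and the `ranked` list are hypothetical, and nothing is drawn.

```python
from itertools import islice

# Hypothetical samples, already in decreasing order of frequency.
ranked = ["the", "of", "and", "to", "a", "in"]


def select(ranked, *args):
    """Mimic the positional arguments: (n) keeps the first n samples,
    (m, n) keeps samples m .. n-1, and no arguments keeps everything."""
    return list(islice(ranked, *args)) if args else list(ranked)


print(select(ranked, 3))     # ['the', 'of', 'and']
print(select(ranked, 2, 5))  # ['and', 'to', 'a']
```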

pop(self, other)

 

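Remove the specified key and return the corresponding value. If the key is not found, the default d is returned if one is given (dict.pop's signature is D.pop(k[, d])); otherwise KeyError is raised.

Returns: any
The value that was stored for the removed key.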
Overrides: dict.pop
(inherited documentation)

popitem(self, other)

 

Remove and return some (key, value) pair as a 2-tuple; raise KeyError if the dictionary is empty.

Returns: tuple
The removed (key, value) pair.
Overrides: dict.popitem
(inherited documentation)

samples(self)

 
Returns: list
A list of all samples that have been recorded as outcomes by this frequency distribution. Use fd[sample] to determine the count for each sample.

tabulate(self, *args, **kwargs)

 

Tabulate the given samples from the frequency distribution (cumulative), displaying the most frequent sample first.

Parameters:
  • samples (list) - The samples to tabulate (default is all samples)
  • title (str) - The title for the graph
  • num (int) - The maximum number of samples to tabulate (default=50). Specify num=0 to get all samples (slow).
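The following hypothetical sketch builds the kind of text table described above. Sample names go on one row, with their counts (or running totals, when cumulative) underneath, most frequent first. It uses collections.Counter and is not NLTK's implementation. The exact layout FreqDist prints may differ.

```python
from collections import Counter
from itertools import accumulate


def tabulate(counts, cumulative=False):
    """Hypothetical helper: two-row text table, most frequent sample first.

    The first row holds the samples; the second holds their counts,
    or running totals of those counts when cumulative=True.
    """
    samples = sorted(counts, key=counts.get, reverse=True)
    freqs = [counts[s] for s in samples]
    if cumulative:
        freqs = list(accumulate(freqs))
    header = " ".join("%4s" % s for s in samples)
    row = " ".join("%4d" % f for f in freqs)
    return header + "\n" + row


print(tabulate(Counter("abracadabra"), cumulative=True))
```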

update(self, samples)

 

Update the frequency distribution with the provided list of samples. The count of each sample is incremented, not overwritten as it would be by dict.update. This is a faster way to add multiple samples to the distribution.

Parameters:
  • samples (list) - The samples to add.
Returns: None
Overrides: dict.update
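FreqDist(samples) is described as creating an empty distribution and then calling update, so update adds to existing counts rather than replacing them. collections.Counter.update behaves the same way, which makes it a convenient stand-in for showing the contrast with plain dict.update:

```python
from collections import Counter

counts = Counter(["cat", "dog"])

# Counter.update, like FreqDist.update, ADDS to the existing counts...
counts.update(["cat", "cat", "bird"])
print(counts["cat"], counts["bird"])  # 3 1

# ...whereas a plain dict.update overwrites the stored value.
plain = {"cat": 1, "dog": 1}
plain.update({"cat": 2})
print(plain["cat"])  # 2
```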

values(self)

 

Return the counts (values) sorted in decreasing order.

Returns: list of int
A list of counts, in sorted order
Overrides: dict.values