Package jazzparser :: Package utils :: Package nltk :: Module probability :: Class CutoffFreqDist

Class CutoffFreqDist

           object --+        
                    |        
                 dict --+    
                        |    
nltk.probability.FreqDist --+
                            |
                           CutoffFreqDist

Like FreqDist, but returns zero counts for everything with a count less than a given cutoff. Also adjusts the total count to account for the lost counts.
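The cutoff behaviour described above can be sketched in a few lines. This is a minimal illustration built on collections.Counter, not the jazzparser implementation; the helper name apply_cutoff and the sample data are assumptions made for the example.

```python
from collections import Counter


def apply_cutoff(raw_counts, cutoff):
    """Return a copy of raw_counts with every count below cutoff set to zero.

    Hypothetical helper for illustration only.
    """
    return {sample: (count if count >= cutoff else 0)
            for sample, count in raw_counts.items()}


raw = Counter(["C", "C", "C", "G", "G", "F"])   # C: 3, G: 2, F: 1
visible = apply_cutoff(raw, cutoff=2)

# Counts that fall below the cutoff are "lost" and removed from the total.
lost = sum(raw.values()) - sum(visible.values())
adjusted_total = sum(raw.values()) - lost

print(visible)         # {'C': 3, 'G': 2, 'F': 0}
print(adjusted_total)  # 5
```

Here "F" was seen once, which is below the cutoff of 2, so it reads as zero and the total drops from 6 to 5.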

Instance Methods

__init__(self, cutoff, *args, **kwargs)
Construct a new frequency distribution.
Returns: new empty dictionary

__getitem__(self, key)
x[y]

raw_counts(self)
Returns the raw counts (i.e. without the cutoff applied) as a dictionary.

raw_count(self, sample)
Returns the raw count of this sample (doesn't apply a cutoff).

N(self)
Returns: int: The total number of sample outcomes that have been recorded by this FreqDist.

B(self)
This is slightly more complicated than the superclass, because we want to count only samples that have non-zero counts after the cutoff has been applied.
Returns: int

__len__(self)
len(x)

freq(self, sample)
Overridden because the superclass calculates the frequency from the internal _N rather than from N().
Returns: float

copy(self)
Create a copy of this frequency distribution.
Returns: FreqDist

__add__(self, other)
Returns a CutoffFreqDist like this one, but with counts from the other added.

_reset_caches(self)
Add our own caches to those of the superclass.

lost_N(self)
The number of counts lost by applying the cutoff.

_get_cutoff(self)
Make cutoff a read-only attribute

_sort_keys_by_value(self)
Overridden because dict.items(self) accesses the non-cutoff values.

Inherited from nltk.probability.FreqDist: Nr, __eq__, __ge__, __gt__, __iter__, __le__, __lt__, __ne__, __repr__, __setitem__, __str__, clear, count, hapaxes, inc, items, iteritems, iterkeys, itervalues, keys, max, plot, pop, popitem, samples, sorted, sorted_samples, tabulate, update, values

Inherited from nltk.probability.FreqDist (private): _cache_Nr_values, _cumulative_frequencies

Inherited from dict: __cmp__, __contains__, __delitem__, __getattribute__, __new__, __sizeof__, fromkeys, get, has_key, setdefault, viewitems, viewkeys, viewvalues

Inherited from object: __delattr__, __format__, __reduce__, __reduce_ex__, __setattr__, __subclasshook__

Class Variables

Inherited from dict: __hash__

Properties

cutoff
Make cutoff a read-only attribute

Inherited from object: __class__

Method Details

__init__(self, cutoff, *args, **kwargs)
(Constructor)

Construct a new frequency distribution. If samples is given, then the frequency distribution will be initialized with the count of each object in samples; otherwise, it will be initialized to be empty.

In particular, FreqDist() returns an empty frequency distribution; and FreqDist(samples) first creates an empty frequency distribution, and then calls update with the list samples.

Parameters:

samples - The samples to initialize the frequency distribution with.

Returns: new empty dictionary

Overrides: object.__init__

(inherited documentation)

__getitem__(self, key)
(Indexing operator)

x[y]

Overrides: dict.__getitem__: (inherited documentation)

raw_counts(self)

Returns the raw counts (i.e. without the cutoff applied) as a dictionary. This could, for example, be used as init data to another FreqDist.
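The difference between cutoff-applied lookups and the raw counts can be sketched as follows. CutoffCounts is a hypothetical stand-in written for this example, not the jazzparser class, and collections.Counter plays the role of the second frequency distribution.

```python
from collections import Counter


class CutoffCounts:
    """Hypothetical stand-in for CutoffFreqDist (illustration only)."""

    def __init__(self, cutoff, samples=()):
        self._cutoff = cutoff
        self._raw = Counter(samples)

    def __getitem__(self, sample):
        # Indexing applies the cutoff: small counts read as zero.
        count = self._raw[sample]
        return count if count >= self._cutoff else 0

    def raw_counts(self):
        # A plain dictionary of the counts, with no cutoff applied.
        return dict(self._raw)


dist = CutoffCounts(2, ["a", "a", "b"])
seed = dist.raw_counts()

# The raw counts can seed another distribution; nothing has been lost.
plain = Counter(seed)
print(dist["b"], plain["b"])  # 0 1
```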

N(self)

Returns: int: The total number of sample outcomes that have been recorded by this FreqDist. For the number of unique sample values (or bins) with counts greater than zero, use FreqDist.B().
Overrides: nltk.probability.FreqDist.N: (inherited documentation)
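Since the class adjusts the total to account for lost counts, the relationship between N() and lost_N() can be sketched like this. The functions lost_n and adjusted_n are hypothetical names for this example, and the arithmetic is an assumption based on the class description rather than the actual source.

```python
from collections import Counter


def lost_n(raw, cutoff):
    """Sum of the counts that the cutoff hides (assumed meaning of lost_N)."""
    return sum(count for count in raw.values() if count < cutoff)


def adjusted_n(raw, cutoff):
    """Raw total minus the lost counts (assumed meaning of N)."""
    return sum(raw.values()) - lost_n(raw, cutoff)


raw = Counter("aaabbc")            # a: 3, b: 2, c: 1
print(lost_n(raw, cutoff=3))       # 3
print(adjusted_n(raw, cutoff=3))   # 3
```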

B(self)

This is slightly more complicated than the superclass, because we want to count only samples that have non-zero counts after the cutoff has been applied.

Returns: int: The total number of sample values (or bins) that have counts greater than zero. For the total number of sample outcomes recorded, use FreqDist.N(). (FreqDist.B() is the same as len(FreqDist).)
Overrides: nltk.probability.FreqDist.B
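Counting only the bins that survive the cutoff might look like the sketch below. The function name bins_after_cutoff is an assumption for this example; it is not taken from the jazzparser source.

```python
def bins_after_cutoff(raw, cutoff):
    """Number of samples whose count is non-zero once the cutoff is applied."""
    return sum(1 for count in raw.values() if count > 0 and count >= cutoff)


raw = {"a": 3, "b": 2, "c": 1}
print(len(raw))                          # 3 bins before the cutoff
print(bins_after_cutoff(raw, cutoff=2))  # 2 bins after the cutoff
```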

__len__(self)
(Length operator)

len(x)

Overrides: dict.__len__: (inherited documentation)

freq(self, sample)

Overridden because the superclass calculates the frequency from the internal _N rather than from N().

Parameters:

sample - the sample whose frequency should be returned.

Returns: float

The frequency of a given sample.

Overrides: nltk.probability.FreqDist.freq
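The reason the denominator matters can be seen in a small sketch. It assumes that the frequency is the cutoff-applied count divided by the cutoff-adjusted total; cutoff_freq is a hypothetical helper, not the method's actual body.

```python
from collections import Counter


def cutoff_freq(raw, cutoff, sample):
    """Frequency of sample relative to the cutoff-adjusted total."""
    kept_total = sum(count for count in raw.values() if count >= cutoff)
    if kept_total == 0:
        return 0.0
    count = raw.get(sample, 0)
    visible = count if count >= cutoff else 0
    return visible / kept_total


raw = Counter({"a": 3, "b": 2, "c": 1})
freqs = {sample: cutoff_freq(raw, 2, sample) for sample in raw}

# Dividing by the raw total instead would leave probability mass unaccounted for.
raw_total = sum(raw.values())
naive_sum = sum((c if c >= 2 else 0) / raw_total for c in raw.values())

print(freqs)      # {'a': 0.6, 'b': 0.4, 'c': 0.0}
print(naive_sum)  # less than 1.0
```

With the adjusted total the visible frequencies sum to one; with the raw total they do not.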

copy(self)

Create a copy of this frequency distribution.

Returns: FreqDist: A copy of this frequency distribution object.
Overrides: dict.copy: (inherited documentation)

__add__(self, other)
(Addition operator)

Returns a CutoffFreqDist like this one, but with counts from the other added. The other may only be another CutoffFreqDist.

Overrides: nltk.probability.FreqDist.__add__
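One plausible reading of the addition, which this page does not confirm, is that the raw counts of both distributions are summed and the cutoff is then applied to the result. The sketch below illustrates that assumption with plain dictionaries; add_with_cutoff is a hypothetical helper.

```python
from collections import Counter


def add_with_cutoff(raw_a, raw_b, cutoff):
    """Sum two sets of raw counts, then apply the cutoff (assumed behaviour)."""
    combined = Counter(raw_a) + Counter(raw_b)
    return {sample: (count if count >= cutoff else 0)
            for sample, count in combined.items()}


left = {"a": 1, "b": 3}
right = {"a": 1, "c": 1}
total = add_with_cutoff(left, right, cutoff=2)
print(total)  # {'a': 2, 'b': 3, 'c': 0}
```

Under this reading, "a" is below the cutoff in each operand but reaches it in the sum, which is why working from raw counts rather than cutoff-applied counts would matter.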

_reset_caches(self)

Add our own caches to those of the superclass.

Overrides: nltk.probability.FreqDist._reset_caches

_sort_keys_by_value(self)

Overridden because dict.items(self) accesses the non-cutoff values.

Overrides: nltk.probability.FreqDist._sort_keys_by_value

Property Details

cutoff

Make cutoff a read-only attribute

Get Method: _get_cutoff(self) - Make cutoff a read-only attribute
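A read-only attribute of this kind is usually built from a property that has a getter and no setter. The sketch below shows the pattern in isolation; CutoffHolder is a hypothetical class for this example, and only the names cutoff and _get_cutoff come from this page.

```python
class CutoffHolder:
    """Hypothetical class showing a getter-only property."""

    def __init__(self, cutoff):
        self._cutoff = cutoff

    def _get_cutoff(self):
        return self._cutoff

    # No setter is supplied, so assignment raises AttributeError.
    cutoff = property(_get_cutoff)


holder = CutoffHolder(3)
try:
    holder.cutoff = 5
except AttributeError:
    assignment_blocked = True
else:
    assignment_blocked = False

print(holder.cutoff, assignment_blocked)  # 3 True
```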
