Module model
Generic n-gram model implementation, using NLTK's probability
handling.
NLTK provides an n-gram POS tagger, but it can only assign the most
likely tag sequence to observations. It doesn't calculate probabilities,
so it is of no use for our supertagger component. Here I provide a generic
n-gram model built on NLTK's probability handling.
Author:
Mark Granroth-Wilding <mark.granroth-wilding@ed.ac.uk>
NgramModel
A general n-gram model, trained on some labelled data.
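As a rough illustration of what training on labelled data yields, the sketch below estimates bigram transition probabilities by relative frequency. It is not NgramModel's interface: the function name `train_bigram_transitions` and the chord-style labels are hypothetical, and plain counting stands in for NLTK's probability distributions.

```python
from collections import Counter, defaultdict


def train_bigram_transitions(label_sequences):
    """Estimate P(label | previous label) by relative frequency (no smoothing)."""
    counts = defaultdict(Counter)
    for sequence in label_sequences:
        # Pair each label with the one that follows it.
        for previous, current in zip(sequence, sequence[1:]):
            counts[previous][current] += 1
    return {
        previous: {
            label: n / sum(following.values())
            for label, n in following.items()
        }
        for previous, following in counts.items()
    }


# Hypothetical labelled training data.
transitions = train_bigram_transitions([["I", "IV", "V", "I"], ["I", "V", "I"]])
```

Unlike a tagger that returns only the best label sequence, a table like this exposes the probabilities themselves.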
PrecomputedNgramModel
Overrides parts of NgramModel to provide exactly the same interface,
but stores the precomputed transition matrix and uses this to
provide transition probabilities.
NgramError
sum_matrix_dims(matrix, dims=2)
Takes an n-dimensional matrix and sums over all but the first
dims dimensions, returning a dims-dimensional matrix.
_all_indices(length, num_labels)
Function to generate all index n-grams of a given length.
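A minimal sketch of this kind of enumeration, assuming each index ranges over 0 to num_labels - 1. The name `all_index_ngrams` and the tuple return format are assumptions, not the module's actual implementation.

```python
from itertools import product


def all_index_ngrams(length, num_labels):
    """Return every tuple of `length` label indices, each in range(num_labels)."""
    return list(product(range(num_labels), repeat=length))


# All index bigrams over three labels: num_labels ** length = 9 of them.
ngrams = all_index_ngrams(2, 3)
```

The number of n-grams grows as num_labels ** length, so for larger models it may be preferable to iterate over `product(...)` lazily rather than build the full list.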
NGRAM_JOINER = ' :: '
logger = logging.getLogger("main_logger")
__package__ = 'jazzparser.utils.nltk.ngram'
Utility function for methods below. Converts all the probabilities in
the time-state matrix from log probabilities to probabilities.
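The conversion can be sketched as below, assuming the matrix is a list of rows (one per time step) and that the log probabilities are base 2, the convention used by NLTK's logprob methods. The name `logprobs_to_probs` is hypothetical.

```python
def logprobs_to_probs(matrix, base=2.0):
    """Convert a time-state matrix of log probabilities to plain probabilities."""
    return [[base ** logprob for logprob in row] for row in matrix]


# A log probability of -inf corresponds to a probability of zero.
log_matrix = [[-1.0, -2.0], [0.0, float("-inf")]]
prob_matrix = logprobs_to_probs(log_matrix)
```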
Takes an n-dimensional matrix and sums over all but the first
dims dimensions, returning a dims-dimensional
matrix. You can do this easily in later versions of NumPy!
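For comparison, here is a sketch of the same reduction using the tuple form of the axis argument that NumPy's sum accepts from version 1.7 onwards. The name `sum_trailing_dims` is hypothetical and this is not the module's own code.

```python
import numpy as np


def sum_trailing_dims(matrix, dims=2):
    """Sum over every axis after the first `dims`, keeping the leading axes."""
    matrix = np.asarray(matrix)
    # Axes dims, dims + 1, ..., ndim - 1 are summed away in a single call.
    trailing_axes = tuple(range(dims, matrix.ndim))
    return matrix.sum(axis=trailing_axes)


example = np.arange(24).reshape(2, 3, 4)
reduced = sum_trailing_dims(example, dims=2)
```

Here `reduced` has shape (2, 3): each entry is the sum over the last axis of `example`.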