Hidden Markov model class, a generative model for labelling sequence
data. These models define the joint probability of a sequence of symbols
and their labels (state transitions) as the product of the starting state
probability, the probability of each state transition, and the
probability of each observation being generated from each state. This is
described in more detail in the module documentation.

This implementation is based on the HMM description in Chapter 8 of
Huang, Acero and Hon, Spoken Language Processing, and includes an
extension for training shallow HMM parsers or specialized HMMs as in
Molina et al., 2002. A specialized HMM modifies the training data by
applying a specialization function, creating a new training set that is
better suited to sequential tagging with an HMM. A typical use case
is chunking.
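
The product that defines the joint probability can be written out directly. The sketch below is illustrative only: the nested-dict model, its made-up numbers and the function name `joint_log_probability` are assumptions for this example and are not part of this class's API.

```python
import math


def joint_log_probability(symbols, labels, priors, transitions, outputs):
    """Natural log of P(symbols, labels) for a first-order HMM.

    priors[s], transitions[s][s2] and outputs[s][o] are plain, non-zero
    probabilities stored in hypothetical nested dicts.
    """
    if len(symbols) != len(labels):
        raise ValueError("symbols and labels must have the same length")
    if not symbols:
        return 0.0
    # Starting state probability and the first observation.
    logp = math.log(priors[labels[0]]) + math.log(outputs[labels[0]][symbols[0]])
    # One state transition and one observation per remaining position.
    for t in range(1, len(symbols)):
        logp += math.log(transitions[labels[t - 1]][labels[t]])
        logp += math.log(outputs[labels[t]][symbols[t]])
    return logp


# Toy two-state model; every number here is made up for illustration.
priors = {"hot": 0.6, "cold": 0.4}
transitions = {"hot": {"hot": 0.7, "cold": 0.3},
               "cold": {"hot": 0.4, "cold": 0.6}}
outputs = {"hot": {"1": 0.2, "3": 0.8},
           "cold": {"1": 0.9, "3": 0.1}}

joint = math.exp(joint_log_probability(
    ["3", "1"], ["hot", "cold"], priors, transitions, outputs))
# joint is 0.6 * 0.8 * 0.3 * 0.9, up to floating-point rounding
```

Working in log space turns the product into a sum, which is why the sketch adds logarithms instead of multiplying probabilities.
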
__init__(self, symbols, states, transitions, outputs, priors, **kwargs)
Creates a hidden Markov model parameterised by the states,
transition probabilities, output probabilities and priors.

_backward_probability(self, unlabeled_sequence)
Return the backward probability matrix, a T by N array of
log-probabilities, where T is the length of the sequence and N is the
number of states.
Return type: array

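A T by N backward matrix of log-probabilities can be built with the standard backward recursion. The following is a minimal sketch, not this class's implementation: it assumes a model held in nested dicts with non-zero probabilities, uses natural logarithms (the base used by the class is not stated here), and the names `backward_matrix` and `logsumexp` are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def backward_matrix(symbols, states, transitions, outputs):
    """Return a T x N list of lists of natural-log backward probabilities."""
    T = len(symbols)
    # Last row is log(1) = 0: nothing remains to be generated after step T-1.
    beta = [[0.0] * len(states) for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i, si in enumerate(states):
            beta[t][i] = logsumexp([
                math.log(transitions[si][sj])
                + math.log(outputs[sj][symbols[t + 1]])
                + beta[t + 1][j]
                for j, sj in enumerate(states)
            ])
    return beta


# Toy two-state model; every number here is made up for illustration.
states = ["hot", "cold"]
priors = {"hot": 0.6, "cold": 0.4}
transitions = {"hot": {"hot": 0.7, "cold": 0.3},
               "cold": {"hot": 0.4, "cold": 0.6}}
outputs = {"hot": {"1": 0.2, "3": 0.8},
           "cold": {"1": 0.9, "3": 0.1}}

symbols = ["3", "1"]
beta = backward_matrix(symbols, states, transitions, outputs)
# Combining row 0 with the priors and first emissions gives log P(symbols).
total = logsumexp([
    math.log(priors[s]) + math.log(outputs[s][symbols[0]]) + beta[0][i]
    for i, s in enumerate(states)
])
```
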
_best_path(self, unlabeled_sequence)

_best_path_simple(self, unlabeled_sequence)

_exhaustive_entropy(self, unlabeled_sequence)

_exhaustive_point_entropy(self, unlabeled_sequence)

_forward_probability(self, unlabeled_sequence)
Return the forward probability matrix, a T by N array of
log-probabilities, where T is the length of the sequence and N is the
number of states.
Return type: array

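The forward matrix has the same T by N shape and can be filled with the standard forward recursion. This is a minimal sketch under stated assumptions, not this class's implementation: the model is held in nested dicts with non-zero probabilities, logarithms are natural, and the names `forward_matrix` and `logsumexp` are hypothetical.

```python
import math


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    if m == float("-inf"):
        return m
    return m + math.log(sum(math.exp(v - m) for v in values))


def forward_matrix(symbols, states, priors, transitions, outputs):
    """Return a T x N list of lists of natural-log forward probabilities."""
    if not symbols:
        return []
    log = math.log
    # Row 0: start in each state and emit the first symbol.
    alpha = [[log(priors[s]) + log(outputs[s][symbols[0]]) for s in states]]
    for t in range(1, len(symbols)):
        row = []
        for sj in states:
            # Sum over every state the chain could have been in at t - 1.
            incoming = [alpha[t - 1][i] + log(transitions[si][sj])
                        for i, si in enumerate(states)]
            row.append(logsumexp(incoming) + log(outputs[sj][symbols[t]]))
        alpha.append(row)
    return alpha


# Toy two-state model; every number here is made up for illustration.
states = ["hot", "cold"]
priors = {"hot": 0.6, "cold": 0.4}
transitions = {"hot": {"hot": 0.7, "cold": 0.3},
               "cold": {"hot": 0.4, "cold": 0.6}}
outputs = {"hot": {"1": 0.2, "3": 0.8},
           "cold": {"1": 0.9, "3": 0.1}}

alpha = forward_matrix(["3", "1"], states, priors, transitions, outputs)
# Summing the last row (in log space) gives log P(symbols).
sequence_log_prob = logsumexp(alpha[-1])
```
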
_output_logprob(self, state, symbol)
Returns: the log probability of the symbol being observed in the
given state
Return type: float

_sample_probdist(self, probdist, p, samples)

_tag(self, unlabeled_sequence)

_update_cache(self, symbols)

best_path(self, unlabeled_sequence)
Returns the state sequence of the optimal (most probable) path
through the HMM.
Return type: sequence of any

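The most probable state sequence is commonly found with the Viterbi algorithm. The sketch below shows that technique on a toy model; it is not a copy of this class's code. The nested-dict model, the natural-log scores and the function name `viterbi` are assumptions made for this example.

```python
import math


def viterbi(symbols, states, priors, transitions, outputs):
    """Return the most probable state sequence for symbols."""
    if not symbols:
        return []
    log = math.log
    # score[s]: best log probability of any path that ends in state s.
    score = {s: log(priors[s]) + log(outputs[s][symbols[0]]) for s in states}
    backpointers = []
    for symbol in symbols[1:]:
        new_score, pointers = {}, {}
        for s in states:
            # Best predecessor for state s at this position.
            best_prev = max(states,
                            key=lambda r: score[r] + log(transitions[r][s]))
            pointers[s] = best_prev
            new_score[s] = (score[best_prev]
                            + log(transitions[best_prev][s])
                            + log(outputs[s][symbol]))
        score = new_score
        backpointers.append(pointers)
    # Trace back from the best final state.
    state = max(states, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    path.reverse()
    return path


# Toy two-state model; every number here is made up for illustration.
states = ["hot", "cold"]
priors = {"hot": 0.6, "cold": 0.4}
transitions = {"hot": {"hot": 0.7, "cold": 0.3},
               "cold": {"hot": 0.4, "cold": 0.6}}
outputs = {"hot": {"1": 0.2, "3": 0.8},
           "cold": {"1": 0.9, "3": 0.1}}

symbols = ["3", "1"]
path = viterbi(symbols, states, priors, transitions, outputs)
# Pairing each symbol with its state is one way a tagger could present the path.
tagged = list(zip(symbols, path))
```
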
best_path_simple(self, unlabeled_sequence)
Returns the state sequence of the optimal (most probable) path
through the HMM.
Return type: sequence of any

entropy(self, unlabeled_sequence)
Returns the entropy over labellings of the given sequence.

log_probability(self, sequence)
Returns the log-probability of the given symbol sequence.
Return type: float

point_entropy(self, unlabeled_sequence)
Returns the pointwise entropy over the possible states at each
position in the chain, given the observation sequence.

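The difference between the entropy over whole labellings and the pointwise entropy at each position can be seen by brute-force enumeration on a tiny model. The sketch below is illustrative only: it enumerates every labelling, which is feasible only for very short sequences, and it assumes a non-empty sequence, a nested-dict model and entropies measured in bits. The helper names are hypothetical.

```python
import itertools
import math


def labelling_posteriors(symbols, states, priors, transitions, outputs):
    """Map each labelling (a tuple of states) to P(labelling | symbols)."""
    joint = {}
    for labels in itertools.product(states, repeat=len(symbols)):
        p = priors[labels[0]] * outputs[labels[0]][symbols[0]]
        for t in range(1, len(symbols)):
            p *= transitions[labels[t - 1]][labels[t]]
            p *= outputs[labels[t]][symbols[t]]
        joint[labels] = p
    total = sum(joint.values())
    return {labels: p / total for labels, p in joint.items()}


def entropy_bits(probs):
    """Shannon entropy in bits of an iterable of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


# Toy two-state model; every number here is made up for illustration.
states = ["hot", "cold"]
priors = {"hot": 0.6, "cold": 0.4}
transitions = {"hot": {"hot": 0.7, "cold": 0.3},
               "cold": {"hot": 0.4, "cold": 0.6}}
outputs = {"hot": {"1": 0.2, "3": 0.8},
           "cold": {"1": 0.9, "3": 0.1}}

symbols = ["3", "1"]
posteriors = labelling_posteriors(symbols, states, priors, transitions, outputs)

# Entropy over complete labellings of the sequence.
sequence_entropy = entropy_bits(posteriors.values())

# Pointwise entropy: marginalise the labellings down to one position at a time.
point_entropies = []
for t in range(len(symbols)):
    marginal = {s: 0.0 for s in states}
    for labels, p in posteriors.items():
        marginal[labels[t]] += p
    point_entropies.append(entropy_bits(marginal.values()))
```

The sequence entropy is a single number, while the pointwise result has one value per position; the former never exceeds the sum of the latter.
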
probability(self, sequence)
Returns the probability of the given symbol sequence.
Return type: float

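A general reason for preferring log-probabilities on long sequences is floating-point underflow: a product of many small probabilities collapses to zero, while the corresponding sum of logarithms stays finite. The numbers below are made up, and base-2 logarithms are an arbitrary choice for the illustration.

```python
import math

per_step = 0.1   # made-up probability contributed by each position
length = 400     # made-up sequence length

product = 1.0
log2_total = 0.0
for _ in range(length):
    product *= per_step               # underflows to 0.0 well before the end
    log2_total += math.log2(per_step)  # stays a finite negative number
```
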
random_sample(self, rng, length)
Randomly sample the HMM to generate a sentence of a given length.
Return type: list

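Sampling from an HMM alternates between drawing a state and drawing a symbol from that state's output distribution. The sketch below illustrates this on a toy model and is not this class's implementation: the nested-dict model, the helper names `sample_from` and `sample_hmm`, and the choice to return (symbol, state) pairs are all assumptions. The only requirement on `rng` here is a `random()` method, as provided by `random.Random`.

```python
import random


def sample_from(dist, rng):
    """Draw a key from a dict of probabilities via the cumulative sum."""
    p = rng.random()
    cumulative = 0.0
    for outcome, prob in dist.items():
        cumulative += prob
        if p < cumulative:
            return outcome
    return outcome  # guard against floating-point round-off


def sample_hmm(rng, length, priors, transitions, outputs):
    """Return a list of (symbol, state) pairs of the given length."""
    sample = []
    state = None
    for _ in range(length):
        # First state comes from the priors, later ones from the transitions.
        state = sample_from(priors if state is None else transitions[state], rng)
        sample.append((sample_from(outputs[state], rng), state))
    return sample


# Toy two-state model; every number here is made up for illustration.
priors = {"hot": 0.6, "cold": 0.4}
transitions = {"hot": {"hot": 0.7, "cold": 0.3},
               "cold": {"hot": 0.4, "cold": 0.6}}
outputs = {"hot": {"1": 0.2, "3": 0.8},
           "cold": {"1": 0.9, "3": 0.1}}

sample = sample_hmm(random.Random(0), 5, priors, transitions, outputs)
```

Passing a seeded generator makes the sample reproducible, which is convenient in tests.
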
tag(self, unlabeled_sequence)
Tags the sequence with the highest probability state sequence.
Return type: list

test(self, test_sequence, **kwargs)
Tests the HiddenMarkovModelTagger instance.

Inherited from api.TaggerI: batch_tag, evaluate
Inherited from api.TaggerI (private): _check_params
Inherited from object: __delattr__, __format__, __getattribute__,
__hash__, __new__, __reduce__, __reduce_ex__, __setattr__, __sizeof__,
__str__, __subclasshook__