object --+
|
BaumWelchTrainer
Class with methods to retrain an HMM using the Baum-Welch EM algorithm.
Note that although the default implementation is for a plain jazzparser.utils.nltk.ngram.NgramModel, Baum-Welch training only makes sense if the model is an HMM. It will therefore complain if the order is not 2 or if there is a backoff model.
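A check of this kind might look roughly like the sketch below. The attribute names `order` and `backoff_model`, the stand-in class `FakeModel` and the use of `ValueError` are assumptions made for illustration; they are not taken from the jazzparser source.

```python
class FakeModel(object):
    """Hypothetical stand-in for an n-gram model; attribute names are assumed."""

    def __init__(self, order, backoff_model=None):
        self.order = order
        self.backoff_model = backoff_model


def check_is_hmm(model):
    """Raise ValueError unless the model looks like a plain HMM:
    a bigram model (order 2) with no backoff model."""
    if model.order != 2:
        raise ValueError(
            "Baum-Welch needs a model of order 2, got order %d" % model.order)
    if model.backoff_model is not None:
        raise ValueError("Baum-Welch cannot train a model with a backoff model")
    return True
```

Running the check before any training state is set up means a misconfigured model fails early rather than partway through an iteration.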
Module options must be processed externally. This allows them to be combined with other options as appropriate. The options defined here are a standard set of options for generic training and should be processed before the trainer is instantiated.
This is designed as a generic implementation of the algorithm. To use it for a special kind of model (e.g. one with a non-standard transition distribution), you need to override certain methods to make them appropriate to the model:
create_mutable_model
update_model
sequence_updates
get_empty_arrays
sequence_updates_callback
get_array_indices
The generic version of the trainer can be used to train a DictionaryHmmModel. Subclasses are used to train other model types.
OPTIONS = [ModuleOption('max_iterations', filter= int, help_te
x.__init__(...) initializes x; see help(type(x)) for signature
Verifies and processes the training option values. Returns the processed dict.
Stores a line in the history of the model or wherever else it is appropriate to keep a record of training steps. Default implementation does nothing, but subclasses may want to store this information.
This should be overridden by subclasses, but not by defining a static method on the class, since the function must be picklable. For this, it needs to be a top-level function. Then you can set the sequence_updates attribute to point to it (using staticmethod), as we have done in the default implementation.
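The pattern described here can be sketched as follows. The names `my_sequence_updates` and `MyTrainer`, and the function's argument list, are hypothetical; the real signature of sequence_updates is not shown on this page.

```python
import pickle


def my_sequence_updates(sequence, model_params):
    """Hypothetical per-sequence update function. It lives at module top
    level so that pickle can locate it by name."""
    # Placeholder computation; a real version would run forward-backward.
    return len(sequence), model_params


class MyTrainer(object):
    # staticmethod stops Python binding the function to the instance, so
    # looking up the attribute returns the plain top-level function.
    sequence_updates = staticmethod(my_sequence_updates)


def roundtrip(func):
    """Pickle and unpickle a function, as happens when it is sent to a worker."""
    return pickle.loads(pickle.dumps(func))


if __name__ == "__main__":
    trainer = MyTrainer()
    restored = roundtrip(trainer.sequence_updates)
    print(restored(["C", "F", "G"], None))
```

Functions are pickled by reference (module name plus function name), which is why the function has to be reachable at the top level of an importable module.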
Creates a mutable version of the given model. This mutable version will be the model that receives updates during training, as defined by update_model.
Creates empty arrays to hold the accumulated probabilities during training. The sizes will depend on self.model.
Returns a tuple of the dicts that map labels, emissions, etc to the indices of arrays to which they correspond. These will need to be different for non-standard models.
Callback for the sequence_updates processes that takes the updates from a single sequence and adds them onto the global update accumulators. The accumulators are stored as self.global_arrays.
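The accumulation step might look something like this minimal sketch. It assumes that each sequence's result is a tuple of arrays with the same shapes as the global accumulators, and it uses nested lists in place of whatever array type the real trainer uses; the class name `AccumulatorSketch` is hypothetical.

```python
class AccumulatorSketch(object):
    """Hypothetical trainer fragment showing only the accumulation step."""

    def __init__(self, num_states, num_emissions):
        # Two accumulators: expected transition counts and emission counts.
        self.global_arrays = [
            [[0.0] * num_states for _ in range(num_states)],
            [[0.0] * num_emissions for _ in range(num_states)],
        ]

    def sequence_updates_callback(self, result):
        """Add one sequence's local arrays onto the global accumulators."""
        for global_array, local_array in zip(self.global_arrays, result):
            for global_row, local_row in zip(global_array, local_array):
                for j, value in enumerate(local_row):
                    global_row[j] += value
```

Because the callback only adds to the accumulators, the order in which sequence results arrive does not matter.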
Replaces the distributions of the saved model with the probabilities taken from the arrays of updates. self.model is expected to be made up of mutable distributions when this is called.
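As a rough illustration of what turning accumulated updates into probabilities involves, each row of expected counts can be normalized to sum to one. The helper `normalize_rows` is hypothetical, and the uniform fallback for empty rows is an assumption rather than documented behaviour.

```python
def normalize_rows(counts):
    """Turn each row of accumulated expected counts into a probability
    distribution that sums to 1."""
    probabilities = []
    for row in counts:
        total = sum(row)
        if total > 0:
            probabilities.append([value / total for value in row])
        else:
            # No mass was accumulated for this row: fall back to a uniform
            # distribution (an assumption made for this sketch).
            probabilities.append([1.0 / len(row)] * len(row))
    return probabilities
```

For example, `normalize_rows([[1.0, 3.0]])` gives a single row with probabilities 0.25 and 0.75.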
Saves the model in self.model to disk. This may be called at the end of each iteration and will be called at the end of the whole training process. By default, does nothing. You don't have to put something in here, but you'll need to override this if you want the model to be saved during training before it gets returned at the end.
Performs unsupervised training using Baum-Welch EM. This is performed as a retraining step on a model that has already been initialized. This is based on the training procedure in NLTK for HMMs.
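For orientation, one Baum-Welch iteration on a small discrete HMM can be sketched as below. This is a textbook version under stated assumptions, not the trainer's or NLTK's code: states and observations are integer indices, parameters are plain nested lists, every sequence has non-zero likelihood under the current parameters, and no scaling is applied, so it would underflow on long sequences. The function names are hypothetical.

```python
def forward_backward(seq, init, trans, emit):
    """Return (gamma, xi, likelihood) for one observation sequence."""
    n, T = len(init), len(seq)
    # Forward pass: alpha[t][i] = P(o_1..o_t, state_t = i)
    alpha = [[init[i] * emit[i][seq[0]] for i in range(n)]]
    for t in range(1, T):
        alpha.append([
            sum(alpha[t - 1][i] * trans[i][j] for i in range(n)) * emit[j][seq[t]]
            for j in range(n)])
    # Backward pass: beta[t][i] = P(o_{t+1}..o_T | state_t = i)
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        for i in range(n):
            beta[t][i] = sum(trans[i][j] * emit[j][seq[t + 1]] * beta[t + 1][j]
                             for j in range(n))
    likelihood = sum(alpha[T - 1])
    # gamma[t][i]: posterior probability of being in state i at time t
    gamma = [[alpha[t][i] * beta[t][i] / likelihood for i in range(n)]
             for t in range(T)]
    # xi[t][i][j]: posterior probability of the transition i -> j at time t
    xi = [[[alpha[t][i] * trans[i][j] * emit[j][seq[t + 1]] * beta[t + 1][j]
            / likelihood for j in range(n)] for i in range(n)]
          for t in range(T - 1)]
    return gamma, xi, likelihood


def _normalize(row, fallback):
    """Normalize a row of counts; keep the old distribution if it is empty."""
    total = sum(row)
    return [v / total for v in row] if total > 0 else list(fallback)


def baum_welch_step(sequences, init, trans, emit):
    """Run one EM iteration and return re-estimated (init, trans, emit)."""
    n, m = len(init), len(emit[0])
    init_acc = [0.0] * n
    trans_acc = [[0.0] * n for _ in range(n)]
    emit_acc = [[0.0] * m for _ in range(n)]
    # E-step: accumulate expected counts over all sequences.
    for seq in sequences:
        gamma, xi, _ = forward_backward(seq, init, trans, emit)
        for i in range(n):
            init_acc[i] += gamma[0][i]
            for t, obs in enumerate(seq):
                emit_acc[i][obs] += gamma[t][i]
            for j in range(n):
                trans_acc[i][j] += sum(step[i][j] for step in xi)
    # M-step: normalize the expected counts into distributions.
    return (_normalize(init_acc, init),
            [_normalize(row, old) for row, old in zip(trans_acc, trans)],
            [_normalize(row, old) for row, old in zip(emit_acc, emit)])
```

Calling `baum_welch_step` repeatedly, until the likelihood stops improving or a maximum number of iterations is reached, gives the basic retraining loop.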
OPTIONS
| Generated by Epydoc 3.0.1 on Mon Nov 26 16:04:58 2012 | http://epydoc.sourceforge.net |