
Module training

Training interface to the C&C supertagger.

This automates the process of training the C&C supertagger on data from the database. The data should first be generated using the script in the annotator bin.

Training data should be in the Jazz Parser format, which differs slightly from the C&C format. Instead of <obs>|<pos>|<tag>, each chord should be represented as <chord>|<obs>|<pos>|<tag>. Use generate_model_data to generate this from the database.
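The relationship between the two token layouts can be illustrated with a short sketch. It is not part of this module: the helper names split_token and to_candc_line are hypothetical, the field values are made-up placeholders, and it assumes that tokens on a line are separated by whitespace and that no field contains a "|" character.

```python
def split_token(token):
    """Split one '<chord>|<obs>|<pos>|<tag>' token into its four fields."""
    fields = token.split("|")
    if len(fields) != 4:
        raise ValueError(
            "expected 4 '|'-separated fields, got %d: %r" % (len(fields), token)
        )
    return tuple(fields)


def to_candc_line(line):
    """Drop the leading chord field from every token on a line.

    Assumes whitespace-separated tokens; yields '<obs>|<pos>|<tag>' tokens.
    """
    return " ".join("|".join(split_token(tok)[1:]) for tok in line.split())


# Placeholder values only; real data comes from generate_model_data.
example = "Cmaj7|obs1|pos1|tag1 G7|obs2|pos2|tag2"
print(to_candc_line(example))
```

A token with the wrong number of fields raises ValueError in this sketch, which makes it easy to spot a plain C&C file passed where the Jazz Parser format is expected.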

Classes
  CandcTrainingError
Functions
 
train_model(model, data_filename, holdout_partitions=0, train_params={}, chordmap=None)
Train a C&C model by calling the C&C supertagger training routine.
 
train_model_on_sequence_data(model, data_filename, *args, **kwargs)
Same as train_model, but takes a db_mirrors sequence data file as input, rather than a C&C training data file.
 
train_model_on_sequence_index(model, sequenceindex, *args, **kwargs)
Same as train_model_on_sequence_data, but takes a sequence index directly instead of reading the sequences from a file.
 
train_model_on_sequence_list(model, sequences, *args, **kwargs)
Same as train_model_on_sequence_data, but takes a list of sequences directly instead of reading them from a file.
Variables
  __package__ = 'jazzparser.taggers.candc'
Function Details

train_model(model, data_filename, holdout_partitions=0, train_params={}, chordmap=None)

Train a C&C model by calling the C&C supertagger training routine.

model should be the name of a model to train or retrain. data_filename should be the path to a training data file in the hybrid C&C format. train_params is an optional dict of (string) parameter values to feed to the C&C trainer. Only certain parameter values will be allowed. These will override the default parameters in settings.CANDC. Set a parameter to None or an empty string to use C&C's default.
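The override rule for training parameters can be sketched as follows. This is an illustration of the behaviour described above, not the module's actual code: the function merge_train_params and the parameter names opt_a and opt_b are hypothetical, and the defaults dict merely stands in for whatever settings.CANDC provides.

```python
def merge_train_params(defaults, overrides):
    """Combine project defaults with caller overrides.

    Overrides win over defaults. A value of None or "" removes the
    parameter entirely, so that nothing is passed to the trainer and
    C&C's own default applies.
    """
    merged = dict(defaults)
    merged.update(overrides)
    return dict(
        (name, value)
        for name, value in merged.items()
        if value is not None and value != ""
    )


# Hypothetical defaults, standing in for the values in settings.CANDC.
defaults = {"opt_a": "1.0", "opt_b": "200"}

# Override opt_a; fall back to C&C's built-in default for opt_b.
params = merge_train_params(defaults, {"opt_a": "2.5", "opt_b": None})
print(params)
```

Copying the defaults before updating keeps the shared settings dict untouched between calls, which matters when the same defaults are reused for several training runs.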