Running the full parsing routine on chord sequences longer than a few chords pretty soon becomes infeasible. This is because lots of lexical entries can apply to each chord, so the number of possible interpretations multiplies with every chord added. This isn't our fault; it's just a feature of the language of harmony.

A simple way to speed things up is to use a supertagger. This works rather like a part-of-speech tagger, assigning probable categories to each chord on the basis of the chords surrounding it.
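Here is a minimal sketch of how a supertagger's output can cut down the work. Python is used because the Jazz Parser's own scripts are Python. The sketch assumes a C&C-style beta cut-off: for each chord, keep every category whose probability is at least beta times that of the chord's most probable category. The category names, the probabilities and the functions `multitag` and `assignments` are hypothetical and are not part of the Jazz Parser's code.

```python
from math import prod
from typing import Dict, List

# Hypothetical per-chord category probabilities, as a supertagger might
# produce them after looking at each chord's neighbours.
Distribution = Dict[str, float]


def multitag(distributions: List[Distribution], beta: float) -> List[List[str]]:
    """For each chord, keep every category whose probability is at least
    beta times that of the chord's most probable category."""
    tagged = []
    for dist in distributions:
        best = max(dist.values())
        kept = [cat for cat, p in dist.items() if p >= beta * best]
        # List the most probable categories first
        kept.sort(key=lambda cat: dist[cat], reverse=True)
        tagged.append(kept)
    return tagged


def assignments(tag_lists: List[List[str]]) -> int:
    """Number of ways of choosing one category for every chord."""
    return prod(len(tags) for tags in tag_lists)


if __name__ == "__main__":
    sequence = [
        {"Tonic": 0.70, "Dominant": 0.20, "Subdominant": 0.10},  # I
        {"Dominant": 0.85, "Tonic": 0.10, "Subdominant": 0.05},  # V7
        {"Tonic": 0.60, "Dominant": 0.25, "Subdominant": 0.15},  # I
    ]
    unpruned = [list(dist) for dist in sequence]
    pruned = multitag(sequence, beta=0.3)
    print(assignments(unpruned), "->", assignments(pruned))  # prints: 27 -> 2
```

With beta = 0.3 these three made-up chords go from 27 possible category assignments to 2. A smaller beta keeps more categories, which is slower but less likely to throw away the correct interpretation.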

See the Evaluation subpage for some evaluation of models built for the tagger.

C&C Supertagger

The Jazz Parser uses the C&C supertagger straight out of the box. The supertagger is not included in the Jazz Parser's code: to use it, you need to download the C&C source code, compile it and put it in the directory lib/candc in your Jazz Parser checkout.
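If you script your setup, a quick check that C&C ended up where the parser expects it can save some confusion. `candc_dir` is a hypothetical helper, not part of the Jazz Parser. It only checks that a lib/candc directory exists under the checkout root you give it, not that the compiled tools inside it work.

```python
from pathlib import Path
from typing import Union


def candc_dir(checkout_root: Union[str, Path]) -> Path:
    """Return lib/candc inside a Jazz Parser checkout, or raise if it is missing."""
    path = Path(checkout_root) / "lib" / "candc"
    if not path.is_dir():
        raise FileNotFoundError(
            f"C&C not found at {path}: download and compile the C&C source, "
            "then put it in lib/candc"
        )
    return path
```

For example, `candc_dir(".")` run from the project root returns the path if C&C is in place and raises `FileNotFoundError` otherwise.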

For more information about the C&C tools, which the supertagger is a part of, see the C&C tools homepage.

Using the Tagger

A supertagger model for the C&C tagger has already been trained and is ready to use, so once you've got the tagger, the parser will do the rest for you.

To use the supertagger, just add the parameter -u candc and give the parser some chords. You should find that it parses much more quickly. There is, of course, a chance that it will miss some correct interpretations, since the tagger may not suggest the right category for every chord.
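If you want to call the parser from another Python program, you can build the same command line and run it with `subprocess`. The sketch below is hypothetical: `parser_command` and `run_parser` are not Jazz Parser functions, they only use the options described on this page, and `run_parser` assumes it is called from the project root so that the parser script is `bin/jazzparser`. The optional `model` argument corresponds to the `--topt 'model=...'` option covered under Using the Model.

```python
import subprocess
from typing import List, Optional


def parser_command(chords: str, model: Optional[str] = None,
                   executable: str = "./jazzparser") -> List[str]:
    """Argument list for running the parser with the C&C supertagger."""
    cmd = [executable, "-u", "candc"]
    if model is not None:
        # Tagger-specific option; without it the tagger uses its default model
        cmd += ["--topt", f"model={model}"]
    # The whole chord sequence is a single argument, like "I V7 I" on the shell
    cmd.append(chords)
    return cmd


def run_parser(chords: str, model: Optional[str] = None, bin_dir: str = "bin") -> str:
    """Run the parser from bin_dir and return what it prints."""
    result = subprocess.run(parser_command(chords, model), cwd=bin_dir,
                            capture_output=True, text=True, check=True)
    return result.stdout
```

For example, `print(run_parser("I V7 I"))` is equivalent to running `./jazzparser -u candc "I V7 I"` in the bin directory.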

Training the Tagger

Generate Training Data

First, generate some training data. You can do this directly from the annotator tool's database using the script generate_model_data.py. For this, you'll need the annotator tool set up with a working database containing all the data you wish to use.

In due course, I hope to make some datasets available in a form that does not require the database. If you want that now, contact me.

From the project root:

cd annotator
./django-admin run annotator/bin/data/generate_model_data.py -c chords.super

The file chords.super now contains all the sequences in the database as chords, observations and tags.

This format contains more information than C&C uses for its supertagger training data. The training interface strips out the extra information, which allows the same format to be used for other purposes. If you want data you can feed directly to the C&C command-line training tool, use the -s option instead of -c.
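The stripping step can be pictured roughly as follows. This is a sketch under loud assumptions: it takes C&C's training input to be one sequence per line, with space-separated tokens of the form `word|POS|supertag`, and it supposes, purely hypothetically, that each chord in the richer format is a tuple whose first three fields are the ones C&C needs and whose remaining fields are the extra information. The real layout of chords.super may differ, and `to_candc_line` is not a Jazz Parser function.

```python
from typing import Sequence, Tuple

# Hypothetical token layout: (word, pos, supertag, *extra_fields)
Token = Tuple[str, ...]


def to_candc_line(sequence: Sequence[Token]) -> str:
    """Render one chord sequence as a C&C-style training line,
    dropping any fields beyond the first three."""
    rendered = []
    for token in sequence:
        if len(token) < 3:
            raise ValueError(f"expected at least 3 fields, got {token!r}")
        fields = token[:3]  # strip the extra information
        # '|' separates fields and ' ' separates tokens, so neither may appear in a field
        if any("|" in f or " " in f for f in fields):
            raise ValueError(f"field contains a separator: {token!r}")
        rendered.append("|".join(fields))
    return " ".join(rendered)
```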

Train the Model

The C&C supertagger can be trained using the C&C command-line tools. However, the Jazz Parser provides a training interface that will take care of training the models and putting them in the right place for the parser to use.

Decide on a unique name for your model. It should not contain spaces, slashes or dots (unless you're using dots to create a hierarchy of models). You don't need to worry about where the model data is stored: the training tool will put it in a directory where the parser will look for it.
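The naming rules are easy to check before you start a training run. `is_valid_model_name` is a hypothetical helper, not something the training script provides. It treats any whitespace like a space and rejects both forward and back slashes, which is slightly stricter than the rule as stated, and it only allows dots between non-empty levels of a hierarchy.

```python
def is_valid_model_name(name: str) -> bool:
    """Check a model name: no whitespace or slashes, and dots only as
    separators between non-empty hierarchy levels."""
    if not name or any(c.isspace() or c in "/\\" for c in name):
        return False
    # "jazz.model1" is a two-level name; ".model1" and "jazz..model1" are not valid
    return all(part for part in name.split("."))
```

So `model1` and `jazz.model1` pass, while `my model`, `a/b` and `jazz..model1` are rejected.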

The script train_candc_model.py will do the training for you. Assuming you are now in the bin directory and your training data file (see above) is there too:

./jazzshell train_candc_model.py model1 chords.super

This will create a model called model1.
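The same training step can be run from a Python script. In this hypothetical sketch, `train_command` just reproduces the command line above and `train_model` runs it in the bin directory (given relative to the project root, an assumption) with `check=True`, so a failed training run raises `subprocess.CalledProcessError` instead of passing silently.

```python
import subprocess
from typing import List


def train_command(model_name: str, data_file: str,
                  shell: str = "./jazzshell") -> List[str]:
    """Arguments equivalent to: ./jazzshell train_candc_model.py MODEL DATA"""
    return [shell, "train_candc_model.py", model_name, data_file]


def train_model(model_name: str, data_file: str, bin_dir: str = "bin") -> None:
    """Train a C&C model from bin_dir; data_file is resolved relative to bin_dir."""
    subprocess.run(train_command(model_name, data_file), cwd=bin_dir, check=True)
```

For example, `train_model("model1", "chords.super")` from the project root corresponds to the command above.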

Using the Model

You can now run the parser using this model. From the project root:

cd bin
./jazzparser -u candc --topt 'model=model1' "I V7 I"

The option --topt lets you specify options specific to the supertagger module (the C&C supertagger is just one of potentially many tagger modules). Here we tell it which model to use; if we didn't, it would use its default model. To see other tagger options, run:

./jazzparser -u candc --topt help