Christopher Raphael and Joshua Stoddard describe a model for chord labelling from symbolic (MIDI) data in their 2004 Computer Music Journal paper, Functional Harmonic Analysis Using Probabilistic Models; an earlier version appeared at ISMIR 2003.
This is a prime candidate for a chord labelling model that I could (a) use as a baseline for the chord labelling task and (b) extend into a category tagger for processing MIDI input.
- HMM generating pitch classes
- Decoding the hidden labels predicts both the key and the harmonic function of each chord
- No objective evaluation - only subjective analysis of decoding results
- No dataset available, or even described in the paper
- Parameters initialized by making naive assumptions about probability of pitch classes for chord types
- Trained on unlabelled data using Baum-Welch
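To make the structure above concrete, here is a minimal sketch of the kind of hidden state space such a model might use. The particular factorization (12 tonics, two modes, seven diatonic functions) and all names are my assumptions for illustration, not details taken from the paper:

```python
from itertools import product

# Hypothetical state space: each hidden state pairs a key (tonic pitch
# class plus mode) with a harmonic function (triad on a scale degree);
# each state emits pitch classes 0-11. The exact inventory of functions
# is a guess, not the paper's.
TONICS = range(12)
MODES = ("major", "minor")
FUNCTIONS = ("I", "ii", "iii", "IV", "V", "vi", "vii")

states = [(t, m, f) for t, m, f in product(TONICS, MODES, FUNCTIONS)]
print(len(states))  # 12 tonics x 2 modes x 7 functions = 168 states
```

Decoding (e.g. Viterbi) over this joint state space is what lets a single pass predict key and chord function together.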
The model is described in detail in the paper. I've summarized it in this document.
The description of the model omits some details that I need in order to replicate their experiments.
- Training dataset is unavailable: they trained on a collection of 5 or 6 Haydn piano sonata movements
- Test data and analyses referenced in the paper were not available at first, but can now be found on Christopher Raphael's current homepage.
- Initialization parameters are not given.
- They say in the paper that some of the components of the transition distribution don't get learned well by the model, so are set by hand. The parameters used are not given.
I'm not sure what's best to train the model on to replicate R&S's experiments, since they no longer have the files they used. One possibility is to train on the test MIDIs themselves. R&S used Haydn piano sonatas to train their model (personal communication; not stated in the paper), so I've collected and cleaned up some such MIDI files to use as training data.
The exact emission distribution initialization parameters probably don't matter much. Setting them to the kind of ballpark figures you'd infer from reading the paper (root highest probability, then other chord tones, then remaining scale tones, then non-scale tones) should get training off to a good enough start.
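A minimal sketch of that kind of ballpark initialization for one chord state; the weight values are arbitrary guesses of mine, only their ordering (root > chord tones > scale tones > non-scale tones) matters:

```python
import numpy as np

# Build an initial emission distribution over 12 pitch classes for one
# chord state. Weights are unnormalized guesses; Baum-Welch refines them.
def init_emission(root, chord_tones, scale_tones, n_pc=12,
                  w_root=8.0, w_chord=4.0, w_scale=2.0, w_other=1.0):
    weights = np.full(n_pc, w_other)        # non-scale tones: lowest
    for pc in scale_tones:
        weights[pc] = w_scale               # scale tones: higher
    for pc in chord_tones:
        weights[pc] = w_chord               # chord tones: higher still
    weights[root] = w_root                  # root: highest
    return weights / weights.sum()          # normalize to a distribution

# Example: C major triad within the C major scale
dist = init_emission(root=0, chord_tones=[0, 4, 7],
                     scale_tones=[0, 2, 4, 5, 7, 9, 11])
```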
Transition distribution parameters
Setting the transition distribution parameters by hand is riskier. The paper claims that initialization of the transition distributions doesn't make much difference, but I'll try initializing them by hand anyway and then train.
I have confirmed that training without initializing the transition distributions (i.e. starting from uniform distributions) gives nonsensical parameters after training. I still need to try non-uniform initialization to see if it's any better.
If that still doesn't work, I'll try hand-setting the parameters, as they do.
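One plausible reason uniform initialization fails is that a perfectly symmetric transition matrix can leave Baum-Welch at a degenerate fixed point. A small sketch of a non-uniform initialization that breaks this symmetry; the boost values are my own guesses, not figures from the paper:

```python
import numpy as np

# Start from uniform transitions, boost self-transitions (chords tend to
# persist across frames), optionally add tiny jitter to break remaining
# ties, then renormalize each row into a distribution.
def init_transitions(n_states, self_boost=5.0, rng=None):
    A = np.ones((n_states, n_states))
    A += self_boost * np.eye(n_states)
    if rng is not None:
        A += 0.01 * rng.random((n_states, n_states))  # symmetry-breaking jitter
    return A / A.sum(axis=1, keepdims=True)

A = init_transitions(4)
```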
I'm experimenting (fairly informally) with training different models and comparing how they perform.
- Trained on 5 Haydn piano sonata movements, each truncated to 50 chords.
- Trained on 12 jazz-standard MIDI files, each truncated to 50 chords.
- Trained first on the Haydn data, then retrained on the jazz data.
I've got the model training basically working. These are things I want to do next.
- Try a unigram model as a baseline. This will help us to see how much the transition distributions are helping.
- Try training the model on the test data.
- Initialize the transition distributions better and train to see what comes out.
- Look for more suitable data to train on.
- Find ways of speeding up the training, which currently takes far too long. Try comparing my implementation to Tapas Kanungo's UMDHMM.
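The unigram baseline mentioned above could be sketched as follows: drop the transitions entirely and label each frame independently by whichever state's emission distribution best explains its pitch-class counts. All names here are my own, not the paper's:

```python
import numpy as np

# Unigram decoding: for each frame's pitch-class count vector, pick the
# state maximizing the multinomial log-likelihood log_e @ counts.
# `emissions` is (n_states, n_pitch_classes); `frames` is a list of
# count vectors of length n_pitch_classes.
def unigram_decode(emissions, frames):
    log_e = np.log(emissions)
    return [int(np.argmax(log_e @ counts)) for counts in frames]

# Toy example with 3 pitch classes and 2 states:
emissions = np.array([[0.7, 0.2, 0.1],
                      [0.1, 0.2, 0.7]])
frames = [np.array([3, 1, 0]), np.array([0, 1, 3])]
labels = unigram_decode(emissions, frames)
```

Comparing this against full Viterbi decoding on the same emission parameters would isolate how much the transition structure contributes.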