I have trained the C&C supertagger on my dataset, which currently contains 3,756 chords with an assigned category (3,928 chords in total, counting the 172 that have none; see Category Distribution). The annotation is incomplete: the chords with no category will slightly reduce the performance of the tagger.

Heldout Cross-Validation

To perform a basic evaluation of the tagger, I split the corpus into 10 partitions (by sequences, not chords). I trained a model on each combination of 9 partitions, evaluated it on the remaining partition, and combined the results of all 10 tests.
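The partitioning described above can be sketched as follows. This is a minimal illustration, not the script actually used: the function names, the shuffling, and the round-robin assignment are all assumptions, and a sequence is represented simply as one list item.

```python
import random


def split_into_folds(sequences, num_folds=10, seed=0):
    """Shuffle whole sequences and deal them into num_folds partitions.

    Splitting by sequence (not by chord) keeps every chord of a sequence
    in the same partition. The shuffle and seed are assumptions.
    """
    shuffled = list(sequences)
    random.Random(seed).shuffle(shuffled)
    # Round-robin slicing keeps partition sizes within one sequence of each other.
    return [shuffled[i::num_folds] for i in range(num_folds)]


def cross_validation_rounds(folds):
    """Yield (training_sequences, heldout_sequences), holding out each fold in turn."""
    for i, heldout in enumerate(folds):
        training = [seq for j, fold in enumerate(folds) if j != i for seq in fold]
        yield training, heldout
```

Each round would train one model on `training` and evaluate it on `heldout`; the per-round results are then pooled.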

Tag Agreement

I used the tagger to pick the highest-probability tag for each chord and computed the proportion of these tags that matched the gold-standard tag. I ignored any chords for which there is currently no gold-standard tag.
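As a rough sketch of this measure, assuming the tagger output and the gold standard are available as two parallel lists of tag strings, and that an unannotated chord is marked with "?" (the marker and the function name are assumptions made for illustration):

```python
def tag_agreement(predicted, gold, unknown="?"):
    """Proportion of chords whose top tag equals the gold tag.

    Chords whose gold tag is the (assumed) unknown marker are skipped,
    so they count neither for nor against the tagger.
    """
    scored = [(p, g) for p, g in zip(predicted, gold) if g != unknown]
    if not scored:
        return 0.0
    return sum(p == g for p, g in scored) / len(scored)
```

Note that a predicted "?" on a chord that does have a gold tag still counts as an error under this definition.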

Perplexity

I retrieved a large number of tags, with their probabilities, from the tagger for each chord. From these I computed the entropy per chord over each partition, and hence the perplexity of the model over the whole evaluated set.
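The exact entropy calculation is not spelled out here. One common reading, sketched below, is the cross-entropy of the tagger's distribution against the gold tags, with perplexity equal to 2 raised to that value. The input layout (one dict of tag probabilities per chord), the probability floor for gold tags missing from the retrieved list, and the "?" marker for unannotated chords are all assumptions.

```python
import math


def perplexity(distributions, gold, floor=1e-10, unknown="?"):
    """Perplexity of the tagger over a set of chords.

    distributions: one dict per chord mapping tag -> probability.
    gold: the gold-standard tag for each chord.
    Gold tags absent from the retrieved list receive the floor probability
    (an assumption) so that the logarithm stays defined.
    """
    log_prob_sum = 0.0
    count = 0
    for dist, tag in zip(distributions, gold):
        if tag == unknown:
            continue  # no gold tag to score against
        log_prob_sum += math.log2(max(dist.get(tag, 0.0), floor))
        count += 1
    if count == 0:
        raise ValueError("no chords with a gold-standard tag")
    entropy = -log_prob_sum / count  # average bits per chord
    return 2.0 ** entropy
```

Pooling the log probabilities of every partition before dividing by the total chord count gives a perplexity over the whole evaluated set.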

Confusions

The following table shows a count of the incorrectly picked tags over the whole evaluation in the agreement test.

Question marks appear in this table because unknown tags in the training set were not excluded. The tagger has learnt the unknown value and is occasionally choosing it as the most probable tag. Eventually there should be no unknown tags in the training set.

For brevity, this table excludes confusions that occur only once or twice.

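| Correct tag | Chosen tag | Count |
|-------------|------------|-------|
| T           | D          | 102   |
| D           | T          | 47    |
| D_Tt        | D          | 41    |
| Rep         | T          | 41    |
| Rep_D       | T          | 26    |
| Rep_D       | D          | 26    |
| D           | D_Tt       | 22    |
| D_Bd        | D          | 21    |
| TC_IV       | T          | 21    |
| T           | ?          | 20    |
| D_Tt        | T          | 17    |
| Rep         | D          | 15    |
| T           | Rep_D      | 13    |
| D_Bd        | ?          | 12    |
| T           | D_Tt       | 12    |
| Rep_D_Tt    | D          | 12    |
| TC_IV       | D          | 12    |
| D           | ?          | 10    |
| Rep_D_Tt    | T          | 10    |
| TC_IVR      | D          | 10    |
| D_Bd        | T          | 9     |
| Pass_VI     | ?          | 9     |
| TC_IVR      | T          | 9     |
| S           | T          | 8     |
| 9c          | D          | 8     |
| Dim_bVII    | ?          | 7     |
| T           | Rep        | 7     |
| S           | D          | 7     |
| Pass_bV     | D          | 7     |
| TC_II       | ?          | 7     |
| Aug_bII     | D          | 6     |
| 9e          | T          | 6     |
| TC_IIR      | ?          | 5     |
| Dim_bII     | ?          | 5     |
| T           | TC_IV      | 5     |
| D           | Rep_D      | 5     |
| 11a         | D          | 5     |
| T_III       | D          | 5     |
| 0a          | D          | 5     |
| D_Btk       | D          | 5     |
| Rep         | Rep_D      | 5     |
| 2a          | D_Bd       | 4     |
| S           | ?          | 4     |
| D           | TC_IV      | 4     |
| TC_IV       | D_Tt       | 4     |
| Pass_I      | D          | 4     |
| Rep_bVI     | T          | 4     |
| 9e          | D          | 4     |
| Rep_D_Bd    | D          | 3     |
| TC_IIR      | T          | 3     |
| Dim_bII     | D_Tt       | 3     |
| T           | Rep_D_Tt   | 3     |
| T_III       | ?          | 3     |
| TC_IVR      | ?          | 3     |
| TC_IV       | D_Bd       | 3     |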
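A table of this shape could be produced along the following lines. This is a sketch under assumptions: predictions and gold tags are parallel lists, "?" marks a chord with no gold tag, and the function name and `min_count` parameter are hypothetical. The default threshold of 3 mirrors the exclusion of confusions that occur only once or twice.

```python
from collections import Counter


def count_confusions(predicted, gold, unknown="?", min_count=3):
    """Count (correct tag, chosen tag) pairs where the tagger was wrong.

    Chords without a gold tag are skipped; a chosen tag of "?" is kept,
    since the tagger can pick the unknown value it learnt in training.
    Returns (correct, chosen, count) rows, most frequent first.
    """
    confusions = Counter(
        (g, p) for p, g in zip(predicted, gold) if g != unknown and p != g
    )
    return [
        (correct, chosen, n)
        for (correct, chosen), n in confusions.most_common()
        if n >= min_count
    ]
```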

Category Distribution

At the time of training, the distribution of chords over categories was as follows. Percentages are relative to all 3,928 chords, including those with no category. Numeric category names come from an old annotation and have not yet been re-annotated; these make up only a small proportion.

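| Category    | Count | %     |
|-------------|-------|-------|
| D           | 1,984 | 50.51 |
| T           | 875   | 22.28 |
| D_Tt        | 271   | 6.90  |
| No category | 172   | 4.38  |
| Rep_D       | 130   | 3.31  |
| Rep         | 105   | 2.67  |
| TC_IV       | 58    | 1.48  |
| Dim_bII     | 49    | 1.25  |
| D_Bd        | 48    | 1.22  |
| Rep_D_Tt    | 38    | 0.97  |
| TC_IVR      | 35    | 0.89  |
| S           | 22    | 0.56  |
| 9e          | 11    | 0.28  |
| 0a          | 11    | 0.28  |
| T_III       | 10    | 0.25  |
| TC_II       | 10    | 0.25  |
| Pass_VI     | 9     | 0.23  |
| Pass_bV     | 9     | 0.23  |
| 9c          | 9     | 0.23  |
| TC_IIR      | 8     | 0.20  |
| Dim_bVII    | 8     | 0.20  |
| Aug_bII     | 8     | 0.20  |
| Rep_bVI     | 6     | 0.15  |
| D_Btk       | 6     | 0.15  |
| Rep_D_Bd    | 5     | 0.13  |
| 11a         | 5     | 0.13  |
| Pass_I      | 4     | 0.10  |
| 2a          | 4     | 0.10  |
| Rep_S       | 3     | 0.08  |
| Rep_Aug_bII | 3     | 0.08  |
| Dim_V       | 3     | 0.08  |
| T_bVI       | 2     | 0.05  |
| Aug_VI      | 2     | 0.05  |
| 11b         | 2     | 0.05  |
| Rep_Aug_VI  | 1     | 0.03  |
| Dim_III     | 1     | 0.03  |
| 9b          | 1     | 0.03  |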
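For reference, a distribution like the one above can be tabulated from a flat list of category labels. The sketch below assumes one label per chord, with uncategorised chords given a placeholder label such as "No category"; the function name is hypothetical.

```python
from collections import Counter


def category_distribution(tags):
    """Return (category, count, percentage) rows, most frequent first.

    Percentages are taken over all chords in the list, so any placeholder
    label for uncategorised chords is included in the denominator.
    """
    counts = Counter(tags)
    total = sum(counts.values())
    return [
        (tag, n, round(100.0 * n / total, 2))
        for tag, n in counts.most_common()
    ]
```

The percentages in the table are consistent with this denominator: 1,984 of 3,928 chords is 50.51%, and 172 of 3,928 is 4.38%.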

