"""Output chord corpus data to a text file that others can use.

Data structures and utilities are provided elsewhere in the codebase for
loading, editing, converting and saving chord sequence data with
annotations. The data is stored either as a sqlite database or as a pickled
Python object, neither of which is useful to many other people. This format
is designed to be easily readable by others.

I don't currently provide any code for reading this file format, since all
scripts take their input from an internally-used format. The description of
the file format below should be enough to implement a function to read it
in the language of your choice.

File format
===========
The standard file extension for these files is C{jcc}.

The first line is always::
  JAZZPARSER CHORD CORPUS

Each chord sequence is preceded by a blank line and begins with the line::
  BEGIN SEQUENCE

The lines that follow, up to the C{BEGIN CHORDS} line, contain metadata
about the sequence, one C{FIELD: value} pair per line.
 - C{INDEX}: sequences are numbered sequentially, starting from 0, and this
   is the index of the sequence within the file.
 - C{ID}: database id of the sequence. This provides a way of referring to
   a sequence in a corpus that is not tied to its position in the file
   (you might want a different ordering, or selection of sequences).
 - C{NAME}: unicode name of the song (utf-8 encoded).
 - C{KEY}: key of the piece in the source. Chords are stored relative to
   this key. E.g. in C major, a chord with root 5 is F. The formatting of
   this field wasn't originally intended to be machine readable, so it
   might be a little inconsistent. It is generally a note name (using
   C{b} and C{#} for flat and sharp) followed by C{major} or C{minor}
   (C{major} is assumed if omitted).
 - C{BAR LENGTH}: integer number of beats per bar (durations of chords are
   stored in beats).
 - C{SOURCE}: where the chord sequence was taken from. Almost always
   "C{The Real Book, Sixth Edition}".

Lines between C{BEGIN CHORDS} and C{END CHORDS} each represent a single
chord, with fields separated by a comma and a space. The fields are, in
order:
 - B{root}: equal-temperament pitch class (integer) relative to key.
 - B{chord type}: chord type label.
 - B{duration}: integer number of beats.
 - B{additions}: any further additions to the chord notated in the input
   not covered by the chord type (anything above the seventh degree).
 - B{bass}: integer pitch class of bass note, if written in the input
   (e.g. C7/B{G}). Otherwise blank.
 - B{category}: lexical category of annotation, from the jazz CCG grammar.
 - B{coordination middle}: unresolved dominant/subdominant chord which
   marks the middle point of a coordination. E.g. G7 in (Dm7 G7) (A7 Dm7 G7)
   CM7. C{T} or C{F}.
 - B{coordination end}: dominant/subdominant sharing its resolution with a
   previously marked coordination-middle chord. C{T} or C{F}.

"""
"""
============================== License ========================================
 Copyright (C) 2008, 2010-12 University of Edinburgh, Mark Granroth-Wilding

 This file is part of The Jazz Parser.

 The Jazz Parser is free software: you can redistribute it and/or modify
 it under the terms of the GNU General Public License as published by
 the Free Software Foundation, either version 3 of the License, or
 (at your option) any later version.

 The Jazz Parser is distributed in the hope that it will be useful,
 but WITHOUT ANY WARRANTY; without even the implied warranty of
 MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
 GNU General Public License for more details.

 You should have received a copy of the GNU General Public License
 along with The Jazz Parser. If not, see <http://www.gnu.org/licenses/>.

============================ End license ======================================

"""
__author__ = "Mark Granroth-Wilding <mark.granroth-wilding@ed.ac.uk>"

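# NOTE: the def line is absent from this listing. The parameters follow the
# docstring below, but the function name used here is an assumption.
def output_sequence_index(index, outfile):
    """
    Outputs the sequences in the sequence index to a text file.

    @type index: jazzparser.data.db_mirrors.SequenceIndex
    @param index: index to get sequences from
    @type outfile: file-like object
    @param outfile: file to write to

    """
    _write_header(index, outfile)
    # Sequences are numbered by their position in the index, starting from 0
    for ind, seq in enumerate(index.sequences):
        _write_sequence(seq, ind, outfile)

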
def _write_header(index, outfile):
    """
    Writes a header to the outfile for this sequence index.

    """
    # The header is a single fixed line identifying the file format
    print >>outfile, "JAZZPARSER CHORD CORPUS"

def _write_sequence(seq, index, outfile):
    """
    Writes the data for one chord sequence to the outfile.

    """
    # A blank line separates each sequence from whatever precedes it
    print >>outfile
    print >>outfile, "BEGIN SEQUENCE"
    # Metadata, one "FIELD: value" pair per line
    print >>outfile, "INDEX: %d" % index
    print >>outfile, "ID: %d" % seq.id
    print >>outfile, "NAME: %s" % seq.name.encode('utf8')
    print >>outfile, "KEY: %s" % seq.key
    print >>outfile, "BAR LENGTH: %d" % seq.bar_length
    print >>outfile, "SOURCE: %s" % seq.source
    # One line per chord, in sequence order
    print >>outfile, "BEGIN CHORDS"
    for chord in seq.iterator():
        _write_chord(chord, outfile)
    print >>outfile, "END CHORDS"

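def _write_chord(crd, outfile):
    """
    Writes a single line of data for a chord to the outfile.

    """
    # Test against None rather than relying on truthiness, so that a bass
    # note of pitch class 0 is written as "0" instead of being left blank.
    # This assumes that a chord with no bass note has crd.bass set to None.
    bass = "" if crd.bass is None else crd.bass
    print >>outfile, "%d, %s, %d, %s, %s, %s, %s, %s" % \
            (crd.root, crd.type, crd.duration, crd.additions,
             bass, crd.category,
             "T" if crd.treeinfo.coord_unresolved else "F",
             "T" if crd.treeinfo.coord_resolved else "F")
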
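
# ---------------------------------------------------------------------------
# Reader sketch (illustrative only, not part of the original module).
#
# The module docstring says that no reader is provided for this format and
# that the format description should be enough to write one. The two
# functions below are a minimal sketch of such a reader, based only on that
# description. The names read_chord_corpus and _parse_chord_line, and the
# dictionary keys used for the chord fields, are hypothetical. The sketch
# assumes that no chord field (type, additions, category) contains a comma,
# which the format description does not guarantee. It avoids print and
# other version-specific syntax so that it should run under Python 2 or 3.
# ---------------------------------------------------------------------------
def _parse_chord_line(line):
    """
    Splits one line from a BEGIN CHORDS/END CHORDS block into its 8 fields.

    """
    fields = [field.strip() for field in line.split(",")]
    if len(fields) != 8:
        raise ValueError("expected 8 chord fields, got %d: %r" %
                                                    (len(fields), line))
    root, ctype, duration, additions, bass, category, middle, end = fields
    return {
        "root": int(root),
        "type": ctype,
        "duration": int(duration),
        "additions": additions,
        # A blank bass field means no bass note was written in the input
        "bass": int(bass) if bass else None,
        "category": category,
        "coord_middle": middle == "T",
        "coord_end": end == "T",
    }

def read_chord_corpus(lines):
    """
    Reads an iterable of text lines (e.g. an open file) and returns a list
    of dictionaries, one per sequence. Each holds the metadata fields under
    their names in the file, plus a "chords" list.

    """
    lines = [line.rstrip("\r\n") for line in lines]
    if not lines or lines[0].strip() != "JAZZPARSER CHORD CORPUS":
        raise ValueError("missing JAZZPARSER CHORD CORPUS header line")
    int_fields = ("INDEX", "ID", "BAR LENGTH")
    sequences = []
    current = None
    in_chords = False
    for line in lines[1:]:
        line = line.strip()
        if not line:
            # Blank lines only separate sequences
            continue
        if in_chords:
            if line == "END CHORDS":
                sequences.append(current)
                current, in_chords = None, False
            else:
                current["chords"].append(_parse_chord_line(line))
        elif line == "BEGIN SEQUENCE":
            current = {"chords": []}
        elif current is None:
            raise ValueError("unexpected line outside a sequence: %r" % line)
        elif line == "BEGIN CHORDS":
            in_chords = True
        else:
            # Metadata: split on the first colon only, since a song name
            # may itself contain a colon
            field, _, value = line.partition(":")
            field, value = field.strip(), value.strip()
            current[field] = int(value) if field in int_fields else value
    return sequences

# Example usage (the file name is hypothetical):
#     import io
#     with io.open("corpus.jcc", encoding="utf-8") as infile:
#         sequences = read_chord_corpus(infile)
#     first_chord = sequences[0]["chords"][0]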