My corpus of chord sequences is stored in a database. This makes it efficient for me to access the data and annotate it using an annotator application I've written.

This is not a convenient format for distribution, however. If anyone else wants to use my data, they will need to have SQLite installed, and probably some other Django dependencies. I therefore have an alternative format into which I can export my entire dataset.

Database-Independent Data Format

My database is accessed through a Django application. In Django, one defines Python classes that determine the database schema, and the database tables are generated automatically. My application contains several such classes, which define the schema and through which the data is accessed.

My parser's codebase contains a close mirror of these classes, specifically designed to provide a Python interface to the data that is independent of the database. Data from the database can be simply converted into instances of these classes. These instances contain all of the information from the database models. They can be pickled (that is, serialized) and stored to a file.
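To make the pickling step concrete, here is a minimal sketch. SequenceMirror is a hypothetical stand-in, not one of the real mirror classes, and an in-memory buffer stands in for the file. It also shows why anyone loading the data needs the class definitions to hand: pickle stores a reference to each object's class, not the class's code.

```python
import io
import pickle


class SequenceMirror(object):
    """Hypothetical stand-in for a database-independent mirror class."""
    def __init__(self, name=None, key=None):
        self.name = name
        self.key = key


# Plain Python objects like these need no database connection
sequences = [SequenceMirror(name="Example tune", key="C")]

# Serialize to an in-memory buffer (a real export would write to a file)
buffer = io.BytesIO()
pickle.dump(sequences, buffer)

# Deserialize: the class must be importable wherever the data is loaded
buffer.seek(0)
restored = pickle.load(buffer)
```

After this runs, restored is a new list holding an equivalent SequenceMirror instance with the same attribute values.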

If I am collaborating with you, I can give you my data in this form.

Using the Data File

Let's say you have a file in the format described above. If you want to use the data in a Python script, you may load (and unpickle) the file and then access all the data via the interface of my database-independent classes. If you want the data in some other format, a very short Python script is enough to output it in your desired format. See below for instructions.

I will assume the file has been put in the root of the codebase and is called sequences.crd.

Before you can do any of this you need a working copy of my Jazz Parser codebase: then you will have access to the classes that provide the data interface and functions that will read in the file for you. See Parser for instructions on how to get the code. The following instructions assume you have installed any prerequisites, checked out the code and are in the root directory of the codebase.

The commands below (each starting with $, which represents the shell prompt) are appropriate for a Unix terminal.

Handling Sequences in Python

Starting the Shell

Change into the bin directory in the root of the codebase:

$ cd bin

Running the script jazzshell starts up a Python shell with the Jazz Parser modules in the environment.

$ ./jazzshell

Loading the Data File

The following commands, typed at the shell prompt, load the file into a list of chord sequences.

from jazzparser.data.db_mirrors import load_pickled_data
l = load_pickled_data('../sequences.crd')

The list l contains instances of the chord sequence classes.

Chord Sequence Class Interface

The classes in jazzparser.data.db_mirrors define the interface to the chord sequence data.

class ChordSequence(object):
    def __init__(self, name=None, key=None, bar_length=None, first_chord=None, 
                 notes=None, analysis_omitted=None, omissions=None, 
                 source=None):
        self.name = name                # String
        self.key = key                  # String
        self.bar_length = bar_length    # Int
        self.first_chord = first_chord  # Mirror of a Chord
        self.notes = notes              # String
        self.analysis_omitted = analysis_omitted # Bool
        self.omissions = omissions      # String
        self.source = source            # Store as a string

class Chord(object):
    def __init__(self, root=None, type=None, additions=None, bass=None, 
                 next=None, duration=None, category=None, sequence=None):
        self.root = root            # Int
        self.type = type            # Store as a string
        self.additions = additions  # String
        self.bass = bass            # Int
        self.next = next            # Another Chord mirror
        self.duration = duration    # Int
        self.category = category    # String
        self.sequence = sequence    # Mirror of the sequence model

Access the chords of a sequence using its iterator():

for chord in sequence.iterator():
    print(chord)
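The first_chord and next attributes suggest that a sequence's chords form a linked chain. The sketch below shows how an iterator over such a chain could work. It uses cut-down stand-in classes rather than the real ones in jazzparser.data.db_mirrors, the body of iterator() is an assumption about the behaviour and not the actual implementation, and the root and type values are arbitrary.

```python
class Chord(object):
    """Cut-down stand-in for the Chord mirror: only the fields used here."""
    def __init__(self, root=None, type=None, next=None):
        self.root = root
        self.type = type
        self.next = next


class ChordSequence(object):
    """Cut-down stand-in for the ChordSequence mirror."""
    def __init__(self, name=None, first_chord=None):
        self.name = name
        self.first_chord = first_chord

    def iterator(self):
        # Assumed behaviour: follow the next pointers until the chain ends
        chord = self.first_chord
        while chord is not None:
            yield chord
            chord = chord.next


# Build a three-chord chain back to front (values are arbitrary)
third = Chord(root=0, type="")
second = Chord(root=7, type="7", next=third)
first = Chord(root=2, type="m7", next=second)
sequence = ChordSequence(name="Example", first_chord=first)

# Collect the roots in playing order
roots = [chord.root for chord in sequence.iterator()]
```

Written as a generator, the iterator visits chords one at a time without first building a list.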

Outputting Sequence Data

You may use the instances of these classes to output the chord data in the format of your choice. For example:

from jazzparser.data.db_mirrors import load_pickled_data
# Load in the pickled data
data = load_pickled_data('../sequences.crd')
# Open a file to output to; the with block closes it when we're done
with open('mynewformat.txt', 'w') as outfile:
    for sequence in data:
        ## Change this depending on your desired output format
        # Output each sequence's name
        outfile.write("## %s\n" % sequence.name.encode('utf-8'))
        # Output each chord and its category, separated by tabs
        outfile.write("%s\n" % "\t".join(["%s|%s" % (c, c.category) for c in sequence.iterator()]))

The file mynewformat.txt now contains the sequence data in the cut-down textual form that we chose to output.

Replace the body of the for loop in this script with whatever formatting you need.
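As one illustration of a different format, the sketch below writes one chord per line, with root, type and duration separated by tabs. The function format_sequence and the two Fake classes are hypothetical: the Fake classes only imitate the parts of the interface used here (name, iterator() and three chord attributes), so the example runs without the codebase. With real data you would pass the sequences returned by load_pickled_data instead.

```python
def format_sequence(sequence):
    """Return a header line, then one tab-separated line per chord."""
    lines = ["## %s" % sequence.name]
    for chord in sequence.iterator():
        lines.append("%s\t%s\t%s" % (chord.root, chord.type, chord.duration))
    return "\n".join(lines) + "\n"


class FakeChord(object):
    """Hypothetical stand-in exposing three of the Chord attributes."""
    def __init__(self, root, type, duration):
        self.root = root
        self.type = type
        self.duration = duration


class FakeSequence(object):
    """Hypothetical stand-in exposing name and iterator()."""
    def __init__(self, name, chords):
        self.name = name
        self._chords = chords

    def iterator(self):
        return iter(self._chords)


# Arbitrary example values, not taken from the corpus
demo = FakeSequence("Example", [FakeChord(2, "m7", 4), FakeChord(7, "7", 4)])
text = format_sequence(demo)
```

Keeping the formatting in a function that returns a string makes it easy to try out on a single sequence in the shell before writing the whole dataset to a file.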