My corpus of chord sequences is stored in a database. This makes it efficient for me to access the data and annotate it using an annotator application I've written.
This is not a convenient format for distribution, however. Anyone else who wants to use my data would need to have SQLite installed, and probably some other Django dependencies as well. I therefore have an alternative format into which I can export my entire dataset.
Database-Independent Data Format
My database is accessed through a Django application. In Django, one defines Python classes that determine the database schema, and the database tables are generated automatically from them. My application contains several such classes, through which the data is accessed.
My parser's codebase contains a close mirror of these classes, specifically designed to provide a Python interface to the data that is independent of the database. Data from the database can be simply converted into instances of these classes. These instances contain all of the information from the database models. They can be pickled (that is, serialized) and stored to a file.
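As a rough illustration of the idea (not the actual Jazz Parser code), the sketch below uses a hypothetical stand-in class, ChordMirror, to show how a plain Python object holding values copied from a database row can be pickled to a file and read back with no database present. The class name, its two fields, the example values and the file name are all assumptions made for the example.

```python
import os
import pickle
import tempfile


class ChordMirror(object):
    """Hypothetical stand-in for a database-independent mirror class."""
    def __init__(self, root=None, type=None):
        self.root = root    # Int
        self.type = type    # String


# Values as they might be copied out of database rows (made up here)
mirrors = [ChordMirror(root=0, type="M7"), ChordMirror(root=7, type="7")]

# Serialize the list of plain objects to a file
path = os.path.join(tempfile.mkdtemp(), "example.crd")
with open(path, "wb") as outfile:
    pickle.dump(mirrors, outfile)

# Later, with no database available: read the objects back
with open(path, "rb") as infile:
    restored = pickle.load(infile)

print([(c.root, c.type) for c in restored])
```

Pickle stores only a reference to each object's class, not the class itself, so the class definitions must be importable when the file is loaded. This is why a copy of the codebase is needed to read the data file.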
If I am collaborating with you, I can give you my data in this form.
Using the Data File
Let's say you have a file in the format described above. If you want to use the data in a Python script, you may load (and unpickle) the file and then access all the data via the interface of my database-independent classes. If you want the data in some other format, a very short Python script is enough to output it in your desired format. See below for instructions.
I will assume the file has been put in the root of the codebase and is called sequences.crd.
Before you can do any of this you need a working copy of my Jazz Parser codebase: then you will have access to the classes that provide the data interface and functions that will read in the file for you. See Parser for instructions on how to get the code. The following instructions assume you have installed any prerequisites, checked out the code and are in the root directory of the codebase.
The commands below (each starting with $, which represents the shell prompt) are appropriate for a Unix terminal.
Handling Sequences in Python
Starting the Shell
Change into the bin directory in the root of the codebase:
$ cd bin
Running the script jazzshell starts up a Python shell with the Jazz Parser modules in the environment.
$ ./jazzshell
Loading the Data File
The following commands, typed at the shell prompt, load the file into a list of chord sequences.
from jazzparser.data.db_mirrors import load_pickled_data
l = load_pickled_data('../sequences.crd')
The list l contains instances of the chord sequence classes.
Chord Sequence Class Interface
The classes in jazzparser.data.db_mirrors define the interface to the chord sequence data.
class ChordSequence(object):
    def __init__(self, name=None, key=None, bar_length=None, first_chord=None,
                 notes=None, analysis_omitted=None, omissions=None, source=None):
        self.name = name                          # String
        self.key = key                            # String
        self.bar_length = bar_length              # Int
        self.first_chord = first_chord            # Mirror of a Chord
        self.notes = notes                        # String
        self.analysis_omitted = analysis_omitted  # Bool
        self.omissions = omissions                # String
        self.source = source                      # Store as a string

class Chord(object):
    def __init__(self, root=None, type=None, additions=None, bass=None,
                 next=None, duration=None, category=None, sequence=None):
        self.root = root            # Int
        self.type = type            # Store as a string
        self.additions = additions  # String
        self.bass = bass            # Int
        self.next = next            # Another Chord mirror
        self.duration = duration    # Int
        self.category = category    # String
        self.sequence = sequence    # Mirror of the sequence model
Access the chords of a sequence using its iterator() method:
for chord in sequence.iterator():
    print chord
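The first_chord and next attributes suggest that the chords of a sequence form a singly linked list. The following sketch uses cut-down stand-in classes, not the real ones, to show how such a list could be walked by hand. It assumes that the last chord's next attribute is None, which is not stated above, and it is not necessarily how iterator() is implemented. The helper walk_chords and the example values are hypothetical.

```python
class Chord(object):
    """Cut-down stand-in: only the fields needed for traversal."""
    def __init__(self, root=None, type=None, next=None):
        self.root = root
        self.type = type
        self.next = next


class ChordSequence(object):
    """Cut-down stand-in holding a name and the head of the chord list."""
    def __init__(self, name=None, first_chord=None):
        self.name = name
        self.first_chord = first_chord


def walk_chords(sequence):
    """Yield each chord in order by following the next links.

    Assumes the final chord has next set to None.
    """
    chord = sequence.first_chord
    while chord is not None:
        yield chord
        chord = chord.next


# Build a three-chord example with arbitrary root values 2, 7, 0
last = Chord(root=0, type="M7")
middle = Chord(root=7, type="7", next=last)
first = Chord(root=2, type="m7", next=middle)
example = ChordSequence(name="Example", first_chord=first)

roots = [c.root for c in walk_chords(example)]
print(roots)
```

With the real classes there should be no need for such a helper, since iterator() already provides the chords in order.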
Outputting Sequence Data
You may use the instances of these classes to output the chord data in the format of your choice. For example:
from jazzparser.data.db_mirrors import load_pickled_data

# Load in the pickled data
data = load_pickled_data('../sequences.crd')
# Open a file to output to
file = open('mynewformat.txt','w')
for sequence in data:
    ## Change this depending on your desired output format
    # Output each sequence's name
    file.write("## %s\n" % sequence.name.encode('utf-8'))
    # Output each chord and its category, separated by tabs
    file.write("%s\n" % "\t".join(["%s|%s" % (c, c.category) for c in sequence.iterator()]))
file.close()
The file mynewformat.txt now contains the sequence data in the cut-down textual form that we chose to output.
Replace the middle part of this script with the formatting you want to output.
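As one possible variation, the sketch below writes one row per chord to a CSV file using the standard csv module. So that it runs on its own, it builds a small data list from stand-in classes; with the real data, the list would come from load_pickled_data instead. The stand-in classes, the helper write_csv, the choice of columns, the placeholder category values "A" and "B" and the file name chords.csv are all assumptions for illustration. It is written for Python 3; under Python 2 the csv module expects the file to be opened in 'wb' mode, without the newline argument.

```python
import csv
import os
import tempfile


class Chord(object):
    """Cut-down stand-in for the Chord mirror class."""
    def __init__(self, root=None, type=None, duration=None,
                 category=None, next=None):
        self.root = root
        self.type = type
        self.duration = duration
        self.category = category
        self.next = next


class ChordSequence(object):
    """Cut-down stand-in with an iterator() like the one described above."""
    def __init__(self, name=None, first_chord=None):
        self.name = name
        self.first_chord = first_chord

    def iterator(self):
        # Assumed behaviour: follow next links until None
        chord = self.first_chord
        while chord is not None:
            yield chord
            chord = chord.next


def write_csv(data, path):
    """Write a header row, then one row per chord of every sequence."""
    with open(path, "w", newline="") as outfile:
        writer = csv.writer(outfile)
        writer.writerow(["sequence", "root", "type", "duration", "category"])
        for sequence in data:
            for chord in sequence.iterator():
                writer.writerow([sequence.name, chord.root, chord.type,
                                 chord.duration, chord.category])


# Made-up example data standing in for the loaded sequences
second = Chord(root=0, type="M7", duration=4, category="B")
first = Chord(root=7, type="7", duration=4, category="A", next=second)
data = [ChordSequence(name="Example", first_chord=first)]

path = os.path.join(tempfile.mkdtemp(), "chords.csv")
write_csv(data, path)

# Read the file back to check what was written
with open(path, newline="") as infile:
    rows = list(csv.reader(infile))
print(rows)
```

A tabular format like this loses the linked structure of the sequence, but the row order within each sequence preserves the chord order.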