Originally, my corpus of jazz standards contained only chord sequences, with category annotations. I'm now extending it to include MIDI data for every song, and I also plan to align the MIDI data with the chords. I'm currently collecting MIDI files.
Getting the MIDI data
A set of functions in the Jazz Parser codebase scrapes data about MIDI files from several MIDI search engines, none of which has an API and most of which don't even have clean HTML. I chose the sources I could find that had decent coverage of jazz standards and reasonably good-quality MIDI files.
Sources
My scripts scour the following sites for MIDI files by name:
- The Jazz Page (just MIDI files on one big page, but good ones)
- Melody Catcher (search by name)
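The site-specific scrapers aren't reproduced here. As a rough illustration of how MIDI links can be pulled out of untidy HTML (a results page or one big page of links) using only the standard library, here is a minimal sketch. `MidiLinkParser` and `find_midi_links` are hypothetical names, not functions from the Jazz Parser codebase, and the example URL is a placeholder.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class MidiLinkParser(HTMLParser):
    """Collect hrefs that point at .mid/.midi files, tolerating messy HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, so <A HREF=...> matches too.
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Ignore any query string when checking the file extension.
                path = value.split("?", 1)[0].lower()
                if path.endswith((".mid", ".midi")):
                    # Resolve relative links against the page's URL.
                    self.links.append(urljoin(self.base_url, value))


def find_midi_links(html, base_url):
    """Return absolute MIDI URLs found in html, without duplicates, in page order."""
    parser = MidiLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return list(dict.fromkeys(parser.links))


page = ('<p><a href="songs/Blue_Monk.MID">Blue Monk</a> '
        '<A HREF="/x/airegin.mid?dl=1">Airegin</A> <a href="about.html">About</a></p>')
print(find_midi_links(page, "http://example.com/jazz/"))
```

A tolerant event-based parser suits pages without clean HTML: it never requires the document to be well formed, it just reacts to each `<a>` tag it sees.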
Fetching
The script data/getmidi.py fetches distinct MIDI files from all the sources I've included. For most songs it finds quite a lot of files; for some it finds none at all. The files retrieved are rather noisy: the search engines aren't very good and song titles can be misleading.
I then filter the MIDI files manually to remove those that aren't the right song, as well as duplicates. getmidi.py already removes exact copies of the same file, but near-duplicates often slip through: files that are essentially the same but with, say, the instruments changed.
- Fetching produced 2,656 files in total.
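getmidi.py's own code isn't shown here. One common way to catch exact copies is to hash each file's contents and keep only the first file per digest; the sketch below does that with the standard library. `unique_by_content` and `drop_exact_copies` are hypothetical names, and this is not necessarily how getmidi.py does it.

```python
import hashlib
from pathlib import Path


def unique_by_content(items):
    """Keep the first of each group of byte-identical files.

    items: iterable of (name, data) pairs, where data is the file's raw bytes.
    Returns the names that survive, in their original order.
    """
    seen = set()
    kept = []
    for name, data in items:
        # Identical bytes give identical digests, so a repeated digest marks an exact copy.
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(name)
    return kept


def drop_exact_copies(paths):
    """File-system wrapper: read each MIDI file and drop exact copies."""
    return unique_by_content((p, Path(p).read_bytes()) for p in paths)
```

This also shows why near-duplicates slip through: changing even one instrument (program change) byte produces a completely different digest, so the two files no longer look like copies.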
Filtering
The script data/checkmidi.py reads the information output by getmidi.py, shows me what it knows about the song and plays the MIDI file using TiMidity (which also usefully prints the file's MIDI metadata). I listen for as long as I need to, then quit TiMidity. checkmidi.py then lets me either delete the file or keep it and move on to the next one.
- This cut the files down to 391.
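The review loop described above could look roughly like the sketch below. It is not checkmidi.py itself: `review`, `play_with_timidity` and `delete_file` are hypothetical, the sketch assumes a `timidity` binary on the PATH, and it only prints the file name where the real script shows the information gathered by getmidi.py. Playback, prompting and deletion are passed in as functions so the loop can be exercised without audio.

```python
import subprocess
from pathlib import Path


def play_with_timidity(path):
    """Play a MIDI file with TiMidity (assumed to be on the PATH) and wait for it to exit."""
    try:
        subprocess.run(["timidity", str(path)], check=False)
    except KeyboardInterrupt:
        # Ctrl-C stops playback but should not end the whole review session.
        pass


def delete_file(path):
    Path(path).unlink()


def review(paths, play=play_with_timidity, ask=input, delete=delete_file):
    """Play each file, ask whether to keep it and return the paths that were kept."""
    kept = []
    for path in paths:
        print(f"Now playing: {path}")
        play(path)
        answer = ask(f"{path}: [k]eep or [d]elete? ").strip().lower()
        if answer.startswith("d"):
            delete(path)
        else:
            # Anything other than "d" (including just pressing Enter) keeps the file.
            kept.append(path)
    return kept
```

Defaulting to "keep" is a deliberately cautious choice in this sketch: a mistyped answer never deletes a file.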
Detecting Mirrors
Although getmidi.py checked for exact copies of the same file while downloading, files are often distributed with only minor changes, for example to the instruments of each track, or even just to the metadata. This makes the file data different, but I still don't want both files.
My function jazzparser.utils.midi.note_on_similarity() (with the front end bin/data/loadmidi.py) tries to detect these by checking for note-on events that happen at the same time in two files. I used it to flag files that are effectively duplicates and checked them manually before deleting them.
- I removed a further 43 files in this way, leaving the corpus at 348 files.
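The actual note_on_similarity() implementation isn't reproduced here. The sketch below shows one plausible way to score two files by their note-on events: measure each onset in beats (so files with different tick resolutions or tempo markings still line up), snap it to a grid, and count how many (onset, pitch) pairs the two files share. `onset_overlap`, its `grid` parameter and the normalisation by the smaller file are assumptions of this sketch; extracting the events from the MIDI files (e.g. with a library such as mido) is left out.

```python
from collections import Counter


def onset_overlap(events_a, events_b, grid=0.25):
    """Fraction of note-on events two performances have in common.

    events_*: iterables of (time_in_beats, midi_pitch) for note-on events
    (velocity > 0). Times are snapped to a grid of `grid` beats so that small
    timing differences still match. Channels and programs are ignored, so a
    copy with only its instruments changed scores 1.0.
    """
    def snap(events):
        return Counter((round(t / grid), pitch) for t, pitch in events)

    a, b = snap(events_a), snap(events_b)
    if not a or not b:
        return 0.0
    # Counter & Counter keeps the minimum count of each shared (onset, pitch) pair.
    shared = sum((a & b).values())
    # Normalise by the smaller file so a truncated copy still scores highly.
    return shared / min(sum(a.values()), sum(b.values()))


tune = [(0.0, 60), (1.0, 64), (2.0, 67), (3.0, 72)]
reorchestrated = [(0.05, 60), (1.0, 64), (2.02, 67), (3.0, 72)]
print(onset_overlap(tune, reorchestrated))  # 1.0
```

Pairs scoring above some threshold (0.8, say, which is an arbitrary choice) would then be candidates for the kind of manual check described above. Dividing by the larger file instead would be stricter about partial copies.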
Manual collection
For some songs the search was misled by the title, or the song was simply too hard to find automatically. "All By Myself", for example, returned only the better-known song of that name, not the jazz standard by Irving Berlin.
In these cases I searched manually, using better search terms, in the hope of finding something.
I managed to find a MIDI file for:
- All By Myself
- Big Nick
- Chelsea Bridge
Results
The following table shows how many files I ended up with for each song.
| Song | MIDI files |
|---|---|
| Django | 1 |
| Afternoon in Paris | 4 |
| Ain't Misbehavin' | 25 |
| Agua De Beber | 6 |
| Ain't That a Kick in the Head | 2 |
| Airegin | 4 |
| Alfie | 7 |
| Alice in Wonderland | 4 |
| All Blues | 6 |
| All By Myself | 1 |
| All or Nothing At All | 3 |
| All the Things You Are | 25 |
| All the Way | 4 |
| Alright, Okay, You Win | 1 |
| Amor | 6 |
| Angel Eyes | 7 |
| Anthropology | 2 |
| Aren't You Glad You're You | 2 |
| As Long As I Live | 0 |
| Au Privave | 3 |
| Bark for Barksdale | 1 |
| Beauty and the Beast | 0 |
| Bernie's Tune | 4 |
| Bésame Mucho | 24 |
| Bessie's Blues | 1 |
| Between the Devil and the Deep Blue Sea | 2 |
| Beyond the Blue Horizon | 2 |
| Big Nick | 1 |
| Black Coffee | 0 |
| Black Nile | 0 |
| Black Orpheus | 8 |
| Blackberry Winter | 1 |
| Blue Bossa | 6 |
| Blue Champagne | 1 |
| Blue in Green | 5 |
| Blue Monk | 4 |
| Blues for Alice | 5 |
| Bluesette | 10 |
| Boplicity | 1 |
| Brazil | 10 |
| Bud Powell | 3 |
| Bye Bye Baby | 0 |
| Byrd Like | 0 |
| Call Me | 1 |
| Call Me Irresponsible | 5 |
| Can't Help Lovin' Dat Man | 3 |
| Celia | 0 |
| Central Park West | 1 |
| Chega De Saudade (No More Blues) | 5 |
| Chelsea Bridge | 1 |
| Child is Born, A | 2 |
| Chippie | 0 |
| Chitlins Con Carne | 1 |
| Come Fly With Me | 6 |
| Como En Vietnam | 0 |
| Confirmation | 2 |
| Contemplation | 0 |
| Crescent | 0 |
| Crazy | 12 |
| Crystal Silence | 1 |
| D Natural Blues | 0 |
| Daahoud | 3 |
| Dear Old Stockholm | 1 |
| Dearly Beloved | 0 |
| Dedicated to You | 0 |
| Desafinado | 16 |
| Dexterity | 4 |
| Dig | 0 |
| Dizzy Atmosphere | 1 |
| Dolores | 0 |
| Doin' the Pig | 0 |
| Domino Biscuit | 0 |
| Don't Blame Me | 9 |
| Don't Get Around Much Anymore | 21 |
| Don't Know Why | 7 |
| East of the Sun | 11 |
| Easy Living | 2 |
| Eighty One | 1 |
| End of a Love Affair, The | 1 |
| Equinox | 2 |
| Everything Happens to Me | 4 |
| Exactly Like You | 5 |
| Falling Grace | 0 |
| Falling in Love Again | 2 |
| Fine Romance, A | 5 |
| Fine and Mellow | 0 |
| 502 Blues | 0 |
| Follow Your Heart | 0 |
| Footprints | 2 |
| Take the A Train | 19 |
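Every song with a count of 0 in this table reappears in the omissions list. As a small illustration, here is a sketch that derives that list, plus the total number of files, from per-song counts. `summarise` is a hypothetical helper, and only three rows of the table are typed in; the rest would be entered the same way.

```python
def summarise(counts):
    """Summarise per-song MIDI file counts.

    counts: dict mapping song title -> number of MIDI files kept.
    Returns (total number of files, titles with no file, in table order).
    """
    total = sum(counts.values())
    missing = [title for title, n in counts.items() if n == 0]
    return total, missing


# A few rows copied from the table above.
counts = {"Django": 1, "As Long As I Live": 0, "Airegin": 4}
total, missing = summarise(counts)
print(total, missing)  # 5 ['As Long As I Live']
```

Because Python dicts preserve insertion order, the missing titles come out in the same order as the table rows.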
Omissions
I've got no MIDI file for the following:
- As Long As I Live
- Beauty and the Beast
- Black Coffee
- Black Nile
- Bye Bye Baby
- Byrd Like
- Celia
- Chippie
- Como En Vietnam
- Contemplation
- Crescent
- D Natural Blues
- Dearly Beloved
- Dedicated to You
- Dig
- Dolores
- Doin' the Pig
- Domino Biscuit
- Falling Grace
- Fine and Mellow
- 502 Blues
- Follow Your Heart