Most of the command-line scripts in the Jazz Parser, including the main parsing script (jazzparser), accept a --config option. This lets you give options and arguments in a config file instead of (or as well as) on the command line. It's handy for setting up experiments that you might want to repeat, since you don't need to remember the whole set of options you gave on the command line.

The options and arguments you can give in the config file are identical to those available on the command line. Try the --help option to see a list for any particular script.

Config File Syntax

The config files use a simple syntax and the options and arguments end up getting processed more or less as they would if you'd given them on the command line. There are some funky things you can do, though, such as inheriting from abstract config files.

Description

optname=value

# This line will be ignored
someopt = a_value    # This part will also be ignored

/path/to/file        # This is the first cmd-line argument
AM7 Bm7 E7 AM7       # This is the second - it's all one argument, not four

+derivations         # Equivalent to --derivations

You cannot use the one-letter versions of options: e.g. you must use tagger for the -t option. This is good, as it makes the file more readable. Check --help for the variants of option names. Note that some have more readable alternatives: e.g. you could use topt, but tagger-option is more intelligible.
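As a rough illustration of the line types above, here is a minimal Python sketch that classifies a single config line. It is not the code in jazzparser.utils.config: the function name parse_line and the tuple layout are made up for this example. It assumes that a # always starts a comment and that positional arguments never contain an = sign, and it does not handle %% directives.

```python
def parse_line(line):
    """Classify one config line (hypothetical helper, not the jazzparser API)."""
    # Assumption: everything from the first '#' onwards is a comment
    text = line.split("#", 1)[0].strip()
    if not text:
        # Blank line or comment-only line: nothing to process
        return None
    if text.startswith("+"):
        # "+derivations" stands for the flag --derivations
        return ("flag", text[1:])
    if "=" in text:
        # Split on the first '=' only, so values may contain '=' themselves
        name, value = text.split("=", 1)
        return ("option", name.strip(), value.strip())
    # Anything else is treated as a single positional argument, spaces included
    return ("argument", text)
```

With this sketch, the line "AM7 Bm7 E7 AM7" comes back as one argument rather than four, matching the behaviour described above.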

It's good to put a comment at the top so you know what script it's for.

# Config file for jazzparser

Long lines can be split by putting a \ before the linebreak. Whitespace at the start of the continuation line will be ignored (but not that before the \). For example, in the following, opts is all one option, so one line. No spaces will appear between the parts.

opts  = n=2:\
        backoff=1:\
        cutoff=2:\
        estimator=witten-bell
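The continuation rule can be sketched in Python as follows. This is only an illustration of the behaviour described above, with a hypothetical function name (join_continuations); the real processing may differ in detail.

```python
def join_continuations(lines):
    """Merge lines ending in a backslash with the line that follows them.

    Hypothetical helper illustrating the rule described in the text.
    """
    joined = []
    pending = None  # text collected so far from lines that ended in '\'
    for line in lines:
        line = line.rstrip("\n")
        if pending is not None:
            # Leading whitespace on the continuation line is dropped;
            # whitespace before the backslash was kept in `pending`
            line = pending + line.lstrip()
            pending = None
        if line.endswith("\\"):
            pending = line[:-1]
        else:
            joined.append(line)
    if pending is not None:
        # A trailing backslash on the last line: keep what we have
        joined.append(pending)
    return joined
```

Applied to the four lines of the opts example, this yields the single line "opts  = n=2:backoff=1:cutoff=2:estimator=witten-bell", with no spaces between the parts.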

Some special directives are available, listed below. These appear on a line of their own, beginning %%.

You may use certain substitutions in the options. %{X} will be replaced by the substitution value X. The main use of this is to specify paths relative to the project root, etc., rather than relative to the directory the script is run from. It's best to specify all paths in this way.

file = %{PROJECT_ROOT}/path/to/file

This value, PROJECT_ROOT, comes from the Jazz Parser's settings module, jazzparser.settings. (Note that, despite the way they appear in the API docs, these paths are constructed dynamically, so will always point to the right place on your system.) You can use any of the values defined in the settings as substitutions.

output = %{TEMP_DIR}/parser-output/%{CURRENT_VERSION}

Another source of substitution values is variables you've set yourself previously in the config file. Define a variable using the %% DEF directive. Then use it using the %{X} syntax.

%% DEF myvar someval
... %{myvar}

Here %{myvar} will be replaced by someval. This is particularly useful in the subclassing scenario described in the next section.
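To make the two sources of substitution values concrete, here is a small Python sketch. It is an assumption-laden illustration rather than the actual implementation: the names substitute and process are hypothetical, the settings are passed in as a plain dictionary instead of being read from jazzparser.settings, and an unknown name simply raises KeyError.

```python
import re

# Matches %{NAME} and captures NAME
SUBST = re.compile(r"%\{(\w+)\}")

def substitute(text, values):
    """Replace every %{NAME} in text with values[NAME] (hypothetical helper)."""
    def lookup(match):
        # A KeyError here means the name was never defined
        return str(values[match.group(1)])
    return SUBST.sub(lookup, text)

def process(lines, settings):
    """Apply settings values and '%% DEF' variables to a list of lines."""
    values = dict(settings)  # start from the settings-style constants
    output = []
    for line in lines:
        stripped = line.strip()
        if stripped.startswith("%% DEF "):
            # "%% DEF name value": the value is everything after the name
            _, _, name, value = stripped.split(None, 3)
            values[name] = substitute(value, values)
        else:
            output.append(substitute(line, values))
    return output
```

For instance, process(["%% DEF myvar someval", "out = %{TEMP_DIR}/%{myvar}"], {"TEMP_DIR": "/tmp"}) gives ["out = /tmp/someval"]. Note that in this sketch a variable must be defined before the line that uses it.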

Inheritance

The %% INCLUDE directive allows you to include another file. This allows for a sort of subclassing, or inheritance, of config files. You can define some common options in one file and import it into several others.

%% INCLUDE filename.conf

Say you want to run several parsing experiments, most of which use the same options. Create a file base.conf with the options that always stay the same:

# Config file for jazzparser
# Master file for all my exciting, but similar, experiments
%% ABSTRACT

# Define the common options here
...

Then create each of the individual experiments' config files:

# Config file for jazzparser
# The first of my suite of experiments
%% INCLUDE base.conf

# Define experiment-specific options here
...

The parser will look for base.conf in the same directory as the sub-config file. You can also give a relative path.

Note that the base config file used the ABSTRACT directive. This just tells the parser that it can't be used on its own, but has to be inherited by another file, avoiding confusing errors because of missing options.
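The interplay of INCLUDE and ABSTRACT might be pictured with the following Python sketch. It is not the parser's own code: load_lines and ConfigError are hypothetical names, and the sketch assumes that included filenames are resolved against the directory of the including file, as described above.

```python
import os

class ConfigError(Exception):
    """Raised for config files that cannot be used as given (hypothetical)."""

def load_lines(path, top_level=True):
    """Read a config file, expanding '%% INCLUDE' lines recursively."""
    base_dir = os.path.dirname(os.path.abspath(path))
    result = []
    with open(path) as handle:
        for raw in handle:
            line = raw.strip()
            if line == "%% ABSTRACT":
                if top_level:
                    # An abstract file is only valid when something includes it
                    raise ConfigError("%s is abstract and cannot be used directly" % path)
                continue
            if line.startswith("%% INCLUDE "):
                included = line[len("%% INCLUDE "):].strip()
                # Look for the included file next to the including file
                included_path = os.path.join(base_dir, included)
                result.extend(load_lines(included_path, top_level=False))
                continue
            result.append(line)
    return result
```

Loading an experiment file that includes base.conf returns the base options in place of the INCLUDE line, while loading base.conf on its own raises ConfigError straight away instead of failing later on a missing option.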

Variables can be useful with a structure like this. Instead of fully specifying alternative options in the sub-config files, they could just set the value of a variable before loading the base config.

# base.conf
%% ABSTRACT

# This output directory needs to be different for the different experiments
output   = %{PROJECT_ROOT}/etc/output/%{exp_name}/

# First experiment
# Set the variable
%% DEF exp_name my_first

# Now load the base config
%% INCLUDE base.conf

When dealing with this sort of structure, it can become confusing to get command-line arguments in the right order. Instead of just putting them on a line of their own, you can use a directive to specify a particular argument number. Then they don't need to be in order, so base.conf can define some and leave others to the sub-config files.

%% ARG 2 some-val
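A possible way to think about the ARG directive is as filling numbered slots, sketched below in Python. The function name collect_arguments is hypothetical. The sketch assumes that argument numbers start at 0 (as the "%% ARG 0" line in the full_eval.conf example further down suggests), that a later ARG for the same number overrides an earlier one, and that a gap in the numbering is an error; it ignores arguments given on plain lines.

```python
def collect_arguments(lines):
    """Build the positional argument list from '%% ARG n value' lines."""
    slots = {}
    for line in lines:
        parts = line.strip().split(None, 3)
        if len(parts) == 4 and parts[0] == "%%" and parts[1] == "ARG":
            # Assumption: a repeated number replaces the earlier value
            slots[int(parts[2])] = parts[3]
    if not slots:
        return []
    count = max(slots) + 1
    # Assumption: every slot up to the highest number must be filled
    missing = [n for n in range(count) if n not in slots]
    if missing:
        raise ValueError("no value given for argument(s) %s" % missing)
    return [slots[n] for n in range(count)]
```

Because the slots are keyed by number, the order in which the ARG lines appear does not matter, so a base file and the file including it can each supply some of them.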

All Directives

| Directive | Arguments | Description | Example |
|-----------|-----------|-------------|---------|
| INCLUDE | Filename | Include another config file, as if its contents appeared here | %% INCLUDE base.conf |
| ARG | Arg number, value | Specify the value to use as the numbered command-line argument | %% ARG 2 %{TEMP_DIR}/input-file |
| DEF | Var name, value | Set the value of a variable, which may later be used as %{my_var} | %% DEF my_var some-value |
| ABSTRACT | (none) | Declare that this config file must be inherited by another to be used | %% ABSTRACT |
| REQUIRE | Option name | Require the given option to be set by an inheriting config file; if it's not found, an error will be output | %% REQUIRE parser |
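The REQUIRE directive is not shown in the examples, so here is a hedged Python sketch of the kind of check it implies. The function name missing_required is hypothetical, the input is assumed to be the combined lines after all INCLUDEs have been expanded, and only "optname = value" lines are counted as setting an option.

```python
def missing_required(lines):
    """Return the names given to '%% REQUIRE' that no option line sets."""
    required = []
    defined = set()
    for line in lines:
        # Assumption: '#' always starts a comment
        text = line.split("#", 1)[0].strip()
        if text.startswith("%% REQUIRE "):
            required.append(text.split(None, 2)[2])
        elif "=" in text and not text.startswith("%%"):
            # "optname = value" counts as setting optname
            defined.add(text.split("=", 1)[0].strip())
    return [name for name in required if name not in defined]
```

A base file containing "%% REQUIRE parser" would thus be flagged unless the including file (or the base file itself) also has a "parser = ..." line.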

Examples

The following example shows much of this in action:

#*** This is a demo config file as an example of the syntax
#*** Comments beginning "***" are explanatory

#*** Comments, beginning with a #, are ignored
#*** Remember to put a comment so you know what script this was for
# Config for eval_tagger.py

#*** Arguments are just put on a line of their own
# Model type
ngram-multi
# Model name
bigram

#*** You can use substitutions like this to include constants 
#***  from jazzparser.settings. This is a good way to specify paths.
# Input sequences
%{PROJECT_ROOT}/input/sequences

#*** Options are given in the form "optname = value".
#*** This is equivalent to "--optname value" on the command line.
#*** You can only use long option names.

# Divide into 10 partitions
partitions = 10
# Use the tagrank parser
parser = tagrank

#*** There's no problem with including "="s in the value
# Give some options to the parser
# Dump the chart to a file
popt = dump_chart=%{TEMP_DIR}/chart

#*** Flags (options with no value) are specified by putting a + in 
#***  front of the flag name, like so:
# Store derivation traces
+derivations
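To see what a file like this amounts to on the command line, the following Python sketch turns config lines into an argument list. It is an illustration only: to_argv is a hypothetical name, substitutions are assumed to have been expanded already, directives are skipped, and placing the options before the positional arguments is simply a choice made for this example.

```python
def to_argv(lines):
    """Translate config lines into an equivalent command-line argument list."""
    options = []
    arguments = []
    for line in lines:
        # Assumption: '#' always starts a comment
        text = line.split("#", 1)[0].strip()
        if not text or text.startswith("%%"):
            continue  # blank lines, comments and directives are skipped here
        if text.startswith("+"):
            # "+derivations" becomes "--derivations"
            options.append("--" + text[1:])
        elif "=" in text:
            # "partitions = 10" becomes "--partitions 10"
            name, value = text.split("=", 1)
            options.extend(["--" + name.strip(), value.strip()])
        else:
            # A whole line is one positional argument
            arguments.append(text)
    return options + arguments
```

For the demo file above, with the %{...} values filled in, the result would contain --partitions 10, --parser tagrank, --popt dump_chart=... and --derivations, followed by the three positional arguments in the order they appear in the file.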

Here's an example of 'subclassing'. The following is in the file full_eval.conf:

# Model type
%% ARG 0 ngram
# Model name: this is left out so it can be supplied by configs that include this
# Input sequences
%% ARG 2 %{PROJECT_ROOT}/input/fullseqs

# Divide into 10 partitions
partitions  = 10

# Output parse results to a file
output      = %{PROJECT_ROOT}/etc/tmp/noparse/output-%{suffix}-

# Output accuracy values, not TS distance
+accuracy

Note that the second argument (ARG 1, since the numbering in this example starts at 0) has been omitted. Note also the %{suffix} substitution. This value will be defined by the including files. Here's one of them:

%% DEF suffix bigram
%% INCLUDE full_eval.conf
%% ARG 1 bigram-c2-uni

Stored Experiments

I'm putting config files for experiments I'm running in the directory %{PROJECT_ROOT}/input/config/.

Implementation

The processing is implemented in jazzparser.utils.config. See the API doc for more details.