Package jazzparser :: Package taggers :: Module tagger :: Class Tagger

Class Tagger

object --+
         |
        Tagger

The superclass of all taggers. Subclass this to create tagger components.

Probabilities are returned by the tagger along with signs. These are posterior probabilities for the C&C supertagging approach: that is, Pr(tag | observations). For the PCFG parser approach, the taggers must yield likelihoods: Pr(observation | tag). A tagger of this sort should have POSTERIOR set to False.
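
The two quantities are related by Bayes' rule: Pr(observation | tag) = Pr(tag | observation) * Pr(observation) / Pr(tag). The sketch below is not taken from the jazzparser source. It only illustrates how a posterior distribution could be turned into likelihoods that are scaled by the constant Pr(observation), by dividing by a tag prior. The function name, the tag names and the numbers are assumptions made for illustration.

```python
def posteriors_to_scaled_likelihoods(posteriors, priors):
    """Divide Pr(tag | obs) by Pr(tag) for each tag.

    The result equals Pr(obs | tag) / Pr(obs): a likelihood up to a factor
    that is the same for every tag when the observation is fixed.
    """
    return {tag: p / priors[tag] for tag, p in posteriors.items()}


# Hypothetical numbers for a single observation (one chord).
posteriors = {"T": 0.6, "D": 0.3, "S": 0.1}   # Pr(tag | obs)
priors = {"T": 0.5, "D": 0.25, "S": 0.25}     # Pr(tag)

scaled = posteriors_to_scaled_likelihoods(posteriors, priors)
```

Because the missing factor is constant across tags, the scaled values preserve the relative ranking that true likelihoods would give for that observation.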

Instance Methods

__init__(self, grammar, input, options={}, original_input=None, logger=None)
The tagger must have a reference to the grammar being used to parse the input.

_get_input_length(self)
Should return the number of words (chords) in the input, or some other measure of input length appropriate to the type of tagger.

_get_name(self)

get_all_signs(self)
Gets all signs that the tagger will return, regardless of offset. Return type: dict.

get_signs(self, offset=0)
Returns a list of tuples (start, end, signtup).

get_string_input(self)
Returns a list of string representations of the inputs.

get_tag_probability(self, index, tag, end_index=None)
Returns as a float the probability with which the tagger judges the given span will be assigned the given sign.

get_word(self, index)
Returns the input word at this index.

get_word_duration(self, index)
Returns the duration of the word at this index if durations are available.

Inherited from object: __delattr__, __format__, __getattribute__, __hash__, __new__, __reduce__, __reduce_ex__, __repr__, __setattr__, __sizeof__, __str__, __subclasshook__

Class Methods

check_options(cls, options)
Normally, options are validated when the tagger is instantiated.

Class Variables

COMPATIBLE_FORMALISMS = []

INPUT_TYPES = []
List of allowed input datatypes.

LEXICAL_PROBABILITY = False
Some models provide lexical probabilities that the parsing models can use.

TAGGER_OPTIONS = []
Tagger-specific options.

shell_tools = []
Interactive shell tools available when this tagger is used.

Properties

input_length
Should return the number of words (chords) in the input, or some other measure of input length appropriate to the type of tagger.

name

Inherited from object: __class__

Method Details

__init__(self, grammar, input, options={}, original_input=None, logger=None)
(Constructor)

The tagger must have a reference to the grammar being used to parse the input. It must also be given the full input when instantiated. The format of this input depends on the tagger: for example, it might be a string or a MIDI file.

Parameters:

original_input - the input in its original, unprocessed form. This will usually be a string. It is optional, but some features fail without it: for example, using a backoff model as backoff from a tagging model requires the original input to be passed to the backoff model.
logger - optional progress logger. Logging will be sent to this during initialization of the tagger and tagging. If not given, the logging will be lost. Subclasses may access the logger (or a dummy logger if none was given) in self.logger.

Overrides: object.__init__
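
As a rough illustration of the constructor contract, the sketch below subclasses a stand-in base class rather than the real jazzparser Tagger, so that it runs on its own. Only the signature and the self.logger attribute come from the description above. StandInTagger, SplitTagger, the other attribute names and the use of the standard logging module as the dummy logger are assumptions.

```python
import logging


class StandInTagger(object):
    """Stand-in for the documented base class (not the real implementation)."""

    def __init__(self, grammar, input, options={}, original_input=None, logger=None):
        self.grammar = grammar
        self.input = input
        self.options = dict(options)  # copy, so the shared default is never mutated
        self.original_input = original_input
        if logger is None:
            # Assumed behaviour: fall back to a logger that discards everything.
            logger = logging.getLogger("dummy_tagger_logger")
            logger.addHandler(logging.NullHandler())
        self.logger = logger


class SplitTagger(StandInTagger):
    """Hypothetical tagger whose input is a whitespace-separated chord string."""

    def __init__(self, grammar, input, options={}, original_input=None, logger=None):
        super(SplitTagger, self).__init__(
            grammar, input.split(), options,
            original_input=original_input, logger=logger)
        # Subclasses can log through self.logger whether or not one was given.
        self.logger.info("Tagging %d chords", len(self.input))


raw = "C F G7 C"
tagger = SplitTagger(grammar=None, input=raw, original_input=raw)
```

Passing original_input alongside the processed input keeps the unprocessed form available for components, such as a backoff model, that need it later.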

check_options(cls, options)
Class Method

Normally, options are validated when the tagger is instantiated. This allows you to check them before that.
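
The following sketch shows why an early check can be useful: a mistyped option name is caught before any expensive model loading happens in the constructor. It does not reproduce the real option handling. The (name, converter) pairs stand in for the ModuleOptions held in TAGGER_OPTIONS, and the return value and the ValueError are assumptions.

```python
class StandInTagger(object):
    # Hypothetical option spec; the real class holds a list of ModuleOptions.
    TAGGER_OPTIONS = [("batch", int), ("threshold", float)]

    @classmethod
    def check_options(cls, options):
        """Validate raw option values without building a tagger."""
        known = dict(cls.TAGGER_OPTIONS)
        checked = {}
        for name, raw in options.items():
            if name not in known:
                raise ValueError("unknown tagger option: %s" % name)
            checked[name] = known[name](raw)  # the converter may raise ValueError
        return checked

    def __init__(self, grammar, input, options={}):
        # The same validation runs at instantiation time.
        self.options = self.check_options(options)


checked = StandInTagger.check_options({"batch": "5", "threshold": "0.1"})

try:
    StandInTagger.check_options({"bacth": "5"})  # misspelled option name
    typo_caught = False
except ValueError:
    typo_caught = True
```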

get_all_signs(self)

Gets all signs that the tagger will return, regardless of offset. This just uses get_signs to get the signs for every offset.

Returns: dict: all the signs, keyed by (start,end) tuple
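
One way to picture this is a loop over offsets that groups the returned spans by their (start, end) key. This is a sketch under assumptions, not the library's code: it takes an empty batch to mean there are no further signs, and it stores a list of signtups under each key. Both choices are hypothetical.

```python
def get_all_signs(get_signs):
    """Collect signs from every offset into a dict keyed by (start, end)."""
    signs = {}
    offset = 0
    while True:
        batch = get_signs(offset)
        if not batch:
            break  # assumed stopping rule: an empty batch means nothing is left
        for start, end, signtup in batch:
            signs.setdefault((start, end), []).append(signtup)
        offset += 1
    return signs


# Toy tagger output: two batches, then nothing.
BATCHES = [
    [(0, 1, ("signA", "A", 0.7)), (1, 2, ("signB", "B", 0.9))],
    [(0, 1, ("signC", "C", 0.2))],
]


def toy_get_signs(offset=0):
    return BATCHES[offset] if offset < len(BATCHES) else []


all_signs = get_all_signs(toy_get_signs)
```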

get_signs(self, offset=0)

Returns a list of tuples (start, end, signtup). These represent spans to be added to the chart, start and end being the start and end nodes.

Each signtup is a (sign,tag,probability) tuple representing a sign that the tagger wishes to add to the chart in this position. How many are returned is up to the tagger (it may wish to return more in cases where there are no clear winners, for example). If the tag is not found in the grammar, sign will be None.

Returned list is sorted by probability, highest first.

offset may be set >0 in order to retrieve further signs once some have already been returned. If offset=k, the tagger should disregard all the signs that would have been returned for offset<k and return the next batch, as many as it sees fit. offset is incremented each time the parse fails.

The simplest approach, and that employed by most taggers, has some signs for each word and none spanning more than one word. That is, the tuples in the list would be of the form (wordnum, wordnum+1, signtup). This is by no means required, though: some taggers will want to add multi-node spans to the chart.

Note: This functionality used to be provided by get_signs_for_word. For convenience, if a tagger provides get_signs_for_word and not get_signs(), the results of the former will be used to produce the latter. New taggers should not do this, but override this method directly.
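
The contract described above can be sketched with a toy tagger that proposes single-word spans in fixed-size batches. ToyTagger, its BATCH size, the lexicon dictionary standing in for the grammar and the candidate probabilities are all hypothetical. A real tagger is free to choose how many signs to return per offset and may add multi-word spans.

```python
class ToyTagger(object):
    BATCH = 2  # hypothetical: signs returned per word for each offset

    def __init__(self, lexicon, candidates):
        self.lexicon = lexicon        # tag -> sign; stands in for the grammar
        self.candidates = candidates  # per word: list of (tag, probability)

    def get_signs(self, offset=0):
        spans = []
        for wordnum, tags in enumerate(self.candidates):
            ranked = sorted(tags, key=lambda tp: tp[1], reverse=True)
            # Skip everything that was already returned for smaller offsets.
            lo, hi = offset * self.BATCH, (offset + 1) * self.BATCH
            for tag, prob in ranked[lo:hi]:
                sign = self.lexicon.get(tag)  # None if the tag is not in the grammar
                spans.append((wordnum, wordnum + 1, (sign, tag, prob)))
        # Highest probability first.
        spans.sort(key=lambda span: span[2][2], reverse=True)
        return spans


tagger = ToyTagger(
    lexicon={"T": "sign:T", "D": "sign:D"},
    candidates=[[("T", 0.6), ("D", 0.3), ("X", 0.1)], [("D", 0.8), ("T", 0.2)]],
)
first = tagger.get_signs()           # best two tags per word
second = tagger.get_signs(offset=1)  # what is left after a failed parse
```

In this toy data the tag "X" has no entry in the lexicon, so its signtup carries None as the sign.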

get_string_input(self)

Returns a list of string representations of the inputs. This is just a convenience function, which uses whatever representation gets returned by get_word() to produce a representation of the whole input.
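
A minimal sketch of that relationship, assuming the words are objects with a readable __str__ and that input_length gives the number of words. The Chord and ToyTagger classes are illustrative stand-ins, not jazzparser types.

```python
class Chord(object):
    """Hypothetical input word: not a string, but printable."""

    def __init__(self, root, quality=""):
        self.root = root
        self.quality = quality

    def __str__(self):
        return self.root + self.quality


class ToyTagger(object):
    def __init__(self, words):
        self.words = words

    @property
    def input_length(self):
        return len(self.words)

    def get_word(self, index):
        return self.words[index]

    def get_string_input(self):
        # Reuse whatever representation get_word() provides for each position.
        return [str(self.get_word(i)) for i in range(self.input_length)]


tagger = ToyTagger([Chord("C"), Chord("D", "m7"), Chord("G", "7")])
strings = tagger.get_string_input()
```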

get_tag_probability(self, index, tag, end_index=None)

Returns as a float the probability with which the tagger judges the given span will be assigned the given sign.

If end_index is not given, it defaults to index+1.
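
The default can be illustrated with a small lookup table. The storage layout and the choice to return 0.0 for unknown spans or tags are assumptions made for this sketch. The documentation above only specifies the float return type and the end_index default.

```python
class ToyTagger(object):
    def __init__(self, span_probs):
        # Hypothetical storage: (start, end) -> {tag: probability}
        self.span_probs = span_probs

    def get_tag_probability(self, index, tag, end_index=None):
        if end_index is None:
            end_index = index + 1  # single-word span by default
        # Assumed behaviour: unknown spans or tags get probability 0.0.
        return float(self.span_probs.get((index, end_index), {}).get(tag, 0.0))


tagger = ToyTagger({(0, 1): {"T": 0.6}, (0, 2): {"T": 0.25}})
single = tagger.get_tag_probability(0, "T")              # span (0, 1)
double = tagger.get_tag_probability(0, "T", end_index=2)  # span (0, 2)
```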

get_word(self, index)

Returns the input word at this index. This does not need to be a string, but must have a sensible __str__, so that it can be converted to a readable string. The purpose of this is to provide a readable form of the input for the parser to store in derivation traces.

get_word_duration(self, index)

Returns the duration of the word at this index if durations are available. Otherwise raises an AttributeError.
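
Callers therefore need to be ready for the exception. The sketch below shows one possible pattern. ToyTagger, the way it stores durations and the duration_or_default helper are hypothetical.

```python
class ToyTagger(object):
    def __init__(self, words, durations=None):
        self.words = words
        if durations is not None:
            self.durations = durations  # only set when durations are known

    def get_word_duration(self, index):
        # Reading self.durations raises AttributeError if it was never set.
        return self.durations[index]


def duration_or_default(tagger, index, default=1):
    """Return the word's duration, or a default when none are available."""
    try:
        return tagger.get_word_duration(index)
    except AttributeError:
        return default


with_durations = duration_or_default(ToyTagger(["C", "F"], [4, 2]), 1)
without_durations = duration_or_default(ToyTagger(["C", "F"]), 1)
```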

Class Variable Details

INPUT_TYPES

List of allowed input datatypes. See jazzparser.data.input.INPUT_TYPES.

Value: []

LEXICAL_PROBABILITY

Some models provide lexical probabilities that the parsing models can use. Such models should set this to True and provide a method lexical_probability(start_time, end_time, span_label).

Value: False

TAGGER_OPTIONS

Tagger-specific options. List of ModuleOptions.

Value: []

Property Details

input_length

Should return the number of words (chords) in the input, or some other measure of input length appropriate to the type of tagger.

Get Method: _get_input_length(self)

name

Get Method: _get_name(self)
