Package jazzparser :: Package utils :: Module distance

Module distance

Algorithms for commonly-used distance metrics.


Author: Mark Granroth-Wilding <mark.granroth-wilding@ed.ac.uk>

Functions
 
levenshtein_distance(seq1, seq2, delins_cost=1, subst_cost_fun=None)
Compute the Levenshtein distance between two sequences.
 
levenshtein_distance_with_pointers(seq1, seq2, delins_cost=1, subst_cost_fun=None)
Compute the Levenshtein distance between two sequences.
 
align(seq1, seq2, delins_cost=1, subst_cost=None, dist=False)
Finds the optimal alignment of the two sequences using Levenshtein distance and traces back the pointers to find the alignment.
 
local_levenshtein_distance(seq1, seq2, delins_cost=1, subst_cost_fun=None)
Compute a local alignment variant of the Levenshtein distance between two sequences.
Variables
  __package__ = None
Function Details

levenshtein_distance(seq1, seq2, delins_cost=1, subst_cost_fun=None)

Compute the Levenshtein distance between two sequences. By default, elements are compared using the == operator, but any binary function can be supplied as subst_cost_fun to change how pairs of elements are scored.

delins_cost is the cost applied for deletions and insertions.

subst_cost_fun is a binary function that gives the cost of substituting its first argument with its second. If it is not given, a cost of delins_cost is used for any substitution.
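The parameters above can be illustrated with a minimal sketch of the standard dynamic-programming recurrence. This is not the module's source: the name levenshtein_sketch is hypothetical, and the default substitution cost (zero for elements that compare equal, delins_cost otherwise) is an assumption about how the == comparison and the default cost combine.

```python
def levenshtein_sketch(seq1, seq2, delins_cost=1, subst_cost_fun=None):
    """Illustrative Levenshtein distance; not the jazzparser implementation."""
    # Assumed default: equal elements cost 0, unequal ones cost delins_cost.
    if subst_cost_fun is None:
        subst_cost_fun = lambda a, b: 0 if a == b else delins_cost

    # prev[j] is the cost of aligning the seq1 prefix seen so far with seq2[:j].
    prev = [j * delins_cost for j in range(len(seq2) + 1)]
    for i, a in enumerate(seq1, start=1):
        curr = [i * delins_cost]  # seq2 prefix is empty: i deletions
        for j, b in enumerate(seq2, start=1):
            curr.append(min(
                prev[j] + delins_cost,               # delete a from seq1
                curr[j - 1] + delins_cost,           # insert b from seq2
                prev[j - 1] + subst_cost_fun(a, b),  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


# A case-insensitive comparison supplied through subst_cost_fun.
same_letter = lambda a, b: 0 if a.lower() == b.lower() else 1
print(levenshtein_sketch("kitten", "sitting"))
print(levenshtein_sketch("ABC", "abc", subst_cost_fun=same_letter))
```

Only two rows of the cost table are kept here, which is enough when just the final distance is needed.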

levenshtein_distance_with_pointers(seq1, seq2, delins_cost=1, subst_cost_fun=None)

Compute the Levenshtein distance between two sequences. This does the same thing as levenshtein_distance, but stores pointers to indicate what alignments gave the costs and returns the full cost matrix, plus the pointer matrix.

seq2 is aligned with seq1: that is, a deletion indicates that seq1 moves on a cell without a corresponding cell in seq2.
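A rough sketch of how a cost matrix and a pointer matrix might be filled together is given here. The function name, the matrix layout (rows indexed by seq1, columns by seq2), the tie-breaking order and the default substitution cost are all assumptions made for illustration; only the operation labels I, D and S come from this documentation.

```python
def levenshtein_with_pointers_sketch(seq1, seq2, delins_cost=1, subst_cost_fun=None):
    """Illustrative cost and pointer matrices; not the jazzparser implementation."""
    # Assumed default: equal elements cost 0, unequal ones cost delins_cost.
    if subst_cost_fun is None:
        subst_cost_fun = lambda a, b: 0 if a == b else delins_cost

    rows, cols = len(seq1) + 1, len(seq2) + 1
    costs = [[0] * cols for _ in range(rows)]
    pointers = [[None] * cols for _ in range(rows)]

    # First column: seq1 moves on with nothing in seq2, so only deletions.
    for i in range(1, rows):
        costs[i][0] = i * delins_cost
        pointers[i][0] = "D"
    # First row: only insertions of seq2 elements.
    for j in range(1, cols):
        costs[0][j] = j * delins_cost
        pointers[0][j] = "I"

    for i in range(1, rows):
        for j in range(1, cols):
            options = [
                (costs[i - 1][j - 1] + subst_cost_fun(seq1[i - 1], seq2[j - 1]), "S"),
                (costs[i - 1][j] + delins_cost, "D"),
                (costs[i][j - 1] + delins_cost, "I"),
            ]
            # min() keeps the first minimal option, so ties prefer S, then D, then I.
            costs[i][j], pointers[i][j] = min(options, key=lambda opt: opt[0])
    return costs, pointers


costs, pointers = levenshtein_with_pointers_sketch("abc", "ac")
print(costs[-1][-1], pointers[-1][-1])
```

The bottom-right cell of the cost matrix holds the overall distance, and following the pointers back from that cell recovers one optimal alignment.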

align(seq1, seq2, delins_cost=1, subst_cost=None, dist=False)

Finds the optimal alignment of the two sequences using Levenshtein distance and traces back the pointers to find the alignment. Returns a list of pairs, containing the points from the two lists.

In the case of a substitution, it will contain the two points that were aligned. In the case of an insertion, the first value will be None and the second the inserted value. In the case of a deletion, the second value will be None and the first the deleted value.

Note that the two values paired by a substitution may be equal (a true alignment) or different (a real substitution), depending on the substitution cost function.

Parameters:
  • dist (bool) - if True, return a tuple of the alignment and the distance
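The shape of the returned alignment can be seen in a small sketch that fills the cost matrix and then traces back from the final cell. The name align_sketch is hypothetical, and the default substitution cost and the preference order during traceback are assumptions rather than the module's documented behaviour.

```python
def align_sketch(seq1, seq2, delins_cost=1, subst_cost=None, dist=False):
    """Illustrative alignment by traceback; not the jazzparser implementation."""
    # Assumed default: equal elements cost 0, unequal ones cost delins_cost.
    if subst_cost is None:
        subst_cost = lambda a, b: 0 if a == b else delins_cost

    rows, cols = len(seq1) + 1, len(seq2) + 1
    costs = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        costs[i][0] = i * delins_cost
    for j in range(1, cols):
        costs[0][j] = j * delins_cost
    for i in range(1, rows):
        for j in range(1, cols):
            costs[i][j] = min(
                costs[i - 1][j - 1] + subst_cost(seq1[i - 1], seq2[j - 1]),
                costs[i - 1][j] + delins_cost,
                costs[i][j - 1] + delins_cost,
            )

    # Walk back from the final cell, preferring substitution, then deletion.
    # Exact equality tests assume integer (or otherwise exact) costs.
    pairs = []
    i, j = len(seq1), len(seq2)
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and
                costs[i][j] == costs[i - 1][j - 1] + subst_cost(seq1[i - 1], seq2[j - 1])):
            pairs.append((seq1[i - 1], seq2[j - 1]))  # aligned or substituted
            i, j = i - 1, j - 1
        elif i > 0 and costs[i][j] == costs[i - 1][j] + delins_cost:
            pairs.append((seq1[i - 1], None))         # deletion
            i -= 1
        else:
            pairs.append((None, seq2[j - 1]))         # insertion
            j -= 1
    pairs.reverse()
    return (pairs, costs[-1][-1]) if dist else pairs


print(align_sketch("abc", "ac"))
print(align_sketch("ac", "abc", dist=True))
```

In the first call the element b has no partner in seq2, so it appears as a deletion with None in the second position; in the second call it appears as an insertion with None in the first position.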

local_levenshtein_distance(seq1, seq2, delins_cost=1, subst_cost_fun=None)

Compute a local alignment variant of the Levenshtein distance between two sequences. Options are the same as levenshtein_distance_with_pointers.

Finds the optimal alignment of seq2 within seq1.

In addition to the operations I, D and S used in levenshtein_distance_with_pointers, '.' is used here to indicate a zero-cost deletion at the beginning or end of seq1.
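One common way to obtain this kind of local variant is to make skipping a prefix or suffix of seq1 free, which is what the following sketch does. It returns only the distance, whereas the module's function may return more; the name local_levenshtein_sketch and the default substitution cost are assumptions.

```python
def local_levenshtein_sketch(seq1, seq2, delins_cost=1, subst_cost_fun=None):
    """Illustrative local distance of seq2 within seq1; not the jazzparser code."""
    # Assumed default: equal elements cost 0, unequal ones cost delins_cost.
    if subst_cost_fun is None:
        subst_cost_fun = lambda a, b: 0 if a == b else delins_cost

    # Before any of seq1 is consumed, seq2[:j] can only be inserted.
    prev = [j * delins_cost for j in range(len(seq2) + 1)]
    best = prev[-1]
    for a in seq1:
        # Column 0 is always 0: leading elements of seq1 are skipped for free.
        curr = [0]
        for j, b in enumerate(seq2, start=1):
            curr.append(min(
                prev[j] + delins_cost,               # delete a inside the match
                curr[j - 1] + delins_cost,           # insert b
                prev[j - 1] + subst_cost_fun(a, b),  # substitute (or match)
            ))
        # Stopping after any row is allowed: trailing elements of seq1 are free.
        best = min(best, curr[-1])
        prev = curr
    return best


print(local_levenshtein_sketch("xxabcxx", "abc"))
print(local_levenshtein_sketch("xxabxcxx", "abc"))
```

The two free regions correspond to the '.' operations described above: one run before the matched span of seq1 and one run after it.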