# Bio.Var: Biological Variation.

## Identifying and counting mutations

You can count the number of mutations of different types, in a pairwise manner, for a set of aligned nucleotide sequences.

### Different types of mutation

The types of mutations that can currently be counted are `DifferentMutation`s, `TransitionMutation`s, and `TransversionMutation`s.

**`Bio.Var.AnyMutation`** (*Type*)

`AnyMutation` describes a site where two aligned nucleotides are not the same.

Every kind of difference is counted.

**`Bio.Var.TransitionMutation`** (*Type*)

`TransitionMutation` describes a site where two aligned nucleotides differ because a purine has mutated into the other purine, or a pyrimidine has mutated into the other pyrimidine.

Possible transition mutations are:

- A <-> G
- C <-> T

**`Bio.Var.TransversionMutation`** (*Type*)

`TransversionMutation` describes a site where two aligned nucleotides differ because a purine has mutated into a pyrimidine, or vice versa.

Possible transversion mutations are:

- A <-> T
- C <-> G
- A <-> C
- T <-> G
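To make the two classes concrete, here is a minimal sketch in plain Julia 1.x that classifies pairs of DNA bases. It assumes bases are uppercase `Char`s drawn from `A`, `C`, `G`, `T`; the helpers `ispurine`, `ispyrimidine`, `is_transition` and `is_transversion` are hypothetical and are not part of Bio.Var.

```julia
# Hypothetical helpers (not Bio.Var): bases are uppercase DNA Chars.
ispurine(b::Char)     = b in ('A', 'G')
ispyrimidine(b::Char) = b in ('C', 'T')

# Transition: the bases differ but stay within the same class.
is_transition(a::Char, b::Char) =
    a != b && ((ispurine(a) && ispurine(b)) || (ispyrimidine(a) && ispyrimidine(b)))

# Transversion: one base is a purine and the other a pyrimidine.
is_transversion(a::Char, b::Char) =
    (ispurine(a) && ispyrimidine(b)) || (ispyrimidine(a) && ispurine(b))

# Enumerate every unordered pair of distinct bases and classify it.
bases = ['A', 'C', 'G', 'T']
base_pairs = [(a, b) for a in bases for b in bases if a < b]
filter(p -> is_transition(p...), base_pairs)    # [('A', 'G'), ('C', 'T')]
filter(p -> is_transversion(p...), base_pairs)  # [('A', 'C'), ('A', 'T'), ('C', 'G'), ('G', 'T')]
```

Every pair of distinct bases falls into exactly one of the two classes, which is why the two lists above together cover all six pairs.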

### The `count_mutations` method

Mutations are counted using the `count_mutations` method. The method outputs a tuple. The first value is the number of mutations counted. The second value is the number of sites examined. Sites that contain gaps or ambiguous nucleotides are not examined, so this second value can be less than the length of the two biological sequences.

```julia
using Bio.Seq, Bio.Var

# Count all differences, only transitions, or only transversions.
count_mutations(DifferentMutation, [dna"ATCGATCG", dna"ACCGATCG"])
count_mutations(TransitionMutation, [dna"ATCGATCG", dna"ACCGATCG"])
count_mutations(TransversionMutation, [dna"ATCGATCG", dna"ACCGATCG"])
# Count transitions and transversions in a single call.
count_mutations(TransitionMutation, TransversionMutation, [dna"ATCGATCG", dna"ACCGATCG"])

# The same calls work on RNA sequences.
count_mutations(DifferentMutation, [rna"AUCGAUCG", rna"ACCGAUCG"])
count_mutations(TransitionMutation, [rna"AUCGAUCG", rna"ACCGAUCG"])
count_mutations(TransversionMutation, [rna"AUCGAUCG", rna"ACCGAUCG"])
count_mutations(TransitionMutation, TransversionMutation, [rna"AUCGAUCG", rna"ACCGAUCG"])
```
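The counting rule described above (count differing sites, and skip sites with gaps or ambiguous symbols) can be sketched in plain Julia for a single pair of aligned sequences. This is a hypothetical helper, `count_differences`, and not Bio.Var's implementation; it assumes the sequences are `String`s and treats any character other than `A`, `C`, `G`, `T` as a gap or ambiguous symbol.

```julia
# Hypothetical sketch (not Bio.Var): only unambiguous DNA bases are examined.
isvalidbase(c::Char) = c in ('A', 'C', 'G', 'T')

"""
    count_differences(a, b) -> (mutations, sites)

Count the sites at which two aligned sequences differ, skipping every site
where either sequence has a gap or an ambiguous symbol.
"""
function count_differences(a::AbstractString, b::AbstractString)
    length(a) == length(b) || throw(ArgumentError("sequences must be aligned to equal length"))
    mutations = 0
    sites = 0
    for (x, y) in zip(a, b)
        # Gaps and ambiguous symbols are not examined at all.
        (isvalidbase(x) && isvalidbase(y)) || continue
        sites += 1
        mutations += (x != y)
    end
    return mutations, sites
end

count_differences("ATCGATCG", "ACCGATCG")  # (1, 8)
count_differences("ATCG-TCN", "ACCGATCG")  # (1, 6): the gap and the N are skipped
```

The second call shows why the number of examined sites can be smaller than the sequence length.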

### The `is_mutation` method

**`Bio.Var.is_mutation`** (*Function*)

`is_mutation{T<:Nucleotide}(::Type{AnyMutation}, a::T, b::T)`

Test if two nucleotides constitute an `AnyMutation`.

`is_mutation{T<:Nucleotide}(::Type{TransitionMutation}, a::T, b::T)`

Test if two nucleotides constitute a `TransitionMutation`.

`is_mutation{T<:Nucleotide}(::Type{TransversionMutation}, a::T, b::T)`

Test if two nucleotides constitute a `TransversionMutation`.
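The three `is_mutation` signatures differ only in their first argument, which is a type, so Julia's multiple dispatch chooses which test runs. The self-contained sketch below imitates that idiom in Julia 1.x with hypothetical types (`AnyKind`, `TransitionKind`, `TransversionKind`) and a hypothetical function `ismut` on plain `Char`s; it illustrates the dispatch pattern only and is not Bio.Var's code.

```julia
# Hypothetical singleton types mirroring the ::Type{...} dispatch pattern
# used by is_mutation; these are not Bio.Var's types.
abstract type MutationKind end
struct AnyKind          <: MutationKind end
struct TransitionKind   <: MutationKind end
struct TransversionKind <: MutationKind end

ispurine(b::Char)     = b in ('A', 'G')
ispyrimidine(b::Char) = b in ('C', 'T')

# One method per kind; the type passed as the first argument selects it.
ismut(::Type{AnyKind}, a::Char, b::Char) = a != b
ismut(::Type{TransitionKind}, a::Char, b::Char) =
    a != b && ((ispurine(a) && ispurine(b)) || (ispyrimidine(a) && ispyrimidine(b)))
ismut(::Type{TransversionKind}, a::Char, b::Char) =
    (ispurine(a) && ispyrimidine(b)) || (ispyrimidine(a) && ispurine(b))

# Generic code is written once and works for every kind.
count_kind(::Type{K}, x::AbstractString, y::AbstractString) where {K<:MutationKind} =
    count(p -> ismut(K, p...), zip(x, y))

count_kind(AnyKind, "ATCGATCG", "ACCGATCG")           # 1
count_kind(TransversionKind, "ATCGATCG", "ACCGATCG")  # 0
```

Passing the type rather than a flag or string lets new mutation kinds be added as new methods without touching the generic counting code.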

## Computing evolutionary and genetic distances

Just as you can count the number of mutations between two nucleotide sequences, you can also compute the evolutionary distance between them.

### Different evolutionary distance measures

The types of distances that can currently be computed are described below.

**`Bio.Var.Count`** (*Type*)

`Count{T}` is a distance equal to the number of mutations of type `T` that exist between two sequences.

**`Bio.Var.Proportion`** (*Type*)

`Proportion{T}` is a distance equal to the number of mutations of type `T` that exist between two biological sequences, divided by the number of valid sites examined (sites that don't contain a gap or an ambiguous symbol).

In other words, this so-called p-distance is the proportion of examined sites at which a pair of sequences differs by a mutation of type `T`.
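As a rough illustration of computing the p-distance over a set of sequences, the sketch below evaluates it for every pair `i < j`. The function names `p_distance` and `pairwise_p`, the pair ordering, and the choice to treat only `A`, `C`, `G`, `T` in `String`s as valid are assumptions of this sketch, not Bio.Var's documented behaviour.

```julia
# Hypothetical sketch (not Bio.Var): sites with anything other than
# A, C, G, T in either sequence are not examined.
isvalidbase(c::Char) = c in ('A', 'C', 'G', 'T')

# p-distance for one pair: (differing sites / examined sites, examined sites).
# The proportion is NaN when no site could be examined (0 / 0).
function p_distance(a::AbstractString, b::AbstractString)
    m = 0
    n = 0
    for (x, y) in zip(a, b)
        (isvalidbase(x) && isvalidbase(y)) || continue
        n += 1
        m += (x != y)
    end
    return m / n, n
end

# Every pair i < j, ordered (1,2), (1,3), ..., (2,3), ...
function pairwise_p(seqs::Vector{<:AbstractString})
    k = length(seqs)
    results = [p_distance(seqs[i], seqs[j]) for i in 1:k for j in (i + 1):k]
    return first.(results), last.(results)
end

seqs = ["ATCGATCG", "ACCGATCG", "ACCGATGG"]
ps, ns = pairwise_p(seqs)  # ps == [0.125, 0.25, 0.125], ns == [8, 8, 8]
```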

**`Bio.Var.JukesCantor69`** (*Type*)

The `JukesCantor69` distance is a p-distance corrected using the substitution model developed by Jukes and Cantor in 1969.

The Jukes and Cantor model assumes that all substitutions (i.e. the replacement of one base by another) have the same probability, and that this probability is the same at every site along the DNA sequence.
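In its usual textbook form, the Jukes and Cantor correction turns a p-distance `p` over `n` examined sites into `d = -(3/4) * ln(1 - 4p/3)`, with large-sample variance `p(1 - p) / (n * (1 - 4p/3)^2)`. The sketch below implements those two formulas in plain Julia; the function name `jc69` is hypothetical, and it is an assumption that Bio.Var uses exactly this variance estimator.

```julia
# Hypothetical sketch of the textbook JC69 correction (not Bio.Var's code).
# p: proportion of differing sites (the p-distance); n: number of sites examined.
function jc69(p::Real, n::Integer)
    w = 1 - 4p / 3
    w > 0 || throw(DomainError(p, "the JC69 distance is undefined for p >= 0.75"))
    d = -3 / 4 * log(w)          # corrected distance
    v = p * (1 - p) / (n * w^2)  # large-sample variance of d
    return d, v
end

d, v = jc69(0.125, 8)  # d ≈ 0.1367
```

The correction is always at least as large as `p` itself, because it accounts for multiple substitutions at the same site that a raw count cannot see.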

**`Bio.Var.Kimura80`** (*Type*)

The `Kimura80` distance uses a substitution model developed by Kimura in 1980. It is sometimes called Kimura's two-parameter distance.

The model makes the same assumptions as the Jukes and Cantor model, with one crucial difference: it distinguishes two kinds of mutation, transitions and transversions, which may occur with different probabilities. Both the transition rate and the transversion rate are still assumed to be the same at every site along the DNA sequence.
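With `P` and `Q` the proportions of examined sites that show a transition and a transversion respectively, the textbook Kimura two-parameter estimate is `d = -(1/2) * ln(1 - 2P - Q) - (1/4) * ln(1 - 2Q)`. The sketch below implements it together with Kimura's large-sample variance; the name `k80` is hypothetical, and this is not necessarily how Bio.Var computes the `Kimura80` distance or its variance.

```julia
# Hypothetical sketch of the textbook Kimura (1980) two-parameter distance,
# not Bio.Var's code. P: proportion of sites with a transition;
# Q: proportion of sites with a transversion; n: number of sites examined.
function k80(P::Real, Q::Real, n::Integer)
    w1 = 1 - 2P - Q
    w2 = 1 - 2Q
    (w1 > 0 && w2 > 0) || throw(DomainError((P, Q), "K80 is undefined for these proportions"))
    d = -log(w1) / 2 - log(w2) / 4
    # Large-sample variance of d.
    c1 = 1 / w1
    c2 = 1 / w2
    c3 = (c1 + c2) / 2
    v = (c1^2 * P + c3^2 * Q - (c1 * P + c3 * Q)^2) / n
    return d, v
end

# "ATCGATCG" vs "ACCGATCG": one transition and no transversions over 8 sites.
d, v = k80(1 / 8, 0.0, 8)  # d ≈ 0.1438
```

As a consistency check, setting `P = p/3` and `Q = 2p/3` (one of the three possible substitutions is a transition) makes both logarithms equal to `ln(1 - 4p/3)`, so the estimate and its variance reduce to the Jukes and Cantor ones.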

### The `distance` method

**`Bio.Var.distance`** (*Function*)

Compute the pairwise genetic distances for a set of aligned nucleotide sequences.

The distance measure to compute is determined by the type provided as the first parameter. The second parameter provides the set of nucleotide sequences.

**`Bio.Var.distance`** (*Method*)

`distance{T<:MutationType,A<:NucleotideAlphabet}(::Type{Count{T}}, seqs::Vector{BioSequence{A}})`

Compute the number of mutations of type `T` between a set of sequences in a pairwise manner.

This method returns a tuple of the number of mutations of type `T` between sequences and the number of valid (i.e. non-ambiguous) sites counted by the function.

**`Bio.Var.distance`** (*Method*)

`distance{T<:MutationType,A<:NucleotideAlphabet}(::Type{Proportion{T}}, seqs::Vector{BioSequence{A}})`

This method returns a tuple of a vector of the p-distances and a vector of the number of valid (i.e. non-ambiguous) sites counted by the function.

**`Bio.Var.distance`** (*Method*)

`distance{A<:NucleotideAlphabet}(::Type{JukesCantor69}, seqs::Vector{BioSequence{A}})`

This method returns a tuple of the expected `JukesCantor69` distance estimate and the computed variance.

**`Bio.Var.distance`** (*Method*)

`distance{A<:NucleotideAlphabet}(::Type{Kimura80}, seqs::Vector{BioSequence{A}})`

This method returns a tuple of the expected `Kimura80` distance estimate and the computed variance.