Positional Dependencies Models

Positional dependencies models capture at least some of the affinity dependencies that can exist within a binding site. Such a dependency exists when a protein’s affinity for a particular nucleotide in the binding site is affected by the identity of a nucleotide at another position. Multiple studies have shown that the strongest dependencies between positions within a binding site tend to be between nearest neighbors. This finding is consistent with existing protein crystal structures, which show that when an amino acid interacts with multiple DNA nucleotides simultaneously, those nucleotides also tend to be nearest neighbors.

An Nth order model can capture N simultaneous interactions between N+1 nearest-neighbor positions in a binding site of length K. The table below summarizes the Nth order models currently used to capture protein-DNA and protein-RNA specificity.

Order                     Affinity Models
0th Order                 PWM, PSAM, 0th Order pHMM (Positional Independence Models)
1st Order                 FSAM (dinucleotide PSAM), dinucleotide PWM, 1st Order pHMM
2nd thru (K-2)th Order    Nth Order PSAM, PWM, or pHMM
(K-1)th Order             All K-mer model
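
The difference between the 0th and 1st order rows of the table can be illustrated with a minimal Python sketch. It assumes additive log-scale weights, and it assumes the 1st order model is parameterized as per-position terms plus a correction for each adjacent pair; actual dinucleotide PWMs and PSAMs may instead store a full 16-entry table per adjacent pair. The function names and all weight values are hypothetical and chosen only for illustration.

```python
def score_zeroth_order(site, mono):
    """Sum per-position weights; every position contributes independently."""
    return sum(mono[i][base] for i, base in enumerate(site))


def score_first_order(site, mono, di):
    """Add a correction for each adjacent (nearest-neighbor) pair of positions."""
    total = score_zeroth_order(site, mono)
    for i in range(len(site) - 1):
        # Pairs without an entry are assumed to have no dependency (0.0).
        total += di[i].get(site[i:i + 2], 0.0)
    return total


# Toy weights for a binding site of length K = 3 (illustrative values only).
mono = [
    {"A": 0.0, "C": -1.0, "G": -2.0, "T": -1.5},
    {"A": -1.0, "C": 0.0, "G": -0.5, "T": -2.0},
    {"A": -2.0, "C": -1.0, "G": 0.0, "T": -1.0},
]
di = [
    {"AC": 0.5},    # dependency between positions 0 and 1
    {"CG": -0.25},  # dependency between positions 1 and 2
]

print(score_zeroth_order("ACG", mono))     # 0.0
print(score_first_order("ACG", mono, di))  # 0.25
```

In this sketch the two models agree on any sequence whose adjacent pairs have no correction term, so the 1st order model only changes the score where a nearest-neighbor dependency has been specified.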

PSAMs and PWMs are fixed-length models and therefore cannot represent variable-length motifs. Profile Hidden Markov Models (pHMMs) contain additional parameters that model tolerated insertions and deletions (indels) between the positions in the consensus binding motif.

Note that as the order of the model increases, the number of parameters also increases significantly. For example, when going from a 0th Order PSAM to the All K-mer model, the number of parameters increases from 3K to 4^K. For K = 10, that is 30 parameters versus 1,048,576.
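
The growth in model size can be made concrete with a short Python sketch. It rests on an assumption not stated in the text: that an Nth order model keeps one table for each of the K - N windows of N+1 adjacent positions, with one entry per nucleotide combination. The function name `table_entries` is hypothetical. This raw count gives 4K entries for a 0th order model; the 3K figure for a PSAM corresponds to fixing one reference nucleotide per position, which the sketch does not do.

```python
def table_entries(K, N):
    """Raw table entries for an Nth order model of a length-K site.

    Assumes one table per window of N + 1 adjacent positions (K - N windows),
    each holding all 4 ** (N + 1) nucleotide combinations.
    """
    if not 0 <= N <= K - 1:
        raise ValueError("order N must satisfy 0 <= N <= K - 1")
    return (K - N) * 4 ** (N + 1)


K = 10
for N in (0, 1, 2, K - 1):
    print(N, table_entries(K, N))
# 0 40
# 1 144
# 2 512
# 9 1048576
```

At N = K - 1 there is a single window covering the whole site, so the count reduces to 4^K, the size of the All K-mer model.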