BioJava PWM XML Format

The BioJava PWM XML format stores a position-independent model: it assumes that each nucleotide position in the binding site contributes independently and additively to the overall binding free energy of the DNA motif.
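As a minimal sketch of what the independence assumption means in practice, the Python snippet below scores a site by summing one log-probability per position. The two-position matrix and the `log_score` helper are hypothetical and purely illustrative, and using log-probabilities as a stand-in for per-position energy contributions is an assumption made here, not something the format itself prescribes.

```python
import math

# Hypothetical two-position model; the values are illustrative only
# and are not taken from any real motif.
pwm = [
    {"A": 0.1, "C": 0.7, "G": 0.1, "T": 0.1},
    {"A": 0.7, "C": 0.1, "G": 0.1, "T": 0.1},
]


def log_score(pwm, site):
    """Sum per-position log-probabilities for a site.

    Each term depends on exactly one position, which is the
    independence-and-additivity assumption in code form.
    """
    if len(site) != len(pwm):
        raise ValueError("site length must match model length")
    return sum(math.log(column[base]) for column, base in zip(pwm, site))


# Changing position 1 from G to C shifts the score by the same amount
# regardless of which base sits at position 2.
shift_with_a = log_score(pwm, "CA") - log_score(pwm, "GA")
shift_with_t = log_score(pwm, "CT") - log_score(pwm, "GT")
```

Under this kind of model the two shifts are equal, so no interaction between neighboring positions can be represented.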

This format also assumes that there is only one affinity maximum in the sequence-affinity space of the protein, and therefore it cannot capture multiple distinct binding modes.

The BioJava PWM XML format contains a table of nucleotide probabilities (PWM format) rather than nucleotide relative affinities (PSAM format) or nucleotide counts (TRANSFAC format). The ADB expects models saved in this format to have a “.xml” file extension.

Number of parameters in the model: 3N, where N is the length of the motif. Each position stores four probabilities that sum to 1, so only three of them are free.
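The parameter count can be checked against the first column of the Cbf1 model listed further down. In this short Python sketch, three of the probabilities are copied from that column and the fourth is recovered from normalization; `free_parameter_count` is a hypothetical helper introduced only for illustration.

```python
# Three probabilities copied from column 1 of the Cbf1 model in this document.
cytosine = 0.9487825401918982
thymine = 0.0033001018582571617
guanine = 0.009399690901317006

# The fourth value is not a free parameter: normalization fixes it.
adenine = 1.0 - (cytosine + thymine + guanine)


def free_parameter_count(motif_length):
    """Three free probabilities per position (hypothetical helper)."""
    return 3 * motif_length


# For the 6 bp model this gives 18 free parameters.
cbf1_parameters = free_parameter_count(6)
```

The recovered `adenine` value agrees with the listed probability up to floating-point rounding.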

Here is the complete BioJava PWM XML for a 6 bp Cbf1 affinity model:

<MarkovModel>
<alphabet name="DNA"/>
<col indx="1">
<weight sym="cytosine" prob="0.9487825401918982"/>
<weight sym="thymine" prob="0.0033001018582571617"/>
<weight sym="guanine" prob="0.009399690901317006"/>
<weight sym="adenine" prob="0.038517667048527535"/>
</col>
<col indx="2">
<weight sym="cytosine" prob="0.053737558499030676"/>
<weight sym="thymine" prob="0.035435747451041465"/>
<weight sym="guanine" prob="0.014293542979801562"/>
<weight sym="adenine" prob="0.8965331510701263"/>
</col>
<col indx="3">
<weight sym="cytosine" prob="0.8803246480000866"/>
<weight sym="thymine" prob="0.08740478088882925"/>
<weight sym="guanine" prob="0.03227052970623411"/>
<weight sym="adenine" prob="4.1404850015281485E-8"/>
</col>
<col indx="4">
<weight sym="cytosine" prob="0.03227052970623411"/>
<weight sym="thymine" prob="4.1404850015281485E-8"/>
<weight sym="guanine" prob="0.8803246480000866"/>
<weight sym="adenine" prob="0.08740478088882925"/>
</col>
<col indx="5">
<weight sym="cytosine" prob="0.014293542979801562"/>
<weight sym="thymine" prob="0.8965331510701263"/>
<weight sym="guanine" prob="0.053737558499030676"/>
<weight sym="adenine" prob="0.035435747451041465"/>
</col>
<col indx="6">
<weight sym="cytosine" prob="0.009399690901317006"/>
<weight sym="thymine" prob="0.038517667048527535"/>
<weight sym="guanine" prob="0.9487825401918982"/>
<weight sym="adenine" prob="0.0033001018582571617"/>
</col>
</MarkovModel>
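
A file in this layout can be read with a generic XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` and relies only on the element and attribute names visible in the listing above (`col`, `indx`, `weight`, `sym`, `prob`). It is not BioJava's own reader, and `parse_pwm` is a hypothetical helper name. To keep it short, the embedded string holds only the first two columns of the Cbf1 model.

```python
import xml.etree.ElementTree as ET


def parse_pwm(xml_text):
    """Return one {symbol: probability} dict per column, ordered by indx."""
    root = ET.fromstring(xml_text)
    columns = sorted(root.findall("col"), key=lambda col: int(col.get("indx")))
    return [
        {w.get("sym"): float(w.get("prob")) for w in col.findall("weight")}
        for col in columns
    ]


# First two columns of the Cbf1 model, copied from the listing above.
example = """<MarkovModel>
<alphabet name="DNA"/>
<col indx="1">
<weight sym="cytosine" prob="0.9487825401918982"/>
<weight sym="thymine" prob="0.0033001018582571617"/>
<weight sym="guanine" prob="0.009399690901317006"/>
<weight sym="adenine" prob="0.038517667048527535"/>
</col>
<col indx="2">
<weight sym="cytosine" prob="0.053737558499030676"/>
<weight sym="thymine" prob="0.035435747451041465"/>
<weight sym="guanine" prob="0.014293542979801562"/>
<weight sym="adenine" prob="0.8965331510701263"/>
</col>
</MarkovModel>"""

pwm = parse_pwm(example)

# Most probable symbol at each position.
consensus = [max(column, key=column.get) for column in pwm]
```

Checking that each parsed column sums to approximately 1 is a cheap sanity test before using the matrix for scoring.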