K-mer Table Format

The K-mer Table format is a positional-dependency model format that can capture dependencies between any positions in the binding motif.

This format contains a table of K-mer relative affinities (PSAM format), in which all K-mers must be the same length. For motifs longer than 10 bp, the table typically does not list every possible K-mer because of file size and memory constraints. Instead, only K-mers with a relative affinity above a certain threshold (typically 0.01) are listed. The ADB expects models saved in this format to have a “.table” file extension.
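
The thresholding described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the ADB's implementation: the function name `filter_kmers`, the dictionary layout, and the choice to keep K-mers exactly at the threshold are all hypothetical.

```python
def filter_kmers(affinities, threshold=0.01):
    """Keep only K-mers whose relative affinity is at or above `threshold`.

    `affinities` is a hypothetical in-memory layout: a dict mapping each
    K-mer string to its relative affinity.
    """
    # The format requires every K-mer in a table to have the same length.
    lengths = {len(kmer) for kmer in affinities}
    if len(lengths) > 1:
        raise ValueError(f"K-mers have mixed lengths: {sorted(lengths)}")

    # Drop low-affinity K-mers to limit file size and memory use.
    return {kmer: rel_ka for kmer, rel_ka in affinities.items()
            if rel_ka >= threshold}
```

With the default threshold of 0.01, a K-mer with relative affinity 0.005 would be dropped, while one with relative affinity 0.5 would be kept.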

Although the K-mer table has far more parameters than the other models, it also has the potential to be the most accurate. Because this model makes no assumptions about the existence or scope of positional dependencies, it can capture non-nearest-neighbor interactions between positions. In addition, the K-mer table can capture multiple affinity maxima in the sequence-affinity space of the protein. Multiple affinity maxima can arise when the protein has different binding modes.

Number of parameters in the model (where N is the length of the motif):

  • If N < 10, 4^N (all possible K-mers of length N)
  • If N ≥ 10, the number of K-mers with relative affinity ≥ threshold
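
The two cases above can be written as a short calculation. The sketch below is illustrative only: the function name `parameter_count` and its arguments are assumptions made for this example, and the default threshold of 0.01 is the typical value mentioned earlier.

```python
def parameter_count(n, affinities=(), threshold=0.01):
    """Number of parameters for a K-mer table of motif length `n`.

    `affinities` is a hypothetical iterable of relative affinities and is
    only consulted when n >= 10.
    """
    if n < 10:
        # Every possible K-mer over the alphabet {A, C, G, T} is present.
        return 4 ** n
    # Only K-mers at or above the threshold are stored in the table.
    return sum(1 for rel_ka in affinities if rel_ka >= threshold)


print(parameter_count(9))  # 262144
```

For N = 9 this is already 262,144 parameters, which shows why longer motifs are stored in thresholded form.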

 

Here are the first 20 lines (a header line followed by 19 K-mers) of a K-mer Table file for a 12 bp affinity model (for the dimeric complex Exd-Scr):

Kmer	relKa
ATGATTAATTGC	1
TAATTAATCATT	0.806771253626065
AATGATTAATTG	0.855031018109592
ATGATTAATTGT	0.731800251156829
GTAATTAATCAT	0.940354559987678
AATGATTAATTA	0.809843242810146
ATGATTAATTAT	0.68122963050104
ATGATTTATTGC	0.802031684627441
ATGATTAATTAC	0.91464262703798
GATGATTAATTA	0.820254079828303
GATGATTAATTG	0.845893456881783
ATGATTTATTAC	0.751180522309748
ATGATTAATTAG	0.785971078233234
ATGATTAATTGG	0.784398168794276
TATGATTAATTA	0.630472333713446
ATGATTAATGAT	0.78209526387397
TATGATTAATTG	0.648223356425888
AATGATTTATTA	0.60657199228306
ATGATTAATGGT	0.763677516386204
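
A file laid out like the excerpt above could be read with a few lines of Python. This is a minimal sketch, not an official reader: it assumes the two columns are tab-separated and that the header line is exactly `Kmer` and `relKa`, as they appear in the excerpt. The function name `read_kmer_table` is hypothetical.

```python
import csv
import io


def read_kmer_table(handle):
    """Parse a K-mer table from an open text handle into a dict."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    if header != ["Kmer", "relKa"]:
        raise ValueError(f"unexpected header: {header}")

    table = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        kmer, rel_ka = row[0], float(row[1])
        table[kmer] = rel_ka

    # All K-mers in one table must have the same length.
    lengths = {len(kmer) for kmer in table}
    if len(lengths) > 1:
        raise ValueError(f"K-mers have mixed lengths: {sorted(lengths)}")
    return table


# Usage with the first two data lines of the excerpt.
sample = "Kmer\trelKa\nATGATTAATTGC\t1\nTAATTAATCATT\t0.806771253626065\n"
table = read_kmer_table(io.StringIO(sample))
print(table["ATGATTAATTGC"])  # 1.0
```

To read a real file, pass an open file object instead of the `io.StringIO` wrapper used here.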