Biophysical Browser

The ADB Biophysical Sequence Browser includes many novel features for accurate binding site annotation.

1. The browser includes controls for setting the free concentration of every protein loaded into it. This biophysical framework uses these concentrations to model the nonlinear saturation of binding as concentration increases. The framework also makes it possible to investigate concentration-dependent competitive binding, in which two or more proteins compete for the same site. Competitive binding typically occurs between proteins of the same family, which therefore have similar specificity.
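As a rough illustration of the saturation and competition effects in item 1, the sketch below uses textbook equilibrium binding rather than any published ADB code. It assumes a single site with Langmuir-type occupancy θ = [P]Ka / (1 + [P]Ka) and, for competition, mutually exclusive binding, so each protein's share of the site is [Pi]Kai / (1 + Σj [Pj]Kaj). The function names are hypothetical; [P] and Ka only need consistent units so that their product is dimensionless.

```python
def occupancy(p_free: float, ka: float) -> float:
    """Fractional occupancy of one site by one protein (Langmuir isotherm)."""
    w = p_free * ka  # statistical weight of the bound state
    return w / (1.0 + w)


def competitive_occupancy(proteins: dict[str, tuple[float, float]]) -> dict[str, float]:
    """Occupancy of a single site by each of several competing proteins.

    `proteins` maps a protein name to (free concentration, Ka). The bound
    states are assumed mutually exclusive, so they share one partition function.
    """
    weights = {name: p * ka for name, (p, ka) in proteins.items()}
    z = 1.0 + sum(weights.values())  # the unbound state contributes weight 1
    return {name: w / z for name, w in weights.items()}


if __name__ == "__main__":
    # Saturation: occupancy levels off toward 1 as concentration rises.
    for p in (0.1, 1.0, 10.0, 100.0):
        print(p, round(occupancy(p, ka=1.0), 3))
    # Competition: raising B's concentration displaces A from the shared site.
    for p_b in (0.0, 1.0, 10.0):
        occ = competitive_occupancy({"A": (1.0, 1.0), "B": (p_b, 1.0)})
        print(p_b, round(occ["A"], 3), round(occ["B"], 3))
```

In the second loop A's occupancy drops even though A's own concentration never changes, which is the kind of concentration-dependent competition the browser lets users explore.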

2. Seamless integration with dbSNP data lets the browser depict changes in affinity caused by genetic polymorphisms. This makes the browser useful for finding functional SNPs: a SNP is more likely to be functional if the polymorphism alters binding at an overlapping, functional protein-DNA or protein-RNA response element.
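One way to picture how a polymorphism changes affinity is to score the reference and alternate alleles with the same affinity model and compare. This sketch assumes an additive energy model (a position-specific ΔΔG matrix in units of RT, so relative Ka = exp(-ΔΔG)), which may differ from how ADB affinity models are actually parameterized; the matrix values, the site, and the function names are all made up for illustration.

```python
import math

# Hypothetical 4-bp energy matrix: extra binding free energy (ddG, in units
# of RT) for each base at each position; 0.0 marks the preferred base.
DDG = [
    {"A": 0.0, "C": 2.0, "G": 1.5, "T": 2.5},
    {"A": 2.0, "C": 0.0, "G": 2.5, "T": 1.0},
    {"A": 1.0, "C": 2.0, "G": 0.0, "T": 2.0},
    {"A": 2.5, "C": 1.5, "G": 2.0, "T": 0.0},
]


def relative_ka(site: str) -> float:
    """Ka relative to the best site, assuming positions contribute additively."""
    ddg = sum(DDG[i][base] for i, base in enumerate(site))
    return math.exp(-ddg)


def snp_fold_change(site: str, pos: int, alt: str) -> float:
    """Ratio Ka(alt) / Ka(ref) for a single-base substitution at `pos`."""
    alt_site = site[:pos] + alt + site[pos + 1:]
    return relative_ka(alt_site) / relative_ka(site)


if __name__ == "__main__":
    ref = "ACGT"  # the preferred site under this toy matrix
    fold = snp_fold_change(ref, 2, "A")  # G->A at position 2 adds 1.0 RT of ddG
    print(f"Ka(alt)/Ka(ref) = {fold:.3f}")
```

A fold change far from 1 marks a polymorphism worth inspecting, particularly when the site overlaps a functional response element.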

3. Integration with the UCSC genomic annotations database allows users to load UCSC gene annotations into the browser, including annotations of coding sequence, UTRs, and introns. Multiple genomic sequences can also be loaded into the browser and analyzed together.
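UCSC gene tables use the genePred layout, in which UTRs and introns are implied by the exon and CDS coordinates rather than stored directly. The sketch below assumes the ten basic genePred columns (name, chrom, strand, txStart, txEnd, cdsStart, cdsEnd, exonCount, exonStarts, exonEnds; 0-based, half-open coordinates) and shows one way to derive those regions. Tables such as refGene add a leading bin column and extra trailing columns, which would shift the indices. The example record and function names are hypothetical; how the ADB itself parses these tables is not specified.

```python
from dataclasses import dataclass


@dataclass
class GenePred:
    name: str
    chrom: str
    strand: str
    tx_start: int
    tx_end: int
    cds_start: int
    cds_end: int
    exons: list[tuple[int, int]]  # 0-based, half-open (start, end) pairs


def parse_genepred(line: str) -> GenePred:
    """Parse the first ten tab-separated genePred columns."""
    f = line.rstrip("\n").split("\t")
    starts = [int(x) for x in f[8].split(",") if x]  # lists end with a comma
    ends = [int(x) for x in f[9].split(",") if x]
    return GenePred(f[0], f[1], f[2], int(f[3]), int(f[4]),
                    int(f[5]), int(f[6]), list(zip(starts, ends)))


def introns(g: GenePred) -> list[tuple[int, int]]:
    """Introns are the gaps between consecutive exons."""
    return [(prev_end, next_start)
            for (_, prev_end), (next_start, _) in zip(g.exons, g.exons[1:])]


def utrs(g: GenePred) -> list[tuple[int, int]]:
    """Exonic pieces outside [cdsStart, cdsEnd); 5' versus 3' depends on strand."""
    pieces = []
    for start, end in g.exons:
        if start < g.cds_start:
            pieces.append((start, min(end, g.cds_start)))
        if end > g.cds_end:
            pieces.append((max(start, g.cds_end), end))
    return pieces


if __name__ == "__main__":
    record = "NM_EXAMPLE\tchr1\t+\t100\t1000\t150\t900\t3\t100,400,800,\t200,500,1000,"
    gene = parse_genepred(record)
    print("introns:", introns(gene))
    print("UTRs:", utrs(gene))
```

The txStart, txEnd, cdsStart, and cdsEnd fields parsed here are the same gene coordinates that functional positional priors can refer to.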

4. The ADB Biophysical Browser also introduces novel biophysical positional priors:

  • To quantify how accessible the DNA or RNA is for protein binding, the browser supports accessibility positional priors that scale the on-rate of binding. These tracks specify the probability that the DNA or RNA is accessible for protein binding at each genomic locus, and are typically based on prior knowledge of chromatin or methylation states or of nucleosome positioning. Recent advances in high-throughput sequencing have produced genome-wide maps of such epigenetic marks for specific cell or tissue types. Common assays for measuring chromatin accessibility are DNase I and ATAC-seq footprinting, whereas a common assay for measuring DNA methylation is bisulfite sequencing. Studies have shown that these positional priors, singly or in combination, can correlate well with in vivo binding and thus help discard inaccessible DNA, considerably reducing the false positive rate. Because accessibility positional priors scale the on-rate of binding, the ADB treats them as affinity-based positional priors.
  • Studies have also shown that distance from the transcription start site (TSS) is highly correlated with functional binding. To create functional positional priors, the ADB Browser accepts complex, aggregate mathematical formulas built from predefined variables. With predefined variables such as txStart, txEnd, cdsStart, cdsEnd, and others, the probability that a locus is functional can be finely defined for any gene region. The ADB Browser also supports conservation positional priors that finely define the probability of conserved functional loci for any region of a gene. In essence, conservation priors are a type of functional prior, since phylogenetic conservation is a by-product of evolutionary selection to conserve functional protein-DNA or protein-RNA occupancy along the DNA. Because both functional and conservation priors scale occupancy, the ADB treats them as occupancy-based positional priors. A p53 functional positional prior is described separately.
  • All loaded positional priors are normalized into probability maps by first replacing any negative values with 0 and then rescaling them to the range 0 to 1.
  • At any given locus, overlapping affinity-based priors are multiplied together to create a combined affinity-based prior, and overlapping occupancy-based priors are multiplied together to create a combined occupancy-based prior.
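  • Therefore, the biophysical equation for functional occupancy in the ADB is:
    $$\theta_{\mathrm{func}} = F \cdot C \cdot \frac{A\,[P_{\mathrm{free}}]\,K_a}{1 + A\,[P_{\mathrm{free}}]\,K_a}$$
    • where θ_func is the functional occupancy, F a functional positional prior, C a conservation positional prior, A an accessibility positional prior, [Pfree] the free protein concentration, and Ka the association constant of binding.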
  • Because positional priors are effective, several tools for motif discovery and binding site annotation have implemented them in various forms. The ADB Biophysical Browser can leverage these existing cell- or tissue-specific datasets.
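The prior-handling steps in item 4 can be sketched per locus as follows. Because Ka = kon/koff, scaling the on-rate by an accessibility prior A scales Ka by A, while functional (F) and conservation (C) priors multiply the resulting occupancy. Assumptions: "rescaling to 0 to 1" is taken to mean dividing by the track maximum after clipping negatives; tracks are plain lists aligned to the same loci; and the TSS prior is an exponential decay with a made-up length scale, standing in for the kind of formula a user might write with variables such as txStart. All function names and track values are hypothetical, not ADB API.

```python
import math


def normalize(track: list[float]) -> list[float]:
    """Clip negatives to 0, then divide by the maximum (assumed rescaling scheme)."""
    clipped = [max(0.0, v) for v in track]
    peak = max(clipped, default=0.0)
    return [v / peak for v in clipped] if peak > 0 else clipped


def combine(*tracks: list[float]) -> list[float]:
    """Multiply overlapping priors of the same kind, locus by locus."""
    return [math.prod(values) for values in zip(*tracks)]


def tss_prior(positions: list[int], tss: int, scale: float = 1000.0) -> list[float]:
    """Hypothetical functional prior decaying with distance from the TSS.
    For a '+' strand gene the TSS is txStart; for a '-' strand gene it is txEnd."""
    return [math.exp(-abs(p - tss) / scale) for p in positions]


def functional_occupancy(f: float, c: float, a: float, p_free: float, ka: float) -> float:
    """F * C * A[P]Ka / (1 + A[P]Ka): A scales the effective Ka; F and C scale occupancy."""
    w = a * p_free * ka
    return f * c * w / (1.0 + w)


if __name__ == "__main__":
    positions = [1000, 1500, 3000]
    # Affinity-based priors: two made-up accessibility-style tracks.
    dnase = normalize([0.8, -0.2, 0.4])
    unmethylated = normalize([1.0, 1.0, 0.5])
    a_prior = combine(dnase, unmethylated)
    # Occupancy-based priors: a TSS-distance functional prior and a made-up conservation track.
    f_prior = normalize(tss_prior(positions, tss=1000))
    c_prior = normalize([0.9, 0.3, 0.6])
    for pos, f, c, a in zip(positions, f_prior, c_prior, a_prior):
        print(pos, round(functional_occupancy(f, c, a, p_free=1.0, ka=1.0), 4))
```

Placing A inside the saturating term while F and C sit outside it means a fully inaccessible locus (A = 0) gets zero occupancy at any concentration, whereas a low functional or conservation prior caps the maximum occupancy without changing where saturation sets in.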

6. By default, all affinity models loaded into the ADB are assumed to model both sequence-specific and non-specific binding. However, if any of the Ka entries are lower than the configurable non-specific binding constant, the non-specific binding constant is added to all relative Ka entries in the model, and all total relative affinities are adjusted accordingly. As a result, every binding site sequence has an affinity greater than or equal to the non-specific binding constant. The default non-specific binding constant is 1.0 × 10⁻⁵ in terms of RT.
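A minimal sketch of the non-specific floor in item 6, assuming a model stored as a lookup from site sequence to relative Ka; the ADB's internal model format may differ. Only the add-if-below rule is shown, and the function name and example table are hypothetical.

```python
NS_DEFAULT = 1.0e-5  # default non-specific binding constant stated in the text


def apply_nonspecific_floor(rel_ka: dict[str, float],
                            ns: float = NS_DEFAULT) -> dict[str, float]:
    """Add `ns` to every relative Ka if any entry falls below `ns`.

    Afterwards no site has a relative Ka below `ns`. Any further adjustment
    of total relative affinities performed by the ADB is not modeled here.
    """
    if any(k < ns for k in rel_ka.values()):
        return {site: k + ns for site, k in rel_ka.items()}
    return dict(rel_ka)


if __name__ == "__main__":
    # Made-up relative Ka table keyed by site sequence (best site = 1.0).
    model = {"GGGACTTTCC": 1.0, "GGGACTTTCA": 0.02, "TTTTTTTTTT": 0.0}
    for site, ka in apply_nonspecific_floor(model).items():
        print(site, ka)
```

Adding the constant to every entry, rather than clamping only the low ones, keeps the differences between sites unchanged while guaranteeing the floor.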

7. Gene, polymorphism, binding, and positional prior annotations can be exported separately or together in ADB sequence annotation format.

8. Fast, efficient partitioned interval trees store all genomic information locally to keep performance high.
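The text does not say how the ADB partitions its interval trees. The sketch below assumes the simplest scheme, a separate static interval tree per chromosome, each built as a balanced binary tree augmented with its subtree's maximum end coordinate (a standard augmented interval tree). Class and function names are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    start: int
    end: int            # half-open: [start, end)
    label: str
    max_end: int        # largest `end` anywhere in this subtree
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def build(intervals: list[tuple[int, int, str]]) -> Optional[Node]:
    """Build a balanced tree from intervals already sorted by start."""
    if not intervals:
        return None
    mid = len(intervals) // 2
    start, end, label = intervals[mid]
    left = build(intervals[:mid])
    right = build(intervals[mid + 1:])
    max_end = max([end] + [n.max_end for n in (left, right) if n is not None])
    return Node(start, end, label, max_end, left, right)


def query(node: Optional[Node], qs: int, qe: int, out: list[str]) -> None:
    """Collect labels of intervals overlapping [qs, qe), in start order."""
    if node is None or node.max_end <= qs:
        return  # nothing in this subtree ends after qs
    query(node.left, qs, qe, out)
    if node.start < qe and qs < node.end:
        out.append(node.label)
    if node.start < qe:  # right subtree only holds starts >= node.start
        query(node.right, qs, qe, out)


class PartitionedIndex:
    """One static interval tree per chromosome (hypothetical partitioning)."""

    def __init__(self, records: list[tuple[str, int, int, str]]):
        by_chrom: dict[str, list[tuple[int, int, str]]] = defaultdict(list)
        for chrom, start, end, label in records:
            by_chrom[chrom].append((start, end, label))
        self.trees = {c: build(sorted(iv)) for c, iv in by_chrom.items()}

    def overlaps(self, chrom: str, start: int, end: int) -> list[str]:
        out: list[str] = []
        query(self.trees.get(chrom), start, end, out)
        return out


if __name__ == "__main__":
    idx = PartitionedIndex([
        ("chr1", 100, 200, "geneA"), ("chr1", 150, 400, "siteB"),
        ("chr1", 500, 600, "snpC"), ("chr2", 100, 200, "geneD"),
    ])
    print(idx.overlaps("chr1", 180, 520))
```

Partitioning means a query on one chromosome never touches another chromosome's tree, and the max_end field lets each query skip whole subtrees that end before the viewed window.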

9. Zooming in and out and scrolling left and right through genomic sequence are both smooth and fast.

10. Easy-to-use sliders control the viewing scale and, for each loaded affinity model, the free protein concentration and the occupancy threshold used to display levels of protein-DNA or protein-RNA binding.

  • Each free protein concentration is controlled by a slider that specifies the occupancy of the completely accessible, highest-affinity binding site. The free protein concentration is calculated from this high-affinity reference occupancy and then used to calculate the occupancies of all lower-affinity sites.
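Turning the slider's reference occupancy into a free concentration amounts to inverting the single-site occupancy equation for the fully accessible (A = 1), highest-affinity site: θ = [P]Ka / (1 + [P]Ka) gives [P] = θ / (Ka(1 - θ)). A minimal sketch, assuming Langmuir-type binding and Ka values expressed relative to that best site; the function names are hypothetical.

```python
def free_concentration(theta_ref: float, ka_max: float) -> float:
    """Free protein concentration giving occupancy `theta_ref` at the best site.

    Inverts theta = P*Ka / (1 + P*Ka) with accessibility A = 1, so
    P = theta / (Ka * (1 - theta)).
    """
    if not 0.0 <= theta_ref < 1.0:
        raise ValueError("reference occupancy must lie in [0, 1)")
    return theta_ref / (ka_max * (1.0 - theta_ref))


def occupancy(p_free: float, ka: float) -> float:
    """Single-site Langmuir occupancy."""
    w = p_free * ka
    return w / (1.0 + w)


if __name__ == "__main__":
    ka_best = 1.0                          # relative Ka of the best site
    p = free_concentration(0.5, ka_best)   # slider set to 50% reference occupancy
    for ka in (1.0, 0.1, 0.01):            # progressively weaker sites
        print(ka, round(occupancy(p, ka), 4))
```

Because the slider is expressed as an occupancy rather than a concentration, the same setting means the same thing for every affinity model, whatever the absolute scale of its Ka values.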