Coding-region polymorphisms that result in amino-acid residue changes [i.e., non-synonymous cSNPs (nsSNPs)] are of critical importance in human disease and drug sensitivity. The NCBI dbSNP database is a public repository that lists over four million SNPs in the human genome, but it currently does not include information about the functional consequences of these SNPs. We have developed a software pipeline that maps nsSNPs onto protein sequences, multiple sequence alignments, functional pathways, and comparative protein structure models on a genome-wide scale. By integrating information based on sequence, evolution, and structure with a combination of knowledge-based rules and machine learning (a support vector machine), we predict positions where amino-acid substitutions destabilize protein structure, interfere with the formation of domain-domain interfaces, affect protein-ligand binding, or have a deleterious impact on human health. The pipeline identifies 28,000 validated SNPs that produce an amino-acid residue substitution in proteins from the SwissProt/TrEMBL database and predicts that 821 of them have a deleterious impact on human health. All annotations are accessible via a queryable web interface at https://salilab.org/LS-SNP. These results will be useful for prioritizing nsSNPs relevant to epidemiological association studies, pharmacogenomics, and genetic counseling, and for probing the biological mechanisms underlying disease-associated alleles.
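To make the term nsSNP concrete, the following minimal Python sketch classifies a single-base change within a codon as synonymous or non-synonymous using the standard genetic code. It is an illustration only and not the pipeline's implementation; the names `CODON_TABLE` and `classify_coding_snp` are hypothetical, and the sketch assumes the reference codon and the substituted base are already known.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), with codons enumerated
# in T, C, A, G order for each of the three positions. "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_coding_snp(codon, position, alt_base):
    """Classify a single-base substitution inside one codon.

    Hypothetical helper for illustration. `position` is the 0-based index
    of the substituted base within the codon. Returns a tuple
    (reference_residue, variant_residue, category).
    """
    codon = codon.upper()
    alt_base = alt_base.upper()
    if codon not in CODON_TABLE:
        raise ValueError(f"not a valid DNA codon: {codon!r}")
    if position not in (0, 1, 2):
        raise ValueError("position must be 0, 1, or 2")
    if len(alt_base) != 1 or alt_base not in BASES:
        raise ValueError(f"not a valid DNA base: {alt_base!r}")

    variant = codon[:position] + alt_base + codon[position + 1:]
    ref_aa = CODON_TABLE[codon]
    alt_aa = CODON_TABLE[variant]

    if ref_aa == alt_aa:
        category = "synonymous"
    elif "*" in (ref_aa, alt_aa):
        # Gain or loss of a stop codon, not a residue substitution.
        category = "stop"
    else:
        category = "nonsynonymous"
    return ref_aa, alt_aa, category


if __name__ == "__main__":
    print(classify_coding_snp("GAG", 1, "T"))  # GAG -> GTG
    print(classify_coding_snp("GAG", 2, "A"))  # GAG -> GAA
```

Only substitutions in the `nonsynonymous` category change the encoded residue, and these are the ones the abstract refers to as nsSNPs.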