|
ID_type
|
LS-SNP currently supports queries using the following ID types:
|
SwissProt ID
|
Examples: O94759,P78312
|
dbSNP rsID
|
Examples: rs273258,rs987495
|
KEGG pathway ID
|
Examples: hsa00600,hsa04210
|
HUGO gene ID
|
Examples: ABCB1,BRCA2
|
|
Comparative structure model viewing
|
RasMol scripts are provided for viewing the location of SNPs on comparative structure models. On UNIX, the script itself can be used to launch the RasMol viewer.
|
The models are displayed in ribbon (cartoon) format with the SNP position in spacefill.
|
|
SVM disease predictions
|
Predicted disease-associated with high confidence
|
Predicted disease-associated with low confidence
|
Insufficient confidence to predict
|
Predicted neutral with low confidence
|
Predicted neutral with high confidence
|
|
Disease-associated nsSNPs are predicted by a support vector machine (SVM) trained on OMIM amino-acid variants and putatively neutral nsSNPs from dbSNP. |
What do the confidence ratings mean? |
Excluding the "low confidence" predictions slightly improves the overall accuracy, false positive fraction and false negative fraction of our method. |
The method was evaluated on a benchmark set of 1457 disease-associated missense mutations and 2504 putatively neutral missense polymorphisms. Using a three-fold cross-validation protocol, we measured the overall accuracy, false positive fraction (fraction of disease-associated missense mutations that are incorrectly classified), and false negative fraction (fraction of neutral missense polymorphisms that are incorrectly classified). Error bars were obtained by repeating the protocol ten times. |
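The three benchmark quantities defined above can be computed as in the following minimal sketch. The function name `benchmark_metrics`, the string labels, and the toy data are assumptions made for illustration; they are not part of LS-SNP, and the numbers are not benchmark results.

```python
def benchmark_metrics(true_labels, predicted_labels):
    """Return (accuracy, false_positive_fraction, false_negative_fraction).

    Labels are the strings "disease" or "neutral". Following the definitions
    in the text, the false positive fraction is the share of disease-associated
    variants that are misclassified, and the false negative fraction is the
    share of neutral variants that are misclassified.
    """
    pairs = list(zip(true_labels, predicted_labels))
    disease = [pred for true, pred in pairs if true == "disease"]
    neutral = [pred for true, pred in pairs if true == "neutral"]
    if not disease or not neutral:
        raise ValueError("need at least one variant of each class")

    accuracy = sum(true == pred for true, pred in pairs) / len(pairs)
    false_positive_fraction = sum(p != "disease" for p in disease) / len(disease)
    false_negative_fraction = sum(p != "neutral" for p in neutral) / len(neutral)
    return accuracy, false_positive_fraction, false_negative_fraction


# Toy data, invented purely for illustration (not taken from the benchmark set).
truth = ["disease", "disease", "disease", "disease", "neutral", "neutral"]
preds = ["disease", "disease", "disease", "neutral", "neutral", "disease"]
acc, fpf, fnf = benchmark_metrics(truth, preds)
print(acc, fpf, fnf)
```

In a cross-validation setting, these quantities would be computed on each held-out fold and then averaged; the spread across repeated runs gives the error bars.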
Confidence is measured by the distance of the SVM's discriminant score from 0. The scores range from -1.77 (for the most confidently predicted disease-associated nsSNPs) to 1.71 (for the most confidently predicted neutral nsSNPs). |
We have found empirically that if only predictions whose absolute score exceeds 0.2 are accepted, the SVM is slightly more accurate than if all predictions are accepted. In our benchmark tests, this threshold results in an accuracy of 0.805 (error 0.003), a false positive fraction of 0.197 (error 0.002), and a false negative fraction of 0.187 (error 0.008). |
If we accept all SVM predictions, regardless of confidence, accuracy is slightly degraded to 0.782 (error 0.003), with false positive fraction 0.214 (error 0.003), and false negative fraction 0.235 (error 0.007). |
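The thresholding rule described above can be sketched as follows. The function name `call_prediction` and the returned strings are hypothetical. The sketch only separates accepted from rejected predictions; the cutoffs behind the five confidence categories shown in the results are not given here, so they are not modeled.

```python
CONFIDENCE_THRESHOLD = 0.2  # empirical cutoff quoted in the text


def call_prediction(score, threshold=CONFIDENCE_THRESHOLD):
    """Map an SVM discriminant score to a call, or None if it is rejected.

    Negative scores point to disease-associated nsSNPs and positive scores
    to neutral ones; confidence is the distance of the score from 0.
    """
    if abs(score) <= threshold:
        return None  # too close to the decision boundary to accept
    return "disease-associated" if score < 0 else "neutral"


# Example scores chosen for illustration; -1.77 and 1.71 are the extremes
# of the score range mentioned in the text.
for score in (-1.77, -0.15, 0.05, 0.35, 1.71):
    print(score, call_prediction(score))
```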
For details of our feature selection and prediction method see: |
R. Karchin, M. Diekhans, L. Kelly, D. Thomas, U. Pieper, N. Eswar, D. Haussler, A. Sali "LS-SNP: Large-scale annotation of coding non-synonymous SNPs based on multiple information sources" Bioinformatics. 2005 Jun 15;21(12):2814-20. [Epub 2005 Apr 12] |
R. Karchin, L. Kelly, A. Sali "Improving functional annotation of non-synonomous SNPs with information theory" Pacific Symposium on Biocomputing 2005 World Scientific |
|
Linking to LS-SNP
|
You can link directly to LS-SNP from a web page or external program by constructing URL query strings.
|
A query string may contain either a genomic range or an arbitrary number of genes, SwissProt/TrEMBL IDs, dbSNP rsIDs, or KEGG pathway IDs. Queries that mix ID types (such as several HUGO gene IDs together with several dbSNP rsIDs) are not supported.
|
Query URL strings should have one of the following formats:
|
Query by genomic range:
|
https://salilab.org/LS-SNP-cgi/LS_SNP_query.pl?RequestType=QueryByRange&idtype=IDTYPE&Range=RANGE&PropertySelect=PROPERTY
|
Query by ID list:
|
https://salilab.org/LS-SNP-cgi/LS_SNP_query.pl?idvalue=IDVALUE&RequestType=QueryById&idtype=IDTYPE&PropertySelect=PROPERTY
|
Examples of valid ID types
|
Example URL query strings
|
Supported values of IDTYPE and PROPERTY
|
IDTYPE: rsID, geneID, sprotID, keggID
|
PROPERTY: Protein_structure, Protein_sequence, Functional, Genomic_sequence
|
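The two URL formats and the supported IDTYPE and PROPERTY values can be combined as in this minimal sketch. The helper names `query_by_id_url` and `query_by_range_url` are hypothetical. Joining several IDs with commas is an assumption based on how the ID examples are written on this page, and the format of the range value is not specified here, so it is passed through unchanged.

```python
from urllib.parse import urlencode

BASE_URL = "https://salilab.org/LS-SNP-cgi/LS_SNP_query.pl"
ID_TYPES = {"rsID", "geneID", "sprotID", "keggID"}
PROPERTIES = {"Protein_structure", "Protein_sequence", "Functional", "Genomic_sequence"}


def _check(idtype, prop):
    """Reject values that are not in the supported lists."""
    if idtype not in ID_TYPES:
        raise ValueError(f"unsupported IDTYPE: {idtype!r}")
    if prop not in PROPERTIES:
        raise ValueError(f"unsupported PROPERTY: {prop!r}")


def query_by_id_url(ids, idtype, prop):
    """Build a QueryById URL for a list of IDs that all share one ID type."""
    _check(idtype, prop)
    params = {
        "idvalue": ",".join(ids),  # assumption: multiple IDs are comma-separated
        "RequestType": "QueryById",
        "idtype": idtype,
        "PropertySelect": prop,
    }
    # safe="," keeps the commas readable instead of percent-encoding them.
    return BASE_URL + "?" + urlencode(params, safe=",")


def query_by_range_url(range_value, idtype, prop):
    """Build a QueryByRange URL; range_value is passed through as given."""
    _check(idtype, prop)
    params = {
        "RequestType": "QueryByRange",
        "idtype": idtype,
        "Range": range_value,
        "PropertySelect": prop,
    }
    return BASE_URL + "?" + urlencode(params, safe=",")


print(query_by_id_url(["ABCB1", "BRCA2"], "geneID", "Functional"))
```

Because mixed ID type queries are not supported, each helper accepts a single `idtype` per URL; querying genes and rsIDs together would need two separate URLs.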