Human Membrane Protein Production Study

The human Membrane Protein Production Study (hMPPS, http://hmpps.sbkb.org/) investigates expression systems and screening processes for producing human integral membrane proteins (IMPs). Several PSI membrane centers and NYSGRC came together to determine whether any of the protein production systems currently in use is optimal for IMP families that have no known structure, with the aim of identifying best practices so that production of these challenging proteins can be improved.

Publication highlighted at the PSI Knowledgebase!

Our recent publication was highlighted on the PSI Knowledgebase website:
http://sbkb.org/update/news/390

Results of the Analysis

Stage 1: Finding human α-helical transmembrane domains:
Of the 29,375 unique human protein sequences from the RefSeq-37 database of the human genome (ref. 9), 7,299 were predicted by the TMHMM 2.0 program (ref. 1) to contain at least one transmembrane α-helix; 3,838 were predicted to contain two or more such helices. For each full-length sequence, only the transmembrane domain, running from the first residue of the first predicted transmembrane α-helix to the last residue of the last one, was used for all subsequent analyses. Among sequences with more than 98% sequence identity to each other, only a single representative was retained, using the USEARCH program (ref. 10), yielding the final non-redundant dataset of 2,925 domains.
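
The domain-trimming rule in Stage 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the study's actual pipeline: it assumes the helix predictions (for example, from TMHMM) have already been parsed into 1-based, inclusive (start, end) residue ranges, and the function name tm_domain and the toy input are hypothetical.

```python
from typing import List, Optional, Tuple

# Assumed representation of one predicted helix:
# (start, end), 1-based inclusive residue positions.
Helix = Tuple[int, int]


def tm_domain(sequence: str, helices: List[Helix]) -> Optional[str]:
    """Return the stretch from the first residue of the first predicted
    helix to the last residue of the last predicted helix.

    Returns None when no helix was predicted for the sequence.
    """
    if not helices:
        return None
    start = min(s for s, _ in helices)
    end = max(e for _, e in helices)
    if start < 1 or end > len(sequence) or start > end:
        raise ValueError("helix coordinates fall outside the sequence")
    # Convert 1-based inclusive coordinates to a Python slice.
    return sequence[start - 1:end]


# Hypothetical toy input: a 40-residue sequence with two predicted helices.
toy_sequence = "M" * 5 + "L" * 10 + "G" * 10 + "V" * 10 + "K" * 5
toy_helices = [(6, 15), (26, 35)]
toy_domain = tm_domain(toy_sequence, toy_helices)
```

The loop between the two helices is kept, because the rule takes everything between the first and last helix residue. The redundancy-reduction step at 98% identity is not reproduced here; the source states it was done with USEARCH.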

References for Results/Methods

1. Krogh, A., Larsson, B., von Heijne, G. & Sonnhammer, E.L. Journal of molecular biology 305, 567-80 (2001).
2. Fagerberg, L., Jonasson, K., von Heijne, G., Uhlen, M. & Berglund, L. Proteomics 10, 1141-9 (2010).
3. Nugent, T. & Jones, D.T. PLoS computational biology 6, e1000714 (2010).
4. Kall, L., Krogh, A. & Sonnhammer, E.L. Journal of molecular biology 338, 1027-36 (2004).
5. Kall, L., Krogh, A. & Sonnhammer, E.L. Bioinformatics 21 Suppl 1, i251-7 (2005).
6. Bernsel, A. et al. Proceedings of the National Academy of Sciences of the United States of America 105, 7177-81 (2008).

Methods used in analysis

Summary of the 5 stages of the analysis:
Stage 1: Finding human α-helical transmembrane domains
Stage 2: Assessing current modeling coverage
Stage 3: Clustering of domain sequences
Stage 4: Assessing target selection strategies
Stage 5: Expanding the target set by adding homologous sequences

Publication available

The manuscript featuring results of this project has now been published as a commentary in Nature Structural & Molecular Biology:

Nat Struct Mol Biol. 2013 Feb;20(2):135-8. doi: 10.1038/nsmb.2508.
Pubmed: http://www.ncbi.nlm.nih.gov/pubmed/23381628

TMH domains with Archaea Homologs

Number of Homologs | Gi Id | Human Cluster Size | Modeling Coverage (at 25% SeqIdent) | Annotation
263 | 115529486 | 2 | 100 | SubName: Full=ATPase, Cu++ transporting, alpha polypeptide (Menkes syndrome) (ATPase, Cu++ transporting, alpha polypeptide (Menkes syndrome), isoform CRA_a);
254 | 55743071 | 2 | 100 | Copper-transporting ATPase 2 (EC 3.6.3.4) (Copper pump 2) (WilsonDE disease-associated protein)
133 | 28373109

TMH domains with Bacterial Homologs

Number of Homologs | Gi Id | Human Cluster Size | Modeling Coverage (at 25% SeqIdent) | Annotation
7985 | 115529486 | 2 | 100 | SubName: Full=ATPase, Cu++ transporting, alpha polypeptide (Menkes syndrome) (ATPase, Cu++ transporting, alpha polypeptide (Menkes syndrome), isoform CRA_a);
7940 | 55743071 | 2 | 100 | Copper-transporting ATPase 2 (EC 3.6.3.4) (Copper pump 2) (WilsonDE disease-associated protein)
4700 | 42741659

Sequences with a large number of homologs

Most sequences have non-human homolog counts in the hundreds. A few classes of sequences, however, have far more. The sequences with the most homologs are:

  • The winner: Cytochrome B (70,000)
  • Runners-up: 430 GPCR sequences have 1,000 to 40,000 non-human homologs
  • Still in the running: Cytochrome C (20,000)
  • Some ATPases also have a large number of homologs
  • Solute carrier proteins have up to 2,000 homologs each
  • 6 ABC transporters have 1,000 to 7,000 homologs

Cluster Family - Claudins

Claudins are a family of proteins that are the most important components of the tight junctions, where they establish the paracellular barrier that controls the flow of molecules in the intercellular space between the cells of an epithelium. They have four transmembrane domains, with the N-terminus and the C-terminus in the cytoplasm.
