The project was discussed during a lunch breakout session attended by all participating PSI membrane centers and by representatives of NIH and the PDB at the PSI directors meeting at NIH, December 2011. Below are the notes from the meeting.
- Most papers report that membrane proteins make up 30% of the human proteome, but this figure probably comes from an old analysis that has been propagated over the years. This paper will be seen as an updated analysis, and our estimate is 10% (at least for helical membrane proteins). We should describe the difference and explain its cause, as readers will wonder.
See answer at http://modbase.compbio.ucsf.edu/projects/membrane/content/membrane-prote...
- Related to the above: is the human proteome an exception (10%), or are the mouse and other mammalian proteomes about the same? A simple statement on this is all that is needed.
- We should have a solid section in the paper describing the protein families covered, with statistics, so that the paper has some connection to biology. We mention the big families on page 3, but we need to be more quantitative. For example, how big is the GPCR family, and how does it break down into subfamilies at different sequence identity cutoffs? Where do the GPCR classes other than class A fall? How many transporters are there, and how does this family break down by structural fold and by sequence identity? (It was noted that there are roughly 500 members in 48 subfamilies, with a smaller number of unique folds.) The same questions apply to channels and to any other medium to large families. On a related point, Figure 2 lists 838 members with 7 transmembrane helices. Are these all GPCRs? Are there exceptions, or is it too difficult to sort these out?
An updated cluster histogram with images of the family folds might be a good start, followed by a paragraph with a more detailed analysis.
- Single-transmembrane-helix proteins need to be described (we do not need to include them in the counts, but the text should state what their numbers are). There was consensus that good prediction programs exist for signal sequences, so we should be able to filter out signal sequences, which are not relevant to our analysis. We might also give a percentage of the genome, under stated assumptions about single-pass and two-pass proteins.
The signal sequence analysis is still missing from the page referenced above.
- Finally, we should include a section on the feasibility of the proposed new targets. For example, are there bacterial homologs that might make certain targets easier to tackle?
- Regarding Josh LaBaer’s question: could we include a priority list ranking families by the impact that a new MP structure in each family would have today in terms of covering unknowns? We could list the top few in the publication and, since priorities will change as each new structure is determined, keep a dynamic record of them on your website.
- Related to item 5 above (feasibility of proposed new targets): are we really talking about two different papers (one on the analysis of the human membrane proteome, the other on target selection for PSI:Biology), or can we efficiently combine both stories in one paper? One paper would be easiest, and the draft Ursula sent is a great start, but I wanted to raise this issue for discussion.
- For the journal, JSFG was listed, but I would suggest we aim for a higher-profile journal such as NSMB or Structure.