To augment the potential target pool with non-human homologs of the human sequences, we calculated a sequence profile for each human alpha-helical transmembrane domain, using the full UniProt database (18.5 million sequences). We identified 454,904 unique non-human homolog sequences using a sequence identity threshold of 30% with at least 70% coverage. The corresponding expanded alignments are likely to be valuable for target selection. On average, each domain sequence has approximately 2000 homologs from other organisms in UniProt. The number of homologs per sequence ranges from approximately 50,000 for the transmembrane domain of cytochrome b down to 1 for 40 transmembrane domains. Seventeen sequences did not yield any non-human homologs in the given sequence identity/coverage range.
Breakdown of the homolog sequences by domain of life:
- Eukaryota: 368,401 sequences
- Bacteria: 82,386 sequences
- Archaea: 1,985 sequences
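
The identity/coverage selection described above can be sketched as follows. This is a minimal illustration, not the pipeline actually used: the `Hit` record, its field names, and the definition of coverage as aligned query residues divided by query length are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hit:
    """One query/target alignment (hypothetical record layout)."""
    query_id: str      # human transmembrane domain
    target_id: str     # non-human sequence found in the database
    identity: float    # percent sequence identity over the aligned region
    aligned_len: int   # number of query residues covered by the alignment
    query_len: int     # full length of the query domain


def passes(hit, min_identity=30.0, min_coverage=0.70):
    """True if the hit meets both the identity and the coverage threshold."""
    coverage = hit.aligned_len / hit.query_len
    return hit.identity >= min_identity and coverage >= min_coverage


def unique_homologs(hits, min_identity=30.0, min_coverage=0.70):
    """Set of distinct target identifiers that pass the thresholds.

    A target matched by several query domains is counted once.
    """
    return {h.target_id for h in hits if passes(h, min_identity, min_coverage)}
```

Calling `unique_homologs(hits, min_identity=50.0)` would give the stricter 50% identity set at the same coverage cutoff.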
The images illustrate the number of sequences with a given number of homologs at 30% and 50% sequence identity (70% length coverage).
The numbers are preliminary, since the coverage has only been estimated.
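
As a rough sketch of how the per-domain statistics and the distribution shown in the images could be tabulated, assuming a hypothetical mapping `counts_per_domain` from each domain identifier to its number of non-human homologs (domains without homologs stored as 0):

```python
from collections import Counter


def homolog_count_histogram(counts_per_domain):
    """Map 'number of homologs' to 'number of domains with that many homologs'."""
    return Counter(counts_per_domain.values())


def mean_homologs(counts_per_domain):
    """Average number of homologs per domain sequence."""
    return sum(counts_per_domain.values()) / len(counts_per_domain)
```

In this layout, `homolog_count_histogram(counts)[0]` is the number of domains without any homolog and `homolog_count_histogram(counts)[1]` the number with exactly one.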