Fig. 1 TM domains of the available crystal structures. Top: Two views of the
24 inactive crystal structures from classes A, B, C, and F (aligned to beta2) show the
general GPCR fold of the transmembrane (TM) bundle. Class A in green, class B in
blue (CRF1, GLR), class C in orange (MGLU1, MGLU5), class F in magenta (SMO).
Bottom: Same views for only the 19 inactive class A structures showing the highly
conserved class A TM fold. A detailed view of the conserved hydrogen bonding
networks is shown in S1 Fig.
Fig. 2 Conserved inter-helical contacts. Top left: Diagram of 40 conserved
inter-helical contacts (CHICOs) present in at least 23 out of 24 studied class A
structures. The contacts common to all classes are shown in purple, and contacts
present only in class A in orange. Top right: List of these contacts in
Ballesteros-Weinstein numbering scheme. Bottom: Extracellular view of the same
contacts in the beta2 crystal structure. The contacts in the inner and outer half of the
membrane are shown on the left and right, respectively.
Fig. 3 Testing the robustness of the alignment of the Vomeronasal
receptors with the other groups. The table shows similarity between TMs
averaged over all pairs of sequences formed from the two groups (red denotes high
similarity, blue low similarity). For most TMs the optimal choices agree with the
optimal alignment to Aalpha (full table in S5 Fig); all combinations are shown only for
TM5. The same table but using the GPCRtm substitution matrix [74] instead of
BLOSUM62 is shown in S7 Fig. GPCRtm was developed in particular for GPCR
proteins, but in this case both matrices result in the same alignment.
Fig. 4 Testing the robustness of the alignment of the Taste2 receptors with
the other groups. The table shows similarity between TMs averaged over all pairs of
sequences formed from the two groups (red denotes high similarity, blue low similarity).
For most TMs the optimal choices agree with the optimal alignment to Aalpha (full table in
S6 Fig); only TM6 shows a second possible alignment at offset +4. The same table but
using the GPCRtm substitution matrix instead of BLOSUM62 is in S8 Fig. Again, both
matrices result in the same alignment.
Figs. 5 and 6 Sequence alignments of TMs 1 through 7 for the 25 crystal structures.
The sequences are taken from the selected PDB files. The TM helix
residues are colored in the Zappos scheme, which captures the chemical nature of each
residue (e.g. helix breakers, proline and glycine, are shown in purple). The loop residues
are shown in grey. The BW n.50 residue (numbering displayed below the sequences) is
the most conserved within class A. The consensus sequence is most similar to class
A, because most sequences are from this class. The largest differences are for the last 5
sequences, which belong to the classes B, C, and F. The figure was prepared using
Jalview.
Fig. 7 The phylogenetic tree based only on TM similarity using the GRoSS
alignment (loops were ignored). Color coding denotes the GPCR class. Proteins
with known crystal structure are emphasized with a dot. The full resolution version of
this figure is in S4 Fig.
Fig. 8 Native activation "hot-spot" residues (NACHOs), which are
contacts that change upon receptor activation. The width of the green lines is
proportional to the number of contacts common to all six structures (RHO, beta2AR, M2,
and their active structures). Blue shows the contacts present only in inactive structures,
and not in active structures; while red shows the opposite. The upper diagrams show
contacts in the extracellular half of the membrane. We see that there is no systematic
change common to the class A receptors in the conformation of the extracellular half of
the TMs. This is not obvious, because there are conformational changes accompanying
ligand binding. All the systematic changes, which enable G protein binding, occur in
the intracellular half of the TMs. The list only contains 15 different residues in 15
different contacts. Thus many of the residues switch partners upon activation.
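The color classification in this caption amounts to set operations on per-structure contact lists. The following is a minimal sketch, not the authors' code. It assumes each structure is given as a set of contacts labeled by BW numbers, and it assumes that "present only in inactive structures" means present in every inactive structure and in no active one. The function name and the toy contacts are hypothetical.

```python
def classify_contacts(inactive, active):
    """Split contacts into common, inactive-only, and active-only sets.

    inactive, active: lists of sets; each set holds the contacts of one
    structure as (BW label, BW label) tuples.
    """
    in_all_inactive = set.intersection(*inactive)
    in_all_active = set.intersection(*active)
    in_any_inactive = set.union(*inactive)
    in_any_active = set.union(*active)

    common = in_all_inactive & in_all_active         # green lines
    inactive_only = in_all_inactive - in_any_active  # blue (assumed definition)
    active_only = in_all_active - in_any_inactive    # red (assumed definition)
    return common, inactive_only, active_only


# Toy input with made-up contacts (illustration only, not data from the paper).
toy_inactive = [
    {("3.50", "6.30"), ("1.50", "2.50"), ("3.46", "6.37")},
    {("3.50", "6.30"), ("1.50", "2.50")},
]
toy_active = [
    {("1.50", "2.50"), ("5.55", "6.41")},
    {("1.50", "2.50"), ("5.55", "6.41"), ("3.46", "6.37")},
]
common, inactive_only, active_only = classify_contacts(toy_inactive, toy_active)
```

In the toy input, the contact 3.46-6.37 appears in only some structures of each state, so it falls into none of the three sets.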
Fig. 9 Magnitude of the rigid body moves of the helices necessary to map
one structure to another. All TMs 1-7 from all available structure pairs were
compared, and each symbol denotes the TM from which the data point comes. The coordinate
system is defined in the text. The maximal observed deviation is approximately
proportional to the sequence dissimilarity of the two compared TMs, and it follows the
same trend within class A (blue symbols) and across the GPCR superfamily (green
symbols). The red symbols, which correspond to the active-inactive structure pairs,
show rigid body moves caused by receptor activation. S10 Fig has an analogous plot of
residual RMSD vs. similarity for each helix after the best rigid body transformation.
RMSD shows a trend similar to that of the plots in this figure.
Table 1
table1.pdf
Number of GPCR sequences by class. The total number of candidate
human GPCR sequences that were considered is listed. The full list of Uniprot ACs is in S2 Table.
Table 2
table2.pdf
Selection of the alignment between class A and classes B, C, and F. This table shows
the selection process for assigning BW .50 residues to non-class A proteins. Shifting the BW .50 residue on each
helix renumbers the relative BW numbers, effectively changing the labels of contacts observed in these
proteins. Subsequently, the number of common contacts each structure shares with the class A structures
changes for different BW residue assignments. The second rightmost column shows the cumulative number
of contact occurrences among the 24 class A structures (including active conformations). The BW
assignment with the highest number of contacts is selected (except for MGLU5, see text). The selected
alignment is in bold.
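The selection procedure described in this caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation. Contacts are represented as pairs of (helix, BW number) tuples, `class_a_occurrences` is a hypothetical lookup giving how many class A structures contain each contact, and moving the .50 residue by +k positions along the sequence is assumed to lower every BW number on that helix by k. The MGLU5 exception mentioned in the caption is not handled.

```python
def relabel(contacts, offsets):
    """Relabel contacts after shifting the .50 residue on selected helices.

    contacts: set of ((helix, number), (helix, number)) pairs.
    offsets: dict mapping helix -> shift of the .50 residue in residues.
    """
    def shift(label):
        helix, number = label
        return (helix, number - offsets.get(helix, 0))

    return {(shift(a), shift(b)) for a, b in contacts}


def cumulative_occurrences(contacts, offsets, class_a_occurrences):
    """Sum, over relabeled contacts, the number of class A structures sharing each."""
    return sum(class_a_occurrences.get(c, 0) for c in relabel(contacts, offsets))


def best_assignment(contacts, candidates, class_a_occurrences):
    """Return the candidate offset dict with the highest cumulative count."""
    return max(
        candidates,
        key=lambda off: cumulative_occurrences(contacts, off, class_a_occurrences),
    )


# Toy data with made-up numbers (illustration only).
class_a_occurrences = {((3, 50), (6, 30)): 20, ((1, 50), (2, 50)): 24}
contacts = {((3, 54), (6, 30)), ((1, 50), (2, 50))}
candidates = [{}, {3: 4}, {3: -4}]
chosen = best_assignment(contacts, candidates, class_a_occurrences)
```

In the toy data, shifting the .50 residue of TM3 by +4 turns the label 3.54 into 3.50, so that contact now matches one counted in the class A lookup and the cumulative count rises.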
Table 3
table3.pdf
Examples of natural variants and mutations that are associated with functional change or disease and which coincide
with the NACHO residues.
Table 4
table4.pdf
Summary of SNPs annotated on Uniprot. The complete list is in S3 Table.
S1 Table
S1Table.pdf
List of studied GPCR crystal structures. When multiple structures are available,
the one with the highest resolution or the one with the least deformed TM helices is
used.
S2 Table
S2Table.csv
GRoSS sequence alignment for all 817 human GPCRs. S1 File has this
alignment in fasta format. Since there are no gaps in the TM domains, the alignment of
each protein is uniquely determined by the BW .50 residues for each TM 1 through 7.
We also list the expected range of the helical TM regions, which is estimated as the
average TM region in the known crystal structures from the same class. In the
discussion of the bitter taste receptors (TAS2Rs), we identified two possible alignments
of TM6, but only the first one is presented in the following table. The second choice is
to decrease the start, end, and BW50 residue of TM6 by 4.
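Because the TM domains contain no gaps, the BW label of every TM residue follows from the position of the .50 residue alone. A minimal sketch, assuming sequence positions are integers and BW numbers increase with sequence position. The function name and the residue positions in the example are hypothetical.

```python
def bw_labels(helix, start, end, bw50):
    """Map sequence positions start..end (inclusive) on one TM helix to BW labels."""
    return {pos: f"{helix}.{50 + pos - bw50}" for pos in range(start, end + 1)}


# Hypothetical TM6 segment (positions are made up for illustration).
first_choice = bw_labels(6, start=230, end=233, bw50=232)

# The alternative TM6 alignment: start, end, and the BW .50 residue all decrease by 4.
second_choice = bw_labels(6, start=230 - 4, end=233 - 4, bw50=232 - 4)
```

With these inputs, position 232 carries the label 6.50 in the first choice, while position 228 carries it in the second.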
S3 Table
S3Table.csv
GPCR natural variants annotated by Uniprot mapped to BW numbering
and indicating their proximity to the NACHO and CHICO residues. The
mutations are ordered according to the following score: "distance to the closest NACHO
+ distance to CHICO - multiplicity of the closest NACHO - multiplicity of CHICO +
Blosum62 of the mutation".
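The score can be written out term by term. This is a sketch under assumptions: the caption does not specify the units of the distances or the sort direction, so the inputs are taken as precomputed numbers and ascending order is assumed here. The function name, the tuple layout, and the example values are hypothetical.

```python
def variant_score(dist_nacho, dist_chico, mult_nacho, mult_chico, blosum62):
    """Score from the caption: distances minus multiplicities plus the BLOSUM62 entry."""
    return dist_nacho + dist_chico - mult_nacho - mult_chico + blosum62


# Made-up variants: (name, dist_nacho, dist_chico, mult_nacho, mult_chico, blosum62).
variants = [
    ("variant_b", 5, 4, 1, 1, 2),
    ("variant_a", 0, 1, 3, 2, -3),
]

# Assumed ordering: lowest score first.
ranked = sorted(variants, key=lambda v: variant_score(*v[1:]))
```

Under this assumed ordering, a variant that sits on a high-multiplicity NACHO or CHICO residue and has a negative BLOSUM62 entry is ranked ahead of a distant, conservative substitution.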
S4 Table
S4Table.pdf
Conservation of CHICO and NACHO residues among orthologs. For
orthologs of several proteins we computed average amino acid conservation over TM,
and over CHICO/NACHO residues. The data shows that CHICO and NACHO
positions are more conserved than other TM residues in all GPCR classes. Residues
present on both lists are even more conserved. Two measures of conservation provided
by Jalview are used: Consensus is the percentage of orthologs sharing the human amino
acid; and Conservation is a qualitative measure counting the number of conserved
chemical properties. For P2Y12, we used a curated list of 77 orthologs from [100]. For
other proteins, we collected predicted orthologs from the MetaPhOrs database (release
201405 [101]), aligned them with Clustal Omega, and then removed sequences with gaps
in the TM regions.
S1 Fig.
Detailed view of conserved motifs in class A GPCRs. The conserved residues
in 24 different structures (including active) have very similar positions, which shows
that the class A GPCR fold is highly conserved. The full TM bundle is shown in Fig. 1.
S2 Fig.
Sequence similarity (%) of the TM bundles between crystal structures for
the final sequence alignment. Two residues are similar if their BLOSUM62 entry is
positive.
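The similarity criterion stated here can be sketched in a few lines. This assumes two gapless, equal-length TM sequences and a symmetric substitution matrix stored as a dictionary. The dictionary below is only an excerpt containing the four BLOSUM62 entries needed for the toy example; a full matrix would have to be supplied for real sequences, for example from a library such as Biopython. The function names and the toy sequences are hypothetical.

```python
# Excerpt of BLOSUM62 restricted to the residue pairs used in the toy example.
BLOSUM62_EXCERPT = {
    ("L", "I"): 2,
    ("A", "G"): 0,
    ("K", "R"): 2,
    ("F", "W"): 1,
}


def pair_score(a, b, matrix):
    """Look up a symmetric substitution score stored under either key order."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]


def percent_similarity(seq1, seq2, matrix):
    """Percentage of aligned positions whose substitution score is positive."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned without gaps and equal in length")
    similar = sum(1 for a, b in zip(seq1, seq2) if pair_score(a, b, matrix) > 0)
    return 100.0 * similar / len(seq1)


toy_similarity = percent_similarity("LAKF", "IGRW", BLOSUM62_EXCERPT)
```

In the toy pair, three of the four positions have a positive entry. The A/G pair scores 0, so under the strictly positive criterion it does not count as similar.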
S3 Fig.
Backbone (atoms N, Calpha, C, O) RMSD of the TM bundles for the final
sequence alignment. For a given pair of structures, there may exist a different
sequence alignment that results in a lower RMSD than the listed one.
S4 Fig.
S4Fig.pdf
High-resolution phylogenetic tree (Fig. 7) based on TM similarity only.
The pdf file is searchable for the Uniprot accession numbers. Loops were ignored.
Color coding denotes the GPCR class. Proteins with known crystal structure are
emphasized with a dot.
S5 Fig.
Testing the robustness of the alignment of the Vomeronasal receptors with
the other groups. This is an extended version of Fig. 3, same caption.
S6 Fig.
Testing the robustness of the alignment of the Taste2 receptors with the
other groups. This is an extended version of Fig. 4, same caption.
S7 Fig.
Testing the robustness of the alignment of the Vomeronasal receptors with
the GPCRtm substitution matrix. Same caption as in Fig. 3.
S8 Fig.
Testing the robustness of the alignment of the Taste2 receptors with the
GPCRtm substitution matrix. Same caption as in Fig. 4.
S9 Fig.
Diagram of interhelical contacts present in classes B, C, and F. The width of
the line connecting two TMs is proportional to the number of contacts present in all
structures from the given class. The list in red font shows the contacts not present in
any available structure from other classes.
S10 Fig.
RMSD of helices after best rigid body move. Same caption as Fig. 9.
S1 Text
S1Text.pdf
Comparison of the GRoSS alignment to the HMM-HMM alignment [77]
and to the GPCRDB alignment [24, 78].
S1 File
gross-alignment.fasta
The GRoSS alignment in fasta format and annotation of the TM regions
and BW residues in Jalview format for all human GPCRs. The first 29 sequences are the actual
sequences from the PDB files of the crystal structures used; the rest of the sequences are
from Uniprot. The N-terminus, loops, and C-terminus are not aligned. For interactive work it
is useful to also highlight the TM regions and BW residues using the Jalview annotation
gross-alignment.gff file.