Comparative protein structure network analysis on 3CLpro from SARS-CoV-1 and SARS-CoV-2
Abstract
The main protease (Mpro, 3CLpro) is an important drug target in coronaviruses. Despite the 96% sequence identity between the Mpros of SARS-CoV-1 and SARS-CoV-2, inhibitors that block the activity of SARS-CoV-1 Mpro have been found to have a differential inhibitory effect on the Mpro of SARS-CoV-2. A possible reason is the difference of a few amino acids between the two peptidases. Because the overall 3-D crystallographic structures of Mpro from SARS-CoV-1 and SARS-CoV-2 are quite similar, mapping subtle structural variations by direct comparison is seemingly impossible. Hence, we compared the structures of SARS-CoV-1 and SARS-CoV-2 Mpro in apo and inhibitor-bound states using a protein structure network (PSN) based approach at the contact level. The comparative PSN analysis of apo Mpros from SARS-CoV-1 and SARS-CoV-2 uncovers small but significant local changes occurring near the active site region and distributed throughout the structure. Additionally, we show how inhibitor binding perturbs the protein structure graph (PSG) and the communication pathways in Mpros. Moreover, we investigated the network connectivity of the quaternary structure of Mpro and identified critical residue pairs for complex formation using three centrality measures along with modularity analysis. Taken together, these results on the comparative PSN provide an insight into conformational changes that may be used as additional guidance towards specific drug development.
1 | INTRODUCTION
The Coronaviridae family comprises enveloped, positive-sense RNA viruses and includes three highly pathogenic viruses: Severe Acute Respiratory Syndrome Coronavirus 1 (SARS-CoV-1), Middle East Respiratory Syndrome Coronavirus (MERS-CoV) and SARS-CoV-2.1 SARS-CoV-1 originated in China and caused a global pandemic in 2003 with about a 10% fatality rate.2,3 MERS-CoV was first reported from Saudi Arabia in 2012 and has infected the human population with limited human-to-human transmission.4 SARS-CoV-2, a new coronavirus first reported from Wuhan, China in December 2019, causes severe human respiratory disease.5 It has also been characterized as a very contagious pathogenic virus with rapid human-to-human transmission that has caused an outbreak of severe pulmonary disease in almost 216 countries, resulting in approximately 814 438 confirmed deaths globally to date (WHO report, 2020). WHO has named the disease caused by SARS-CoV-2 COVID-19, and the pandemic has become a global health emergency that is expected to have severe ramifications for the global economy.

Currently, there is no specific treatment available to control the COVID-19 pandemic. Efforts are being made towards the design of vaccines as well as drugs against COVID-19. In the past, therapies have been developed against SARS-CoV-1 targets such as proteases, helicases and polymerases. Moreover, immune modulators such as interferons and corticosteroids have also been used as therapeutics.6 Among the viral targets, the main protease (Mpro, 3CLpro) has been designated as an important drug target because of its essential role in processing the polyprotein translated from the viral RNA.
The Mpro is a homodimeric cysteine protease in which each protomer consists of three domains: domain I (residues 8-101), domain II (residues 102-184) and domain III (residues 201-303). The catalytic residues and the substrate binding sites are situated between domains I and II of Mpro.7 The recent crystal structure of SARS-CoV-2 Mpro9 reveals its structural similarity with the Mpro of SARS-CoV-1, and the two share a high degree of sequence identity (96.1%).9 Previous studies showed that HIV-1 protease inhibitors block SARS-CoV-1 Mpro.10 Hence, given the structural similarity between the Mpros of SARS-CoV-1 and SARS-CoV-2, the known inhibitors would be expected to have a similar effect on the Mpro of SARS-CoV-2. However, the HIV protease inhibitors show a different binding effect on the Mpro of SARS-CoV-2.11 One of the HIV protease inhibitors, Lopinavir, was shown to inhibit the Mpro of SARS-CoV-1 in vitro,12 whereas none of the HIV inhibitors was able to significantly inhibit the Mpro of SARS-CoV-2 in vitro.13 Other known potent inhibitors such as α-ketoamide and N3 are also reported to have differential inhibition on the activity of Mpros from SARS-CoV-1 and SARS-CoV-2.9,14,15

A small rearrangement of a protein at the structural level, caused by the substitution of a few amino acids at the substrate binding pockets or allosteric sites, results in changes in internal interactions that may lead to differing patterns of inhibitor sensitivity.16 This might be the case with the Mpro from SARS-CoV-2, where the few changes in amino acid sequence relative to the Mpro of SARS-CoV-1 may contribute to the differential effect of inhibitors on the two Mpros. Since no significant structural changes are noticed at the active site, a subtle change of interactions at the allosteric sites of the proteins may affect the sensitivity to the inhibitors.
Hence, a protein structure network (PSN) based approach can be used to investigate the subtle conformational changes in the protein structure.

A PSN represents a protein structure as a network that comprises nodes and links. Nodes are represented by amino acid residues, and links are represented by long- and short-range interactions among the nodes. Interestingly, this method identifies small changes in protein structures which are otherwise not easily detectable.17–19 Moreover, similar methods have already been applied to investigate various features of proteins such as structural flexibility,20 protein domain folding,21 key residues in folding,22 structural patterns,23 clusters of residue flexibility24 and identification of functional residues.25 Similarly, a PSN-ENM based method has been used to construct a PSN on a protein 3-D structure by integrating information on system dynamics supplied by elastic network model-normal mode analysis (ENM-NMA).26 The global (average) network parameters generated by these methods reveal diminutive structural changes among proteins.

In order to probe subtle conformational changes occurring due to differences of a few amino acids in the Mpro sequences, the alteration of local contacts as well as residue-specific network parameters were investigated on the structures of Mpro (apo and inhibitor-bound states) from both SARS-CoV-1 and SARS-CoV-2 using the PSN and PSN-ENM methods. Recently, topological interaction properties of Mpro were analyzed.27 In another similar study, changes in the residue interactions of Mpro when bound to the N3 inhibitor were investigated.28 However, in both cases, the monomeric unit of Mpro was considered for the analysis, while it is well known that the biologically active Mpro molecule exists as a dimer.
In fact, it has been reported earlier that the subunit interfacial region of the Mpro can be a possible target for rational drug design against SARS-CoV.29,30 Hence, it is essential to understand the network connectivity in the biologically active state. Here, we carried out a comparative PSN analysis of both proteases in their biologically active state. Our PSN study showed differences in contacts and communication patterns in the Mpro of SARS-CoV-2 as compared to the Mpro of SARS-CoV-1. Further, we elucidated the subtle changes throughout the protein structure by quantifying the residue connectivity patterns and mapped the network parameters on the 3-D structures of the proteins. We also applied graph theory concepts such as betweenness, closeness, hubs and modularity to highlight critical residues for complex formation. This study provides an understanding of the sensitivity and effectiveness of the existing inhibitors, which would further be helpful in designing specific inhibitors.
2 | MATERIALS AND METHODS
The 3-D coordinates of 3CLpro were downloaded from the Protein Data Bank (1Z1I (ref. 31) and 6M03 for apo Mpro of SARS-CoV-1 and SARS-CoV-2, respectively; 5N19 and 6Y2G (ref. 9) for the inhibitor-bound complexes of Mpro from SARS-CoV-1 and SARS-CoV-2, respectively). Here, the Cα atom of each amino acid residue is considered as a node, and it forms an edge with another Cα atom if the distance between them is within a cutoff of 7 Å. An edge-weighted Cα network based on Euclidean distance was constructed using NAPS,32 and global protein network parameters such as degree, betweenness centrality, and clustering coefficient were analyzed.

Clustering coefficient (CC) measures the cliquishness of each node in a protein network graph. Cliquishness is defined as the number of edges present among the neighbours of a node relative to the total possible edges between them. CC varies between 0 (for no clustering) and 1 (for maximum clustering).

Betweenness centrality (BC) is a centrality measure based on the number of shortest paths between pairs of nodes that pass through a given node; for weighted graphs, the shortest path is the one that minimizes the sum of edge weights.

Closeness centrality (CCen) represents the closeness of a node to the other nodes. It is a centrality measure calculated from the sum of the shortest path lengths from the node to all other nodes, so that a smaller sum corresponds to a higher closeness.

A community (or module) is a region in a network where nodes are more densely connected to each other than to the rest of the network.

PSN and elastic network model-normal mode analysis (ENM-NMA) approaches were used to study long-range communication and the effect of allostery on network connectivity.33–35 Previously, the ENM-NMA approach for PSN was applied to characterize the topological and allosteric communication pathways in proteins.36 Other network parameters such as hubs, communities, and structural communication were analyzed using the mixed PSN-ENM-NMA approach implemented in WebPSN.26 It constructs the Protein Structure Graph (PSG) based on the interaction strength of two connected nodes.
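The Cα network construction and two of the per-residue parameters described above can be sketched in a few lines. This is a minimal illustration in plain Python, not the NAPS implementation; the function names and the toy coordinates are assumptions introduced here, and only the 7 Å cutoff comes from the text.

```python
import math
from itertools import combinations

def build_contact_network(ca_coords, cutoff=7.0):
    """Build a distance-weighted C-alpha network.

    ca_coords maps a residue label to its C-alpha (x, y, z) in angstroms.
    Returns {residue: {neighbour: distance}} for all pairs within cutoff.
    """
    graph = {res: {} for res in ca_coords}
    for a, b in combinations(ca_coords, 2):
        d = math.dist(ca_coords[a], ca_coords[b])
        if d <= cutoff:
            graph[a][b] = d
            graph[b][a] = d
    return graph

def degree(graph, node):
    """Number of edges attached to a node."""
    return len(graph[node])

def clustering_coefficient(graph, node):
    """Edges present among a node's neighbours / edges possible among them."""
    nbrs = list(graph[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(nbrs, 2) if v in graph[u])
    return 2.0 * links / (k * (k - 1))

# Hypothetical coordinates for four residues (not taken from any PDB entry).
toy = {
    1: (0.0, 0.0, 0.0),
    2: (3.8, 0.0, 0.0),
    3: (3.8, 3.8, 0.0),
    4: (15.0, 0.0, 0.0),
}
net = build_contact_network(toy)
print(degree(net, 1), clustering_coefficient(net, 1))  # 2 1.0
print(degree(net, 4), clustering_coefficient(net, 4))  # 0 0.0
```

In practice the coordinates would be read from the PDB entries listed above; the parsing step is omitted here to keep the sketch self-contained.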
3 | RESULTS AND DISCUSSION
A PSN depicts a network of nodes and links. The nodes are represented by amino acids and the links by long- and short-range interactions among the nodes, which provides useful information at the contact level in a protein structure. Mpro has been designated as an attractive drug target. Despite the 96% sequence identity and negligible variation in 3-D structure compared to SARS-CoV-1, the drugs/inhibitors developed so far against the Mpro of SARS-CoV-1 showed a different inhibitory effect on the Mpro of SARS-CoV-2. Since the structural changes between the two Mpros are negligible, a network-based approach has been utilized to map the subtle conformational alterations arising in the protein structure. Network parameters such as degree, BC, CCen, CC, SP and modularity were analyzed for both free and inhibitor-bound forms of the SARS-CoV Mpro structures. Only a small difference was observed in the average network parameters of the Mpro structures (Table 1), suggesting a diminutive change in the overall structures.

The calculated degrees were compared between the two structures. It was found that residues near the active site (T26, I43, Q189 and Q192) of SARS-CoV-2 Mpro showed an increase in degree by 2 compared to SARS-CoV-1 Mpro, while the degree of D187 decreased by 2 in SARS-CoV-2. It was also observed that the N- and C-terminal residues (G2 and E290), crucial for dimerization, were associated with a higher degree compared to the same residues of Mpro in SARS-CoV-1.
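The residue-wise degree comparison described above amounts to subtracting, for each residue, its degree in one network from its degree in the other. A minimal sketch follows; the function name `degree_changes`, the threshold parameter and the toy neighbour lists are assumptions made for illustration and do not reproduce the Mpro networks.

```python
def degree_changes(net_a, net_b, min_change=2):
    """Residues whose degree differs by at least min_change between networks.

    Both networks are dicts {residue: set(neighbours)}. Only residues present
    in both networks are compared. Returns {residue: degree_b - degree_a}.
    """
    changes = {}
    for res in net_a.keys() & net_b.keys():
        diff = len(net_b[res]) - len(net_a[res])
        if abs(diff) >= min_change:
            changes[res] = diff
    return changes

# Toy networks with invented neighbour lists (illustrative only).
cov1 = {26: {25, 27}, 187: {186, 188, 41, 54}, 100: {99, 101}}
cov2 = {26: {25, 27, 21, 42}, 187: {186, 188}, 100: {99, 101}}
changes = degree_changes(cov1, cov2)
for res in sorted(changes):
    print(res, changes[res])
```

A positive value marks a residue that gained contacts in the second network and a negative value one that lost contacts.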
Additionally, a few other residues of domains II and III of SARS-CoV-2 Mpro were also found to have changes in the calculated degree. The residues showing the largest changes in degree between the two structures are listed in Table S1. Interestingly, the replacement of A46 in SARS-CoV-1 Mpro with S46 in SARS-CoV-2 Mpro resulted in a rearrangement of contacts, with new contacts formed with L27 and H41 of domain I of SARS-CoV-2 Mpro. A recent report states that SARS-CoV-2 Mpro possesses an active site with a solvent accessible surface area of 356 Å², whereas that of SARS-CoV-1 Mpro was observed to be only 256 Å².28 This difference may be attributed to the variant amino acid S46, which resulted in the rearrangement of contacts in SARS-CoV-2 Mpro. Additionally, a few residues of domain II of SARS-CoV-2 Mpro are observed to form five new contacts (Figure 1A).

In the PSG, the interaction strength between residues i and j is defined as $I_{ij} = \frac{n_{ij}}{\sqrt{N_i N_j}} \times 100$, where $I_{ij}$ is the interaction percentage of nodes i and j, $n_{ij}$ is the number of side-chain atom pairs within a given distance cutoff (4.5 Å), and $N_i$ and $N_j$ are normalization factors.37–39 The interaction strength (expressed as a percentage) is calculated for all node pairs. If $I_{ij}$ exceeds the minimum interaction strength cutoff ($I_{min}$), the residue pair is considered to be interacting and is represented as a connection in the PSG. The PSG is built using atomic cross-correlation motions from ENM-NMA.39 All network parameters were visualized using PyMOL.

Changes in the contact patterns surrounding the active site of SARS-CoV-2 Mpro are due to the change in amino acid, suggesting a subtle conformational change in SARS-CoV-2 Mpro, which may contribute to the differing efficiency of inhibitors on the Mpros.
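The interaction strength criterion used to build the PSG can be illustrated with a short sketch. It assumes the commonly used PSN form, in which the number of side-chain atom pairs within 4.5 Å is divided by the square root of the product of the two normalization factors and expressed as a percentage. The function names, the toy coordinates, the normalization values and the I_min value below are all hypothetical.

```python
import math

def interaction_strength(side_chain_i, side_chain_j, norm_i, norm_j, cutoff=4.5):
    """Percent interaction strength between two residues.

    side_chain_i / side_chain_j: lists of (x, y, z) side-chain atom positions.
    norm_i / norm_j: residue-type normalization factors (assumed known).
    """
    n_ij = sum(
        1
        for a in side_chain_i
        for b in side_chain_j
        if math.dist(a, b) <= cutoff
    )
    return 100.0 * n_ij / math.sqrt(norm_i * norm_j)

def is_linked(i_ij, i_min):
    """A pair is connected in the PSG when I_ij exceeds the cutoff I_min."""
    return i_ij > i_min

# Hypothetical atoms and normalization factors, for illustration only.
res_i = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
res_j = [(4.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
strength = interaction_strength(res_i, res_j, norm_i=50.0, norm_j=50.0)
print(strength, is_linked(strength, i_min=3.0))  # 4.0 True
```

Two of the four atom pairs fall within the cutoff here, so the strength is 100 × 2 / 50 = 4.0. Raising I_min removes weaker links and therefore yields a sparser graph.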
Hubs were also analyzed for the SARS-CoV-1 and SARS-CoV-2 Mpros. Interestingly, a significant difference in the total number of hubs was observed between the two main proteases. The Mpro from SARS-CoV-1 possesses 42 hubs, whereas that from SARS-CoV-2 has 47 hubs in total (Table 2). Many hubs were common to the two structures. A few hubs were distinctive to each structure, suggesting their important role in interactions and stability. Hubs near the active site region of SARS-CoV-2 Mpro, such as H41, H163, D187 and Q192, are assumed to be crucial for catalysis. The unique hubs are distributed in all three domains of SARS-CoV-2 Mpro and may suggest a subtle change in inter-domain communication within the protease.

Betweenness centrality (BC) has been reported to play an important role in structural complexes. In our study, the residues from both Mpro structures and their corresponding BC scores are plotted in Figure S1A,B. The trend of the plots is quite comparable in both structures, except that a few residues show a significant change in their BC scores. Residues with significantly high BC scores (z scores ≥0.4) from each Mpro structure are listed in Table S2. A significantly high BC value of a residue signifies its involvement in the communication among different modules of the protein contact network (PCN). The residue V114, which has a high BC value, is observed to make a new contact with F140 in SARS-CoV-2. In addition, other residues such as C128, G146, and T292 were found to have high BC values and are also involved in the formation of a new set of contacts in SARS-CoV-2 (Figure 1B). The new contacts formed in SARS-CoV-2 Mpro suggest a role in providing connectivity among residues of the network.

Residue-wise CC values were analyzed for both protease structures and are depicted in Figure S1C,D.
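The selection of high-BC residues by z score can be sketched as follows. The betweenness routine is a plain implementation of Brandes' algorithm for an unweighted, undirected graph, which is a simplification of the distance-weighted network used in the study. The z ≥ 0.4 threshold is taken from the text, while the function names and the five-residue toy chain are assumptions.

```python
from collections import deque
from statistics import mean, pstdev

def betweenness_centrality(graph):
    """Brandes' algorithm on an unweighted, undirected {node: set(nbrs)} graph."""
    bc = dict.fromkeys(graph, 0.0)
    for s in graph:
        stack = []
        preds = {v: [] for v in graph}
        sigma = dict.fromkeys(graph, 0)  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in graph[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(graph, 0.0)
        while stack:  # accumulate dependencies in reverse BFS order
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1.0 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair is visited from both endpoints, so halve the totals.
    return {v: score / 2.0 for v, score in bc.items()}

def high_bc_residues(bc, z_cutoff=0.4):
    """Residues whose BC z score is at least z_cutoff."""
    mu, sd = mean(bc.values()), pstdev(bc.values())
    if sd == 0:
        return []
    return sorted(v for v, s in bc.items() if (s - mu) / sd >= z_cutoff)

# Toy linear chain of five residues (illustrative only).
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
bc = betweenness_centrality(chain)
print(bc[3], high_bc_residues(bc))  # 4.0 [2, 3, 4]
```

For the weighted case, the breadth-first search would be replaced by a Dijkstra search over edge distances.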
The substitution at position 46 in SARS-CoV-2 compared to SARS-CoV-1 resulted in an increase in the CC of nearby N-finger and active site residues such as G23, T24, and S46. Interestingly, T24 is also observed to form a direct contact with the active site residues in SARS-CoV-2. This suggests that changes in the interconnectedness among the residues at and near the active site region may play a role in the selectivity of the inhibitors.

Although the average parameters calculated from the PSNs of the Mpros from SARS-CoV-1 and SARS-CoV-2 did not show significant changes, the residue-wise comparison of degree and BC values between the two exhibited noticeable changes (Table S1 and Table S2). Moreover, these changes in the network parameters suggest an effect on the local conformations of the Mpros, which may provide an insight into the sensitivity and selectivity of inhibitors.

The Mpro structures from SARS-CoV-1 and SARS-CoV-2 do not show much difference; however, an analysis of the contact points either generated or lost due to the change of a few amino acids may provide an insight into the restructuring of modules within the Mpro. Hence, the community structure of both SARS-CoV-1 and SARS-CoV-2 Mpros was analyzed. The analysis resulted in 12 communities for SARS-CoV-1 Mpro and 11 communities for SARS-CoV-2 Mpro, and the residues of each community are shown in Figure 2. The residues at the active site region of SARS-CoV-2 Mpro constitute the largest community, shown as the C1 red module in Figure 2A, which consists of 12 nodes, 18 links and seven hubs. Unlike in SARS-CoV-2, the largest community in SARS-CoV-1 Mpro (formed by eight nodes, 11 links and four hubs) is located at the interfacial residues of domains I and II instead of the active site region (Figure 2B). Moreover, the community formed at the active site region of SARS-CoV-1 Mpro is smaller than that of SARS-CoV-2 Mpro.
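The node, link and hub counts reported for each community can be tallied from a network once the community membership is known. The sketch below assumes that a hub is a node with at least four links in the whole network, a convention often used in PSN analyses but not stated in this section. The function name and the toy graph are likewise illustrative, and the community detection step itself is not shown.

```python
def summarize_community(graph, members, hub_degree=4):
    """Count nodes, internal links and hubs of one community.

    graph: {node: set(neighbours)} for the whole network.
    members: nodes assigned to the community (by any detection method).
    hub_degree: minimum degree for a node to count as a hub (an assumption).
    """
    members = set(members)
    # Store each undirected link once, keeping only links inside the community.
    links = {
        frozenset((u, v))
        for u in members
        for v in graph[u]
        if v in members
    }
    hubs = [u for u in members if len(graph[u]) >= hub_degree]
    return {"nodes": len(members), "links": len(links), "hubs": len(hubs)}

# Toy network: node "H" has four links; "a" and "b" are also linked together.
toy = {
    "H": {"a", "b", "c", "d"},
    "a": {"H", "b"},
    "b": {"H", "a"},
    "c": {"H"},
    "d": {"H"},
}
print(summarize_community(toy, {"H", "a", "b"}))
# {'nodes': 3, 'links': 3, 'hubs': 1}
```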
Rearrangement of modules was also observed throughout the structure, which indicates a perturbation at the global level in the 3-D structures of the two proteins.

The PSG of the SARS-CoV-1 Mpro inhibitor-bound complex is richer in nodes, links and hubs than that of its unbound state (Table 3). Binding of inhibitors to the Mpro generated many hubs at the inhibitor binding site, and these hubs are associated with residues such as H41, Y54, F140, S144, H163, H172, and Q192. Additionally, inhibitor-complex-specific hubs are formed at the interface of domains I and II (residues C16, Y101, F150, and L115), while invariant hubs are spread throughout the structure. Interestingly, similar trends for nodes and links were not observed for the SARS-CoV-2 Mpro inhibitor-bound complex when compared with its apo form (Table 3). The total number of hub residues in the apo form of SARS-CoV-2 Mpro was found to be 36, whereas the inhibitor complex possesses 35 hub residues. A few hubs are unique to each structure, suggesting a role in the specificity of inhibitors. Unlike in SARS-CoV-1, residues H41 and Q192 of the SARS-CoV-2 Mpro inhibitor complex do not participate in active site hub formation.

We mapped the perturbations on the 3-D structure, considering the nodes and links unique to each structure (Figure 3A,B). For the apo and inhibitor-bound states of SARS-CoV-1 Mpro, the bound inhibitor was observed to induce perturbations that are essentially consistent with a gain of intermolecular links and nodes. The perturbations associated with a gain of links are mostly located in the small helix near the P2 group (residues S46-L50), the β-hairpin loop near P3-P4 (residues E166-G170) and the P5 loop (residues T190-A194). An additional gain of links is observed in the interfacial region of domains I and II, along with N-finger residues making new links with the C-terminus of domain III. In SARS-CoV-2 Mpro, however, the comparison of the apo and inhibitor-bound states showed less pronounced perturbations, and the specific contacts change to a lesser extent than in SARS-CoV-1 Mpro.

Perturbations in inter- and intra-subunit communication due to inhibitor binding were also analyzed (Table 4). To investigate the communication pattern within the whole structure, we analyzed the meta-path and mapped the residues participating in each path. The total number of shortest communication paths in the apo form of SARS-CoV-1 Mpro was 62 345, and a total of 77 391 paths were observed in the inhibitor-bound form. This indicates an increase in pathways upon inhibitor binding to SARS-CoV-1 Mpro. In contrast, a decrease in possible pathways was observed for the SARS-CoV-2 Mpro inhibitor-bound complex. However, the average path length increased in the inhibitor-bound states of both SARS-CoV-1 and SARS-CoV-2 Mpro. Changes in the most frequent nodes and links in the structural communication upon inhibitor binding were also observed (Figure 4). Significant inhibitor-induced perturbations, in the form of loss or frequency reduction of nodes, were observed within the active site region of the complexes in both cases (for SARS-CoV-1: C44, P52, Y54, F140, S144, L167, R187, Q192; for SARS-CoV-2: F140, S144, H163 and R187). Moreover, a redistribution of nodes was observed around the N-finger of the Mpro inhibitor complex from SARS-CoV-1, suggesting a role of intercommunication exchange between domains II and III, which may be crucial for the dimerization of Mpro.

The community structures of the PSNs of Mpro from SARS-CoV-1 and SARS-CoV-2 in apo and inhibitor-bound states were analyzed. The SARS-CoV-1 Mpro inhibitor-bound complex has eleven communities, whereas the SARS-CoV-2 Mpro inhibitor complex possesses six communities; these communities are mapped on the PSNs. In both complexes, the inhibitor binding sites were part of a large community, C1 (Figure 5).
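The mapping of nodes and links unique to the apo or inhibitor-bound state, as used in the perturbation analysis, reduces to set differences between the two graphs. The following sketch uses hypothetical residue labels and function names; it illustrates the comparison only and is not the WebPSN procedure.

```python
def edge_set(graph):
    """Undirected links of a {node: set(neighbours)} graph, each stored once."""
    return {frozenset((u, v)) for u, nbrs in graph.items() for v in nbrs}

def compare_states(apo, bound):
    """Nodes and links gained or lost on going from the apo to the bound PSG."""
    apo_edges, bound_edges = edge_set(apo), edge_set(bound)
    return {
        "gained_nodes": set(bound) - set(apo),
        "lost_nodes": set(apo) - set(bound),
        "gained_links": bound_edges - apo_edges,
        "lost_links": apo_edges - bound_edges,
    }

# Toy graphs with placeholder residue labels (illustrative only).
apo = {"A": {"B"}, "B": {"A", "C"}, "C": {"B"}}
bound = {"A": {"B"}, "B": {"A", "D"}, "D": {"B", "E"}, "E": {"D"}}
diff = compare_states(apo, bound)
print(sorted(diff["gained_nodes"]), sorted(diff["lost_nodes"]))
print(len(diff["gained_links"]), len(diff["lost_links"]))
```

A net gain of nodes and links, as in this toy case, corresponds to the pattern described for the SARS-CoV-1 Mpro complex.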
The long loop connecting domain II & III was involved in the second largest community for SARS-CoV-1 complex which includes seven nodes, eleven links, and three hubs.
This connecting loop is part of community C3 in the SARS-CoV-2 complex, which includes five nodes, seven links and three hubs. The rearrangement of communities in the inhibitor complexes in comparison to the apo forms suggests that the inhibitors induce perturbations in the network connectivity. Our observations correlate with the previous report of a residue interaction network (RIN) analysis on the Mpros of SARS-CoV-1 and SARS-CoV-2, with and without inhibitor N3.28 Similarly, a recent study on the topologies of Mpros by PCN methods highlights sensitive structural perturbations.

It is known that the biologically active SARS-CoV Mpro exists as a dimer. Previously, the E290A mutation in SARS-CoV-2 Mpro was reported to cause a loss of catalytic activity, indicating the importance of domain III in dimerization.40 Hence, in order to highlight structural differences and commonalities in the quaternary structures (homodimers) of the Mpros, PSN parameters were computed and the crucial residues of the subunit interfacial region were identified. The average network parameters for the two subunits of the Mpros do not show any significant changes; hence, no correlation was drawn (Table 1). We therefore further evaluated the previously mentioned network components such as links, hubs, and link-mediated hubs. Interestingly, these parameters were observed to be slightly higher for the SARS-CoV-1 dimeric form (Table 3). Moreover, significant changes were observed in the hubs and link-mediated hubs. A total of 58 hub residues were observed in the quaternary SARS-CoV-1 Mpro, whereas 53 hubs were noted in the SARS-CoV-2 Mpro homodimer (Figure 6A). Interestingly, subtle rearrangements of hub residues in domain III were also observed in both homodimers.
In SARS-CoV-1 Mpro, the residues at 206:A, 259:A, 289:A, 218:B and 230:B (the letters A and B denote the subunits) were observed to form specific hubs. Similarly, the residues at 288:A, 273:B, 288:B and 290:B of SARS-CoV-2 Mpro were involved in hub formation. In addition, specific hub residues were also observed in domain II of SARS-CoV-2 (161:B, 181:B), whereas these were not seen to form hubs in the SARS-CoV-1 Mpro homodimer. Other residues at 39:A, 141:B, 163:B and 172:B were also involved in hub formation in SARS-CoV-1 Mpro. It was observed that the residues at 185:A, 192:A and 192:B of SARS-CoV-2 Mpro were engaged in the formation of active site hubs. These rearrangements in hub residues suggest some difference in inter-domain communication between the two proteases in their quaternary structures.

The BC of each residue of the Mpro dimeric complexes from SARS-CoV-1 and SARS-CoV-2 was analyzed, and the residues with high BC are listed in Table S3. N-finger residues (Res 3 & …) and C-terminal residues of domain III (Res 282 & 290), which are at the interface of the two monomers, showed high betweenness values in both structures, indicating the importance of these residues in dimer formation.39 Additionally, these interfacial residues showing high BC are also involved in hub formation, which suggests their role in catalysis. The residues at positions 28 and 144 in SARS-CoV-2 Mpro showed high BC values, and their essential role in enzyme activity and dimerization has been confirmed by experimental mutagenesis studies of homodimer formation.41

The closeness values of all residues were computed and classified into three categories: (i) high closeness values, (ii) intermediate closeness values, and (iii) small closeness values. Our results suggest that residues from the N-finger (3-11) and from B2 (112-117), B3 (122-130) and B4 (149-151) of domain II might be considered as the most likely recognition sites (Figure 6B,C).
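The closeness calculation and the three-way classification described above can be sketched as follows. Closeness is computed here as the number of reachable nodes divided by the sum of shortest-path lengths on an unweighted graph. The text does not state how the three categories were delimited, so the equal-width binning below is purely an assumption, as are the function names and the toy chain.

```python
from collections import deque

def closeness_centrality(graph, node):
    """(reachable nodes) / (sum of shortest-path lengths), via BFS.

    Only nodes reachable from `node` are counted; isolated nodes score 0.0.
    """
    dist = {node: 0}
    queue = deque([node])
    while queue:
        v = queue.popleft()
        for w in graph[v]:
            if w not in dist:
                dist[w] = dist[v] + 1
                queue.append(w)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total else 0.0

def classify_closeness(scores):
    """Split scores into three equal-width bins (an assumed scheme)."""
    lo, hi = min(scores.values()), max(scores.values())
    width = (hi - lo) / 3.0
    labels = {}
    for res, value in scores.items():
        if width == 0.0 or value >= lo + 2.0 * width:
            labels[res] = "high"
        elif value >= lo + width:
            labels[res] = "intermediate"
        else:
            labels[res] = "small"
    return labels

# Toy linear chain of five residues (illustrative only).
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3, 5}, 5: {4}}
scores = {res: closeness_centrality(chain, res) for res in chain}
labels = classify_closeness(scores)
print(labels)
```

In this toy chain the central residue falls in the high category and the two termini in the small category, which mirrors the intuition that centrally placed residues reach the rest of the network in fewer steps.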
Previously, it has been reported that residue C117 makes a direct interaction with N28 and plays a major role in the dimer stability and enzymatic activity of SARS-CoV-1 Mpro.41 Experimental characterization of the N28A mutant indicated that N28 plays a critical role in maintaining the structural integrity of the active site and in positioning the important residues involved in dimer interface binding and substrate catalysis.42 This suggests that residues showing high closeness might be responsible for long-range interactions that are crucial for dimerization.

The search for shortest communication pathways led to a total of 290 236 and 468 290 paths for the Mpros of SARS-CoV-1 and SARS-CoV-2, respectively, which indicates a significant increase in paths for the SARS-CoV-2 Mpro dimeric form (Table 4). In the global meta-path, the total number of nodes and links along with the specific nodes and links were observed to be 60% and 21.79% for SARS-CoV-1 and SARS-CoV-2, respectively. Additionally, we observed that some of the interface residues are specific to SARS-CoV-1 and are frequent nodes in the communication pathways (Chain A: Res. 3, 6, 123, 126, 140, 290; Chain B: Res. 4, 6, 116, 141, 122, 126, 299).

A few substrate binding residues (Chain A: Res. 41, 49, 144, 163, 165; Chain B: Res. 163, 167) were also involved in the communication pathways of SARS-CoV-1 Mpro, while the corresponding frequent nodes were absent in SARS-CoV-2 Mpro. In addition, the average hub percentage involved in the communication pathways was also observed to be decreased for SARS-CoV-2 (Table 4). These observations suggest a change at the structural communication level in the dimeric form of Mpro.

Common modules are shared between the two homodimeric Mpros and are depicted in Figure 6D. Two large communities, CI and CII, consisting of active site residues from both monomers, possess 10 nodes, 15 links, and 7 hubs. The third large community, CIII, is distributed on the strands of domain II in both monomers.
Additionally, a fourth community was observed at the interface residues of both monomers: the N-terminal residue M6 of chain A and the β-strands of domain II from chain B. One of the residues of this community, F140, has been previously reported to be present at the dimer interface of SARS-CoV Mpro, and mutation of this residue resulted in a conformational change of Mpro.
4 | CONCLUSIONS
Our comparative PSN analysis of the Mpros from SARS-CoV-1 and SARS-CoV-2 revealed noticeable differences in the network parameters of the two proteases. Moreover, the study highlights differential perturbation among the community structures in the inhibitor-bound forms of the proteins. Interestingly, the investigations helped us to probe subtle conformational changes throughout the structures of the two proteases, which are otherwise not evident from the crystal structures. Our observations give an insight into the diminutive structural changes, which may provide an understanding of the selectivity of inhibitors towards the Mpro of SARS-CoV-2. In addition, the investigation of the PSN on the quaternary structure of the Mpros suggests structural and network changes at the interface as well as in long-range interactions, and highlights critical residue pairs for complex formation using three centrality measures. This study is a thorough comparative investigation of subtle structural changes that may provide an insight into designing a specific inhibitor/drug.