Genomic characterisation and epidemiology of 2019 novel coronavirus: implications for virus origins and receptor binding. The S1 protein of Pangolin-CoV is much more closely related to SARS-CoV-2 than to RaTG13. Posterior rate distributions for MERS-CoV (far left) and HCoV-OC43 (far right) using BEAST on n=27 sequences spread over 4 years (MERS-CoV) and n=27 sequences spread over 49 years (HCoV-OC43). Current Overview on Disease and Health Research Vol. 6 acknowledges support by the Research Foundation - Flanders (Fonds voor Wetenschappelijk Onderzoek - Vlaanderen (nos. Region B is 5,525 nt long. It performs: k-mer based detection; map/align and variant calling; consensus sequence generation; and lineage/clade analysis using Pangolin and NextClade. Access the DRAGEN COVID Lineage App on BaseSpace Sequence Hub. Instead, similarity in codon usage metrics between SARS-CoV-2 and the eukaryotes analyzed was correlated with the coding sequence GC content of the eukaryote, with more similar codon usage being identified in eukaryotes with low GC content similar to that of the coronavirus (b). It is clear from our analysis that viruses closely related to SARS-CoV-2 have been circulating in horseshoe bats for many decades. We use three bioinformatic approaches to remove the effects of recombination, and we combine these approaches to identify putative non-recombinant regions that can be used for reliable phylogenetic reconstruction and dating. Menachery, V. D. et al. We find that the sarbecoviruses (the viral subgenus containing SARS-CoV and SARS-CoV-2) undergo frequent recombination and exhibit spatially structured genetic diversity on a regional scale in China. A dynamic nomenclature proposal for SARS-CoV-2 lineages to assist 26, 450–452 (2020). Preprint at https://doi.org/10.1101/2020.05.28.122366 (2020). Virus Evol. 84, 3134–3146 (2010). Graham, R. L. & Baric, R. S. Recombination, reservoirs, and the modular spike: mechanisms of coronavirus cross-species transmission.
All three approaches to removal of recombinant genomic segments point to a single ancestral lineage for SARS-CoV-2 and RaTG13. Cov-Lineages A., Lytras, S., Singer, J. Nature 579, 270–273 (2020). These shy, quirky but cute mammals are among the most heavily trafficked yet least understood animals in the world. Correspondence to Maciej F. Boni, Philippe Lemey, Andrew Rambaut or David L. Robertson. 36, 1793–1803 (2019). # File containing, in this order: the sample ID, the haplotype sequence, the continent, the country, the region, the Data, the Pangolin lineage, the Nextstrain clade and the haplotype number. # Could be obtained from the database. Yuan, J. et al. 1 Phylogenetic relationships in the C-terminal domain (CTD). The fact that these estimates lie between the rates for MERS-CoV and HCoV-OC43 is consistent with the intermediate sampling time range of about 18 years (Fig. Given that these pangolin viruses are ancestral to the progenitor of the RaTG13/SARS-CoV-2 lineage, it is more likely that they are also acquiring viruses from bats. Biazzo et al. Preprint at https://doi.org/10.1101/2020.04.20.052019 (2020). ISSN 2058-5276 (online). Split diversity in constrained conservation prioritization using integer linear programming. Isolation and characterization of a bat SARS-like coronavirus that uses the ACE2 receptor. It allows a user to assign the most likely lineage (Pango lineage) to SARS-CoV-2 query sequences. collected SARS-CoV data and assisted in analyses of SARS-CoV and SARS-CoV-2 data. Don't blame pangolins, coronavirus family tree tracing could prove key. Meet the people who warn the world about new covid variants.
24, 490–502 (2016). Lancet 395, 949–950 (2020). In outbreaks of zoonotic pathogens, identification of the infection source is crucial because this may allow health authorities to separate human populations from the wildlife or domestic animal reservoirs posing the zoonotic risk9,10. 206298/Z/17/Z. Its origin and direct ancestral viruses have not been . Stegeman, A. et al. 4, vey016 (2018). Boni, M. F., Posada, D. & Feldman, M. W. An exact nonparametric method for inferring mosaic structure in sequence triplets. Accurate estimation of ages for deeper nodes would require adequate accommodation of time-dependent rate variation. Eden, J.-S., Tanaka, M. M., Boni, M. F., Rawlinson, W. D. & White, P. A. Recombination within the pandemic norovirus GII.4 lineage. PI signals were identified (with bootstrap support >80%) for seven of these eight breakpoints: positions 1,684, 3,046, 9,237, 11,885, 21,753, 22,773 and 24,628. & Andersen, K. G. Pandemics: spend on surveillance, not prediction. Early detection via genomics was not possible during Southeast Asia's initial outbreaks of avian influenza H5N1 (1997 and 2003–2004) or the first SARS outbreak (2002–2003). MERS-CoV data were subsampled to match sample sizes with SARS-CoV and HCoV-OC43. More evidence Pangolin not intermediary in transmission of SARS-CoV-2. For coronaviruses, however, recombination means that small genomic subregions can have independent origins, identifiable if sufficient sampling has been done in the animal reservoirs that support the endemic circulation, co-infection and recombination that appear to be common. Here, we analyse the evolutionary history of SARS-CoV-2 using available genomic data on sarbecoviruses. Now, the two researchers used genomic sequencing to compare the genome of the new coronavirus in humans with that in animals and found a 99% match with pangolins. & Bedford, T. MERS-CoV spillover at the camel–human interface.
This is notable because the variable-loop region contains the six key contact residues in the RBD that give SARS-CoV-2 its ACE2-binding specificity27,37. We thank all authors who have kindly deposited and shared genome data on GISAID. Lu, R. et al. 4), but also by markedly different evolutionary rates. Boxplots show interquartile ranges, white lines are medians and box whiskers show the full range of posterior distribution. Region C showed no PI signals within it. To estimate non-synonymous over synonymous rate ratios for the concatenated coding genes, we used the empirical Bayes Renaissance counting procedure67. But some theories suggest that pangolins may be the source of the novel coronavirus. For the DRAGEN COVID Lineage tool, the minimum accepted alignment score was set to 22 and results with scores <22 were discarded. Genetics 172, 2665–2681 (2006). Phylogenetic Assignment of Named Global Outbreak Lineages. There is a 90% genome match between SARS-CoV-2 and a coronavirus in pangolins. The construction of NRR1 is the most conservative as it is least likely to contain any remaining recombination signals. As of December 2, 2021, SJdRP, a medium-sized city in the Northwest region of São Paulo state, Brazil (Fig. Of importance for future spillover events is the appreciation that SARS-CoV-2 has emerged from the same horseshoe bat subgenus that harbours SARS-like coronaviruses. Duchene, S. et al. The difficulty in inferring reliable evolutionary histories for coronaviruses is that their high recombination rate48,49 violates the assumption of standard phylogenetic approaches because different parts of the genome have different histories. Developed by the Centre for Genomic Pathogen Surveillance.
The ongoing pandemic spread of a new human coronavirus, SARS-CoV-2, which is associated with severe pneumonia/disease (COVID-19), has resulted in the generation of tens of thousands of virus genome sequences. The most parsimonious explanation for these shared ACE2-specific residues is that they were present in the common ancestors of SARS-CoV-2, RaTG13 and Pangolin Guangdong 2019, and were lost through recombination in the lineage leading to RaTG13. Li, Q. et al. Relevant bootstrap values are shown on branches, and grey-shaded regions show sequences exhibiting phylogenetic incongruence along the genome. A phylogenetic tree, using RAxML v8.2.8 (ref. 94, e0012720 (2020). 26 March 2020. Transparent bands of interquartile range width and with the same colours are superimposed to highlight the overlap between estimates. Webster, R. G., Bean, W. J., Gorman, O. T., Chambers, T. M. & Kawaoka, Y. Evolution and ecology of influenza A viruses. COVID-19: Time to exonerate the pangolin from the transmission of SARS. Below, we report divergence time estimates based on the HCoV-OC43-centred rate prior for NRR1, NRR2 and NRA3 and summarize corresponding estimates for the MERS-CoV-centred rate priors in Extended Data Fig. Humans' selfish, speciesist treatment of these animals could be the very reason why the novel coronavirus exists. Did Pangolin Trafficking Cause the Coronavirus Pandemic? Nature 558, 180–182 (2018). Bayesian evaluation of temporal signal in measurably evolving populations. performed S recombination analysis. This study provides an integration of existing classifications and describes evolutionary trends of the SARS-CoV . We thank originating laboratories at South China Agricultural University (Y. Shen, L. Xiao and W. Chen; no.
Center for Infectious Disease Dynamics, Department of Biology, Pennsylvania State University, University Park, PA, USA; Department of Microbiology, Immunology and Transplantation, KU Leuven, Rega Institute, Leuven, Belgium; Department of Biological Sciences, Xi'an Jiaotong-Liverpool University, Suzhou, China; State Key Laboratory of Emerging Infectious Diseases, School of Public Health, The University of Hong Kong, Hong Kong SAR, China; Department of Biology, University of Texas Arlington, Arlington, TX, USA; Institute of Evolutionary Biology, University of Edinburgh, Edinburgh, UK; MRC-University of Glasgow Centre for Virus Research, Glasgow, UK. Publisher's note: Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. Katoh, K., Asimenos, G. & Toh, H. in Bioinformatics for DNA Sequence Analysis (ed. Gorbalenya, A. E. et al. Pangolin relies on a novel algorithm called pangoLEARN. Trova, S. et al. & Holmes, E. C. A genomic perspective on the origin and emergence of SARS-CoV-2. and D.L.R. However, inconsistency in the nomenclature limits uniformity in its epidemiological understanding. Two exceptions can be seen in the relatively close relationship of Hong Kong viruses to those from Zhejiang Province (with two of the latter, CoVZC45 and CoVZXC21, identified as recombinants) and a recombinant virus from Sichuan for which part of the genome (region B of SC2018 in Fig. Ge, X. et al. USA 113, 3048–3053 (2016). Stamatakis, A. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Pink, green and orange bars show BFRs, with region A (nt 13,291–19,628) showing two trimmed segments yielding region A′ (nt 13,291–14,932, 15,405–17,162, 18,009–19,628). 2, vew007 (2016).
COVID-19 lineage names can be confusing to navigate; there are many aliases and if you want to catch them all to examine further in data analyses it helps to . 32, 268–274 (2014). Evidence of the recombinant origin of a bat severe acute respiratory syndrome (SARS)-like coronavirus and its implications on the direct ancestor of SARS coronavirus. Coronavirus: Pangolins may have spread the disease to humans. On first examination this would suggest that SARS-CoV-2 is a recombinant of an ancestor of Pangolin-2019 and RaTG13, as proposed by others11,22. BFRs were concatenated if no phylogenetic incongruence signal could be identified between them. Posterior distributions were approximated through Markov chain Monte Carlo sampling, with chains run sufficiently long to ensure effective sample sizes >100. Zhou, H. et al. Bioinformatics 30, 1312–1313 (2014). Although the human ACE2-compatible RBD was very likely to have been present in a bat sarbecovirus lineage that ultimately led to SARS-CoV-2, this RBD sequence has hitherto been found in only a few pangolin viruses. Bayesian evolutionary rate and divergence date estimates were shown to be consistent for these three approaches and for two different prior specifications of evolutionary rates based on HCoV-OC43 and MERS-CoV. The lineage B.1 has been the major basal and widespread lineage from the initial SARS-CoV-2 spread and it became the more prevalent lineage in Colombia (13), while the B.1.111 lineage, first detected in the USA from a sample collected on March 7, 2020 and subsequently in Colombia on March 13, 2020 is currently circulating and mainly represented . Except for specifying that sequences are linear, all settings were kept to their defaults.
Because coronaviruses are known to be highly recombinant, we used three different approaches to identify non-recombinant regions for use in our Bayesian time-calibrated phylogenetic inference. Note that six of these sequences fall under the terms of use of the GISAID platform. M.F.B., P.L. and P.L.) It is available as a command line tool and a web application. performed recombination analysis for non-recombining alignment 3, calibration of rate of evolution and phylogenetic reconstruction and dating. Software package for assigning SARS-CoV-2 genome sequences to global lineages. Cell 181, 223–227 (2020). This boundary appears to be rarely crossed. 1) and thus likely to be the product of recombination, acquiring a divergent variable loop from a hitherto unsampled bat sarbecovirus28. In March, when covid cases began spiking around India, Bani Jolly went hunting for answers in the virus's genetic code. We showed that severe acute respiratory syndrome coronavirus 2 is probably a novel recombinant virus. and T.A.C. Nature 579, 265–269 (2020). One geographic clade includes viruses from provinces in southern China (Guangxi, Yunnan, Guizhou and Guangdong), with its major sister clade consisting of viruses from provinces in northern China (Shanxi, Henan, Hebei and Jilin) as well as Hubei Province in central China and Shaanxi Province in northwestern China. To examine temporal signal in the sequenced data, we plotted root-to-tip divergence against sampling time using TempEst39 v.1.5.3 based on a maximum likelihood tree. Yu, H. et al. A third approach attempted to minimize the number of regions removed while also minimizing signals of mosaicism and homoplasy. 2, bottom) show that SARS-CoV-2 is unlikely to have acquired the variable loop from an ancestor of Pangolin-2019 because these two sequences are approximately 10–15% divergent throughout the entire S protein (excluding the N-terminal domain).
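The root-to-tip check mentioned above (divergence plotted against sampling time, as TempEst does interactively) can be illustrated with an ordinary least-squares fit. The following is a minimal sketch, not the authors' workflow: the function name and the toy data are hypothetical, and it assumes root-to-tip distances have already been read off a rooted maximum likelihood tree.

```python
import numpy as np


def root_to_tip_regression(sampling_years, root_to_tip_distances):
    """Fit divergence = slope * time + intercept by least squares.

    Returns (slope, intercept, r_squared). The slope is a crude clock-rate
    estimate (substitutions per site per year); a low r_squared suggests
    weak temporal signal.
    """
    x = np.asarray(sampling_years, dtype=float)
    y = np.asarray(root_to_tip_distances, dtype=float)
    slope, intercept = np.polyfit(x, y, 1)
    predicted = slope * x + intercept
    ss_res = np.sum((y - predicted) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    r_squared = 1.0 - ss_res / ss_tot
    return slope, intercept, r_squared


# Hypothetical toy data (not from the paper): five tips sampled 4 years apart,
# lying exactly on a line with slope 0.0005.
years = [2004.0, 2008.0, 2012.0, 2016.0, 2020.0]
distances = [0.010, 0.012, 0.014, 0.016, 0.018]
slope, intercept, r2 = root_to_tip_regression(years, distances)
print(f"slope={slope:.6f} subs/site/yr, r^2={r2:.3f}")
```

With real sarbecovirus data the fit would be far noisier than this toy example; a regression of this kind is only an exploratory check and does not replace the Bayesian dating analysis.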
Using both prior distributions, this results in six highly similar posterior rate estimates for NRR1, NRR2 and NRA3, centred around 0.00055 substitutions per site per year. 30, 2196–2203 (2020). We considered (1) the possibility that BFRs could be combined into larger non-recombinant regions and (2) the possibility of further recombination within each BFR. PLoS ONE 5, e10434 (2010). Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic, https://doi.org/10.1038/s41564-020-0771-4. Conducting analogous analyses of codon usage bias as Ji et al. The variable-loop region in SARS-CoV-2 shows closer identity to the 2019 pangolin coronavirus sequence than to the RaTG13 bat virus, supported by phylogenetic inference (Fig. All four of these breakpoints were also identified with the tree-based recombination detection method GARD35. First, we took an approach that relies on identification of mosaic regions (via 3SEQ14 v.1.7) that are also supported by PI signals19. With horseshoe bats currently the most plausible origin of SARS-CoV-2, it is important to consider that sarbecoviruses circulate in a variety of horseshoe bat species with widely overlapping species ranges57. Despite the SARS-CoV-2 lineage's acquisition of residues in its Spike (S) protein's receptor-binding domain (RBD) permitting the use of human ACE2 (ref. 1, vev003 (2015). The divergence time estimates for SARS-CoV-2 and SARS-CoV from their respective most closely related bat lineages are reasonably consistent among the three approaches we use to eliminate the effects of recombination in the alignment. A hypothesis of snakes as intermediate hosts of SARS-CoV-2 was posited during the early epidemic phase54, but we found no evidence of this55,56; see Extended Data Fig. 87, 6270–6282 (2013).
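For intuition about what a rate of about 0.00055 substitutions per site per year implies, a strict-clock back-of-the-envelope conversion from pairwise distance to divergence time can be sketched as follows. This is an illustrative simplification and not the relaxed-clock Bayesian analysis described in the text; the function name and the example distance are hypothetical.

```python
def divergence_time_years(patristic_distance, rate_per_site_per_year):
    """Time back to the common ancestor under a strict molecular clock.

    Both lineages accumulate substitutions after the split, so the pairwise
    (patristic) distance is divided by twice the rate.
    """
    if rate_per_site_per_year <= 0:
        raise ValueError("rate must be positive")
    if patristic_distance < 0:
        raise ValueError("distance must be non-negative")
    return patristic_distance / (2.0 * rate_per_site_per_year)


RATE = 0.00055            # substitutions per site per year, as quoted in the text
EXAMPLE_DISTANCE = 0.055  # hypothetical pairwise distance, for illustration only

years_to_ancestor = divergence_time_years(EXAMPLE_DISTANCE, RATE)
print(f"about {years_to_ancestor:.0f} years to the common ancestor")
```

The example distance was chosen only to make the arithmetic easy to follow; actual estimates depend on the non-recombinant region analysed, the rate prior and rate variation among branches.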
To evaluate the performance of this procedure, we confirmed that the recombination masking resulted in (1) a markedly different outcome of the PHI test64, (2) removal of well-supported (bootstrap value >95%) incompatible splits in Neighbor-Net65 and (3) a near-complete reduction of mosaic signal as identified by 3SEQ. Posada, D., Crandall, K. A. Lam, H. M., Ratmann, O. with an alignment on which an initial recombination analysis was done. Identifying SARS-CoV-2-related coronaviruses in Malayan pangolins. By mid-January 2020, the virus was spreading widely within Hubei province and by early March SARS-CoV-2 was declared a pandemic8. We call this approach breakpoint-conservative, but note that this has the opposite effect to the construction of NRR1 in that this approach is the most likely to allow breakpoints to remain inside putative non-recombining regions. Viruses 11, 174 (2019). Why Can't We Just Call BA.2 Omicron? - The Atlantic. We say that this approach is conservative because sequences and subregions generating recombination signals have been removed, and BFRs were concatenated only when no PI signals could be detected between them. 6, 83–91 (2015). Posterior means (horizontal bars) of patristic distances between SARS-CoV-2 and its closest bat and pangolin sequences, for the spike protein's variable-loop region and CTD region excluding the variable loop. SARS-CoV-2 genetic lineages in the United States are routinely monitored through epidemiological investigations, virus genetic sequence-based surveillance, and laboratory studies. Figure 1 (top) shows the distribution of all identified breakpoints (using 3SEQ's exhaustive triplet search) by the number of candidate recombinant sequences supporting them. 88, 7070–7082 (2014). Sequence similarity. BEAGLE 3: improved performance, scaling, and usability for a high-performance computing library for statistical phylogenetics.
SARS-CoV-2 Variant Classifications and Definitions. performed recombination and phylogenetic analysis and annotated virus names with geographical and sampling dates. The boxplots show divergence time estimates (posterior medians) for SARS-CoV-2 (red) and the 2002–2003 SARS-CoV virus (blue) from their most closely related bat virus. COVID-19: A Catastrophe or Opportunity for Pangolin Conservation? - Nature. 3 Priors and posteriors for evolutionary rate of SARS-CoV-2. The origins we present in Fig. 92, 433–440 (2020). 4), that region and shorter BFRs were not included in combined putative non-recombinant regions. A second, breakpoint-conservative approach was conservative with respect to breakpoint identification, but this means that it is accepting of false-negative outcomes in breakpoint inference, resulting in less certainty that a putative NRR truly contains no breakpoints. Grey tips correspond to bat viruses, green to pangolin, blue to SARS-CoV and red to SARS-CoV-2. As illustrated by the dashed arrows, these two posteriors motivate our specification of prior distributions with standard deviations inflated 10-fold (light color). Regions A–C were further examined for mosaic signals by 3SEQ, and all showed signs of mosaicism. 11,12,13,22,28), a signal that suggests recombination, the divergence patterns in the S protein do not show evidence of recombination between the lineage leading to SARS-CoV-2 and known sarbecoviruses. 62,63), the GTR+Γ model and 100 bootstrap replicates, was inferred for each BFR >500 nt.
Over relatively shallow timescales, such differences can primarily be explained by varying selective pressure, with mildly deleterious variants being eliminated more strongly by purifying selection over longer timescales44,45,46. A., Filip, I., AlQuraishi, M. & Rabadan, R. Recombination and lineage-specific mutations led to the emergence of SARS-CoV-2. The inset represents divergence time estimates based on NRR1, NRR2 and NRA3. is funded by the MRC (no. Regions B and C span nt 3,625–9,150 and 9,261–11,795, respectively. To employ phylogenetic dating methods, recombinant regions of a 68-genome sarbecovirus alignment were removed with three independent methods. Visual exploration using TempEst39 indicates that there is no evidence for temporal signal in these datasets (Extended Data Fig. We compiled a set of 69 SARS-CoV genomes including 58 sampled from humans and 11 sampled from civets and raccoon dogs. Because these subclades had different phylogenetic relationships in region D (Supplementary Fig. The authors declare no competing interests. These means are based on the mean rates estimated for MERS-CoV and HCoV-OC43, respectively, while the standard deviations are set ten times higher than empirical values to allow greater prior uncertainty and avoid strong bias (Extended Data Fig. The latter was reconstructed using IQTREE66 v.2.0 under a general time-reversible (GTR) model with a discrete gamma distribution to model inter-site rate variation. The ARTIC Network receives funding from the Wellcome Trust through project no. Martin, D. P., Murrell, B., Golden, M., Khoosal, A.
Because the SARS-CoV-2 S protein has been implicated in past recombination events or possibly convergent evolution12, we specifically investigated several subregions of the S protein: the N-terminal domain of S1, the C-terminal domain of S1, the variable-loop region of the C-terminal domain, and S2. In such cases, even moderate rate variation among long, deep phylogenetic branches will substantially impact expected root-to-tip divergences over a sampling time range that represents only a small fraction of the evolutionary history40. covid19_mostefai2021_paper/01_CreateObjects.r at master, HussinLab. All sequence data analysed in this manuscript are available at https://github.com/plemey/SARSCoV2origins. Because the estimated rates and divergence dates were highly similar in the three datasets analysed, we conclude that our estimates are robust to the method of identifying a genome's NRRs. Eight other BFRs <500 nt were identified, and the regions were named BFR A–J in order of length. D.L.R. 5, 536–544 (2020). In our analyses of the sarbecovirus datasets, we incorporated the uncertainty of the sampling dates when exact dates were not available. There are outstanding evolutionary questions on the recent emergence of human coronavirus SARS-CoV-2 including the role of reservoir species, the role of recombination and its time of divergence from animal viruses. Zhang, Y.-Z. volume 5, pages 1408–1417 (2020). Wang, H., Pipes, L. & Nielsen, R. Synonymous mutations and the molecular evolution of SARS-CoV-2 origins. Using the most conservative approach to identification of a non-recombinant genomic region (NRR1), SARS-CoV-2 forms a sister lineage with RaTG13, with genetically related cousin lineages of coronavirus sampled in pangolins in Guangdong and Guangxi provinces (Fig.