with an alignment on which an initial recombination analysis was done. The idea is that pangolins carrying the virus, SARS-CoV-2, came into contact with humans. Extended Data Fig. Trends Microbiol. N. China corresponds to Jilin, Shanxi, Hebei and Henan provinces, and the N. China clade also includes one sequence sampled in Hubei Province in 2004. Our results indicate the presence of a single lineage circulating in bats with properties that allowed it to infect human cells, as previously described for bat sarbecoviruses related to the first SARS-CoV lineage29,30,31. Anderson, K. G. nCoV-2019 codon usage and reservoir (not snakes v2). Virus Evol. 1 Phylogenetic relationships in the C-terminal domain (CTD). A distinct name is needed for the new coronavirus. Nature 579, 270–273 (2020). Phylogenetic trees and exact breakpoints for all ten BFRs are shown in Supplementary Figs. The shaded region corresponds to the S protein. M.F.B. T.T.-Y.L. Evidence of the recombinant origin of a bat severe acute respiratory syndrome (SARS)-like coronavirus and its implications on the direct ancestor of SARS coronavirus. And this genotype pattern led to creating a new Pangolin lineage named B.1.640.2, a phylogenetic sister group to the old B.1.640 lineage renamed B.1.640.1. Nature 558, 180–182 (2018). Mol. Emergence of SARS-CoV-2 through recombination and strong purifying selection. 35, 247–251 (2018). The new paper finds that the genetic sequences of several strains of coronavirus found in pangolins were between 88.5 percent and 92.4 percent similar to those of the novel coronavirus. Because the estimated rates and divergence dates were highly similar in the three datasets analysed, we conclude that our estimates are robust to the method of identifying a genome's NRRs. The canine viral genome was excluded from the Bayesian phylogenetic analyses because temporal signal analyses (see below) indicated that it was an outlier. 
The Pango dynamic nomenclature is a popular system for classifying and naming genetically-distinct lineages of SARS-CoV-2, including variants of concern, and is based on the analysis of complete or near-complete virus genomes. COVID-19: Time to exonerate the pangolin from the transmission of SARS. Based on the identified breakpoints in each genome, only the major non-recombinant region is kept in each genome while other regions are masked. SARS-CoV-2 Variant Classifications and Definitions. Nature 538, 193–200 (2016). 2). Curr. Virus Evol. Nat. In outbreaks of zoonotic pathogens, identification of the infection source is crucial because this may allow health authorities to separate human populations from the wildlife or domestic animal reservoirs posing the zoonotic risk9,10. Nature 503, 535–538 (2013). Scientists trying to trace the ancestry of SARS-CoV-2, the virus responsible for COVID-19, have found the pangolin is unlikely to be the source of the virus responsible for the current pandemic. Host ecology determines the dispersal patterns of a plant virus. Preprint at https://doi.org/10.1101/2020.02.10.942748 (2020). Even before the COVID-19 pandemic, pangolins have been making headlines. BEAGLE 3: improved performance, scaling, and usability for a high-performance computing library for statistical phylogenetics. Due to the absence of temporal signal in the sarbecovirus datasets, we used informative prior distributions on the evolutionary rate to estimate divergence dates. The proximal origin of SARS-CoV-2 | Nature Medicine. Genet. As informative rate priors for the analysis of the sarbecovirus datasets, we used two different normal prior distributions: one with a mean of 0.00078 and s.d. 
In December 2019, a cluster of pneumonia cases epidemiologically linked to an open-air live animal market in the city of Wuhan (Hubei Province), China1,2 led local health officials to issue an epidemiological alert to the Chinese Center for Disease Control and Prevention and the World Health Organization's (WHO) China Country Office. Furthermore, the other key feature thought to be instrumental in the ability of SARS-CoV-2 to infect humans, a polybasic cleavage site insertion in the S protein, has not yet been seen in another close bat relative of the SARS-CoV-2 virus. However, the coronavirus isolated from pangolin is 99% similar in a specific region of the S protein, which corresponds to the 74 amino acids involved in the ACE (Angiotensin Converting Enzyme . Phylogenetic Assignment of Named Global Outbreak LINeages. The pangolin web app is maintained by the Centre for Genomic Pathogen Surveillance. Green boxplots show the TMRCA estimate for the RaTG13/SARS-CoV-2 lineage and its most closely related pangolin lineage (Guangdong 2019). Evol. 5 Comparisons of GC content across taxa. Now, the two researchers used genomic sequencing to compare the genome of the new coronavirus in humans with that in animals and found a 99% match with pangolins. SARS-CoV-2 is an appropriate name for the new coronavirus. Trends Microbiol. Holmes, E. C., Dudas, G., Rambaut, A. A SARS-like cluster of circulating bat coronaviruses shows potential for human emergence. Nguyen, L.-T., Schmidt, H. A., Von Haeseler, A. All authors contributed to analyses and interpretations. Suchard, M. A. et al. Five example sequences with incongruent phylogenetic positions in the two trees are indicated by dashed lines. Several of the recombinant sequences in these trees show that recombination events do occur across geographically divergent clades. To gauge the length of time this lineage has circulated in bats, we estimate the time to the most recent common ancestor (TMRCA) of SARS-CoV-2 and RaTG13. 
Identifying SARS-CoV-2 related coronaviruses in Malayan pangolins. A phylogenetic tree using RAxML v8.2.8 (ref. Concatenated region ABC is NRR1. Sci. Hu, B. et al. Identifying the origins of an emerging pathogen can be critical during the early stages of an outbreak, because it may allow for containment measures to be precisely targeted at a stage when the number of daily new infections is still low. DRAGEN COVID Lineage App. This app aligns reads to a SARS-CoV-2 reference genome and reports coverage of targeted regions. The consistency of the posterior rates for the different prior means also implies that the data do contribute to the evolutionary rate estimate, despite the fact that a temporal signal was visually not apparent (Extended Data Fig. The pangolin coronaviruses show lower similarity to SARS-CoV-2 than bat coronavirus RaTG13 across the whole genome, but higher similarity in the spike receptor binding domain, although the similarity at either scale remains too low to implicate . It compares the new genome against the large, diverse population of sequenced strains using a For the HCoV-OC43, MERS-CoV and SARS datasets we specified flexible skygrid coalescent tree priors. Consistent with this, we estimate a concomitantly decreasing non-synonymous-to-synonymous substitution rate ratio over longer evolutionary timescales: 1.41 (1.20,1.68), 0.35 (0.30,0.41) and 0.133 (0.129,0.136) for SARS, MERS-CoV and HCoV-OC43, respectively. Ji, W., Wang, W., Zhao, X., Zai, J. 874850). BEAST inferences made use of the BEAGLE v.3 library68 for efficient likelihood computations. A., Lytras, S., Singer, J. Boxes show 95% HPD credible intervals. Evol. Extended Data Fig. These residues are also in the Pangolin Guangdong 2019 sequence. Microbiol. To avoid artefacts due to recombination, we focused on NRR1 and NRR2 and the recombination-masked alignment NRA3 to infer time-measured evolutionary histories. 
We infer time-measured evolutionary histories using a Bayesian phylogenetic approach while incorporating rate priors based on mean MERS-CoV and HCoV-OC43 rates and with standard deviations that allow for more uncertainty than the empirical estimates for both viruses (see Methods). Evolutionary origins of the SARS-CoV-2 sarbecovirus lineage responsible for the COVID-19 pandemic. R. Soc. It is available as a command line tool and a web application. 1a-c ), has the third-highest number of confirmed COVID-19 cases in the state of So. Maciej F. Boni, Philippe Lemey, Andrew Rambaut or David L. Robertson. Mol. In the presence of time-dependent rate variation, a widely observed phenomenon for viruses43,44,52, slower prior rates appear more appropriate for sarbecoviruses that currently encompass a sampling time range of about 18 years. The genetic distances between SARS-CoV-2 and RaTG13 (bottom) demonstrate that their relationship is consistent across all regions except for the variable loop. Nat Microbiol 5, 1408–1417 (2020). c, Maximum likelihood phylogenetic trees rooted on a 2007 virus sampled in Kenya (BtKy72; root truncated from images), shown for five BFRs of the sarbecovirus alignment. The Bat, the Pangolin and the City: A Tale of COVID-19. performed codon usage analysis. In the case of the DRAGEN COVID Lineage tool, the minimum accepted alignment score was set to 22 and results with scores <22 were discarded. 190, 2088–2095 (2004). ISSN 2058-5276 (online). All sequence data analysed in this manuscript are available at https://github.com/plemey/SARSCoV2origins. collected SARS-CoV data and assisted in analyses of SARS-CoV and SARS-CoV-2 data. This statement informs us of the possibility that a virus has spilled over from a very rare and shy reptile-looking mammal . However, for several reasons, nucleotide sequences may be generated that cover only the spike gene of SARS-CoV-2. Mol. 
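The alignment-score cutoff mentioned above (minimum accepted score 22, results with scores <22 discarded) can be illustrated with a minimal Python sketch. The record layout, a list of dictionaries with a "score" key, and the function name are hypothetical choices for illustration; they are not the DRAGEN COVID Lineage tool's actual output format or API.

```python
MIN_ALIGNMENT_SCORE = 22  # cutoff stated in the text


def filter_by_score(results, min_score=MIN_ALIGNMENT_SCORE):
    """Keep only results whose alignment score is at least min_score.

    `results` is a list of dicts with a numeric "score" key
    (a hypothetical layout, assumed here for illustration).
    """
    return [r for r in results if r["score"] >= min_score]


if __name__ == "__main__":
    # Made-up example records, not real tool output.
    example = [
        {"sample": "s1", "score": 21},
        {"sample": "s2", "score": 22},
        {"sample": "s3", "score": 40},
    ]
    for record in filter_by_score(example):
        print(record["sample"], record["score"])
```

A score of exactly 22 is kept, matching the statement that only scores below 22 were discarded.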
It allows a user to assign the most likely lineage (Pango lineage) to SARS-CoV-2 query sequences. Ayres, D. L. et al. Since the release of Version 2.0 in July 2020, however, it has used the 'pangoLEARN' machine-learning-based assignment algorithm to assign lineages to new SARS-CoV-2 genomes. The key to successful surveillance is knowing which viruses to look for and prioritizing those that can readily infect humans47. Virus Evol. EPI_ISL_410538, EPI_ISL_410539, EPI_ISL_410540, EPI_ISL_410541 and EPI_ISL_410542) for the use of sequence data via the GISAID platform. GARD identified eight breakpoints that were also within 50 nt of those identified by 3SEQ. 36, 75–97 (2002). SARS-CoV-2 and RaTG13 are the most closely related (their most recent common ancestor nodes denoted by green circles), except in the 222-nt variable-loop region of the C-terminal domain (bar graphs at bottom). performed recombination analysis for non-recombining regions 1 and 2, breakpoint analysis and phylogenetic inference on recombinant segments. If the latter still identified non-negligible recombination signal, we removed additional genomes that were identified as major contributors to the remaining signal. A reduced sequence set of 25 sequences chosen to capture the breadth of diversity in the sarbecoviruses (obvious recombinants not involving the SARS-CoV-2 lineage were also excluded) was used because GARD is computationally intensive. We focused on these three non-recombining regions/alignments for divergence time estimation; this avoids inappropriate modelling of evolutionary processes with recombination on strictly bifurcating trees, which can result in different artefacts such as homoplasies that inflate branch lengths and lead to apparently longer evolutionary divergence times. Ge, X. et al. 
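The comparison of GARD and 3SEQ breakpoints described above (breakpoints from one method lying within 50 nt of those from the other) can be sketched in Python as follows. This is only an illustration of the concordance check, assuming breakpoints are given as integer alignment positions and that "within 50 nt" is inclusive; the function name and the example positions are hypothetical and are not taken from the paper or from either tool.

```python
def matching_breakpoints(gard, threeseq, tolerance=50):
    """Return the GARD breakpoints that lie within `tolerance` nt
    of at least one 3SEQ breakpoint (inclusive bound, an assumption)."""
    return [g for g in gard if any(abs(g - t) <= tolerance for t in threeseq)]


if __name__ == "__main__":
    # Made-up positions for illustration only.
    gard_positions = [1000, 5020, 12000]
    threeseq_positions = [980, 5100, 11960]
    print(matching_breakpoints(gard_positions, threeseq_positions))
```

In this made-up example, positions 1000 and 12000 each have a 3SEQ breakpoint within 50 nt, while 5020 is 80 nt from the nearest one and is not counted as concordant.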
A hypothesis of snakes as intermediate hosts of SARS-CoV-2 was posited during the early epidemic phase54, but we found no evidence of this55,56; see Extended Data Fig. 1, vev003 (2015). Med. Grey tips correspond to bat viruses, green to pangolin, blue to SARS-CoV and red to SARS-CoV-2. On first examination this would suggest that SARS-CoV-2 is a recombinant of an ancestor of Pangolin-2019 and RaTG13, as proposed by others11,22. MC_UU_1201412). This underscores the need for a global network of real-time human disease surveillance systems, such as that which identified the unusual cluster of pneumonia in Wuhan in December 2019, with the capacity to rapidly deploy genomic tools and functional studies for pathogen identification and characterization. Of importance for future spillover events is the appreciation that SARS-CoV-2 has emerged from the same horseshoe bat subgenus that harbours SARS-like coronaviruses. 84, 3134–3146 (2010). Coronavirus origins: genome analysis suggests two viruses may have combined. Lancet 383, 541–548 (2013). 
https://github.com/plemey/SARSCoV2origins, https://doi.org/10.1101/2020.04.20.052019, https://doi.org/10.1101/2020.02.10.942748, https://doi.org/10.1101/2020.05.28.122366, http://virological.org/t/ncov-2019-codon-usage-and-reservoir-not-snakes-v2/339, http://virological.org/t/ncovs-relationship-to-bat-coronaviruses-recombination-signals-no-snakes-no-evidence-the-2019-ncov-lineage-is-recombinant/331. GitHub - cov-lineages/pangolin: Software package for assigning SARS-CoV-2 genome sequences to global lineages. Across a large region of the virus genome, corresponding approximately to ORF1b, it did not cluster with any of the known bat coronaviruses, indicating that recombination probably played a role in the evolutionary history of these viruses5,7. We analyzed the diversity of SARS-CoV-2 viral genomes in India to understand the evolutionary patterns of viruses in the country through their pangolin lineage and GISAID clade. A new coronavirus associated with human respiratory disease in China. 
There are outstanding evolutionary questions on the recent emergence of human coronavirus SARS-CoV-2 including the role of reservoir species, the role of recombination and its time of divergence from animal viruses. Methods Ecol. Zhang, Y.-Z. To begin characterizing any ancestral relationships for SARS-CoV-2, NRRs of the genome must be identified so that reliable phylogenetic reconstruction and dating can be performed. In Extended Data Fig. Biol. The virus then. 87, 6270–6282 (2013). Smuggled pangolins were carrying viruses closely related to the one sweeping the world, say scientists. We aimed to analyze 3 naso-oropharyngeal swab samples collected between August and December 2021 to describe the amino acid changes present in the sequence reads that may have a role in the emergence of new . Wu, F. et al. TMRCA estimates for SARS-CoV-2 and SARS-CoV from their respective most closely related bat lineages are reasonably consistent for the different data sets and different rate priors in our analyses. A pneumonia outbreak associated with a new coronavirus of probable bat origin. Pangolin-CoV is 91.02% and 90.55% identical to SARS-CoV-2 and BatCoV RaTG13, respectively, at the whole-genome level. How COVID-19 Variants Get Their Name - doh.wa.gov. Identification of diverse alphacoronaviruses and genomic characterization of a novel severe acute respiratory syndrome-like coronavirus from bats in China. Coronavirus: Pangolins may have spread the disease to humans. J. Gen. Virol. A second breakpoint-conservative approach was conservative with respect to breakpoint identification, but this means that it is accepting of false-negative outcomes in breakpoint inference, resulting in less certainty that a putative NRR truly contains no breakpoints. Kosakovsky Pond, S. L., Posada, D., Gravenor, M. B., Woelk, C. H. & Frost, S. D. W. Automated phylogenetic detection of recombination using a genetic algorithm. 82, 4807–4811 (2008). 21, 1508–1514 (2015). 91, 1058–1062 (2010). 
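Whole-genome identity figures such as the 91.02% and 90.55% quoted above are percentages of matching positions between aligned sequences. The text does not say how those values were computed, so the following Python sketch is only one common convention, assumed here for illustration: the two sequences are already aligned to equal length, and columns containing a gap in either sequence are skipped. The function name and toy sequences are hypothetical.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of identical bases over aligned, non-gap columns.

    Assumes seq_a and seq_b come from the same alignment (equal length).
    Columns with a gap in either sequence are ignored, which is one
    convention among several.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable (non-gap) columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Toy aligned sequences, not real viral data.
    print(percent_identity("ACGTACGT", "ACGTACGA"))
```

Other conventions, for example counting gaps as mismatches, would give slightly different percentages, so reported identities are only comparable when computed the same way.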
A new SARS-CoV-2 variant (B.1.1.523) capable of escaping immune protections. Zhou, P. et al. G066215N, G0D5117N and G0B9317N)) and by the European Union's Horizon 2020 project MOOD (no. Boni, M. F., de Jong, M. D., van Doorn, H. R. & Holmes, E. C. Guidelines for identifying homologous recombination events in influenza A virus. Martin, D. P., Murrell, B., Golden, M., Khoosal, A. matics program called Pangolin was developed. To employ phylogenetic dating methods, recombinant regions of a 68-genome sarbecovirus alignment were removed with three independent methods. Early transmission dynamics in Wuhan, China, of novel coronavirus-infected pneumonia.