CoVsurver: Frequently Asked Questions

  1. What is the CoVsurver?
  2. What can it do?
  3. Best usage scenarios and common misconceptions.
  4. Special notes for using results in publications.
  5. What kind of information is being curated in the CoVsurver project?
  6. Will I be able to add information on the effects of an amino acid (aa) change not yet reflected by CoVsurver?
  7. Which reference sequence are you using?
  8. What do the colors of the aa changes mean?
  9. What do the superscript characters $#rhalo on the aa changes mean?
  10. How are the global aa change data obtained?
  11. I am uncertain of the information available for my aa change of interest. How can I find out more about the aa change?
  12. I think my aa change of interest is causing some effects. However, there is very limited meaningful information from the literature. What else can be done?
  13. Do you have a tutorial on how to use the CoVsurver?
  14. How can I cite the CoVsurver?
  15. Who is behind the CoVsurver?
  16. Further Acknowledgements.


  1. What is the CoVsurver?

    The CoVsurver is a research tool developed to aid the research community with the identification, analysis and interpretation of aa changes in coronavirus genomes.


  2. What can it do?

    The CoVsurver permits researchers, clinician scientists and surveillance labs to rapidly screen coronavirus genomes for potentially noteworthy aa changes, in order to identify candidates for phenotypic changes or special epidemiological relevance. For the latter, the geographic and temporal frequency of occurrence is provided. For phenotypic changes, we created an in-house database of curated literature annotations for effects of aa changes such as host receptor binding, virulence, antigenic change and antibody escape, as well as drug resistance. The CoVsurver also shows the position of the aa change(s) in structural models and highlights aa changes that are close to common drug, host receptor or antibody binding sites.

    Important: Please read the next two sections (3 and 4) to avoid misinterpreting analysis results.


  3. Best usage scenarios and common misconceptions.

    The curated reference sequences used for annotation of equivalent aa changes consist mainly of strains that recently infected humans. Therefore, the usage scenario that will give the most fruitful and reliable results is the analysis of current surveillance sequences. While more sequences of coronaviruses detected in animal hosts may be added in the future, the current clear bias is towards strains that are known to infect humans.

    The CoVsurver does not run a BLAST search against all available coronavirus strains in the first step, but only against the limited set of selected reference strains. This limitation is necessary because each reference strain is annotated with human quality control steps that check the alignments with each other (to allow identification of equivalent positions), structural models, sites of small ligand or antibody binding, aa change occurrence statistics (including geo-mapping), etc.

    The CoVsurver is not suited to detecting recombination at this moment. Since query sequences are compared with only the small set of annotated reference sequences, hits to different reference strains should in most cases not be interpreted as evidence of recombination.


  4. Special notes for using results in publications.

    The CoVsurver research tool is mainly intended to highlight phenotypically or epidemiologically interesting candidate aa changes for further research, and its results should ideally be combined with experimental testing and verification of any predicted phenotypes. Importantly, any direct diagnostic use, assumed severity or recommendation on patient treatment should not be based solely on these computational predictions. The annotated effects of aa changes are based on knowledge transfer by similarity to aa changes studied in specific sequence contexts, which in most cases will not be identical to that of the user input sequence. A simple rule therefore applies: the closer your sequence is to the one for which the phenotype was reported, the more likely a similar effect can be expected for your aa change.

    Any potential phenotypic change highlighted by CoVsurver that is included in a publication must be substantiated by careful analysis of the evidence behind the assumed effect. This means reading and understanding the associated literature (links provided in the aa change summary report) and considering any accompanying experimental, clinical and/or epidemiological data.

    Given that the CoVsurver results are purely computationally derived and thus require careful expert judgement, unfiltered results are not suitable for public communication or any kind of publication without proper peer review by the research community.

    If you are in doubt how to interpret or communicate the CoVsurver results, please feel free to contact us for advice.


  5. What kind of information is being curated in the CoVsurver project?

    Although the user only sees the aggregated, cross-linked results in the CoVsurver output, under the hood we essentially use and curate four different databases. The first is a selection of reference sequences, consisting mainly of strains that are of particular interest for research and/or cause human infections. This database includes a curated MAFFT L-INS-i alignment of the reference strains as well as a residue position mapping that links the equivalent aa change positions among strains. Importantly, this also includes a disambiguation of the different numbering schemes in use.
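
    As a rough illustration of what a residue position mapping derived from an alignment can look like, the sketch below walks two aligned sequences and looks up which residue in one strain sits in the same alignment column as a given residue in the other. The function names and the toy sequences are hypothetical, and this is not the CoVsurver implementation, only a minimal example of the general idea.

```python
def column_to_residue_map(aligned_seq, gap="-"):
    """Map each alignment column (0-based) to a 1-based residue number, or None at gaps."""
    mapping = []
    residue_number = 0
    for symbol in aligned_seq:
        if symbol == gap:
            mapping.append(None)
        else:
            residue_number += 1
            mapping.append(residue_number)
    return mapping


def equivalent_position(aligned_query, aligned_target, query_residue):
    """Return the target residue number aligned with a query residue, or None at a gap."""
    query_map = column_to_residue_map(aligned_query)
    target_map = column_to_residue_map(aligned_target)
    column = query_map.index(query_residue)  # ValueError if the residue does not exist
    return target_map[column]


# Toy aligned sequences (hypothetical, not real coronavirus data).
strain_a = "MK-TLV"
strain_b = "MKQT-V"
print(equivalent_position(strain_a, strain_b, 3))  # residue 3 of strain_a vs. strain_b
print(equivalent_position(strain_a, strain_b, 4))  # falls on a gap in strain_b
```

    A lookup of this kind is also where different numbering schemes can be reconciled, since each strain keeps its own residue numbers while sharing alignment columns.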

    The second database stores information on aa changes that are known to affect drug resistance, alter virulence, cause antigenic change or host specificity shifts as curated by our group from the literature. This includes over 160 aa changes with information extracted from publications. Accompanying information such as the host, protein, strain and PubMed references for the aa change effect are also provided.

    The third database is derived through a separate pipeline that annotates structurally relevant positions of aa changes. It processes coronavirus crystal structures from the PDB and identifies positions that lie within the host cell receptor binding interface or close to bound small molecules, such as drugs for polymerase and protease drug targets.

    Finally, the fourth database stores all aa change occurrence information. It is based on viral sequences in GISAID's EpiCoV database and is updated every 24 hours. These sequences are aligned and compared with the reference sequence to count individual aa change occurrences. Since coronavirus sequences most often include the date of collection and geographical location, we provide this information in associated tables as well as a global occurrence map.
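
    The comparison step described above can be pictured with a short sketch that lists substitutions between an aligned reference and an aligned query, using reference numbering. The function name, the handling of gaps and of undetermined residues (X), and the toy sequences are all assumptions made for illustration; the actual pipeline is not described in this FAQ.

```python
def call_aa_changes(aligned_reference, aligned_query, gap="-"):
    """List substitutions such as 'D3G', numbered by reference residue position."""
    if len(aligned_reference) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    changes = []
    ref_position = 0
    for ref_aa, query_aa in zip(aligned_reference, aligned_query):
        if ref_aa == gap:
            continue  # insertion relative to the reference; ignored in this sketch
        ref_position += 1
        if query_aa in (gap, "X"):
            continue  # deletion or undetermined residue; ignored in this sketch
        if query_aa != ref_aa:
            changes.append(f"{ref_aa}{ref_position}{query_aa}")
    return changes


# Toy protein fragments (hypothetical, not real coronavirus data).
print(call_aa_changes("MKDLV", "MKGLI"))
```

    Insertions and deletions are skipped here only to keep the example short; a real pipeline would need to report them as well.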


  6. Will I be able to add information on the effects of an aa change not yet reflected by CoVsurver?

    Since manual inspection of the flood of new papers is a tedious and difficult task, we welcome suggested new aa change effect reports. You may contact us [here] for possible inclusion into the CoVsurver.


  7. Which reference sequence are you using?

    The first genomes shared from the outbreak in late December 2019 show a clear consensus comprising 7 identical genomes. The central reference, which is also used here, is hCoV-19/Wuhan/WIV04/2019|EPI_ISL_402124.


  8. What do the colors of the aa changes mean?

    The aa changes are color-coded according to their known or predicted biological effect and epidemiological significance. When there are no known effects for the aa change and it occurred only once in the current set of sequences, the aa change appears in black font. AA changes occurring more than 100 times are more interesting epidemiologically and appear in blue font. If the aa change occurs at a site known to be involved in phenotypic effects, such as altering host-cell receptor binding or antigenicity, but the amino acid types involved differ from those reported, it appears in orange. AA changes that create or remove a potential glycosylation site are colored magenta, and aa changes that lead to an insertion or deletion of amino acid residues are colored cyan. Only if the previously reported phenotype change involves the same amino acid types as seen in the query do we color the aa change green (predicted neutral effect) or red (predicted enhancing effect). It is crucial to read the associated literature (link provided) before making a final judgement on the expected effect of any aa change.
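
    The color rules above can be summarized as a small decision function. The sketch below is only one possible reading: the field names are hypothetical, and the order in which the rules are checked is an assumption, since this FAQ does not say which color wins when several rules apply, nor which color is used for aa changes seen between 2 and 100 times.

```python
from dataclasses import dataclass


@dataclass
class AAChange:
    """Hypothetical summary of one aa change; field names are illustrative only."""
    occurrences: int
    is_indel: bool = False
    alters_glycosylation: bool = False
    site_has_known_effect: bool = False
    same_aa_as_reported: bool = False
    reported_effect: str = ""  # "neutral" or "enhancing" when same_aa_as_reported


def color_for(change):
    """Return a color name, or None where the FAQ does not specify one."""
    # Assumed priority: literature-backed effects first, then sequence features,
    # then occurrence counts.
    if change.site_has_known_effect and change.same_aa_as_reported:
        if change.reported_effect == "enhancing":
            return "red"
        if change.reported_effect == "neutral":
            return "green"
    if change.site_has_known_effect and not change.same_aa_as_reported:
        return "orange"
    if change.is_indel:
        return "cyan"
    if change.alters_glycosylation:
        return "magenta"
    if change.occurrences > 100:
        return "blue"
    if change.occurrences == 1:
        return "black"
    return None  # 2 to 100 occurrences without known effects: not stated in the FAQ


print(color_for(AAChange(occurrences=500)))
print(color_for(AAChange(occurrences=3, site_has_known_effect=True)))
```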


  9. What do the superscript characters $#rhalo on the aa changes mean?

    If there is associated literature in our manually curated database for the corresponding position of the aa change, there will be a superscript $ on the aa change. On a monthly basis, we search the PDB database for structures that have a sequence similarity of at least 70% to the reference hCoV-19/Wuhan/WIV04/2019|EPI_ISL_402124 coronavirus. If the corresponding structural position of the aa change is found to be within 6 Angstrom of a host cell surface receptor binding, host cell protein/RNA interaction, antibody, ligand or viral oligomerization interface, it will be denoted by a superscript # followed by the corresponding character r, h, a, l or o, respectively.
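
    To make the marker scheme concrete, the following sketch builds the superscript string for one aa change from a literature flag and a set of 3D coordinates. The interface names, the atom-to-atom distance measure and the use of "less than or equal to" for the 6 Angstrom cutoff are assumptions for illustration; the FAQ only states the cutoff and the character codes.

```python
import math

# Character codes in the order given in the text: r, h, a, l, o.
INTERFACE_CODES = {
    "receptor": "r",          # host cell surface receptor binding
    "host_protein_rna": "h",  # host cell protein/RNA interaction
    "antibody": "a",
    "ligand": "l",
    "oligomerization": "o",   # viral oligomerization
}
CUTOFF_ANGSTROM = 6.0


def min_distance(residue_atoms, partner_atoms):
    """Smallest distance between any residue atom and any partner atom."""
    return min(math.dist(a, b) for a in residue_atoms for b in partner_atoms)


def superscript(has_literature, residue_atoms, partners):
    """Build a marker such as '$#rl' for one aa change.

    partners maps an interface name to a non-empty list of (x, y, z) coordinates.
    """
    marks = "$" if has_literature else ""
    codes = "".join(
        code
        for name, code in INTERFACE_CODES.items()
        if name in partners
        and min_distance(residue_atoms, partners[name]) <= CUTOFF_ANGSTROM
    )
    if codes:
        marks += "#" + codes
    return marks


# Toy coordinates in Angstrom (hypothetical, not taken from a real structure).
residue = [(0.0, 0.0, 0.0)]
partners = {
    "receptor": [(3.0, 4.0, 0.0)],   # 5.0 away: within the cutoff
    "antibody": [(10.0, 0.0, 0.0)],  # 10.0 away: outside the cutoff
    "ligand": [(0.0, 0.0, 5.5)],     # 5.5 away: within the cutoff
}
print(superscript(True, residue, partners))
```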


  10. How are the global aa change data obtained?

    Every 24 hours, viral sequences from GISAID's EpiCoV database are aligned and compared with the reference sequence. Using associated information such as date of collection and geographical location, the CoVsurver is capable of generating global occurrence statistics of the relevant aa changes.
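
    Once aa changes have been called per sequence, occurrence statistics reduce to counting by location and time. The sketch below assumes each record is a tuple of aa change, country and collection date in YYYY-MM-DD form; the record layout, the function name and the example values are hypothetical and not taken from the CoVsurver or from GISAID data.

```python
from collections import Counter


def occurrence_tables(records):
    """Count aa change occurrences per country and per collection month."""
    by_country = Counter()
    by_month = Counter()
    for change, country, collection_date in records:
        by_country[(change, country)] += 1
        # Keep only YYYY-MM; an incomplete date such as "2020" is kept as given.
        by_month[(change, collection_date[:7])] += 1
    return by_country, by_month


# Made-up records for illustration only.
records = [
    ("D3G", "Singapore", "2020-03-14"),
    ("D3G", "Singapore", "2020-04-02"),
    ("D3G", "Germany", "2020-03-20"),
    ("V5I", "Germany", "2020-03-21"),
]
by_country, by_month = occurrence_tables(records)
print(by_country[("D3G", "Singapore")])
print(by_month[("D3G", "2020-03")])
```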


  11. I am uncertain of the information available for my aa change of interest. How can I find out more about the aa change?

    An aa change summary can be accessed from the first output page by clicking on the respective aa change of interest. Further hyperlinks are provided within each report for additional details behind each annotation statement, including literature links where available.


  12. I think my aa change of interest is causing some effects. However, there is very limited meaningful information from the literature. What else can be done?

    Contact the CoVsurver Team about your concern. Where there is mutual interest in collaboration, the hosting research institute also offers more manual computational follow-up analyses to examine aa changes, such as molecular dynamics simulations and other structure calculations (stability, drug binding, host receptor binding, glycosylation modelling) and a variety of bioinformatics approaches (whole genome phylogenetic analysis, monophyletic clade analysis, etc.).


  13. Do you have a tutorial on how to use the CoVsurver?

    Not yet. This resource continues to be developed.


  14. How can I cite the CoVsurver in my references?

    The manuscript for the CoVsurver is currently in preparation. For now, please cite Khare et al. 2021. China CDC Wkly. 3(49): 1049−1051.


  15. Who is behind the CoVsurver?

    The CoVsurver was conceived by Sebastian Maurer-Stroh and developed with his group at the A*STAR Bioinformatics Institute (BII) in Singapore since 2020. Many colleagues contributed critically to its development and maintenance, including:

    Raphael Tze Chuen Lee, Shi Shu Yuan, Ashar Malik, Frank Eisenhaber, GISAID Database Technical Group and Scientific Advisory Council, Sebastian Maurer-Stroh


  16. Further acknowledgements

    The idea for CoVsurver follows in the footsteps of the widely used FluSurver. It arose from the need to make sense of the rapidly increasing number of coronavirus sequences resulting from the COVID-19 pandemic and from more generally available and affordable sequencing technologies. We are very grateful to the GISAID Initiative, to the submitters of genomic sequences and metadata to its EpiCoV database, and to our collaborators who provided sequences for analysis and helped shape CoVsurver into a tool useful for the whole scientific community.
