- What is the CoVsurver?
The CoVsurver is a research tool developed to help the research community identify, analyse and interpret amino acid (aa) changes in coronavirus genomes.
- What can it do?
The CoVsurver lets researchers, clinician scientists and surveillance labs rapidly screen coronavirus genomes for potentially noteworthy aa changes, identifying candidates for phenotypic changes or special epidemiological relevance. For the latter, the geographic and temporal frequency of occurrence is provided. For phenotypic changes, we created an in-house database of curated literature annotations on the effects of aa changes, such as host receptor binding, virulence, antigenic change and antibody escape, as well as drug resistance. The CoVsurver also shows the position of the aa change(s) in structural models and highlights whether aa changes are close to common drug, host receptor or antibody binding sites.
Important: Please read the next two sections to avoid potential misinterpretation of analysis results.
- Best usage scenarios and common
misconceptions.
The curated reference sequences used for annotating equivalent aa changes consist mainly of strains that recently infected humans. The usage scenario that gives the most fruitful and reliable results is therefore the analysis of current surveillance sequences. While more sequences of coronaviruses detected in animal hosts may be added in the future, there is currently a clear bias towards strains that are known to infect humans.
In its first step, the CoVsurver does not run a BLAST search against all available coronavirus strains but only against the limited set of selected reference strains. This restriction is necessary because each reference strain is annotated in depth, including manual quality control steps to check its alignment with the other references (to allow identification of equivalent positions), structural models, sites of small ligand or antibody binding, aa change occurrence statistics (including geo-mapping), etc.
The CoVsurver is currently not suited to detecting recombination. Since query sequences are compared only with the small set of annotated reference sequences, hits to different reference strains should in most cases not be interpreted as recombinants.
- Special notes for using results
in publications.
The main purpose of the CoVsurver research tool is to highlight phenotypically or epidemiologically interesting candidate aa changes for further research. Its results should ideally be combined with experimental testing and verification of any predicted phenotypes. Importantly, any direct diagnostic use, assumed severity or recommendation on patient treatment should not be based solely on these computational predictions. The CoVsurver annotations of aa change effects are based on knowledge transfer by similarity to aa changes studied in specific sequence contexts, which in most cases will not be identical to that of the user's input sequences. The simple rule is that the closer your sequence is to the one for which the phenotype was reported, the more likely it is that a similar effect can be expected for your aa change.
Any potential phenotypic change highlighted by CoVsurver that is included in a publication must be substantiated by careful analysis and consideration of the evidence leading to the assumed effect. This means reading and understanding the associated literature (links provided in the aa change summary report) as well as taking into account any accompanying experimental, clinical and/or epidemiological data.
Given that the CoVsurver results are purely
computationally derived and thus require careful expert
judgement, unfiltered results are not suitable for
public communication or any kind
of publication without proper peer review by the
research community.
If you are in doubt about how to interpret or communicate the CoVsurver results, please feel free to contact us for advice.
- What kind of information is
being curated in the CoVsurver project?
Although the user only sees the aggregated, cross-linked results in the CoVsurver output, under the hood we essentially use and curate four different databases. The first is a selection of reference sequences consisting mainly of strains of particular interest for research and/or causing human infections. This database includes a curated MAFFT L-INS-i alignment of the reference strains as well as a residue position mapping that links the equivalent aa change positions among strains. Importantly, this also includes a disambiguation of the different numbering schemes in use.
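To illustrate the idea of a residue position mapping, here is a minimal sketch of how equivalent positions could be read off a pairwise slice of a gapped alignment. It is not the CoVsurver implementation: the function names `column_to_residue` and `equivalent_position` are hypothetical, and the two toy sequences are invented for the example rather than taken from any reference strain.

```python
def column_to_residue(aligned_seq, gap="-"):
    """Map each alignment column (0-based) to a 1-based residue number.

    Gap columns map to None because no residue of this sequence sits there.
    """
    mapping = []
    position = 0
    for character in aligned_seq:
        if character == gap:
            mapping.append(None)
        else:
            position += 1
            mapping.append(position)
    return mapping


def equivalent_position(aligned_a, aligned_b, residue_in_a):
    """Return the residue number in b aligned to residue_in_a of a.

    Returns None if that residue faces a gap in b or does not exist in a.
    """
    map_a = column_to_residue(aligned_a)
    map_b = column_to_residue(aligned_b)
    for column, residue in enumerate(map_a):
        if residue == residue_in_a:
            return map_b[column]
    return None


# Toy aligned sequences (hypothetical, not real coronavirus data).
reference = "MK-TLLV"
other = "MKATL-V"

print(equivalent_position(reference, other, 3))  # residue 3 of reference -> residue 4 of other
print(equivalent_position(reference, other, 5))  # faces a gap in other -> None
```

Working on alignment columns rather than raw sequence indices is what allows the same aa change to be linked across strains whose numbering differs because of insertions or deletions.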
The second database stores information on aa changes
that are known to affect drug resistance, alter
virulence, cause antigenic change or host specificity
shifts as curated by our group from the literature.
This includes over 160 aa changes with information
extracted from publications.
Accompanying information such as the host,
protein, strain and PubMed references for the aa change
effect are also provided.
The third database is derived through another pipeline that annotates structurally relevant positions of aa changes. It processes coronavirus crystal structures in the PDB and identifies positions that lie within the host cell receptor binding interface or close to bound small molecules, such as drugs for polymerase and protease drug targets.
Finally, the fourth database stores all aa change
occurrence information. It is based on
viral sequences in GISAID's EpiCoV database and updated every 24 hours.
These sequences are aligned and compared with the reference
sequence to count individual aa change occurrences.
Since coronavirus sequences most often include
date of collection and geographical location we
provide this information in associated tables as well
as a global occurrence map.
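The counting step described above can be sketched roughly as follows. This is an illustrative simplification, not the actual pipeline: it assumes sequences are already aligned to the reference protein, it only counts substitutions (insertions, deletions and unknown residues marked `X` are skipped), and the names `aa_changes` and `count_occurrences` as well as the toy records are hypothetical.

```python
from collections import Counter


def aa_changes(ref_aligned, query_aligned, gap="-"):
    """List substitutions such as 'D4G' between two aligned protein sequences.

    Positions are numbered along the reference (1-based, gaps not counted).
    """
    changes = []
    ref_position = 0
    for ref_aa, query_aa in zip(ref_aligned, query_aligned):
        if ref_aa != gap:
            ref_position += 1
        # Assumption of this sketch: indels and unknown residues are ignored.
        if ref_aa == gap or query_aa == gap or query_aa == "X":
            continue
        if ref_aa != query_aa:
            changes.append(f"{ref_aa}{ref_position}{query_aa}")
    return changes


def count_occurrences(ref_aligned, records):
    """Count aa changes per (change, country, collection month).

    records: iterable of (aligned_sequence, country, 'YYYY-MM-DD' date).
    """
    counts = Counter()
    for sequence, country, collection_date in records:
        month = collection_date[:7]
        for change in aa_changes(ref_aligned, sequence):
            counts[(change, country, month)] += 1
    return counts


# Toy data (hypothetical, not real GISAID records).
reference = "MFVDL"
records = [
    ("MFVGL", "Singapore", "2020-03-01"),
    ("MFVGL", "Singapore", "2020-03-05"),
    ("MYVDL", "Germany", "2020-04-01"),
]

for key, count in sorted(count_occurrences(reference, records).items()):
    print(key, count)
```

Keying the counts on country and month is one simple way to obtain both the tabular view and the input for an occurrence map from a single pass over the sequences.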
- Will I be able to add information on the effects of an aa change not yet reflected by CoVsurver?
Since manual inspection of the flood of new papers is a tedious and difficult task, we welcome suggestions of new aa change effect reports. You may contact us [here] for possible inclusion in the CoVsurver.
- Which reference sequence are you using?
There is a clear consensus among the first shared genomes from the outbreak in late December 2019, comprising 7 identical genomes. The central reference, also used here, is hCoV-19/Wuhan/WIV04/2019|EPI_ISL_402124.
- What do the colors of the
aa changes mean?
The aa changes are color-coded according to their known or predicted biological effect and epidemiological significance. When there are no known effects for the aa change and it occurred only once in the current set of sequences, the aa change appears in black font. AA changes occurring more than 100 times are more interesting epidemiologically and appear in blue font. If the aa change occurs at a site known to be involved in phenotypic effects, such as altering host-cell receptor binding or antigenicity, it appears in orange when the site is shared but the types of amino acids involved are not. AA changes that create or remove a potential glycosylation site are colored magenta, and aa changes that lead to an insertion or deletion of amino acid residues are colored cyan. Only if the previously reported phenotype change involves the same amino acid types as seen in the query do we color the aa change green for a predicted neutral effect or red for a predicted enhancing effect. It is crucial to read the associated literature (link provided) before making a final judgement on the expected effect of any aa change.
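The color rules can be written down as a small decision function. The sketch below is only one possible reading: the text does not state which color wins when several rules apply, nor what happens for aa changes without known effects that occur between 2 and 100 times, so the precedence order used here is an assumption and the unspecified case returns `None`. The `AAChange` class, its field names and `color_for` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AAChange:
    occurrences: int
    site_has_known_effect: bool = False
    same_amino_acids_as_report: bool = False
    reported_effect: Optional[str] = None  # "neutral" or "enhancing"
    alters_glycosylation_site: bool = False
    is_indel: bool = False


def color_for(change):
    """Return a color name following the FAQ rules.

    Assumed precedence (not stated in the text): exact literature match,
    then shared site, glycosylation, indel, and finally occurrence counts.
    """
    if change.site_has_known_effect and change.same_amino_acids_as_report:
        if change.reported_effect == "enhancing":
            return "red"
        if change.reported_effect == "neutral":
            return "green"
    if change.site_has_known_effect:
        # Site is shared with a literature report; treated as orange here.
        return "orange"
    if change.alters_glycosylation_site:
        return "magenta"
    if change.is_indel:
        return "cyan"
    if change.occurrences > 100:
        return "blue"
    if change.occurrences == 1:
        return "black"
    # No known effect and 2 to 100 occurrences: not specified in the text.
    return None


print(color_for(AAChange(occurrences=1)))
print(color_for(AAChange(occurrences=500)))
print(color_for(AAChange(occurrences=50, site_has_known_effect=True)))
```

Whatever the actual precedence, the color is only a pointer to the evidence, so the linked literature remains the basis for interpreting an aa change.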
- What do the superscript characters $#rhalo on the aa changes mean?
If there is associated literature in our manually curated database for the corresponding position of the aa change, there will be a superscript $ on the aa change. On a monthly basis, we search the PDB database for structures that have a sequence similarity of at least 70% to the reference coronavirus hCoV-19/Wuhan/WIV04/2019|EPI_ISL_402124. If the corresponding structural position of the aa change is found to be within 6 Angstrom of a host cell surface receptor binding, host cell protein/RNA interaction, antibody, ligand or viral oligomerization interface, it will be denoted by a superscript # followed by the corresponding character r, h, a, l or o, respectively.
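The 6 Angstrom proximity check and the resulting superscript can be sketched as below. How the distance is measured in the actual pipeline is not described, so taking the minimum atom-to-atom distance is an assumption, as are the names `INTERFACE_CODES`, `min_distance` and `interface_superscript` and the toy coordinates.

```python
import math

# Interface types in the order given in the text, with their characters.
INTERFACE_CODES = {
    "receptor": "r",          # host cell surface receptor binding
    "host_interaction": "h",  # host cell protein/RNA interaction
    "antibody": "a",
    "ligand": "l",
    "oligomerization": "o",   # viral oligomerization interface
}
CUTOFF_ANGSTROM = 6.0


def min_distance(atoms_a, atoms_b):
    """Smallest Euclidean distance between two lists of (x, y, z) atoms."""
    return min(math.dist(a, b) for a in atoms_a for b in atoms_b)


def interface_superscript(residue_atoms, partners):
    """Build the '#...' superscript for one residue.

    partners maps an interface type to the atom coordinates of the
    binding partner; types without atoms are ignored.
    """
    codes = ""
    for kind, code in INTERFACE_CODES.items():
        partner_atoms = partners.get(kind)
        if partner_atoms and min_distance(residue_atoms, partner_atoms) <= CUTOFF_ANGSTROM:
            codes += code
    return "#" + codes if codes else ""


# Toy coordinates in Angstrom (hypothetical, not from a PDB entry).
residue = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
partners = {
    "receptor": [(4.0, 0.0, 0.0)],   # 3 A from the nearest residue atom
    "antibody": [(10.0, 0.0, 0.0)],  # 9 A away, outside the cutoff
    "ligand": [(0.0, 5.5, 0.0)],     # 5.5 A away
}

print(interface_superscript(residue, partners))  # "#rl"
```

A residue close to several interfaces collects several characters, which is why more than one of r, h, a, l and o can follow the # on a single aa change.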
- How are the global aa change
data obtained?
Every 24 hours, viral
sequences from GISAID's EpiCoV
database are aligned and
compared with the reference sequence. Using
associated information such as date of collection and
geographical location, the CoVsurver is capable of
generating global occurrence statistics of the
relevant aa changes.
- I am uncertain about the information available for my aa change of interest. How can I find out more about the aa change?
An
aa change summary can be accessed from the first output
page by clicking on the respective aa change of
interest. Further hyperlinks are provided within each
report for additional details behind each annotation
statement, including literature links where available.
- I think my aa change of
interest is causing some effects. However, there is
very limited meaningful information from the
literature. What else can be done?
Contact the CoVsurver Team about your concern. Where there is mutual interest in collaboration, the hosting research institute also offers more manual computational follow-up analyses to examine aa changes, such as molecular dynamics simulations and other structure calculations (stability, drug binding, host receptor binding, glycosylation modelling) and a variety of bioinformatics approaches (whole genome phylogenetic analysis, monophyletic clade analysis, etc.).
- Do you have a tutorial on how
to use the CoVsurver?
Not yet.
This resource continues to be developed.
- How can I cite the CoVsurver in my references?
The manuscript for the CoVsurver is currently in preparation. For now, please cite Khare et al. 2021. China CDC Wkly. 3(49): 1049–1051.
- Who is behind the CoVsurver?
The
CoVsurver was conceived by Sebastian
Maurer-Stroh and developed with his group at
the A*STAR Bioinformatics Institute (BII) in Singapore
since 2020. Many colleagues
contributed critically to its development and
maintenance, including:
Raphael Tze Chuen Lee, Shi Shu Yuan, Ashar Malik, Frank Eisenhaber, GISAID Database Technical Group and Scientific Advisory Council, Sebastian Maurer-Stroh
- Further acknowledgements
The idea for CoVsurver follows in the footsteps of the widely used FluSurver and arose from the need to make sense of the rapidly increasing number of coronavirus sequences resulting from the COVID-19 pandemic as well as from more generally available and affordable sequencing technologies. We are very grateful to the GISAID Initiative, to the submitters of genomic sequences and metadata to its EpiCoV database, and to our collaborators who provided sequences for analysis and helped shape CoVsurver into a tool useful for the whole scientific community.