
Details in the evaluation of circular RNA detection tools: Reply to Chen and Chuang

Chia-Ying Chen and Trees-Juen Chuang (referred to as CYC & TJC below) recently submitted a comment [1] on our previous paper [2]. In their comment, they scrutinized the CircBase [3] candidates that we used and pointed out several weak points of our paper. In summary, they suggested that the positive dataset we derived from CircBase required further evaluation and that using all of these candidates as our dataset was not appropriate. They further suggested that three main confounding factors may have affected our assessment of circRNA detection tools and that the performance of the tools should be re-evaluated.

Before discussing their comment, we briefly introduce the positive dataset we used. First, as stated in our previous paper, the 14,689 candidates detected in HeLa cells were downloaded from CircBase and were reported by the study of Salzman et al. [4]. These candidates were not identified with the find_circ [5] tool. As described by Salzman et al. [4], all UCSC-annotated exons in scrambled order were used to construct a custom database and identify circRNA candidates. Second, in our positive dataset, reads were simulated for each candidate at a constant coverage of 10× for the intervening sequence, with a minimum of two read pairs (paired-end simulated reads) crossing the back-spliced junction site.

Now, we will discuss the three confounding factors they listed in their paper.

First, they suggested removing 1046 candidates with unannotated exon boundaries from the positive dataset, especially candidates whose junctions lack canonical splice signals such as GT-AG, GC-AG, or AT-AC. As mentioned above, the CircBase-deposited circRNA candidates that we used were identified by Salzman et al. [4], and the candidates identified by their method should all match exon boundaries. The discrepancies may be caused by the use of inconsistent gene annotation files: Salzman et al. [4] used UCSC known genes [6], whereas CYC & TJC used NCBI RefSeq-identified mRNA annotation files. We manually checked several candidates marked with “junctions with unannotated exon boundaries” in CYC & TJC’s Supplemental Dataset S1. The junction sites of these candidates were annotated as exon boundaries in the UCSC known genes annotation file (http://hgdownload.soe.ucsc.edu/goldenPath/hg19/database/knownGene.txt.gz). Thus, detection of circRNAs with annotated exon boundaries relies on the gene annotation files used, and novel candidates may be missed because of the incompleteness of current databases [7]. For example, Szabo et al. [7] reinforced an annotation-based algorithm with a de novo module and discovered a validated circRNA from the not-fully-annotated RMST gene and several U12 circRNAs produced from unannotated boundaries. A similar case was demonstrated by Zhang et al. [8], who detected thousands of novel exons (non-RefSeq, non-Ensembl, or non-UCSC known genes) in circRNAs by using the updated CIRCexplorer2 tool; several of them were confirmed by Northern blot analysis and Sanger sequencing after RT-PCR [8]. Other examples were shown by Salzman et al. [4], who found that several noncoding RNA genes expressed circular isoforms in mouse and human. Gao et al. also provided evidence of intronic or intergenic circRNAs [9]. Moreover, the well-known CDR1as [5, 10] is an intergenic circRNA by definition. In studying the mechanism of circularization, Starke et al. observed that both canonical splice sites are essential; however, they could not rule out the potential use of cryptic sites for circularization [11]. Their experimental data showed that when the normal 5′ or 3′ splice site was mutated, circRNAs could still be formed with the use of cryptic, noncanonical 5′ and 3′ splice sites [11]. Given the above-mentioned evidence, whether candidates with unannotated exon boundaries or without canonical splice sites should be excluded remains open to discussion.
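To make the splice-signal criterion discussed above concrete, the following is a minimal sketch of how the dinucleotides flanking a back-spliced junction could be checked against GT-AG, GC-AG, and AT-AC. The function names, the 0-based half-open coordinate convention, and the in-memory chromosome string are assumptions made for illustration; this is not the procedure used by CYC & TJC or by any of the evaluated tools.

```python
# Canonical (donor, acceptor) dinucleotide pairs named in the text.
CANONICAL_PAIRS = {("GT", "AG"), ("GC", "AG"), ("AT", "AC")}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def backsplice_signals(chrom_seq: str, start: int, end: int, strand: str):
    """Return (donor, acceptor) dinucleotides read on the transcript strand.

    Assumption: start/end are 0-based, half-open genomic coordinates of the
    circRNA candidate, and chrom_seq is the forward-strand chromosome sequence.
    """
    if start < 2 or end + 2 > len(chrom_seq):
        raise ValueError("candidate lies too close to the sequence ends")
    upstream = chrom_seq[start - 2:start].upper()   # 2 nt left of the candidate
    downstream = chrom_seq[end:end + 2].upper()     # 2 nt right of the candidate
    if strand == "+":
        # Donor follows the last circularized exon; acceptor precedes the first.
        return downstream, upstream
    # On the minus strand the roles swap and both are reverse-complemented.
    return reverse_complement(upstream), reverse_complement(downstream)


def has_canonical_signals(chrom_seq: str, start: int, end: int, strand: str) -> bool:
    """True if the flanking dinucleotides form one of the canonical pairs."""
    return backsplice_signals(chrom_seq, start, end, strand) in CANONICAL_PAIRS
```

A candidate that fails this check is not necessarily a false call; as argued above, cryptic or unannotated sites can also support circularization.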

Second, they suggested the removal of 2316 candidates for which the concatenated exon sequences flanking the back-spliced junction sites exhibited ambiguous alignments. We checked these candidates on HeLa and Hs68 samples. As shown in Table 1, we found that some of them were not depleted (≥ onefold enrichment) or were even significantly enriched (≥ fivefold enrichment) after RNase R treatment. (A detailed discussion of two examples can be found in Section I of the Supplementary File.) Therefore, it is inappropriate to suggest that all of the candidates with ambiguous alignments are false calls and should be excluded from the analysis. However, sequencing reads produced from these candidates may result in multiple hits due to their ambiguous alignments, and it is important to take into account factors such as sequencing base quality, alignment mismatches, the minimum number of bases overhanging both sides of the junction site, and the mapping uniqueness of the supporting back-spliced junction reads [7].
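The two enrichment thresholds mentioned above (onefold and fivefold) can be illustrated with a short sketch. It assumes that junction read counts have already been normalized for library size, and it adds a pseudocount to avoid division by zero; both the pseudocount and the function names are illustrative assumptions and do not describe how Table 1 was computed.

```python
def rnase_r_fold(treated: float, untreated: float, pseudocount: float = 1.0) -> float:
    """Fold change of junction reads in the RNase R-treated vs. untreated sample.

    Assumption: counts are already normalized; the pseudocount is hypothetical.
    """
    return (treated + pseudocount) / (untreated + pseudocount)


def classify_enrichment(fold: float) -> str:
    """Apply the thresholds named in the text: >= 5 enriched, >= 1 not depleted."""
    if fold >= 5:
        return "significantly enriched"
    if fold >= 1:
        return "not depleted"
    return "depleted"
```

For instance, a candidate with 9 junction reads after treatment and 1 before would have a fold of (9 + 1) / (1 + 1) = 5 under these assumptions and would be labeled as significantly enriched.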

Table 1. ‘2316 ambiguous CircBase circRNAs’ on HeLa and Hs68 samples.

https://doi.org/10.1371/journal.pcbi.1006916.t001

Third, they suggested that “unqualified reads” with ambiguous alignments and the different supporting-read counting methods of the tools affected our reported results. First, we would like to clarify that the results of CIRI, MapSplice, and find_circ that we provided in our previous paper [2] only included candidates with ≥ 2 supporting back-spliced junction reads, because of the limited output of these three tools under their default parameter settings. Thus, no circRNAs with a single supporting read are included for these tools in Fig 3B of CYC & TJC’s comment paper. If candidates with one supporting read had been reported by the three tools, then the total number of CircBase circRNAs identified by all 11 tools would be expected to exceed 3580 events (Fig 3B of CYC & TJC’s comment paper). As for the “unqualified reads”, the 4 reads they listed in Fig 3C of their paper were back-spliced junction reads generated by CIRI-simulator [9] to support this circRNA. (A detailed discussion of two of these reads can be found in Section II of the Supplementary File.) As for the “different counting methods” used by different tools, they may affect the detection of small circRNAs. If the spliced length of a candidate is smaller than the insert size of the sequencing library, then both mates of a paired-end read may cross the back-spliced junction site. This case benefits all tools because it increases the opportunities to detect the back-spliced junction event. For Fig 4 of our previous paper, by focusing our analysis on common candidates with spliced length exceeding the insert size of the sequencing library, we eliminated the influence of different counting methods. For Table 1 of our previous paper, we generated sufficient (≥ 2) back-spliced junction reads for each circRNA in the positive dataset. Moreover, it is common practice to keep candidates with ≥ 2 supporting reads for further analysis [5, 9, 12, 13], while reliable methods to reduce false-positive circRNAs remain to be developed. In summary, it is feasible to assess the sensitivity of each tool by keeping candidates with ≥ 2 supporting reads (Table 1 & Fig 4 of our previous paper).
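The two selection rules described above (at least two supporting back-spliced junction reads, and spliced length exceeding the library insert size) can be sketched as a simple filter. The `Candidate` record, its field names, and the example values are hypothetical and only serve to illustrate the logic; they are not taken from our pipeline.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    """Hypothetical record for one circRNA candidate."""
    name: str
    spliced_length: int   # length of the circRNA after splicing, in nt
    bsj_reads: int        # reads supporting the back-spliced junction


def both_mates_may_cross(candidate: Candidate, insert_size: int) -> bool:
    """True when the circle is shorter than the insert, so both mates may span the junction."""
    return candidate.spliced_length < insert_size


def select_for_comparison(candidates, insert_size: int, min_reads: int = 2):
    """Keep candidates with enough support whose spliced length exceeds the insert size."""
    return [
        c for c in candidates
        if c.bsj_reads >= min_reads and c.spliced_length > insert_size
    ]
```

Restricting the comparison to candidates longer than the insert size removes the case in which tools that count read pairs and tools that count individual reads would report different support for the same junction.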

Finally, CYC & TJC emphasized that either both RTase-based and non-RTase-based experiments, or at least two different types of RTase-based experiments, should be conducted to validate the authenticity of circRNA candidates. We believe that the origins (different tissues or cell lines) of our collected circRNAs will not affect the fairness of our evaluation. However, we acknowledge that not all of the 282 circRNAs that we compiled from 17 published studies were validated using the methods indicated by CYC & TJC; circRNAs validated in this way should be collected where possible.

In our previous paper [2], to evaluate the performance of 11 circRNA detection tools, we generated a synthetic positive dataset from 14,689 candidates deposited in CircBase [3] that were previously identified from HeLa cells by using an annotation-based method [4]. Although the authenticity of these candidates remains to be verified, they should all match the exon boundaries annotated in the UCSC knownGene database [6]. In their comment paper, CYC & TJC further scrutinized these candidates and suggested that three main confounding factors may compromise the fairness of our assessment. First, they suggested the removal of candidates with unannotated exon boundaries, particularly those without canonical splice sites. Second, they suggested excluding candidates with ambiguous alignments. As discussed in a previous study [14] and also shown by our data, although these heuristic filtering steps can eliminate particular types of false positives, they may create blind spots and reduce sensitivity. Third, they suggested that our evaluation of the tools was affected by unqualified reads with ambiguous alignments and by different supporting-read counting methods. However, all the unqualified reads listed in Fig 3C of the comment paper are back-spliced junction reads generated by CIRI-simulator [9]. The discrepancies may be caused by the failure of BLAT [15] to detect supporting reads of which only a small portion spans the back-spliced junction site. In our previous paper, relevant steps were taken prior to further analysis to minimize the effect of different counting methods. In summary, CYC & TJC underlined several knowledge-based filtering steps and an experimental validation method to address the bioinformatic and experimental challenges in detecting circRNAs, but whether these heuristic filtering steps should be enforced still requires further discussion.
Finally, we reanalyzed the positive and mixed datasets after the removal of the ‘uncertain circRNA candidates’ that they suggested. The data in Table 1 of our previous paper were updated as Table 2 below. In general, our previous conclusions drawn from these two datasets are robust to the change.

Table 2. Summary of accuracy measures on the positive and mixed datasets.

https://doi.org/10.1371/journal.pcbi.1006916.t002
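As an illustration of how removing ‘uncertain circRNA candidates’ changes a sensitivity estimate, the sketch below recomputes sensitivity on a positive set after an exclusion list is applied. The function name, the use of string identifiers, and the toy identifiers are assumptions for illustration; the sketch does not reproduce the values in Table 2.

```python
def sensitivity_after_exclusion(positives: set, excluded: set, detected: set) -> float:
    """Sensitivity = detected true positives / retained positives.

    Assumption: candidates are compared by an identifier such as
    'chrom:start-end:strand'; excluded candidates are dropped from the
    positive set before counting.
    """
    retained = positives - excluded
    if not retained:
        raise ValueError("no positives left after exclusion")
    true_positives = detected & retained
    return len(true_positives) / len(retained)
```

With four positives of which one is excluded, a tool that detects two of the three retained candidates would have a sensitivity of 2/3 under this definition, regardless of whether it also reported the excluded candidate.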

Supporting information

S1 File.

(I) Examples of not-depleted or even enriched “ambiguous CircBase circRNAs” after RNase R treatment. (II) Examples of back-spliced junction read pairs being mistaken as “unqualified reads”.

https://doi.org/10.1371/journal.pcbi.1006916.s001

(DOCX)

Acknowledgments

We would like to thank Prof. Chuang and Chia-Ying Chen for their kind and patient discussions when we had questions about their comment paper.

References

  1. Chen C-YC, Chuang T-J. Comment on “A comprehensive overview and evaluation of circular RNA detection tools”. PLoS Comput Biol. 2019;15(5):e1006158.
  2. Zeng X, Lin W, Guo M, Zou Q. A comprehensive overview and evaluation of circular RNA detection tools. PLoS Comput Biol. 2017;13(6):e1005420. pmid:28594838; PubMed Central PMCID: PMC5466358.
  3. Glazar P, Papavasileiou P, Rajewsky N. circBase: a database for circular RNAs. RNA. 2014;20(11):1666–70. pmid:25234927; PubMed Central PMCID: PMC4201819.
  4. Salzman J, Chen RE, Olsen MN, Wang PL, Brown PO. Cell-type specific features of circular RNA expression. PLoS Genet. 2013;9(9):e1003777. pmid:24039610; PubMed Central PMCID: PMC3764148.
  5. Memczak S, Jens M, Elefsinioti A, Torti F, Krueger J, Rybak A, et al. Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013;495(7441):333–8. pmid:23446348.
  6. Hsu F, Kent WJ, Clawson H, Kuhn RM, Diekhans M, Haussler D. The UCSC Known Genes. Bioinformatics. 2006;22(9):1036–46. pmid:16500937.
  7. Szabo L, Morey R, Palpant NJ, Wang PL, Afari N, Jiang C, et al. Statistically based splicing detection reveals neural enrichment and tissue-specific induction of circular RNA during human fetal development. Genome Biol. 2015;16:126. pmid:26076956; PubMed Central PMCID: PMC4506483.
  8. Zhang XO, Dong R, Zhang Y, Zhang JL, Luo Z, Zhang J, et al. Diverse alternative back-splicing and alternative splicing landscape of circular RNAs. Genome Res. 2016;26(9):1277–87. pmid:27365365; PubMed Central PMCID: PMC5052039.
  9. Gao Y, Wang J, Zhao F. CIRI: an efficient and unbiased algorithm for de novo circular RNA identification. Genome Biol. 2015;16(1):4. pmid:25583365.
  10. Hansen TB, Jensen TI, Clausen BH, Bramsen JB, Finsen B, Damgaard CK, et al. Natural RNA circles function as efficient microRNA sponges. Nature. 2013;495(7441):384–8. pmid:23446346.
  11. Starke S, Jost I, Rossbach O, Schneider T, Schreiner S, Hung LH, et al. Exon circularization requires canonical splice signals. Cell Rep. 2015;10(1):103–11. pmid:25543144.
  12. Wang K, Singh D, Zeng Z, Coleman SJ, Huang Y, Savich GL, et al. MapSplice: accurate mapping of RNA-seq reads for splice junction discovery. Nucleic Acids Res. 2010;38(18):e178. pmid:20802226; PubMed Central PMCID: PMC2952873.
  13. Song X, Zhang N, Han P, Moon BS, Lai RK, Wang K, et al. Circular RNA profile in gliomas revealed by identification tool UROBORUS. Nucleic Acids Res. 2016;44(9):e87. pmid:26873924; PubMed Central PMCID: PMC4872085.
  14. Szabo L, Salzman J. Detecting circular RNAs: bioinformatic and experimental challenges. Nat Rev Genet. 2016;17(11):679–92. pmid:27739534; PubMed Central PMCID: PMC5565156.
  15. Kent WJ. BLAT—the BLAST-like alignment tool. Genome Res. 2002;12(4):656–64. pmid:11932250; PubMed Central PMCID: PMC187518.