Fusion transcripts, or chimeras, are distinct from conventionally spliced mRNA isoforms because they are produced by joining exons from two or more different gene loci. In humans, chimeric transcripts can arise in several ways: through trans-splicing of pre-mRNAs, through RNA transcription runoff, or through other errors in RNA transcription processing; some may instead be artifacts of RNA sequencing. Alternatively, chimeric transcripts can be the products of gene fusion following inter-chromosomal translocations or intra-chromosomal rearrangements. Specific cellular phenotypes are characterized by expression of chimeric transcripts. For example, the fused BCR/ABL, FUS/ERG, MLL/AF6, and MOZ/CBP genes are expressed in acute myeloid leukemia (AML), and the TMPRSS2/ETS chimera is associated with overexpression of the oncogene in prostate cancer. In principle, chimeric transcripts can augment the number of gene products available in a given genome and are suspected to function not only in cancer but also in normal cells.
The eukaryotic transcriptome is composed of RNAs transcribed from almost any location in the genome. Although most RNAs can be assigned to a single locus, some, called chimeras, are composed of exons from distinct genes and are therefore assigned to several loci. In some cases, the loci are close to each other in the genome, suggesting that the chimera is generated by read-through transcription. In other instances, the loci are megabases apart or on different chromosomes, suggesting that the chimera is generated through genome rearrangements or trans-splicing. Although the possibility that some chimeras are in vitro artifacts of template switching by the reverse transcriptase cannot be totally ruled out (reverse transcriptase–free assays are much harder to perform), the recent evidence that some chimeras are translated corroborates their authenticity and motivated us to establish a systematic catalog of all chimeras. Another reason to categorize chimeras is their association with cancer, in which the transcriptome is notoriously more complex owing to a large number of genome rearrangements, mutations and alterations of the splicing machinery.
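
The distance-based reasoning above can be illustrated with a minimal Python sketch. It is not the cataloging method itself. The `Locus` type, the function names, and in particular the 1 Mb `cutoff_bp` default are hypothetical choices made for illustration only; the text says only that read-through candidates have loci "close to each other", without giving a numeric threshold.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Locus:
    """Hypothetical genomic interval: 0-based start, exclusive end."""
    chrom: str
    start: int
    end: int


def gap_bp(a: Locus, b: Locus) -> int:
    """Distance in base pairs between two loci on the same chromosome.

    Returns 0 if the intervals overlap or touch.
    """
    return max(0, max(a.start, b.start) - min(a.end, b.end))


def suggest_origin(a: Locus, b: Locus, cutoff_bp: int = 1_000_000) -> str:
    """Suggest a likely origin for a chimera joining exons from loci a and b.

    cutoff_bp is an illustrative assumption, not a value from the text.
    """
    if a.chrom != b.chrom:
        # Different chromosomes: read-through transcription is not possible.
        return "genome rearrangement or trans-splicing"
    if gap_bp(a, b) < cutoff_bp:
        # Nearby loci on one chromosome: consistent with read-through.
        return "read-through transcription"
    # Same chromosome but far apart.
    return "genome rearrangement or trans-splicing"


if __name__ == "__main__":
    near = suggest_origin(Locus("chr1", 1_000, 5_000), Locus("chr1", 8_000, 12_000))
    far = suggest_origin(Locus("chr1", 0, 1_000), Locus("chr9", 0, 1_000))
    print(near)
    print(far)
```

Such a rule only suggests a mechanism. It cannot by itself distinguish genome rearrangement from trans-splicing, nor exclude a template-switching artifact.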