Title:
Antisense transcription regulates the expression of sense gene via alternative polyadenylation
Abstract:
Natural antisense transcripts (NATs) and alternative polyadenylation (APA) of messenger RNA (mRNA) are important contributors to transcriptome complexity, and each plays a critical role in multiple biological processes. However, whether they crosstalk and function collaboratively is unclear. We found that APA is enriched in human sense-antisense (S-AS) gene pairs and focused on the RNASEH2C-KAT5 S-AS pair for further study. Overexpression of the antisense gene KAT5 in cis, but not in trans, promoted usage of the distal polyA (pA) site in the sense gene RNASEH2C. This generated a longer 3′ untranslated region (3′UTR), produced less protein and was accompanied by slowed cell growth. Mechanistically, elevated Pol II occupancy coupled with SRSF3 could explain the increased usage of the distal pA site. Finally, NAT-mediated downregulation of the sense gene's protein level in the RNASEH2C-KAT5 pair was specific to human and did not occur in mouse, which lacks the distal pA site of RNASEH2C. We provide the first evidence that a gene may affect phenotype not through its own protein but by altering, through APA, the expression of the gene it overlaps. This offers an unexpected view of the link between genotype and phenotype.
Introduction:
The eukaryotic transcriptome has shown increasing complexity with the discovery of gene regulation at multiple levels (Licatalosi and Darnell, 2010), such as dynamic variation in transcription initiation, alternative splicing, alternative 3′ end processing and RNA localization. Many species contain numerous gene loci with sense transcripts and overlapping natural antisense transcripts (NATs) (Katayama et al., 2005; Yelin et al., 2003; David et al., 2006). Cis-encoded NATs are driven by promoters on the strand opposite the so-called sense gene and are usually partially reverse-complementary to their sense partner. Sense-antisense (S-AS) pairs were found to be expressed synergistically rather than by chance (Chen et al., 2005). In S-AS gene pairs, the sense gene usually refers to the protein-coding gene, while the antisense partner can be either coding or non-coding. NATs can affect the expression of the corresponding sense genes in cis or in trans at multiple levels (Pelechano and Steinmetz, 2013). In one possible mechanism, NATs act as scaffolds that recruit trans-acting factors to the sense gene locus. There they affect its transcription by locally changing DNA methylation or histone modification (Pelechano and Steinmetz, 2013). NATs can also modulate splicing (Beltran et al., 2008; Hu et al., 2016). A recent study reported that an antisense transcript, 5S-OT, modulates alternative splicing in trans through Alu/anti-Alu pairing with target genes (Hu et al., 2016). NATs can also influence the stability and translation of transcripts by forming sense-antisense double-stranded RNA (dsRNA) (Faghihi et al., 2010; Carrieri, 2012).
Alternative polyadenylation (APA) is the polyadenylation of precursor messenger RNA (pre-mRNA) at multiple sites. It is another layer of gene regulation and contributes to transcriptome complexity at the last step of mRNA maturation (Elkon et al., 2013; Colgan and Manley, 1997). APA has been shown to play critical roles in biological and pathological processes such as development, tissue identity, cell proliferation and cell differentiation, as well as cancer and heart failure (Ji et al., 2009; Ni et al., 2013; Sandberg et al., 2008; Ji and Tian, 2009; Mayr and Bartel, 2009; Fu et al., 2011; Park et al., 2011). Most often, APA generates transcript variants with 3′ untranslated regions (3′UTRs) of different lengths, although it occasionally affects the coding region. The length of the 3′UTR can affect RNA stability and translation efficiency through regulation by microRNAs (miRNAs) or RNA-binding proteins (RBPs) (Sandberg et al., 2008; Mayr and Bartel, 2009). In addition, different 3′UTRs can affect the subcellular localization of RNAs or of the corresponding proteins (An et al., 2008; Berkovits, 2015). As far as is known, APA can be regulated by cis-acting elements, trans-acting factors, Pol II occupancy and elongation rate, and chromatin state (Di Giammartino et al., 2011; Millevoi and Vagner, 2010; Ji, 2011; Pinto et al., 2011; Gunderson et al., 1998; Kaida et al., 2010; Spies et al., 2009).
More than 30% of annotated human transcripts have NATs (Ozsolak et al., 2010), and around 70%–75% of human genes undergo APA (Elkon et al., 2013; Derti et al., 2012). Given how prevalent both antisense transcription and APA are in the human genome, we speculated that they may crosstalk and function collaboratively in certain cases. In one known example, NAT expression is associated with the relative abundance of two sense isoforms generated by APA in mouse embryonic stem cells (Onodera et al., 2012). However, whether and how NATs regulate APA remains unknown. Uncovering the interplay between these processes would broaden our understanding of transcriptome complexity and reveal additional connections between genotype and phenotype.
To address this question, we first analyzed our published PA-seq data from 13 human tissues and found that APA was significantly enriched in sense-antisense gene pairs. From these pairs, we selected the S-AS gene pair RNASEH2C-KAT5 to test whether antisense transcription causally regulates APA. We found that overexpression of antisense KAT5 in cis, but not in trans, increased usage of the distal pA site of the sense gene RNASEH2C. Unexpectedly, increased expression of KAT5 in cis led to a dramatic decline in RNASEH2C protein, which in turn reduced the cell proliferation rate. Pol II occupancy and the recruited splicing factor SRSF3 were associated with higher usage of the distal pA site. Notably, this regulation of RNASEH2C-KAT5 exists in human but not in mouse. This suggests that it is a newly evolved mechanism that adds a hidden layer of transcriptome diversity to the human genome. Together, we show for the first time that antisense transcription regulates the expression of the sense gene through alternative polyadenylation.
Results:
APA enriched in overlapped gene pairs
To explore whether antisense transcription and APA might be connected genome-wide, we analyzed our previously published PA-seq datasets from 13 human tissues (Ni et al., 2013). Sense-antisense (S-AS) genes accounted for 23.33% of the expressed genes (3,471/14,876), similar to the previously reported proportion (Ozsolak et al., 2010). Interestingly, APA genes were more frequent among genes in S-AS pairs than among the remaining genes (Table S1, 1.24-fold enrichment, χ² test, P = 1.38 × 10⁻¹⁰). We then chose tail-to-tail S-AS gene pairs for further study. In these pairs the overlap spans the polyadenylation sites, which makes a mechanistic interaction between antisense transcription and APA more likely. Compared with non-overlapping genes, tail-to-tail S-AS gene pairs were more enriched for APA genes (Table S1, 1.19-fold enrichment, χ² test, P = 2.95 × 10⁻¹⁶), implying an intrinsic relationship between antisense transcription and APA.
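Each enrichment statistic above compares the fraction of APA genes in one gene group with the fraction in the remaining genes. The sketch below shows that comparison, assuming a 2×2 contingency table and a Pearson χ² test without continuity correction. The section does not say which variant was used. The function name `enrichment_2x2` is hypothetical, and so are the counts in the usage note, because the per-group APA counts are in Table S1 and are not reproduced here.

```python
import math

def enrichment_2x2(apa_in_group, n_group, apa_in_rest, n_rest):
    """Fold enrichment of APA genes in a group vs. the remaining genes,
    with a Pearson chi-square test (1 df, no Yates correction)."""
    # Rows: group / rest; columns: APA genes / non-APA genes
    table = [
        [apa_in_group, n_group - apa_in_group],
        [apa_in_rest, n_rest - apa_in_rest],
    ]
    fold = (apa_in_group / n_group) / (apa_in_rest / n_rest)
    total = n_group + n_rest
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of chi-square with 1 df: P(X > x) = erfc(sqrt(x / 2))
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return fold, chi2, p_value
```

As a toy example, a group with 30 APA genes out of 50, compared against 20 out of 50 in the rest, gives a 1.5-fold enrichment and χ² = 4.0 (P ≈ 0.046). With the text's totals, the "rest" group would contain 14,876 − 3,471 = 11,405 genes. For 2×2 tables, `scipy.stats.chi2_contingency` applies Yates' correction by default and therefore returns a somewhat larger P value.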
In a tail-to-tail S-AS pair, the distal polyA (pA) site of one gene always lies in the path of transcription of the other gene. We therefore used PA-seq, which quantifies both distal/proximal pA site usage and relative gene expression (Ni et al., 2013), to examine the correlation between NAT expression and distal pA site usage of the sense gene across tissues. Interestingly, we found both positive and negative correlations (Fig. S1), suggesting that the link between NAT expression and distal pA site usage is rather complicated. To test whether antisense transcription plays a causal role in regulating pA site usage of the sense gene, we took a candidate-gene approach with the following criteria: 1) relatively high expression of both genes of an S-AS pair in at least 10 of the 13 human tissues; 2) usage of both the distal and the proximal pA site in all 13 tissues; 3) a relatively high correlation coefficient between antisense expression and distal pA site usage; 4) a novel pA site detected by PA-seq (not annotated in RefSeq), which could point to new functional aspects of a known gene. We finally selected the RNASEH2C-KAT5 S-AS pair for extensive investigation because it met all of these criteria. In addition, both genes are protein-coding and have molecular functions related to genome stability and DNA repair. These functions have been implicated in important biological processes such as cancer and aging (Loeb, 2011; Wallace et al., 2012; Lopez-Otin et al., 2013; Moskalev et al., 2013). RNASEH2C encodes a subunit of RNASEH2, which safeguards genome integrity and stability during DNA replication (Reijns et al., 2012).
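The four selection criteria can be expressed as a filter over per-tissue PA-seq summaries. The sketch below is illustrative only, and several parts of it are assumptions:

- The input layout, the helper names (`distal_usage`, `pearson`, `passes_criteria`) and the cutoffs (`expr_cutoff`, `min_abs_r`) are hypothetical, because the text asks only for "relatively high" expression and correlation without giving numeric thresholds.
- Distal usage is taken to be distal reads divided by the sum of distal and proximal reads.
- The correlation is a Pearson coefficient.

```python
import math

def distal_usage(distal_reads, proximal_reads):
    """Fraction of PA-seq reads at the distal pA site (NaN if no reads)."""
    total = distal_reads + proximal_reads
    return distal_reads / total if total else float("nan")

def pearson(x, y):
    """Pearson correlation coefficient; NaN if either vector is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else float("nan")

def passes_criteria(pair, expr_cutoff=10.0, min_tissues=10, min_abs_r=0.6):
    """Apply the four candidate-gene criteria to one S-AS pair.

    `pair` is a hypothetical dict holding per-tissue lists (13 entries each):
    'sense_expr', 'antisense_expr', 'distal_reads', 'proximal_reads',
    plus a bool 'novel_site' (pA site absent from RefSeq).
    Thresholds are illustrative, not the authors' values.
    """
    # 1) both genes relatively highly expressed in >= min_tissues tissues
    both_high = sum(s >= expr_cutoff and a >= expr_cutoff
                    for s, a in zip(pair["sense_expr"], pair["antisense_expr"]))
    # 2) distal and proximal pA sites both used in every tissue
    both_sites = all(d > 0 and p > 0
                     for d, p in zip(pair["distal_reads"], pair["proximal_reads"]))
    # 3) strong correlation between antisense expression and distal usage
    usage = [distal_usage(d, p)
             for d, p in zip(pair["distal_reads"], pair["proximal_reads"])]
    r = pearson(pair["antisense_expr"], usage)
    strong_r = not math.isnan(r) and abs(r) >= min_abs_r
    # 4) a novel (non-RefSeq) pA site detected by PA-seq
    return (both_high >= min_tissues and both_sites
            and strong_r and pair["novel_site"])
```

Criterion 3 is applied here to |r| because Fig. S1 contains both positive and negative correlations. A positive-only threshold would be an equally defensible choice, and the RNASEH2C-KAT5 pair itself showed a positive correlation.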
KAT5 is a lysine acetyltransferase that functions in DNA repair and apoptosis through histone acetylation (Ikura et al., 2000; Squatrito et al., 2006). Because RNASEH2C has two pA sites whereas KAT5 has only one, we defined RNASEH2C as the sense gene and KAT5 as its antisense partner. We then used this pair to examine the effect of antisense transcription on APA of the sense gene.
Transcript of RNASEH2C using the distal pA site is less stable and produces less protein
In the human RefSeq annotation, KAT5 and RNASEH2C form an overlapping pair, each with a single polyadenylation site (Fig. 1A). However, our PA-seq data revealed a novel pA site in the 3′UTR of RNASEH2C (named the proximal pA site). This site yields a transcript variant with a shorter 3′UTR that does not overlap KAT5 (Figs. 1A and S2). Both pA sites of RNASEH2C were confirmed by two independent methods. First, the public PolyA-Seq track in the UCSC Genome Browser confirmed the proximal pA site in multiple human samples (Fig. S3). Second, 3′ RACE (rapid amplification of cDNA 3′ ends) showed two bands (Fig. 1B), one for the distal pA site (the known site annotated in RefSeq) and one for the proximal pA site. Both bands were further validated by Sanger sequencing (Fig. S4). To compare the two isoforms, we performed qRT-PCR (quantitative reverse transcription real-time polymerase chain reaction). After transcription was blocked, the transcript using the distal pA site (with the longer 3′UTR) was less stable than the shorter one (Fig. 1C). Moreover, a dual luciferase assay showed that a reporter transcript carrying the longer 3′UTR produced less protein than one carrying the shorter 3′UTR (Fig. 1D). These data suggest that alternative polyadenylation of RNASEH2C can affect its protein abundance.
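The stability comparison in Fig. 1C rests on how fast each isoform decays once transcription is blocked. One common way to summarize such a time course is to fit ln(abundance) against time and report t½ = ln 2 / k. The sketch below does this, assuming first-order decay and qRT-PCR levels normalized to time zero. The function name and the example values are hypothetical, and the section does not state how Fig. 1C was quantified.

```python
import math

def half_life(times_h, rel_abundance):
    """Estimate mRNA half-life (in the units of times_h), assuming
    first-order decay A(t) = A0 * exp(-k t).

    Fits ln A(t) against t by ordinary least squares; k = -slope.
    rel_abundance: levels relative to t = 0 (e.g. from qRT-PCR).
    """
    ys = [math.log(a) for a in rel_abundance]
    n = len(times_h)
    mx, my = sum(times_h) / n, sum(ys) / n
    slope = (sum((x - mx) * (y - my) for x, y in zip(times_h, ys))
             / sum((x - mx) ** 2 for x in times_h))
    k = -slope  # first-order decay rate constant
    return math.log(2) / k
```

For example, `half_life([0, 2, 4, 6], [1.0, 0.5, 0.25, 0.125])` gives a half-life of 2 h. Under the same treatment, an isoform that decays faster, as reported for the long, distal-pA transcript, would return a smaller value.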
NAT regulates pA site usage of RNASEH2C in cis
The RNASEH2C isoforms that use different pA sites differ in translational output, suggesting that dynamic changes in APA are biologically important in human tissues. Notably, usage of the distal pA site of RNASEH2C correlated positively with KAT5 expression (Fig. S1), and this was further validated in six additional human cell lines (Fig. S5). To investigate further how the NAT controls pA site usage of the sense gene, we perturbed expression of the NAT gene (KAT5) and measured pA site usage in RNASEH2C. Ectopic (in trans) overexpression of KAT5 did not affect pA site usage of RNASEH2C (Fig. 2A and 2B), and neither did RNA interference (RNAi)-mediated knockdown of KAT5 (Fig. 2C). We then manipulated KAT5 expression in cis by using CRISPR/Cas9 gene editing to replace its endogenous promoter with the stronger CMV promoter (Fig. 2D). Unlike in trans overexpression, in cis elevation of the NAT significantly increased usage of the distal pA site in RNASEH2C, as shown by both qRT-PCR and Northern blot (Figs. 2D and S6). Consistently, we knocked down KAT5 in cis by using CRISPR/Cas9 to delete a core binding motif of the transcription factor E2F3, and this significantly reduced usage of the distal pA site of RNASEH2C (Fig. 2E). These intervention results agree with the positive correlation between NAT expression and distal pA site usage across human tissues and cell lines (Figs. S1 and S5). Thus, in the RNASEH2C-KAT5 S-AS pair, in cis transcription of the NAT, rather than its in trans expression, regulates pA site usage of the sense gene.
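Distal pA site usage measured by qRT-PCR is often expressed by comparing two amplicons:

- one unique to the long isoform, placed between the proximal and distal pA sites;
- one shared by both isoforms, placed upstream of the proximal pA site.

The section does not describe its primer design. The sketch below therefore assumes this layout, together with roughly equal, near-100% amplification efficiencies for both primer pairs. The function names are hypothetical.

```python
def long_isoform_ratio(ct_long, ct_total):
    """Relative long-isoform level, 2^-(Ct_long - Ct_total).

    ct_long:  Ct of an amplicon present only in the distal-pA (long 3'UTR) isoform.
    ct_total: Ct of an amplicon shared by both isoforms.
    Assumes ~100% and equal amplification efficiency for both primer pairs.
    """
    return 2.0 ** -(ct_long - ct_total)

def relative_distal_usage(ct_long_test, ct_total_test, ct_long_ctrl, ct_total_ctrl):
    """Fold change of the long-isoform ratio, test vs. control (2^-ddCt)."""
    ddct = (ct_long_test - ct_total_test) - (ct_long_ctrl - ct_total_ctrl)
    return 2.0 ** -ddct
```

The ratio is a relative index of distal pA usage. It tracks the true long/total fraction only as well as the equal-efficiency assumption holds, which is why comparisons are usually made between conditions (for example, edited vs. control cells) rather than read as absolute fractions.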
Steady-state mRNA levels are determined by the rates of both transcription and degradation. To determine precisely whether antisense transcription regulates APA at the transcriptional level, we therefore measured nascent RNA abundance by two independent approaches, Click-iT and Bru-PCR (Jao and Salic, 2008; Paulsen et al., 2014). We first confirmed the in cis transcriptional upregulation of KAT5 (Fig. 3A). Next, both Click-iT and Bru-PCR confirmed higher usage of the RNASEH2C distal pA site at the nascent RNA level (Figs. 3A, 3B and S7). These results suggest that the NAT controls APA of the sense gene at the transcriptional level.
In cis upregulated antisense transcription leads to less protein production of the sense gene
To explore the consequence of this APA shift, we quantified the RNASEH2C isoform with the short 3′UTR in cells overexpressing KAT5 in cis, because this isoform is likely the major template for translation (Fig. 1A and 1D). As expected, the short RNASEH2C isoform was less abundant in the cytoplasmic fraction (Figs. 3C, 3D and S8). Accordingly, RNASEH2C protein levels were reduced both in single clones (Fig. 3E) and in mixed cells with in cis overexpression of KAT5 (KAT5_icOE) (Fig. S9). In contrast, in trans overexpression of KAT5 did not affect RNASEH2C protein levels (Fig. 3E). Together, these results show that in cis overexpression of KAT5 increases distal pA site usage. This reduces the amount of mature mRNA template available for translation in the cytoplasm and ultimately lowers protein production.
Decreased RNASEH2C protein production slows cell growth
We next investigated the cellular and molecular phenotypes of reduced RNASEH2C levels. Growth decreased dramatically in RNASEH2C-depleted human 293T cells (Fig. 4A). Expression of related molecular markers also declined sharply (Fig. 4B and 4C), including MKI67 (a proliferation marker) and CCND1 (which encodes cyclin D1, an important cell cycle marker). Similar phenotypes were observed in HUVEC (human umbilical vein endothelial cells) and A549 (human lung adenocarcinoma cell line) cells upon knockdown of RNASEH2C (Fig. S10).
Because RNASEH2C protein abundance decreased significantly upon in cis overexpression of KAT5, we then examined the growth of cells overexpressing KAT5 in cis. These cells also grew more slowly (Fig. 4D), similar to RNASEH2C-depleted cells. In contrast, ectopic overexpression of KAT5 had no detectable effect on cell proliferation (Fig. 4E). This rules out the possibility that the phenotypes above resulted directly from increased KAT5 protein levels.
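Growth-rate differences such as those in Fig. 4D and 4E are often summarized as a population doubling time. The section does not state how growth was measured. The sketch below therefore simply assumes exponential growth between two cell counts. The function name and the numbers are hypothetical.

```python
import math

def doubling_time(n0, nt, elapsed_h):
    """Population doubling time in hours, assuming exponential growth
    N(t) = N0 * 2^(t / Td) between two cell counts taken elapsed_h apart."""
    if nt <= n0:
        raise ValueError("no net growth; doubling time is undefined")
    return elapsed_h * math.log(2) / math.log(nt / n0)
```

For example, growth from 1 × 10⁴ to 8 × 10⁴ cells in 72 h corresponds to a doubling time of 24 h. Slower-growing cultures, such as the reported KAT5_icOE or RNASEH2C-depleted cells, would give a larger value than controls counted over the same interval.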
High Pol II occupancy coupled with SRSF3 is associated with distal pA site usage
Both RNASEH2C and KAT5 in this S-AS pair are protein-coding genes transcribed by RNA polymerase II (Pol II), and Pol II has been shown to regulate alternative polyadenylation (Ji, 2011; Pinto et al., 2011; Hsin and Manley, 2012). To probe the mechanism by which antisense transcription regulates APA, we therefore performed chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) for Pol II. Compared with control cells, Pol II occupancy near the KAT5 promoter was higher both in mixed cells and in single-clone-derived cells edited with CRISPR/Cas9 (KAT5_icOE) (Fig. 5A). ChIP-PCR and ChIP-qPCR confirmed these results (Figs. 5B, 5C and S11). Notably, upon in cis overexpression of KAT5, Pol II occupancy was also higher at both the distal and the proximal pA sites of the sense gene RNASEH2C (Figs. 5A–C and S11).
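ChIP-qPCR occupancy is commonly reported as percent of input. The sketch below shows that calculation. The input fraction (1% by default here), the function name and the example Ct values are all assumptions, because the section does not give the normalization used for Figs. 5C and S11.

```python
import math

def percent_input(ct_ip, ct_input, input_fraction=0.01):
    """Percent of input chromatin recovered by ChIP at one locus.

    ct_input is measured on a diluted input sample (input_fraction of the
    chromatin used for the IP). It is first adjusted to 100% by subtracting
    log2 of the dilution factor; then %input = 100 * 2^(Ct_input_adj - Ct_IP).
    """
    ct_input_adj = ct_input - math.log2(1.0 / input_fraction)
    return 100.0 * 2.0 ** (ct_input_adj - ct_ip)
```

To compare KAT5_icOE cells with control cells at the same locus (for example, the KAT5 promoter or an RNASEH2C pA site), one can take the ratio of their percent-input values.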
To explain the increased usage of the distal pA site upon elevated Pol II occupancy, we hypothesize that the distal pA site is more sensitive than the proximal one to the local Pol II concentration. Two lines of evidence support this hypothesis. First, the distal pA site is used much less than the proximal one in most human tissues (Figs. 1A, S2 and S3), implying that its cis-regulatory strength is relatively weak. Second, the distal pA site is newly evolved (Fig. S12). Its polyadenylation cis-elements may therefore be weaker than those of the conserved proximal site, so its usage may depend more on trans-acting factors recruited by Pol II.
To investigate whether proteins recruited by Pol II regulate selection of the distal pA site in RNASEH2C, we screened 14 genes encoding 3′ end processing factors or splicing factors in 293T cells. The splicing factor SRSF3 had the most significant effect on pA site usage (Figs. 5D–F and S13). In cells with SRSF3 knockdown, the proportion of the long transcript decreased, implying that loss of SRSF3 inhibits usage of the distal pA site of RNASEH2C. Moreover, SRSF3 interacted with the C-terminal domain of Pol II (Fig. 5G), consistent with previous findings (de la Mata and Kornblihtt, 2006). Taken together, these findings lead us to speculate that elevated Pol II, coupled with SRSF3, may participate in regulating APA of RNASEH2C.
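The knockdown screen compares the long-isoform ratio of RNASEH2C after knocking down each factor with the ratio in control cells. One simple way to rank such a screen is by the absolute log2 fold change, shown here as a hypothetical sketch. Factor names other than SRSF3 and all values are placeholders. A real analysis would also need replicates and a statistical test.

```python
import math

def rank_knockdowns(ctrl_ratio, kd_ratios):
    """Rank knockdowns by |log2(long-isoform ratio after KD / control ratio)|.

    kd_ratios maps a factor name to the long-isoform ratio measured after
    its knockdown. Returns (name, log2_fold_change) tuples, strongest first;
    negative values mean less distal pA usage than in control cells.
    """
    effects = {name: math.log2(r / ctrl_ratio) for name, r in kd_ratios.items()}
    return sorted(effects.items(), key=lambda kv: abs(kv[1]), reverse=True)
```

In a toy input, `rank_knockdowns(0.2, {"SRSF3": 0.05, "FACTOR_A": 0.18, "FACTOR_B": 0.4})` places SRSF3 first with a log2 fold change of about −2, which matches the direction reported above (less long isoform after knockdown).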
NAT-mediated APA regulation in KAT5-RNASEH2C gene pair is a newly evolved mechanism in controlling protein production of sense gene
To assess, from an evolutionary perspective, the biological significance of antisense-mediated downregulation of sense-gene protein production, we examined the conservation of DNA sequences near the distal pA site of RNASEH2C across species. Only primates had a distal polyA signal in the RNASEH2C gene (Fig. S12A). Other mammals, such as mouse and rat, lacked this signal and therefore lacked the isoform with the long 3′UTR (Figs. 6A and S14). Moreover, the DNA sequence upstream of the distal pA site was less conserved than that upstream of the proximal one (Fig. S12B).
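The conservation analysis asks whether a polyA signal hexamer sits upstream of the distal cleavage site in each species. The sketch below scans a window upstream of a given cleavage position for AATAAA and ATTAAA, the two most common mammalian polyA signal hexamers. The 40-nt window, the function name and the toy sequences are assumptions. This is not the alignment-based analysis behind Fig. S12.

```python
PAS_HEXAMERS = ("AATAAA", "ATTAAA")  # two most common signals; rarer variants omitted

def find_pas(seq, cleavage_pos, window=40):
    """Return (hexamer, distance) hits for polyA signal hexamers lying fully
    within `window` nt upstream of cleavage_pos (0-based index into seq,
    sense strand). distance = nt from hexamer start to the cleavage site."""
    seq = seq.upper()
    start = max(0, cleavage_pos - window)
    hits = []
    # Stop so that the last hexamer considered ends just before cleavage_pos
    for i in range(start, cleavage_pos - 5):
        hexamer = seq[i:i + 6]
        if hexamer in PAS_HEXAMERS:
            hits.append((hexamer, cleavage_pos - i))
    return hits
```

Running the same scan on orthologous sequences aligned at the human distal cleavage site would flag species whose sequence carries no canonical hexamer at the corresponding position. A single-base change (for example, AATAAA to AATGAA) is enough to remove a hit.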
Our data suggest that, in human cells, in cis overexpression of the antisense transcript downregulates the overlapping gene's protein level through a preference for the distal pA site. To test whether the distal pA site is critical for the reduced protein production, we overexpressed KAT5 in cis in mouse cells, which lack the distal pA signal in the RNASEH2C gene. 3′ RACE and Sanger sequencing confirmed that KAT5_icOE mouse cells still used only one pA site, corresponding to the human proximal pA site (Figs. 6B and S14). Furthermore, unlike in KAT5_icOE human cells, RNASEH2C RNA level and protein abundance did not change in KAT5_icOE mouse cells (Fig. 6C and 6D). Together, these data indicate that in S-AS gene pairs such as KAT5-RNASEH2C, APA regulation mediated by antisense transcription is a novel mechanism for regulating expression of the overlapping gene. This mechanism adds a new layer of complexity to human gene regulation.
Discussion:
Understanding the mechanisms behind transcriptome complexity is one of the key issues of the post-genomic era and will help bridge the gap between genotype and phenotype. Because S-AS gene pairs are widespread across the genome, explaining the cause of a phenotype sometimes requires considering both the target gene and its neighbors. For example, targeted therapy against ERBB2, an important regulator of breast cancer, has had limited success owing to co-amplification and overexpression of its neighboring genes (Vanden Bempt et al., 2007; Hu et al., 2009). A second example is the TNFAIP1-POLDIP2 S-AS gene pair. Its co-regulated expression is associated with breast cancer phenotypes and patient survival, suggesting that therapeutic approaches need to take genes in this complex sense-antisense architecture into account (Grinchuk et al., 2010). Given how important such collaboration between neighboring genes can be, our finding that a NAT affects the expression of its overlapping sense gene through APA reveals another layer of transcriptional complexity linking genotype and phenotype. In the present study, in cis overexpression of KAT5 reduced cell proliferation. It did so not through increased KAT5 protein itself but through transcriptional interference, via APA, with the overlapping gene RNASEH2C (Fig. 7). This unexpected mechanism broadens our knowledge of gene expression regulation and opens a new way to search for key genes that control physiological and pathological phenotypes.
Genes in S-AS pairs can be either coding or non-coding; in the present study, both RNASEH2C and KAT5 are coding genes. We defined RNASEH2C as the sense gene because it has two pA sites, which generate two distinct 3′UTRs that affect protein production, and accordingly defined KAT5 as the antisense gene. Human KAT5 has four splicing isoforms, but they share the same transcription start site and a single polyadenylation site. They also contain no alternative splicing events in the overlapping region (Fig. 1A). In cis overexpression of KAT5 should therefore not introduce additional factors that affect APA of RNASEH2C. Our results indicate that antisense-mediated regulation of the sense gene depends on the overlapping distal pA site. Because the distal RNASEH2C pA site is present in human but absent in mouse, the two species provide an excellent comparison for testing this hypothesis. In human cells, elevated transcription of the antisense gene (KAT5) shifted pA site usage of the sense gene (RNASEH2C) toward the distal site, reducing production of the corresponding protein. In mouse cells, by contrast, the distal pA site is absent. In cis overexpression of KAT5 therefore cannot trigger this APA shift, and RNASEH2C expression remains unchanged.
Pol II has been reported to regulate APA (Ji, 2011; Pinto et al., 2011; Hsin and Manley, 2012). Moreover, high transcriptional activity (usually corresponding to elevated Pol II occupancy) has been found to promote usage of the proximal pA site of the same gene (Ji, 2011). In the tail-to-tail RNASEH2C-KAT5 pair, the distal pA site of RNASEH2C is far from the RNASEH2C promoter but much closer to the KAT5 promoter (Fig. 1A). Thus, in cis overexpression of KAT5 favored the pA site closer to the KAT5 promoter (i.e., the distal pA site of RNASEH2C), consistent with the conclusion of Ji et al. (2011). Previous studies have shown that RNA processing factors, including splicing factors and polyadenylation factors, can bind the C-terminal domain (CTD) of Pol II to take part in RNA processing (Hsin and Manley, 2012; Di Giammartino et al., 2011). We found that the splicing factor SRSF3 can bind the CTD of Pol II and contributes to pA site selection in RNASEH2C. However, this mechanism may not apply to all tail-to-tail gene pairs, because the correlation coefficient between antisense expression and distal pA site usage can be either positive or negative (Fig. S1).
Overexpressing KAT5 in cis did not downregulate RNASEH2C protein in mouse cells. This suggests that the newly evolved distal pA site in human is critical for mediating APA of RNASEH2C and, ultimately, for reducing its protein level. We speculate that this function of pA site usage may depend on the interplay between cis-regulatory elements and trans-acting factors. The ideal experiment to prove that APA drives the downregulation of the sense gene's protein level would completely abolish usage of the distal pA site in RNASEH2C. However, deleting all cis-elements near the distal pA site is technically challenging. We tried multiple gRNAs but still could not obtain cell clones with a mutated polyA signal upstream of the distal pA site, possibly because of low gRNA efficiency (Fig. S15A). Nevertheless, we successfully deleted polyadenylation-related cis-elements downstream of the distal pA site and observed decreased expression of the long RNASEH2C isoform (Fig. S15B and S15C). This suggests that these elements help in recognition of and cleavage at the distal pA site. To circumvent this technical limitation, we used mouse cells, which lack the distal RNASEH2C pA site, as indirect evidence: in these cells, in cis upregulation of KAT5 did not affect RNASEH2C expression (Fig. 6). These results indicate that APA probably plays a regulatory role in mediating RNASEH2C expression. The detailed mechanism of this intriguing regulation remains to be elucidated. Even so, our results suggest that such regulation should be considered when interpreting the complex transcriptome, especially in highly evolved organisms such as humans.