Construction of a Glycolysis-related long noncoding RNA signature for predicting survival in endometrial cancer

Background: long noncoding RNA (lncRNA) has been widely studied and understood in various cancer types. However, the expression profiles of glycolysis-related lncRNA in endometrial cancer (EC) have poorly been reported. Methods: In this study, we retrieved the “Glycolysis” gene list from Molecular Signatures Database (MSigDB) and screened prognostic glycolysis-related lncRNA using The Cancer Genome Atlas (TCGA) Uterine Corpus Endometrial Carcinoma (UCEC) RNA-seq dataset. Then, TCGA UCEC patients were randomly divided. Lasso algorithm and multivariate cox regression analyses were then performed to further select hub prognostic lncRNA and to develop a prognostic signature. The efficacy of the signature was also evaluated in the TCGA EC cohort. Moreover, we constructed a nomogram to predict EC patient outcomes. Results: Univariate cox analysis identified thirty-six glycolysis-related lncRNA correlated with EC patient prognosis. Among them, five lncRNA were further selected as hub lncRNA that mostly relate to EC patient outcomes, which are AL121906.2, BOLA3-AS1, LINC01833, AC016405.3, and RAB11B-AS1. A prognostic signature was then built based on the expression and coefficiency of five lncRNA. The efficacy of the signature was validated in part of and the entire TCGA EC cohort. In addition, the risk signature could precisely distinguish high- and low-risk EC patients and predict patient outcomes. The nomogram exhibited absolute concordance between the predictions and actual survival observations. Conclusions: The glycolysis-related lncRNA signature model and the nomogram may provide a new perspective for EC patients outcome prediction in clinical use.


Introduction
Malignancies are not only genetic diseases but also metabolic diseases. Since the "Warburg effect" was first uncovered by OttoWarburg, which is a remarkable phenomenon explicitly elucidating the transition of glycometabolism from oxidative phosphorylation to anaerobic glycolysis in cancer cells, our understanding of why tumors develop metabolic phenotypes that differ from adjacent, nonmalignant tissues have significantly been improved [1][2][3].
The reprogramming of cell metabolism is a hallmark of various cancer types, including endometrial cancer [3]. Endometrial cancer (EC) ranks the third most common female reproductive malignant tumors, leading to

Ivyspring
International Publisher approximately 90,000 global deaths each year. Metabolic disorders, including glucose and lipid metabolism, are high-risk factors for endometrial cancer. It is reported that overweight or obese women, especially those with diabetes or high blood cholesterol level, have a remarkably increased risk of developing endometrial cancer than common individuals [4]. At the genetic level, overexpression of the glucose transporter GLUT6 in EC was highly correlated with the malignant cell phenotype and survival [5]. Besides, Mori Y et al. reported that ALDH-mediated activation of glycolysis promoted the paclitaxel resistance in endometrial cancer [6]. These studies suggest that metabolic changes, especially glucose metabolism, may promote the initiation and progression of endometrial cancer. Long noncoding RNA (lncRNA) is a kind of conservative non-coding RNA with over 200 nucleotides in length that does not have any protein-coding ability. Aberrant expression of lncRNA plays a pivotal role in regulating multiple and complex biological processes, including cell proliferation, metabolism, and differentiation in many cancer types [7], Chen et al reported that HIF-1α-stabilizing long noncoding RNA (HISLA) that transmitted from tumor-associated macrophages (TAMs) extracellular vesicle (EV) could effectively promote the aerobic glycolysis and apoptotic resistance of breast cancer cell [8]. Likewise, Lin found that LINK-A (long intergenic non-coding RNA for kinase activation) activated normoxic HIF1α signaling, promoting the glycolysis reprogramming and tumorigenesis in triple-negative breast cancer [9]. However, lncRNA involved in metabolism reprogram in endometrial cancer has yet not fully elucidated. Liu et al. recently performed GSEA analysis of dysregulated genes in the TCGA EC cohort and results showed that the "Glycolysis" gene set was highly enriched [10]. Therefore, in this study, we focused on the relationship between glycolysis-related lncRNA and endometrial cancer treatment. We performed cox and lasso regression analysis to select prognostic glycolysis-related lncRNA and subsequently constructed a prognostic risk model and a nomogram for TCGA Uterine Corpus Endometrial Carcinoma (UCEC) patients' OS evaluation. This study may open up new ideas for the treatment of EC.

EC patients' gene expression and clinical data collection
541 EC patients' RNA-seq data (the FPKM format) and corresponding clinical information including survival time and status, patient age, clinical stage, tumor grade, and histology, and lymph nodes status were downloaded from The Cancer Genome Atlas (TCGA) database (https:// cancergenome.nih.gov/) and the cBio Cancer Genomics Portal (cbioPortal, http://cbioportal.org), respectively [11].

Glycolysis-related genes and lncRNA selection
Glycolysis-related genes were retrieved from the gene set "HALLMARK_GLYCOLYSIS" in Molecular Signatures Database (MSigDB). The relationship was calculated based on the expression value between lncRNA and glycolysis-related genes. lncRNA with Spearman's correlation coefficient with an absolute value of >0.4 and p < 0.001 were set for further analysis.

Identification of the prognosis associated glycolysis-related lncRNA in TCGA-UCEC patients
Univariate Cox regression analysis was performed to identify prognostic associated glycolysis-related lncRNA. Hazard Ration (HR) < 1 means better overall survival outcomes (OS) whereas HR > 1 presents worse OS. Genes with P < 0.05 were considered as independent prognostic associated glycolysis-related lncRNA and used to construct the lncRNA risk score. Besides, the expression level of prognostic associated glycolysis-related lncRNA from each patient and between cancerous and normal samples was displayed via "pheatmap" and "ggplot" R package, respectively.

Construction of lasso Cox regression model for key lncRNA related to the prognosis of EC
A total of 541 TCGA-UCEC patients were randomly divided into the training and the testing cohort. The Least Absolute Shrinkage and Selection Operator (LASSO) analysis is preferred to select a small number of features from a large number of candidates with a certain lambda parameter. We selected hub prognostic associated glycolysis-related lncRNA by performing LASSO and multivariate Cox regression analysis via the "glmnet" R package. The risk score for each EC patient was computed as follows: Risk score = � (Ei × Wi) =0 where N is the number of prognostic lncRNA, E i is the expression value of lncRNA i , and W i is the multivariate coefficient for lncRNA i . Patient survival status, death time, and lncRNA expression condition were unfolded via the "pheatmap" and "survival" R packages. In addition, Kaplan-Meier curve analysis, the time-dependent receiver operating characteristic (ROC) curve as well as the area under curve (AUC) analyses were used to evaluate the sensitivity and specificity of the lncRNA risk score for survival prediction.

Validation of the predictive efficacy of the prognostic lncRNA risk signature
The predictive efficacy of the risk signature was measured in the testing and the entire cohort. The risk score of patients in each cohort was calculated and ranked. The discrepancy of the different subgroups was then displayed regarding patients' survival status and survival time, as well as lncRNA expression. KM curve and ROC curve analysis were performed as well. Besides, we conducted principal component analysis (PCA) regarding the risk signature and other gene profiles to measure its classifying efficacy.
Total RNA extraction of clinical tissues and quantitative real-time PCR (qRT-PCR) analysis.
Ten EC tissues and 10 normal endometrial tissues were obtained from patients at the Department of Gynecology, Nanjing Drum Tower Hospital. Normal tissues were obtained from individuals who underwent a hysterectomy due to endometrialirrelevant diseases. All samples were immediately snap-frozen in liquid nitrogen and stored at -80°C until further analysis. Each individual provided informed consent, and this study was approved by the Ethics Committee of Nanjing Drum Tower Hospital. Total RNA of tissues was extracted using TRIzol reagent (Vazyme, Nanjing, China) and TaqMan Reverse Transcription kit (Applied Biosystems) with random hexamer primers was used to reverse-transcribe cDNAs corresponding to the mRNAs of interest. 2×SYBR Green qPCR Master Mix (Selleck, Shanghai, China) was used for qRT-PCR and the housekeeping gene GAPDH was used for normalization of the data before calculation using the ΔΔCt method. The primers used are listed in Table S1.

Evaluation of the clinical characteristics of lncRNA risk signature
TCGA-UCEC patients who lacked any detailed clinical index including patient age, clinical stage, tumor grade and histology, and lymph nodes status were removed, and the clinical and gene expression data of the remaining low-and high-risk subgroup patients were compared. Uni-and multivariate Cox regression analysis and ROC curve analysis regarding clinical indexes and risk scores were then performed to evaluate the independency and the predictive efficacy of the risk model. Besides, the clinical characteristics of hub lncRNAs of the risk signature were also measured.

Comprehensive clinical nomogram building
In the light of patients' risk scores and clinical features including age, grade, weight, histology, stage as well as lymph node status, we built a comprehensive prognostic nomogram to estimate EC patients' survival probability based on the TCGA entire set via the "rms" R package.

Identification of a list of prognostic associated glycolysis-related lncRNA
The detailed flow chart for the prognostic predictive model construction in this study was shown in Figure 1. From the gene set "HALLMARK_GLYCOLYSIS" in MSigDB, we extracted 200 genes involved in glycolysis and gluconeogenesis. Then, according to the correlation efficiency and probability cut-off value, a total of 522 lncRNAs were considered as glycolysis-related lncRNA. Univariate Cox regression analysis further identified 36 lncRNA significantly correlated to EC patients' OS ( Figure 2B and Table 1). The expression profile of 36 prognostic associated glycolysis-related lncRNA was presented in the heatmap and box plot ( Figure 2C-D).
Then, each patients' risk score was calculated and ranked. Patients were then divided into high-risk (n=136) and low-risk (n=136) subgroups based on the mean risk score. In addition, each individual's risk score distribution and survival status were also ranked ( Figure 5A-B). Clearly, patients in the high-risk subgroup were accompanied by more death events and Kaplan-Meier (KM) curve analysis confirmed this result since a significant discrepant OS between both subgroups was observed. The high-risk group showed worse outcomes in comparison with the low-risk subgroup (P =1.49e-05) ( Figure 5D). The area under the ROC curve (AUC) of the risk signature in the training cohort was also calculated and the result was 0.775 ( Figure 5E). Interestingly, we also observed significant overexpression of AL121906.2, BOLA3-AS1, LINC01833, AC016405.3, and downregulation of RAB11B-AS1 in the high-risk subgroup ( Figure 5C).

Validation of the efficacy of the 5 lncRNA prognostic model
The five-lncRNA prognostic model was then brought into the testing and the entire cohort and patients' risk score was calculated based on the formula. Similarly, according to the cut-off value (training cohort's median risk score), each patient was then ranked and categorized into high-risk (n=158, n=294) and low-risk (n=111, n=247) subgroups in the testing cohort and entire cohort, respectively. The efficacy of the model was perfectly validated in the testing and entire cohort since the survival condition and hub lncRNAs expression level in both subgroups showed significant divergence ( Figure 6A-C, Figure  7A-C). Survival status distribution and KM analysis showed higher 5-year survival rates in low-risk group patients compared with the high-risk group (P =3.954e-02 and 7.928e-06, respectively) ( Figure 6D and 7D). ROC curve analysis showed the AUC of the risk signature was 0.78 and 0.767, respectively ( Figure 6E and 7E). PCA analysis displayed a much better-classifying capability of risk signature ( Figure  8D) in comparison with all genes ( Figure 8A), glycolysis-related genes ( Figure  8B), and glycolysis-relate-gene associated lncRNA ( Figure 8C).

Validation of the expression levels of the 5 lncRNA in clinical samples
The expression signatures of the 5 lncRNA were then investigated in 10 cancerous and normal endometrial clinical specimens. The results showed that AL121906.2, BOLA3-AS1, LINC01833, and AC016405.3 gene levels were upregulated in cancerous tissues, while RAB11B-AS1 was downregulated, which was consistent with the above findings ( Figure 9).

The clinical independence and correlation estimation of the risk signature
After removing 110 TCGA-UCEC patients who lacked a detailed clinical index, we retained 431 patients' gene expression signature and clinical information (Table S2). Then, we integrated the risk model with several clinical factors including weight, stage, histology, and lymph node status, subsequently performing uni-and multivariate analysis to assess the independence of the risk model. Both the uni-and multivariate analysis results presented the model serves as an independent prognostic indicator (P<0.001 and =0.003, respectively) ( Figure 10A-B). The AUC value of the prognostic model was 0.751, significantly more precise than clinical index including age (0.535) and weight (0.633), stage (0.710), grade (0.656) and histology (0.522), as well as lymph node status (0.697) ( Figure 10C). The clinical features and five lncRNA expression profiles of both risk subgroups' EC patients were combined and displayed in the heatmap ( Figure 10D). The distribution of the clinical features was in high concordance with the risk signature. High-risk subgroup patients were more prone to live with older age, advanced stage, poor differentiation, serous tumor, and a larger amount of metastatic lymph nodes ( Figure 10D-E). The correlation between each lncRNA from the prognostic model and the patients' clinical features were also measured. All five lncRNAs were shown to be significantly associated with patients' clinical stage, tumor grade, and lymph node numbers ( Figure  11A-C). Each lncRNA's survival curve was also drawn and presented in Figure 12.

Comprehensive nomogram building and evaluation
According to the comprehensive landscape of the integrated patients' risk scores and clinical factors, we built a nomogram predicting EC patients' 5-year survival probability. Seven prognostic parameters, including the lncRNA risk signature and age, grade, weight, histology, stage as well as positive lymph node numbers, were fitted into the nomogram ( Figure  13A). Calibration plots demonstrated a high degree of consistency between the actual observation and nomogram forecast in terms of the 3-and 5-year survival rates ( Figure 13B-C).

Discussion
lncRNA has been reported to play a crucial role in various cancer development and progression [12].
Recently, many studies have focused on the value of lncRNA as minimally invasive biomarkers for diagnosis, prognosis, or monitoring curative effects [13]. In addition, the lncRNA-based prognostic model in predicting cancer patients' outcomes has also been widely performed, including hepatocellular carcinoma(HCC), neuroblastoma, and clear cell renal cell carcinoma(ccRCC) [14][15][16]  In this study, we analyzed the gene profile of TCGA UCEC patients and identified 36 glycolysis-related lncRNA associated with EC patients' prognosis. Through LASSO and multivariate cox regression analysis, we further screened out 4 onco-lncRNA and 1 antionco-lncRNA that significantly associated with the survival and other clinical characteristics of EC patients. Among the 5 metabolic lncRNA, RAB11B-AS1 and AC016405.3 have already been reported participating in tumor development and progression [17][18][19][20]. Following our findings that lncRNA RAB11B-AS1 was negatively correlated to the malignant phenotypes in EC, Chen et al. reported the expression of lncRNA RAB11B-AS1 was significantly down-regulated in osteosarcoma. RAB11B-AS1 suppressed the expression of RAB11B, induced the inhibiting effect in cancer cell proliferation [17]. However, Liu et al. found that overexpression of RAB11B-AS1 was significantly associated with a poorer overall survival rate in lung cancer [19]. Likewise, Niu et al. found that hypoxia-induced lncRNA RAB11B-AS1 overexpression could significantly promote the angiogenesis and distant metastasis of breast cancer [18]. As for AC016405.3. Ren et al reported that it was downregulated in glioblastoma tissue. AC016405.3 modulated TET2 expression by sponging of miR-19a-5p, suppressing cell proliferation, and metastasis in glioblastoma [20]. Speak of the heterogeneity of malignancies, more researches are needed to explore the definite mechanisms of these lncRNA in endometrial cancer despite the discrepant results in other cancer types. Note that there has been a rare report on BOLA3-AS1, LINC01833, and AL121906.2 in EC to date. BOLA3-AS1 is the divergent transcript of BOLA3, firstly identified by Fagerberg and his colleagues [21]. It was located in the chr2:74147981-74152389, with 4409 bp total size. LINC01833, also names RP11-89, was identified aberrantly expressed in lung adenocarcinoma and closely related to the Wnt pathway [22]. Therefore, exploring their roles in tumorigenesis may contribute to demonstrating their oncogenic or suppressor function in EC patients.
Note that, the identified 5 glycolysis-related lncRNA successfully divided each of the three independent cohorts into the high-risk and low-risk subgroups with significantly different survival outcomes. Besides, the prognostic role of the five-lncRNA signature is also independent with other well-established clinical risk factors. Last but not least, the comprehensive nomogram that combined the clinical features with the lncRNA risk signature exhibited a precise prognosis calculation model and robust predictive efficacy. As no comprehensive analysis to explore lncRNA profiling has been performed in EC so far, our results strongly highlight the use of this five-lncRNA model as a clinical biomarker for risk stratification and guidance for therapy.
Our research also has certain insufficiencies. First of all, the 5 glycolysis-related lncRNA signature was only constructed and validated in the TCGA UCEC dataset, the robustness of this risk signature and the nomogram upon prognostic prediction need to be further verified in large prospective clinical trials. Secondly, more evidence is needed for demonstrating the deep relationship between EC prognosis and the five lncRNA signatures since rare experimental data are available on these lncRNA. More basic researches should be performed to investigate the potential biological mechanism in EC. Despite the above limitations, our findings still presented a consistent and significant correlation between the risk signature and TCGA UCEC prognosis in both datasets, providing a high level of confidence regarding this signature and nomogram.

Conclusion
In conclusion, we identified 36 prognostic glycolysis-related lncRNA and constructed a 5-lncRNA risk signature in the TCGA EC cohort. The signature was validated to predict the outcome of TCGA EC patients. Combined with the risk model with other clinical features, the comprehensive nomogram effectively predicted the 5-year survival status of EC patients. These results might offer a new perspective for EC research and individual treatment in clinical practice.

Author Contributions
Yuan Jiang, Rong Li and Jie Chen designed the project. Xianghong Zhu and Jingxian Ling contributed to data analysis and prepared the main manuscript. Pinping Jiang, Huaijun Zhou and Xiaoqiu Tang revised and submitted the manuscript. All authors reviewed the manuscript.

Data availability statement
The expression data were retrieved from the TCGA database and the clinical information was downloaded from the cBioPortal website. Please contact the author for data and materials requests.