Development and validation of a robust multigene signature as an aid to predict early relapse in stage I-III clear cell and papillary renal cell cancer

Background and objectives: Multigene signatures can serve as prognostic indicators in many types of cancer, but their association with early relapse in patients with stage I-III clear cell and papillary renal cell cancer (RCC) is unknown. We aimed to establish an mRNA signature to improve the prediction of early relapse in patients with stage I-III clear cell and papillary RCC. Methods: Data from 610 patients with stage I-III RCC in The Cancer Genome Atlas (TCGA) and 270 patients from Fudan University Shanghai Cancer Center (FUSCC) were extracted. Propensity score matching, the linear models for microarray data (LIMMA) VOOM method, and least absolute shrinkage and selection operator (LASSO) Cox regression were applied in turn to select the multi-mRNA signature. Survival differences were assessed by the Kaplan-Meier method and compared using the log-rank test. Multivariable Cox regression and time-dependent receiver operating characteristic (ROC) curves were used to evaluate the association of the mRNA signature with relapse-free survival (RFS). Results: Seventeen mRNAs were identified to constitute the early-relapse signature. Among patients with stage I-III RCC, those with a high risk score calculated from the 17-mRNA signature had shorter RFS than those with a low risk score in both the TCGA discovery and internal validation sets and the FUSCC discovery and internal validation sets (all p < 0.05). In multivariable Cox regression analysis, the 17-mRNA signature remained an independent prognostic factor in the TCGA discovery (HR 2.43, 95% CI 1.98-2.96) and internal validation sets (HR 1.66, 95% CI 1.19-2.30), and in the FUSCC discovery (HR 1.28, 95% CI 1.13-1.43) and internal validation sets (HR 1.65, 95% CI 1.11-2.48). Additionally, the 17-mRNA signature estimated RFS more accurately than clinical indicators.
Conclusion: The 17-mRNA signature can classify stage I-III RCC patients into groups at low or high risk of early relapse, and may help guide interventions to optimize survival outcomes.


Introduction
Renal cell cancer (RCC) is one of the most common carcinomas worldwide, with approximately 403,262 new cases and 175,098 deaths expected in 2018 [1]. The overall prognosis of RCC can be improved by curative resection, which is the benchmark treatment for RCC. However, approximately 20-30% of patients treated with adequate surgical excision subsequently experience recurrence or metastasis during follow-up [2,3]. The risk of RCC relapse is time-dependent: it is greatest in the first 5 years after surgery, and only 10% of recurrences occur more than 5 years after nephrectomy [4][5][6]. Early relapse in RCC is associated with more symptoms at presentation, larger tumor size, aggressive histology and advanced pathological stage [7,8], and patients who develop early relapse consistently have poorer overall survival than those with late recurrence more than 5 years after nephrectomy [8].
Consequently, better predictive factors are urgently needed to identify patients who will develop early postoperative relapse.
Current clinical tools for stratifying patients with RCC are limited to a set of clinical and pathologic variables (such as the TNM staging system), which cannot reflect the biological heterogeneity of cancer [9]. For this reason, prognosis varies significantly even among RCC patients with comparable clinicopathological characteristics and the same TNM stage. Although researchers have extensively explored potential indicators and biomarkers for predicting early relapse in RCC patients [10][11][12], no gene-based prognostic classifier for predicting early relapse of RCC has been established. Studies in clear cell RCC have demonstrated that gene signatures better reflect tumor heterogeneity and thereby predict prognosis more accurately [13][14][15]. However, these studies were limited to overall survival (OS)-related genes in clear cell RCC, and little gene profiling has been applied to identify an early relapse-associated multigene signature in both clear cell and papillary RCC. Because OS reflects all-cause mortality rather than disease relapse, relapse-specific prognostic discrimination is critically needed, given the increasing awareness that some patients may be managed with active surveillance, while others at high risk of early relapse might benefit from adjuvant therapy after surgery. Therefore, a novel gene signature that identifies early relapse in clear cell and papillary RCC patients may be of practical predictive value.
In this study, we used previously published gene expression data from The Cancer Genome Atlas (TCGA) project together with mRNA profiling of large cohorts of RCC patients. Using sample splitting and Cox regression analysis, we identified a prognostic 17-mRNA signature in the discovery set and validated it in the internal validation series and the external cohort. The signature provided prognostic information beyond standard clinical parameters, offering a new approach to risk stratification. This 17-mRNA signature could help identify the subset of stage I-III clear cell and papillary RCC patients at high risk of early relapse, who may warrant more intensive postoperative treatment and surveillance.

Patient cohorts
For the discovery set and internal validation set, a total of 610 stage I-III RCC patients with available RNA sequencing data and clinical annotation were obtained from the TCGA database. For the external validation set, pathologically diagnosed tissue samples stored in RNAlater Stabilization Solution were obtained from 270 patients with stage I-III RCC at Fudan University Shanghai Cancer Center (FUSCC). The clinical characteristics of patients in the FUSCC dataset are summarized in Table 1. This study was approved by the Ethical Committee of FUSCC, and written informed consent was obtained from all patients.

Developing early relapse associated gene signature
Early relapse was defined as locoregional recurrence or distant metastasis within 2 years after surgery. Samples in the TCGA discovery set were selected and divided into an early relapse group and a long-term survival group (no relapse after a minimum of 5 years of follow-up). Propensity score (PS) matching was performed between the two groups to adjust for stage and histological type, which were the clinical factors most significantly associated with early relapse. After PS matching, 26 matched pairs of patients were selected to detect changes in the global gene expression profile between the early relapse and long-term survival groups (Table 2).

Next, using the linear models for microarray data (LIMMA) VOOM method to identify differentially expressed genes (DEGs), with thresholds of P < 0.05 and fold change ≥ 2.5, we found 91 genes differentially expressed between early relapse and long-term survival samples (Fig. 1A). Using a LASSO Cox regression model [16], we obtained the coefficient profiles of the 91 genes (Fig. 1B), and 17 mRNAs were selected to construct the 17-mRNA signature. Finally, we derived a formula for the early-relapse risk score based on the expression of each of the 17 mRNAs weighted by its regression coefficient in the discovery set:

$$\text{Risk score} = \sum_{i=1}^{17} \beta_i \times \text{Exp}_i$$

where $\text{Exp}_i$ is the expression level of the $i$-th mRNA and $\beta_i$ is its regression coefficient.

To assess the 17-mRNA signature with quantitative reverse transcription polymerase chain reaction (RT-PCR), we recalculated the regression coefficients of the 17 mRNAs by univariable Cox regression on the quantitative RT-PCR expression data from the FUSCC population. The primers for RT-PCR are summarized in Supplemental Table 1. In the two formulas, each mRNA has the same direction of risk prediction, suggesting that the classifier can be applied to both RNA-seq and RT-PCR data.
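The risk score above is a weighted sum of expression values, and the RNA-seq and RT-PCR formulas are compared by the sign of each coefficient. The sketch below illustrates both steps under stated assumptions. The gene names (`GENE_A`, `GENE_B`, `GENE_C`), the coefficient values and the helper names are hypothetical placeholders, not the published 17-mRNA coefficients.

```python
# Minimal sketch: weighted-sum risk score and coefficient-sign comparison.
# Gene names and coefficients are HYPOTHETICAL placeholders, not the published values.

def risk_score(expression, coefficients):
    """Return sum_i beta_i * x_i over the genes listed in `coefficients`."""
    missing = set(coefficients) - set(expression)
    if missing:
        raise KeyError(f"no expression value for: {sorted(missing)}")
    return sum(beta * expression[gene] for gene, beta in coefficients.items())


def discordant_genes(coef_a, coef_b):
    """Genes whose coefficients point in opposite risk directions in two formulas.

    A coefficient of exactly 0 is treated as non-positive.
    """
    return sorted(g for g in coef_a if (coef_a[g] > 0) != (coef_b[g] > 0))


# Hypothetical coefficients for an RNA-seq formula and an RT-PCR formula.
coef_rnaseq = {"GENE_A": 0.42, "GENE_B": -0.31, "GENE_C": 0.18}
coef_rtpcr = {"GENE_A": 0.25, "GENE_B": -0.12, "GENE_C": 0.09}

patient = {"GENE_A": 2.0, "GENE_B": 1.0, "GENE_C": 3.0}
print(round(risk_score(patient, coef_rnaseq), 2))  # 1.07
print(discordant_genes(coef_rnaseq, coef_rtpcr))   # []
```

An empty list from `discordant_genes` corresponds to the property stated in the text: every mRNA pushes the score in the same direction under both platforms, even though the magnitudes differ.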

Statistical analysis
Using the risk score formula, patients in each set were divided into high-risk and low-risk groups with the median risk score as the cutoff point. Differences between the two groups were compared using the χ2 test or Fisher's exact test for categorical variables and the t test for continuous variables. Survival differences between the low-risk and high-risk groups in each set were assessed by the Kaplan-Meier method and compared using the log-rank test. Multivariable Cox regression and stratified analyses were performed to test the independent prognostic role of the risk score in predicting RFS. Time-dependent receiver operating characteristic (ROC) analysis was used to evaluate the predictive accuracy of each clinical feature and of the multigene signature. All statistical analyses were performed using R (version 2.15.0, www.r-project.org). All statistical tests were two-sided, and P values < 0.05 were considered statistically significant.
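The median split, Kaplan-Meier estimate and log-rank comparison described above can be sketched in plain Python. This is an illustrative re-implementation of the standard textbook formulas, not the R code used in the study. The function names are hypothetical, `events` uses 1 for relapse and 0 for censoring, and `high` is a list of booleans marking the high-risk group.

```python
import math
import statistics


def median_split(scores):
    """Label each score 'high' if above the median, else 'low'."""
    cutoff = statistics.median(scores)
    return ["high" if s > cutoff else "low" for s in scores], cutoff


def kaplan_meier(times, events):
    """Product-limit estimate; returns [(t, S(t))] at each distinct event time."""
    data = sorted(zip(times, events))
    n_at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, removed = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # pool all subjects tied at time t
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            surv *= 1 - deaths / n_at_risk
            curve.append((t, surv))
        n_at_risk -= removed  # both events and censorings leave the risk set
    return curve


def logrank_test(times, events, high):
    """Two-group log-rank chi-square (1 df) and p-value; `high` marks group 1."""
    o_minus_e = var = 0.0
    for t in sorted({tt for tt, e in zip(times, events) if e}):
        at_risk = [h for tt, h in zip(times, high) if tt >= t]
        n, n1 = len(at_risk), sum(at_risk)
        d = sum(1 for tt, e in zip(times, events) if tt == t and e)
        d1 = sum(1 for tt, e, h in zip(times, events, high) if tt == t and e and h)
        o_minus_e += d1 - d * n1 / n  # observed minus expected events in group 1
        if n > 1:  # hypergeometric variance contribution at time t
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:
        raise ValueError("log-rank variance is zero")
    chi2 = o_minus_e ** 2 / var
    return chi2, math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
```

In practice the `high` flags come from `median_split`, for example `[label == "high" for label in labels]`, so that the groups are defined exactly as in the text before the curves are compared.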

Results
Of the 610 patients in the TCGA dataset, 428 and 182 were randomly assigned to the discovery set and internal validation set, respectively. In the FUSCC population, 180 and 90 patients were assigned to the training and validation sets, respectively.
In the TCGA discovery set, patients were divided into a low-risk group (n = 214) and a high-risk group (n = 214) using the median risk score (1.611) as the cutoff point. As shown in the left panel of Figure 2A, the distribution of risk scores and survival status indicated that patients with higher risk scores tended to relapse earlier than those with lower risk scores. In time-dependent ROC analysis, the AUCs of the 17-mRNA signature for RFS at 2, 5 and 7 years were 0.847, 0.862 and 0.905, respectively (Fig. 2A, right panel). Applying the same analyses, formula and cutoff point to the TCGA internal validation set and the entire TCGA set yielded consistent results (Fig. 2B-C).
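A time-dependent AUC at a horizon t (for example 2, 5 or 7 years) asks how often a patient who relapsed by t has a higher risk score than a patient still relapse-free after t. The text does not state which estimator was used, so the sketch below assumes the simplest cumulative/dynamic definition. It drops subjects censored at or before the horizon instead of weighting them, which is a simplification; estimators in R packages such as survivalROC handle censoring differently and can give different values. The function name and the toy data are hypothetical.

```python
def time_dependent_auc(times, events, scores, horizon):
    """Naive cumulative/dynamic AUC at `horizon` (no censoring weights).

    Cases: relapse observed at or before `horizon`.
    Controls: follow-up extends beyond `horizon` (relapse-free at the horizon).
    Subjects censored at or before `horizon` are excluded.
    """
    cases = [s for t, e, s in zip(times, events, scores) if e and t <= horizon]
    controls = [s for t, s in zip(times, scores) if t > horizon]
    if not cases or not controls:
        raise ValueError("need at least one case and one control at this horizon")
    concordant = 0.0
    for c in cases:
        for k in controls:
            if c > k:
                concordant += 1.0
            elif c == k:
                concordant += 0.5  # tied scores count as half-concordant
    return concordant / (len(cases) * len(controls))


# Hypothetical toy data: follow-up in years, 1 = relapse, 0 = censored.
times = [1, 2, 3, 6, 8]
events = [1, 1, 0, 0, 1]
scores = [0.9, 0.7, 0.8, 0.2, 0.4]
for h in (2.5, 5):
    print(h, time_dependent_auc(times, events, scores, h))
```

Because the control set changes with the horizon, the same scores can yield different AUCs at 2, 5 and 7 years, which is why the text reports one value per time point.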
Moreover, multivariable analyses showed that the 17-mRNA signature remained a strong, independent prognostic factor for RFS in the discovery set (Table 3). Stratified analyses further showed that the 17-mRNA classifier remained a clinically and statistically significant prognostic indicator for RFS in the subsets of patients with stage II (p < 0.001; Fig. 3A) and stage III disease (p < 0.001; Fig. 3A), with clear cell carcinoma (p = 0.001; Fig. 3B) and papillary carcinoma (p = 0.002; Fig. 3B), and with grade I-II (p < 0.001; Fig. 3C) and grade III-IV tumors (p < 0.001; Fig. 3C). These results indicate that the 17-mRNA signature can identify high-risk patients among those with favorable clinicopathological factors (e.g., early stage, clear cell histology and low grade) and low-risk patients among those with unfavorable factors (e.g., advanced stage, papillary histology and high grade), and could thus refine the prediction of early relapse and survival in clinical practice.

To further assess the robustness of the signature, we measured the 17 mRNAs by quantitative RT-PCR in the FUSCC population. Because RT-PCR quantification differs from RNA-seq, a new formula for the RT-PCR data was developed in the training set and validated in the internal validation set of the FUSCC population, using the same method as in the TCGA discovery set. With this formula, patients were stratified into high- or low-risk groups using the median risk score of 0.354 as the cutoff point. In both the training set and the internal validation set, patients with high risk scores tended to relapse earlier and had worse RFS than those with low risk scores (Fig. 5), and time-dependent ROC analysis further confirmed the prognostic accuracy of the 17-mRNA signature for RFS (Fig. 5).
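The Methods state that the RT-PCR coefficients were recalculated by univariable Cox regression. As a minimal sketch of what such a fit computes for one mRNA, the function below maximizes the Cox partial likelihood by Newton-Raphson. It assumes Breslow handling of tied event times, whereas R's `coxph` defaults to the Efron method, so estimates can differ slightly when events are tied. The function name and the toy data are hypothetical; this is not the authors' code.

```python
import math


def univariable_cox(times, events, x, tol=1e-9, max_iter=50):
    """Estimate beta in h(t|x) = h0(t) * exp(beta * x) (Breslow ties, Newton-Raphson)."""
    beta = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e})
    for _ in range(max_iter):
        score = info = 0.0
        for t in event_times:
            risk = [xi for ti, xi in zip(times, x) if ti >= t]  # risk set at t
            deaths = [xi for ti, ei, xi in zip(times, events, x) if ti == t and ei]
            w = [math.exp(beta * xi) for xi in risk]
            s0 = sum(w)
            s1 = sum(wi * xi for wi, xi in zip(w, risk))
            s2 = sum(wi * xi * xi for wi, xi in zip(w, risk))
            mean = s1 / s0  # risk-weighted mean covariate in the risk set
            d = len(deaths)
            score += sum(deaths) - d * mean          # first derivative of log PL
            info += d * (s2 / s0 - mean ** 2)        # negative second derivative
        if info <= 0:
            raise ValueError("non-positive information; coefficient not estimable")
        step = score / info
        beta += step
        if abs(step) < tol:
            return beta
    raise RuntimeError("Newton-Raphson did not converge")


# Hypothetical toy data: three relapses, binary expression indicator.
print(univariable_cox([1, 2, 3], [1, 1, 1], [1.0, 0.0, 1.0]))
```

For this toy data the score equation can be solved by hand, giving beta = -ln(2)/2 (about -0.347), which the iteration reproduces. Fitting each of the 17 mRNAs this way yields the per-gene weights that enter the RT-PCR risk score.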
In univariable and multivariable Cox regression analyses, the 17-mRNA signature remained an independent prognostic factor for RFS in the FUSCC cohort (Table 4). Similarly, the 17-mRNA classifier outperformed existing clinicopathological factors in predicting RFS in patients with clear cell RCC, in patients with papillary RCC, and in the entire cohort (Fig. 6).

Finally, gene set enrichment analysis (GSEA) was performed to identify the biological functions and signaling pathways associated with the 17-mRNA signature. The risk score was associated with dysregulation of several important cancer-related pathways, namely the p53 signaling pathway, cell cycle, citrate cycle (TCA cycle), fatty acid metabolism and PPAR signaling pathway. The biological functions of these 17 mRNAs in RCC should be investigated in further experimental studies.
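The core quantity in GSEA is the enrichment score (ES): walk down a gene list ranked by its association with the phenotype (here, the risk score), step up at genes in the pathway and down elsewhere, and record the maximum deviation from zero. The sketch below assumes the weighted running-sum form with weight p = 1. It computes only the ES, without the permutation-based significance testing or normalization that GSEA software performs, and the function name and toy ranking are hypothetical.

```python
def enrichment_score(ranked, gene_set, p=1.0):
    """Weighted running-sum enrichment score.

    ranked: list of (gene, correlation) sorted by correlation, highest first.
    Returns the signed maximum deviation of the running sum from zero.
    """
    in_set = [g in gene_set for g, _ in ranked]
    n_hits, n = sum(in_set), len(ranked)
    if n_hits == 0 or n_hits == n:
        raise ValueError("gene set must partially overlap the ranked list")
    norm = sum(abs(r) ** p for (_, r), hit in zip(ranked, in_set) if hit)
    if norm == 0:
        raise ValueError("all gene-set members have zero weight")
    miss_step = 1.0 / (n - n_hits)  # uniform penalty for genes outside the set
    running = best = 0.0
    for (_, r), hit in zip(ranked, in_set):
        running += abs(r) ** p / norm if hit else -miss_step
        if abs(running) > abs(best):
            best = running
    return best


# Hypothetical ranking of five genes by correlation with the risk score.
ranked = [("A", 3.0), ("B", 2.0), ("C", 1.0), ("D", -1.0), ("E", -2.0)]
print(enrichment_score(ranked, {"A", "C"}))  # 0.75, set concentrated at the top
```

A positive ES means the pathway's genes cluster among those most positively associated with the risk score; a negative ES means they cluster at the bottom of the ranking.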

Discussion
Postoperative relapse in patients with localized RCC occurs even after complete surgical resection and is closely associated with survival outcomes [17]. However, early and late postoperative relapse cannot be distinguished by the TNM staging system, which relies mainly on anatomical information rather than biological characteristics. The large variation in relapse and prognosis among localized RCC patients with the same clinicopathological features is attributed to the biological heterogeneity of cancer [10][11][12]. RCC patients with early relapse have significantly poorer OS than those with late relapse [18]. Novel prognostic biomarkers for detecting early postoperative relapse could compensate for the limitations of the TNM staging system and thereby help physicians formulate more effective therapeutic strategies at an earlier stage of treatment [18][19][20]. In this study, we developed and validated a novel 17-mRNA signature to improve the prediction of early relapse and relapse-free survival (RFS) after surgery in stage I-III RCC patients. This signature was independent of known clinical predictors, suggesting that it adds prognostic information beyond currently available tumor characteristics.
Previous studies have tried to identify biomarkers for detecting early relapse in RCC patients. In 2012, Slaby et al. [12] found that the expression levels of miR-145 and miR-126 were significantly associated with early relapse and survival in RCC patients. A 2017 study further suggested that CD8+PD-1+Tim-3+Lag-3+ tumor-infiltrating lymphocytes and ICOS+ tumor-infiltrating Treg cells may be significant factors for postoperative early relapse in localized RCC [10]. Moreover, a recurrence score based on 16 genes was found to be a more accurate, individualized predictor of clinical outcome in stage I-III clear cell RCC patients. However, no gene-expression classifier has been developed specifically for postoperative early relapse, and little is known about mRNA expression panels for predicting early relapse in stage I-III RCC patients from high-throughput expression profiling datasets.
Importantly, RFS in patients at high risk of early relapse, as calculated with the 17-mRNA signature, was significantly worse than in those at low risk in the TCGA discovery set. This finding was validated in both the internal validation series of the TCGA dataset and the independent FUSCC cohort, indicating good reproducibility of the 17-mRNA signature in RCC patients. Meanwhile, patients with the same TNM stage (stage II or stage III), pathological type (clear cell or papillary RCC) or grade (grade I-II or grade III-IV) could be stratified into different risk groups by the 17-mRNA classifier, which could enable more personalized treatment to improve clinical outcomes. These findings imply that the 17-mRNA signature could refine current risk stratification (e.g., TNM stage), and that patients at high risk of early relapse might benefit from more aggressive treatment [20].
Several other groups have focused only on classifiers related to recurrence and death in clear cell RCC. Brooks and colleagues [13] found that a classifier, ClearCode34, improved prognostic performance over baseline nomograms, with c-indices of 0.65-0.70. Another classifier, a 16-gene assay, was independently associated with cancer recurrence, with a c-index of 0.81 [14]. In contrast to these studies, the current study focused on early relapse in both clear cell and papillary RCC. Our 17-mRNA signature achieved an AUC of 0.825 in the TCGA cohort and 0.880 in the FUSCC cohort for predicting early relapse at 2 years after complete resection of RCC. Integrating the signature with clinicopathological factors yielded the highest prognostic accuracy in our study (AUC = 0.877 and 0.940 in the TCGA and FUSCC datasets, respectively). Therefore, the 17-mRNA signature, which could help identify RCC patients at high risk of early relapse and guide personalized management, is a credible candidate for clinical application.
Previous studies have demonstrated that the most aggressive RCCs are characterized by reduced angiogenic dependence [21,22], impaired immune and inflammatory responses [23,24], deregulated glycolysis and tricarboxylic acid cycle [25], and increased cell proliferation [26]. Similarly, the genes in our 17-mRNA signature are involved in biological pathways known to be important in RCC, namely the p53 signaling pathway, cell cycle, citrate cycle (TCA cycle), fatty acid metabolism and PPAR signaling pathway. Some known RCC biomarkers (e.g., von Hippel-Lindau (VHL), hypoxia-inducible factor, MET and PD-L1) were not significantly correlated with early relapse at the RNA level; this is consistent with the knowledge that DNA alterations are not always reflected in differential RNA expression and that tumor-related mutations are not necessarily associated with clinical outcome [27]. Thus, the involvement of the signature genes in RCC-relevant pathways offers a plausible explanation for the association of the 17-mRNA signature with early relapse in RCC patients.
The heterogeneity of cancer has recently been discussed as a potential challenge to the use of genomic prognostic and predictive markers. Previous sequencing studies have shown that the degree of tumor heterogeneity in RCC is substantial [28]. Genetic profiling of nine regions of the primary tumor and three metastatic sites in one patient indicated that 23% of the identified mutations were restricted to that patient and not prevalent in RCC tumors in general. Conversely, this result implies that approximately 77% of somatic mutations were shared with other RCCs. The consistent performance of the early relapse score calculated from the selected genes across large cohorts and relevant subgroups supports low intratumor variability in the 17 assessed genes. Like the ubiquitous somatic mutations in the study by Gerlinger et al. [28], the selected genes may represent early genetic changes in tumor development.
Limitations of our study warrant further discussion. First, our research was based partly on publicly available datasets, and additional independent samples from clinical trials are needed to confirm our findings prospectively. Second, several other important clinicopathological characteristics (e.g., SSIGN score) were not available in the current datasets, and further analyses stratified by these features are needed in future research. Third, the mechanism linking the 17-mRNA signature to early relapse in RCC is unclear, and further studies of these genes may provide clues leading to a better understanding of early relapse and progression in RCC patients. Finally, prospective, large-scale, multicentre studies are necessary to confirm our results before this 17-mRNA signature can be applied in the clinic.
To the best of our knowledge, this is the first study to develop a robust mRNA signature that effectively classifies stage I-III RCC patients into groups at low and high risk of postoperative early relapse. The 17-mRNA classifier, which can be combined with clinicopathological parameters, allows risk assessment of early relapse in RCC patients and may guide future clinical planning of treatment and surveillance.