A novel gene signature combination improves the prediction of overall survival in urinary bladder cancer

Objectives: Bladder carcinoma is a clinical heterogeneous disease, which is with significant variability of the prognosis and high risk of death. This revealed prominently the need to identify high-efficiency cancer characteristics to predict clinical prognosis. Methods: Gene expression profiles of 93 bladder tumor patients from Gene Expression Omnibus data sets was performed in this study, along with 408 bladder tumor patients retrieved from The Cancer Genome Atlas database. The relationship of gene signature and overall survival was analyzed in the training cohort (n = 46). The validation for that was performed in an internal validation cohort (n = 47) and an external validation cohort (n = 408). Results: Four genes (TMPRSS11E, SCEL, KRT78, TMEM185A) were identified by univariable and multivariable Cox regression analysis. According to a risk score on the bases on the four-gene signature, we grouped these patients in high-risk group and low-risk group with significantly different overall survival in the training series and successfully validated it in both the internal and external validation cohorts. Subsequent studies demonstrated that the four-gene expression risk score was independent of radical cystectomy stage, chemotherapy and lymph node status. Higher rates of FAT4 mutation and MACF1 mutation in bladder tumors with high risk score were found compared with tumors with low risk score. Gene set enrichment analysis revealed high-risk score was associated with some tumor progression and recurrence associated pathways. Conclusions: This four-gene risk score might have potential clinical implications in the selection of high-risk urinary bladder cancer patients for aggressive therapy. The selected four genes might become potential therapeutic targets and diagnostic markers for urinary bladder cancer.


Introduction
As the seventh most widely diagnosed carcinoma in the male population around the world [1], urinary bladder cancer has become one of the main death causes of Urologic cancer. And in 2017, approximately 81,190 new cancer cases and 17,240 cancer deaths related to urinary bladder cancer are estimated to occur in United States [2]. Muscle-invasive urinary bladder cancer (MIBC) makes up for about 25% of initially diagnosed bladder carcinoma cases, whereas up to 10% to 15% of patients with non-muscle-invasive urinary bladder cancer (NMIBC) will progress to MIBC [3,4]. More importantly, approximately 5% to 15% of urinary bladder cancer patients are together with metastatic disease at the time of initial diagnosis [5]. Accurate prediction of tumor progression and survival is important to determine the appropriate diagnosis and therapy decision. Thus, there is a clear need for predictive and prognostic markers that can identify tumor characteristics and predict clinical behavior.
Recently, gene signature has been used widely for risk stratification of patients with cancer [6]. Many urinary bladder cancer gene signatures are reported to classify urinary bladder cancer patients into groups Ivyspring International Publisher with distinct clinical outcomes [7][8][9]. However, most current gene signatures for urinary bladder cancer relate to disease diagnosis only, with limited information about the prognostic value estimation, which will limit the clinical application of these signatures.
This current study is based on publicly available data sets, resulting in developing special genetic biomarkers associated with urinary bladder cancer survival closely, and evaluating the predictive value of this gene expression signature among urinary bladder cancer patients.

Gene profiles and microarray data
Raw CEL data and corresponding clinical data of microarray data from GSE31684 data sets [10], which used the Human Genome U133 Plus 2.0 chips platform of Affymetrix, were retrieved from Gene Expression Omnibus (GEO) database and then background adjusted by Robust Multichip Average [11]. Gene profiles and clinical data for bladder tumors were downloaded from The Cancer Genome Atlas (TCGA) data base (March 2018). Patient selection was restricted to the urinary bladder cancer patients with gene expression data and essential clinical data, including clinical outcome and staging information. Patients without clinical survival information were removed from the study. In addition, patients with an overall survival (OS) of less than 1 month were also eliminated owing to possible unrelated causes of death. Finally, 93 patients from GSE31684 series and 408 patients from TCGA database were included in our study. The urinary bladder cancer samples of GSE31684 series were stochastic grouped into a training cohort (n = 46) or an internal validation cohort (n = 47). Additionally, urinary bladder cancer samples in TCGA database were analyzed as an external validation cohort. Mutation annotation format (MAF) of the TCGA cohort was also analyzed by the package of "maftools" to summarize, analyze, annotate and visualize MAF files in an efficient manner [12]. The demographic, clinical and pathological information of bladder cancer patients from the GSE31684 and TCGA cohorts were shown in Table 1. The work has been reported in line with the REMARK criteria.

Construction of risk score
Univariable Cox regression analysis together with a permutation test [13] was used to explore the relationship of gene expression and OS. Genes would be considered with strong correlation with survival if they had a permutation p value less than 0.001 [14], and then analyzed in the training series by multivariable Cox regression analysis. Genes were finally selected if their p values were less than 0.05 in the multivariable Cox analysis. Thus, we could establish the risk score formula by weighting multivariable regression coefficients of each selected gene. Patients were split into high risk group and low risk group using the median risk score of each series as a cutoff point. Diagram of the construction of risk score was shown in Fig. 1. We performed a data stratification analysis and multivariable Cox regression analysis to verify the prediction power of risk score in bladder carcinoma patients. Receiver operating characteristic (ROC) curves and area under the curve (AUC) were calculated to contrast the specificity and sensitivity of OS status predictions at follow-up on the bases of genetic risk score, radical cystectomy (RC) stages and lymph node status.

Gene Set Enrichment analysis
Gene Set Enrichment Analysis (GSEA) was set up in TCGA series using a molecular signatures database (MSigDB) C2 CP: Canonical pathway gene set collection [15]. The GSEA, which was visualized in Enrichment Map software and Cytoscape [16], was applied to determine if the members of a given gene set were strikingly associated with our risk score. We carried out random sample permutations of 1000. The significance threshold was set at false discovery rate (FDR) < 0.01.

Statistical analysis
Kaplan-Meier estimate was used for comparing survival differences between the low-risk group and the high-risk group. Hazard ratio (HR) with a 95% confidence interval (CI) was used in Cox regression analysis. And a P value less than 0.05 was considered significant. All the data were analyzed by R program 3.3.2 (www.rproject.org) and SPSS 13.0 (SPSS Inc., Chicago, IL, USA).

Identification of prognostic genes and risk score formula in the training cohort
Firstly, we discovered a series of twenty-one genes with a parametric P value less than 0.001 by univariable Cox regression analysis. These twenty-one genes were further analyzed by applying multivariable Cox regression analysis in the training cohort. With this method, only four genes with P value less than 0.05 were selected as the predictors ( Table 2). Next, the risk score formula was established on the bases of the 4-gene expressions for OS prediction, as follows: risk score = (1.414*expression level of TMPRSS11E) + (2.471*expression level of SCEL) + (5.305*expression level of KRT78) + (-2.988* expression level of TMEM185A).

The relationship between four-gene signature risk score and OS in the training series
With the risk score formula above, the patients in the training cohort (n = 46) were grouped into a high-risk group (n = 23) or a low-risk group (n = 23) by the median risk score. And patients with high-risk score had remarkably shorter OS than those in the low-risk group (P < 0.0001, Fig. 2a).

Validation of this four-gene signature risk score for OS prediction in the internal and external validation series.
In order to confirm our findings, we repeated the survival analysis in the internal validation cohort (n = 47). Similarly, patients with a high risk score had a notable shorter OS than that in the low-risk group (P = 0.0204, Fig. 2b). We further validated our four-gene signature in the external TCGA validation cohort (n = 408). Specifically, patients could also be segregated into a low-risk group and a high-risk group by the median risk score of TCGA cohort (P = 0.0232, Fig.  2d).

The predictive accuracy of the four-gene signature risk score compared with other clinical factors
The Cox regression analysis revealed that our four-gene risk score was strikingly associated with survival as a continuous variable in the entire GSE31684 cohort and TCGA cohort (Table 3). In the entire GSE31684 dataset, only the risk score, muscle invasion and lymph node status were remarkably associated with OS. Next, we carried out multivariable Cox regression proportional hazards regression analysis to explore whether our four-gene risk score could act as an independent survival predictor. Similar results of Cox regression analysis were also observed in the TCGA cohort. These results demonstrated that our risk score was still significantly associated with the OS when adjusted by age at diagnosis, muscle invasion, lymph node status and chemotherapy, which suggests that the four-gene signature risk score could serve as an independent survival predictor for urinary bladder cancer.

Assessment of the risk prediction model
We next to perform ROC analysis to evaluate the specificity and sensitivity of the OS status prediction at ten-year follow-up by the four-gene signature risk score, RC stage and lymph node status in the entire GSE31684 data set patients. Notably, we found a better AUC of four-gene signature score compared with RC stages, even though without significant difference (0.761 versus 0.618, P = 0.098, Fig. 3). However, our risk score was strikingly superior to lymph node status (0.761 versus 0.542, P = 0.017).

The prognostic risk score is independent of RC stages and postoperative chemotherapy
Additionally, we performed data stratification analysis to find whether the four-gene signature was independent of RC stage and chemotherapy. Our results revealed that this risk score also could divide urinary bladder cancer samples into those with longer survival and those with shorter survival in each RC stage subgroup (Fig. 4). Similarly, among those patients with or without chemotherapy, the four-gene signature risk score could also distinguish between patients with the significantly different OS (Fig. 5).

Discussion
Tumor stages and tumor grades are important predictors of tumor prognosis. Nevertheless, histopathological classifications tend to be limited by observer variability [17,18]. Recently urinary bladder cancer gene expression signatures have been generally applied to predict cancer characteristics and outcomes. For example, Smith SC [8] found a twenty-gene signature with statistically significant correlation with disease development among patients with urinary bladder cancer. Heijden [9] developed a five-gene expression signature to predict progression in T1G3 urinary bladder cancer. However, there are few relevant gene signatures reported to directly predict OS for patients with urinary bladder cancer. Here, we found a four-gene signature risk score remarkably associated with the OS of urinary bladder cancer, which was independent of RC stages and postoperative chemotherapy.
Our study has some novel approaches and findings. Firstly, compared to previous studies related to disease diagnosis only, our study focused on direct overall survival prediction of urinary bladder cancer patients, which were poorly reported before.
Secondly, Cox regression analysis together with a permutation test (P value less than 0.001) and stratification analysis were performed to explore the relationship of gene expression and overall survival of bladder cancer. And thirdly, we firstly identified a set of 4 genes (TMPRSS11E, SCEL, KRT78, TMEM185A) that were significantly associated with poor overall survival of bladder cancer patients, some of which have not been investigated in urinary bladder cancer before. Moreover, we also revealed higher mutation rates of FAT4 and MACF1 in bladder tumors with higher risk score.
These four genes might become the potential therapeutic targets. They are worthy of further investigation to understand their role in the progression of bladder tumor. For example, the expression of TMEM185A (transmembrane protein 185A) was tended to be down-regulated clones among patients with stage III serous ovarian carcinoma [19]. TMPRSS11E (transmembrane protease, serine 11E) could suppress esophageal squamous cell carcinoma development by sensitizing cells to apoptosis under an apoptotic stimulus through downregulating the EGFR/AKT signaling pathway [20], but recently was found significantly upregulated in urinary bladder cancer patients [21], which was consistent with our study outcomes.
Higher rates of FAT4 mutation and MACF1 mutation in bladder tumors with high risk score were found compared with tumors with low risk score. GSEA analysis revealed that high-risk score tended to be accompanied by a number of up-regulated networks including cancer progression and recurrence associated pathways. For instance, TAP63 pathway and IL1 pathway were implicated in cancer progression and recurrence [22,23]. Since tumor progression and recurrence are important risk factors for OS, we next to explore the difference of risk scores of samples with and without progression/recurrence by the TCGA cohort. Notably, patient with recurrence/progression tended to get a risk score higher than those with disease free.
By applying the four-gene risk scores to GSE31684 training cohort, a distinct separation was found in survival curves between patients with high-risk or low-risk signatures. The urinary bladder cancer patients with high-risk gene signatures tended to get shortened overall survival, while those with a low-risk signature tended to possess prolonged survival. The usefulness of the four-gene risk score could be validated in both the internal and external validation cohorts, indicating good reproducibility of our risk score for urinary bladder cancer patients. Subsequent Cox proportional hazards regression analysis demonstrated that our predictive risk score was independent of some other importance prognostic clinical factors, including muscle invasive and lymph node status. Muscle-invasive disease is very important because the treatment and pathogenesis differ from MIBC to NMIBC [1,24]. As a result, it is of great importance to identify whether the prognostic value of this risk score is connected with the strong predictive clinical factor. Using multivariable Cox regression analysis, we identified that the prognostic value of our risk score was independent in MIBC patients.
Subsequently, stratification analysis suggested that the predictive value of the four-gene risk score was independent of RC stage and postoperative chemotherapy. An obvious OS benefit was found in a large retrospective cohort analysis in 3974 patients, with an HR of 0.75 for the high-risk subgroup with adjuvant chemotherapy [25]. In this research, we observed that patients with urinary bladder cancer could be grouped into high-risk groups or low-risk groups by the four-gene risk score despite of chemotherapy stratum. Similarly, among those patients in each RC stage subgroup, the four-gene signature risk score could also distinguish between patients with the significantly different OS. All of this above strongly demonstrated that the four-gene signature risk score might serve as a remarkably prognostic factor for urinary bladder cancer. The TNM staging system, which includes primary tumor stage, lymph node and distant metastasis, is the most accepted to be used to evaluate and predict the risk of tumor development and progression in clinical practice [26]. In this study, the survival predictive value of the four-gene risk score was stronger than that of the RC stage in the ROC analysis(AUC: 0.761 versus 0.618), although without statistical significance (P = 0.098), but was significantly stronger than that of lymph node status (AUC: 0.761 versus 0.542, P = 0.017). These results demonstrated the potential prognostic power of our gene signature risk score in clinical.
Our research has some limitations. Firstly, we have not verified this risk score in a clinical trial. Secondly, the median risk score was used as a cutoff point for classifying as previous reports [14,27], but a prior cutoff point should be found in further studies to more scientifically split patients in high-risk group or low-risk group. Finally, this bioinformatics analysis of the four-gene signature was not carried out in this study, and the biological roles of several genes in this signature were not clear, which should be investigated in further fundamental researches. Fig. 6. Alteration landscape for 204 bladder tumors with high risk score and 204 low-risk bladder tumors in TCGA cohort. Higher rates of FAT4 mutation and MACF1 mutation in bladder tumors with high risk score were found compared with tumors with low risk score.

Conclusion
In summary, we had identified a four-gene signature that was useful in overall survival prediction in urinary bladder cancer patients. The selected four genes might become potential therapeutic targets and diagnostic markers for urinary bladder cancer. Future studies will concentrate on functional approaches of the selected four genes and validation of our risk score in clinical trials.