Prognostic significance of age related genes in patients with lower grade glioma.

Objective: To analyze the prognostic effects of age in different tumor types and determine the prognostic significance of age related genes in patients with lower grade glioma (LGG). Methods: The relationships between age and tumor overall survival were determined by Kaplan-Meier survival analysis using The Cancer Genome Atlas (TCGA) dataset. The age related genes were identified using TCGA RNA-seq data. Univariate and multivariate cox regression were used to determine the prognostic significance of age related genes. The results derived from TCGA dataset were further validated using Gene Expression Omnibus (GEO) and Chinese Glioma Genome Atlas (CGGA) datasets. Results: Age at initial pathologic diagnosis was most associated with the overall survival of LGG patients than other types of tumor patients. Age related genes EMP3, IGFBP2, TIMP1 and SERPINE1 were highly expressed in old LGG patients. The hypo-methylations of EMP3 and SERPINE1 were contributing to the high expressions of EMP3 and SERPINE1 in old LGG patients. Also, EMP3, IGFBP2, TIMP1 and SERPINE1 were highly expressed in LGG tumor tissues, compared with normal brain tissues. Moreover, high expressions of IGFBP2, EMP3, TIMP1 and SERPINE1 were associated with the worse prognosis of LGG patients. Furthermore, we demonstrated that EMP3 and SERPINE1 were connected with each other and the combination of EMP3 and SERPINE1 had better prognostic effects in glioma patients. Conclusions: Age related genes IGFBP2, EMP3, TIMP1 and SERPINE1 have significant prognostic effects in LGG patients.


Introduction
Cancer is an age related disease. With the increasing of age, somatic mutations are accumulating, contributing to the high incidence of cancer in older adults [1][2][3][4]. Moreover, old tumor patients demonstrate different histological characteristics and prognostic outcomes as compared with young tumor patients [5][6][7][8][9][10][11][12]. Those observations highlight the prognostic importance of age of tumor patients at initial pathologic diagnosis. However, the prognostic effects of age across different tumor types and the age related genes in determining the clinical outcomes are unclear.
For the decade, The Cancer Genome Atlas (TCGA) program collects the clinical characteristics and molecular signatures of more than 11,000 human tumor patients across 33 different tumor types [13,14]. Molecular profiles, like gene expression portraits, DNA mutation signatures and DNA methylation Ivyspring International Publisher characteristics are quite associated with the clinical overall survival of tumor patients [15][16][17][18][19]. Similar dataset is established in the Chinese Glioma Genome Atlas (CGGA) to provide massive amounts of data for basic and clinical research of glioma [20][21][22]. With the available TCGA, CGGA and Gene Expression Omnibus (GEO) datasets, here, we systematically define the molecular characteristics of age related genes and reveal their prognostic effects in determining the clinical overall survival of tumor patients, particularly LGG patients.
Glioma is the most common type of brain tumor [23]. According to the World Health Organization classification system, LGG is grade II-III glioma, contrast to the high grade glioblastoma (GBM) (grade IV glioma) [24]. GBM and LGG demonstrate different clinical and molecular characteristics [25,26]. Unlike other tumor types, LGG is more likely developed in younger patients [27]. Indeed, nearly 90% LGG patients are developed under 60 years old, according to the TCGA clinical data. Furthermore, our results show that the initial pathologic diagnostic age is most significantly associated with the tumor incidence and clinical overall survival of LGG patients compared with other types of tumor patients.
Previous results suggested that EMP3 [28][29][30][31][32] and IGFBP2 [33][34][35] were critical biomarkers in glioma prognosis. However, the inner mechanisms of how EMP3 and IGFBP2 involving the glioma overall survival are unclear. In here, we show that both EMP3 and IGFBP2 are age related genes, and highly expressed in old LGG patients. Additionally, we identify two prognostic genes TIMP1 and SERPINE1. Similar to EMP3 and IGFBP2, TIMP1 and SERPINE1 are highly expressed in old LGG patients. And IGFBP2, EMP3, TIMP1 and SERPINE1 are all highly expressed in LGG tumor tissues compared with normal brain tissues. Moreover, higher expressions of IGFBP2, EMP3, TIMP1 and SERPINE1 are associated with worse prognosis of LGG patients. At last, we demonstrate the prognostic effects of combination of EMP3 and SERPINE1 genes in glioma patients.
Taken together, our results suggest that age related genes IGFBP2, EMP3, TIMP1 and SERPINE1 have significant prognostic effects in LGG patients and represent suitable biomarkers for prognostic or therapeutic strategies for LGG patients.

Data collection
The TCGA datasets were downloaded from the TCGA hub (tcga.xenahubs.net), where all TCGA clinical data and molecular data were available. In the TCGA clinicalMatrix dataset, solid primary tumor patients with age at initial pathologic diagnosis and overall survival time were selected for further studies. Tumor types with favorable clinical outcomes, like kidney chromophobe (KICH), thyroid cancer (THCA) were excluded, since most of tumor patients were survived during the following time.
The gene expression series matrix of normal and cancerous brain tissues was downloaded from the Gene Expression Omnibus (GEO) website (www.ncbi.nlm.nih.gov/geo), and included the GEO datasets GSE4920 [36], GSE16011 [37] and GSE50161 [38]. The gene expression and survival data of patients with glioma were downloaded from the GEO dataset GSE43378 [39].
The Chinese Glioma Genome Atlas (CGGA) datasets were downloaded from http://www .cgga.org.cn/index.jsp website, where all CGGA clinical data and gene expression data were available.

TCGA data processing
Gene expression profile between old and young tumor patients of each tumor type was analyzed using TCGA RNA-seq dataset. The different gene expression between old and young tumor patients was determined by Student's t test. The DNA methylation profile of age related genes in glioma patients was analyzed through TCGA Human Methylation450 microarray dataset.

GEO data processing
The GEO expression datasets were processed using R software. The matrix file of each dataset was annotated with corresponding platform. When multiple probes corresponded to the same gene symbol, the expression values were averaged using "plyr" package R software. The different gene expression between normal and glioma samples was determined using Student's t test.

Heatmap presentation
Heatmaps were created by R software "pheatmap" package. The "pheatmap" package and the basic usage were downloaded from bioconductor. The clustering scale was determined by "average" method.

Survival analysis
The overall survival curves were generated from prims5.0. Kaplan-Meier estimator was applied to determine the significance of different factors in determining tumor overall survival. P values were determined using Log-rank test.

The construction of transcriptional network
The transcriptional network of age related genes was created by Cytoscape GeneMANIA App.
Cytoscape is an open source software platform for constructing complex networks and could be download from Cytoscape website.

Univariate and multivariate cox regression
Univariate and multivariate cox regression were analyzed by R software "survival" package. The "survival" package and the basic usage were downloaded from bioconductor. Log-rank test was used to calculate the P values.

Spearman correlation
Spearman correlation was used to study the correlation between EMP3 and SERPINE1 gene expression in glioma patients through the "lm" method in R software.

Statistical analysis
The box plots were generated from prims5.0. Statistical analysis was performed using the Student's t test. P value less than 0.05 was chosen to be significantly different.

Older age is associated with the low overall survival of tumor patients
Age at initial pathologic diagnosis and tumor overall survival were extracted from the clinicalMatrixs which were launched by the TCGA network. Totally, 8,810 tumor patients from 20 tumor types were collected for further studies. The abbreviations of tumor types and the number of tumor patients in each tumor type were illustrated in Table 1. The abbreviation of each tumor type was further used. The detailed clinical parameters of 8,810 tumor patients, including gender, weight, height, pathologic stage were provide in supplementary table.
Collectively, the mean age of those tumor patients was 60.58 years old. Based on the age at initial pathologic diagnosis, tumor patients were divided into old (older than 60 years) and young group (equal or younger than 60 years). There were 4,169 tumor patients in old group and 4,641 tumor patients in young group (Table 1). Although, generally, cancer was an old related disease, we showed that some tumor types were preferentially developed in young adults. Particularly, more than three quarter of LGG and CESC patients were younger than 60 years ( Figure 1A), while, more than three quarter of BLCA and LUSC patients were older than 60 years ( Figure 1A). Next, we tested the associations between age and tumor overall survival. The Kaplan-Meier survival analysis showed that tumor patients in old group had lower overall survival compared with the tumor patients in young group (P<2e-16) ( Figure 1B). The median survival time of young tumor patients was 7.05 years, much longer than the 4.76 years of old tumor patients.

Age is most associated with the overall survival of LGG patients than other types of tumor patients
Next, we studied the association between age and tumor overall survival in each tumor type. The Kaplan-Meier survival analysis demonstrated that age at initial pathologic diagnosis was a critical prognostic factor in 8 out of 21 tumor types, including BLCA, BRCA, CESC, GBM, LGG, KIRC, OV and STAD ( Figure 1C). Particularly, age was most correlated with the overall survival of LGG patients (P=6e-15 and HR=19.8) ( Figure 1C). The median survival time of old LGG patients was 1.99 years, contrast to the 7.88 years in young LGG patients. Although not as significant as LGG, the overall survival of old BCAL, BRCA, CESC, GBM, KIRC, OV and STAD patients were also significantly lower than young BCAL, BRCA, CESC, GBM, KIRC, OV and STAD patients ( Figure 1C). However, age at initial pathologic diagnosis was not a prognostic factor in other tumor types, like COAD, ESCA, LIHC and LUAD ( Figure  1C).
Our results suggested that, in most of tumor types and across tumor types, age at initial pathologic diagnosis was a critical prognostic factor. Particularly, age was most associated with the overall survival of LGG patients than other types of tumor patients.

Identification of age related genes in TCGA dataset
Next, we tried to determine the inner mechanisms of how age influenced the overall survival of tumor patients. Gene expression profiles of old patients in each age related tumor type were analyzed. Compared with the young tumor patients, genes up or down regulated in old tumor patients were identified respectively (fold change >2). We found that 230 genes were differentially expressed in old LGG patients ( Figure 2). However, only 25 genes were differentially expressed in old GBM patients and 11 genes were differentially expressed in old STAD patients ( Figure 2). Compared with young BLCA, BRCA, CESC and OV patients, genes differentially expressed in old BLCA, BRCA, CESC and OV patients were also identified ( Figure 2). Consistent with the previous result that age was most associated with the overall survival of LGG patients than other types of tumor patients, we found that there were most age related genes in old LGG patients. So, in our further studies, we focused on LGG tumor type.

Validation of age related genes in glioma patients derived from GEO dataset
From the TCGA dataset, we found that old LGG patients were associated with low overall survival, and 230 genes were differentially expressed in old LGG patients. To further confirm those findings, we analyzed the functions of age related genes from GEO datasets.
GSE43378 included gene expression, age at initial pathologic diagnosis and overall survival of 50 glioma patients [39]. Similar with the results derived from the TCGA dataset, old glioma patients were associated with low tumor overall survival in GSE43378 dataset (P=0.0012) ( Figure 3A). Compared with the young glioma patients, genes up or down regulated in old glioma patients were also identified (fold change >2). Heatmap demonstrated 165 age related genes in glioma patients derived from GSE43378 dataset ( Figure 3B). Among them, 32 genes were also differentially expressed in LGG patients derived from TCGA dataset ( Figure 3C). We then focused on those 32 genes for our further studies.
First, the regulatory network of those 32 age related genes was constructed using Cytoscape ( Figure 3D). EMP3 was one of the hub genes in this network. Previous results suggested that EMP3 was a critical biomarker in glioma prognosis [28][29][30][31][32]. EMP was connected with TIMP1 and SERPINE1 and SERPINE1 was connected with IGFBP2 ( Figure 3D). It had been reported that IGFBP2 could promote glioma development and was also a critical biomarker in glioma prognosis [33][34][35]. However, the prognostic effects of TIMP1 and SERPINE1 in LGG patients were unclear. We showed that IGFBP2, EMP3, TIMP1 and SERPINE1 were all highly expressed in old LGG patients compared with young LGG patients in TCGA dataset ( Figure 3E). Similarly, old glioma patients were with high expressions of those genes in GSE43378 dataset ( Figure 3F).

DNA methylation profiles of age related genes in LGG patients
Secondly, we tested the DNA methylation of age related genes in old LGG patients. Heatmap demonstrated the DNA methylation intensity (Beta value) of the 32 age related genes ( Figure 4A). We found that some genes, for example TIMP1, NNMT and CHI3L2 genes were hyper-methylated in LGG patients, while, HOXC9, MEOX2, METTL7B, PRLHR, CXCL14 and IGFBP2 genes were hypo-methylated in LGG patients ( Figure 4A).
Previous results suggested that the expression of EMP3 in glioma patients was controlled by DNA methylation [29]. We found that DNA methylation intensity of EMP3 gene was lower in old LGG patients, compared with young LGG patients (P=3.71e-05) ( Figure 4B). Similarly, age related gene SERPINE1 was hypo-methylated in old LGG patients, compared with young LGG patients (P=5.35e-05) ( Figure  4B). We speculated that the hypo-methylations of EMP3 and SERPINE1 were contributing to the high expressions of EMP3 and SERPINE1 in old LGG patients ( Figure 3E and 3F). However, we did not find significant different methylation intensity of IGFBP2 and TIMP1 between old and young LGG patients ( Figure 4B). And how IGFBP2 and TIMP1 genes were increased in old LGG patients still needed further studies.

Expression profiles of age related genes in normal and glioma tissues
Next, using GSE4920 [36], GSE16011 [37] and GSE50161 [38] datasets, we tested the expressions of the 32 age related genes in normal brain tissues and glioma tumor tissues. As illustrated in the heatmaps, most age related genes were up regulated in glioma tumor tissues ( Figure 5A). Interestingly, IGFBP2, EMP3, TIMP1 and SERPINE1 were clustered into a small group of genes in GSE4920, GSE16011 and GSE50161 three datasets. And as illustrated in the box plots, IGFBP2, EMP3, TIMP1 and SERPINE1 were all highly expressed in glioma tumor tissues, compared with the normal brain tissues ( Figure 5B).

Prognostic significance of age related genes in glioma patients
Next, using univariate cox regression analysis, we determined the prognostic significance of age related genes in glioma patients in both TCGA and GSE43378 datasets. 25 genes out of 32 genes were significantly associated with tumor overall survival in both TCGA and GSE43378 datasets (P< 0.05) ( Table 2), suggesting that the age related genes played important roles in determining the overall survival of LGG patients. For example, high IGFBP2, EMP3, TIMP1 and SERPINE1 expressions were all positively associated with the overall survival of LGG patients, while, high SFRP2, IRX1, KLRC3 and LUZP2 expressions were all negatively associated with the overall survival of LGG patients ( Table 2).
The prognostic significance of age related genes IGFBP2, EMP3, TIMP1 and SERPINE1 were further demonstrated in Kaplan-Meier survival analysis in glioma patients. Our results showed that, high expressions of IGFBP2, EMP3, TIMP1 and SERPINE1 genes were unfavorable prognostic markers in TCGA ( Figure 6A) and GSE43378 patients ( Figure 6B). LGG patients with higher expression of IGFBP2, EMP3, TIMP1 and SERPINE1 had worse prognosis than patients with low expression of those genes.

Combined EMP3 and SERPINE1 genes in glioma overall survival prediction
We also used multivariate cox regression to reveal the connection of IGFBP2, EMP3, TIMP1 and SERPINE1 genes in determining the clinical overall survival of LGG patients. We found that IGFBP2, EMP3 and TIMP1 were independent prognostic markers in TCGA LGG patients ( Figure 7A). However, in GSE43378 dataset, IGFBP2, EMP3, TIMP1 and SERPINE1 genes were interconnected with each other, and those genes were not independent prognostic markers ( Figure 7A).
So, in both TCGA and GSE43378 datasets, SERPINE1 was not an independent prognostic marker. We thought that the combination of SERPINE1 with other age related genes, particular EMP3, could be used as better prognostic marker in LGG patients. To test this hypothesis, the average expression level of EMP3 and SERPINE1 was calculated. Based on the average expression level of EMP3 and SERPINE1, glioma patients were divided into four groups. We found that patients with both high EMP3 and SERPINE1 expressions were particular with lowest overall survival in glioma patients ( Figure 7B). Patients with the low expression of EMP3 or SERPINE1 had better clinical outcomes ( Figure 7B). At last, we tried to determine the connection between EMP3 and SERPINE1. Spearman correlation demonstrated a high correlation between EMP3 and SERPINE1 in glioma expression TCGA and GSE43378 datasets (( Figure 7C).

Validation of age related genes in Chinese LGG patients
At last, we tried to validate the age related genes in Chinese LGG patients. 140 LGG patients were collected from the Chinese Glioma Genome Atlas (CGGA) dataset. Similar with the previous results, old LGG patients (older than 60 years) were associated with low tumor overall survival in CGGA dataset (P<0.0001) ( Figure 8A). Also, EMP3, TIMP1 and SERPINE1 were all highly expressed in old LGG patients compared with young LGG patients ( Figure  8B). However, we did not find significant different expression level of IGFBP2 between old and young LGG patients. We also showed that the high expressions of IGFBP2, EMP3, TIMP1 and SERPINE1 genes were associated with the unfavorable prognosis of LGG patients in CGGA dataset ( Figure 8C). Moreover, EMP3 and SERPINE1 were highly correlated with each other ( Figure 8D) and the combination of EMP3 and SERPINE1 genes had better survival prediction in Chinese patients with LGG ( Figure 8E). All those results were quietly similar with the results derived from TCGA and GSE43378 datasets.

Discussion
With the increasing of age, high incidence of cancer is occurred. For example, more than 70% of LUSC, BLCA and COAD patients are diagnosed in old adults ( Figure 1A). However, some tumors are particularly developed in young adults, like LGG and CESC. Nearly 90% LGG and CESC patients are developed under 60 years old ( Figure 1A). Those observations highlight the age related disparities in different tumor types. Although, in general, old tumor patients have worse prognosis than young tumor patients ( Figure 1B), age only is not always associated with the overall survival in all tumor types [40,41]. For example, there is no difference in the clinical overall survival in old LUAD, LIHC, ESCA and COAD patients compared with young LUAD, LIHC, ESCA and COAD patients ( Figure 1C). However, LGG is one of the most age related tumor type. Age is most associated with the overall survival of LGG patients than other types of tumor patients ( Figure 1C). Moreover, there are most age related genes in old LGG patients (Figure 2). Those observations intrigue great interests for the further studies of age related molecular profiles in LGG patients.
First, the age related genes in LGG patients are identified from TCGA dataset (Figure 2), and validated using GSE43378 and CGGA datasets ( Figure  3 and 8). DNA methylation intensity ( Figure 4A and 4B) and prognostic effects (Table 2) of those age related genes are also demonstrated. Furthermore, the expression profiles of age related genes in normal and glioma tissues are revealed ( Figure 5A). Base on those results, we find four age related genes IGFBP2, EMP3, TIMP1 and SERPINE1 may represent suitable biomarkers for prognostic strategies of LGG patients.
The prognostic effects of EMP3 [28][29][30][31][32] and IGFBP2 [33][34][35] in glioma patients are extensively studied. Here, we find two EMP3 and IGFBP2 associated genes TIMP1 and SERPINE1 are also suitable biomarkers for prognosis of glioma patients. EMP3, IGFBP2, TIMP1 and SERPINE1 are highly expressed in old LGG patients ( Figure 3E and 3F). In LGG tumor tissues, EMP3, IGFBP2, TIMP1 and SERPINE1 are all highly expressed, compared with normal brain tissues ( Figure 5B). Furthermore, high expressions of IGFBP2, EMP3, TIMP1 and SERPINE1 are associated with low overall survival of LGG LGG patients in CGGA dataset. P value was generated from Log-rank test. (B) Box plots showed the IGFBP2, EMP3, TIMP1 and SERPINE1 expressions (FPKM) in CGGA LGG patients. (C) Relationships of IGFBP2, EMP3, TIMP1 and SERPINE1 expressions and overall survival were analyzed in CGGA dataset. Kaplan-Meier survival analysis was used to compare IGFBP2, EMP3, TIMP1 and SERPINE1 highly expressed LGG patients (red) with IGFBP2, EMP3, TIMP1 and SERPINE1 lowly expressed LGG patients (black). P values were generated from Log-rank test. (D) Spearman correlation between EMP3 and SERPINE1 expression in CGGA dataset was determined. (E) Kaplan-Meier plotters demonstrated the different overall survival of glioma patients with high expressions of EMP3 and SERPINE1 and glioma patients with low expressions of EMP3 or SERPINE1 in CGGA dataset. Log-rank test was used to determine the P values. patients ( Figure 6). Interestingly, EMP3 and SERPINE1 are connected with each other and the combination of EMP3 and SERPINE1 genes have better prognostic effects in glioma patients ( Figure 7B and 7C).
Our results also show some inner mechanisms of how EMP3 and SERPINE1 are increased in old LGG patients. We find that EMP3 and SERPINE1 are hypo-methylated in old LGG patients, compared with young LGG patients ( Figure 4B). Although the expression of IGFBP2 is controlled by epigenetic DNA methylation in glioma patients [42], we find no significant different methylation intensity of IGFBP2 and TIMP1 genes between old and young LGG patients ( Figure 4B). So, how IGFBP2 and TIMP1 genes are increased in old LGG patients still need further studies.
Overall, our results provide deep understandings of how age and age related genes influence the clinical overall survival of LGG patients. Although clinical further validations are needed, our analysis suggest that IGFBP2, EMP3, TIMP1 and SERPINE1 genes, particular the combination of EMP3 and SERPINE1, could be used as biomarkers to predict the overall survival of LGG patients.

Conclusions
Age related genes IGFBP2, EMP3, TIMP1 and SERPINE1 have significant prognostic effects in LGG patients.