Machine learning classifiers for predicting 3-year progression-free survival and overall survival in patients with gliomas after surgery

Background: To develop machine-learning based models to predict the progression-free survival (PFS) and overall survival (OS) in patients with gliomas and explore the effect of different feature selection methods on the prediction. Methods: We included 505 patients (training cohort, n = 354; validation cohort, n = 151) with gliomas between January 1, 2011 and December 31, 2016. The clinical, neuroimaging, and molecular genetic data of patients were retrospectively collected. The multi-causes discovering with structure learning (McDSL) algorithm, least absolute shrinkage and selection operator regression (LASSO), and Cox proportional hazards regression model were employed to discover the predictors for 3-year PFS and OS, respectively. Eight machine learning classifiers with 5-fold cross-validation were developed to predict 3-year PFS and OS. The area under the curve (AUC) was used to evaluate the prognostic performance of classifiers. Results: McDSL identified four causal factors (tumor location, WHO grade, histologic type, and molecular genetic group) for 3-year PFS and OS, whereas LASSO and Cox identified wide-range number of factors associated with 3-year PFS and OS. The performance of each machine learning classifier based on McDSL, LASSO, and Cox was not significantly different. Logistic regression yielded the optimal performance in predicting 3-year PFS based on the McDSL (AUC, 0.872, 95% confidence interval [CI]: 0.828-0.916) and 3-year OS based on the LASSO (AUC, 0.901, 95% CI: 0.861-0.940). Conclusions: McDSL is more reproducible than LASSO and Cox model in the feature selection process. Logistic regression model may have the highest performance in predicting 3-year PFS and OS of gliomas.


Introduction
Gliomas are the most common primary malignant central nervous system in adults [1], which account for 80% of malignant brain tumors [2]. Gliomas are associated with substantial mortality and morbidity [3]. The prediction of clinical behavior, response to treatment, and survival outcome of glioma is challenging. Prognostic assessment of gliomas is very crucial for patient counseling, treatment strategy planning, and disease monitoring [4].
Low-grade gliomas typically affect younger adults and carry a favorable prognosis, while Ivyspring International Publisher high-grade tumors generally have the worst progression-free survival (PFS) and overall survival (OS). The OS for WHO grades II, III, and IV astrocytomas is approximately 6-8 years, 2 years, and 15 months, respectively [5]. Additional factors associated with survival outcome are age, sex, Karnofsky Performance Status (KPS), tumor morphology, extent of tumor resection, and treatment methods (surgery, radiotherapy, and chemotherapy) [5][6][7]. However, the prognostic value of these factors depends on the tumor grade and histological subtype. What's more, some of these factors are correlated with or confounded by other factors, and higher tumor grade is correlated with more advanced age. The influence of extent of resection on survival could be confounded by tumor location, whether it is resectable or non-resectable, and/or by clinical judgement [8].
The tumor biology is associated with the status of different molecular markers, such as isocitrate dehydrogenase (IDH), 1p/19q codeletion, and telomerase reverse transcriptase (TERT) mutation [9]. As compared with the WHO 2007 classification, the new classification announced by the WHO in 2016 recognized several new entities of diffuse glioma based on genotypes (IDH mutation and 1p/19q codeletion) in addition to the histologic phenotypes of tumors [10][11][12]. On the basis of previous studies of tumor biology, Eckel-Passow JE et al. proposed five molecular groups according to the three biomarkers: triple-positive (mutations in both TERT and IDH plus 1p/19q codeletion), mutations in both TERT and IDH, mutation in IDH only, mutation in TERT only, and triple-negative [9]. Recent studies on gliomas using The Cancer Genome Atlas database have revealed the association of IDH mutation, 1p/19q codeletion, and TERT promoter mutation with OS [9,13,14]. Favorable prognosis has been observed in tumors with IDH mutation and/or 1p/19q codeletion but poor prognosis in those with TERT promoter mutation.
The success of precision oncology relies on accurately categorizing patients on the basis of their prognostic characteristics. Therefore, we aimed to develop machine learning based clinical models to predict the 3-year PFS and OS among patients with gliomas and explore the effect of different feature selection methods on the predictive performance of machine learning classifiers.

Patient population and data collection
This retrospective study was approved by the local Institutional Review Board before data collection and analysis, which waived the requirement for written consent. We included a total of 505 consecutive patients at the neurosurgery department between January 1, 2011 and December 31, 2016. Inclusion criteria were as follows: (i) patients aged ≥18 at the time of first surgery; (ii) patients were treated by surgical resection with or without postoperative chemoradiotherapy; (iii) patients who had infiltrative glioma of histologic grade II, III, or IV. Grade I tumor (pilocytic astrocytoma) are clinically and pathologically distant and therefore was not included; (iv) patients were followed up ≥36 months; and v) patients had information of molecular alterations, including IDH mutation and 1p/19q codeletion, and TERT promoter mutation. Those patients with recurrent gliomas or underwent biopsy only were excluded. The following data were obtained for each patient: age at initial diagnosis, sex, KPS on admission, tumor location, histologic type, WHO grade, extent of surgical resection of tumor, radiotherapy, chemotherapy, molecular alterations, and survival outcomes. All of the cases were evaluated by a neuro-radiologist and a neuro-oncologist. Disagreements between the two observers were resolved by consensus and, if necessary, discussion with a third observer. The extent of resection was determined by analysis of preand post-operatively acquired cranial MRI scans [7]. Ideally, the post-surgical MRI was performed up to 72 hours after surgery. Identification of IDH1 mutation was performed by pyrosequencing of an 88-bp-long fragment of the IDH1 gene with the mutation hotspot at codon 132 [15,16]. For IDH2 mutations, pyrosequencing was performed on an 83-bp-long fragment of the IDH2 gene with the mutation hotspot at codon 172 [15,16]. IDH1 and IDH2 are very similar to each other and hereafter collectively referred as IDH. 1p/19q codeletion was analyzed by fluorescence in situ hybridisation according to standard protocols [17]. Detection of TERT promoter was performed by direct sequencing as previously reported [18]. Based on the status of IDH mutation, 1p/19q codeletion, and TERT promoter mutation, gliomas can be classified into five molecular groups [9]: triple-positive (mutations in both TERT and IDH plus 1p/19q codeletion), mutations in both TERT and IDH, mutation in IDH only, triple-negative, and mutation in TERT only. The primary outcomes were 3-year PFS and OS. PFS was defined as the interval between date of initial diagnosis (date of first surgery) and either disease progression or death, censored at the last follow-up visit. OS was defined as the interval from the date of surgery until date of death, censored at the last follow-up visit. Figure 1 illustrates the workflow of this study.

Feature selection methods
Potential predictors including age at initial diagnosis, sex, KPS on admission, tumor location, histologic type, WHO grade, extent of surgical resection of tumor, radiotherapy, chemotherapy, IDH1 mutation, IDH2 mutation, IDH mutation, 1p deletion, 19q deletion, 1p19q codeletion, TERT promoter mutation, and molecular groups. The multi-causes discovering with structure learning (McDSL) algorithm, the least absolute shrinkage and selection operator regression (LASSO), and Cox proportional hazards regression model were applied to select the factors.

Multi-causes discovering with structure learning algorithm
In this study, the multi-causes discovering with McDSL algorithm [19,20] was used as a causal assumption inferring method to discover the risk factors combination of PFS and OS of gliomas patient.
The dataset has n factors X = {x 1 , …, x n } and one outcome y (label). The McDSL discovers the risk factors combination through two phases: 1) search and transform each factors combination S = {x i ,…, x j } into a factor x s , 2) infer the causal relationship between x s and label y with additive noise model (ANM).
Firstly, calculating the correlation R of each factor x i and label y with their joint distribution, as shown in the following equation.
Among which, R(x i ,y) is the correlation of x i and y, m is the sample size of data, vx i is the i-th value of x i , k x is the number of values of x i , vy j is the j-th value of y, k y is the number of values of y, P(x i = vx i , y = vy j ) is the joint distribution.
Secondly, selecting the combination S with described correlation R, and transforming it into factor x s with the value combination of factors. The scale |S| was increased from 1 to n.
Finally, ANM was employed to infer the causal relationship between each x s and label y. ANM was presented to deal with the causal relationship of binary factors, and it performed well in handling binary discrete synthetic data (accuracy >93%) [21] and inferring the relationship between transformed factor and label (accuracy >90% in the most of combination scales) [20]. In the process of causal relationship assumption inferring of ANM, an additive noise was considered as existed, which could been undiscovered or unrecorded factor, missing data, and human error, etc. The formulae are shown as follows: = ( ) + , and ⊥ Of which, f is the mapping from x s to y, N is noise, N⊥x s means N and x s are independent. To infer the influence of x s on y, the value of y was interfered with conditional probability P(y|x s ), the interfered label was denoted as � , and � was the residuals between y and �. The Pearson's chi-squared test was employed to calculate P-value of the dependence of x s and � with regression. If and only if x s and � were independent, reverse was not valid, then inferred "x s is causing y" and was denoted as x⟶y.
To discover the most correlative risk factors combination of recurrence and death, the features of survival status and survival time were transformed into a 3-class label. In this label, class 1 means survival status had not occurred after 36 months, '2' means survival status had occurred within 36 months, '3' means survival status had occurred after 36 months. McDSL algorithm was employed to select the risk factors combination in each training dataset, and the significance level was set at p <0.05. The factors combination which was inferred in all training data sets, had the maximal area under the curve (AUC) and was presented to build classifiers.

The least absolute shrinkage and selection operator algorithm
The regularized multivariate logistic regression with the LASSO penalty for discovering risk factors of 3-class label. The LASSO regularization involves a parameter λ to control the number of selected features where a larger λ retains more features. To obtain an optimal feature number and avoid over-fitting, we used 5-fold cross validation in the training cohort to choose the optimal λ. The λ value that maximizes the AUC in the training cohort was selected as the optimal regularization parameter, and the feature number was therefore determined automatically by the λ.

Cox proportional hazards regression model
A univariate Cox analysis was firstly used to select the significant prognostic factors affecting 3-class label with backwards regression, respectively. Those factors with p <0.10 were entered into the multivariate Cox analysis. Finally, only factors with p <0.05 were deemed independent risk factors of 3-class label.

Machine learning model training and validation
Glioma data were randomly divided into the training cohort (n = 354) and validation cohort (n = 151) with 5-fold cross-validation. The selected predictors were entered into eight machine-learning classifiers: XGBoost, Adaboost, k-Nearest Neighbor (KNN), Logistic Regression (LR), Naive Bayes (NB), Random Forest (RF), Support Vector Machine (SVM), and Back Propagation Neural Network (BPNN). Area under the curve (AUC) was used as the compared performance. The average performance of all models was obtained by bootstrapping for 1000 times. The detailed descriptions of the machine learning models are present in Document S1.

Statistical analysis
All statistical analyses were performed by Matlab 2017b for McDSL and LASSO, and by R version 3.6.0 (http://www.Rproject.org) for eight machine learning algorithms. The R packages were used as follows: "xgboost" for XGBoost, "fitcensemble" for Adaboost, "fitcknn" for KNN, "glmfit" for LR, "fitcnb" for NB, "TreeBagger" for RF and "fitcsvm" for SVM and "nnet" for BPNN. Six performance measures of AUC, F-score, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated for each model. The Youden index was used to identify the optimized threshold of predicted score that balanced sensitivity and specificity.

Performance of the machine learning models
McDSL identified four causal factors during the bootstrapping for 1000 times, including tumor location, WHO grade, histologic type, and molecular genetic group. However, LASSO and Cox analysis identified unstable number of factors of 2-17 and 1-8, respectively. Figure S1 illustrates the scale of selected features and number of each scale for McDSL, LASSO, and Cox. Figure S2 shows the variable ranking according to their importance. For McDSL, WHO grade ranked the first, followed by molecular genetic group and histologic type or tumor location; for LASSO, WHO grade ranked the first, followed by IDH mutation, and molecular group or 1p/19q codeletion; for Cox analysis, the WHO grade ranked the first, followed by radiotherapy and histologic type.

Discussion
Our purpose was to identify the predictors of 3-year PFS and OS probabilities in patients with gliomas based on clinical data and molecular genetic markers using three selection methods, and then develop eight machine learning models to predict 3-year PFS and OS probability. The results show most machine learning models achieved high performance in predicting 3-year PFS and OS probability, of which, LR model had the optimal predictive performance. Although the performance of machine learning models was not significantly affected by the variable selection methods, McDSL algorithm was more stable and interpretable than Cox analysis and LASSO in the feature selection process and therefore can be used as a new method to select risk factors in the future.        After an initial glioma diagnosis, standard treatment including maximal surgical resection with or without temozolomide-based chemotherapy and radiotherapy was done. Through this process, some important established prognostic factors were obtained, such as WHO grade, extent of surgical resection, and postoperative treatment [22]. In this study, we found tumor location, WHO grade, histologic type, and molecular genetic group were risk factors of 3-year PFS and OS probability in patients with gliomas. Survival of gliomas is highly variable, reflecting molecular heterogeneity of the disease. Among all of the known glioma-associated molecular alterations discovered to date, the status of an IDH mutation has the largest prognostic significance. IDH mutations were noted in the vast majority of grade II and III gliomas [23], which were associated with improved survival as compared to glioblastoma (GBM). Patients with an IDH mutation gliomas had a significantly longer OS as compared with those with an IDH wildtype gliomas, with a median OS of 1.7, 6.3 and 8.0 years for patients with IDH wildtype, patients with IDH mutation only (astrocytic gliomas) and patients with IDH mutation plus 1p/19q codeletion (oligodendroglial gliomas), respectively [13]. Additionally, in a recent study of grade III glioma patients treated with radiotherapy and either temozolomide or nitrosourea, IDH mutation status was found to be a significant prognostic factor for PFS (hazard ratio HR] = 0.59) and OS (HR = 0.42) [24]. The 1p/19q codeletion was observed in tumors of the oligodendroglial lineage. The association between 1p/19q codeletion and prolonged OS has been observed in many previous studies. The median OS of patients with 1p/19q codeleted tumors was 11.9 years, significantly longer than that of 8.1 years for patients with 1p/19q intact tumors [25]. Regardless of treatment protocol, patients with combined 1p/19q codeletion and IDH mutation had the longest PFS at 62 months as compared with 48 months for IDH mutant alone and 20 months for IDH wildtype [26]. TERT promoter mutations were found in approximately 80% of IDH wildtype GBM, and in the majority of IDH mutant, 1p/19q codeleted oligodendrogliomas [9,27,28]. In GBM, TERT promoter mutations presented worse prognosis as compared with that of patients with IDH wildtype GBM [29][30][31]. Grade II/III gliomas with TERT promoter mutations alone harbored worse prognosis as compared with tumors with all three alteration [9]. When three molecular alterations alone and their combinations (i.e., molecular group) entered the McDSL, only molecular group was selected as a cause of multiclass of survivals, suggesting the molecular group was better than single molecular genetic marker due to the molecular group provides comprehensive genetic alterations characterization of gliomas.
Machine learning has been used to capture patterns within complex data that are beyond human perception and these patterns has been adopted to make data-driven prediction [32]. Overall results of machine learning models demonstrated good predictive performance for 3-year PFS and OS probability of glioma patients. In the glioma studies, machine learning algorithm has been applied directing toward discernment of MRI characteristics of tumors or a huge number of high dimensional radiomic features extracted from MRI [33]. Only one non-imaging-focused study has adopted machine learning for GBM outcome prediction [34][35][36][37][38]. However, the reliability and robustness of these models were influenced by various factors, such as interpretation of MRI images, tumor segmentation, feature and extraction, and repeatability of multiparametric features. Hence, machine learning based radiomics were insufficient for reliable clinical usage at this stage. LASSO and Cox model are commonly used to select features, but the numbers of selected features are varied after the resampling or bootstrapping. As indicated by this study, LASSO and Cox model selected a wide-range of feature number during the bootstrapping for 1000 times, in contrast, McDSL only selected four factors. Because in models constructed by LASSO and Cox model, independent variables are not all independent risk factors of survival, they are just associated with survival. However, the variables identified by McDSL have causal relationship with survival and therefore are true risk factors.
There were also some limitations should be acknowledged. Firstly, we did not consider the tumor size and intratumor features, such as necrosis size and texture features extracted from images. Previous studies showed no relationship between tumor size, necrosis size and patient survival [39][40][41]. It is challengeable for texture features used in predicting survival outcomes due to unsolved issued of reproducibility and interpretability before application in clinical setting. Secondly, we did not perform external validation with independent datasets for generalization. Further prospective study is needed to integrate the molecular genetic markers based machine learning models into clinical practice. Thirdly, we did not consider the effects of various treatment strategies on tumor progression after standard chemoradiotherapy in OS analyses. However, this pitfall may be mitigated by performing PFS analyses.
In conclusion, our study results implied that machine learning models based on clinical profiles could obtain high performance in predicting 3-year PFS and OS probability of patients with gliomas. We demonstrated the importance of diverse sources of features in predicting survival outcomes of gliomas. The predictive model presented in this study is a preliminary step in a long-term plan of developing personalized treatment plans for glioma patients.