J Cancer 2018; 9(21):3971-3978. doi:10.7150/jca.26649
Multiple Machine Learnings Revealed Similar Predictive Accuracy for Prognosis of PNETs from the Surveillance, Epidemiology, and End Result Database
1. Department of General Surgery, Guangdong Second Provincial General Hospital, Guangzhou, China
2. Department of Anesthesia, The First Affiliated Hospital of Sun Yat-sen University, Guangzhou, China
3. Zhongshan School of Medicine, Sun Yat-sen University, Guangzhou, China
# Song and Gao contributed equally to this work and should be regarded as co-first authors.
Song Y, Gao S, Tan W, Qiu Z, Zhou H, Zhao Y. Multiple Machine Learnings Revealed Similar Predictive Accuracy for Prognosis of PNETs from the Surveillance, Epidemiology, and End Result Database. J Cancer 2018; 9(21):3971-3978. doi:10.7150/jca.26649. Available from http://www.jcancer.org/v09p3971.htm
Background: Prognosis prediction is indispensable in clinical practice and machine learning has been proved to be helpful. We expected to predict survival of pancreatic neuroendocrine tumors (PNETs) with machine learning, and compared it with the American Joint Committee on Cancer (AJCC) staging system.
Methods: Data of PNETs cases were extracted from The Surveillance, Epidemiology, and End Result (SEER) database. Statistic description, multivariate survival analysis and preprocessing were done before machine learning. Four different algorithms (logistic regression (LR), support vector machines (SVM), random forest (RF) and deep learning (DL)) were used to train the model. We used proper imputations to manage missing data in the database and sensitive analysis was performed to evaluate the imputation. The model with the best predictive accuracy was compared with the AJCC staging system using the SEER cases.
Results: The four models had similar predictive accuracy with no significant difference existed (p = 0.664). The DL model showed a slightly better predictive accuracy than others (81.6% (± 1.9%)), thus it was used for further comparison with the AJCC staging system and revealed a better performance for PNETs cases in SEER database (Area under receiver operating characteristic curve: 0.87 vs 0.76). The validity of missing data imputation was supported by sensitivity analysis.
Conclusions: The models developed with machine learning performed well in survival prediction of PNETs, and the DL model have a better accuracy and specificity than the AJCC staging system in SEER data. The DL model has potential for clinical application but external validation is needed.
Keywords: machine learning, pancreatic neuroendocrine tumor, prognostic prediction, SEER database