Development and Validation of an Individualized Nomogram for Predicting Survival in Patients with Esophageal Carcinoma after Resection

An accurate estimate of prognosis for patients with esophageal carcinoma after surgery is urgently needed. Clinical nomograms have been developed to quantify risk for individual patients by incorporating prognostic factors. From the Surveillance, Epidemiology, and End Results (SEER) database for 2004 to 2013, a total of 4566 patients were selected. Of these, 3198 patients were assigned to a training set to construct the nomogram, which incorporated age, gender, histology, grade, T stage, N stage, number of nodes examined, radiation and chemotherapy. The calibration curve for the probability of survival showed good agreement between the nomogram predictions and actual observations. The C-index of the nomogram was 0.71 (95% CI 0.70-0.72), which was significantly higher than that of the TNM staging system. The results were then validated using bootstrap resampling and a validation set of 1368 patients from the SEER database. In addition, in the esophageal squamous cell carcinoma and esophageal adenocarcinoma subgroups, the discrimination of the nomogram was superior to that of the TNM staging system. These results could play a supplementary role to the current staging system and help to identify the high-risk population after surgery.


Introduction
Esophageal cancer is a heterogeneous entity associated with high morbidity and mortality. It is the eighth most common cancer worldwide, with roughly 450,000 new cases reported per year [1]. It is also the sixth most common cause of cancer-related death worldwide, and more than 80% of patients eventually succumb to the disease [2].
Surgery is the primary treatment for patients with resectable esophageal carcinoma. Nonetheless, the 5-year survival rate remains modest, at less than 40% [3,4]. It is important to identify the patients at higher risk of relapse or metastasis, who may benefit more from postoperative therapy.
An accurate estimate of survival for patients with esophageal carcinoma after surgery is therefore needed. The tumor-node-metastasis (TNM) classification is the most widely used staging system. However, several important prognostic factors, such as age and the number of examined lymph nodes, are not included in the TNM system [5]. In addition, increasing evidence has shown that the TNM system has unsatisfactory discriminative ability for prognostic prediction [6][7][8].
Clinical nomograms use intuitive graphs to quantify risk for individual patients by incorporating the known prognostic factors [9]. Nomograms have been widely used in different cancer types and have been shown to be more accurate than the TNM staging systems for predicting prognosis [10][11][12].
The present study was designed to develop a prognostic nomogram for patients with non-metastatic esophageal carcinoma (nMEC) who underwent surgery, based on the National Cancer Institute's Surveillance, Epidemiology, and End Results (SEER) database, and to determine whether this model provided more accurate prediction of patient survival than the TNM system. In addition, we assessed the performance of the model separately in the esophageal adenocarcinoma (EAC) and esophageal squamous cell carcinoma (ESCC) populations.

Data
The Surveillance, Epidemiology, and End Results (SEER) database is a population-based cancer registry that covers approximately 27.8% of the US population. We used data from the SEER 18 registries research database for 2004 to 2013.
We collected information on patient characteristics (age, gender and race), primary tumor features (location, histology, grade, T stage, N stage and number of nodes examined), treatment approaches (radiation and chemotherapy) and clinical outcomes (cancer-specific survival and overall survival).

Inclusion Criteria
We selected patients from the SEER database using International Classification of Diseases for Oncology, 3rd Edition (ICD-O-3) topography codes for anatomic location in the esophagus: proximal esophagus (15.

Exclusion Criteria
We excluded patients with stage IVA, IVB, or IV NOS disease (n=8702), patients who did not undergo surgery (n=6924), patients with indeterminate TNM stage (n=1077), and patients with an unknown number of nodes examined (n=260) (Figure 1). We also excluded cases with missing or unknown information on any of the prognostic factors included in the final model.

Statistical Analysis
Data were analyzed using SPSS version 17.0 (SPSS Inc., Chicago, IL, USA). All statistical tests were 2-sided, with a significance level (alpha) of 0.05. Survival curves were estimated using the Kaplan-Meier method and compared using the log-rank test. Multivariate analyses were performed with Cox proportional hazards regression.
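For readers who want to see the mechanics, the Kaplan-Meier (product-limit) estimator can be sketched in a few lines of Python. This is a minimal illustration, not the SPSS procedure used in the study; the function names (`kaplan_meier`, `median_time`, `reverse_km_median_follow_up`) are hypothetical, and confidence intervals are omitted. Swapping the event indicator, so that censoring is treated as the "event", gives the reverse Kaplan-Meier estimate of median follow-up referred to later in this section.

```python
from typing import List, Optional, Tuple


def kaplan_meier(times: List[float], events: List[int]) -> List[Tuple[float, float]]:
    """Kaplan-Meier estimate as (time, S(t)) steps at each event time.

    times  -- follow-up time per patient (e.g. months)
    events -- 1 if the event was observed, 0 if the patient was censored
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Pool every patient whose follow-up ends at this same time point
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            # Product-limit step: multiply by the conditional survival at t
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= removed  # events and censorings both leave the risk set
    return curve


def median_time(curve: List[Tuple[float, float]]) -> Optional[float]:
    """Smallest time at which the curve falls to 0.5 or below (None if never)."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def reverse_km_median_follow_up(times: List[float], events: List[int]) -> Optional[float]:
    """Reverse Kaplan-Meier: censoring becomes the 'event', deaths are censored."""
    return median_time(kaplan_meier(times, [1 - e for e in events]))
```

For example, with five patients who all die at months 1 to 5, the curve steps down by 0.2 at each death and the median survival is 3 months.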
We selected the optimal cutoff for the number of lymph nodes examined using X-tile plots (version 3.6.1; Yale University School of Medicine, New Haven, CT, USA) [13].
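X-tile is a standalone program whose search (which also considers pairs of cutpoints defining low, middle and high groups) is not reproduced here. As a simplified, hypothetical sketch of the underlying idea, the Python code below scans every single cutoff of a continuous variable such as the number of nodes examined and keeps the one that maximizes the two-group log-rank chi-square statistic. The function names and the `min_group` guard are assumptions for illustration.

```python
from typing import List, Optional, Tuple


def logrank_chi2(times: List[float], events: List[int], group: List[int]) -> float:
    """Two-group log-rank chi-square statistic (1 degree of freedom).

    group -- 1 for patients in the first group, 0 otherwise
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    obs_minus_exp = 0.0
    variance = 0.0
    for t in event_times:
        n = n1 = d = d1 = 0
        for ti, ei, gi in zip(times, events, group):
            if ti >= t:  # still at risk just before t
                n += 1
                n1 += gi
                if ti == t and ei == 1:
                    d += 1
                    d1 += gi
        if n < 2:
            continue  # a single patient at risk carries no information
        obs_minus_exp += d1 - d * n1 / n
        variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    return obs_minus_exp ** 2 / variance if variance > 0 else 0.0


def best_single_cutoff(values: List[float], times: List[float], events: List[int],
                       min_group: int = 2) -> Optional[Tuple[float, float]]:
    """Return (cutoff, chi2) maximizing the log-rank statistic for 'value <= cutoff'."""
    best = None
    for c in sorted(set(values))[:-1]:
        group = [1 if v <= c else 0 for v in values]
        n1 = sum(group)
        if n1 < min_group or len(group) - n1 < min_group:
            continue  # skip splits that leave a nearly empty group
        chi2 = logrank_chi2(times, events, group)
        if best is None or chi2 > best[1]:
            best = (c, chi2)
    return best
```

Because the cutoff is chosen to maximize the statistic, its nominal P value is optimistic; X-tile is described as offering corrections for this kind of multiple cutpoint testing (such as Monte Carlo P values), which this sketch leaves out.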
For the development of the nomogram, we randomly assigned 70% of patients to a training cohort (n = 3198) and 30% to a validation cohort (n = 1368). The nomogram was formulated based on the results of the multivariate analysis using the rms package in R version 3.5.0.
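A random 70/30 split of the kind described above might look like the following Python sketch. The fixed seed is a hypothetical choice that only makes the split reproducible; the authors' seed and exact randomization procedure are not reported here.

```python
import random
from typing import List, Sequence, Tuple


def split_cohort(patient_ids: Sequence[int], train_fraction: float = 0.7,
                 seed: int = 2019) -> Tuple[List[int], List[int]]:
    """Randomly split patients into training and validation cohorts.

    seed is a hypothetical value used only to make the split reproducible.
    """
    ids = list(patient_ids)
    rng = random.Random(seed)  # private generator: global random state untouched
    rng.shuffle(ids)
    n_train = round(train_fraction * len(ids))
    return ids[:n_train], ids[n_train:]


train, valid = split_cohort(range(4566))
print(len(train), len(valid))  # 3196 1370
```

With 4566 patients, an exact 70/30 split yields 3196 and 1370 patients, slightly different from the reported 3198 and 1368, so this sketch should not be read as reproducing the study's cohorts.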
All factors included in the nomogram met the proportional hazards assumption, as assessed by inspecting plots of log[-log S(t)] against t.
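The reasoning behind this graphical check can be written out, assuming a Cox model with a single covariate x, coefficient β and baseline hazard h_0(t), with baseline survival S_0(t). Under proportional hazards, the transformed survival curves of two groups differ by a constant at every time point, so they should look parallel:

```latex
\begin{aligned}
h(t \mid x) &= h_0(t)\,\exp(\beta x) \\
S(t \mid x) &= \exp\!\Big(-\int_0^t h_0(u)\exp(\beta x)\,du\Big) = S_0(t)^{\exp(\beta x)} \\
\log[-\log S(t \mid x)] &= \log[-\log S_0(t)] + \beta x \\
\log[-\log S(t \mid x_1)] - \log[-\log S(t \mid x_0)] &= \beta\,(x_1 - x_0) \quad \text{for every } t
\end{aligned}
```

The constant vertical gap does not depend on whether t or log t is placed on the horizontal axis; clearly converging or crossing curves suggest that the assumption is violated for that covariate.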
The median follow-up time was estimated using the reverse Kaplan-Meier method. The performance of the nomogram was assessed by the concordance index (C-index) and by comparing nomogram-predicted with observed Kaplan-Meier estimates of survival probability. Bootstrapping with 1,000 resamples was used for these comparisons. Differences between the nomogram and the TNM staging systems were tested using the rcorrp.cens function in the R package Hmisc.
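Harrell's C-index can be computed by comparing all usable patient pairs. The Python sketch below shows one common convention (pairs with tied follow-up times are skipped, tied predictions count as one half); implementations such as rcorr.cens in the R package Hmisc may treat ties differently, and the bootstrap resampling and rcorrp.cens comparison described above are not reproduced. The function name is hypothetical.

```python
from typing import Sequence


def harrell_c_index(times: Sequence[float], events: Sequence[int],
                    risk: Sequence[float]) -> float:
    """Harrell's concordance index for right-censored survival data.

    risk -- higher value means higher predicted risk (e.g. nomogram total points)
    A pair (i, j) is usable when patient i had an observed event and patient j
    was still being followed after i's event time.
    """
    concordant = 0.0
    usable = 0
    for i in range(len(times)):
        if events[i] != 1:
            continue  # a censored patient cannot anchor a usable pair
        for j in range(len(times)):
            if times[j] > times[i]:
                usable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0  # the earlier death had the higher risk
                elif risk[i] == risk[j]:
                    concordant += 0.5  # tied predictions count as half
    if usable == 0:
        raise ValueError("no usable pairs")
    return concordant / usable
```

Read this way, a C-index of 0.71 means that among usable pairs the patient who died first had the higher predicted risk about 71% of the time, while 0.5 corresponds to chance-level discrimination.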

Clinicopathologic Characteristics of Patients
A total of 4566 patients with non-metastatic esophageal carcinoma who had undergone surgery were included. The demographic and clinicopathologic characteristics of the study population are summarized in Table 1. ESCC and EAC accounted for 79.7% and 20.3% of the whole group, respectively. The 5-year survival rate was 40.5%, and the median follow-up time was 78.0 months (95% CI 75.9-80.1 months).

Independent Prognostic Factors in the Training Cohort
A total of 3198 patients were assigned to the training cohort. Multivariate analysis demonstrated that age, gender, histology, grade, AJCC T stage, AJCC N stage, number of nodes examined, radiation and chemotherapy were independent prognostic factors for cancer-specific survival (CSS) (Table 2).

Prognostic Nomogram for Cancer-specific Survival
The prognostic nomogram integrating all significant independent factors for CSS in the training cohort is shown in Figure 2. The C-index for CSS prediction was 0.71 (95% CI, 0.70 to 0.72).
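The way a nomogram converts a Cox model into points can be illustrated with a short Python sketch. Everything numeric in it is hypothetical: the log-hazard contributions in `EFFECTS` and the reference survival of 0.8 are made up for illustration and are not the coefficients behind Figure 2, and only categorical predictors are handled (a continuous predictor such as age would need its own axis). The scaling follows a common convention in which the predictor with the widest range of effects spans 0 to 100 points and every other predictor is placed on the same scale.

```python
import math
from typing import Dict

# HYPOTHETICAL log-hazard contributions (beta * x) per category.
# Illustrative numbers only, NOT the coefficients fitted in this study.
EFFECTS: Dict[str, Dict[str, float]] = {
    "T stage": {"T1": 0.0, "T2": 0.3, "T3": 0.6, "T4": 0.9},
    "N stage": {"N0": 0.0, "N1": 0.5, "N2": 0.9, "N3": 1.3},
    "Grade": {"G1": 0.0, "G2": 0.2, "G3": 0.4},
}


def points_per_unit(effects: Dict[str, Dict[str, float]]) -> float:
    """The predictor with the widest effect range spans 0-100 points."""
    widest = max(max(lv.values()) - min(lv.values()) for lv in effects.values())
    return 100.0 / widest


def total_points(effects: Dict[str, Dict[str, float]], patient: Dict[str, str]) -> float:
    """Sum the points read off each predictor axis for one patient."""
    scale = points_per_unit(effects)
    return sum((effects[var][level] - min(effects[var].values())) * scale
               for var, level in patient.items())


def survival_probability(points: float, effects: Dict[str, Dict[str, float]],
                         baseline_survival: float) -> float:
    """Map total points to S(t | x) = S_ref(t) ** exp(lp).

    baseline_survival is S(t) for a reference patient at the lowest-risk
    level of every predictor (its value in the example is hypothetical).
    """
    lp = points / points_per_unit(effects)  # linear predictor vs. the reference
    return baseline_survival ** math.exp(lp)


patient = {"T stage": "T3", "N stage": "N1", "Grade": "G2"}
pts = total_points(EFFECTS, patient)
print(round(pts, 1))  # 100.0
print(round(survival_probability(pts, EFFECTS, 0.8), 3))  # 0.441
```

Because points are a linear rescaling of the Cox linear predictor, a higher total always maps to a lower predicted survival; the final step uses the proportional hazards relation S(t | x) = S_ref(t)^exp(lp).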
The calibration plots for the probability of survival at 3 and 5 years after surgery showed good agreement between the nomogram predictions and actual observations (Figure 3A and 3B).
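The comparison behind a calibration plot can be sketched by ranking patients by their predicted survival at a fixed horizon, splitting them into groups, and comparing each group's mean predicted survival with its Kaplan-Meier estimate. In this simplified Python sketch the function names, the number of groups and the toy data are hypothetical, and the bootstrap bias correction that calibrate() in the R package rms applies is omitted; points close to the 45-degree line indicate good calibration.

```python
from typing import List, Sequence, Tuple


def km_at(times: Sequence[float], events: Sequence[int], horizon: float) -> float:
    """Kaplan-Meier estimate of survival at `horizon` (same units as times)."""
    data = sorted(zip(times, events))
    at_risk, surv, i = len(data), 1.0, 0
    while i < len(data) and data[i][0] <= horizon:
        t = data[i][0]
        deaths = removed = 0
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        surv *= 1.0 - deaths / at_risk
        at_risk -= removed
    return surv


def calibration_table(pred_surv: Sequence[float], times: Sequence[float],
                      events: Sequence[int], horizon: float,
                      n_groups: int = 3) -> List[Tuple[float, float]]:
    """(mean predicted, KM observed) survival at `horizon` for each group of
    patients ranked by predicted survival."""
    order = sorted(range(len(pred_surv)), key=lambda k: pred_surv[k])
    rows = []
    for g in range(n_groups):
        idx = order[g * len(order) // n_groups:(g + 1) * len(order) // n_groups]
        mean_pred = sum(pred_surv[k] for k in idx) / len(idx)
        observed = km_at([times[k] for k in idx], [events[k] for k in idx], horizon)
        rows.append((mean_pred, observed))
    return rows


rows = calibration_table([0.2, 0.2, 0.5, 0.5, 0.8, 0.8],
                         [10, 10, 12, 60, 60, 60], [1, 1, 1, 0, 0, 0], horizon=36)
```

With `horizon=36` (3 years, with times in months), the toy data give observed survival of 0.0, 0.5 and 1.0 for groups whose mean predicted survival is 0.2, 0.5 and 0.8.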

Comparison of Predictive Accuracy Between Nomogram and TNM Staging System
The nomogram showed better accuracy in predicting 3- and 5-year survival in the training cohort. Its C-index of 0.71 was significantly higher (P<0.001) than those of the AJCC seventh edition staging system (0.67) and the AJCC sixth edition staging system (0.64). These results suggest that the nomogram is a useful predictor of survival for patients with esophageal carcinoma in the training cohort.

Validation of Predictive Accuracy of the Nomogram
A total of 1368 patients were assigned to the validation cohort. The C-index of the nomogram for predicting CSS was 0.70 (95% CI, 0.68 to 0.72), and the calibration curves showed good agreement between prediction and observation for the probability of 3- and 5-year survival (Figure 3C and 3D).
Figure 3 axes: nomogram-predicted probability of overall survival on the x-axis; actual overall survival on the y-axis.
The C-index of the nomogram was significantly higher (P<0.001) than those of the AJCC seventh edition staging system (0.66) and the AJCC sixth edition staging system (0.65).

Prognostic Nomogram for CSS in EAC and ESCC subgroups
In the EAC cohort, the prognostic nomogram integrating all significant independent factors for CSS is displayed in Supplementary Figure 1. The C-index was 0.72 (95% CI, 0.71 to 0.73), which was significantly higher (P<0.001) than those of the AJCC seventh edition staging system (0.68) and the AJCC sixth edition staging system (0.66).
Similarly, the prognostic nomogram for CSS in the ESCC cohort is shown in Supplementary Figure 2. The C-index was 0.67 (95% CI, 0.65 to 0.70), which was significantly higher (P<0.001) than those of the AJCC seventh edition staging system (0.62) and the AJCC sixth edition staging system (0.60).
The calibration curves for the probability of survival at 3 and 5 years showed satisfactory agreement between the nomogram predictions and actual observations (Supplementary Figure 3A-D).

Discussion
In this study, we constructed a prognostic nomogram, based on a large population-based database, for patients with nMEC after surgery. The nomogram performed well in predicting 3- and 5-year cancer-specific survival, as supported by the C-index (0.71 and 0.70 for the training and validation cohorts, respectively) and the calibration plots. Compared with the AJCC TNM staging systems, the nomogram demonstrated superior predictive accuracy for CSS.
Debate continues about the best strategy for constructing nomograms for patients with esophageal cancer. In 2016, Cao et al. [8] used the SEER database to construct a nomogram for patients with esophageal cancer who underwent esophagectomy. Their prognostic model included age, race, histology, tumor site, tumor size, grade, depth of invasion, number of positive nodes and number of retrieved nodes, and it showed good survival prediction for those patients (C-index = 0.716). However, the main weakness of that study was its short median follow-up (28 months, range 3 to 276 months), which limited its ability to predict long-term prognosis. In addition, the authors did not evaluate the performance of the nomogram in different histologic subtypes, and the nomogram did not include information on radiation and systemic chemotherapy.
In contrast, the median follow-up time in the present study was 78 months. We also excluded race and tumor site from the nomogram because they were not significantly associated with survival. Finally, given the impact of radiation and chemotherapy on prognosis, we included these factors to improve the accuracy of survival prediction.
ESCC and EAC are the two primary histological subtypes of esophageal malignancy, with significant differences in epidemiology, tumor characteristics and genetic features [14,15]. It is therefore essential to assess the performance of the nomogram in the two subtypes separately. The C-index of the nomogram was 0.72 in patients with EAC and 0.67 in patients with ESCC, both significantly higher than those of the TNM staging systems.
The conclusions of this study should be interpreted with caution because of the inherent limitations of the SEER database, such as unrecorded variables, variations in data coding and reporting, and migration of patients into and out of SEER registry areas. Additionally, an entire domain of cancer management is not captured in this database: detailed pharmaceutical information, such as chemotherapy regimens, is not recorded. Moreover, confounding and selection bias should be taken into account when interpreting the results of this observational study based on SEER [16].

Conclusion
We developed and validated a prognostic nomogram that provides individual survival predictions for nMEC patients who underwent surgery. Compared with the TNM staging system, our nomogram showed better prognostic discrimination and survival prediction. These results could play a supplementary role to the current staging system and help to identify the high-risk population after surgery.