Prognosis of colorectal cancer patients is associated with the novel log odds of positive lymph nodes scheme: derivation and external validation

Background and aim: To derive and externally validate cut-off points for the log odds of positive lymph nodes (LODDS) staging scheme in colorectal cancer (CRC). Patients and methods: The X-tile approach was used to find the cut-off points for the novel LODDS staging scheme in 240,898 patients from the Surveillance, Epidemiology and End Results (SEER) database, and the cut-off points were externally validated in 1,878 patients from an international multicenter cohort. Kaplan-Meier plots and multivariate Cox proportional hazard models were used to investigate the role of the novel LODDS classification. Results: The prognostic cut-off values were determined as -2.18 and -0.23 (P < 0.001). Patients had 5-year cancer-specific survival rates of 83.8%, 57.4% and 24.4% with increasing LODDS (P < 0.001) in the SEER database. Five-year overall survival rates were 77.2%, 55.0% and 26.7% with increasing LODDS (P < 0.001) in the external international multicenter cohort. Multivariate survival analysis identified the LODDS classification, the patient's age, the T category, the M status, and the tumor grade as independent prognostic factors in both independent databases. Subgroup analyses stratified by tumor location (colon or rectum), number of retrieved lymph nodes (< 12 or ≥ 12), TNM stage III, and lymph node-negative disease also confirmed LODDS as an independent prognostic factor (P < 0.001) in both independent databases. Conclusions: The novel LODDS classification was an independent prognostic factor for patients with CRC and should be calculated for additional risk group stratification alongside the pN scheme.


Introduction
The presence of lymph node metastases (LNM) and the number of lymph node metastases, also called the number of positive lymph nodes (pN), are robust risk factors in patients with colorectal cancers (CRCs) [1], which may determine subsequent adjuvant therapies and surveillance strategies [2,3]. Based on the number of involved lymph nodes, CRCs can be classified as pN0, pN1 (1-3 tumor-invaded lymph nodes), and pN2 (4 or more tumor-invaded lymph nodes) cancers [4].
Adequate histopathological assessment of lymph nodes has a significant impact on accurate pN staging, but the minimum number recommended in the literature has varied considerably [5][6][7]. To avoid the stage migration effect, adequate evaluation of the lymph node status is of great importance and, to date, the widely accepted minimum number of retrieved lymph nodes is 12 [8][9][10]. However, the reported numbers of retrieved lymph nodes (rN) in CRCs vary widely in the published literature, with medians ranging from 6 to 13 [11,12], even though precise recommendations and guidelines are available.
Considering that lymph node examination is inadequate in nearly half of CRC patients, new measures of lymph node status that incorporate rN are urgently needed. To the best of our knowledge, two measures, namely the lymph node ratio (LNR) [13] and the log odds of positive lymph nodes (LODDS) [14], have been proposed. Multiple studies have shown the superiority of LNR over pN in accurately predicting patient survival [13,15]. Although LNR seems to be a superior predictor of survival in stage III colorectal cancer [7,15,16,17,18], the results remain controversial, particularly in CRC with no LNM and inadequate rN. In node-negative CRCs, which account for more than half of CRCs [19], LNR0 is identical to the pN0 classification and provides no additional prognostic information. In this situation, patients with no LNM may be at high risk of understaging because of inadequate rN, and an incorrect choice of postoperative adjuvant treatment may be made.
LODDS has recently been proposed as a new prognostic index in CRCs [14,20,21,22,23], showing a strong ability to classify patients into groups with homogeneous survival, regardless of lymph node status and count. However, different studies used different methods to determine their LODDS cut-off values. Three studies used statistical methods to calculate cut-off values in limited numbers of patients [14,21,23]. Two studies used arbitrary classifications [20,22]. Although Song's study [23] used a statistical method to calculate cut-off values in a relatively large number of CRC patients, its cut-off values were not validated in other populations.
In the present study, we aimed to find the optimal categorization of LODDS values using the X-tile approach in a large population-based database involving 240,898 CRCs and to validate the resulting LODDS cut-off values in an international multicenter cohort, providing a more precise lymph node staging scheme for patients with CRCs.

Surveillance, Epidemiology and End Results Database
Patients with CRCs were identified by querying the latest version of the SEER 18 Regs Research data of the Surveillance, Epidemiology, and End Results (SEER) database, released in April 2017, with the SEER*Stat 8.3.5 software. The inclusion criteria were as follows: 1) patients aged 18 years or older diagnosed between 1988 and 2010; 2) patients with CRC diagnosed as the only primary cancer, without multiple primary cancers elsewhere; 3) patients with microscopically diagnosed cancers in whom surgery for the primary cancer and regional lymph node resection had been performed, with pathological examination of at least one lymph node; 4) International Classification of Diseases for Oncology, third edition (ICD-O-3) codes 8010-8231 and 8255-8576 for CRCs; 5) patients with active follow-up for at least 2 months. Patients were excluded if they received radiotherapy before surgery, if CRC was not the only primary carcinoma, or if the number of retrieved lymph nodes or positive lymph nodes was missing.
For the SEER database, cancer-specific survival (CSS) was defined using death due to CRC as the event, and overall survival (OS) using death from any cause. The primary outcome was CSS; the secondary outcomes were OS and CSS with non-CRC death treated as a competing event. Survival time was defined as the time from diagnosis to the date of death, last contact, or November 2016.
Since the SEER database is public-use data, no institutional review was required, and we were granted access to the SEER database for research purposes only, using the private SEER ID (zhangqw).

International multicenter cohort
An independent international multicenter cohort from three medical centers was used as the validation group, with the same inclusion and exclusion criteria. Rome, the patients were followed up until death or study end (30th April 2018) except for those lost to follow-up according to the European Society of Medical Oncology guidelines [24].
This study was approved by the Institutional Review Boards of all participating hospitals.

Definitions of node staging scheme
Two values, namely pN and the number of negative lymph nodes (nN), are needed to calculate LODDS values. The pN value is the number of positive lymph nodes. The rN value is the number of lymph nodes retrieved for histological examination of lymph node metastasis status. The nN value is the absolute number of negative lymph nodes, calculated by subtracting pN from rN. The LODDS value is defined as log_e([pN + 0.5] / [nN + 0.5]). Adding 0.5 to both counts keeps the value finite when pN or nN is zero. As a continuous variable, LODDS values were then grouped into a novel three-subgroup LODDS classification using cut-off points determined with X-tile [25].
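As a minimal illustration of this definition (a sketch in Python, not the authors' code; the function name `lodds` and the example patients are hypothetical), the value can be computed directly from pN and rN:

```python
import math

def lodds(positive_nodes: int, retrieved_nodes: int) -> float:
    """Log odds of positive lymph nodes: ln((pN + 0.5) / (nN + 0.5))."""
    if retrieved_nodes < 1:
        raise ValueError("at least one lymph node must be examined")
    if not 0 <= positive_nodes <= retrieved_nodes:
        raise ValueError("positive nodes must lie between 0 and retrieved nodes")
    negative_nodes = retrieved_nodes - positive_nodes  # nN = rN - pN
    return math.log((positive_nodes + 0.5) / (negative_nodes + 0.5))

# Hypothetical patients given as (pN, rN); these are not study data.
for pn, rn in [(0, 12), (2, 12), (6, 12), (12, 12)]:
    print(f"pN={pn:2d}  rN={rn:2d}  LODDS={lodds(pn, rn):+.3f}")
```

The value is negative when fewer than half of the retrieved nodes are positive, zero when exactly half are positive, and positive otherwise.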

Statistical analysis
For descriptive statistics, categorical variables were summarized as absolute numbers with proportions, normally distributed continuous variables as mean and standard deviation, and non-normally distributed continuous variables as median and interquartile range (IQR). Group comparisons used the chi-square test for categorical variables, Student's t-test for normally distributed continuous variables, and the nonparametric Kruskal-Wallis rank sum test for non-normally distributed continuous variables. These descriptive statistics and exploratory comparisons were performed using the CBCgrps package [26].
For survival analysis, the Kaplan-Meier method was used to estimate and display survival rates in different patient groups, with the log-rank test used for statistical comparisons. Multivariate Cox regression models with variable selection procedures were used to explore potential risk factors associated with patient survival. In addition, the cumulative probability of CRC-specific death and multivariate regression modeling of subdistribution functions in competing risks were computed as a sensitivity analysis using the cmprsk package [27].
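The Kaplan-Meier step can be sketched in a few lines. The study itself was run in R, so the pure-Python version below is only an illustration of the product-limit estimator; the function names and the follow-up data are hypothetical.

```python
from collections import Counter

def kaplan_meier(times, events):
    """Return [(t, S(t))] at each distinct event time.

    times  : follow-up time per patient (for example, months)
    events : 1 if the death of interest was observed, 0 if censored
    """
    deaths = Counter(t for t, e in zip(times, events) if e == 1)
    exits = Counter(times)        # every patient leaves the risk set at their own time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(exits):
        d = deaths.get(t, 0)
        if d:
            survival *= 1.0 - d / at_risk   # product-limit update
            curve.append((t, survival))
        at_risk -= exits[t]                 # remove deaths and censored cases after time t
    return curve

def survival_at(curve, horizon):
    """Step-function lookup of S(horizon) on a Kaplan-Meier curve."""
    s = 1.0
    for t, value in curve:
        if t > horizon:
            break
        s = value
    return s

# Hypothetical follow-up data in months (not study data).
times = [6, 13, 21, 30, 31, 37, 38, 47, 60, 72]
events = [1, 1, 0, 1, 0, 1, 0, 0, 1, 0]
curve = kaplan_meier(times, events)
print(f"Estimated 5-year survival: {survival_at(curve, 60):.3f}")
```

Censored patients stay in the risk set up to and including their last follow-up time, which is the usual convention when a death and a censoring share the same time.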
Five-year survival rates and hazard ratios (HRs) were calculated with 95% confidence intervals (CIs).
To determine the optimal categorization of LODDS values, the X-tile technique was used to define the optimal cut-off points by the log-rank test [25]. First, X-tile divides the population into low-, medium- and high-level LODDS groups at every possible pair of cut-off points. Then, survival differences among the resulting groups are tested with the log-rank test for each division. Finally, the cut-off points yielding the highest χ2 value are selected as optimal.
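The selection rule described above can be sketched as an exhaustive search. This is not the X-tile software, only a small brute-force illustration of the same idea: the function names, the `min_group_size` parameter, and the toy data are all hypothetical, and the three-group log-rank statistic is written out by hand so the block has no dependencies.

```python
from itertools import combinations

def logrank_chi2_three_groups(times, events, groups):
    """Log-rank chi-square (2 degrees of freedom) for groups coded 0, 1, 2."""
    z = [0.0, 0.0]            # observed minus expected deaths for groups 0 and 1
    v11 = v22 = v12 = 0.0     # covariance terms for those two groups
    for t in sorted({t for t, e in zip(times, events) if e == 1}):
        at_risk = [0, 0, 0]
        deaths = [0, 0, 0]
        for ti, ei, gi in zip(times, events, groups):
            if ti >= t:
                at_risk[gi] += 1
                if ti == t and ei == 1:
                    deaths[gi] += 1
        n, d = sum(at_risk), sum(deaths)
        if n < 2:
            continue          # a single patient at risk carries no information
        scale = d * (n - d) / (n - 1)
        p0, p1 = at_risk[0] / n, at_risk[1] / n
        z[0] += deaths[0] - d * p0
        z[1] += deaths[1] - d * p1
        v11 += scale * p0 * (1 - p0)
        v22 += scale * p1 * (1 - p1)
        v12 -= scale * p0 * p1
    det = v11 * v22 - v12 * v12
    if det <= 1e-12:
        return 0.0
    # Quadratic form z' V^-1 z with the 2x2 inverse written out explicitly.
    return (z[0] ** 2 * v22 - 2 * z[0] * z[1] * v12 + z[1] ** 2 * v11) / det

def best_two_cutoffs(values, times, events, min_group_size=1):
    """Try every pair of cut-points and keep the pair with the largest chi-square."""
    candidates = sorted(set(values))[:-1]   # cutting at the maximum would empty the top group
    best = (0.0, None, None)
    for low, high in combinations(candidates, 2):
        groups = [0 if v <= low else 1 if v <= high else 2 for v in values]
        if min(groups.count(g) for g in range(3)) < min_group_size:
            continue
        chi2 = logrank_chi2_three_groups(times, events, groups)
        if chi2 > best[0]:
            best = (chi2, low, high)
    return best

# Toy data (not study data): LODDS-like values, follow-up in months, event flags.
values = [-3.2, -3.0, -2.8, -2.5, -1.5, -1.2, -0.9, -0.6, 0.2, 0.5, 1.1, 1.6]
times = [80, 90, 100, 110, 40, 45, 50, 55, 5, 8, 10, 12]
events = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 1, 1]
chi2, low, high = best_two_cutoffs(values, times, events)
print(f"best cut-offs: <= {low}, <= {high}, chi-square = {chi2:.2f}")
```

Because many candidate pairs are tested, the winning chi-square is optimistically biased, which is one reason cut-offs chosen this way benefit from validation in an independent cohort.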
Statistical analyses and graphics were produced using R (version 3.4.3, the R Foundation for Statistical Computing). Statistical comparisons were considered significant at P < 0.05.

Clinical characteristics of patients from SEER database and the international multicenter cohort
Using the defined inclusion and exclusion criteria (Supplementary Figure 1), a total of 240,898 patients with CRCs were identified from the SEER database. As shown in Table 1, the median number (IQR) of retrieved lymph nodes was 12 (7, 18) for the total patient group. The median (IQR) LODDS value for the total patient group was -2.51 (-3.3, -1.21), and the median (IQR) follow-up time was 65 (26, 116) months.
Using the same inclusion and exclusion criteria as for the SEER database, we identified 1,878 patients from the international multicenter cohort. In this cohort, the median number (IQR) of retrieved lymph nodes was 10 (5, 14) for the total patient group. The median (IQR) LODDS value for the total patient group was -2.51 (-3.3, -1.21), and the median (IQR) follow-up time was 48 (21, 75) months.
The remaining clinicopathological characteristics of the SEER database and the international multicenter cohort are shown in Table 1.

Clinical characteristics and survival rate among different novel LODDS group: derivation and validation
The X-tile analysis identified the optimal thresholds of LODDS. The grouping defined by the cut-off values -2.18 and -0.23 showed the highest χ2 value for CSS. Therefore, a novel LODDS classification was defined in this study using these cut-off points: LODDS1 (-2.18 or less), LODDS2 (more than -2.18 to -0.23) and LODDS3 (more than -0.23).
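Applied to an individual patient, these thresholds amount to a simple lookup. The sketch below is illustrative only; the function name `lodds_group` and the example patients are hypothetical, while the two cut-off values are the ones reported above.

```python
import math

LODDS_CUTOFFS = (-2.18, -0.23)   # cut-off values reported in this study

def lodds_group(positive_nodes, retrieved_nodes, cutoffs=LODDS_CUTOFFS):
    """Map (pN, rN) to LODDS1, LODDS2 or LODDS3."""
    negative_nodes = retrieved_nodes - positive_nodes
    value = math.log((positive_nodes + 0.5) / (negative_nodes + 0.5))
    if value <= cutoffs[0]:
        return "LODDS1"          # -2.18 or less
    if value <= cutoffs[1]:
        return "LODDS2"          # more than -2.18, up to -0.23
    return "LODDS3"              # more than -0.23

# Hypothetical patients given as (pN, rN); these are not study data.
for pn, rn in [(0, 12), (1, 12), (1, 20), (6, 12)]:
    print(f"pN={pn}, rN={rn}: {lodds_group(pn, rn)}")
```

In this example the two patients with a single positive node fall into different groups because 12 and 20 nodes were retrieved, which is exactly the information that pN alone ignores.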
We further explored whether the established LODDS classification could classify patients into groups with homogeneous survival. As shown in Table 3, no patients with N0 disease had a LODDS larger than -0.23. However, within the N0 subgroup, 5-year OS differed between patients with LODDS1 (72.0% in the SEER database and 77.1% in the international multicenter cohort) and patients with LODDS2 (64.6% in the SEER database and 62.9% in the international multicenter cohort). These differences are highlighted in Table 3 and show the importance of stratifying patients by the novel LODDS classification.
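The absence of N0 patients in the highest group follows arithmetically from the definition: with pN = 0 the value is ln(0.5 / (rN + 0.5)), which is largest when a single node is examined. The short check below is a sketch using the reported cut-off values of -2.18 and -0.23; the helper name `n0_lodds` is hypothetical. It also shows how many retrieved nodes an N0 patient needs to reach LODDS1.

```python
import math

LOW_CUT, HIGH_CUT = -2.18, -0.23   # cut-off values reported in this study

def n0_lodds(retrieved_nodes):
    """LODDS of a node-negative patient: ln(0.5 / (rN + 0.5))."""
    return math.log(0.5 / (retrieved_nodes + 0.5))

# Largest possible N0 value: one examined node gives ln(1/3), about -1.10.
max_n0 = n0_lodds(1)

# Smallest rN at which an N0 patient reaches LODDS1 (value <= LOW_CUT).
min_nodes_for_lodds1 = next(n for n in range(1, 100) if n0_lodds(n) <= LOW_CUT)

print(f"maximum N0 LODDS: {max_n0:.2f}")
print(f"N0 patients are LODDS1 from rN = {min_nodes_for_lodds1}")
```

Under these cut-offs, node-negative patients with one to three examined nodes fall into LODDS2 and those with four or more fall into LODDS1, so the split within N0 reflects how thoroughly the nodes were examined.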

Multivariate Cox analysis: role of novel LODDS classification in patients' survival
As shown in Figure 2, the multivariate Cox model, which included the LODDS classification and all potential risk factors, identified the LODDS classification (P < 0.001), sex (P < 0.001), race (P < 0.001), age (P < 0.001), tumor location (P < 0.001), grade (P < 0.001), histology (P < 0.001), T stage (P < 0.001), M stage (P < 0.001), and tumor size (P < 0.001) as independent prognostic factors. Similar results were obtained using multivariate Cox analysis with OS as the outcome (Supplementary Figure 2) and multivariate regression modeling of subdistribution functions in competing risks (Supplementary Table 1). As shown in Figure 3, the LODDS classification and all prognostic factors were also identified as independent prognostic factors in the international multicenter cohort.
We next analyzed whether the LODDS classification was also an independent risk factor in different subgroups using multivariate Cox regression in the SEER database and the international multicenter cohort. As shown in Figure 4A, the LODDS classification was identified as an independent risk factor in CRC with ≥ 12 examined lymph nodes, with < 12 examined lymph nodes, 7th TNM stage III, colon cancer, rectal cancer, and cancer without lymph node involvement. Similar results were obtained in the international multicenter cohort (Figure 4B).
Studies have shown that the pN category did not show a powerful prognostic impact in patients with TNM stage III colorectal cancers [15,28].

Discussion
In this study, we presented a population-based analysis of 240,898 patients in the SEER database and an international multicenter analysis of 1,878 patients from three medical centers. Our analyses showed that LODDS was a powerful prognostic factor for CSS or OS in patients with CRCs in both the SEER database and the international multicenter cohort. We identified cut-off values of -2.18 and -0.23 for the LODDS classification. We demonstrated that the novel LODDS classification had a significant prognostic impact on CSS or OS in the SEER database, and the present study was the first to validate the prognostic impact of the novel LODDS classification on survival in an independent cohort from 3 medical centers.
Although the pN classification of the AJCC TNM system is the most commonly used staging system for CRCs, it relies only on the number of positive lymph nodes and ignores the number of retrieved or negative lymph nodes, which are also associated with survival [7,13,29]. Therefore, the pN category can be regarded as accurate staging only when rN is 12 or more [30,31]. However, cases with fewer than 12 rN are not unusual in clinical practice, which has led to the development of new lymph node staging schemes that combine both pieces of lymph node information in a single variable. Among these schemes, LNR and LODDS are the most promising [13,14]. During the last decade, multiple studies have reported that the prognostic value of LNR was comparable and even superior to that of well-established prognostic factors, such as the TNM classification, in CRCs [13,15]. Unfortunately, LNR has some drawbacks: it does not provide any meaningful information in node-negative CRCs; it does not predict survival well in patients without adequate rN [13]; and it cannot discriminate survival differences among patients in whom all retrieved lymph nodes are invaded. LODDS can address these drawbacks of LNR.
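A small numerical comparison makes two of these drawbacks concrete. The sketch below is illustrative only; the helper names and the four example patients are hypothetical, with LNR taken as pN / rN and LODDS as ln((pN + 0.5) / (nN + 0.5)).

```python
import math

def lnr(pn, rn):
    """Lymph node ratio: positive nodes divided by retrieved nodes."""
    return pn / rn

def lodds(pn, rn):
    """Log odds of positive lymph nodes, with nN = rN - pN."""
    return math.log((pn + 0.5) / (rn - pn + 0.5))

# Hypothetical patients given as (pN, rN); these are not study data.
cases = {
    "node-negative, 5 examined": (0, 5),
    "node-negative, 20 examined": (0, 20),
    "all positive, 2 examined": (2, 2),
    "all positive, 10 examined": (10, 10),
}
for label, (pn, rn) in cases.items():
    print(f"{label:28s} LNR={lnr(pn, rn):.2f}  LODDS={lodds(pn, rn):+.2f}")
```

LNR is 0 for both node-negative patients and 1 for both patients with all nodes invaded, whereas LODDS decreases as more negative nodes are examined and increases as more positive nodes are found.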
To date, several groups have reported the prognostic impact of LODDS in colorectal cancer (Supplementary Table 2) [14,20,21,22,23]. Of them, 3 studies used statistical methods to calculate LODDS cut-off values for optimal discrimination. The first study [23] used running log-rank statistics with OS as the primary outcome for CRCs in a Chinese single-center cohort of 1297 patients. Our previous study [14] used the log-rank test for optimization, also with OS, in colon cancers only, in a single-center cohort of 258 patients. The last study used a regression tree technique for classification, also with OS, in CRCs in a Chinese single-center cohort of 192 patients [21]. The other 2 studies [20,22] used arbitrary cut-off values. None of the LODDS cut-off values in these studies were validated in another independent cohort. Therefore, a proper and accurate LODDS classification is still lacking. This study used the largest number of CRCs for optimization of LODDS, with the minimal P approach in the X-tile software [25], and was the first to add external validation of the resulting LODDS cut-off values in an independent international cohort. Unlike the above-mentioned studies, our study used CSS as the primary outcome instead of OS, since prognostic risk factors for CSS more truly reflect death due to CRC. In addition, we tested our cut-off values for OS and for CSS under a competing risk model, with positive results.
One reason why LODDS is superior to pN or LNR is its prognostic classification of node-negative CRC patients (pN0 or LNR0). In our study, the LODDS cut-off values classified patients with node-negative CRCs into two homogeneous groups with a significant prognostic difference, which is consistent with results in other studies [14,22]. In the future, we will compare the three lymph node schemes in predicting survival in patients with CRCs to explore in depth whether LODDS is superior to pN or LNR, and the potential mechanism. In addition, we will try to build a new TNM staging system using our LODDS classification for survival optimization of CRC patients.
Limitations of this study should be discussed. First, it was a retrospective exploratory study based on the SEER database and an independent multicenter cohort, and clinical and histological characteristics may differ among registries or hospitals. However, this is representative of real-world clinical practice. Second, the follow-up time was shorter and the number of patients smaller in the independent multicenter cohort than in the SEER database. It would be better to externally validate our findings in another large multicenter cohort or population-based registry with longer follow-up. Third, the study includes patients diagnosed between 1988 and 2010, a period during which the standard of care improved and prognosis changed. However, we performed a subgroup analysis stratified by time of diagnosis, and patients with LODDS2 or LODDS3 still had a poorer prognosis than patients with LODDS1, which indicates that time of diagnosis had limited impact on the LODDS classification for prognosis of CRCs. Despite these limitations, our study differs from other studies of LODDS classification in CRCs, of which only three [14,21,23] used a statistical method to find the optimal cut-off values and none tested whether their classification retained a significant prognostic impact in an independent validation dataset. We used the largest number of CRCs for optimization of LODDS using the minimal P approach in the X-tile software. We also used the international multicenter cohort to test whether our LODDS classification retained a significant prognostic impact in an independent validation dataset, with positive results. In the future, we will also develop a novel prediction model based on our LODDS classification.
In summary, the present study involved the largest number of CRCs used to identify the optimal thresholds of LODDS and was the first to evaluate the resulting cut-off values in an independent multicenter cohort. The LODDS classification, with cut-off values of -2.18 and -0.23, was an independent prognostic factor in patients with CRCs regardless of tumor location and number of retrieved lymph nodes, including in stage III and node-negative cancers. LODDS could improve the prognostic power of current staging systems and should be documented additionally for cancer staging of CRCs.