Auxiliary diagnosis of Lung Cancer on the basis of a Serum Protein Biomarker Panel

Objectives: In this study, we established a serum protein biomarker panel (consisting of Pro-SFTPB, CA125, Cyfra21-1, and CEA) and evaluated the feasibility and performance for the auxiliary diagnosis of lung cancer in the Chinese population. Materials and Methods: The current study was a single-center study based on the Chinese population and performed in two cohorts (training cohort and validation cohort). Serum concentrations of Pro-SFTPB, CA125, Cyfra21-1, and CEA were measured by a bead-based flow fluorescence immunoassay. The discrimination performance of the model was assessed using sensitivity, specificity, and the area under the receiver operating characteristic (ROC) curve (AUC). Results: For the biomarker panel model, the AUC was 0.88 (95% CI, 0.85-0.91) in the training cohort and 0.90 (95% CI, 0.86-0.92) in the validation data cohort, which was significantly greater than the AUC of each biomarker alone. For the nodule risk model, the AUC was improved to 0.96 (95% CI, 0.94-0.98) in the training cohort and 0.95 (95% CI, 0.93-0.97) in the validation cohort. In addition, the biomarker panel model yielded an AUC of 0.78 (95% CI, 0.74-0.81) for stage I & II lung cancer, better than the performance of individual biomarker alone. Conclusions: It was demonstrated that 4-protein biomarker panel had a significant performance in identifying lung cancer patients from healthy controls, especially combining with the nodule size. Specifically, it yielded excellent discrimination for identifying early-stage lung cancer patients than individual biomarker alone. A future large-scale study is underway to further define the clinical application of this method for the early diagnosis of lung cancer among Chinese populations.


Introduction
Lung cancer is one of the most malignant forms of human cancers, and one of the leading causes of cancer mortality worldwide [1,2]. Lung cancer is highly correlated with tobacco exposure [3], with 2.1 million new lung cancer cases and 1.8 million deaths among smokers annually [1]. Besides, its incidence among non-smokers remains a significant global health problem, especially in East Asia and most predominantly among women patients [3]. In 2015, it was estimated that there were 610,200 lung cancer-associated deaths in China [4]. Among them, more than 25% of the population were female lung cancer patients. The high lung cancer incidence rate among Chinese women, despite their low smoking prevalence, was thought to reflect increased exposures to environmental hazards, such as smoke from burning of charcoal for heating and cooking [3]. Thus, developing effective methods of early lung Ivyspring International Publisher cancer detections among both smoking and nonsmoking populations remains a critical public health task, not only in China but also in the world.
Also in China, approximately 88.2% of the lung cancer patients are asymptomatic at initial and routine annual health screening. On the other hand, once diagnosed, approximately 65.3% of the lung cancer patients were already at stage III or stage IV [5].
Several techniques, including thoracic radiography [6], sputum cytology [7], and computed tomography (CT) [8] are commonly adapted as auxiliary tools for lung cancer diagnosis. While thoracic radiography and sputum cytology have failed to provide adequate levels of sensitivity for early-stage lung cancer diagnosis, CT screening is often considered as a better imaging-based detection method and is recommended for lung cancer diagnosis among heavy smokers by the US Preventive Services Task Force (USPSTF) [9]. On the other hand, the limitations of CT scanning are also well documented, including a high false-positive rate and inability to distinguish benighted nodules versus tumorigenic nodules [10]. Thus, other auxiliary detection methods, such as biological and genetic screenings are much needed to efficiently identify lung cancer patients at early stages.
Among many of the auxiliary lung cancer detection methods, protein-based biomarker diagnostics seemed to be promising for early-stage cancer identification [11]. During the past decades, protein-based biomarker investigations have been conducted in serum, tissue, and sputum, with serum being the least invasive and hence, most desirable testing matrix [12]. Several serum biomarkers, such as carcinoembryonic antigen (CEA) [13], cytokeratin 19 fragment (Cyfra21-1) [14], tissue polypeptide antigen (TPA) [15], squamous cell carcinoma antigen (SCC) [16], stem cell factor (SCF) [17], granulocytemacrophage colony-stimulating factor (GM-CSF) [18], and vascular endothelial growth factor (VEGF) [19] had been identified to be specifically associated with lung cancer. However, none of these protein biomarkers has achieved the desired sensitivity and specificity to be warranted as an independent diagnostic prospect. In that regard, a number of multianalyte panels comprised of both circulating proteins and tumor-associated autoantibodies have been developed and yielded encouraging results [20,21]. Unfortunately, most of these panels are still in pre-clinical stages and commercial products are not available.
In the current work, our single-center study of a lung cancer biomarker panel found that 4 biomarkers in combination, including pro-surfactant protein B (Pro-SFTPB) [22,23], carbohydrate antigen 125 (CA125) [11], Cyfra21-1 [14], and CEA [13], can dramatically improve the accuracy and sensitivity of lung cancer diagnosis. Specifically, this panel exhibits superior performance for early-stage lung cancer detection than individual biomarker alone in both the training and validation cohorts. Notably, in combination with the nodule size, the detection performance was significantly improved. We hope the results of our study would serve as a promoting factor to facilitate a large-scale multiple-center clinical trial to further characterize the clinical feasibility of using this 4-protein biomarker panel for early diagnosis of lung cancer in China.

Study design
The current study was aimed to evaluate the feasibility and performance of a combination of 4-protein biomarker panel (Pro-SFTPB, CEA, CA125, and CYRFRA21-1) for the lung cancer diagnosis. Two models were established: in one, patients were assessed by the 4-biomarker panel model (biomarker panel model); in the other, patients were assessed by the model combined 4-biomarker panel with the nodule size (nodule risk model). Subsequently, scores derived from the logistic regression analysis in the training cohort were validated in a validation cohort.

Blood samples
Blood samples were collected separately from two branches at the First Affiliated Hospital of Anhui Medical University. For the training cohort, a total of 180 patients with lung cancer and 360 matched controls were enrolled from October 2019 to June 2020. For the validation cohort, a total of 135 patients with lung cancer and 289 healthy controls were enrolled from November 2019 to July 2020. Patients were included if the following criteria were met: (a) no family history of lung cancer or personal history of malignant cancer; (b) not received chemotherapy or radiotherapy; and (c) no currently known extrathoracic malignant diseases. Upon retrieval, serum samples were immediately separated and stored at -80 °C, before further processing. All clinical data, including age, sex, medical history, pathological diagnosis, and imaging findings were collected and independently entered into a secured database. This study conformed to the ethical guidelines of the Declaration of Helsinki and was approved by the Medical Ethics Committee of the First Affiliated Hospital of Anhui Medical University (No. Quick-PJ 2020-13-13).

Serum concentration analysis
To determine the serum concentrations of Pro-SFTPB, CA125, CEA, and Cyfra 21-1, a beadbased Luminex flow fluorescence capture sandwich immunoassay was performed on the MAGPIX® platform (Luminex Corporation, Austin TX). The principle was shown in Fig. 1. Briefly, 2000 coupled beads were incubated with 20 μl of a serum sample for 1 hour, washed, and incubated with 50 μl of the detection antibody (4 μg/ml) for 30 minutes. The beads were washed again and incubated with 50 μl streptavidin-phycoerythrin (SA-PE, 4 μg/ml) for 15 minutes, followed by suspension in 125 μl of assay buffer and examination on the MAGPIX® instrument according to manufacturer's protocol. All experiments were carried out in duplicates, at room temperature, and protected from light. Calibration curves were established using 8 calibrators in a 2-fold dilution series. The coefficients of variation (CVs) within and between replicates were, 4.55% and 7.62% for Pro-SFTPB, 5.68% and 8.54% for CA125, 3.54% and 8.32% for CEA, 5.62% and 8.81% for Cyfra 21-1, respectively.

Nodule assessment
The low-dosage chest CT examination was performed using the multiple contiguous sequential axial imaging procedure through the thorax. The nodule size was analyzed with the PneuView system (Myrian, Paris, France). The maximum dimension on axial CT images was recorded. Histological diagnosis was performed by at least two independent histologists. The clinical staging was determined according to the criteria of the 2004 World Health Organization classification of lung tumors [24], and the TNM staging criteria in the seventh edition of the American Joint Committee on Cancer staging manual [25].

Statistical analysis
In this study, the averaged data were presented as mean ± the standard deviation (SD). Two predictive models, the biomarker panel model, and nodule risk model were initially developed within the training cohort. Then, their predictive performances, including discrimination (ability to classify) and calibration (ability to compare between predicted and observed probabilities) were further evaluated using the validation cohort. Model discrimination was performed using the receiver operating characteristic (ROC) analysis for lung cancer risk, including area under the ROC curve (AUC), sensitivity, and specificity. Statistical analysis was performed using the unpaired two-tail Student t-test (for age, nodule size, and serum level) and chi-square test (for sex and smoking status) on the MedCalc software (Version 19.4.1, MedCalc, USA) and the SPSS statistical software (version 22.0, IBM, USA). P< 0.05 was considered statistically significant.

Subject characteristics
In the training cohort, a total of 180 patients with lung cancer (case group) and 360 controls (control group) were enrolled, including 112 non-smokers in the cases group and 233 non-smokers in the control group. The number of cases identified at TNM stage I& II and stage III & IV were 83 and 85, respectively. No significant difference was found in age and sex between the case and control group. The mean (±SD) nodule size was significantly larger for cases (2.3±2.0 cm) than that of controls (0.4±0.2 cm) (P<0.001). Compared with the cases, the smoking status was significantly different in the controls (P<0.001). The detailed subject characteristics for the training cohort are listed in Table 1.
In the validation cohort, a total of 135 patients with lung cancer and 289 controls were enrolled, including 83 non-smokers in the cases group and 195 non-smokers in the control group. In the case group, there were 55 subjects at early stage (I&II) and 69 subjects at advanced stage (III/IV). No significant difference was found in sex between the case and control group, while the mean (±SD) age of cases was older than that of controls (62.2±10.8 vs. 58.0±7.7, P<0.001). From November 2019 to July 2020, there were not enough older subjects enrolled in the control group. Besides, there was a significant difference in terms of nodule size and smoking status between the case and control group (P<0.001). The detailed subject characteristics for the validation cohort are listed in Table 1.

Serum biomarker levels
A bead-based Luminex flow fluorescence capture sandwich immunoassay was used to detect the serum protein concentrations of Pro-SFTPB, CA125, Cyfra21-1, and CEA in the training cohort and validation cohort. And the detection results were shown in Table 2. It was demonstrated that, in the training cohort, the concentrations of Pro-SFTPB, CA125, Cyfra21-1, and CEA were significantly upregulated in the case group than in the control group (Fig. 2, P<0.001). In lung cancer patients, no correlation was found between 4 biomarker levels and age, sex (data not shown), smoking status, or nodule size ( Table 2). Furthermore, serum levels of the 4 protein biomarkers in advanced stage (stage III&IV) cases were markedly higher than those in the early stage (stage I-II) (Table 2, P<0.001). The level of Pro-SFTPB, CA125, and Cyfra21-1 were higher in patients with squamous cell carcinoma (SCC) than those with adenocarcinoma (ADC) and small cell cancers (SCLC). On the contrary, a lower level of CEA was found in patients with SCC ( Table 2). Similar results were observed in the validation cohort ( Table  2).

Diagnosis performance of biomarker panel model
Serum levels of 4 biomarkers (Pro-SFTPB, CA125, Cyfra21-1, and CEA) were combined and produced a score by the logistic regression algorithm [ Table 3, the logistic regression equation was logit (P) = -3.69+1.45×ln (Pro-SFTPB) +2.28×ln (CA125) +1.38×ln (Cyfra21-1)-0.32×ln (CEA)]. In addition, age, sex, smoking status, and medical history were not significant variables in the logistic regression (data not shown). The sensitivity and specificity for the 4-protein biomarker panel were calculated. In the training cohort, the sensitivity of the 4-protein biomarker panel was 57.8% at 95% specificity and 54.8% at 99% specificity. Similarly, the sensitivity of the 4-biomarker panel was 60.7% at the 95% specificity and 47.4% at the 99% specificity in the validation cohort (Table 4). Furthermore, the performance of individual biomarker, as well as in combination, in discerning lung cancer patients from healthy controls was assessed by the ROC analysis. It was demonstrated that the diagnostic performance of the biomarker panel was AUC=0.88 (95% CI, 0.85-0.91), significantly greater than that of each biomarker alone [Fig. 3A  Moreover, there were a similar AUC magnitude between smokers and nonsmokers in both training and validation cohorts, owing to a lower proportion of smokers in the lung cancer patients (data not shown). Notably, we also assessed the diagnosis performance of the individual biomarker and biomarker panel for the small malignant nodules (maximum diameter of nodule <1 cm). As shown in Fig. 5, the biomarker panel model yielded an AUC of 0.89 (95% CI, 0.82-0.96) with a 70.7% sensitivity at 95% specificity in the training cohort (Fig. 5A), as well as an AUC of 0.91 (95% CI, 0.86-0.97) with a 65.7% sensitivity at 95% specificity in the validation cohort (Fig. 5B), showing the potential candidate markers for distinguishing the small malignant pulmonary nodules from normal individuals.

Diagnosis performance of nodule risk model
The nodule risk model was developed by combining the biomarker panel with the nodule size. Logistic regression algorithm was performed to assess the detection performance of the nodule risk model [ Table 5, logistic regression equation was logit (P) = -10.10+1.12×ln (Pro-SFTPB) +3.60×ln (CA125)+ 0.036×ln (Cyfra21-1)+0.48×ln (CEA) +7.63×(Nodule diameter)]. In the training cohort, the sensitivity of the nodule risk model for detecting lung cancer was 86.4% at the 95% specificity and 73.9% at the 99% specificity, which was validated in the validation cohort ( Table 4). As compared with the biomarker panel model, the nodule risk model had significantly higher diagnostic accuracy for distinguishing lung cancer from normal individuals in either the training cohort [ Fig. 3A

Discussion
In recent decades, lung cancer has quickly become one of the most predominant and malignant cancer forms in China [1,26]. Though tremendous advances have been made in terms of targeted therapy and immunotherapy for treating lung cancer patients, efficient clinical modules of early detection or prevention, which may bear significant social and economic benefits, are substantially lacking [3].
Previous studies had demonstrated that serum biomarkers of Pro-SFTPB, CA125, Cyfra21-1, and CEA performed well on an individual basis in the discrimination of lung cancer from healthy controls [11,22]. Particularly, pro-SFTPB showed excellent performance with high AUC values for discerning various subtypes of lung cancers [22,23]. In a recent study, a serum biomarker panel including Pro-SFTPB, CA125, Cyfra21-1, and CEA was demonstrated to be efficiently predicting the short-term (6~12 months) lung cancer risk in an ever-smoking patient population [11]. Thus, the goal of our study was to determine whether this 4-protein serum biomarker panel may have improved performance, as compared with individual biomarker, for detecting lung cancer in Chinese patients. To do so, two models, namely the biomarker panel model and nodule risk model, were independently developed (in the training cohort) and validated (in the validation cohort) for the diagnosis of lung cancer. Our results showed that the performance of the biomarker panel was significantly better than the individual marker in discerning lung cancer patients from a healthy subject. In addition, the sensitivity, specificity, and AUC were almost consistently higher in the nodule risk model than those in the biomarker panel model, in line with an early study showing additional nodule assessment significantly improved the performance of serum protein biomarker panel to identify high-risk lung cancer patients [5].
The most important finding of this study was that we discovered biomarker panel had significantly better AUC than those of individual markers alone for detecting early-stage cancer patients in both training and validation cohorts, thus implicating a clinical potential for lung cancer early diagnosis. Compared with the previous study based on the western population [11], the diagnosis performance of the 4-protein biomarker panel in our study was slightly better (AUC=0.88 in our study vs. AUC=0.80 in the previous study), especially for detecting the early-stage cancer patients (AUC of 0.78 in our study vs. AUC=0.68 in the previous study). It is worth noting that, in our study, early-stage cancer patients represented nearly half of all cases evaluated. Though we do not believe it is the factor to confound the conclusion of this study, future clinical trials with a larger number of participants and heterogeneous sample populations would further refine and develop the 4-protein biomarker panel to be suitable for early lung cancer diagnosis.
In addition to the excellent discrimination for identifying early-stage lung cancer patients, 4-protein biomarker panel was demonstrated to be effectively diagnosing the small malignant pulmonary nodules from normal individuals (AUC=0.89 in the training cohort and AUC=0.90 in the validation cohort). A recent study assessed the performance of the 4-protein biomarker panel in distinguishing benign from malignant pulmonary nodules [27]. The performance of the 4-protein biomarker panel for the small nodules (nodule size<1cm) (AUC=0.70) was lower than that in our study (AUC=0.89 in the training cohort and AUC=0.90 in the validation cohort). However, in a small subset (contained 12 cases and 15 controls with nodule size ≤ 6 mm), the 4-protein biomarker panel performed well, with an AUC of 0.95 [27]. As our sample size of small nodules subgroup (36 cases in the training cohort and 30 cases in the validation cohort) was not large enough, the findings in the current study should be further validated in a larger population.
In comparison to a previous study showing 4 protein biomarkers that can predict short-term lung cancer risk [11], the major difference in our study was that we did not find any individual protein biomarker or prediction model to specifically identify lung cancer patients among current or ever smokers. A major contributing factor, as we noted, maybe that non-smoking lung cancer patients comprised a great proportion (> 60%) in our training and validation sets. However, based on our analysis of individual biomarker and biomarker panel for smoking history, the altered distribution does not appear to account for the difference in performance. Currently, a large-scale lung cancer screening study, based on a 4-protein biomarker panel model, has been launched and we are expending the enrollment to allow us to more precisely analyze the smoking and non-smoking participants within various lung cancer sub-categories. We expect it to lead us to identify more efficient early diagnosis modules to benefit both smoking and non-smoking lung cancer patients.