Proteomic profiling reveals a signature for optimizing prognostic prediction in Colon Cancer

Previous studies developed prognostic signatures largely depended on transcriptome profiles. The purpose of our present study was to develop a proteomic signature to optimize the evaluation of prognosis of colon cancer patients. The proteomic data of colon cancer patient cohorts were downloaded from The Cancer Proteome Atlas (TCPA). Patients were randomized 3:2 to train set and internal validation set. Univariate Cox regression and lasso Cox regression analysis were performed to identify the prognostic proteins. A four-protein signature was developed to divide patients into a high-risk group and low-risk group with significantly different survival outcomes in both train set and internal validation set. Time-dependent receiver-operating characteristic at 1 year demonstrated that the proteomic signature presented more prognostic accuracy [area under curve (AUC = 0.704)] than the American Joint Commission on Cancer tumor-node-metastasis (AJCC-TNM) staging system (AUC = 0.681) in entire set. In conclusion, we developed a proteomic signature which can improve prognostic accuracy of patients with colon cancer and optimize the therapeutic and follow-up strategies.


Introduction
Colon cancer is one of the most common malignancies worldwide [1]. Radical surgery alone or combined with adjuvant chemotherapy is the standard regimen for management of colon cancer without distant metastasis. However, about 25%-40% patients will suffer recurrence and metastasis after receiving standardized treatment [2]. The early detection and management of relapse contribute to improve prognosis. To date, prognostic prediction is largely depended on the tumor, lymph node, metastasis (TNM) staging system [3]. On this basis, novel prognostic models have been developed to improve prognostic prediction and optimize the therapeutic and follow-up strategies using clinicopathologic and genetic factors [4,5]. Notably, with the advance of genome-sequencing technologies, gene signatures at mRNA level presented an excellent prediction of colon cancer prognosis [6][7][8]. However, only limited studies developed signatures at the level of protein to guide patients' prognostic stratification [9,10].
Mass spectrometry-based proteomics can detect global protein abundance and post-translational modifications and provide comprehensive biological perspectives, which could not be replaced by genomic analysis alone [11,12]. In the present study, we identified robust prognostic proteins and constructed a proteomic classifier in colon cancer using The Cancer Proteome Atlas (TCPA) database. To our best knowledge, the proteomic signatures have not been reported in colon cancer previously.

Materials and Methods
The proteomic data (Level 4) of colon cancer patient cohorts (COAD) were downloaded from The Cancer Proteome Atlas (TCPA) (https://www.

Development and validation of a proteomic signature
Firstly, patients were randomized 3:2 to train series and internal validation series. Univariate Cox regression analysis was performed to identify the biomarkers for prognosis in train set. Proteins with significant differences (p<0.05) were selected for LASSO Cox regression model. Finally, a multivariate Cox regression analysis was conducted to construct a multi-protein-based classifier for prognostic prediction of colon cancer patients. According to specific risk score formula, patients were divided into high-risk and low-risk groups with significantly different survival outcomes using the median value of the train series as the cutoff point.

Statistical analysis
Kaplan-Meier curve was depicted to compare survival differences between high-risk group and low-risk group. Multivariate Cox regression analysis was conducted to identify independent prognostic factors. Receiver-operating characteristic (ROC) curve was plotted to evaluate the prognostic or predictive accuracy of the proteomic signature and clinicopathological factors. All statistical analyses were performed with R (version 3.6.1, https://www. r-project.org/).

Development of a proteomic signature from the train set
A total of 315 colon cancer patients with complete proteomic profiling and survival data were included in our study. Patients were randomized 3:2 to train set and internal validation set. Twenty-five robust prognostic proteins were identified using univariate Cox regression analysis ( Figure 1A). Lasso Cox regression and stepwise multivariate Cox regression were performed to construct a proteomic signature. Finally, a four-protein signature was developed and forest plot was presented ( Figure 1B). The risk score = (0.834071775625132 × expression level of EGFR) + (0.471960975428313 × expression level of IGFBP2) + (-0.810781951083818 × expression level of SRC) + (-0.563796255046605 × expression level of SRC_pY527). Kaplan-Meier curves were plotted for each protein using the median value of the protein as the cutoff point. High expression of EGFR or IGFBP2 was associated with poor prognosis while high expression of SRC or SRC_pY527 predicted superior survival in colon cancer ( Figure 2).

The prognostic value of the four-protein signature in train set and internal validation set
Patients were divided into a low-risk group and high-risk group using the median risk score as the cutoff value. Patients with higher risk scores had a worse prognosis as compared to those with lower risk scores in train set, internal validation set and entire set ( Figure 3). Stratified analysis revealed that the four-protein signature still had prognostic values in stage I+II, stage III+IV, lymph node positive and lymph node negative subgroups ( Figure 4). The distribution of the proteomic risk score, the survival status of patients and heatmap of the proteomic expression profiles were also presented (Figure S1, S2 and S3).

Independence and accuracy of the proteomic signature in predicting prognosis
Multivariate analysis showed that our proteomic signature remained an independent prognostic factor in entire set (Table 1). Clinicopathological characteristics of colon cancer patients in TCPA database were detailed in Table S1. Additionally, we performed ROC analysis to compare the sensitivity and specificity of prognostic prediction among proteomic signature, single protein, age, gender, T stage, N stage and AJCC stage. Time-dependent receiver-operating characteristic at 1 year demonstrated that the proteomic signature presented more prognostic accuracy [area under curve (AUC = 0.704)] than the American Joint Commission on Cancer tumor-node-metastasis (AJCC-TNM) staging system (AUC = 0.681) in entire set ( Figure 5A). The AUC values in train and internal validation set were also presented and compared ( Figure 5B, 5C).

Constructing protein co-expression networks
To identify what proteins were significantly associated with the expression of EGFR, IGFBP2, SRC and SRC_pY527 (R>0.2 or R<-0.2, p<0.05), sankey diagram was plotted ( Figure 6). EGFR had more co-expressive proteins than the others.

Discussion
Integrated transcriptome profiling of colon cancer has increased our knowledge of molecular features relevant to carcinogenesis. A number of studies developed and validated multigene signatures to predict prognosis based on global mRNAs profiling. Recently, global proteomic data which provided novel insights into the comprehensive understanding of cancers, have become focus of attention [13]. Compared with a single proteomic biomarker, the combination of the prognostic proteins may have better predictive efficacy.
In this study, we established a novel proteomic signature (including EGFR, IGFBP2, SRC and SRC_pY527) for prognostic prediction of colon cancer using TCPA database. The survival curves revealed a significant separation between low-risk and high-risk patients in both training set and internal validation set. Stratified by AJCC stage and lymph node status, the proteomic signature remained an excellent prognostic model. Time-dependent ROC at 1 year demonstrated that our signature had the most significant accuracy in predicting prognosis as compared to other indicators, indicating that the risk model developed from the four proteins could be a useful tool for colon cancer survival prediction.
Epidermal growth factor receptor (EGFR), a member of the subclass I of the receptor tyrosine kinase super-family, is overexpressed in 49% to 82% of colorectal cancer [14][15][16]. EGFR is one of the most promising targets for the management of metastatic colorectal cancer. However, EGFR testing for colorectal cancer patients has no predictive value of response to EGFR inhibitors [17,18]. The RAS/RAF/ MAPK pathway is downstream of EGFR. Evidence indicated that RAS and BRAF status had predictive value of response to cetuximab or panitumumab therapy [19][20][21].
Insulin-like growth factor-binding protein 2 (IGFBP2) is a member of the IGFBP family that bind IGFs with high affinity [22]. Previous studies suggested that IGFBP2 expression was upregulated in multiple tumors [23][24][25]. Recently, Liu et al reported that IGFBP2 promoted vasculogenic mimicry formation via targeting CD144 and MMP2 expression in glioma [26]. Gao et al indicated that IGFBP2 could drive epithelial-mesenchymal transition and invasive character by activating the NF-κB pathway in pancreatic ductal adenocarcinoma [27]. Nevertheless, the function and mechanisms of IGFBP2 in colon cancer remains unclear. Our study revealed that high expression IGFBP2 indicated poor prognosis.
Proto-oncogene tyrosine-protein kinase Src contains an SH3 domain, an SH2 domain, a proteintyrosine kinase domain, and a regulatory tail and participates in multiple biological processes [28]. Src could be activated by upstream signaling pathways to form phospho-Src (p-Src), and p-Src could activate downstream signaling pathways by phosphorylating the target proteins [29,30]. Recent studies suggested that Src family kinases were involved in carcinogenesis. Hu et al demonstrated that the expression of Src and p-Src was significantly upregulated in osteochondroma and could be used as robust indicators to predict prognosis [31]. Singh et al found that Src and p-Src could promote colon cancer invasion and metastasis [32]. Intriguingly, our study indicated that high expression of SRC and SRC_pY527 was associated with superior prognosis in colon cancer patients. Further precise regulation mechanisms of SRC and SRC_pY527 in colon cancer are needed to be explored.
Finally, we also identified several proteins which significantly associated with the expression of EGFR, IGFBP2, SRC and SRC_pY527. These proteins should also be further explored. Several limitations are needed to address in our present study. Firstly, only over two hundred proteins were identified in TCPA database, the information about a lot of critical proteins were missing. Secondly, the lack of external validation resulted in limited clinical value of our signature. Further research regarding its external validation and clinical utility are needed. Lastly, molecular biology experiments are necessary for clarifying the underlying molecular mechanism of our proteomic signature.
In conclusion, our study established a novel proteomic signature for improving prognostic prediction in colon cancer, which may assist to develop individual therapeutic and follow-up strategies.

Supplementary Material
Supplementary figures. http://www.jcancer.org/v12p2199s1.pdf design, data collection, and analysis, decision to publish, or preparation of the manuscript.

Ethics approval and consent to participate
This study was approved by the Ethical Committee and Institutional Review Board of the FUSCC. This study was performed based on the publicly available TCPA/TCGA database. We did not use personal identifying information. The informed consent was not necessary for this study.

Author Contribution Statements
XL, YM and RW contributed to conception and design. ZS and DL improved the study design and contributed to the interpretation of results. YM collected the data. SC performed data processing and statistical analysis. DL and ZS wrote the manuscript. YM revised the manuscript. All authors approved the final version.

Data Statement
The datasets included in our current study are available in the TCGA-COAD (https://portal.gdc. cancer.gov) and The Cancer Proteome Atlas (https://tcpaportal.org/).