Construction and Validation of Protein Expression-related Prognostic Models in Clear Cell Renal Cell Carcinoma

Objective: To construct a prognostic evaluation model for patients with clear cell renal cell carcinoma (ccRCC) using bioinformatics methods and to screen potential drugs for ccRCC. Methods: ccRCC RNA sequencing data, clinical data, and protein expression data were downloaded from the TCGA database. Univariate Cox and Lasso regression analyses were performed on the combined data to screen for proteins related to prognosis, which were then included in a multivariate Cox proportional hazards model. The patients were divided into high- and low-risk groups for a survival difference analysis. The predictive power of the model was evaluated on the basis of overall survival, progression-free survival, independent prognostic, clinically relevant receiver operating characteristic (ROC) curve, C-index, principal component, and clinical data statistics analyses. GSEA enrichment and immune function correlation analyses were performed. The samples were divided into different subtypes based on the expression of the risk proteins, and survival analysis of the subtypes was performed. The risk-related protein and RNA sequencing data were analyzed to screen for drugs whose sensitivity differed significantly between the high- and low-risk groups. Results: A total of 469 ccRCC-related proteins were screened, of which 13 proteins with independent prognostic significance were identified by univariate Cox, Lasso, and multivariate Cox regression analyses and used to construct the prognostic model. The sensitivity and accuracy of the model in predicting the survival of patients with ccRCC were high (AUC at 1 year: 0.811, 3 years: 0.783, 5 years: 0.777). The 13 proteins were closely related to immunity, and the expression of the model proteins differed between kidney and tumor tissues according to the HPA database. The samples were divided into three subtypes, which differed clearly in grade and in T, N, and M stages. According to the IC50 values, the high-risk group was more sensitive to CGP-60474, vinorelbine, doxorubicin, etoposide, FTI-277, JQ12, OSU-03012, pyrimethamine, and other drugs. Conclusions: A prognostic model based on protein expression in ccRCC was successfully constructed and showed good predictive ability for the prognosis of ccRCC patients. The ccRCC-related proteins in the model can be used as targets for studying pathogenesis and targeted therapy.


Introduction
Renal cell carcinoma (RCC), also known as renal adenocarcinoma or renal carcinoma, originates from the urinary tubular epithelial system of the renal parenchyma and accounts for 80-90% of malignant renal tumors [1]. Clear cell renal cell carcinoma (ccRCC) accounts for about 70-80% of renal cancers. More patients with renal cell carcinoma have been detected clinically with continuous improvement in diagnosis, increasing awareness of medical care, and the gradual extension of the national average life expectancy [2,3]. The incidence of renal cell carcinoma in China has shown a gradual upward trend [4].
The main treatment for ccRCC is surgery [5]. However, because the symptoms of early ccRCC are not obvious, the disease is often already advanced or metastatic at the time of diagnosis, and the opportunity for surgery is missed [6][7][8]. The effect of radiotherapy and chemotherapy is poor for these patients. Compared with traditional cytokine therapy, molecular targeted therapy can significantly improve the objective response rate of patients with metastatic ccRCC and prolong progression-free survival (PFS) and overall survival (OS) [9][10][11]. However, the response time to treatment and survival benefits vary greatly among patients [8].
With the development of sequencing technology, many studies have shown that a variety of mRNA, miRNA, lncRNA, ceRNA, and other genetic markers have good predictive ability for the prognosis of ccRCC; however, specific and sensitive biomarkers for diagnosis and treatment are still lacking. Protein expression plays a key role in different stages of tumorigenesis, but no protein prognostic model has been studied in ccRCC. Protein-level research is more advantageous than RNA-level research for clinical application; therefore, this study proposed using protein levels to analyze ccRCC. To provide reference data for the clinical diagnosis, treatment, and prognosis of ccRCC, a multi-protein prognostic model was constructed from the perspective of protein expression.

Data download and sample collation
ccRCC RNA sequencing, clinical, and protein expression data were downloaded from the TCGA database (https://portal.gdc.cancer.gov). The control and patient clinical data included age, gender, survival time and survival status, tumor classification, and T, N, and M stages. Perl software was used for data collation, which consisted of identifying cases with complete clinicopathological information and converting the IDs of the RNA sequencing data.

Construction of the prognostic model
The limma and impute packages in R-studio software (The R Foundation for Statistical Computing, Vienna, Austria) were used to merge the protein expression data with the clinical data (survival time and survival status). The survival, caret, glmnet, survminer, and timeROC packages were used to perform the univariate Cox regression analysis (filtering criterion: P < 0.05) on the combined data to screen for proteins related to the prognosis of patients with ccRCC. Lasso regression analysis was performed to reduce overfitting and to screen the key proteins. The Lasso regression used cross-validation to select the parameters, and the Lasso regression coefficient spectrum was drawn. Finally, multivariate Cox regression analysis was performed to establish the risk protein model for the prognosis of ccRCC, which was displayed in the form of a nomogram. The risk score equation of the model was: $\text{Risk score} = \sum_{i} \text{coefficient}_i \times \text{EXP}(\text{protein})_i$, where $\text{coefficient}_i$ is the multivariate Cox regression coefficient of protein $i$ and $\text{EXP}(\text{protein})_i$ is its expression level. The data were randomly divided into two groups (training and test groups), and the samples in each group were divided into high- and low-risk groups according to the median risk score.
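The risk score equation and the median split described above can be illustrated with a minimal Python sketch. This is not the authors' R code: the protein names, coefficients, and expression values below are hypothetical placeholders, and assigning patients whose score equals the median to the low-risk group is an assumption, since the text does not state how such ties were handled.

```python
from statistics import median


def risk_scores(expression, coefficients):
    """Return sum_i coefficient_i * expression_i for every patient.

    expression:   dict patient_id -> dict protein -> expression value
    coefficients: dict protein -> multivariate Cox coefficient
    """
    return {
        patient: sum(coef * values[protein] for protein, coef in coefficients.items())
        for patient, values in expression.items()
    }


def split_by_median(scores):
    """Label each patient 'high' or 'low' relative to the median risk score.

    Assumption: a score equal to the median is labelled 'low'.
    """
    cutoff = median(scores.values())
    groups = {p: ("high" if s > cutoff else "low") for p, s in scores.items()}
    return groups, cutoff


# Hypothetical coefficients and expression values, for illustration only.
coefficients = {"PROTEIN_A": 0.5, "PROTEIN_B": -0.25}
expression = {
    "P1": {"PROTEIN_A": 2.0, "PROTEIN_B": 0.0},
    "P2": {"PROTEIN_A": 0.0, "PROTEIN_B": 2.0},
    "P3": {"PROTEIN_A": 1.0, "PROTEIN_B": 1.0},
    "P4": {"PROTEIN_A": 4.0, "PROTEIN_B": 4.0},
}

scores = risk_scores(expression, coefficients)
groups, cutoff = split_by_median(scores)
print(scores)
print(cutoff, groups)
```

In this toy example a positive coefficient raises the score with increasing expression (a high-risk protein), while a negative coefficient lowers it (a low-risk protein).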

Evaluation and clinical value of the prognostic model
R-studio software was used for the statistical analysis of the training group and the test group. The dplyr, ggplot2, and ggrepel packages were used to analyze the results of the univariate Cox analysis (significance was defined as P < 0.05; HR > 1 indicated high risk and HR < 1 indicated low risk), and a volcano plot was drawn. The corrplot, circlize, ggalluvial, ggplot2, and dplyr packages were used for the co-expression analysis of the risk protein data to clarify the correlations among the proteins in the model and the co-expression relationships between the proteins in the model and other proteins (the Cor threshold was set to 0.4). Principal component analysis (PCA) was performed using the limma and scatterplot3d packages to verify whether the proteins used to construct the model could distinguish patients in the high- and low-risk groups. The survival and survminer packages were used to analyze OS and PFS in the risk protein data. According to the expression level of each protein in the model, the samples were divided into high- and low-expression groups, and survival analysis was performed for each model protein. The survival, survminer, and timeROC packages were used to perform the independent prognostic analysis of the risk protein and clinically relevant data, and a receiver operating characteristic (ROC) curve analysis was performed. The dplyr, survival, rms, and pec packages were used to perform the C-index analysis of the risk protein data and the clinically relevant data. The regplot, survival, and rms packages were used to draw a nomogram combining the risk protein data and the clinically relevant data to predict the survival of patients with ccRCC. The survival, survminer, limma, and ggpubr packages were used to validate the model in clinical subgroups and to perform the clinical correlation analysis of the risk protein data; the clinically relevant data were used to verify whether the constructed model was suitable for patients in the different clinical groups. The proteins and risk scores in the model were analyzed to detect significant differences between the clinical groups.
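As an illustration of what the C-index measures, the following Python sketch computes Harrell's concordance index for right-censored data. It is a simplified stand-in for the R packages named above, not their implementation: pairs with tied survival times are simply skipped, and the example data are hypothetical.

```python
def concordance_index(times, events, risk):
    """Harrell's C for right-censored data.

    times:  follow-up time of each patient
    events: 1 if death was observed, 0 if censored
    risk:   predicted risk score (higher means worse prognosis)

    A pair (i, j) is comparable when patient i has an observed event
    strictly before patient j's follow-up time. The pair is concordant
    when the patient who died earlier has the higher risk score; tied
    risk scores count as half. Simplification: tied times are skipped.
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if events[i] != 1:
            continue
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical follow-up data for four patients.
times = [2, 4, 6, 8]
events = [1, 1, 0, 1]
risk = [0.9, 0.3, 0.5, 0.1]
c_index = concordance_index(times, events, risk)
print(round(c_index, 3))  # 0.8: four of the five comparable pairs are concordant
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk.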

Enrichment analysis of the risk proteins
The limma, clusterProfiler, org.Hs.eg.db, and enrichplot packages were used to perform the GO and KEGG enrichment analyses on the RNA sequencing and risk protein data. The results were visualized, and the top five enriched pathways in the high- and low-risk groups were plotted.

Immune-related functional analysis of the risk proteins
The limma package in R-studio software was used to sort the RNA sequencing data, and CIBERSORT was run to obtain the immune cell infiltration results. The ggpubr and limma packages were used to analyze the differences in immune cell infiltration using the risk protein data. Furthermore, a radar map was drawn using the fmsb package. The National Library of Medicine (https://www.ncbi.nlm.nih.gov) was accessed to identify the standard names of the proteins in the model, and the model proteins were analyzed by immunohistochemistry using the HPA database. The RNA sequencing data were uploaded to the TIDE database (http://tide.dfci.harvard.edu/) to obtain the TIDE scores, and the limma and ggpubr packages were then used with the risk protein data to determine whether the TIDE score differed between the high- and low-risk groups.

Survival analysis and clinical correlation analysis of the sample types
The limma and ConsensusClusterPlus packages were used to classify the samples according to the expression of the model proteins, and the samples were divided into different subtypes. The survival and survminer packages were used to analyze the survival of the subtypes to determine whether there was a difference in survival time between the subtypes. Finally, the ggplot2 package was used to analyze the clinical correlation between the subtypes and the clinical data (age, gender, grade, and T, N, and M stages).
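The survival comparison between subtypes rests on survival curves estimated for each group. The Python sketch below shows a Kaplan-Meier estimate for a single group; it is a simplified illustration rather than the code of the R survival package, and the follow-up data are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier estimate of the survival function.

    times:  follow-up time of each patient
    events: 1 if death was observed, 0 if censored

    Returns a list of (t, S(t)) pairs, one per distinct event time.
    At each event time t, S is multiplied by (1 - deaths / at_risk),
    where at_risk counts patients still under observation at t.
    """
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    survival = 1.0
    curve = []
    for t in event_times:
        at_risk = sum(1 for u in times if u >= t)
        deaths = sum(1 for u, e in zip(times, events) if u == t and e == 1)
        survival *= 1.0 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical follow-up data for one subtype (times in years).
times = [1, 2, 2, 3, 4]
events = [1, 1, 0, 1, 0]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(t, round(s, 3))
```

Running the same function on each subtype gives the curves that a log-rank test then compares; the censored patient at time 2 still counts as at risk at that time but does not reduce the survival estimate.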

Screening of potential drugs for ccRCC
The limma, ggpubr, pRRophetic, and ggplot2 packages were used to analyze the RNA sequencing and risk protein data, and the drugs with significant differences between the high- and low-risk groups were screened. The screening criterion was P < 0.001.
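The text does not state which statistical test was used to compare predicted IC50 values between the two risk groups. As one plausible illustration, the Python sketch below applies a two-sided Mann-Whitney U (Wilcoxon rank-sum) test with a normal approximation and then applies the P < 0.001 criterion. The choice of test, the omission of a tie correction in the variance, and the IC50 values are all assumptions made for this example.

```python
import math


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test using the normal approximation.

    Returns (U statistic for x, p-value). Tied values receive average
    ranks; the variance is not corrected for ties (simplification).
    """
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        average_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[k] = average_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    rank_sum_x = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12.0)
    z = (u_x - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2.0))
    return u_x, p_value


# Hypothetical predicted IC50 values; a lower IC50 means higher sensitivity.
ic50_high_risk = [1.0 + 0.1 * i for i in range(10)]
ic50_low_risk = [3.0 + 0.1 * i for i in range(10)]

u_stat, p_value = mann_whitney_u(ic50_high_risk, ic50_low_risk)
is_candidate = p_value < 0.001  # screening criterion from the text
print(u_stat, p_value, is_candidate)
```

In this toy example every high-risk IC50 is lower than every low-risk IC50, so the drug would pass the screen and the high-risk group would be reported as the more sensitive one.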

Construction of the prognostic model
R-studio software was used to combine the protein data related to ccRCC with the clinical data (survival time and survival status). First, 178 proteins related to the prognosis of patients with ccRCC were screened by univariate Cox regression analysis (P < 0.05), and Lasso regression analysis was used to reduce overfitting. Nineteen proteins that were more valuable for the prognosis of ccRCC patients were retained (Fig. 1 and Supplementary Table S1). Finally, 13 proteins valuable for the prognosis were screened out through the multivariate Cox regression analysis (Table 1), and the prognostic model was constructed according to the risk scores of the 13 proteins. The samples were randomly divided into the training and test groups, and the samples in each group were further divided into high- and low-risk groups according to the median value of the risk score.

Evaluation and clinical value of the prognostic models
R-studio software was used for the clinical statistical analysis of the training and test groups. The results revealed no significant differences in any of the clinical traits between the training group and the test group (P > 0.05), indicating that random grouping of the samples did not bias the clinical traits (Table 2). Co-expression analysis was performed, and a co-expression circle diagram was drawn to determine the correlation between each protein in the model. The correlation between the model proteins and other proteins was also determined (Fig. 2 and Supplementary Table S2). PCA based on the model proteins separated the high- and low-risk groups, indicating that patients in the two groups could be distinguished by the proteins in the model (Fig. 3). The OS and PFS analyses of the risk protein data showed significant differences in OS and PFS between the high- and low-risk groups in both the training and test groups. The OS and PFS of patients in the high-risk group were shorter than those in the low-risk group (Fig. 4). The grouping into high- and low-risk groups (by the median risk score) was visualized with a risk curve for the training and test groups. The survival status plots and heatmaps showed that the number of deaths increased with the risk score. MITF, IGFBP2, ACC1, RRM2, PEA15, and 4EBP1_pT37T46 were high-risk proteins in the training and test groups. SHP2_pY542, AMPKALPHA_pT172, P70S6K_pT389, MAPK_pT202Y204, UGT1A, BRAF_pS445, and Vinculin were identified as low-risk proteins (Fig. 5). The univariate and multivariate Cox analyses suggested that the constructed model could be used as a prognostic factor for patients with ccRCC, independent of other clinical traits (P < 0.001 for the risk score) (Fig. 6). The ROC curve showed that the sensitivity and accuracy of the model in predicting the survival of patients with ccRCC were high (AUC at 1 year: 0.811, AUC at 3 years: 0.783, AUC at 5 years: 0.777). The ROC curve combined with the clinically relevant data indicated that the constructed model predicted the survival of patients (AUC = 0.811) better than age, gender, grade, or stage (Fig. 7A, B). The constructed model predicted the survival of patients with high accuracy according to the C-index curve (Fig. 7C). The calibration plot showed that the predicted probability was consistent with the actual probability. The risk protein data were combined with the clinically relevant data to draw a survival nomogram for patients with ccRCC. The nomogram was used to score each patient's risk and predict survival rates at 1, 3, and 5 years (Fig. 8A, B). The risk protein and clinically relevant data indicated significant differences in survival time between the high- and low-risk groups for age > 65, age ≤ 65, gender, G1-2, G3-4, Stage I-II, Stage III-IV, T1-2, T3-4, M0, M1, and N0. This result shows that the constructed model was suitable for predicting survival in these clinical groups (Fig. 9).

Enrichment analysis of the risk proteins
GO and KEGG enrichment analyses were performed on the RNA sequencing and risk protein data, and the enriched pathways in the high- and low-risk groups were analyzed. The GO analysis showed that the risk proteins in the high-risk group were mainly enriched in GOBP-humoral immune responses mediated by circulating immune factors, GOBP-immunoglobulin production, GOBP-phagocytosis recognition, the GOCC-immunoglobulin complex, GOMF-antigen binding, and other related pathways. The risk proteins in the low-risk group were mainly enriched in the GOBP-spliceosomal snRNP assembly, the GOBP-spliceosomal TRI snRNP complex assembly, the GOCC-SM-like protein family complex, the GOCC-spliceosomal snRNP complex, the GOCC-spliceosomal TRI snRNP complex, and other related pathways. KEGG enrichment analysis showed that the risk proteins in the high-risk group were mainly enriched in cytokine-cytokine receptor interactions, hematopoietic cell lineage, the nod-like receptor signaling pathway, the p53 signaling pathway, primary immunodeficiency, and other related pathways. The risk proteins in the low-risk group were mainly enriched in fatty acid metabolism, propanoate metabolism, reclamation of bicarbonate in the proximal tubule, pyruvate metabolism, valine, leucine, and isoleucine degradation, and other related pathways (Fig. 10).

Immune-related functional analysis of the risk proteins
The National Library of Medicine (https://www.ncbi.nlm.nih.gov) was accessed to identify the standard names of the proteins in the model, and the model proteins were analyzed by immunohistochemistry using the HPA database (Table 1). MITF expression was moderately positive in normal renal tissues (no staining was detected in glomerular cells, and moderate staining was found in the renal collecting duct and distal tubules) but weakly positive in tumor tissues using the CAB002578 antibody. PTPN11 was moderately positive in most normal renal tissues according to CAB005377 staining (25-75% of glomerular and tubular cells were moderately stained) and was moderately positive in > 75% of the tumor tissues. PRKAA1 expression was moderately positive in normal renal tissues (moderate in renal tubular cells) and weakly positive in tumor tissues according to CAB005050 staining. PRKAA2 was strongly positive in most normal renal tissues (highly positive in renal tubular cells) according to HPA044540 staining but weakly positive in most tumor tissues. Acetyl-coenzyme A carboxylase α (ACACA) was strongly positive in most normal renal tissues after staining with the HPA063018 antibody (highly positive in renal tubular cells) but moderately positive in most tumor tissues. Ribosomal protein S6 kinase B1 (RPS6KB1) was weakly or moderately positive in normal renal tissues according to CAB018346 staining but moderately or strongly positive in tumor tissues. Ribonucleotide reductase regulatory subunit M2 (RRM2) was not detected in normal or tumor tissues by HPA056994 antibody staining, but survival analysis of this gene from the HPA database indicated that the 5-year survival rate of patients with high RRM2 expression (39%) was significantly lower than that of patients with low RRM2 expression (77%). Proliferation and apoptosis adaptor protein 15 (PEA15) expression was weakly positive in normal renal tissues but moderately or strongly positive in tumor tissues according to HPA070820 staining. However, a survival analysis from the HPA database showed that the 5-year survival rate of patients with high RRM2 expression was significantly higher than that of patients with low RRM2 expression (P = 0.013). MAPK3 was weakly positive in most normal renal tissues according to CAB002683 staining but moderately positive in most tumor tissues. UGT1A6 was strongly positive in normal renal tissues (renal tubular cells were highly stained) but weakly or moderately positive in tumor tissues after CAB009819 staining. BRAF was strongly positive in most normal renal tissues according to CAB004552 staining (renal tubular cells were highly stained) and weakly or moderately positive in most tumor tissues. EIF4EBP1 was moderately positive in most normal renal tissues according to CAB005032 staining but strongly positive in most tumor tissues. VCL was moderately positive in normal renal tissues according to HPA002131 staining (both glomerular and tubular cells were moderately positive) but weakly positive in tumor tissues. The IGFBP2 survival analysis in the HPA database revealed that the 5-year survival rate of patients in the high-expression group (55%) was significantly lower than that of patients in the low-expression group (72%) (Fig. 11).
There were significant differences in the content of regulatory T cells (P < 0.001), monocytes (P < 0.05), M2 macrophages (P < 0.001), follicular helper T cells (P < 0.05), and resting mast cells (P < 0.01) between the high- and low-expression groups. Among them, the content of regulatory T cells and follicular helper T cells was higher in the high-expression group than in the low-expression group. Monocytes, resting mast cells, and M2 macrophages were higher in the low-expression group than in the high-expression group (Fig. 12A). The TIDE score differed between the high- and low-risk groups (P < 0.001). The TIDE score of the high-risk group was higher, suggesting greater potential for immune escape and a poorer immunotherapeutic effect than in the low-risk group (Fig. 12B).

Survival and clinical correlation analyses of the sample types
The PAM algorithm was used for unsupervised clustering of the expression levels of the risk proteins. The matrix segmentation effect was good when K = 3, noise interference was low, and the data were divided into three subtypes. The correlation analysis of each subtype and the clinical data (age, gender, grade, and T, N, and M stages) showed that the three subtypes differed in grade and in T, N, and M stages, with the cluster3 subtype accounting for a high proportion of the higher grade, stage, and TNM categories. This result indicates that cluster3 subtype tumors have stronger invasive and proliferative abilities. A significant difference in survival time was detected between the subtypes. The survival time of the cluster3 subtype was shorter than that of the cluster2 subtype, and the survival time of the cluster2 subtype was shorter than that of the cluster1 subtype (P < 0.001). Survival time became shorter and prognosis worse as the tumor progressed, indicating that the classification results were consistent with the clinical evidence and closely related to the prognosis of the patients (Fig. 13).

Screening of potential drugs for renal clear cell carcinoma
The gene expression and risk protein data were used to screen potential drugs, and the screening criterion was P < 0.001. According to the IC50 values, CGP-60474, vinorelbine, doxorubicin, etoposide, FTI-277, JQ12, OSU-03012, pyrimethamine, and other drugs had significantly different effects in the high- and low-risk groups, with the high-risk group being more sensitive (Fig. 14).

Discussion
Renal clear cell carcinoma is one of the most common cancers of the urinary system, and its diagnosis and treatment have a definite curative effect in clinical practice. However, recurrence and progression may occur despite surgical treatment, so it is necessary to explore new ccRCC biomarkers, identify high-risk groups early for individualized treatment, and screen potential drugs to improve the survival rate of renal clear cell carcinoma [12]. Scholars have identified kidney cancer biomarkers, such as nicotinamide N-methyl transferase, serum amyloid protein, thymidine phosphorylase, and other biomarkers that can be used to diagnose renal cancer [13][14][15][16]. Biomarkers such as the S100 family proteins and heat shock proteins were reported to predict the development, staging, treatment, and prognosis of renal cell carcinoma [17,18]. However, few studies have developed ccRCC-related protein prognostic models. Therefore, a ccRCC-related protein prognostic model was established based on the TCGA database to predict the prognosis and identify new biomarkers for individualized treatment of high-risk populations.
In this study, 13 proteins with independent prognostic significance were screened by univariate Cox regression, Lasso regression, and multivariate Cox regression analyses to construct the prognostic model. The protein encoded by MITF is a transcription factor that contains basic helix-loop-helix and leucine zipper structural features and is involved in the lineage-specific regulation of melanocytes, osteoclasts, and mast cells. Recent studies have shown that MITF promotes cell growth, migration, and invasion of ccRCC by activating the RhoA/YAP signaling pathway [19]. PTPN11 encodes a member of the protein tyrosine phosphatase (PTP) family. PTPs are signaling molecules that regulate various cellular processes, including cell growth, differentiation, the mitotic cycle, and oncogenic transformation. Studies have reported that mutations in this gene cause Noonan syndrome and acute myeloid leukemia [20,21]. In addition, PTPN11 is hypomethylated in patients with gastric cancer, and PTPN11 hypomethylation may lead to the upregulation of PTPN11 transcripts. The correlation between PTPN11 hypomethylation and the incidence of gastric cancer may be specific to male patients, alcoholic patients, patients with poorly differentiated tumors, and patients with TNM stage III+IV. PTPN11 hypomethylation is a biomarker for recurrence in gastric cancer patients aged ≤ 60 years [22]. IGFBP2 is a protein-coding gene that promotes tumor development by inducing alternative polarization of macrophages through the STAT3 pathway in pancreatic ductal adenocarcinoma [23]. It also upregulates ZEB through the NF-κB signaling pathway to promote the progression of hepatocellular carcinoma [24]. The proteins encoded by PRKAA1 (AMP-activated catalytic subunit α1) and PRKAA2 belong to the Ser/Thr protein kinase family and play a key role in regulating cellular energy metabolism through phosphorylation. Studies have shown that PRKAA1 increases proliferation and inhibits apoptosis of gastric cancer cells by activating the JNK1 and Akt pathways [25]. In addition, circular RNA CPM promotes chemoresistance in gastric cancer by activating PRKAA2-mediated autophagy [26]. ACACA is a protein-coding gene expressed at higher levels in advanced prostate cancer patients than in lower-grade patients. After ACACA knockdown, the proliferation ability of tumor cells decreases, and the downregulation of ACACA prevents the malignant progression of prostate cancer by inhibiting mitochondrial potential [27]. BCL2-related protein A1 (BCL2A1) is a member of the BCL-2 protein family, and its related pathways include apoptosis, autophagy, and ALK signaling in cancer. Studies have shown that BCL2A1 expression is closely related to the occurrence and development of cancers such as colon cancer, ovarian cancer, and breast cancer [28,29]. High expression of RPS6KB1 in tumor tissues indicates a poor prognosis with poor survival in esophageal cancer patients [30]. RRM2 has been reported in many types of cancer and is associated with the development of tumors. Studies have shown that RRM2 maintains glutathione synthesis in liver cancer cells and plays a role in resistance to ferroptosis, and that it regulates the sensitivity of renal cancer to sunitinib and PD-1 blockade by stabilizing ANXA1 and activating the AKT pathway [31,32]. PEA15 is a 15 kDa multifunctional phosphoprotein involved in various biological processes, such as the proliferation and apoptosis of cancer cells. Studies have shown that microRNA-212-regulated PEA15 promotes the progression of ovarian cancer by inhibiting cell apoptosis [33]. The proteins encoded by MAPK1 and MAPK3 are members of the MAP kinase family; these serine/threonine kinases are important components of the MAP kinase signal transduction pathway. MAPK1/ERK2 and MAPK3/ERK1 are two MAPKs that play important roles in the MAPK/ERK cascade and are also involved in the signaling cascade initiated by activated KIT and KITLG/SCF. Phosphorylation of ULK1 by MAPK1/ERK2-MAPK3/ERK1 kinases triggers an interaction with BTRC and subsequent K48-linked ubiquitination and proteasomal degradation, while accumulation of damaged, reactive oxygen species-producing mitochondria leads to activation of the NLRP3 inflammasome.
Thus, abnormal soluble cytokine secretion is induced, which, in turn, promotes the differentiation and maturation of osteoclasts, eventually leading to bone metastasis [34]. The UGT1A gene family plays important roles in pharmacology and toxicology, leading to differences in drug disposition. Some studies have reported that the functions of differentially expressed UGT1A family genes in pancreatic cancer tissues are mainly related to the glucuronidation pathway, cytokine-cytokine receptor interactions, and the ILK signaling pathway. The UGT1A1/3/8/9/10 expression level is positively correlated with the activity of tumor-infiltrating immune cells, particularly B cells. UGT1A6/9 expression is negatively correlated with the level of macrophage infiltration [35]. BRAF belongs to the RAF family of serine/threonine protein kinases, which regulate the MAP kinase/ERK signaling pathway and thereby affect cell division, differentiation, and secretion. The most common BRAF mutation is V600E, which is most often found in melanoma and in a variety of other cancers, including brain tumors and colorectal cancer [36][37][38]. VCL is a cytoskeletal protein associated with cell-cell and cell-matrix junctions. It is thought to be one of several interacting proteins involved in anchoring F-actin to the membrane. Among ALK-rearranged renal cell carcinomas, a VCL-ALK RCC has been reported in a 14-year-old girl with the sickle cell trait. Moreover, VCL-ALK RCCs differ from non-VCL-ALK RCCs in that solid structures and cytoplasmic vacuoles are significantly more frequent in VCL-ALK RCCs [39]. Thus, the model proteins are closely related to tumors. Survival analysis of the ccRCC samples from the TCGA database showed that the OS and PFS of the low-risk group were superior to those of the high-risk group. As seen from the survival status and risk score plots, the number of deaths increased with the risk score. According to the ROC curve and calibration plot, the model had high sensitivity and accuracy for predicting the prognosis of ccRCC, which will provide a potential direction for clinical research.
The GO enrichment analysis revealed that most pathways in the high-risk group were related to immunity, whereas the KEGG enrichment analysis showed that risk proteins in the high-risk group were mainly enriched in cytokine-cytokine receptor interactions, hematopoietic cell lineage, the nod-like receptor signaling pathway, the p53 signaling pathway, primary immunodeficiency, and other related pathways. The model proteins were significantly associated with various immune cells, such as regulatory T cells, follicular helper T cells, monocytes, resting mast cells, and M2 macrophages. These results show a potential correlation between the model proteins and immune infiltration, and the expression of the model proteins was significantly different between renal tumor tissues and normal tissues according to the immunohistochemical analysis. The TIDE score indicated that the high-risk group had greater potential for immune escape and a poorer immunotherapeutic effect than the low-risk group. Three ccRCC subtypes were identified according to the expression levels of the risk proteins, and the subtype classification was significantly correlated with the clinical prognosis of the patients.
Most of the proteins included in the prognostic risk model were related to the proliferation or death of tumor cells, and some proteins played important roles in the pathogenesis of renal cell carcinoma. This is consistent with the fact that this study is based on ccRCC tumor cell-related risk proteins, which may provide targets and molecular markers for ccRCC-targeted precision therapy in the future. Although the prognostic model must be verified in clinical trials, the prognostic model based on ccRCC-related risk proteins predicted the survival of ccRCC patients more sensitively and accurately than traditional pathological staging. The high-risk ccRCC population can be screened more accurately using this model, and the sensitivity to anti-tumor drugs can be screened by the IC50 value, providing an important reference for individualized treatment as well as a research direction and theoretical basis for subsequent clinical and experimental research. Some previous related studies mainly focused on identifying a single gene as a therapeutic target in ccRCC. For instance, Lin et al. described the potential significance of NUDT1 as a prognostic biomarker and therapeutic target in ccRCC [40], and Miao et al. reported that HSD11B2 could serve as a potential biomarker and therapeutic target for ccRCC metastasis [41]. By comparison, most studies that reported on a panel of genes/proteins were limited in that they reported on only a few genes/proteins, assessed only one type of survival (i.e., overall survival only), focused on a single aspect of ccRCC (i.e., metabolism, ferroptosis, cuproptosis, etc.), or did not thoroughly establish potentially significant treatments for the different risk groups of patients their model could stratify. For instance, Tang et [46], Zhao et al. described their finding on 3 metabolic genes that were used to build a risk score model [47], and Peng et al. proposed a 3-gene methylation signature that could be used as a risk stratification tool to predict patients' outcomes and treatment response [48]. Compared with this previous literature, and to the best of our knowledge, this is the first study to thoroughly investigate a panel of 13 proteins in ccRCC, based on which we established an algorithm able to differentiate high-risk from low-risk patients, predict their 1-, 3-, and 5-year OS, perform immune-related functional analysis to investigate the potential effect of immunotherapeutics in these 2 groups of patients, and screen for potential drugs that could be effective in the high- and low-risk groups.
Despite the interesting findings reported in this study, there were some limitations that should be clarified. First, this study was based on bioinformatics analysis of data retrieved from the online TCGA database, and despite validating the results in a test set, the results should be externally validated. Second, in vivo and in vitro experiments should be performed to validate the actual significance of the potentially promising proteins reported in this study. Third, the survival analysis of this study should be validated in immunohistochemistry studies of tissues from clinical ccRCC patients.

Conclusion
In conclusion, we report the potential clinical usefulness of 13 ccRCC-related proteins (MITF, SHP2_pY542, IGFBP2, AMPKALPHA_pT172, ACC1, P70S6K_pT389, RRM2, PEA15, MAPK_pT202Y204, UGT1A, BRAF_pS445, 4EBP1_pT37T46, and Vinculin) that could be used as guidance to classify patients into high- and low-risk groups, predict their clinical outcomes, and strategize individualized treatments. Although further validation is required to confirm the clinical impact of these proteins and our findings, this study provides a referential basis for improving the outcomes of ccRCC patients.

Figure 1. Screening for proteins with greater prognostic value in patients with ccRCC (Lasso regression analysis).

Figure 3. PCA of all proteins associated with ccRCC (A) and PCA of the model proteins (B).

Figure 4. OS and PFS analyses of the high- and low-risk groups in the training and test groups. (A1: OS analysis of all proteins associated with ccRCC, B1: OS analysis of the test group, C1: OS analysis of the training group, A2: PFS analysis of all proteins associated with ccRCC, B2: PFS analysis of the test group, C2: PFS analysis of the training group)

Figure 9. Correlation analysis between the risk proteins and the clinically relevant data.

Figure 12. Immune-related functional analysis of the risk proteins. (A: Analysis of immune cell differences, B: Analysis of TIDE immunotherapy)

Figure 13. Survival and clinical correlation analyses of the sample types.

Figure 14. Screening of potential drugs for ccRCC (partial examples).

Table 1. Multivariate Cox analysis of proteins associated with ccRCC and the standard names of the proteins.

Table 2. Clinical characteristics of the study groups.