SOX4 as biomarker in hepatitis B virus-associated hepatocellular carcinoma

Background: Hepatitis B virus (HBV) infection is associated with liver disease, including cancer. In this study, we assessed the power of the sex-determining region Y (SRY)-related high-mobility group (HMG)-box 4 (SOX4) gene to predict the clinical course of hepatocellular carcinoma (HCC). Methods: To evaluate the differential expression of SOX4 and its diagnostic and prognostic potential in HCC, we analyzed the GSE14520 dataset. Stratified analysis and joint-effect analysis were performed using SOX4 and clinical factors. We then designed a nomogram for predicting the clinical course of HCC. Differential SOX4 expression, its correlation with tumor stage, and its diagnostic and prognostic value were analyzed on the Oncomine and GEPIA websites. Gene set enrichment analysis was used to explore candidate gene ontology terms and metabolic pathways modulated by SOX4 in HCC. Results: Our analysis revealed that SOX4 expression was significantly upregulated in tumor tissue (P <0.001). This observation was validated using the Oncomine datasets and MERAV analysis (all P <0.05). Diagnostic receiver operating characteristic (ROC) analysis suggested that SOX4 has diagnostic potential in HCC (GSE14520 dataset: P <0.001, area under the curve (AUC) = 0.782; Oncomine Wurmbach dataset: P = 0.002, AUC = 0.831; Oncomine Mas dataset: P <0.001, AUC = 0.947). In addition, SOX4 expression was significantly associated with overall survival of HBV-associated HCC (adjusted P = 0.004, hazard ratio (HR) = 2.055, 95% confidence interval (CI) = 1.261-3.349) and with recurrence-free survival (adjusted P = 0.008, HR = 1.721, 95% CI = 1.151-2.574). GEPIA analysis verified the association with overall survival (P = 0.007), whereas the association with recurrence-free survival did not reach significance (P = 0.096). Gene enrichment analysis revealed that affected processes included lymphocyte differentiation, pancreatic endocrine pathways, and the insulin signaling pathway. The prognostic value of SOX4 was evaluated using nomogram analysis of 1-, 3-, and 5-year HCC survival.
Conclusion: Differential SOX4 expression offers a means of diagnosing HCC and predicting its clinical course. In HCC, SOX4 may affect TP53 metabolic processes, lymphocyte differentiation and the insulin signaling pathway.


Introduction
Cancers affecting liver tissues have been on the rise, making liver cancer the fourth leading cause of cancer death and the sixth most prevalent cancer globally [1]. Hepatocellular carcinoma is the most common type of primary liver cancer. Liver cancer is estimated to be the fourth most common cancer among Chinese males [2]. The majority of liver cancers have been associated with hepatitis B virus (HBV) infection. While advances in diagnostic and treatment strategies have improved HCC clinical outcomes, its 5-year survival remains low (<15%) [3]. Early detection and more effective management of liver cancer are therefore necessary. While some HCC prognostic biomarkers have been recommended, such as α-fetoprotein (AFP) [4] and PIVKA-II [5], HCC survival is still poor. A better understanding of the mechanisms of HCC development and progression, as well as the identification of novel prognostic biomarkers, is needed.
Sex-determining region Y (SRY)-related high-mobility group (HMG)-box (SOX) genes are evolutionarily conserved and are thought to regulate cell fate determination during development [6]. During embryogenesis, this family of genes participates in the development of neuronal tissue, the nervous system and skeletal tissue [6]. SOX4 comprises three domains: a serine-rich region, a glycine-rich region and an HMG box [7]. This gene participates in tumorigenesis and tumor progression. It has also been verified that SOX4 regulates lymphocyte differentiation and development, and drives endocardial ridge development [8]. Increasing evidence shows that SOX4 is markedly upregulated in various human cancers, including breast cancer [9,10], colorectal cancer [11], gastric cancer [12] and HCC [13,14]. SOX4 expression has also been associated with the prognosis of some cancer types [7,15]. Lack of SOX4 expression in normal adult liver does not affect normal liver function [16]. Nevertheless, the exact role of the SOX4 gene in the clinical course of HCC is yet to be fully uncovered. This study evaluated the prognostic and diagnostic value of SOX4 in HBV-associated HCC.

Bioinformatic analysis and SOX4 diagnostic potential
In order to investigate the biological functions and pathways associated with SOX4, we performed a gene ontology (GO) term analysis of SOX4 using the Biological Networks Gene Ontology tool (BiNGO) in Cytoscape version 3.4.0. GeneMANIA (http://www.genemania.org/, accessed December 17, 2017) [20,21] and STRING (https://string-db.org/, accessed December 17, 2017) [22,23] were used to investigate SOX4 gene-gene and protein-protein interactions, respectively. To explore the diagnostic value of SOX4, we used a t-test to compare its expression in tumor vs non-tumor tissues in the GSE14520 dataset and then conducted a receiver operating characteristic (ROC) analysis. Diagnostic value was considered statistically significant when P <0.05 and the area under the curve (AUC) was >0.7.
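The diagnostic criterion above can be illustrated with a minimal sketch. It assumes the AUC is computed as the probability that a randomly chosen tumor sample has higher expression than a randomly chosen non-tumor sample (the rank-based definition); the function names and the expression values are hypothetical and are not taken from the GSE14520 dataset.

```python
from typing import Sequence


def roc_auc(tumor: Sequence[float], non_tumor: Sequence[float]) -> float:
    """AUC as the fraction of tumor/non-tumor pairs in which the tumor
    sample has the higher expression value (ties count as 0.5)."""
    wins = 0.0
    for t in tumor:
        for n in non_tumor:
            if t > n:
                wins += 1.0
            elif t == n:
                wins += 0.5
    return wins / (len(tumor) * len(non_tumor))


def has_diagnostic_value(p_value: float, auc: float) -> bool:
    """Decision rule stated in the text: P < 0.05 and AUC > 0.7."""
    return p_value < 0.05 and auc > 0.7


# Hypothetical log2 expression values, for illustration only.
tumor_expr = [8.1, 7.9, 8.4, 7.2, 8.8]
non_tumor_expr = [7.0, 7.3, 6.8, 7.9, 7.1]
auc = roc_auc(tumor_expr, non_tumor_expr)
```

Under this rule, the values reported for the GSE14520 cohort (P <0.001, AUC = 0.782) satisfy both conditions.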

Survival analysis, joint-effect and stratified analysis
For survival analysis, patients were divided into two groups on the basis of median SOX4 mRNA expression. RFS (recurrence-free survival) and OS (overall survival) were estimated using Cox proportional hazards regression and Kaplan-Meier models. Clinical factors found to be statistically significant were adjusted for in the survival analysis and in the joint-effect survival analysis of SOX4. SOX4 expression was then combined with AFP for survival analysis. Furthermore, the analysis of SOX4 expression was stratified by clinical factors. Next, factors found to be significant were included in multivariate analysis.
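As a rough illustration of the two steps described above, the sketch below dichotomizes expression at the median and computes a Kaplan-Meier survival curve. It is a plain-Python sketch, not the SPSS or R procedure used in the study; the function names are hypothetical, and assigning values equal to the median to the "low" group is an assumption.

```python
from statistics import median


def split_by_median(expression):
    """Label each sample 'high' or 'low' relative to the median.
    Assumption: values equal to the median are labelled 'low'."""
    cutoff = median(expression)
    return ["high" if x > cutoff else "low" for x in expression]


def kaplan_meier(times, events):
    """Return [(time, survival probability)] at each distinct event time.
    events[i] is 1 for an observed event (death or recurrence), 0 if censored."""
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(set(times)):
        n_events = sum(1 for ti, e in zip(times, events) if ti == t and e == 1)
        n_leaving = sum(1 for ti in times if ti == t)
        if n_events:
            # Product-limit step: multiply by the fraction surviving time t.
            survival *= 1.0 - n_events / at_risk
            curve.append((t, survival))
        at_risk -= n_leaving  # events and censored cases both leave the risk set
    return curve


# Hypothetical follow-up times (months) and event indicators.
labels = split_by_median([1.0, 2.0, 3.0, 4.0])
curve = kaplan_meier([5, 8, 12, 12, 20], [1, 0, 1, 1, 0])
```

In practice the curve would be computed separately for the high and low SOX4 groups and the two curves compared with a log-rank test.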

Nomogram construction of SOX4 and prognosis-related clinical factors
Nomogram analysis was used to predict 1-, 3-, and 5-year OS and RFS. The nomogram was constructed using prognosis-related clinical factors and SOX4 expression. Each clinical factor and SOX4 expression was assigned its own score.
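How a nomogram turns model coefficients into scores can be sketched as follows. This is an illustration under stated assumptions, not the nomogram built in this study: all predictors are treated as binary, all coefficients are assumed positive, and the coefficient values are hypothetical except that 0.72 approximates the natural log of the reported multivariate OS hazard ratio for SOX4 (2.055).

```python
# Hypothetical Cox coefficients (log hazard ratios) for binary predictors.
# Only "high SOX4" is loosely based on the text (ln 2.055 is about 0.72).
coefs = {
    "BCLC stage B/C": 1.10,
    "cirrhosis": 0.60,
    "AFP > 300 ng/mL": 0.45,
    "tumor size > 5 cm": 0.50,
    "high SOX4": 0.72,
}


def nomogram_points(coefficients):
    """Scale each coefficient so the strongest predictor scores 100 points.
    Assumes every coefficient is positive and every predictor is binary."""
    max_beta = max(coefficients.values())
    return {name: 100.0 * beta / max_beta for name, beta in coefficients.items()}


def total_points(points, patient):
    """Sum the points of all risk factors present in one patient."""
    return sum(points[name] for name, present in patient.items() if present)


points = nomogram_points(coefs)
patient = {"cirrhosis": True, "high SOX4": True, "BCLC stage B/C": False}
score = total_points(points, patient)
```

A full nomogram would additionally map the total score to predicted 1-, 3- and 5-year survival probabilities using the fitted baseline survival function, which is omitted here.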

Statistical analysis
Data were analyzed using SPSS version 24.0 (IBM Corporation, Armonk, NY, USA) and R 3.6.0. The log-rank P and median survival time (MST) were determined using the Kaplan-Meier method. The hazard ratio (HR) and 95% confidence interval (CI) were estimated using univariate and multivariate Cox proportional hazards regression models. Differential SOX4 expression between tumor and non-tumor tissue was analyzed by t-test. P <0.05 was considered significant.
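The relationship between a Cox coefficient, its hazard ratio and the 95% CI can be checked with a short sketch. It assumes a Wald-type interval, HR = exp(beta) with limits exp(beta ± 1.96 × SE), which is the usual default but is not stated explicitly in the text; the function names are hypothetical. The input values are the adjusted OS and RFS intervals reported in this study.

```python
import math


def hr_with_ci(beta, se, z=1.96):
    """Hazard ratio and Wald 95% CI from a Cox coefficient and its standard error."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def beta_se_from_ci(lower, upper, z=1.96):
    """Recover the coefficient and standard error from a reported Wald CI."""
    beta = (math.log(lower) + math.log(upper)) / 2.0
    se = (math.log(upper) - math.log(lower)) / (2.0 * z)
    return beta, se


# Adjusted OS interval reported in the text: 1.261-3.349.
beta_os, se_os = beta_se_from_ci(1.261, 3.349)
hr_os, lo_os, hi_os = hr_with_ci(beta_os, se_os)

# Adjusted RFS interval reported in the text: 1.151-2.574.
beta_rfs, se_rfs = beta_se_from_ci(1.151, 2.574)
hr_rfs, lo_rfs, hi_rfs = hr_with_ci(beta_rfs, se_rfs)
```

On the log scale the interval is symmetric around the coefficient, so the geometric mean of the two limits should reproduce the reported HR; for both intervals above it does so to three decimal places (2.055 and 1.721).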

Differential expressions and diagnostic analysis
Our analysis of the GSE14520 and MERAV datasets revealed elevated SOX4 expression in HCC tissues relative to normal tissue (P <0.001, Figure 1A-B). ROC analysis of SOX4 in the HBV-related HCC cohort of the GSE14520 dataset indicated that SOX4 distinguished tumor tissues from adjacent non-tumor liver tissues with good accuracy (P <0.001, AUC = 0.782; Figure 1C).

Survival analysis of SOX4 in OS and RFS
In order to avoid the batch effect of microarray data, only the Affymetrix HT Human Genome U133A Array dataset of GSE14520 was included in the current study. Because most of the patients in GSE14520 had HBV-related HCC, we excluded those patients without HBV infection reports and survival information. As a result, 212 HBV-related HCC patients, all of whom had prognosis information, were included in the current study. Our Figure 2). Similar results were obtained from multivariate OS analysis (adjusted P = 0.004, HR = 2.055, 95% CI = 1.261-3.349; Table 1, Figure 2). Univariate analysis of RFS revealed that SOX4 expression significantly correlates with survival (crude P = 0.001, HR = 1.896, 95% CI = 1.307-2.750; Figure 2) and similar results were obtained by multivariate RFS analysis (adjusted P = 0.008, HR = 1.721, 95% CI = 1.151-2.574; Table 1, Figure 2).

Stratified analysis and joint-effect analysis
Stratified analysis of how SOX4 influences OS and RFS indicated that age ≤60 years, male sex and a single nodule significantly correlate with HCC OS (P = 0.024, 0.005 and 0.013, respectively; Figure 3; Table 2). Age >60 years, tumor size >5 cm, a single nodule and AFP >300 ng/mL were associated with a longer RFS relative to others (P = 0.025, 0.019, 0.012 and 0.007, respectively; Figure 4; Table 2).
Survival analysis of the GSE14520 cohort revealed that SOX4 expression is significantly associated with HCC OS and RFS. Previous studies have reported that AFP is associated with HCC diagnosis and prognosis. We therefore investigated the combined effect of SOX4 and AFP expression on HCC OS and RFS. Analysis of the GSE14520 cohort indicated that the risk of death and recurrence was significantly higher in patients exhibiting high AFP and high SOX4 expression than in those with low expression of both (Figure 2; Table 3).
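The grouping behind a joint-effect analysis can be sketched as below. The three-group scheme, the function name and the sample values are assumptions made for illustration; the text itself specifies only the median split for SOX4 and the AFP threshold of 300 ng/mL used in the stratified analysis.

```python
from statistics import median

AFP_CUTOFF_NG_ML = 300.0  # threshold quoted in the stratified analysis


def joint_groups(sox4_expr, afp_ng_ml):
    """Assign each patient to a joint SOX4/AFP group.
    Assumption: SOX4 is split at the cohort median, as in the survival analysis."""
    sox4_cutoff = median(sox4_expr)
    groups = []
    for sox4, afp in zip(sox4_expr, afp_ng_ml):
        high_sox4 = sox4 > sox4_cutoff
        high_afp = afp > AFP_CUTOFF_NG_ML
        if high_sox4 and high_afp:
            groups.append("both high")
        elif not high_sox4 and not high_afp:
            groups.append("both low")
        else:
            groups.append("one high")
    return groups


# Hypothetical SOX4 expression values and serum AFP levels (ng/mL).
example = joint_groups([6.5, 7.0, 8.2, 9.1], [20.0, 850.0, 120.0, 1200.0])
```

Survival in the "both high" group would then be compared against the "both low" reference group, for example with a Cox model that includes the group label as a covariate.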

Prognostic nomogram for survival prediction
Next, we constructed a nomogram for OS based on the following clinical features: BCLC stage, cirrhosis, serum AFP level, tumor size and SOX4 expression. The following clinical features were used to construct a nomogram for RFS: BCLC stage, cirrhosis, gender and SOX4 expression. The nomograms may enable individualized prognosis prediction. Nomogram analysis was performed for the probabilities of 1-, 3- and 5-year OS (Figure 7) and RFS (Figure 8). These analyses revealed that SOX4 expression levels were correlated with the patients' clinical prognosis.

Bioinformatics analysis of SOX4 gene
GO term analysis indicated that the SOX4 gene is involved in the modulation of primary alcohol metabolic processes, fatty acid beta oxidation, lipid oxidation, cellular respiration, alpha amino acid metabolic process, small molecule biosynthetic process, organelle inner membrane, mitochondrial matrix and microbody (Figure 5). KEGG functional analysis indicated that the SOX4 gene is involved in various signaling pathways, including insulin and adipocytokine signaling (Figure 6). Detailed representations of the GSEA results are shown in Figures 5 and 6. The visualized interactions of GO terms were constructed using BiNGO (Figure 11). This analysis revealed that SOX4 may be involved in WNT signaling, lymphocyte differentiation and pancreatic endocrine development. Analysis of gene-gene interactions found that SOX4 is associated with TP53, among other genes (Figure 1D), while analysis of protein-protein interactions found that SOX4 is associated with CTNNB1 and TP53, among other proteins (Figure 1E).

Analysis of correlation between SOX4 expression and tumor stage
Analysis of the GSE14520 dataset for SOX4 expression at various BCLC stages revealed significantly elevated expression in each BCLC stage (P <0.001, Figure 9A), with the lowest expression in BCLC stage C. Next, we combined BCLC stages 0 and A to constitute the early-stage cancer category and BCLC stages B and C to constitute the advanced-stage cancer category. Interestingly, SOX4 expression was significantly lower in the former category. Similar results were obtained by GEPIA analysis (P = 0.00373; Figure 10C).

Differential expression, diagnostic and prognostic validation analysis
Next, we analyzed SOX4 expression in the Wurmbach and Mas liver datasets and found markedly elevated SOX4 mRNA levels in tumor tissue relative to normal tissue (P = 0.003 and <0.001, respectively; Figure 9D, F). Moreover, the potential diagnostic value of SOX4 expression was revealed by ROC analysis of these two datasets (AUC = 0.831 and 0.947, respectively; P = 0.002 and <0.001, respectively; Figure 9C, E). SOX4 expression was also found to be significantly upregulated in tumor tissue in the GEPIA analysis (Figure 10A and B). Analysis of the possible impact of SOX4 expression on survival in GEPIA indicated that patients with low SOX4 expression levels exhibit longer OS than those with high expression (P = 0.007, Figure 10D). A similar trend was observed for RFS (P = 0.096, Figure 10E), although this was not statistically significant. In addition, differences in SOX4 expression across the stages of HCC were statistically significant (P = 0.004; Figure 10C).

Discussion
Here, we assessed the relationship between SOX4 levels and various parameters of HBV-related HCC. Our results reveal that the SOX4 gene has significant value for HCC diagnosis, a finding that is in agreement with previous reports (Wurmbach E et al. and Mas VR et al.) [26,27]. In addition, we find that low SOX4 expression correlates with better HCC prognosis. Next, we carried out joint-effect and stratified analyses of the value of SOX4 as a prognostic indicator in HCC. GSEA indicated that SOX4 positively modulates primary alcohol metabolic process, fatty acid beta oxidation, lipid oxidation, cellular respiration, small molecule biosynthetic process, alpha amino acid metabolic process, organelle inner membrane, mitochondrial matrix and microbody.
The SOX4 gene belongs to the group C SOX transcription factors [30]. It encodes a protein of 474 amino acids (aa) that consists of three domains: a serine-rich region (SRR, aa 333-397), a glycine-rich region (aa 152-227) and an HMG box (aa 57-135) [30,31]. The HMG box has DNA-binding activity and takes part in various developmental processes through its transcriptional activity, while the SRR domain acts as a transactivation domain. The glycine-rich central domain (CD), located between the SRR and the HMG box, is a recently identified functional region that promotes apoptosis [15,31]. The SOX4 gene modulates tumor development and growth, epithelial-mesenchymal transition and metastasis [14,32-34]. Furthermore, SOX4 drives several components of the RNAi machinery, transcriptional regulators, and cellular proteins [35-37]. Thus, SOX4 is an important transcription factor that regulates various cellular functions.
Multiple studies have reported that SOX4 acts as an oncogene in solid tumors [7,38,39]. It has been reported that SOX4 is upregulated in various malignancies, including HCC, pancreatic cancer, bladder carcinoma, prostate cancer, breast cancer, colorectal cancer, gastric cancer and melanoma [11,12,14,32,40-44], raising the potential of this gene as a diagnostic marker. The Mas and Wurmbach liver cancer datasets show that SOX4 is highly expressed in hepatitis C virus-associated HCC [26,27], which is consistent with our results. Moreover, various reports suggest that SOX4 can serve as a prognostic marker in some cancer types. High SOX4 expression has been associated with poor prognosis in prostate cancer, gastric cancer, colorectal cancer, breast cancer and HCC [11,12,14,32,45]. On the contrary, low expression has been associated with better prognosis in bladder carcinoma and melanoma [41,43].
The majority of HCCs are attributable to HBV infection [46,47]. Shang et al. reported that HBV increases expression of the SOX4 gene by upregulating the transcription factor YY1 via the mitogen-activated protein kinase pathway, by epigenetically suppressing miR-203, miR-335, and miR-129-2, and by protecting SOX4 from HBsAg-mediated degradation [48]. On the other hand, SOX4 has been shown to promote HBV replication by stimulating viral DNA replication and protein expression in liver cancer cells [49]. As a consequence, SOX4 interacts with HBV and synergistically promotes the occurrence and development of HCC. It was initially found that SOX4 acted as a transcription factor that drives B and T lymphocyte differentiation [8,50] [58].
Tribbles homolog 3 is a pseudo-kinase that disrupts the insulin signaling pathway in the liver by binding to Protein Kinase B and blocking its activation [59,60]. However, no study has verified the link between TP53, insulin signaling and SOX4. Based on our findings, we hypothesize that SOX4 may modulate TP53 activity and the insulin signaling pathway. However, the mechanism requires further investigation.
Consistent with the aforementioned studies, our data show that SOX4 might influence WNT signaling, lymphocyte differentiation and TP53 activity.
Herein, we report that SOX4 expression is higher in HCC at BCLC stage B+C than at BCLC stage 0+A. The OS and RFS nomograms indicated that SOX4 is associated with HCC prognosis. Previous studies have shown that SOX4 expression is upregulated in breast cancer [9] and promotes HCC metastases [14], suggesting it might lead to poor metastasis-free survival. It has been reported that SOX4 contributes to hepatocarcinogenesis and that its expression can reflect the clinical course of HCC after surgical resection [15]. This study is limited by the small sample size, consisting of 212 HBV-associated liver cancer cases. Future studies should utilize a larger sample size. Our analysis was limited to HBV-associated HCC. It is necessary to explore the diagnostic and prognostic value of SOX4 in all HCCs, irrespective of HBV status. Since the data of the two cohorts in this study are from public databases, there is no additional validation cohort. This study still needs to be independently verified in an additional cohort. Unlike past studies, this study assessed only the relationship between SOX4 RNA levels and the HCC clinical course. Thus, further investigation is needed to provide a better understanding.

Conclusions
This study found that SOX4 expression is significantly upregulated in HCC tumor tissues. Our data indicate that this gene has potential value in HCC diagnosis. Further survival analysis of SOX4 gene in two cohorts suggests that it significantly correlates with HCC OS and RFS. Bioinformatics analysis suggested that SOX4 may affect HCC prognosis by modulating TP53 activity, lymphocyte differentiation, pancreatic endocrine development and insulin signaling.