Diagnostic and prognostic biomarkers of Human Leukocyte Antigen complex for hepatitis B virus-related hepatocellular carcinoma

Background: Hepatitis B virus infection had been identified its relationship with liver diseases, including liver tumors. We aimed to explore diagnostic and prognostic values between the Human Leukocyte Antigen (HLA) complex and hepatocellular carcinoma (HCC). Methods: We used the GSE14520 dataset to explore diagnostic and prognostic significance between HLA complex and HCC. A nomogram was constructed to predict survival probability of HCC prognosis. Gene set enrichment analysis was explored using gene ontologies and metabolic pathways. Validation of prognostic values of the HLA complex was performed in the Kaplan-Meier Plotter website. Results: We found that HLA-C showed the diagnostic value (P <0.0001, area under curve: 0.784, sensitivity: 93.14%, specificity: 62.26%). In addition, HLA-DQA1 and HLA-F showed prognostic values for overall survival, and HLA-A, HLA-C, HLA-DPA1 and HLA-DQA1 showed prognostic values for recurrence-free survival (all P ≤ 0.05, elevated 0.927, 0.992, 1.023, 0.918, 0.937 multiples compared to non-tumor tissues, respectively). Gene set enrichment analysis found that they were involved in antigen processing and toll like receptor signalling pathway, etc. The nomogram was evaluated for survival probability of HCC prognosis. Validation analysis indicated that HLA-C, HLA-DPA1, HLA-E, HLA-F and HLA-G were associated with HCC prognosis of overall survival (all P ≤ 0.05, elevated 0.988 and 0.997 multiples compared to non-tumor tissues, respectively). Conclusion: HLA-C might be a diagnostic and prognostic biomarker for HCC. HLA-DPA1 and HLA-F might be prognostic biomarkers for HCC.


Introduction
In less developed countries, liver cancer was the second leading cause of cancer-related deaths worldwide in the male population [1]. It was estimated that roughly 782,500 cases of new liver cancer and 745,500 deaths occurred in 2012, with China accounting for 50% of all the new cases and deaths [1]. Hepatocellular carcinoma (HCC) accounted for 70% to 90% of primary liver cancers worldwide [2]. Aetiologically, many factors, such as dietary aflatoxin exposure, alcohol consumption [3], hepatitis B virus (HBV) infection, hepatitis C virus infection, diabetes mellitus, obesity [4] and cirrhosis [5], had been reported to be associated with the development of HCC. Meanwhile, many treatments had been applied, such as radical hepatectomy, liver transplantation, percutaneous ethanol injection, radiofrequency ablation and transarterial chemoembolisation [5]. Even with these advances, the prognosis of HCC patients remained poor, with the 5-year survival rate less than 15% [6,7]. Currently, the diagnosis of HCC had relied on α-fetoprotein (AFP) levels. However, the sensitivity and specificity of AFP Ivyspring International Publisher levels were not sufficient for HCC diagnosis, as patients with cirrhosis and chronic hepatitis could show elevated AFP levels [8]. Therefore, it was of significance to identify new biomarkers for the early diagnosis of HCC.
Human Leukocyte Antigen (HLA) complex, ~4Mb and on chromosome 6 p21 in humans, is HLA plays a key role in antigen presentation to T cells and the basic formation of host defence mechanisms against pathogens [9,10]. HLA, encoding major histocompatibility complex (MHC), is also important in vaccine development and has a determining role in transplantation outcomes [11]. Members of this complex have been investigated for disease initiation and progression. A good clinical outcome is associated with high-solution HLA-matching in haematopoietic stem cell transplantation [12,13]. HAL-B Bw4-80lle, combined with the KIR3DS1 gene, can significantly affect outcomes of chronic hepatitis B patients who were treated with alpha interferon [14]. It had been documented that HLA-G, a nonclassical HLA class I molecule, positivity was related to the disease in breast cancer, renal cell carcinoma, lung cancer and malignant melanoma, also indicating differential expressions in lobular and ductal subtypes [15]. HLA-G played a pivotal function in maternal-foetal tolerance during pregnancy [16], and aberrant expression of HLA-G was observed in multiple malignant cell types, which might be related to the procedure of escape host immunosurveillance [17]. HLA-DRB1*01 had been observed to be associated with hepatic hypersensitivity reactions [18]. Given previous studies on members of the HLA complex with tumours, we, therefore, conducted an investigation that aimed to find relationship a between the HLA complex and HCC.

Data collection
Profiling data of the GSE14520 dataset was obtained from the Gene Expression Omnibus (GEO, https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?a cc=gse14520, accessed May 5, 2018) website. This dataset contains two platforms: GPL571 (Affymetrix Human Genome U133A 2.0 Array) and GPL3921 (Affymetrix HT Human Genome U133A Array) [19,20]. Only GPL3921 data were used in our study to avoid batch effect. Patients with HBV infection were included in the present study.

Association and interaction analysis
Pearson correlation analysis among HLA family genes was performed using R version 3.5.0 (https://www.r-project.org/). A co-expression interactive network of gene-gene was constructed using the geneMANIA plugin of Cytoscape software version 3.6.0 [25,26]. A protein-protein interaction (PPI) network was constructed using STRING (https://string-db.org/cgi/input.pl, accessed May 8, 2018) website [27]. Visualised enrichment analysis of GO was conducted using the BiNGO plugin of Cytoscape software version 3.6.0 [28].

Diagnostic and survival analysis
Overall survival (OS) and recurrence-free survival (RFS) were calculated using Kaplan-Meier and Cox proportional hazards regression models. Gene expressions were categorised into low expression and high expression groups at a cut-off of median expression level. Statistically significant factors were adjusted for survival analysis. OS and RFS-related genes were further analysed for joint-analysis. Validation of prognostic values of HLA family genes were further conducted in the Kaplan-Meier Plotter (http://kmplot.com/analysis/, accessed May 12, 2018) website [29].

Expression model and nomogram construction
To further explore prognosis-related genes in both univariate and multivariate analyses for HCC survival, we further constructed expression models for OS and RFS prediction. Nomograms were constructed using clinical factors and gene expressions. Different factors and gene expressions showed different points. Taken together, total points can predict HCC patient probability of survival at 1 year, 3 years and 5 years.

Stratified and joint-effect analysis
Prognosis (OS and RFS)-related genes were further stratified for analysis in clinical factors. Genes related to OS and RFS in the multivariate analysis were further stratified for analysis with demographic and clinical factors. In addition, prognosis (OS and RFS)-related genes were joined for combination analysis. Expressions that indicated a good prognosis were conferred a score of 1, whereas bad prognoses were conferred a score of 0.

Statistical analysis
Survival analyses were performed using SPSS software version 16.0 (IBM, Chicago, IL). Median survival time (MST) and log-rank P were calculated by Kaplan-Meier method, as well as 95% confidence interval (CI) and hazard ratio (HR) were calculated by univariate and multivariate Cox proportional hazards regression models. Box plots and survival plots were obtained using GraphPad software version 7.0. P value ≤ 0.05 was statistically significant.

Demographic and clinical characteristics of HCC patients
A total of 212 HBV-related HCC patients were included in the present study. Tumour size, cirrhosis, AFP and BCLC stage showed significance in OS (P = 0.002, 0.041, 0.049 and < 0.0001, respectively). Gender, cirrhosis and BCLC stage showed significance in RFS (P = 0.002, 0.036, and < 0.0001, respectively). Other factors did not show significance (all P > 0.05, Supplementary Table 1). Transcriptional analysis indicated that all the members consistently showed higher transcripts per millions in tumour tissues compared with normal tissues (Figure 2).

Joint-effect analysis and stratified analysis
Expressions that indicated a good prognosis were conferred a score of 1, whereas bad prognosis was conferred a score of 0. In the joint-effect analysis of OS, the 2 scores group exhibited significant P value compared with the 0 score group (adjusted P = 0.001, Table 3). In the joint-effect analysis of RFS, the 2 scores group exhibited significant P value compared with the 0 score group (adjusted P = 0.001, Table 3). Groups of 2 scores, 3 scores and 4 scores showed significance compared with the 0 score group (adjusted P = 0.005, 0.012 and < 0.001, respectively).
In the stratification of HLA-DQA1 for OS analysis, high expression showed significance in males, age ≤ 60 years, active viral replication-chronic carriers of HBV, cirrhosis, single nodular, low AFP levels (≤ 300 ng/ml) and A stage of the BCLC system compared with low expression ( Table 4). In the stratification of HLA-A for RFS analysis, high expression showed significance in males, age > 60 years, chronic HBV carriers, tumour size > 5 cm, cirrhosis, low AFP levels (≤ 300 ng/ml) and C stage of the BCLC system compared with low expression. Detailed results of the stratified analysis were shown in Table 4 and 5.

Expression model and nomogram construction
Expression models were constructed for OS and RFS prognosis in Figure 6 and 7, respectively.
Expressions, survival statuses and heatmaps were shown in Figure 6A and 7A. Prognostic receiver operating characteristic (ROC) curves were shown in Figure 6B and 7B (P = 0.041 and 0.021, respectively). AUCs at 1 year, 3 years and 5 years were 0.862, 0.942 and 0.993, respectively, in the OS risk score model and 0.511, 0.533 and 0.568, respectively, in the RFS risk score model.
In addition, clinical factors and prognosis-related genes were further constructed in nomograms. High expression levels always led to low points. The same points indicated a highest probability of survival at 1 year and a lowest probability of survival at 5 years. Survival probability at 3 years was seated in the middle (Figure 8).

GSEA analysis
GSEA results of the OS-related gene HLA-F indicated that GO and pathways were involved in positive regulation of the immune response, leukocyte cell-cell adhesion, chemokine signalling pathway and focal adhesion (

Interaction and co-expression networks and enrichment analysis
Comparison between low and high levels of expression were shown in Figure 11A and B.
Significant P values were exhibited in all HLA family members (all P ≤ 0.05). Matrices showed Pearson correlations among HLA members ( Figure 11C). Co-expression interaction and PPI networks showed relationships among HLA members (Figure 11D and E).
The top 10 GO terms and KEGG pathways were exhibited in Figure 12. Detailed GO terms and KEGG pathways were shown in Supplementary Table 1

Validation of prognostic values of HLA family members
Prognostic values of HLA family members were further validated in the whole population. HLA-C, HLA-DPA1, HLA-E, HLA-F and HLA-G showed significance in OS (all P ≤ 0.05, Figure 13C, H, P-R). However, other genes did not show significance (all P > 0.05, Figure 13).

Discussion
In the present study, we conducted an investigation on the relationships between the HLA complex and HBV-related HCC patients. We found that members of the HLA complex, The MHC complex, also known as the HLA in humans, consists of more than 200 genes on chromosome 6 and can be categorised into three groups: class I, class II and class III [30]. Class I, which is characterised by CD8+ T cells, is composed of three genes: HLA-A, HLA-B and HLA-C [30]. Class II, which is characterised by CD4+ T cells, is composed of six main genes: HLA-DPA1, HLA-DPB1, HLA-DQA1, HLA-DQB1, HLA-DRA and HLA-DRB1 [30]. HLA genes were documented as numerous and highly polymorphic in order to bind many kinds of peptides originating from self or foreign antigens [30]. More than 1500 alleles of the HLA-B gene had been identified [31]. Variants of the HLA complex played a pivotal role in determining the susceptibility to autoimmune diseases and infections [32]. Meanwhile, they were crucial in the field of transplantation surgery, where HLA-matching compatibility was a precondition in the donors and recipients [32]. An association between the presence of HLA-B*57:01 alleles and abacavir hypersensitivity was observed in Australian and British cohorts [33,34]. Genome-wide association studies showed that the HAL-A *31:01 allele had a strong association with carbamazepine-induced hypersensitivity in Northern Europeans, Japanese and Koreans (OR = 25.93, 10.8 and 7.3, respectively) [35][36][37][38].  Alterations of amino acids bringing in structural and functional dissimilarities between HLA-DPB1 alleles revealed a strong median impact of alloreactive responses to these molecules [39]. It had been observed that a high frequency of the HLA-DRB1*15:01-DRB5*01:01-DQB1*06:02 haplotype in patients with amoxicillin clavulanate-induced drug-induced liver injury compared with normal healthy controls (57.1% (case) versus 11.7% (controls), P < 10 -6 ). A study focusing on HLA-DQB1 and HLD-DRB1 in the Tunisian population revealed the involvement of rs6457617 locus as a risk variant for susceptibility/severity to rheumatoid arthritis and highlighted gene-gene interaction between the two genes [40]. Recently, many researches had been performed to investigate the relationships among initiation and progression of tumours and the HLA family. Single nucleotide polymorphisms of HLA-A and amino acid variants were associated with nasopharyngeal carcinoma in Malaysian Chinese [41]. HLA-B-associated transcript 3 polymorphisms were suggested as risk factors for lung cancer in a meta-analysis [42]. An association was observed between an increase in HLA-C1/KIR2DL2 and HLA-C1/KIRDL3 pairs and invasive cervical cancer patients at high-risk from human papillomavirus infection [43]. Hu et al. reported, for the first time, that genetic variants in the HLA-DP/DQ loci might be marker polymorphisms for both HBV infection and risk of developing HCC [44]. HLA-DPB1 polymorphisms increased the risk for cervical squamous cell carcinoma in Taiwanese women [45].  HLA-DQA1 gene copy number polymorphism was associated with gastric cancer susceptibility in the Chinese population [46]. Rs17879599 in the second exon of the HLA-DRB1 gene had been suggested as an independent leading contributor to HCC risk in Han Chinese [47]. Expression of HLA-E and HLA-G were found differently upregulated in HCC tissues [48]. However, our present study found that HLA-E and HLA-G were found differently upregulated in non-HCC tissues, and a statistically significant difference was shown only in HLA-E.   To date, the HLA family had been widely explored with regard to their prognostic values in multiple tumours. However, associations between the HLA family and HCC had not been fully explored until now. We, for the first time, explored diagnostic and prognostic values among the HLA complex and HCC. HBV infection had widely been recognized as a risk factor for HCC. Two HLA-DRB1-DQB1 haplotypes, such as DRB1*15:02-DQB1*06:01 and DRB1*13:02-DQB1*06:04, and three DPB1 alleles, such as DPB1*02:01, *04:02, and *05:01, were found associations with chronic HBV infection in Japanese population [49].      [51]. This finding is consistent with our present study of association between HLA-DQA1 and HCC.
Hyon-Suk Kim et al. found that serum level of soluble HLA-G (sHLA-G) was correlated with the progression of HBV infection [52]. In addition, AUC of sHLA-G for distinguishing HCC from liver cirrhosis was higher than that of AFP and would be a diagnostic biomarker for HCC [52]. Moreover, they indicated that sHLA-G should not to be considered as severity of HBV infections and HCC but rather reflects phases of diseases including HBV-related HCC and concluded that increased sHLA-G expression could be one of the immune escape mechanisms of both HBV infection and HCC [52]. Nonetheless, our present study did not find clinical significance of HLA-G for HCC. This contradiction might be attributed to the difference of study population and ethnicity.
There were some limitations in the present study that need to be recognised. First, larger population cohorts were warranted to further validate our findings. Second, multivariate analyses were needed to generalise our results, a HBV-related HCC cohort. Third, functional trials were needed in future studies to explore the properties of prognosis-related genes in tumour proliferation, metastasis and angiogenesis.

Conclusions
In the present study, we conducted investigations for associations between the HLA complex and HCC. We found that some genes had diagnostic values for HCC. Among them, HLA-C was the most diagnostic biomarker for HCC. In addition, HLA-DQA1 and HLA-F had prognostic values for OS, whereas HLA-A, HLA-C, HLA-DPA1 and HLA-DQA1 had prognostic values for RFS. GSEA found they were involved in positive regulation of the immune response, antigen processing, chemokine signalling pathway and toll like receptor signalling pathway. Risk score models and nomograms were used to evaluate values of prognosis-related genes for HCC. Further validation in the Kaplan-Meier Plotter website indicated that HLA-C, HLA-DPA1, HLA-E, HLA-F and HLA-G were associated with HCC prognosis in OS. Therefore, we concluded that HLA-C, HLA-DPA1 and HLA-F expression were associated with HCC prognosis, and HLA-A and HLA-DQA1 gene expression were associated with prognosis of HBV-related HCC. HLA-C might be a diagnostic biomarker for HCC.    Supplementary figures and table. http://www.jcancer.org/v10p5173s1.pdf