A Practical Model is Equivalent to the BALAD or BALAD-2 Score in Predicting Long-term Survival after Hepatectomy in Chinese Patients with Hepatocellular Carcinoma

Aim: To evaluate the predictive value of the BALAD and BALAD-2 scores for long-term survival after hepatectomy in Chinese patients with hepatocellular carcinoma (HCC) and to establish a more practical or effective model. Methods: A total of 251 HCC patients who underwent hepatectomy were recruited. The BALAD and BALAD-2 scores were calculated from total bilirubin, albumin, alpha-fetoprotein (AFP), the Lens culinaris agglutinin-reactive fraction of alpha-fetoprotein (AFP-L3) and des-gamma-carboxyprothrombin (DCP). The associations of the two scores and their components with overall survival were analyzed. Finally, three prediction models were explored and constructed. Results: The 5-year survival rate worsened as the BALAD and BALAD-2 scores increased. The BALAD and BALAD-2 scores showed moderate value in predicting overall survival, with Harrell-C statistics of 0.665 (0.618-0.712) and 0.603 (0.554-0.636), respectively. After two variables, largest tumor size and BMI, were added to the BALAD [0.720 (0.671-0.769)] or BALAD-2 [0.701 (0.649-0.751)] multivariate model, the Harrell-C statistic was significantly higher than that of BALAD (P=0.048) or BALAD-2 (P<0.001) alone. Taking availability and expense into account, an equivalent BAA-BS model was established based on total bilirubin, albumin, AFP, BMI and largest tumor size. The Harrell-C statistic of the BAA-BS model [0.723 (0.674-0.772)] was similar to that of the BALAD (P=0.820) or BALAD-2 (P=0.209) multivariate model, and the continuous net reclassification index and integrated discriminatory improvement were not statistically different. Finally, a nomogram of the equivalent BAA-BS model was constructed to assist surgeons and patients in predicting 5-year survival rates. Conclusion: Both the BALAD and BALAD-2 scores were highly suitable for predicting long-term survival after hepatectomy in Chinese HCC patients. A significant increase in predictive efficacy was observed after the addition of largest tumor size and BMI to the BALAD or BALAD-2 score. Even when AFP-L3 and DCP are not measured, an equivalent BAA-BS model achieved excellent discriminatory performance.


Introduction
With approximately 466,100 new cases and 422,100 deaths annually [1], liver cancer now has the second largest cancer DALY (disability-adjusted life years) burden in China [2]. According to global data on liver cancer, more than half of the world's new cases and deaths occur in China [3]. Hepatocellular carcinoma (HCC) is the main pathological type of liver cancer, comprising 75%-85% of liver cancer cases [4]. Currently, the choice among curative therapies for HCC, including local ablation, liver transplantation and hepatectomy, is determined mainly by tumor characteristics and liver function [5]. Hepatectomy is routinely performed for early-stage HCC; however, the 5-year overall survival rate is only 50% [6]. To improve overall survival, it is important to accurately predict long-term prognosis and then apply effective adjuvant strategies after hepatectomy.
In recent years, some miRNAs and lncRNAs have been identified as independent predictors of survival in HCC patients, and the accuracy of prediction has greatly improved [7-11]. In terms of clinical accessibility, however, alpha-fetoprotein (AFP) remains the most extensively used biomarker for predicting the prognosis of HCC [12-14]. Combining Lens culinaris agglutinin-reactive AFP (AFP-L3) [15,16] and des-gamma-carboxy prothrombin (DCP) [17-19] with AFP achieved excellent predictive performance [20]. In addition, deterioration of liver function, as reflected by total bilirubin and albumin, is associated with unfavorable postoperative outcomes [21,22]. The BALAD score (the acronym refers to bilirubin, albumin, AFP-L3, AFP and DCP) incorporates these 5 objective biomarkers using conventional cut-off points. It was originally developed as a predictor of survival for patients with HCC in Japan and has been validated in the UK and Hong Kong [23,24]. After a reassessment of the Japanese data in a continuous format, the BALAD-2 score also offered clear discrimination and has been externally validated in the UK, Germany, and Hong Kong [25,26]. However, the etiologies of HCC in China clearly differ from those in Japan and European countries, the main regions in which the two models were built and validated. Approximately 75%-80% of HCC cases in China are attributable to persistent hepatitis B virus infection, whereas approximately 70% of HCC cases in Japan and European countries are attributed mainly to hepatitis C virus infection [27]. Moreover, those studies targeted the total HCC population and therefore lack specificity for the hepatectomy population. Although 27 and 36 patients from Hong Kong who underwent hepatectomy were included in the two confirmatory studies of BALAD and BALAD-2, respectively [24,26], the evidence is still insufficient to justify the use of the two scores in China. Accordingly, this study further evaluated the predictive value of the BALAD score, the BALAD-2 score and their components for long-term survival after hepatectomy in Chinese HCC patients.
Because few laboratories at present can simultaneously perform 3 tumor biomarker assays (AFP, AFP-L3, DCP) in China, the accessibility of the two scores is limited. Furthermore, the detection of 5 biomarkers (total bilirubin, albumin, AFP, AFP-L3 and DCP) is bound to require extra costs, so the cost-effectiveness must be considered. Therefore, a more practical model is needed for Chinese HCC patients after hepatectomy to account for clinical operability.

Subjects
A total of 277 patients were recruited for the study from March 2009 to May 2018 at the First Hospital of Jilin University. Inclusion criteria were as follows: (1) hospitalized for potential hepatectomy; (2) had not undergone any tumor-related treatment before hepatectomy; (3) voluntarily supplied preoperative blood samples; (4) histologically diagnosed with HCC by pathologists. Among the 277 HCC patients, 26 were excluded for one of the following reasons: (1) distant metastasis; (2) positive surgical margins; (3) received anticoagulants such as warfarin; (4) died of perioperative complications; and (5) lost to follow-up at the first interview. Written informed consent was obtained from each patient, and the study protocol was approved by the Ethics Committee of the First Hospital of Jilin University.

Data collection
Information on general demographic and clinicopathological variables suspected to be risk factors for survival was collected for each patient. Hepatitis B virus (HBV) infection was defined by HBV sero-markers or a history of antiviral HBV treatment [28]. Hepatitis C virus (HCV) infection was confirmed by HCV-Ab positivity or a history of antiviral HCV treatment. The largest tumor size and number of tumors were determined from the most recent imaging report prior to hepatectomy. The Child-Pugh class and BCLC stage calculated at the time closest to hepatectomy in each patient were applied. Cirrhosis, vascular invasion, perineurium invasion and histological tumor differentiation were all evaluated according to postoperative pathology.

Follow-up
Follow-up examinations were carried out by specialized staff 3 months, 6 months, and 1 year after hepatectomy and every year thereafter until death or the last scheduled follow-up. There were three possible follow-up outcomes: (1) died: the overall survival time was calculated from the date of hepatectomy to the date of death; (2) alive: the overall survival time was calculated from the date of hepatectomy to the date of the latest follow-up; (3) lost to follow-up: the overall survival time was calculated from the date of hepatectomy to the date of the last successful follow-up.
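The three follow-up outcomes can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `overall_survival`, the conversion of days to months with an average month length, and the treatment of outcomes (2) and (3) as censored observations are assumptions made for illustration (the last follows usual survival-analysis practice but is not stated in the text).

```python
from datetime import date

# Assumption: months are derived from days using the average month length.
DAYS_PER_MONTH = 365.25 / 12


def overall_survival(surgery: date, last_contact: date, died: bool):
    """Return (survival time in months, event indicator) for one patient.

    last_contact is the date of death if the patient died, otherwise the
    date of the latest (or last successful) follow-up.
    The event indicator is 1 for an observed death and 0 for a censored
    observation (alive or lost to follow-up); censoring is an assumption here.
    """
    days = (last_contact - surgery).days
    if days < 0:
        raise ValueError("last contact precedes the date of hepatectomy")
    return days / DAYS_PER_MONTH, int(died)


# Hypothetical patients, one per follow-up outcome.
patients = [
    ("died", date(2010, 3, 1), date(2012, 3, 1), True),
    ("alive", date(2015, 6, 1), date(2018, 6, 1), False),
    ("lost", date(2011, 1, 10), date(2011, 7, 10), False),
]
for label, surgery, last_contact, died in patients:
    months, event = overall_survival(surgery, last_contact, died)
    print(f"{label}: {months:.1f} months, event={event}")
```

The pair (time, event) is the input format expected by Kaplan-Meier and Cox routines in most statistical packages.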

Measurement of biomarkers
Blood samples were taken from all subjects in 5 mL pro-coagulation tubes the morning before surgery after an overnight fast (at least 8 hours). Serum was separated and stored at -80°C. The magnetic microparticle chemiluminescence immunoassay method was used to measure the concentrations of AFP, AFP-L3 and DCP by a Hotgen MQ60plus automatic immune analyzer (AFP-L3 percentage assay kit, DCP assay kit, Hotgen, Beijing, China). AFP-L3 was extracted by affinity adsorption centrifugation and expressed as the AFP-L3 percentage (AFP-L3%) of total AFP. The interday variation coefficients of the quality control samples were 3.78% for AFP, 3.15% for AFP-L3% and 2.26% for DCP. Total bilirubin and albumin were tested within 12 hours after receiving the blood samples by a HITACHI 7600-210 automatic analyzer. The lab provided daily quality control charts.

Calculation of BALAD and BALAD-2 scores
The BALAD and BALAD-2 scores were calculated based on the serum levels of the five biomarkers indicating both tumor progression (AFP, AFP-L3%, and DCP) and liver function (total bilirubin and albumin). The tumor marker cut-offs for elevations in AFP, AFP-L3%, and DCP were 400 ng/mL, 15%, and 100 ng/mL, respectively. Total bilirubin was categorized as < 17.1 μmol/L, 17.1-34.2 μmol/L, or > 34.2 μmol/L and assigned 0, 1, and 2 points, respectively, while albumin was categorized as > 35 g/L, 28-35 g/L, or < 28 g/L and assigned 0, 1, and 2 points, respectively. The bilirubin-albumin score was then categorized based on the sum of the 2 values as 0-1, 2-3, or 4 and scored as 0, 1, and 2, respectively. The BALAD score was calculated by summing the number of elevated tumor markers and the bilirubin-albumin score. The BALAD-2 score was calculated from the continuous values of the same five biomarkers using the previously published equation [23,25].
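The categorical BALAD rules described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the authors' implementation: the function names are hypothetical, and treating a marker as "elevated" only when it is strictly above its cut-off is an assumption, since the text does not say how values exactly at the cut-off are handled.

```python
def bilirubin_points(bilirubin_umol_l: float) -> int:
    """0, 1 or 2 points for < 17.1, 17.1-34.2, > 34.2 umol/L."""
    if bilirubin_umol_l < 17.1:
        return 0
    if bilirubin_umol_l <= 34.2:
        return 1
    return 2


def albumin_points(albumin_g_l: float) -> int:
    """0, 1 or 2 points for > 35, 28-35, < 28 g/L."""
    if albumin_g_l > 35:
        return 0
    if albumin_g_l >= 28:
        return 1
    return 2


def bilirubin_albumin_score(bilirubin_umol_l: float, albumin_g_l: float) -> int:
    """Collapse the summed points (0-4) into a score of 0, 1 or 2."""
    total = bilirubin_points(bilirubin_umol_l) + albumin_points(albumin_g_l)
    if total <= 1:
        return 0
    if total <= 3:
        return 1
    return 2


def balad_score(bilirubin_umol_l, albumin_g_l, afp_ng_ml, afp_l3_pct, dcp_ng_ml):
    """BALAD = number of elevated tumor markers + bilirubin-albumin score (0-5)."""
    # Assumption: "elevated" means strictly above the stated cut-off.
    elevated = sum([afp_ng_ml > 400, afp_l3_pct > 15, dcp_ng_ml > 100])
    return elevated + bilirubin_albumin_score(bilirubin_umol_l, albumin_g_l)


# Hypothetical patient: mildly raised bilirubin, low-normal albumin, high AFP only.
print(balad_score(20.0, 30.0, 500.0, 5.0, 20.0))
```

For this hypothetical patient, bilirubin and albumin each contribute 1 point (bilirubin-albumin score 1) and one tumor marker is elevated, giving a BALAD score of 2.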

Statistical analysis
Continuous variables following a normal distribution are presented as the mean with standard deviation (SD); otherwise, they are presented as the median with interquartile range (IQR). Categorical variables are shown as frequencies with percentages. Survival curves were estimated by the Kaplan-Meier method and compared by the Log-rank test. The Cox proportional hazard model was used to calculate hazard ratios (HRs) with their 95% confidence intervals (CIs). Factors with a P-value less than 0.1 in the univariate analysis were entered into a multivariate Cox proportional hazard model by the forward LR method. The Harrell-C statistic, net reclassification index (NRI) and integrated discriminatory improvement (IDI) were used to evaluate the discriminatory performance of the prediction models. The 'CsChange' and 'PredictABEL' packages of R software were used to compare the Harrell-C statistics of different models and to calculate the NRI and IDI. A predictive nomogram was constructed, and a calibration plot was used to assess the discrepancy between predicted and observed survival. The time-dependent ROC curve of the nomogram was drawn, and the area under the curve (AUC) was calculated. All analyses were performed using SPSS 25.0, GraphPad Prism 8, or R 3.6.1 software. For all tests, a two-tailed P<0.05 was considered statistically significant.
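For readers unfamiliar with the Harrell-C statistic, the following Python sketch shows how it is defined for right-censored data. It is a plain pairwise implementation for illustration only, not the routine used in the study (which relied on R packages); the function name `harrell_c` is hypothetical, and skipping pairs with tied survival times is a simplifying assumption.

```python
from itertools import combinations


def harrell_c(times, events, risk_scores):
    """Harrell's concordance statistic for right-censored survival data.

    times:       observed follow-up times
    events:      1 if death was observed, 0 if censored
    risk_scores: model output, where a higher value means higher predicted risk
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: pairs with tied times are skipped
        # Let a be the patient with the shorter follow-up time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk died earlier: concordant
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Hypothetical data: survival in months, event flags, and a BALAD-like score.
times = [12, 30, 45, 60, 72]
events = [1, 1, 0, 1, 0]
scores = [3, 2, 2, 0, 1]
print(round(harrell_c(times, events, scores), 3))
```

A value of 0.5 corresponds to random ordering and 1.0 to perfect ordering of patients by risk, which is the scale on which the statistics reported in this study should be read.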

Associations of general characteristics with all-cause death
The characteristics and overall survival of the subjects included in our study are shown in Table 1.

Associations of BALAD, BALAD-2 score and their components with all-cause death
With respect to the BALAD score, more than half of the patients scored 1 or higher, and no patient scored 5 [0 (n=85, 33.9%), 1 (n=77, 30.7%), 2 (n=54, 21.5%), 3 (n=31, 12.3%), and 4 (n=4, 1.6%)]. The 5-year survival rate worsened with each increase in BALAD score from 0 to ≥3 (66.9%, 44.1%, 28.7% and 17.1%; Log-rank P<0.001; Figure 1A). For predicting overall survival, the BALAD score had a Harrell-C statistic of 0.665 (0.618-0.712). Similarly, the 5-year survival rate decreased with each increase in BALAD-2 score from ≤2 to 4 (68.1%, 60.8% and 34.5%; Log-rank P<0.001; Figure 1B). Although the BALAD-2 score is a revision of the BALAD score, its Harrell-C statistic, 0.603 (0.554-0.636), was not higher. In addition, elevation of each tumor marker (AFP, AFP-L3%, and DCP) and deterioration of liver function (total bilirubin and albumin) were each significantly associated with poor overall survival (Table 2). The Harrell-C statistics of the BALAD and BALAD-2 multivariate models were 0.720 (0.671-0.769) and 0.701 (0.649-0.751), respectively. When the models were compared, the Harrell-C statistic increased significantly after largest tumor size and BMI were added to the BALAD (P=0.048) or BALAD-2 (P<0.001) score, but there was no difference between the BALAD and BALAD-2 multivariate models (P=0.244) (Table 3).

Equivalent BAA-BS model fitted by total bilirubin, albumin, AFP, BMI and largest tumor size
The Harrell-C statistics of the five individual biomarkers were approximately 0.6, not markedly different from those of the two scores. We speculated that some of the five biomarkers contributed little, so we combined biomarkers with clinicopathological characteristics to build an alternative model (Table 3).

Nomogram of the equivalent BAA-BS model
A nomogram based on the BAA-BS model is shown in Figure 2A. In the nomogram, an individualized score is obtained for each patient by adding up the points assigned to the five prognostic variables. Projecting the total points (range 0-260) onto the scale below gives the estimated probability of 5-year survival. The calibration plot for 5-year survival probability suggested good consistency between the predicted and observed overall survival probabilities (Figure 2B). Finally, the time-dependent ROC curve suggested that the nomogram had good discrimination ability, with an AUC of 0.793 (0.727-0.859) (Figure 2C).

Discussion
Only a few studies have examined the applicability of the BALAD or BALAD-2 score, and their populations differed from ours in nationality, HCC etiology, and treatment methods. A relatively large prospective study was therefore needed to establish the feasibility of, and build a sufficient evidence base for, the use of the BALAD and BALAD-2 scores in Chinese HCC patients who underwent hepatectomy [29,30]. This study focused on Chinese HCC patients after hepatectomy and found that both the BALAD and BALAD-2 scores were highly suitable for predicting long-term survival. This is concordant with a nationwide study in Japan, which found that the BALAD score was an effective predictor of overall survival [23]. In that study, approximately 75% of the HCC patients had hepatitis C virus infection and only 28.0% underwent hepatectomy, whereas in our study 82.9% of the HCC patients had hepatitis B virus infection and all underwent hepatectomy. For hepatitis B virus-related HCC, a Hong Kong study indicated the versatility of the BALAD score for predicting long-term survival in 198 patients with HCC, of whom 27 received hepatectomy and 62.0% had advanced HCC [24]. In contrast, our study mainly focused on early HCC in a relatively large sample. Regarding the BALAD-2 score, one Hong Kong cohort externally validated its utility in predicting long-term survival in 36 patients who underwent hepatectomy; however, our hepatectomy sample was considerably larger than theirs and our median follow-up time (63.6 months) was also longer than theirs (37 months) [26].
With these results in mind, Chinese HCC patients who received hepatectomy with higher preoperative BALAD or BALAD-2 scores should be closely followed up and more comprehensively treated to achieve a prolonged survival period.
With respect to discriminatory performance, our study showed that both the BALAD and BALAD-2 scores had a moderate capability to predict all-cause death, but the predictive value of the BALAD-2 score was not superior to that of the BALAD score. This finding is largely consistent with previous research in HCC patients receiving liver transplantation, except that the Harrell-C statistic of the BALAD-2 score differed among studies [31]. The differing predictive values of the BALAD-2 score may be attributable to the different treatments that the study populations received and to different detection methods or platforms, which produce different fluctuations in biomarker values. Although the BALAD-2 score is calculated from transformed continuous variables, it remains susceptible to fluctuations in the five biomarkers. These results suggest that the BALAD score could be a more stable predictor of HCC prognosis than the BALAD-2 score across different detection methods or platforms.
Two multivariate models demonstrated excellent discriminatory performance after the two easily obtained indicators, largest tumor size and BMI, were combined with the BALAD or BALAD-2 score in this study (Table 3 and S1). Tumor size reflects the degree of tumor invasiveness as part of tumor staging and has long been used in clinical practice to determine patient prognosis and recommend specific treatment [32]. In our study, largest tumor size remained an independent risk factor for all-cause death. Regarding BMI, the conclusions of previous studies remain controversial. One previous study reported that the 20-year overall survival rate of overweight HCC patients (BMI ≥25.0 kg/m²) after hepatic resection was significantly better than that of non-overweight patients [33]. In contrast, a multicenter study found that underweight (BMI <18.5 kg/m²) and overweight (BMI ≥25.0 kg/m²) HCC patients appeared to have worse recurrence-free survival and overall survival following liver resection than those of normal weight [34]. Our study showed that the survival rate increased in order from underweight (BMI <18.5 kg/m²) to normal weight (18.5 kg/m² < BMI < 25.0 kg/m²) to overweight (BMI ≥25.0 kg/m²) patients. Within the normal weight subpopulation, we also observed a decrease in the risk of death per 1 kg/m² increase in BMI (Table S2). Because nearly 80% of patients in our study were of normal weight, we can only conclude that HCC patients with higher BMIs in the normal weight range had better long-term survival after hepatectomy than those with lower BMIs. A possible explanation is that patients with a higher BMI in the normal weight range have a better nutritional reserve and metabolic function, which are indispensable for the tumor immune response [35].
In view of the two scores' limited availability and high cost in China, an equivalent BAA-BS model suitable for Chinese HCC patients undergoing hepatectomy was established based on the combination of total bilirubin, albumin, AFP and BMI with largest tumor size. The previous S-LAD model (diameter of the largest tumor at time of transplantation, AFP, AFP-L3%, DCP) was optimized on the basis of BALAD for the liver transplantation population [31]. Indicators of liver function were not included in the S-LAD model because liver function improves completely in HCC patients after liver transplantation, whereas they remained main predictors in our BAA-BS model. Paradoxically, AFP-L3% and DCP, which had better predictive values than total bilirubin and albumin in the univariate model, were not ultimately entered into the BAA-BS model. Further data analysis showed a close correlation between AFP and AFP-L3%, as well as between DCP and largest tumor size (Spearman's correlation coefficients: 0.701 and 0.753, respectively). When variables are strongly collinear, variable selection typically retains only one of them in the multivariate model and discards the others. Hence, AFP and largest tumor size, rather than AFP-L3% and DCP, were ultimately entered into the BAA-BS model. Regarding the indicators in the model, the largest tumor size is easily obtained, as computed tomography or magnetic resonance imaging must be performed before surgery to determine the tumor site and surgery type. BMI and liver function should also be evaluated as indicators of the patient's ability to tolerate surgery, and AFP is initially measured for the diagnosis of HCC according to the Chinese guidelines. In brief, the BAA-BS model is not only accessible but also cost-effective.
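The collinearity check mentioned above rests on Spearman's rank correlation, which can be sketched in plain Python as the Pearson correlation of average ranks. This is an illustrative sketch with hypothetical function names and made-up values, not the analysis code or data of the study.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks, giving tied values the mean of their positions."""
    order = sorted(range(len(values)), key=lambda k: values[k])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1  # mean of positions pos+1 .. end+1
        for k in order[pos:end + 1]:
            ranks[k] = shared
        pos = end + 1
    return ranks


def spearman(x, y):
    """Spearman's rho: Pearson correlation computed on the ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant variable")
    return cov / sqrt(var_x * var_y)


# Made-up values for illustration: DCP (ng/mL) and largest tumor size (cm).
dcp = [25, 80, 150, 400, 1200, 60]
tumor_size_cm = [2.0, 3.5, 4.0, 6.5, 9.0, 5.0]
print(round(spearman(dcp, tumor_size_cm), 3))
```

A coefficient close to 1 between two candidate predictors signals that they carry largely overlapping information, which is why only one of each correlated pair tends to survive variable selection.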
Ultimately, we constructed a visual and assessable nomogram on the basis of the equivalent BAA-BS model in which all variables are ordinary indicators and no advanced algorithms are required. Importantly, the discriminatory performance of this nomogram was comparable to that of the two scores' multivariate models, with optimal agreement between the predicted and observed 5-year survival probabilities. Our nomogram offers a good alternative because it does not require detecting AFP-L3 and DCP and can be a powerful assistive tool for surgeons and patients to directly quantify the potential benefit of hepatectomy and indirectly evaluate the risk of all-cause death.
Two major strengths should be mentioned in this study. Our study pays special attention to Chinese HCC patients who underwent hepatectomy, and the extrapolated population for our model is clear and definite. Prognostic nomograms could help both surgeons and patients themselves visually and conveniently calculate and assess the possibilities of survival. However, there are some limitations in our study. All HCC patients included in our study were from only one hospital. In addition, the multivariate model and the equivalent BAA-BS model have not been externally validated. Therefore, more studies with a large sample size are warranted to verify the results.
In conclusion, both the BALAD and BALAD-2 scores were highly suitable for predicting long-term survival after hepatectomy in Chinese HCC patients. A significant increase in predictive efficacy was observed after the addition of largest tumor size and BMI to the BALAD or BALAD-2 score. Even if AFP-L3 and DCP are not detected, an equivalent BAA-BS model including largest tumor size and BMI also obtained an excellent discriminatory performance.