Risk-predicted dual nomograms consisting of clinical and ultrasound factors for downgrading BI-RADS category 4a breast lesions: a multi-centre study

Purpose: To develop and validate a risk-prediction nomogram for downgrading Breast Imaging Reporting and Data System (BI-RADS) category 4a breast lesions. Patients and Methods: We enrolled 680 patients with breast lesions diagnosed as BI-RADS category 4a by conventional ultrasound from December 2018 to June 2019. The lesions were randomly divided into development and validation groups at a ratio of 3:1. In the development group of 499 cases, clinical and ultrasound predictive factors were identified, and two nomograms, a clinical nomogram and an ultrasound nomogram, were constructed by multivariable logistic regression analysis. Each patient was classified twice, once by each nomogram, as either "high risk" or "low risk". The performance of the dual nomograms was assessed in an independent validation group of 181 cases using receiver operating characteristic (ROC) curves and diagnostic indices. Results: After multivariable logistic regression analysis, the clinical nomogram included 2 predictors: age and a family history of breast cancer in first-degree relatives. The area under the curve (AUC) of the clinical nomogram was 0.661 and 0.712 in the development and validation groups, respectively. The ultrasound nomogram included 3 independent predictors (margin, calcification and strain ratio), with AUCs of 0.782 and 0.747 in the development and validation groups, respectively. In the development group, 50.90% (254/499) of patients were classified as "low risk" by both nomograms, with a malignancy rate of 1.18%. In the validation group, 47.51% (86/181) of patients were classified as "low risk" by both nomograms, with a malignancy rate of 1.16%.
Conclusions: A dual nomogram model incorporating clinical factors and imaging characteristics is applicable for downgrading low-risk BI-RADS category 4a lesions, shows good stability and accuracy, and is useful for decreasing the rate of invasive examinations and surgery.


Introduction
According to global cancer statistics for 2018, breast cancer is the most commonly diagnosed cancer in women and the leading cause of cancer death among women (15.0% of all cancer deaths in women) [1]. As a radiation-free and non-invasive method, ultrasound is the preferred approach for breast examination, especially in women with dense breasts [2][3]. According to the American College of Radiology (ACR) Breast Imaging Reporting and Data System (BI-RADS) [4][5], breast lesions are divided into 6 categories based on their ultrasound characteristics. Category 4 lesions carry a wide range of malignancy probability, from 2% to 95%; they are further classified into subcategories 4a, 4b and 4c, and further examination, such as needle biopsy or surgery, is recommended. The malignancy rate of category 4a lesions is only 2% to 10%, so most of these lesions are benign, which lowers the specificity of ultrasound. If the risk of 4a lesions could be predicted and low-risk lesions managed with follow-up observation, unnecessary invasive examinations could be reduced.
Developing a simple and effective breast cancer risk prediction model has become a focus of breast cancer prevention. The Breast Cancer Risk Assessment Tool (BCRAT) [6], also known as the Gail Model, was developed from data on American Caucasian women and proposed by Gail et al. in 1989. After many years of testing and refinement, including the work of Costantino et al., the Gail Model has become the most widely used model and one of the standard methods for breast cancer risk assessment, especially in European and American countries. Several other risk prediction models for breast cancer have been reported with different risk factors [7][8][9][10]. Because of differences between racial and ethnic groups, the Gail Model has limitations when applied to Asian populations [11][12][13]. If a risk prediction model for 4a lesions can be established from Asian population data, patients can be divided into "high-risk" and "low-risk" groups according to their predicted clinical risk, which would help to further improve diagnostic accuracy and avoid missed diagnoses.
According to BI-RADS, conventional ultrasound (US), including two-dimensional (2D) and colour Doppler imaging, provides information for discriminating breast lesions [14][15]. However, some benign and malignant lesions share overlapping morphological features. Elastography, a non-invasive add-on to conventional imaging, enables objective, quantitative assessment of tissue stiffness and has been used in many diseases. Elastography was added to BI-RADS in its 2013 edition, and its diagnostic value has been confirmed. Previous studies have shown that tissue stiffness is associated with the risk of malignancy: the harder the lesion, the greater the probability of malignancy [16][17][18]. Ultrasound is increasingly used in clinical breast examination and has performed well not only in the differential diagnosis of breast tumours but also in showing potential for downgrading BI-RADS 4a lesions, reducing false-positive biopsies without increasing the risk of missing cancers [19].
Most previous studies have used clinical factors or ultrasound signs alone to distinguish benign from malignant breast nodules. In the study by Jieun Koh et al., BI-RADS 4a lesions were classified as "average" or "high" risk according to personal or family history, and as "soft" or "not soft" according to elastography; only lesions with both "average risk" and "soft" stiffness could be downgraded without further examination [19]. With this approach, 26.7% (68/255) of benign nodules could be downgraded, with a missed diagnosis rate for malignant lesions of only 1.5%. To evaluate clinical risk factors more comprehensively, identify suspicious signs on conventional ultrasound and analyse elastography quantitatively, this study established a risk-factor-based prediction model for downgrading 4a breast lesions that is better suited to the Chinese population and may reduce the rate of unnecessary examinations and surgery.

Materials and Methods
This was a multi-centre study conducted at regional medical centres in China, including 32 hospitals in 23 provinces. All hospitals and participating radiologists completed real-name registration on the website (www.nuqcc.cn) and uploaded information after approval. To reduce inter-operator differences in diagnosis and to improve proficiency and consistency, the doctors at each centre were trained through on-site operation, demonstration and practice before the multi-centre study began. All data and images uploaded to the website were reviewed independently by three experienced radiologists from our hospitals, and discrepancies were resolved by consensus after discussion. This study was registered with the Chinese Clinical Trial Registry (http://www.chictr.org.cn) under registration number ChiCTR1900023916. Informed consent was obtained from all individual participants included in the study.

Study population
In total, 708 lesions diagnosed as BI-RADS 4a were selected from 3020 consecutive patients with breast lesions who were examined by ultrasound and underwent biopsy or surgery at the 32 hospitals between December 2018 and June 2019.
The inclusion criteria were as follows: (1) breast lesion elastography examination performed; (2) pathological results available; and (3) clinical information available. The exclusion criteria were as follows: (1) a history of preoperative radiotherapy, chemotherapy or endocrine therapy; (2) unclear ultrasound images; and (3) unsatisfactory elastography images, for example in patients with a cough who could not cooperate with elastography imaging. A total of 680 patients were finally enrolled, including 639 with benign lesions and 41 with malignant lesions. The lesions were randomly divided into development and validation groups at a ratio of 3:1. The study flow chart is shown in Figure 1. All patients underwent breast ultrasound examination before core needle biopsy or surgery, and the final pathological results were considered the gold standard.
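As a minimal sketch of a reproducible 3:1 random allocation, patients can be shuffled with a seeded generator and the first three quarters assigned to the development group. The paper does not describe its randomisation procedure, so the helper name, the seed and the shuffle-then-slice approach are all assumptions. An exact 3:1 split of 680 patients gives 510 and 170, whereas the reported groups contain 499 and 181, so the original allocation may have differed in detail (for example, by assigning each patient independently).

```python
import random


def split_development_validation(patient_ids, ratio=3, seed=2019):
    """Hypothetical helper: shuffle patients with a seeded RNG and split them
    ratio:1 into development and validation groups."""
    ids = list(patient_ids)
    rng = random.Random(seed)  # fixed seed makes the allocation reproducible
    rng.shuffle(ids)
    n_dev = round(len(ids) * ratio / (ratio + 1))
    return ids[:n_dev], ids[n_dev:]


development, validation = split_development_validation(range(680))
print(len(development), len(validation))  # 510 170
```

Fixing the seed means the same patients land in the same group on every run, which keeps the development and validation results auditable.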

Clinical characteristics acquisition
Clinical characteristics based on the Gail Model were recorded, including personal history of breast cancer, history of breast cancer in first-degree relatives (yes or no), personal history of atypical ductal hyperplasia (ADH), height and weight. Body mass index (BMI) was calculated as $\mathrm{BMI} = \text{weight (kg)} / [\text{height (m)}]^2$.

Ultrasonic imaging acquisition
All US examinations were performed with Resona 7 or Resona 8 systems (Mindray Medical, Shenzhen, China) equipped with 5-14 MHz linear-array transducers. Conventional US and elastography images were prospectively recorded within two weeks before biopsy or surgery by 32 sonographers with more than 3 years of experience, who were blinded to the patients' clinical data. For each patient, only the images of the lesion with the highest BI-RADS category were retained.
Elastography mode was activated once the lesion was displayed in the section showing its largest diameter and the B-mode image had been optimised. To obtain reliable results, the probe was kept perpendicular to the skin, and appropriate manual compression within the normal range was applied so that the colour of the entire target remained stable. The elastography images displayed a colour spectrum from red through green to blue, representing hard, intermediate and soft tissue, respectively. The elastography image was saved when the fatty tissue superficial to the mass appeared blue. Based on the colour pattern, a Tsukuba score from 1 to 5 was assigned [21]: a score of 1 indicated uniform soft strain in the entire hypoechoic lesion; 2, a mixed pattern; 3, a hard area smaller on the elastogram than the lesion on 2D images; 4, a hard area the same size as the lesion; and 5, a hard area larger on the elastogram than the lesion on 2D images. The strain ratio was measured by placing regions of interest (ROIs): the tumour ROI was placed entirely within the tumour, and the subcutaneous fat ROI was limited to fat, without fibroglandular breast tissue, at a depth similar to that of the lesion. Elastography was acquired twice for each lesion, and the average strain ratio was recorded.
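The strain ratio described above can be sketched as follows, assuming the common fat-to-lesion definition (mean strain in the subcutaneous-fat ROI divided by mean strain in the lesion ROI, so stiffer lesions give larger ratios). The scanner computes this value internally; the function names and the input strains here are hypothetical.

```python
from statistics import mean


def strain_ratio(fat_strain, lesion_strain):
    """Fat-to-lesion strain ratio (assumed definition): a stiff lesion deforms
    less than the reference fat, so the ratio grows with lesion stiffness."""
    if fat_strain <= 0 or lesion_strain <= 0:
        raise ValueError("mean strains must be positive")
    return fat_strain / lesion_strain


def averaged_strain_ratio(acquisitions):
    """Average the ratio over repeated acquisitions, given as (fat, lesion) pairs."""
    return mean(strain_ratio(fat, lesion) for fat, lesion in acquisitions)


# Two hypothetical acquisitions of the same lesion (mean strain in %)
print(round(averaged_strain_ratio([(0.60, 0.20), (0.55, 0.22)]), 2))  # 2.75
```

Averaging the two acquisitions, as the protocol does, dampens the effect of uneven manual compression in any single frame.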

Data and statistical analysis
Continuous variables are expressed as mean ± SD and were compared with the Kruskal-Wallis rank-sum test. Categorical variables are presented as frequencies and percentages and were compared with the chi-square test. Variables with P < 0.05 in univariate analysis were considered potential predictors and were entered into the multivariate model. Two nomograms, a clinical nomogram and an ultrasound nomogram, were built in the development group based on the multivariate analysis.
Based on the significant clinical predictors in the development group, patients were categorised as "clinical high risk" or "clinical low risk". Similarly, based on the ultrasound predictors in the development group, patients were categorised as "ultrasound high risk" or "ultrasound low risk".
With the pathological diagnosis as the gold standard, receiver operating characteristic (ROC) curves were plotted, the area under the ROC curve (AUC) was calculated, and a cut-off value of predicted risk was determined from each nomogram. The Hosmer-Lemeshow (HL) test was used to evaluate calibration. TP, TN, FP and FN denote the numbers of true-positive, true-negative, false-positive and false-negative findings, respectively.
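The text does not state how the cut-off was chosen. As a hedged sketch, the AUC can be computed from its Mann-Whitney interpretation (the probability that a randomly chosen malignant lesion receives a higher predicted risk than a randomly chosen benign one, with ties counting one half), and a cut-off can be picked by maximising Youden's J = sensitivity + specificity - 1. Youden's index is a common convention assumed here, not a method confirmed by the paper, and the scores and labels are toy values.

```python
def roc_auc(scores, labels):
    """AUC via the Mann-Whitney formulation; labels are 1 (malignant) or 0 (benign)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need both malignant and benign cases")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count one half
    return wins / (len(pos) * len(neg))


def youden_cutoff(scores, labels):
    """Return (cut-off, J) maximising Youden's J, with 'high risk' meaning score >= cut-off."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best_cut, best_j = None, -1.0
    for c in sorted(set(scores)):
        tp = sum(1 for s, y in zip(scores, labels) if y == 1 and s >= c)
        tn = sum(1 for s, y in zip(scores, labels) if y == 0 and s < c)
        j = tp / n_pos + tn / n_neg - 1
        if j > best_j:
            best_cut, best_j = c, j
    return best_cut, best_j


# Toy predicted risks: four benign (0) and three malignant (1) lesions
scores = [0.02, 0.03, 0.05, 0.07, 0.04, 0.10, 0.12]
labels = [0, 0, 0, 0, 1, 1, 1]
print(round(roc_auc(scores, labels), 4))  # 0.8333
cut, j = youden_cutoff(scores, labels)
print(cut, round(j, 4))  # 0.1 0.6667
```

The AUC does not depend on any cut-off; the cut-off only matters once patients are dichotomised into "high risk" and "low risk".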
Then, the dual nomograms built in the development group were further verified in the validation cohort. The performance of the model in terms of discrimination and diagnostic value was assessed in the validation cohort using the same methods described above.
Data were analysed with SPSS Statistics (version 24.0; IBM, USA) and R software (version 3.3.0). A P value < 0.05 was considered statistically significant.

Clinical and pathological patient characteristics
The development group comprised 499 cases (473 benign and 26 malignant lesions; mean age 42.13 ± 10.85 years), and the independent validation group comprised 181 cases (166 benign and 15 malignant lesions; mean age 42.52 ± 11.84 years). Histopathological diagnoses of the 680 breast masses were confirmed by US-guided core needle biopsy in 107 lesions and by surgery in 573 lesions.
The pathology results in the development and validation groups are summarised in Table 1. The malignancy rate was 5.21% (26/499) in the development group and 8.29% (15/181) in the validation group. The malignant lesions included ductal carcinoma in situ (DCIS), invasive ductal carcinoma (IDC), invasive lobular carcinoma (ILC), mucinous carcinoma (MC) and invasive cancer (IC). Fibroadenoma, adenosis and intraductal papilloma were the most common benign lesions.

Clinical predicted nomogram
Based on the Gail Model and previous studies, clinical characteristics including age, BMI, family history of breast cancer in first-degree relatives, age at menarche, number of births, age at first birth and ADH history were assessed as risk factors. Only age and family history were significant in both the development and validation groups (Table 2).
The variables in Table 1 were assessed by univariate logistic regression analysis, and variables with P < 0.05 were entered into a multivariate logistic regression (Table 3). Age (OR = 1.05, 95% CI: 1.03 to 1.08) was an independent predictor. Family history did not reach statistical significance in the multivariate regression (OR = 7.13, 95% CI: 0.62 to 82.03; P = 0.1125), but because many previous studies have shown that it is strongly associated with breast cancer [22][23][24], family history of breast cancer in first-degree relatives was forced into the logistic regression model. The linear predictor (LP) of the resulting model is
$$\mathrm{LP} = -4.78455 + 0.04167 \times \text{Age} + 1.96384 \times [\text{family history} = 1],$$
where $[\text{family history} = 1]$ equals 1 for a positive family history and 0 otherwise. A model incorporating these two predictors was built and is presented as a nomogram (Figure 2A). To use the nomogram, first locate the subject's age and family history on the corresponding axes, then draw a straight line upwards from each to the points axis at the top to obtain the points for each covariate. The total points are the sum of the points for all covariates. Locate this sum on the total points axis and draw a straight line downwards to read the probability of risk (risk degree). The cut-off of risk degree derived from the nomogram was 0.0593: patients with a risk degree greater than or equal to 0.0593 were regarded as "clinical high risk", and those with a risk degree less than 0.0593 as "clinical low risk".
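Because a nomogram is a graphical form of the logistic model, the same risk degree can be computed directly by passing the linear predictor through the logistic function, p = 1 / (1 + exp(-LP)). The sketch below assumes this standard relationship and uses the coefficients and cut-off reported above; the function names and example patients are hypothetical, and the code is an illustration, not a validated clinical tool.

```python
import math

# Coefficients and cut-off of the clinical nomogram as reported in the text
CLINICAL_INTERCEPT = -4.78455
CLINICAL_BETA_AGE = 0.04167             # per year of age
CLINICAL_BETA_FAMILY_HISTORY = 1.96384  # first-degree relative with breast cancer
CLINICAL_CUTOFF = 0.0593


def logistic(lp):
    """Convert a linear predictor (log-odds) into a probability."""
    return 1.0 / (1.0 + math.exp(-lp))


def clinical_risk(age, family_history):
    """Predicted risk degree for an age in years and a yes/no family history."""
    lp = (CLINICAL_INTERCEPT + CLINICAL_BETA_AGE * age
          + CLINICAL_BETA_FAMILY_HISTORY * (1 if family_history else 0))
    return logistic(lp)


def clinical_class(age, family_history):
    """Dichotomise the risk degree at the reported cut-off (>= means high risk)."""
    high = clinical_risk(age, family_history) >= CLINICAL_CUTOFF
    return "clinical high risk" if high else "clinical low risk"


for age, fh in [(40, False), (55, False), (30, True)]:
    print(age, fh, round(clinical_risk(age, fh), 4), clinical_class(age, fh))
```

Under these reported coefficients, a woman without a family history crosses the 0.0593 cut-off at about 49 years of age, whereas a positive family history alone places every adult patient above it, which shows how strongly the forced-in family-history term drives the clinical classification.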
In the development group (internal validation), the model had an AUC of 0.661, and the Hosmer-Lemeshow test was not significant (P = 0.694), suggesting a good fit. In the independent validation group, the clinical model showed moderate discrimination, with an AUC of 0.712 (Figure 2B), and the Hosmer-Lemeshow test was again not significant (P = 0.358). The diagnostic performance of the clinical nomogram in the development and validation groups is shown in Table 4. In the development group, the sensitivity, specificity and accuracy were 0.5385, 0.7526 and 0.7415, respectively. With the same cut-off, the sensitivity, specificity and accuracy in the validation group were 0.5333, 0.7831 and 0.7624, respectively.
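The diagnostic indices in Table 4 follow directly from the TP, TN, FP and FN counts. As a check, the counts below are back-calculated rather than reported: they follow from the development group's 26 malignant and 473 benign lesions, the 131 patients classified as clinical high risk in that group, and the reported sensitivity, and they reproduce the reported specificity and accuracy. PPV and NPV are added only for completeness.

```python
def diagnostic_metrics(tp, fn, tn, fp):
    """Standard 2x2 diagnostic indices, treating 'high risk' as a positive test."""
    total = tp + fn + tn + fp
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / total,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }


# Back-calculated development-group counts for the clinical nomogram
metrics = diagnostic_metrics(tp=14, fn=12, tn=356, fp=117)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
# sensitivity: 0.5385
# specificity: 0.7526
# accuracy: 0.7415
# ppv: 0.1069
# npv: 0.9674
```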

Ultrasound-predicted nomogram
The BI-RADS ultrasound characteristics and their P values are shown in Table 5. Margin, shape, calcification morphology, position and elastography index differed significantly in the development group. In the validation group, other parameters, such as echo pattern and structural distortion, also showed significant differences in addition to those above.
The variables in Table 5 were assessed by univariate logistic regression analysis, and variables with P < 0.05 were entered into a multivariate (forward stepwise) logistic regression (Table 3). Margin, calcification and strain ratio were identified as independent predictors of malignancy. The linear predictor of the ultrasound model is
$$\mathrm{LP} = -5.46644 + 1.44638 \times [\text{Margin} = 1] + 2.50926 \times [\text{Margin} = 2] + 0.26413 \times [\text{Calcification} = 1] + 1.68924 \times [\text{Calcification} = 2] + 0.32118 \times \text{Strain ratio}.$$
The model is presented as a nomogram (Figure 3A).
The ultrasound risk degree was obtained from the nomogram in the same way as above: the total points are the sum of the points for margin, calcification morphology and strain ratio. The cut-off of risk degree was 0.0486; patients with a risk degree greater than or equal to 0.0486 were regarded as "ultrasound high risk", and those with a risk degree less than 0.0486 as "ultrasound low risk".
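The ultrasound nomogram can be sketched the same way, assuming the standard logistic link. The numeric codes 0, 1 and 2 for margin and calcification are assumptions: the text reports coefficients only for levels 1 and 2, so level 0 is taken as the reference category, and the mapping of codes to BI-RADS descriptors is not stated here and would have to be taken from the paper's tables. Function names are hypothetical.

```python
import math

# Reported coefficients of the ultrasound nomogram. Level 0 of margin and
# calcification is ASSUMED to be the reference category (coefficient 0).
US_INTERCEPT = -5.46644
US_MARGIN = {0: 0.0, 1: 1.44638, 2: 2.50926}
US_CALCIFICATION = {0: 0.0, 1: 0.26413, 2: 1.68924}
US_STRAIN_RATIO = 0.32118
US_CUTOFF = 0.0486


def ultrasound_risk(margin, calcification, strain_ratio):
    """Predicted risk degree from coded margin, coded calcification and strain ratio."""
    if margin not in US_MARGIN or calcification not in US_CALCIFICATION:
        raise ValueError("margin and calcification codes must be 0, 1 or 2")
    lp = (US_INTERCEPT + US_MARGIN[margin] + US_CALCIFICATION[calcification]
          + US_STRAIN_RATIO * strain_ratio)
    return 1.0 / (1.0 + math.exp(-lp))


def ultrasound_class(margin, calcification, strain_ratio):
    """Dichotomise the risk degree at the reported cut-off (>= means high risk)."""
    high = ultrasound_risk(margin, calcification, strain_ratio) >= US_CUTOFF
    return "ultrasound high risk" if high else "ultrasound low risk"


print(ultrasound_class(0, 0, 2.0))  # ultrasound low risk
print(ultrasound_class(2, 2, 3.0))  # ultrasound high risk
```

Keeping the categorical coefficients in lookup tables mirrors the indicator terms in the linear predictor and makes an invalid code fail loudly instead of silently contributing zero.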
In the development and validation groups, the ultrasound risk nomogram showed moderate discrimination, with AUCs of 0.782 and 0.747, respectively (Figure 3B), and the Hosmer-Lemeshow test was not significant in either group (P = 0.905 and P = 0.359, respectively).
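The Hosmer-Lemeshow test reported above compares observed and expected numbers of malignant lesions across groups of patients sorted by predicted risk. A minimal sketch follows, assuming ten equal-size groups (the grouping used in the study is not stated) and a chi-square reference distribution with groups - 2 degrees of freedom. To stay within the standard library, the p-value uses the closed-form chi-square upper tail for an even number of degrees of freedom, so this hypothetical helper accepts only an even number of groups. The toy risks and outcomes are made up.

```python
import math


def chi2_sf_even_df(x, df):
    """Upper-tail chi-square probability P(X >= x) for an even df, using the
    closed form exp(-x/2) * sum_{i=0}^{df/2 - 1} (x/2)**i / i!."""
    if df <= 0 or df % 2:
        raise ValueError("df must be a positive even integer")
    half = x / 2.0
    term = total = 1.0        # i = 0 term
    for i in range(1, df // 2):
        term *= half / i      # (x/2)**i / i!
        total += term
    return math.exp(-half) * total


def hosmer_lemeshow(probs, labels, groups=10):
    """Return (statistic, p-value) using equal-size groups of sorted predicted
    risk; predicted probabilities must lie strictly between 0 and 1."""
    n = len(probs)
    if groups < 4 or groups % 2 or n < groups:
        raise ValueError("need an even number of groups (>= 4) and one patient per group")
    order = sorted(range(n), key=lambda i: probs[i])
    stat = 0.0
    for g in range(groups):
        idx = order[g * n // groups:(g + 1) * n // groups]
        size = len(idx)
        expected = sum(probs[i] for i in idx)   # expected malignant lesions
        observed = sum(labels[i] for i in idx)  # observed malignant lesions
        stat += (observed - expected) ** 2 / expected           # malignant cell
        stat += (observed - expected) ** 2 / (size - expected)  # benign cell
    return stat, chi2_sf_even_df(stat, groups - 2)


# Toy predicted risks and outcomes (hypothetical)
probs = [0.02 + 0.01 * i for i in range(40)]
labels = [1 if i % 6 == 5 else 0 for i in range(40)]
stat, p = hosmer_lemeshow(probs, labels)
print(round(stat, 3), round(p, 3))
```

A non-significant result, as reported for both nomograms, means the test found no evidence that predicted and observed malignancy rates disagree; it does not prove good calibration, particularly with few malignant cases per group.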

The dual nomogram diagnostic process in the development group
The dual nomogram was first applied to the development group (internal validation). Based on the clinical nomogram, patients were divided into high-risk (26.25%, 131/499) and low-risk (73.75%, 368/499) groups. Each patient then received a second risk classification from the ultrasound nomogram (Figures 4 and 5). In total, 50.90% (254/499) of patients were classified as low risk by both nomograms; the malignant lesions in this dual low-risk subgroup were diagnosed pathologically as IDC, and the malignancy rate was 1.18%, far below that of lesions with dual high risk. When only the clinical risk or only the ultrasound risk was high, the malignancy rate was 4.88% and 7.89%, respectively (Figure 6).

The dual nomogram diagnostic process in the validation group
Of the 181 lesions in the validation group, 27.62% (50/181) showed high clinical risk and 72.38% (131/181) low clinical risk (Figure 7). The malignancy rate of lesions in women at low risk on both the ultrasound and clinical nomograms was 1.16% (1/86), significantly lower than the 42.86% (6/14) malignancy rate of lesions in women at high risk on both nomograms.
When only the clinical nomogram indicated high risk, the malignancy rate was 8.33% (3/36); when only the ultrasound nomogram indicated high risk, it was 11.11% (5/45). With this model, the 47.51% (86/181) of lesions at dual low risk could be downgraded, with a malignancy rate of only 1.16% among the downgraded lesions.
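The dual-nomogram decision rule and the validation-group results above can be tabulated with a short sketch. The stratum labels are hypothetical, and the assignment of 36 lesions to the clinical-high-only stratum and 45 to the ultrasound-high-only stratum is inferred from the reported totals (50 - 14 = 36 clinical high-risk lesions without ultrasound high risk; 131 - 86 = 45 clinical low-risk lesions with ultrasound high risk).

```python
def dual_stratum(clinical_high, ultrasound_high):
    """Combine the two nomogram classifications; only dual low risk is downgraded."""
    if clinical_high and ultrasound_high:
        return "dual high risk"
    if clinical_high:
        return "clinical high risk only"
    if ultrasound_high:
        return "ultrasound high risk only"
    return "dual low risk (downgrade)"


# Validation-group counts (malignant, total) keyed by (clinical_high, ultrasound_high)
VALIDATION_COUNTS = {
    (False, False): (1, 86),
    (True, True): (6, 14),
    (True, False): (3, 36),
    (False, True): (5, 45),
}

n_total = sum(total for _, total in VALIDATION_COUNTS.values())  # all validation lesions
for flags, (malignant, total) in VALIDATION_COUNTS.items():
    print(f"{dual_stratum(*flags)}: {malignant}/{total} malignant "
          f"({100 * malignant / total:.2f}%), {100 * total / n_total:.2f}% of lesions")
```

The four strata add up to the 181 validation lesions and their 15 malignancies, which is a quick consistency check on the reported percentages.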

Discussion
Breast cancer has become a disease of global concern. Category 4a lesions have a malignancy rate of 2% to 10%, and the low specificity of ultrasound diagnosis leads to unnecessary invasive examination of benign diseases. To establish an efficient model for downgrading 4a lesions, this study constructed dual nomograms based on a Chinese sample, in which patients were classified twice, by a clinical nomogram and an ultrasound nomogram, as either "high risk" or "low risk". The diagnostic performance of the dual nomograms was verified in a separate validation group. The results showed that the model can effectively discriminate 4a lesions in both the development and validation groups.
There are currently multiple breast cancer risk assessment models, and they have similar, moderate overall predictive accuracy [25]. The Gail Model is the most widely used and is one of the standard methods for breast cancer risk assessment, but its predictive value has varied between studies. According to Gao et al., a model containing only age at menarche, age at first birth and number of first-degree relatives with breast cancer provided a more convenient way to predict the risk of invasive breast cancer in Southeast Asian women [26]. Sa-Nguanraksa et al. evaluated whether the Gail Model can calculate the risk of breast cancer in Thai women and found that age, parity, age at first live birth and a history of atypical ductal hyperplasia (ADH) were significant risk factors [27]. In our study, age and family history were the two most significant variables for patients with 4a lesions, which is similar to previous studies. Other factors, such as ADH and age at first birth, were not associated with breast cancer risk, which may be due to differences in the populations enrolled and to the limited sample size.
According to BI-RADS, category 4a lesions have certain signs of malignancy, but benign and malignant lesions are not easy to distinguish, and the ultrasound findings of some benign and malignant lesions in this study overlapped. In the development group, 88.46% of malignant nodules were parallel in orientation, and 23.08% had clear margins. Conversely, benign lesions could also show malignant signs; for example, 41.44% had unclear margins and 7.82% were non-parallel. For 4a lesions, identifying the malignant signs that carry the greatest risk helps to distinguish the lesions and improves diagnostic value. This study found that margin and calcification morphology differed significantly between benign and malignant lesions.
In the logistic regression analysis, the ORs of coarse calcification and microcalcification were 0.98 (P = 0.9766) and 4.67 (P = 0.0006), respectively. Microcalcification was significantly associated with malignancy, consistent with previous studies [28]. The margin was also highly significant for the diagnosis of 4a lesions. Unclear (non-circumscribed) margins include indistinct, angular, microlobulated and spiculated margins, and when more than two of these features coexist, the risk of malignancy is higher.
In the 2013 edition of the BI-RADS guidelines, elastography was added as a new predictor for assessing breast lesions, and many studies have reported its high diagnostic value in differentiating breast lesions. In a meta-analysis of 2087 lesions from 9 studies, Sadigh et al. summarised the accuracy of elastography for differentiating malignant from benign breast abnormalities and found a pooled sensitivity of 88% and specificity of 83% for the strain ratio, and a pooled sensitivity of 98% and specificity of 72% for the length ratio [29].
Han et al. compared different elastography methods, including strain elastography (SE), acoustic radiation force impulse (ARFI)-induced Virtual Touch Imaging (VTI) and Virtual Touch Imaging Quantification (VTIQ), for downgrading US BI-RADS category 4a lesions. They found that 50.8% to 85.8% of lesions were downgraded, with malignancy rates ranging from 0% to 50% depending on the combination of elastography methods, indicating that combining elastography methods has the potential to downgrade BI-RADS 4a lesions with excellent performance [30]. This study added strain elastography to conventional ultrasound and combined it with clinical characteristics: approximately 50.90% of 4a lesions in the development group and 47.51% in the validation group were at low risk on both nomograms and could be downgraded, with malignancy rates of only 1.18% and 1.16%, respectively. This improved the differentiation of benign and malignant lesions and showed excellent stability. The malignancy rate of BI-RADS 4a lesions at dual "low risk" is in line with that of BI-RADS 3 lesions (≤2%), so these lesions could be managed with follow-up, which helps to reduce unnecessary invasive biopsies and save clinical resources.
This study has some limitations. First, only patients with pathology results were enrolled, and patients managed with follow-up were not included, which may have introduced selection bias, resulting in underestimation of the NPV and overestimation of the PPV. Second, the predictive value of a family history of breast cancer may be limited because of the moderate sample size (only 3 patients in the benign group and 2 in the malignant group had a positive family history). We therefore used forced-inclusion logistic regression and incorporated family history as a risk factor to reduce bias in the model.
In this study, we extracted meaningful factors from a Chinese multi-centre sample and established a dual risk prediction model based on clinical and ultrasound factors. Patients at low risk on both nomograms could be downgraded and followed up, with a malignancy rate of less than 2%. In conclusion, this dual nomogram, combining clinical risk factors and ultrasound characteristics, showed high predictive value for downgrading US BI-RADS 4a lesions in both the development and validation groups; it may help reduce invasive examinations and surgery and can be used as a risk stratification tool for 4a lesions.