J Cancer 2025; 16(15):4316-4337. doi:10.7150/jca.118698
Review
1. Faculty of Information Technology and Computer Science, University of Central Punjab, Lahore, Pakistan.
2. Department of Electronic Engineering, Yeungnam University, Gyeongsan, 38541, Republic of Korea.
3. IRC-FDE, King Fahd University of Petroleum and Minerals, 31261, Dhahran, Saudi Arabia.
4. Universidad Europea del Atlantico, Isabel Torres 21, Santander, 39011, Spain.
5. Universidad Internacional Iberoamericana, Campeche 24560, Mexico.
6. Universidad Internacional Iberoamericana, Arecibo, Puerto Rico 00613, USA.
7. Universidade Internacional do Cuanza, Cuito, Bie, Angola.
8. Universidad de La Romana, La Romana, Republica Dominicana.
9. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, Jiangsu, China.
10. Department of Information and Communication Engineering, Yeungnam University, Gyeongsan, 38541, Republic of Korea.
*These authors contributed equally to this work
Received 2025-6-2; Accepted 2025-9-8; Published 2025-10-20
Breast cancer is a lethal carcinoma affecting a considerable number of women across the globe. While preventive measures are limited, early detection remains the most effective strategy. Accurate classification of breast tumors into benign and malignant categories is important because it can help physicians diagnose the disease faster. This survey investigates emerging trends and approaches in machine learning (ML) for the diagnosis of breast cancer, focusing on classification techniques based on both segmentation and feature selection. Datasets such as the Wisconsin Diagnostic Breast Cancer Dataset (WDBC), Wisconsin Breast Cancer Dataset Original (WBCD), Wisconsin Prognostic Breast Cancer Dataset (WPBC), BreakHis, and others are evaluated to demonstrate their influence on the performance of diagnostic tools and on the accuracy of models such as support vector machines (SVMs), convolutional neural networks (CNNs), and ensemble approaches. The main shortcomings or research gaps, such as dataset bias, limited generalizability, and interpretability challenges, are highlighted. This research emphasizes the importance of hybrid methodologies, cross-dataset validation, and explainable AI to narrow these gaps and enhance the overall clinical acceptance of ML-based detection tools.
Keywords: tumor detection, breast cancer, deep learning, segmentation
The cell is the basic structural and functional unit of an organism and consists of numerous organelles. During its life cycle, a cell grows and undergoes cell division (mitosis) after a specified period of time. A cell becomes malignant or cancerous when it loses its ability to stop dividing. Such uncontrolled mitosis leads to cells accumulating at a particular location and forming a mass known as a tumor [1]. Two kinds of tumors have been identified so far: benign (non-cancerous) and malignant (cancerous). A tumor is malignant when it starts invading and damaging the nearby cells [2].
Breast cancer is a type of cancer in which a cancerous tumor develops in the tissues of the human breast. Every woman is at risk of developing breast cancer at some stage of her life. In 2020, more than four million women around the world were affected [3], and breast cancer was the major cause of this wide-scale toll. A significant share of these cases is concentrated in third-world countries, which account for 72% of cases. This difference in death toll between socioeconomic regions grew between 1990 and 2019, and the trend is expected to continue [4].
According to Health at Hand [5], breast cancer was the most common type of carcinoma globally in 2020, followed by colon and rectum cancer. Figure 1 shows the number of new cancer cases in women for the year 2023, with breast cancer the most frequent, followed by other types such as ovarian, lung, and colon cancer [6-8]. These are the main cancer types across most countries; however, their ranking varies across the world. According to The DAWN [9], Pakistan has the highest number of breast cancer cases in Asia, with an estimated 40,000 women falling victim to this fatal disease. Consultants at Shaukat Khanum Memorial Hospital link the soaring incidence of breast cancer to Pakistan's orthodox societal norms and the lack of an advanced diagnostic system.
New cancer cases in women for the year 2023; statistics are taken from [6].
To reduce the mortality rate of breast cancer, early detection is crucial and can be supported by accurate classification of breast tumors into benign (non-cancerous) or malignant (cancerous) classes [10]. Breast cancer can be categorized in a considerable number of ways, which may help clinicians recommend the best treatment. The most significant of these is binary classification, that is, whether the tumor is benign (non-cancerous) or malignant (invading the nearby cells) [11]. Such classification is crucial because it determines the severity of the disease. Various studies have used ML methodologies and different datasets to classify tumors as benign or malignant [12, 13]. Such methodologies can help physicians treat the cancer properly. Over time, certain standard datasets have emerged in the literature and have been used by scientists for the diagnosis and prediction of breast cancer.
This review aims to address the gap in understanding the latest trends in breast cancer detection and the effectiveness of various detection methods, including deep learning (DL), feature selection-based, ensemble classification, and image-based segmentation techniques. It further evaluates the use and efficacy of the datasets employed to train breast cancer detection models, emphasizing their significance in improving detection accuracy.
Among cancer-related deaths in women, breast cancer morbidity and mortality numbers are considerable [3]. In addition, the impact extends beyond the patient's physical health to the emotional, social, and economic consequences of the disease. Families and caregivers often have to deal with a great amount of stress, and the costs of treatment and long-term care place an ever greater burden on healthcare systems. Further, in low-resource areas, disparities in access to healthcare worsen outcomes because patients often present at a late stage. Measures taken to combat breast cancer include improving medical imaging, raising public awareness, and setting up screening programs. Despite these steps, diagnostic inaccuracies persist, with high rates of false positives and a continuing need for expert interpretation. ML can address these limitations and has real transformative potential to deliver precise, automated, and scalable detection and diagnosis solutions [4].
The diagnosis and treatment of breast cancer involve several challenges that influence the accuracy, efficacy, and accessibility of care. Detection techniques are often limited by technological obstacles, problems of human interpretation, and heavy costs [14, 15]. Likewise, therapeutic approaches must account for tumor diversity, individual responses, and potential consequences, which makes personalized treatment intricate. Tackling such challenges requires a blend of enhanced diagnostic techniques, advanced care methods, and the integration of modern technologies like ML [16].
For this review, we formulated the following research questions.
The growing intersection of ML and breast cancer detection offers transformative potential for early detection, the most efficient strategy for reducing mortality rates. ML techniques can significantly improve diagnostic accuracy, speed, and reliability.
Despite the growing volume of ML-based research, many existing studies are fragmented, focusing narrowly on specific algorithms, datasets, or imaging modalities.
This fragmented landscape makes it difficult for researchers and clinicians to recognize effective strategies, compare results, and build upon prior work. This survey tackles these challenges by aiming to:
By consolidating existing findings into a unified narrative, this survey helps improve reproducibility, inform practical decision-making, and identify promising areas for future research.
Breast cancer persists as a major cause of cancer-related deaths among women throughout the world. The efficacy of treatment is highly contingent on early and precise diagnosis, while conventional diagnostic approaches often suffer from certain constraints, including high false-positive and false-negative rates, subjective interpretation, and availability issues. ML has emerged as a promising tool in clinical diagnostics, offering high accuracy, scalability, and automation. However, in spite of substantial advancements, ML-based breast cancer diagnostics still encounter challenges in terms of data quality, model generalizability, and clinical adoption. This survey attempts to provide a comprehensive overview of the ML approaches used in breast cancer detection, pointing out their strengths, constraints, and potential areas of improvement. Table 1 presents a comparative analysis of existing surveys.
Figure 2 shows the organization of the sections that follow the introduction. After the literature review in Section 2, the methodology is presented in Section 3. Findings and discussions are given in Section 4 while the conclusion and future research directions are presented in Section 5.
Numerous ML methodologies have been employed in the literature to diagnose breast cancer. In [25], Tabu search was used with a rough set to choose the most appropriate features from the dataset for the detection of breast lesions or tumors. The method was tested on the WDBC and BIDMC-MGH datasets. AdaBoost, K-nearest neighbor (KNN), and LR were used as classifiers. KNN with Tabu search achieved the highest accuracy of all, at 98.24%.
LR, KNN, discrete cosine transform (DCT), random forest (RF) classifier, SVM, multilayer perceptron (MLP), and ensemble MLP with genetic algorithm (GA) were applied in [26] using the WBCD dataset. The study achieved an accuracy of 98% with MLP-GA under the holdout approach and 99.7% with MLP-GA under cross-validation. In [27], an artificial neural network (ANN) was optimized with an integrated artificial immune system and artificial bee colony (IAIS-ABC-CDS), momentum-based gradient descent backpropagation (MBGD), simulated annealing (SA), resilient backpropagation techniques (RBPT), and a GA approach on the publicly available WBCD dataset for breast cancer detection. The study achieved an accuracy of 99.34% using IAIS-ABC-CDS with MBGD and 99.11% using IAIS-ABC-CDS with RBPT.
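The holdout and cross-validation figures reported for [26] come from two different evaluation protocols, which is one reason such numbers are not directly comparable. The following is a minimal sketch of the two protocols at the level of sample indices only. The function names `holdout_split` and `k_fold_splits`, the 20% test fraction, the five folds, and the sample count of 10 are illustrative assumptions and are not taken from the cited study.

```python
import random


def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Holdout: one train/test partition, so the model is scored only once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]


def k_fold_splits(n_samples, k=5, seed=0):
    """k-fold cross-validation: every sample is used for testing exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [indices[i::k] for i in range(k)]
    splits = []
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        splits.append((train, test))
    return splits


# Hypothetical dataset of 10 samples.
train, test = holdout_split(10)
splits = k_fold_splits(10, k=5)
```

With cross-validation, the reported accuracy is typically the mean over the k test folds, so it depends less on one particular partition than a single holdout score does.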
A Bayesian classifier-embedded integrated genetic-driven framework, GA, and kernel-based Bayesian classification were applied by Wuniri et al. [48] on the WDBC dataset for the diagnosis of breast cancer, attaining 97.1% accuracy. Abunasser et al. [28] applied the DL model Xception to the BreakHis dataset collected from the Kaggle repository and achieved an accuracy of 99.78% for training, 98.59% for validation, and 97.60% for testing. Additionally, the Xception model showed a precision of 97.60%, recall of 97.60%, and an F1 score of 97.58%. In [29], the authors applied a CNN to the BreakHis dataset for the accurate diagnosis of breast cancer and obtained a training accuracy of approximately 99% and a testing accuracy of 97.80%.
The study [30] applies a hybrid gravitational search optimization algorithm and emperor penguin optimization (HGSAEPO) for feature selection, with RF, SVM, LR, decision tree (DT), and KNN for the classification of breast tumors into benign or malignant categories. The accuracy was 98.31% with HGSAEPO and RF, with 97% sensitivity, 98.87% specificity, 98% precision, and a 95.39% F1 score. Kadhim et al. [31] compared different ML techniques comprising DT, quadratic discriminant analysis, AdaBoost, bagging meta estimator, extra randomized trees (ERT), Gaussian process classifier, Ridge, Gaussian Naive Bayes (GNB), KNN, MLP, and SVM classifier. The authors found that, on the WDBC dataset, ERT achieved an accuracy of 97.36% and outperformed the other algorithms for breast cancer diagnosis.
Comparative analysis of breast cancer detection surveys with respect to research questions addressed in the study.
Ref | Datasets | H-Index | Segmentation | Feature selection | XAI |
---|---|---|---|---|---|
[19] | WBCD and only image-based datasets (ultra-sound, histopathology, MRI, etc.) | No | Yes | Yes | No |
[20] | WBCD datasets and image-based | No | No | Yes | No |
[21] | WBCD datasets and image-based | No | No | No | No |
[22] | Image-based datasets only | No | Yes | Yes | No |
[23] | WBCD, WDBC and image-based datasets | No | No | Yes | No |
[24] | Only image-based datasets | No | Yes | Yes | No |
Structure of the paper with section and subsections.
The authors employed an ANN in [32] for breast cancer detection on the WBCD and WDBC datasets, securing an accuracy of 99.85% on WBCD and 99.47% on WDBC. In [33], Yusuf et al. described LR, SVM, RF, gradient boost (GB), and AdaBoost (AB) for the classification of breast cancer tumors into benign and malignant categories using the WDBC dataset, achieving an accuracy of about 99% with LR, RF, and AB. Rakibul et al. [34] applied LR and SVM, including linear SVM (LSVM) and quadratic SVM (QSVM), to the WBCO, WDBC, and WPBC datasets and attained an accuracy of 94% for WBCO, 97.4% using QSVM on the WDBC dataset, and 83.5% using LR on the WPBC dataset.
In [35], a wrapper subset selection method, correlation analysis, and principal component analysis (PCA) are used for feature selection, and an ensemble classification methodology based on NB, SVM, DT, KNN, RF, LR, and stochastic gradient descent learning is adopted for breast cancer diagnosis. A 98.24% accuracy was achieved using the WDBC dataset. Huang and Chen [36] applied the variable importance measure (VIM), hierarchical clustering RF (HCRF) algorithm, DT, AB, and RF models to the WBCD and WDBC datasets, with an accuracy of 97.05% on WDBC and 97.76% on WBCD with HCRF. KNN, chi-square-based feature selection, and L1-based feature selection (selection from model) are applied in [37] to the WBCD and WDBC datasets, with an accuracy of 99.42% for WBCD and 99% for WDBC using L1-based feature selection.
The dragonfly algorithm (DA), PCA, DL models, SVM, RF, and KNN were used by Ibrahim et al. [38] for breast cancer detection and achieved 97.90% accuracy. In [39], Akkur et al. used a hybrid model of relief and binary Harris hawk optimization (BHHO) with KNN, SVM, LR, and NB for the diagnosis of breast cancer using the WDBC, WBCD, and mammographic breast cancer dataset (MBCD). With relief-BHHO-SVM, they secured an accuracy of 98.77% for WDBC, 99.28% for WBCD, and 97.44% for MBCD. Ensemble filter-based feature selection with a 1-D CNN (1D-CNN) was employed in [40], with an accuracy of 98.5% on the WDBC dataset.
In [41], the WDBC dataset was used for breast cancer detection with Pearson's correlation coefficient, lasso, and minimum redundancy-maximum relevance (mRMR) for feature selection, and SVM, light GBM, RF, DT, NB, KNN, and LR for the classification of breast tumors into benign and malignant classes. Hossin et al. [42] compared different ML algorithms using univariate feature selection, recursive feature elimination, correlation heatmap, LR, RF, KNN, DT, AB, SVM, GB, and Gaussian NB. They found that LR and SVM are the most effective, attaining an accuracy of 99.12% on the WDBC dataset. In [43], Sundar and co-authors used the ResNet50v2 CNN model and an ensemble approach with DT, RF, ET, and XGBoost on invasive ductal carcinoma (IDC). The ensemble model achieved an accuracy of 99.82%.
The study [44] uses SVM with fine-tuned parameters for the diagnosis of breast cancer on WDBC, attaining an accuracy of 95.61%. Doaa et al. [45] used thermal images from the DMR-IR dataset and employed Gabor filters, Canny edge detection, holistically nested edge detection, CNN, ResNet-50, SVM, and XGB, achieving 96.23% accuracy. Saurav and co-authors [46] used TCGA and applied RF, SVM, DT, KNN, Gaussian NB, and XGBoost, obtaining 97.19% accuracy. In [47], XGBoost was used on the WDBC dataset and the resulting accuracy was 99.12%.
In spite of significant breakthroughs in ML for breast cancer detection, various gaps in the existing literature hamper its full potential in clinical use. These gaps comprise:
By tackling these gaps, this study aims to offer practical comprehension and facts for researchers and professionals in the domain. It advocates for the development of reliable, robust, interpretable, and clinically acceptable ML models for the detection of breast cancer, facilitating enhanced detection accuracy, early diagnosis, and better medical outcomes.
This research follows a streamlined approach for examining, classifying, and synthesizing the literature in line with the established objectives, and it highlights areas that may guide future research in this domain. The survey was carried out in several steps. In the first step, the research questions were defined; in the second step, the research objectives were developed from these research questions. In the third step, a shortlisting strategy was formulated to find related articles, which were then selected, categorized, and examined with respect to the research domain. Finally, the results were discussed and analyzed according to the research questions. Figure 3 presents the approach adopted for this review.
Formulating a search strategy to gather relevant and original information within the specified area is the most critical step in this review. This research examines relevant literature from 2019 to 2024, collected from a number of sources such as MDPI, IEEE, Elsevier, Springer, Neural Network World, and Computational and Mathematical Methods in Medicine. The selected journals have high H-index values and good citation rates and were searched with specific keywords such as “machine learning,” “breast cancer detection,” “segmentation-based classification,” and “feature selection.” These keywords were used to identify relevant studies in academic databases like PubMed, Wiley, Springer, etc.
Applying the search string to the various digital repositories returned a large number of records, which were narrowed down through a multi-stage shortlisting process. The research papers were selected on the basis of an H-index criterion (Figure 4) and restricted to publications from 2019 to 2024 (Figure 5). After the removal of duplicates, the papers were screened by reading their abstracts and evaluating their results so that the most relevant articles were selected.
The shortlisted studies used publicly available benchmark datasets (e.g., WBCD, DDSM), so that their results are reproducible. The articles were evaluated on their discussion of ML techniques, their experimental results, and the performance metrics they reported. Studies with more complete experimental validation and comparison were preferred. Research papers covering a wide range of ML techniques, from traditional supervised learning to DL, were included to maintain a balanced review. By adopting this systematic shortlisting approach, the survey provides a comprehensive and impartial examination of the state-of-the-art ML methodologies applied to breast cancer detection.
PRISMA approach for this review.
Number of papers per H-index.
Year-wise publications.
The following kinds of research and datasets were incorporated in this survey.
The following kinds of research were excluded while shortlisting.
Table 2 shows the distribution of selected papers with respect to the publisher.
Breast cancer is one of the most prevalent cancers across the world, affecting a wide range of individuals every year. It arises from the uncontrolled or abnormal division of malignant cells in breast tissue, which together form a malignant or cancerous tumor that invades the surrounding cells. Although this cancer primarily affects women, men can also develop it, though far less frequently. According to the World Health Organization (WHO), breast cancer is the predominant cause of cancer-related deaths in women, with substantial differences in incidence and death toll worldwide because of inequalities in healthcare availability, awareness, and early diagnosis programs.
Publisher-wise distribution of papers with corresponding references.
Publisher | Count | Reference Papers |
---|---|---|
IEEE Access | 14 | [36, 45, 48, 53-63] |
MDPI | 11 | [18, 34, 49, 64-70] |
Springer | 5 | [30, 71-75] |
BMC Series | 3 | [46, 76, 77] |
Elsevier | 3 | [17, 32] |
Wiley | 3 | [51, 78, 79] |
Advances in Artificial Intelligence and Machine Learning | 2 | [66, 80] |
Archives of Breast Cancer | 1 | [66] |
International Journal of Advanced Computer Science and Applications | 1 | [28] |
Asian Pacific Journal of Cancer Prevention | 1 | [29] |
International Journal of Electrical and Computer Engineering | 1 | [41] |
Bulletin of Electrical Engineering and Informatics | 1 | [42] |
International Journal of Integrated Engineering | 1 | [81] |
Concurrent Engineering Research and Applications | 1 | [52] |
International Journal of Reconfigurable and Embedded Systems | 1 | [31] |
International Journal of Image, Graphics and Signal Processing | 1 | [34] |
Journal of Experimental and Theoretical Artificial Intelligence | 1 | [82] |
IOP Conference Series Materials Science and Engineering | 1 | [83] |
Journal of the Nigerian Society of Physical Sciences | 1 | [33] |
Journal of the Chinese Institute of Engineers | 1 | [37] |
Jundishapur Journal of Microbiology | 1 | [40] |
African Journal of Biomedical Research | 1 | [44] |
Journal of Electrical Systems | 1 | [84] |
Automation Controls & Engineering | 1 | [85] |
Simulation | 1 | [86] |
Neural Network World | 1 | [39] |
MATEC Web of Conferences | 1 | [43] |
Jundishapur Journal of Natural Pharmaceutical Products | 1 | [50] |
A breast cancer diagnosis causes substantial psychological distress for patients and their families, including anxiety, fear of recurrence, financial stress, and depression. The diagnosis often leads to a financial burden for low-income families, as the cost of surgery, radiation, chemotherapy, and related treatment is significant. Screening programs are scarce in third-world countries, which leads to diagnosis at the last stage of cancer. Because of these disparities, many people in such countries are reluctant to undergo regular breast scans owing to the associated cost, which leads to distress and suffering.
The treatment of breast cancer requires a multidisciplinary approach involving many surgeons and oncologists, which imposes a substantial burden on healthcare systems. Countries with early detection tools and well-established screening systems have higher survival rates than third-world countries, where these facilities are scarce and the available programs are costly for low-wage families. There is therefore a need for early detection through scalable and cost-effective AI-driven solutions for the diagnosis of breast cancer.
Lowering the breast cancer mortality rate and improving the health of breast cancer patients depend on early detection. If the disease is diagnosed early enough, there is a 90 out of 100 chance that the person will survive. Classic diagnostic tools such as ultrasound and mammography remain in use, but they suffer from operator dependency and the resulting variability of interpretation, and they fail to meet needs in underserved areas. ML is emerging as a powerful tool that greatly enhances the capability for early detection. ML models use large datasets to uncover patterns and anomalies that a human observer might miss. Image segmentation, feature extraction, and predictive modeling techniques allow malignant tumors to be characterized faster and more accurately. In addition, ML-based tools can identify high-risk patients, grade a case's urgency, and help clinicians improve their diagnostic workflow and the corresponding patient outcomes.
This section describes the conclusions and key findings attained after analyzing the 40 publications selected in this survey. All RQs are briefly described in order to clarify the respective exploring areas of the breast cancer detection domain.
Different publicly available datasets have largely contributed to the advancement of ML and DL techniques for breast cancer detection. These datasets differ in size, imaging modalities, annotations, and patient demographics, enabling researchers to develop and evaluate diverse models.
The datasets that have been widely utilized in the study are WDBC and WBC datasets as illustrated in Figure 6. This chart demonstrates the popularity of certain datasets in breast cancer detection literature.
In conclusion, the prevalence of WDBC and WBC underlines their dominance in the breast cancer detection literature, probably because of their data quality, availability, and long-standing use. The datasets that have been used least so far are either new or less accessible to the general public, which contributes to their limited application.
The pie chart in Figure 7 represents the apportionment of the datasets utilized for breast cancer diagnosis using various ML methodologies, partitioned into three classes: WDBC, WBC, and WPBC. Following is a comprehensive analysis of the figure targeting its relativity to breast cancer detection via ML approaches.
These datasets are widely utilized for the purpose of classification, prediction, and feature extraction or selection. WDBC and WDBC due to their enormous sizes are far suited for the diagnosis and prognosis scenarios.
A huge number of samples of these datasets contribute to the extrapolability of the ML models trained on these datasets. However, WBC due to its smaller size is not well suited for training complex or large-scale ML models but can be efficiently utilized for fundamental-level ML models.
The given chart in Figure 8 demonstrates the sizes of certain datasets based on images applied in breast cancer detection literature, presenting an analysis of their spectrum and scope. Following is a detailed overview:
Number of research papers per dataset.
Size of feature-based datasets.
BreakHis and VinDr-Mammo are two very large datasets, which makes them ideal for data-driven approaches such as DL. Small-scale datasets, like the Dunya Women's Cancer Dataset, IDC, UFPE DB, and DMR-IR DB, are suitable for focused research purposes or pilot studies. The size of a dataset plays a significant role in determining its applications, with large-scale datasets supporting advanced techniques and smaller ones facilitating targeted research. This distribution highlights the importance of selecting datasets according to the study objective. Figure 8 shows the size of image-based datasets.
Size of image-based datasets.
Most popular ML approaches in the literature.
Approaches such as SVM, RF, DTs, and KNN have a huge number of applications for the binary classification of breast cancer tumors as is shown in Figure 9.
CNNs have been shown to be state-of-the-art in medical imaging data analysis and are well suited to image classification.
Besides the widely used models like RF, KNN, etc, some ML models are moderately used in existing literature.
A few ML models have been used in a few studies for breast cancer detection.
The use of ML to automate diagnostic processes has greatly advanced breast cancer detection. ML models can process massive volumes of medical imaging and patient data to find signs of malignancy. The ML techniques explored in this survey fall broadly into two categories: segmentation-based classification and feature selection-based classification. Each approach offers distinct advantages for the analysis of medical datasets, and their integration into clinical workflows can bring transformative improvements in cancer care.
Segmentation is the preprocessing task of isolating regions of interest (e.g., tumors) from medical images. It focuses the analysis on relevant anatomical structures, so accurate segmentation enhances the reliability of subsequent diagnostic processes.
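To make the idea of isolating a region of interest concrete, the following is a minimal sketch that separates a bright region from a dark background with Otsu thresholding. Otsu's method is used here only as a simple stand-in; it is not the segmentation model of any study reviewed in this survey, and the 4x4 grayscale image and the function names are hypothetical.

```python
def otsu_threshold(pixels):
    """Return the gray level (0-255) that maximizes between-class variance."""
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w0, sum0 = 0, 0
    for t in range(256):
        w0 += hist[t]             # pixels with value <= t (background)
        sum0 += t * hist[t]
        w1 = total - w0           # pixels with value > t (foreground)
        if w0 == 0 or w1 == 0:
            continue
        mu0 = sum0 / w0
        mu1 = (sum_all - sum0) / w1
        # Proportional to the between-class variance (constant factor dropped).
        var_between = w0 * w1 * (mu0 - mu1) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def segment(image):
    """Binary mask: 1 marks pixels brighter than the Otsu threshold."""
    t = otsu_threshold([p for row in image for p in row])
    return [[1 if p > t else 0 for p in row] for row in image]


# Hypothetical image: a bright 2x2 "lesion" on a dark background.
image = [
    [12, 15, 10, 14],
    [11, 210, 205, 13],
    [16, 215, 220, 12],
    [10, 14, 11, 15],
]
mask = segment(image)
```

Real mammograms or histopathology slides rarely have such a clean bimodal intensity distribution, which is one reason learned segmentation models are preferred in practice.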
Radwan et al. [17] used the YOLOv5 and MedSAM segmentation models and the contrast-limited adaptive histogram equalization (CLAHE) algorithm along with a Gaussian blur and the ensemble deep random vector functional link neural network algorithm for breast cancer diagnosis. Sarfaraz et al. [16] applied H and E staining, nuclei segmentation, and nuclei-based instance segmentation, as well as PCA and PSO for feature selection, with RF, LR, NB, KNN, SVM, digital image analysis, and CNN for the detection of breast cancer on the WDBC and WSI datasets.
Feature selection is the process of identifying the most relevant features in multi-dimensional datasets available in public digital repositories. This process improves the overall performance of ML classification models. The approach is specifically valuable for patient-related confidential data involving histopathology-based features, statistics, and gene-related data.
i. Recursive Feature Elimination: This technique repeatedly removes the least significant features to enhance the performance of the ML model, ensuring that only the important attributes are selected for classification.
ii. Principal Component Analysis: PCA transforms the data into a set of uncorrelated components that retain most of the variance. Although it is strictly a feature extraction technique rather than a selection technique, it has been effectively employed for dimensionality reduction and redundancy removal in breast cancer datasets.
iii. Evolutionary Feature Selection: EFS is a technique used in ML to improve the performance of classification models. It applies evolutionary algorithms (e.g., GA, PSO, ICA) to identify the subset of attributes that contributes most to the accuracy of the classification model. By mimicking Darwinian evolution (survival of the fittest), EFS repeatedly selects and combines relevant attributes in search of the optimal combination, enhancing classification performance while reducing computational complexity.
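The elimination idea behind these wrapper-style techniques can be illustrated with a small sketch. The code below performs greedy backward elimination: at each step it drops the feature whose removal leaves the most accurate model. This is a simplified relative of recursive feature elimination, not the exact procedure of any cited study. The nearest-centroid classifier, the use of training accuracy as the score, the function names, and the toy data (feature 0 informative, features 1 and 2 noise) are all assumptions made for illustration.

```python
def centroid_accuracy(X, y, features):
    """Training accuracy of a nearest-centroid classifier using only `features`."""
    classes = sorted(set(y))
    centroids = {}
    for c in classes:
        rows = [x for x, label in zip(X, y) if label == c]
        centroids[c] = [sum(r[f] for r in rows) / len(rows) for f in features]
    correct = 0
    for x, label in zip(X, y):
        def sq_dist(c):
            return sum((x[f] - m) ** 2 for f, m in zip(features, centroids[c]))
        correct += min(classes, key=sq_dist) == label
    return correct / len(y)


def backward_elimination(X, y, n_keep):
    """Repeatedly drop the feature whose removal hurts accuracy the least."""
    selected = list(range(len(X[0])))
    while len(selected) > n_keep:
        def score_without(f):
            return centroid_accuracy(X, y, [g for g in selected if g != f])
        selected.remove(max(selected, key=score_without))
    return selected


# Hypothetical toy data: 0 = benign, 1 = malignant.
X = [
    [1.0, 9.0, 3.0], [1.2, 1.0, 7.0], [0.8, 5.0, 2.0], [1.1, 8.0, 8.0],
    [5.0, 2.0, 3.0], [5.2, 9.0, 6.0], [4.8, 4.0, 1.0], [5.1, 7.0, 9.0],
]
y = [0, 0, 0, 0, 1, 1, 1, 1]
kept = backward_elimination(X, y, n_keep=1)
```

Scoring on the training data keeps the sketch short but is optimistic; a real study would score each candidate subset on held-out data or with cross-validation.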
Saeed et al. [82] used ensemble classification based on an MLP neural network and evolutionary algorithms (GA, PSO, and ICA) on the WBCD original dataset, achieving a classification accuracy of about 98.74%. Roger and co-authors [18] applied GA and SVM for breast cancer detection on datasets of thermal images from the database for mastology research with infrared image (DMR-IR) and the private thermal image database of the Federal University of Pernambuco (UFPE), achieving an accuracy of 97.18%. Sahar A. [74] employed GA, RFE, rough set feature selection, and PCA for feature selection along with DT, KNN, ANN, SVM, RF, and relief methods for the classification of breast cancer tumors. Khatereh [86] used GA for feature selection on the BreakHis dataset and CNN for classification.
Through exploiting segmentation-based and feature selection-based methodologies, ML classification techniques for the diagnosis of breast cancer have become vigorous and trustworthy. These techniques work together side by side, providing extensive solutions for the analysis of both image-based and feature-based datasets.
As ML and DL approaches have grown more complex, one of the crucial challenges to their real-world acceptance is the lack of interpretability and explainability. Explainable AI (XAI) denotes a collection of approaches developed to make AI-driven decisions more transparent, interpretable, and justifiable for clinicians and patients.
i. Analysis of Important Features: Approaches like local interpretable model-agnostic explanations (LIME) and Shapley additive explanations (SHAP) can help highlight the relevant features (e.g., tumor size, shape, or density) that affected a model's prediction.
ii. Attention Mechanisms in DL Approaches: Models such as attention-driven neural networks offer visual interpretation, which makes it simpler to explain how a DL model processes clinical images.
iii. Rule-Driven Models and DTs: Although DL techniques provide high accuracies, more straightforward rule-based approaches or DTs can be used in conjunction with them to enhance interpretability.
iv. Saliency Maps and Heatmaps: CNNs can produce heatmaps that visualize the regions within histopathological images or mammograms that play an essential part in the classification decision.
In real clinical settings, SHAP and LIME are mostly used to pinpoint the most influential features or image regions behind a model's prediction. For example, in mammography or biopsy image analysis, SHAP can visually highlight which parts of a tumor contributed most to a malignancy prediction, helping radiologists validate or question the AI's outcome.
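To make the idea behind SHAP concrete, the sketch below computes exact Shapley values for a tiny model by averaging each feature's marginal contribution over all feature orderings. It is an illustration of the underlying concept only and does not use the SHAP library, which relies on approximations to scale to real models. The function `shapley_values`, the linear `toy_model`, its weights, and the feature values are all hypothetical.

```python
from itertools import permutations

def shapley_values(predict, x, baseline):
    """Exact Shapley values of `predict` at `x` relative to `baseline`.

    Features not yet added to the coalition keep their baseline value.
    The cost grows factorially with the number of features, so this is
    practical only for a handful of features.
    """
    n = len(x)
    contrib = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = predict(current)
        for i in order:
            current[i] = x[i]         # add feature i to the coalition
            new = predict(current)
            contrib[i] += new - prev  # marginal contribution of feature i
            prev = new
    return [c / len(orderings) for c in contrib]

# Hypothetical linear risk score over (tumor size, shape irregularity, density).
def toy_model(features):
    size, shape, density = features
    return 0.5 * size + 0.3 * shape + 0.2 * density

phi = shapley_values(toy_model, x=[4.0, 2.0, 1.0], baseline=[1.0, 1.0, 1.0])
print(phi)
```

For a linear model, each Shapley value reduces to the feature weight times the feature's deviation from the baseline, and the values sum to the difference between the prediction and the baseline prediction. A feature that does not deviate from the baseline (density in this toy case) receives zero attribution.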
The ML models employed for the diagnosis of breast cancer are frequently evaluated using a number of performance metrics. These metrics help in understanding how accurately the algorithms classify cancerous and non-cancerous tumors. The most popular of them is accuracy. The following examination is based on Figure 10.
\[ \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1} \]
where TP means true positive, TN is true negative, FP is false positive and FN is false negative. The accuracy of various algorithms ranges between 85 and 100%.
(2)
Accuracy for each classification model.
(3)
(4)
(5)
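As a worked illustration of the confusion-matrix quantities TP, TN, FP, and FN, the sketch below computes accuracy together with several metrics that commonly accompany it. Which metrics the numbered equations in this section define is not recoverable here, so the choice of precision, recall (sensitivity), specificity, and F1-score is an assumption, as are the function name `classification_metrics` and the example counts.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return common binary-classification metrics from confusion-matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    # Guard against empty denominators (e.g. a model that never predicts positive).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0        # also called sensitivity
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "f1": f1,
    }

# Hypothetical counts: 100 malignant and 100 benign test cases.
m = classification_metrics(tp=90, tn=95, fp=5, fn=10)
print(m)
```

With these counts the accuracy is 0.925 while recall is 0.90, which shows why accuracy alone can hide the missed malignant cases that matter most in diagnosis.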
Note: For Figures 10 and 11, accuracy values are mainly derived from 10-fold cross-validation or test datasets, as reported by the respective authors. For Figure 12, performance metrics show test dataset accuracies or average cross-validation scores based on the source publications.
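The 10-fold cross-validation protocol mentioned in the note can be sketched as an index splitter. The stratified variant shown here, which keeps class proportions similar in every fold, is an assumption; the surveyed studies do not all state which variant they used. The function name `stratified_k_fold` and the 60/40 class split are hypothetical.

```python
import random

def stratified_k_fold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs with class proportions preserved per fold."""
    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    by_class = {}
    for idx, y in enumerate(labels):
        by_class.setdefault(y, []).append(idx)
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal indices round-robin so each fold gets a similar share of each class.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)
    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f_no, fold in enumerate(folds)
                           if f_no != i for j in fold)
        yield train_idx, test_idx

# Hypothetical dataset: 60 benign (0) and 40 malignant (1) samples.
labels = [0] * 60 + [1] * 40
splits = list(stratified_k_fold(labels, k=10))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each sample appears in exactly one test fold, so averaging the ten test accuracies uses every sample for evaluation once while never testing on data the model was trained on in that round.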
In breast cancer detection research, the credibility of various ML algorithms is substantially affected by the choice of dataset. Datasets such as BreakHis, WDBC, and WPBC vary enormously in size, complexity, and structure, which may strongly influence the performance benchmarks of certain models.
Some large datasets, such as BreakHis, used in [48] and [28], comprise a huge repository of histopathological images, providing plenty of data for training DL models like CNNs, which achieve accuracies of more than 99 percent.
In contrast, smaller datasets such as WDBC require careful feature selection and are more suitable for conventional ML algorithms like Random Forest (RF) and Support Vector Machines (SVM). These datasets benefit from hybrid approaches that involve preprocessing steps like feature selection prior to classification, often reaching accuracies between 96 percent and 99 percent.
When small datasets such as WPBC are used with complex models, challenges such as overfitting arise. In such cases, hybrid or ensemble methods that combine feature selection with classification are critical to enhance generalization.
In conclusion, the performance of ML models is closely tied to dataset characteristics. Larger datasets suit DL models, while small tabular datasets favor hybrid approaches. Future research should emphasize leveraging dataset diversity and structure-aware techniques to attain optimal performance.
There is considerable bias in breast cancer detection research because of the prevalent use of particular datasets like BreakHis and WDBC. This over-dependency limits the generalizability of models, as they often fail when applied to different or unseen forms of data.
Accuracy for different models for different datasets.
Accuracy for different models using WBCD and WDBC datasets.
Datasets such as BreakHis continue to dominate the research arena because of their huge size and image-based characteristics. However, models trained specifically on such datasets face performance issues on other datasets, such as clinical or thermal imaging data. For example, CNN-based models that perform best on BreakHis struggle with WDBC due to the differences in format and structure of the data [28].
To mitigate this bias, researchers should incorporate a variety of datasets and validate ML models across them. Integrating BreakHis with WPBC or thermal imaging datasets can improve generalizability. Generative models like GANs can also be used to create synthetic samples, helping to address class imbalance and the scarcity of rare cases.
One study showed excellent outcomes when training on BreakHis, but the same model failed to replicate that performance on WDBC, highlighting the significance of cross-dataset validation. Merging datasets like WDBC and WPBC, as done in [74], proves helpful.
In a nutshell, excessive dependency on a few datasets can skew the outcomes and reduce medical relevance. Introducing dataset diversity, carrying out validation across various datasets, and harmonizing benchmark standards can significantly enhance the fairness and robustness of ML models in breast cancer detection.
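Cross-dataset validation, as discussed above, amounts to fitting a model on one dataset and scoring it on another. The toy sketch below uses a one-dimensional threshold classifier and made-up feature values to show how a shift between data sources can lower accuracy. The names `fit_threshold`, `dataset_a`, and `dataset_b`, and every number in the example, are hypothetical and do not represent any of the surveyed datasets.

```python
def fit_threshold(samples):
    """Fit a 1-D classifier: threshold at the midpoint of the two class means."""
    benign = [x for x, y in samples if y == 0]
    malignant = [x for x, y in samples if y == 1]
    return (sum(benign) / len(benign) + sum(malignant) / len(malignant)) / 2

def accuracy(threshold, samples):
    """Fraction of samples whose label matches the rule `x > threshold`."""
    correct = sum((x > threshold) == bool(y) for x, y in samples)
    return correct / len(samples)

# Hypothetical (feature value, label) pairs; dataset_b is shifted upward to
# mimic a different acquisition source or measurement scale.
dataset_a = [(1.0, 0), (2.0, 0), (3.0, 0), (7.0, 1), (8.0, 1), (9.0, 1)]
dataset_b = [(4.0, 0), (5.0, 0), (6.0, 0), (10.0, 1), (11.0, 1), (12.0, 1)]

threshold = fit_threshold(dataset_a)           # trained on dataset A only
in_domain = accuracy(threshold, dataset_a)     # scored on the training source
cross_dataset = accuracy(threshold, dataset_b) # scored on the unseen source
print(threshold, in_domain, cross_dataset)
```

In this toy case the threshold learned on the first source misclassifies one benign sample from the shifted source, so the cross-dataset score falls below the in-domain score. Reporting both numbers side by side is the simplest way to expose this kind of gap.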
Many factors influence the selection of an algorithm for breast cancer detection. These include the type of dataset (image or tabular), dataset size, feature structure, model complexity, and resource availability.
Convolutional Neural Networks (CNNs) are mostly selected for image-based datasets like BreakHis because of their strong feature extraction capabilities. In contrast, algorithms like SVM and RF are prioritized for structured datasets like WDBC, where tabular features are usually relevant.
In studies prioritizing sensitivity, RF has shown strong performance on tabular datasets. Similarly, CNNs have shown high accuracy on image datasets. However, resource-intensive models like GANs may not be a good choice in environments with limited computing power. In such cases, simpler models like Decision Trees (DT) and Logistic Regression (LR) provide more practical alternatives.
In clinical settings where interpretability is crucial, models such as SVM and DT are often preferred over complex DL models. For example, [31] used CNNs to achieve 99.78 percent accuracy on image-based data, whereas [29] employed SVM on WDBC, reaching 96.82 percent accuracy with engineered features.
Ensemble methodologies, such as combining segmentation or feature extraction with DL models, further improve performance. Therefore, model selection must take into account not only accuracy but also deployment and resource constraints.
In conclusion, factors such as dataset type, desired performance metrics, interpretability, and computational cost together guide algorithm selection. Using hybrid strategies, AutoML frameworks, and dataset-aware techniques can enhance diagnostic performance while keeping robustness intact.
In breast cancer detection, attaining high accuracy is crucial; however, interpretability is equally important, especially in clinical environments. This often results in a trade-off where interpretable models sacrifice accuracy, and highly accurate models are not transparent.
DL models such as CNNs trained on BreakHis can achieve accuracies of more than 99 percent, but they are considered "black boxes." This opaqueness restricts their acceptance in clinical settings. In contrast, simpler models trained on tabular datasets may offer lower accuracy but are easier for clinicians to comprehend.
This trade-off poses challenges in model deployment. Healthcare professionals may hesitate to trust non-transparent models, even if they are accurate. Conversely, relying solely on interpretable models could result in false positives or negatives if accuracy is compromised.
To tackle this issue, hybrid and ensemble approaches have been explored. These models combine several algorithms to balance accuracy and interpretability. For example, in [33], Random Forests offer 100 percent accuracy and are relatively interpretable. Other studies [28, 29, 32] reported top accuracies but do not provide model explanation techniques.
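One simple way ensembles combine several algorithms is hard majority voting over the labels predicted by each model. The sketch below is a minimal illustration of that idea and not the scheme of any cited study. The function name `majority_vote` and the three prediction lists, which stand in for the outputs of already trained SVM, RF, and DT classifiers, are hypothetical.

```python
from collections import Counter

def majority_vote(predictions):
    """Combine per-model label lists into one label per sample by majority vote."""
    combined = []
    for sample_votes in zip(*predictions):
        # most_common(1) returns the label with the highest vote count.
        label, _ = Counter(sample_votes).most_common(1)[0]
        combined.append(label)
    return combined

# Hypothetical predictions for five samples (1 = malignant, 0 = benign).
svm_pred = [1, 0, 1, 1, 0]
rf_pred = [1, 0, 0, 1, 0]
dt_pred = [0, 0, 1, 1, 1]

ensemble = majority_vote([svm_pred, rf_pred, dt_pred])
print(ensemble)  # [1, 0, 1, 1, 0]
```

An odd number of binary voters avoids ties. Because each individual vote remains visible, a clinician can see which models disagreed on a given case, which keeps part of the interpretability that a single opaque model would lose.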
Explainable AI (XAI) tools like SHAP and LIME can assist in making DL models more interpretable. However, their incorporation into research and clinical environments remains scarce. More work is required to make these tools standard practice.
In a nutshell, addressing the accuracy-interpretability trade-off is critical. Hybrid methodologies, ensemble models, and XAI integration provide feasible paths forward to ensure reliable and trustworthy diagnostic systems for breast cancer.
While the referenced literature on breast cancer detection techniques provides valuable information, it is important to acknowledge certain limitations and challenges identified in all studies.
This study concludes that the performance and generalizability of ML models in the diagnosis of breast cancer are affected by the choice of dataset, such as WDBC or BreakHis. Large image-based datasets like BreakHis facilitate DL algorithms, with which accuracies of more than 99% have been attained, whereas small tabular datasets such as WDBC require careful feature engineering and hybrid approaches to reach sufficient accuracy. Over-dependency on a few datasets introduces bias, restraining clinical application. Future research should prioritize dataset diversity, incorporating multi-faceted data such as image-based and genome-based medical data, to improve model robustness. To advance the field of ML, future studies must target:
Alongside the above dimensions, another research question arises: what will be the future trend of this research? That is, how do simpler algorithms such as LR compare with more complex models like DL in terms of computational cost and accuracy?
In addition, the trade-off between accuracy and interpretability will persist for elementary models, which provide transparency but do not reach the accuracies offered by DL models. Adopting XAI, cross-dataset validation, and streamlined approaches would be an important breakthrough. This survey provides a roadmap for advancing the diagnosis of breast cancer via rational, accurate, and understandable ML approaches.
This study was funded by the Universidad Europea del Atlantico.
AS conceptualization, data curation, writing - the original draft.
MU formal analysis, conceptualization, writing - the original draft.
MTS methodology, data curation, formal analysis.
MZ software, project administration, methodology.
SAO funding acquisition, visualization, investigation.
RCI investigation, validation, resources.
SH visualization, software, formal analysis.
IA supervision, validation, writing - review and editing.
All authors read and approved the final manuscript.
The authors have declared that no competing interest exists.
1. Alturki N, Umer M, Ishaq A, Abuzinadah N, Alnowaiser K, Mohamed A. et al. Combining CNN features with voting classifiers for optimizing performance of brain tumor classification. Cancers. 2023;15(6):1767
2. Ahmed KT, Rustam F, Mehmood A, Ashraf I, Choi GS. Predicting skin cancer melanoma using stacked convolutional neural networks model. Multimedia Tools and Applications. 2024;83(4):9503-9522
3. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A. et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: a cancer journal for clinicians. 2021;71(3):209-249
4. Xu Y, Gong M, Wang Y, Yang Y, Liu S, Zeng Q. Global trends and forecasts of breast cancer incidence and deaths. Scientific data. 2023;10(1):334
5. Health at Hand. Breast cancer awareness month. 2020. Available from: https://www.myhealthathand.com/breast-cancer-awareness-2020/.
6. Siegel RL, Miller KD, Wagle NS, Jemal A. Cancer statistics, 2023. CA: a cancer journal for clinicians. 2023;73(1):17-48
7. Abuzinadah N, Kumar Posa S, Alarfaj AA, Alabdulqader EA, Umer M, Kim TH. et al. Improved prediction of ovarian cancer using ensemble classifier and Shapley explainable AI. Cancers. 2023;15(24):5793
8. Chen X, Aljrees T, Umer M, Saidani O, Almuqren L, Mzoughi O. et al. Cervical cancer detection using K nearest neighbor imputer and stacked ensemble learning model. Digital Health. 2023;9:20552076231203802
9. DAWN. Pakistan has highest rate of breast cancer cases in Asia: expert. Available from: https://www.dawn.com/news/1872838.
10. Shafi I, Din S, Khan A, Díez IDLT, Casanova RJP, Pifarre KT. et al. An effective method for lung cancer diagnosis from CT scan using deep learning-based support vector network. Cancers. 2022;14(21):5457
11. Chaganti R, Rustam F, De La Torre Díez I, Mazón JLV, Rodríguez CL, Ashraf I. Thyroid disease prediction using selective features and machine learning techniques. Cancers. 2022;14(16):3914
12. Shafique R, Rustam F, Choi GS, Díez IDLT, Mahmood A, Lipari V, Velasco CLR, Ashraf I. Breast cancer prediction using fine needle aspiration features and upsampling with supervised machine learning. Cancers. 2023;15(3):681
13. Umer M, Naveed M, Alrowais F, Ishaq A, Hejaili AA, Alsubai S, Eshmawi A, Mohamed A, Ashraf I. Breast cancer detection using convoluted features and ensemble machine learning algorithm. Cancers. 2022;14(23):6015
14. Karamti H, Alharthi R, Anizi AA, Alhebshi RM, Eshmawi A, Alsubai S, Umer M. Improving prediction of cervical cancer using KNN imputed SMOTE features and multi-model ensemble learning approach. Cancers. 2023;15(17):4412
15. Rupapara V, Rustam F, Aljedaani W, Shahzad HF, Lee E. et al. Blood cancer prediction using leukemia microarray gene data and hybrid logistic vector trees model. Scientific Reports. 2022;12(1):1000
16. Gopal VN, Al-Turjman F, Kumar R, Anand L, Rajesh M. Feature selection and classification in breast cancer prediction using IoT and machine learning. Measurement. 2021;178:109442
17. Qasrawi R, Daraghmeh O, Qdaih I, Thwib S, Polo SV, Owienah H, Al-Halawa DA, Atari S. Hybrid ensemble deep learning model for advancing breast cancer detection and classification in clinical applications. Heliyon. 2024;10(19)
18. Resmini R, Silva L, Araujo AS, Medeiros P, Muchaluat-Saade D, Conci A. Combining genetic algorithms and SVM for breast cancer diagnosis using infrared thermography. Sensors. 2021;21(14):4802
19. Chugh G, Kumar S, Singh N. Survey on machine learning and deep learning applications in breast cancer diagnosis. Cognitive Computation. 2021;13(6):1451-1470
20. Yadav RK, Singh P, Kashtriya P. Diagnosis of breast cancer using machine learning techniques - a survey. Procedia Computer Science. 2023;218:1434-1443
21. Priyanka KS. A review paper on breast cancer detection using deep learning. IOP Conference Series: Materials Science and Engineering. 2021;1022:012071
22. Meenalochini G, Ramkumar S. Survey of machine learning algorithms for breast cancer detection using mammogram images. Materials Today: Proceedings. 2021;37:2738-2743
23. Fatima N, Liu L, Hong S, Ahmed H. Prediction of breast cancer, comparative review of machine learning techniques, and their analysis. IEEE Access. 2020;8:150360-150376
24. Rautela K, Kumar D, Kumar V. A systematic review on breast cancer detection using deep learning techniques. Archives of Computational Methods in Engineering. 2022;29(7):4599-4629
25. Dhahri H, Rahmany I, Mahmood A, Al Maghayreh E, Elkilani W. Tabu search and machine-learning classification of benign and malignant proliferative breast lesions. Biomed Research International. 2020;2020:4671349
26. Abdollahi J, Keshandehghan A, Gardaneh M, Panahi Y, Gardaneh M. Accurate detection of breast cancer metastasis using a hybrid model of artificial intelligence algorithm. Archives of Breast Cancer. 2020;7(1):22-28
27. Punitha S, Al-Turjman F, Stephan T. An automated breast cancer diagnosis using feature selection and parameter optimization in ANN. Computers & Electrical Engineering. 2021;90:106958
28. Abunasser BS, Al-Hiealy MRJ, Zaqout IS, Abu-Naser SS. Breast cancer detection and classification using deep learning Xception algorithm. International Journal of Advanced Computer Science and Applications. 2022;13(7):566-574
29. Abunasser BS, Al-Hiealy MRJ, Zaqout IS, Abu-Naser SS. Convolution neural network for breast cancer detection and classification using deep learning. Asian Pacific Journal of Cancer Prevention. 2023;24(2):531-537
30. Singh LK, Khanna M, Singh R. An enhanced soft-computing based strategy for efficient feature selection for timely breast cancer prediction: Wisconsin diagnostic breast cancer dataset case. Multimedia Tools and Applications. 2024;83(31):76607-76672
31. Kadhim RR, Kamil MY. Comparison of breast cancer classification models on Wisconsin dataset. International Journal of Reconfigurable and Embedded Systems. 2022;11(1):49-55
32. Alshayeji MH, Ellethy H, Abed S, Gupta R. Computer-aided detection of breast cancer on the Wisconsin dataset: An artificial neural networks approach. Biomedical Signal Processing and Control. 2022;71:103141
33. Yusuf A, Dima R, Aina S. Optimized breast cancer classification using feature selection and outliers detection. Journal of the Nigerian Society of Physical Sciences. 2021;3(4):298-307
34. Hasan R, Shafi A. Feature selection based breast cancer prediction. International Journal of Image, Graphics and Signal Processing. 2021;13(2):13-21
35. Ibrahim S, Nazir S, Velastin SA. Feature selection using correlation analysis and principal component analysis for accurate breast cancer diagnosis. Journal of Imaging. 2021;7(11):225
36. Huang Z, Chen D. A breast cancer diagnosis method based on VIM feature selection and hierarchical clustering random forest algorithm. IEEE Access. 2021;10:3284-3293
37. Mushtaq Z, Yaqub A, Sani S, Khalid A. Effective k-nearest neighbor classifications for Wisconsin breast cancer datasets. Journal of the Chinese Institute of Engineers. 2020;43(1):80-92
38. Ibrahim MM, Salem DA, Seoud RAAAA. Deep learning hybrid with binary dragonfly feature selection for the Wisconsin breast cancer dataset. International Journal of Advanced Computer Science and Applications. 2021;12(3):74-82
39. Akkur E, Türk F, Eroğul O. Breast cancer classification using a novel hybrid feature selection approach. Neural Network World. 2023;33(2):77-94
40. Krishnan VG, Saradhi MV, Deepa J, Priya KH, Selvaraj D, Divya V. An ensemble filter-based feature selection with deep learning classification for breast cancer prediction using IoT. Journal of Ambient Intelligence and Humanized Computing. 2022;13(7):3239-3254
41. Al Tawil A, Almazaydeh L, Alqudah B, Abualkishik AZ, Alwan AA. Predictive modeling for breast cancer based on machine learning algorithms and feature selection methods. International Journal of Electrical and Computer Engineering. 2024;14(2):1103-1111
42. Hossin MM, Shamrat FJM, Bhuiyan MR, Hira RA, Khan T, Molla S. Breast cancer detection: an effective comparison of different machine learning algorithms on the Wisconsin dataset. Bulletin of Electrical Engineering and Informatics. 2023;12(4):2446-2456
43. Sundar R, Srinivasulu C, Anusha MB, Brahmaiah M, Srikanth T, Gupta KG. Enhancing breast cancer detection from histopathology images: A novel ensemble approach with deep learning-based feature extraction. MATEC Web of Conferences. 2024;392:01139
44. Veena S, Aravindhar DJ. Detection of breast cancer using support vector machine algorithm with fine-tuning and optimization. African Journal of Biomedical Research. 2024;27(3):2256-2261
45. Youssef D, Atef H, Gamal S, El-Azab J, Ismail T. Early breast cancer prediction using thermal images and hybrid feature extraction-based system. IEEE Access. 2025;13:1-10
46. Das SC, Tasnim W, Rana HK, Acharjee UK, Islam MM, Khatun R. Comprehensive bioinformatics and machine learning analyses for breast cancer staging using TCGA dataset. Briefings in Bioinformatics. 2025;26(1):628
47. Jagetiya A, Dadhech P. Optimizing breast cancer prognosis with machine learning for enhanced clinical decision-making. Bio-Algorithms and Med-Systems. 2024;20(1):37-48
48. Wuniri Q, Huangfu W, Liu Y, Lin X, Liu L, Yu Z. A generic-driven wrapper embedded with feature-type-aware hybrid Bayesian classifier for breast cancer classification. IEEE Access. 2019;7:119931-119942
49. Ali SH, Shehata M. A new breast cancer discovery strategy: A combined outlier rejection technique and an ensemble classification method. Bioengineering. 2024;11(11):1148
50. Dehghan MJ, Azizi A. A hybrid intelligent approach to breast cancer diagnosis and treatment using grey wolf optimization algorithm. Jundishapur Journal of Natural Pharmaceutical Products. 2024;18(4):e1345
51. Aamir S, Rahim A, Aamir Z, Abbasi SF, Khan MS, Alhaisoni M. et al. Predicting breast cancer leveraging supervised machine learning techniques. Computational and Mathematical Methods in Medicine. 2022;2022(1):5869529
52. Asare M. Evaluating feature selection methods in machine learning with class imbalance. Master's Thesis, The University of Texas Rio Grande Valley. 2024
53. Chen H, Mei K, Zhou Y, Wang N, Cai G. Auxiliary diagnosis of breast cancer based on machine learning and hybrid strategy. IEEE Access. 2023;11:96374-96386
54. Haq AU, Li JP, Saboor A, Khan J, Wali S, Ahmad S. et al. Detection of breast cancer through clinical data using supervised and unsupervised feature selection techniques. IEEE Access. 2021;9:22090-22105
55. Batool A, Byun YC. Toward improving breast cancer classification using an adaptive voting ensemble learning algorithm. IEEE Access. 2024;12:12869-12882
56. Rastogi M, Vijarania M, Goel N, Agrawal A, Biamba CN, Iwendi C. Conv1d-LSTM: Autonomous breast cancer detection using a one-dimensional convolutional neural network with long short-term memory. IEEE Access. 2024;12:104221
57. Naseem U, Rashid J, Ali L, Kim J, Haq QEU, Awan MJ. et al. An automatic detection of breast cancer diagnosis and prognosis based on machine learning using ensemble of classifiers. IEEE Access. 2022;10:78242-78252
58. Sharmin S, Ahammad T, Talukder MA, Ghose P. A hybrid dependable deep feature extraction and ensemble-based machine learning approach for breast cancer detection. IEEE Access. 2023;11:87694-87708
59. Zheng J, Lin D, Gao Z, Wang S, He M, Fan J. Deep learning assisted efficient AdaBoost algorithm for breast cancer detection and early diagnosis. IEEE Access. 2020;8:96946-96954
60. Rahman MA, Hamada M, Sharmin S, Rimi TA, Talukder AS, Imran N. et al. Enhancing early breast cancer detection through advanced data analysis. IEEE Access. 2024;12:104569
61. Jebarani PE, Umadevi N, Dang H, Pomplun M. A novel hybrid K-means and GMM machine learning model for breast cancer detection. IEEE Access. 2021;9:146153-146162
62. Routray N, Rout SK, Sahu B, Panda SK, Godavarthi D. Ensemble learning with symbiotic organism search optimization algorithm for breast cancer classification and risk identification of other organs on histopathological images. IEEE Access. 2023;11:110544-110557
63. Elkorany AS, Marey M, Almustafa KM, Elsharkawy ZF. Breast cancer diagnosis using support vector machines optimized by whale optimization and dragonfly algorithms. IEEE Access. 2022;10:69688-69699
64. Reshan MSA, Amin S, Zeb MA, Sulaiman A, Alshahrani H, Azar AT. et al. Enhancing breast cancer detection and classification using advanced multi-model features and ensemble machine learning techniques. Life. 2023;13(10):2093
65. Chen H, Wang N, Zhou Y, Mei K, Tang M, Cai G. Breast cancer prediction based on differential privacy and logistic regression optimization model. Applied Sciences. 2023;13(19):10755
66. Ak MF. A comparative analysis of breast cancer detection and diagnosis using data visualization and machine learning applications. Healthcare. 2020;8:111
67. Khalid A, Mehmood A, Alabrah A, Alkhamees BF, Amin F, AlSalman H. et al. Breast cancer detection and prevention using machine learning. Diagnostics. 2023;13(19):3113
68. Avci H, Karakaya J. A novel medical image enhancement algorithm for breast cancer detection on mammography images using machine learning. Diagnostics. 2023;13(3):348
69. Rasool A, Bunterngchit C, Tiejian L, Islam MR, Qu Q, Jiang Q. Improved machine learning-based predictive models for breast cancer diagnosis. International Journal of Environmental Research and Public Health. 2022;19(6):3211
70. Sureshkumar V, Prasad RSN, Balasubramaniam S, Jagannathan D, Daniel J, Dhanasekaran S. Breast cancer detection and analytics using hybrid CNN and extreme learning machine. Journal of Personalized Medicine. 2024;14(8):792
71. Nahid AA, Raihan MJ, Bulbul AAM. Breast cancer classification along with feature prioritization using machine learning algorithms. Health and Technology. 2022;12(6):1061-1069
72. Botlagunta M, Botlagunta MD, Myneni MB, Lakshmi D, Nayyar A, Gullapalli JS. et al. Classification and diagnostic prediction of breast cancer metastasis on clinical data using machine learning algorithms. Scientific Reports. 2023;13(1):485
73. Mohammed SA, Darrab S, Noaman SA, Saake G. Analysis of breast cancer detection using different machine learning techniques. Lecture Notes in Computer Science - Data Mining and Big Data. 2020;5:108-117
74. El Rahman SA. Predicting breast cancer survivability based on machine learning and features selection algorithms: a comparative study. Journal of Ambient Intelligence and Humanized Computing. 2021;12(8):8585-8623
75. Ajlan I, Murad H, Salim A, Yousif A. Extreme learning machine algorithm for breast cancer diagnosis. Multimedia Tools and Applications. 2024;83:1-20
76. Taghizadeh E, Heydarheydari S, Saberi A, JafarpoorNesheli S, Rezaeijo SM. Breast cancer prediction with transcriptome profiling using feature selection and machine learning methods. BMC Bioinformatics. 2022;23(1):410
77. Almarri B, Gupta G, Kumar R, Vandana V, Asiri F, Khan SB. The BCPM method: decoding breast cancer with machine learning. BMC Medical Imaging. 2024;24(1):248
78. Alanazi SA, Kamruzzaman M, Islam Sarker MN, Alruwaili M, Alhwaiti Y, Alshammari N. et al. Boosting breast cancer detection using convolutional neural network. Journal of Healthcare Engineering. 2021;2021:5528622
79. Chen H, Wang N, Du X, Mei K, Zhou Y, Cai G. Classification prediction of breast cancer based on machine learning. Computational Intelligence and Neuroscience. 2023;2023:6530719
80. Mohammed SA, Abeysinghe SD, Ralescu AL. Feature selection and comparative analysis of breast cancer prediction using clinical data and histopathological whole slide images. Advances in Artificial Intelligence and Machine Learning. 2023;3(3):1494-1525
81. Mashudi NA, Rossli SA, Ahmad N, Noor NM. Breast cancer classification: features investigation using machine learning approaches. International Journal of Integrated Engineering. 2021;13(5):107-118
82. Talatian Azad S, Ahmadi G, Rezaeipanah A. An intelligent ensemble classification method based on multi-layer perceptron neural network and evolutionary algorithms for breast cancer diagnosis. Journal of Experimental & Theoretical Artificial Intelligence. 2022;34(6):949-969
83. Hardani D, Nugroho H. Feature selection using rough set theory algorithm for breast cancer diagnosis. IOP Conference Series: Materials Science and Engineering. 2020;771:012017
84. Dada EG, Oyewola DO, Misra S. Computer-aided diagnosis of breast cancer from mammogram images using deep learning algorithms. Journal of Electrical Systems and Information Technology. 2024;11(1):38
85. Singh K, Shastri S, Kumar S, Mansotra V. BC-Net: Early diagnostics of breast cancer using nested ensemble technique of machine learning. Automatic Control and Computer Sciences. 2023;57(6):646-659
86. Davoudi K, Thulasiraman P. Evolving convolutional neural network parameters through the genetic algorithm for the breast cancer classification problem. Simulation. 2021;97(8):511-527
Corresponding authors: E-mail(s): nmtahirac.kr; imranashrafac.kr.