J Cancer 2025; 16(15):4316-4337. doi:10.7150/jca.118698

Review

Divulging Patterns: An Analytical Review for Machine Learning Methodologies for Breast Cancer Detection

Alveena Saleem1*, Muhammad Umair1*, Muhammad Tahir Naseem2 (corresponding author), Muhammad Zubair3, Silvia Aparicio Obregon4,5,6, Ruben Calderon Iglesias4,7,8, Shoaib Hassan9, Imran Ashraf10 (corresponding author)

1. Faculty of Information Technology and Computer Science, University of Central Punjab, Lahore, Pakistan.
2. Department of Electronic Engineering, Yeungnam University, Gyeongsan, 38541, Republic of Korea.
3. IRC-FDE, King Fahd University of Petroleum and Minerals, 31261, Dhahran, Saudi Arabia.
4. Universidad Europea del Atlantico, Isabel Torres 21, Santander, 39011, Spain.
5. Universidad Internacional Iberoamericana, Campeche 24560, Mexico.
6. Universidad Internacional Iberoamericana, Arecibo, Puerto Rico 00613, USA.
7. Universidade Internacional do Cuanza, Cuito, Bie, Angola.
8. Universidad de La Romana, La Romana, Republica Dominicana.
9. School of Computer Science and Engineering, Nanjing University of Science and Technology, Nanjing, 210094, Jiangsu, China.
10. Department of Information and Communication Engineering, Yeungnam University, Gyeongsan, 38541, Republic of Korea.
*These authors contributed equally to this work

Received 2025-6-2; Accepted 2025-9-8; Published 2025-10-20

Citation:
Saleem A, Umair M, Naseem MT, Zubair M, Obregon SA, Iglesias RC, Hassan S, Ashraf I. Divulging Patterns: An Analytical Review for Machine Learning Methodologies for Breast Cancer Detection. J Cancer 2025; 16(15):4316-4337. doi:10.7150/jca.118698. https://www.jcancer.org/v16p4316.htm

Abstract

Breast cancer is a lethal carcinoma affecting a considerable number of women across the globe. While preventive measures are limited, early detection remains the most effective strategy. Accurate classification of breast tumors into benign and malignant categories is important because it may help physicians diagnose the disease faster. This survey investigates emerging trends and approaches in machine learning (ML) for the diagnosis of breast cancer, covering classification techniques based on both segmentation and feature selection. Datasets such as the Wisconsin Diagnostic Breast Cancer Dataset (WDBC), Wisconsin Breast Cancer Dataset Original (WBCD), Wisconsin Prognostic Breast Cancer Dataset (WPBC), BreakHis, and others are evaluated to show their influence on the performance of diagnostic tools and on the accuracy of models such as support vector machines (SVM), convolutional neural networks (CNNs), and ensemble approaches. The main shortcomings and research gaps, such as dataset bias, limited generalizability, and interpretability challenges, are highlighted. This review emphasizes the importance of hybrid methodologies, cross-dataset validation, and explainable AI for narrowing these gaps and improving the clinical acceptance of ML-based detection tools.

Keywords: tumor detection, breast cancer, deep learning, segmentation

1. Introduction

The cell is the basic structural and functional unit of an organism and consists of numerous organelles. During its life cycle, a cell grows and, after a specified period of time, divides (mitosis). A cell becomes malignant, or cancerous, when it loses its ability to stop dividing. Such uncontrolled mitosis leads to cells accumulating at a particular location, forming a mass known as a tumor [1]. Tumors are of two kinds: benign (non-cancerous) and malignant (cancerous). A tumor is malignant when it starts invading and damaging nearby cells [2].

Breast cancer is a type of cancer in which a cancerous tumor develops in the tissues of the human breast. Every woman is at risk of developing breast cancer at some stage of her life. In 2020, morbidity among women worldwide exceeded four million cases [3], with breast cancer as the leading cause. A disproportionate share, 72% of cases, is concentrated in third-world countries. This gap in mortality between socioeconomic regions grew between 1990 and 2019, and the trend is expected to continue [4].

According to Health at Hand [5], in 2020 breast cancer was the most common type of carcinoma globally, followed by colon and rectum cancer. Figure 1 shows the number of new cancer cases in women for the year 2023, with breast cancer the most frequent, followed by other types such as ovarian, lung, and colon cancer [6-8]. These are the main cancer types in most countries, although their ranking varies across the world. According to The DAWN [9], Pakistan has the highest number of breast cancer cases in Asia, with an estimated 40,000 women falling victim to this fatal disease. Consultants at Shaukat Khanum Memorial Hospital attribute the soaring incidence rate of breast cancer to Pakistan's conservative societal norms and the lack of an advanced diagnostic system.

 Figure 1 

New cancer cases in women for the year 2023; statistics are taken from [6].

To reduce breast cancer mortality, early detection is crucial, and it can be supported by accurate classification of breast tumors into benign (non-cancerous) and malignant (cancerous) classes [10]. Breast cancer can be categorized in many ways, which may help clinicians recommend the best treatment. The most significant of these is binary classification: whether the tumor is benign (non-cancerous) or malignant (invading nearby cells) [11]. This grouping matters because the severity of the disease is determined by such classifications. Various studies have applied ML methodologies to different datasets to classify tumors as benign or malignant [12, 13]. Such methodologies can help physicians treat the cancer properly. Over time, certain standard datasets have emerged in the literature and are used by researchers for the diagnosis and prediction of breast cancer.

This review aims to address the gap in understanding the latest trends in breast cancer detection and the effectiveness of various detection methods, including deep learning (DL), feature selection-based, ensemble classification, and image-based segmentation techniques. It also evaluates the use and efficacy of the datasets employed to train breast cancer detection models, emphasizing their significance for improving detection accuracy.

1.1 Overview of Breast Cancer and its Societal Impact

Breast cancer accounts for a considerable share of cancer-related morbidity and mortality among women [3]. Its impact extends beyond the patient's physical health to the emotional, social, and economic consequences of the disease. Families and caregivers often face a great amount of stress, and the costs of treatment and long-term care keep rising for healthcare systems. Further, in low-resource areas, disparities in access to healthcare worsen outcomes because patients often present at a late stage. Measures taken to combat breast cancer include improving medical imaging, raising public awareness, and setting up screening programs. Despite these steps, diagnostic inaccuracies persist, with high rates of false positives and a continued need for expert interpretation. ML can address these limitations and has real transformative potential to deliver precise, automated, and scalable detection and diagnosis solutions [4].

1.2 Challenges in Diagnosis and Treatment

The diagnosis and treatment of breast cancer involve several challenges that affect the accuracy, efficacy, and accessibility of care. Detection techniques face limitations arising from technological constraints, subjective human interpretation, and high costs [14, 15]. Likewise, therapeutic approaches must account for tumor diversity, individual responses, and potential consequences, which makes personalized treatment complex. Tackling these challenges requires a blend of enhanced diagnostic techniques, advanced care methods, and the integration of modern technologies such as ML [16].

1.2.1 Limitations of Conventional Diagnostic Approaches

  • High Cost and Limited Access: Modern imaging approaches such as MRI and 3D techniques are costly and may not be accessible in resource-constrained environments, ultimately leading to inequalities in early diagnosis [4, 16].
  • False Negative and False Positive Cases: Mammograms may produce false-positive outcomes that lead to unnecessary biopsies, whereas false-negative results delay treatment, which reduces the chances of early detection and care [17].
  • Bias in Interpretation: Oncologists' evaluations can vary with experience and training, resulting in inconsistent diagnoses [18].
  • Radiation Exposure: Frequent mammograms expose patients to ionizing radiation, with the associated risks [17].
  • Difficulty in Detecting Tumors in Dense Breast Tissue: Dense breast tissue can mask tumors in mammograms, making it even more challenging to detect malignancy in time [5, 16].

1.3 Research Questions

For this review, we formulated the following research questions.

  • How is the performance of various ML models affected by the choice of dataset, such as WDBC, BreakHis, etc., in the diagnosis of breast cancer?
  • Are research results biased by the excessive use of a few prevailing datasets, and how can this be alleviated?
  • What factors determine the selection of algorithms by researchers working on breast cancer detection?
  • Is there a trade-off between interpretability and accuracy when selecting an algorithm for the diagnosis of breast cancer?
  • What is the likely future direction of this research, i.e., how do simpler algorithms such as logistic regression (LR) compare with more complex models like DL with respect to computational cost and accuracy?

1.4 Research Objectives

  • To analyze the application of ML techniques in breast cancer detection, we examine the role that segmentation-based and feature selection-based methods play in enhancing diagnostic accuracy and efficiency.
  • To identify widely used datasets and popular ML methods, we survey the public and benchmark datasets commonly used in breast cancer research and the ML techniques applied to them (supervised learning, DL, ensemble methods).
  • To evaluate the performance metrics used for comparison, the survey attempts to standardize how the efficacy of various ML models is compared, by examining indicators such as accuracy, precision, recall, F1-score, and area under the curve (AUC).
  • To highlight limitations and propose future research directions, we discuss issues including data scarcity, model interpretability, and computational requirements, and provide suggestions to help advance the field.
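The threshold-based indicators named in the objectives above can all be derived from the four cells of a binary confusion matrix. The following is a minimal sketch in plain Python, assuming malignant is the positive class; the function names and the toy labels are illustrative and do not come from any of the surveyed studies. AUC is left out here because it requires continuous scores rather than hard predictions.

```python
from dataclasses import dataclass


@dataclass
class ConfusionCounts:
    tp: int  # malignant cases predicted malignant
    fp: int  # benign cases predicted malignant
    tn: int  # benign cases predicted benign
    fn: int  # malignant cases predicted benign


def confusion_counts(y_true, y_pred, positive=1):
    """Tally the four confusion-matrix cells for a binary task."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return ConfusionCounts(tp, fp, tn, fn)


def _ratio(num, den):
    # Return 0.0 instead of raising when a denominator is empty.
    return num / den if den else 0.0


def classification_metrics(c):
    """Compute accuracy, precision, recall, specificity, and F1."""
    precision = _ratio(c.tp, c.tp + c.fp)
    recall = _ratio(c.tp, c.tp + c.fn)  # also called sensitivity
    return {
        "accuracy": _ratio(c.tp + c.tn, c.tp + c.fp + c.tn + c.fn),
        "precision": precision,
        "recall": recall,
        "specificity": _ratio(c.tn, c.tn + c.fp),
        "f1": _ratio(2 * precision * recall, precision + recall),
    }


# Hypothetical labels: 1 = malignant, 0 = benign.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(classification_metrics(confusion_counts(y_true, y_pred)))
```

Reporting these values together, rather than accuracy alone, makes it easier to compare studies whose datasets have different class balances.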

1.5 Rationale of this Survey

The growing intersection of ML and breast cancer detection offers transformative potential for early detection, the most effective strategy for reducing mortality. ML techniques can significantly improve diagnostic accuracy, speed, and reliability.

Despite the growing volume of ML-based research, much of the existing work is fragmented, focusing narrowly on specific algorithms, datasets, or imaging modalities.

This fragmented landscape makes it difficult for researchers and clinicians to recognize effective strategies, compare results, and build upon prior work. This survey tackles these challenges by aiming to:

  • Summarize recent developments in ML-based breast cancer diagnosis.
  • Evaluate the strengths and shortcomings of various methodologies.
  • Highlight frequently utilized datasets and their impact on model performance.
  • Explore the role of Explainable AI (XAI) in enhancing clinical trust.
  • Provide structured insights for researchers, developers, and clinicians seeking to develop transparent and efficient diagnostic solutions.

By consolidating existing findings into a unified narrative, this survey helps improve reproducibility, inform practical decision-making, and identify promising areas for future research.

1.6 Why This Research is Significant

Breast cancer remains a major cause of cancer-related deaths among women throughout the world. The efficacy of treatment is highly contingent on early and precise diagnosis, yet conventional diagnostic approaches often suffer from constraints including high false-positive and false-negative rates, bias in interpretation, and availability issues. ML has emerged as a promising tool in clinical diagnostics, offering high accuracy, scalability, and automation. However, despite substantial advancements, ML-based breast cancer diagnostics still face challenges in terms of data quality, model generalizability, and clinical adoption. This survey attempts to provide a comprehensive overview of the ML approaches used in breast cancer detection, pointing out their strengths, constraints, and potential areas for improvement. Table 1 presents a comparative analysis of existing surveys.

Figure 2 shows the organization of the sections that follow the introduction. After the literature review in Section 2, the methodology is presented in Section 3. Findings and discussions are given in Section 4 while the conclusion and future research directions are presented in Section 5.

2. Literature Review on Breast Cancer Detection

Numerous ML methodologies have been employed to diagnose breast cancer. In [25], Tabu search with rough sets was used to choose the most relevant features for detecting breast lesions or tumors. The method was tested on the WDBC and BIDMC-MGH datasets, with AdaBoost, K-Nearest Neighbor (KNN), and LR as classifiers. KNN with Tabu search achieved the highest accuracy, at 98.24%.

LR, KNN, discrete cosine transform (DCT), random forest (RF), SVM, multilayer perceptron (MLP), and an ensemble MLP with genetic algorithm (GA) were applied in [26] to the WBCD dataset. The study achieved an accuracy of 98% with MLP-GA under a holdout approach and 99.7% with MLP-GA under cross-validation. In [27], an artificial neural network (ANN) was optimized using an integrated artificial immune system and artificial bee colony (IAIS-ABC-CDS), momentum-based gradient descent backpropagation (MBGD), simulated annealing (SA), resilient backpropagation techniques (RBPT), and a GA approach on the publicly available WBCD dataset for breast cancer detection. The study achieved an accuracy of 99.34% using IAIS-ABC-CDS with MBGD and 99.11% using IAIS-ABC-CDS with RBPT.
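The holdout and cross-validation protocols mentioned above differ only in how sample indices are partitioned before training. The sketch below illustrates both in plain Python; the function names, the 80/20 split, and k = 5 are assumptions chosen for illustration, not the settings reported in [26].

```python
import random


def holdout_split(n_samples, test_fraction=0.2, seed=0):
    """Shuffle indices once and reserve a single test partition."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_fraction))
    return indices[n_test:], indices[:n_test]  # (train, test)


def k_fold_splits(n_samples, k=5, seed=0):
    """Yield k (train, test) pairs; each sample is tested exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        test_idx = folds[i]
        train_idx = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train_idx, test_idx


train_idx, test_idx = holdout_split(10)
print("holdout:", len(train_idx), "train /", len(test_idx), "test")
for fold_no, (tr, te) in enumerate(k_fold_splits(10, k=5)):
    print("fold", fold_no, "test indices:", sorted(te))
```

Because a holdout estimate depends on one random partition while k-fold averages over k of them, the two protocols can yield different accuracies for the same model, so results obtained under different protocols are not directly comparable.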

Wuniri et al. [48] applied a Bayesian classifier-embedded integrated genetic-driven framework, GA, and kernel-based Bayesian classification to the WDBC dataset for the diagnosis of breast cancer, attaining 97.1% accuracy. Abunasser et al. [28] applied the DL model Xception to the BreakHis dataset collected from the Kaggle repository and achieved accuracies of 99.78% for training, 98.59% for validation, and 97.60% for testing. Additionally, the Xception model showed a precision of 97.60%, a recall of 97.60%, and an F1 score of 97.58%. In [29], the authors applied a CNN to the BreakHis dataset for the diagnosis of breast cancer and obtained a training accuracy of approximately 99% and a testing accuracy of 97.80%.

The study [30] applies a hybrid gravitational search optimization algorithm and emperor penguin optimization (HGSAEPO) for feature selection, with RF, SVM, LR, decision tree (DT), and KNN for classifying breast tumors as benign or malignant. HGSAEPO with RF reached an accuracy of 98.31%, with 97% sensitivity, 98.87% specificity, 98% precision, and a 95.39% F1 score. Kadhim et al. [31] compared ML techniques comprising DT, quadratic discriminant analysis, AdaBoost, bagging meta estimator, extra randomized trees (ERT), Gaussian process classifier, Ridge, Gaussian Naive Bayes (GNB), KNN, MLP, and an SVM classifier. The authors found that on the WDBC dataset, ERT achieved 97.36% accuracy and outperformed the other algorithms for breast cancer diagnosis.

 Table 1 

Comparative analysis of breast cancer detection surveys with respect to research questions addressed in the study.

Ref | Datasets | H-Index | Segmentation | Feature selection | XAI
--- | --- | --- | --- | --- | ---
[19] | WBCD and only image-based datasets (ultra-sound, histopathology, MRI, etc.) | No | Yes | Yes | No
[20] | WBCD datasets and image-based | No | No | Yes | No
[21] | WBCD datasets and image-based | No | No | No | No
[22] | Image-based datasets only | No | Yes | Yes | No
[23] | WBCD, WDBC and image-based datasets | No | No | Yes | No
[24] | Only image-based datasets | No | Yes | Yes | No
 Figure 2 

Structure of the paper with section and subsections.

The authors of [32] employed an ANN for breast cancer detection on the WBCD and WDBC datasets, obtaining an accuracy of 99.85% on WBCD and 99.47% on WDBC. In [33], Yusuf et al. used LR, SVM, RF, gradient boost (GB), and AdaBoost (AB) to classify breast tumors as benign or malignant on the WDBC dataset, achieving an accuracy of about 99% with LR, RF, and AB. Rakibul et al. [34] applied LR and SVM, including linear SVM (LSVM) and quadratic SVM (QSVM), to the WBCO, WDBC, and WPBC datasets and attained accuracies of 94% for WBCO, 97.4% using QSVM on the WDBC dataset, and 83.5% using LR on the WPBC dataset.

In [35], a wrapper subset selection method, correlation analysis, and principal component analysis (PCA) are used for feature selection, and an ensemble classification methodology based on NB, SVM, DT, KNN, RF, LR, and stochastic gradient descent learning is adopted for breast cancer diagnosis. An accuracy of 98.24% was achieved on the WDBC dataset. Huang and Chen [36] applied a variable importance measure (VIM), a hierarchical clustering RF (HCRF) algorithm, DT, AB, and RF models to the WBCD and WDBC datasets, with HCRF reaching an accuracy of 97.05% on WDBC and 97.76% on WBCD. KNN with chi-square-based feature selection and L1-based feature selection is applied in [37] to the WBCD and WDBC datasets, yielding an accuracy of 99.42% for WBCD and 99% for WDBC with L1-based feature selection.

The dragonfly algorithm (DA), PCA, DL models, SVM, RF, and KNN were utilized by Ibrahim et al. [38] for breast cancer detection, achieving 97.90% accuracy. In [39], Akkur et al. used a hybrid model of relief and binary Harris hawk optimization (BHHO) with KNN, SVM, LR, and NB for the diagnosis of breast cancer on the WDBC, WBCD, and mammographic breast cancer dataset (MBCD). With relief-BHHO-SVM they obtained an accuracy of 98.77% on WDBC, 99.28% on WBCD, and 97.44% on MBCD. Ensemble filter-based feature selection with a 1-D CNN (1D-CNN) was employed in [40], with an accuracy of 98.5% on the WDBC dataset.

In [41], the WDBC dataset was used for breast cancer detection, with Pearson's correlation coefficient, lasso, and minimum redundancy-maximum relevance (mRMR) for feature selection and SVM, light GBM, RF, DT, NB, KNN, and LR for classifying breast tumors into benign and malignant classes. Hossin et al. [42] compared different ML algorithms using univariate feature selection, recursive feature elimination, a correlation heatmap, LR, RF, KNN, DT, AB, SVM, GB, and Gaussian NB. They found LR and SVM to be the most effective, attaining an accuracy of 99.12% on the WDBC dataset. In [43], Sundar and co-authors used the ResNet50v2 CNN model and an ensemble approach with DT, RF, ET, and XGBoost on invasive ductal carcinoma (IDC). The ensemble model achieved an accuracy of 99.82%.
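As an illustration of the simplest of the filter-style selectors mentioned above, the sketch below ranks features by the absolute Pearson correlation between each feature column and the binary class label, then keeps the top k. It is a generic example in plain Python, not the procedure of [41] or [42]; the feature names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear signal
    return cov / math.sqrt(var_x * var_y)


def rank_features(X, y, names):
    """Return (name, |r|) pairs sorted from most to least correlated."""
    scores = []
    for j, name in enumerate(names):
        column = [row[j] for row in X]
        scores.append((name, abs(pearson(column, y))))
    return sorted(scores, key=lambda item: item[1], reverse=True)


def select_top_k(X, y, names, k):
    return [name for name, _ in rank_features(X, y, names)[:k]]


# Hypothetical toy data: rows are tumors, label 1 = malignant.
names = ["radius", "texture", "noise"]
X = [[10, 1, 5], [12, 2, 3], [20, 1, 4], [22, 2, 3]]
y = [0, 0, 1, 1]
print(rank_features(X, y, names))
print(select_top_k(X, y, names, k=2))
```

A filter of this kind scores each feature independently of the classifier, which keeps it cheap but means it cannot detect redundancy between features; methods such as mRMR add a redundancy penalty for that reason.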

The study [44] fine-tunes the parameters of an SVM for the diagnosis of breast cancer on WDBC, attaining an accuracy of 95.61%. Doaa et al. [45] used thermal images from the DMR-IR dataset and employed Gabor filters, Canny edge detection, holistically nested edge detection, CNN, ResNet-50, SVM, and XGB, achieving 96.23% accuracy. Saurav and co-authors applied RF, SVM, DT, KNN, Gaussian NB, and XGBoost to TCGA data in [46] and obtained 97.19% accuracy. In [47], XGBoost was used on the WDBC dataset, with a resulting accuracy of 99.12%.

2.1 Gaps in Existing Literature and Their Significance to This Study

Despite significant breakthroughs in ML for breast cancer detection, several gaps in the existing literature hamper its full potential in clinical practice. These gaps comprise:

  • Limited Generalizability and Dataset Diversity: Most ML models for breast cancer detection depend on a few publicly accessible datasets, such as WDBC, WBCD, and BreakHis [25-33, 48], which restricts their generalizability. Such datasets are usually not very diverse with regard to patient demographics, imaging modalities, and tumor subtypes, which can produce biased models that struggle on real-world clinical data.
  • Contribution of this Study: This survey highlights the significance of cross-dataset validation and hybrid or ensemble learning methodologies for enhancing generalizability and ensuring that ML models are robust across populations and clinical settings.
  • Over-Dependence on Black-Box Models: DL approaches, specifically CNNs, have exhibited high accuracy in breast cancer detection [28, 29, 34, 35]. However, they often lack interpretability and explainability, which undermines trust in them and makes them difficult to accept and adopt in clinical settings.
  • Contribution of this Study: This survey highlights the importance of XAI techniques for improving transparency and ensuring that ML-driven methodologies are interpretable and clinically adoptable.
  • Lack of Balance Between Accuracy and Interpretability: Most studies prioritize accuracy over model interpretability [36-38], making it challenging for clinicians to trust and rely on the model's predictions. Conventional models such as LR and DTs provide good interpretability but are not highly accurate.
  • Contribution of this Study: This research explores hybrid methodologies through the analysis of both segmentation-based and feature selection-based approaches, which balance accuracy and interpretability to yield reliable detection methods.
  • Lack of Standardized Performance Metrics: Existing literature reports differing performance metrics such as accuracy, F1 score, AUC-ROC, etc. [39-41], which makes it challenging to compare models fairly. In addition, some studies report high accuracies on training data without rigorous validation on test datasets.
  • Contribution of this Study: This survey supports systematic evaluation standards and benchmarking approaches to ensure unbiased and reliable comparison between ML models.
  • Absence of Comparative Analysis: Many research works [42, 43, 49] analyze specific ML approaches without providing a comprehensive comparison of segmentation-based and feature selection-based classification methodologies.
  • Contribution of this Study: This research offers a structured comparison of ML models, including DL and ensemble approaches.
  • Difficulties in Clinical Integration and Acceptance: Despite the research success of ML-based breast cancer diagnostics, their adoption by clinical professionals is hindered by regulatory challenges, data privacy issues, and the requirement for comprehensive validation, as well as ethical and computational limitations [50-52].
  • Contribution of this Study: This research explores prospective solutions, such as ensemble learning and regulation-compliant ML models, to narrow the gap between research development and real-world clinical applications.

By tackling these gaps, this study aims to offer practical insights for researchers and professionals in the domain. It advocates for the development of reliable, robust, interpretable, and clinically acceptable ML models for the detection of breast cancer, facilitating enhanced detection accuracy, early diagnosis, and better medical outcomes.
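One reason the metrics listed among the gaps above are hard to compare is that accuracy and F1 depend on a decision threshold, whereas AUC-ROC depends only on how the model ranks cases. The following minimal sketch computes AUC through its pairwise-ranking interpretation (the probability that a randomly chosen malignant case receives a higher score than a randomly chosen benign one). The function name and the toy scores are illustrative assumptions; the quadratic loop is chosen for clarity rather than efficiency.

```python
def roc_auc(y_true, scores, positive=1):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical model scores; 1 = malignant, 0 = benign.
y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(roc_auc(y_true, scores))
```

A model can therefore have a high AUC yet a modest accuracy at a poorly chosen threshold, which is why studies reporting only one of the two are difficult to compare directly.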

3. Material and Methods

3.1 Research Methodology

This review follows a structured approach for screening, classifying, and synthesizing the literature in line with the established objectives, and it highlights areas that may guide future research in this domain. The survey was carried out in several steps. In the first step, the research questions were defined; in the second, the research objectives were developed from those questions. In the third step, a shortlisting strategy was formulated to find related articles, which were then selected, categorized, and examined with respect to the research domain. Finally, the results were discussed and analyzed against the research questions. Figure 3 presents the adopted approach for this review.

3.2 Shortlisting Strategy

Formulating a search strategy that gathers relevant and original information within the specified area is the most critical step in preparing this review. This research examines relevant literature from 2019 to 2024, collected from a number of sources such as MDPI, IEEE, Elsevier, Springer, Neural Network World, and Computational and Mathematical Methods in Medicine. Journals with a high H-index and good citation rates were searched using specific keywords such as “machine learning,” “breast cancer detection,” “segmentation-based classification,” and “feature selection.” These keywords were used to identify relevant studies in academic databases such as PubMed, Wiley, Springer, etc.

Applying the search string to the various digital repositories returned a large number of records, which were narrowed down through a multi-stage shortlisting process. Papers were selected on the basis of an H-index criterion (Figure 4) and restricted to publications from 2019 to 2024 (Figure 5). After duplicates were removed, the papers were screened by reading their abstracts and evaluating their results so that the most relevant articles were retained.

The shortlisted studies used publicly available benchmark datasets (e.g., WBCD, DDSM), so that their results are reproducible. Articles were evaluated on their discussion of ML techniques, their experimental results, and the performance metrics they reported. Studies with more complete experimental validation and comparison were preferred. To maintain a balanced review, research papers across a wide range of ML techniques, from traditional supervised learning to DL, were included. By adopting this systematic shortlisting approach, the survey provides a comprehensive and impartial examination of the state-of-the-art ML methodologies applied to breast cancer detection.
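The multi-stage shortlisting described above (duplicate removal, publication-year window, H-index criterion, public dataset requirement) can be pictured as a sequence of filters over bibliographic records. The sketch below is only an illustration of that idea in plain Python: the `Paper` fields, the title-based duplicate check, and in particular the `min_h_index` value are hypothetical, since the review does not state the threshold it used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    title: str
    year: int
    venue_h_index: int
    uses_public_dataset: bool


def shortlist(papers, first_year=2019, last_year=2024, min_h_index=20):
    """Apply the screening stages in order and return the surviving papers.

    min_h_index is a placeholder value, not the review's actual cut-off.
    """
    seen_titles = set()
    kept = []
    for paper in papers:
        key = paper.title.strip().lower()
        if key in seen_titles:
            continue  # stage 1: drop duplicate records
        seen_titles.add(key)
        if not first_year <= paper.year <= last_year:
            continue  # stage 2: outside the publication window
        if paper.venue_h_index < min_h_index:
            continue  # stage 3: venue below the H-index criterion
        if not paper.uses_public_dataset:
            continue  # stage 4: dataset not publicly available
        kept.append(paper)
    return kept


# Hypothetical records for illustration only.
records = [
    Paper("Study A", 2021, 50, True),
    Paper(" study a ", 2021, 50, True),  # duplicate of the first record
    Paper("Study B", 2018, 60, True),    # too early
    Paper("Study C", 2022, 5, True),     # venue H-index too low
    Paper("Study D", 2023, 40, False),   # restricted dataset
    Paper("Study E", 2024, 30, True),
]
print([p.title for p in shortlist(records)])
```

Abstract reading and results evaluation, the final manual stage, are not modeled here because they rely on human judgment.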

 Figure 3 

PRISMA approach for this review.

 Figure 4 

Number of papers per H-index.

 Figure 5 

Year-wise publications.


3.2.1 Inclusion Criteria

The following kinds of studies and datasets were included in this survey.

  • Dataset Relevance: Studies or sources that specifically used breast cancer datasets, including histopathological images, mammograms, thermal images, and clinical data (e.g., reports, tabular data).
  • Public Availability: Studies using publicly accessible datasets were included.
  • Real Patient Data: Studies incorporating real patient data were given preference.
  • Language: Articles published in English were chosen.
  • Time frame: Research articles published from 2019 to 2025 were selected.
  • Use of ML: Priority was given to studies using ML or DL approaches for breast cancer detection, diagnosis, or prognosis.

3.2.2 Exclusion Criteria

The following kinds of studies were excluded during shortlisting.

  • Non-Dataset Articles: Publications that discussed breast cancer but did not use any dataset were excluded.
  • Restricted Access Data: Studies using datasets that are not publicly accessible for academic or research use were excluded.
  • Redundant Research: Duplicate entries or multiple studies on the same topic were filtered out to avoid repetition.
  • Surveys, Books, Magazines: Surveys were included only for the purpose of comparison; books and magazines were omitted entirely.
  • Not Related to ML or Breast Cancer: Purely medical studies that do not use ML for breast cancer detection were excluded, as were studies addressing other cancers or other diseases.
  • Language Barrier: Studies published in languages other than English were excluded.

Table 2 shows the distribution of selected papers with respect to the publisher.

3.3 Breast Cancer Overview

Breast cancer is one of the most prevalent cancers in the world, affecting a large number of individuals every year. It results from the uncontrolled or abnormal division of cells in breast tissue, which together form a malignant tumor that invades the surrounding cells. Although this cancer primarily affects women, men can also develop it, though far less frequently. According to the World Health Organization (WHO), breast cancer is the predominant cause of cancer-related deaths in women, with substantial differences in incidence and mortality worldwide because of inequalities in healthcare availability, awareness, and early diagnosis programs.

 Table 2 

Publisher-wise distribution of papers with corresponding references.

Publisher | Count | Reference Papers
--- | --- | ---
IEEE Access | 14 | [36, 45, 48, 53-63]
MDPI | 11 | [18, 34, 49, 64-70]
Springer | 5 | [30, 71-75]
BMC Series | 3 | [46, 76, 77]
Elsevier | 3 | [17, 32]
Wiley | 3 | [51, 78, 79]
Advances in Artificial Intelligence and Machine Learning | 2 | [66, 80]
Archives of Breast Cancer | 1 | [66]
International Journal of Advanced Computer Science and Applications | 1 | [28]
Asian Pacific Journal of Cancer Prevention | 1 | [29]
International Journal of Electrical and Computer Engineering | 1 | [41]
Bulletin of Electrical Engineering and Informatics | 1 | [42]
International Journal of Integrated Engineering | 1 | [81]
Concurrent Engineering Research and Applications | 1 | [52]
International Journal of Reconfigurable and Embedded Systems | 1 | [31]
International Journal of Image, Graphics and Signal Processing | 1 | [34]
Journal of Experimental and Theoretical Artificial Intelligence | 1 | [82]
IOP Conference Series Materials Science and Engineering | 1 | [83]
Journal of the Nigerian Society of Physical Sciences | 1 | [33]
Journal of the Chinese Institute of Engineers | 1 | [37]
Jundishapur Journal of Microbiology | 1 | [40]
African Journal of Biomedical Research | 1 | [44]
Journal of Electrical Systems | 1 | [84]
Automation Controls & Engineering | 1 | [85]
Simulation | 1 | [86]
Neural Network World | 1 | [39]
MATEC Web of Conferences | 1 | [43]
Jundishapur Journal of Natural Pharmaceutical Products | 1 | [50]

3.3.1 Societal and Healthcare Impacts of Breast Cancer

A breast cancer diagnosis causes substantial psychological distress for patients and their families, including anxiety, fear of recurrence, financial stress, and depression. It often imposes a financial burden on low-income families, as the cost of surgery, radiation, chemotherapy, and related treatment is significant. Screening programs are scarce in developing countries, so the disease is often diagnosed only at its last stage. Because of these disparities, many people in such countries are reluctant to undergo regular breast screening owing to its cost, which leads to further distress and suffering.

Treating breast cancer requires a multidisciplinary approach involving a range of surgeons and oncologists, which imposes a substantial burden on healthcare systems. Patients in countries with early detection tools and well-established screening systems have higher chances of survival than those in developing countries, where these facilities are scarce and the available programs are costly for low-wage families. There is therefore a need for scalable and cost-effective AI-driven solutions for the early diagnosis of breast cancer.

3.3.2 Early Detection is Imperative

Lowering the breast cancer mortality rate and improving patient health depend on early detection. If the disease is diagnosed early enough, there is a 90 out of 100 chance that the person will survive. Classic diagnostic tools such as ultrasound and mammography remain in use, but they suffer from operator dependency and the resulting variability in interpretation, and they fail to meet the needs of underserved areas. ML is a powerful tool that greatly enhances the capability for early detection. ML models use large datasets to uncover patterns and anomalies that a human observer might miss. Image segmentation, feature extraction, and predictive modeling techniques allow malignant tumors to be characterized faster and more accurately. In addition, ML-based tools can identify high-risk patients, grade the urgency of a case, and help clinicians improve their diagnostic workflow and the corresponding patient outcomes.

4. Findings

This section describes the key findings obtained from analyzing the 40 publications selected in this survey. Each RQ is briefly discussed to clarify the area of the breast cancer detection domain that it explores.

4.1 Datasets Widely Used

Different publicly available datasets have largely contributed to the advancement of ML and DL techniques for breast cancer detection. These datasets differ in size, imaging modalities, annotations, and patient demographics, enabling researchers to develop and evaluate diverse models.

The most widely used datasets in the surveyed studies are WDBC and WBC, as illustrated in Figure 6. The chart shows the popularity of individual datasets in the breast cancer detection literature.

  • The WDBC dataset is the most commonly used dataset, with 26 papers referencing it. This points to its importance in the research and its status as a standard dataset.
  • WBC is the second most widely cited dataset, referenced in 14 papers, which shows its significance for the study of breast cancer diagnosis.
  • Other datasets, i.e., the mammographic mass dataset (MM-Dataset) and histopathological breast whole slide imaging (WSI), are used far less, referenced in about 3 studies or fewer.
  • The remaining datasets, such as the VinDr-Mammo Dataset, DMR-IR DB, UFPE DB, etc., have been referenced in only 1 paper each, illustrating their limited adoption in the research.

In conclusion, the dominance of WDBC and WBC in the breast cancer detection literature probably stems from their data quality, availability, and long-standing use. The least-used datasets are either new or less accessible to the general public, which contributes to their limited application.

The pie chart in Figure 7 represents the distribution of the feature-based datasets used for breast cancer diagnosis with various ML methodologies, partitioned into three classes: WDBC, WBC, and WPBC. The following analysis relates the figure to breast cancer detection via ML approaches.

  • WDBC dataset: Represented by the blue section, it comprises 32 instances, which places it among the leading publicly available datasets.
  • Purpose: The WDBC dataset is usually applied in ML methodologies for classifying breast tumors as benign or malignant, helping to detect breast cancer accurately.
  • WBC dataset: This is the smallest dataset, illustrated in orange, and encompasses merely 10 samples.
  • Purpose: This dataset provides a basis for fundamental-level ML algorithms in breast cancer diagnosis and is mainly used in investigations that call for relatively small datasets.
  • WPBC dataset: Shown in gray, this is the largest dataset, containing 34 samples.
  • Purpose: WPBC is indispensable for prognostic modeling, assisting ML models in predicting cancer recurrence and long-term patient outcomes.

4.1.1 Significance of Datasets

These datasets are widely used for classification, prediction, and feature extraction or selection. Owing to their larger sizes, WDBC and WPBC are better suited to diagnosis and prognosis scenarios.

The large number of samples in these datasets contributes to the generalizability of ML models trained on them. WBC, owing to its smaller size, is not well suited to training complex or large-scale ML models but can be used efficiently for fundamental-level ML models.

The chart in Figure 8 shows the sizes of the image-based datasets used in the breast cancer detection literature and gives a sense of their range and scope. A detailed overview follows:

 Figure 6 

Number of research papers per dataset.

 Figure 7 

Size of feature-based datasets.


i. BreakHis

  • Size: This is the largest dataset in this comparison, comprising approximately 60,000 images.
  • Importance: Owing to its substantial size, BreakHis is well suited to DL, enabling the training and performance evaluation of robust models. Its dominance in the surveyed studies reflects its large-scale usage and trustworthiness.

ii. VinDr-Mammo Dataset

  • Size: This dataset comprises nearly 20,000 images.
  • Importance: This dataset is comparatively sizable, which makes it valuable for developing and testing advanced image-based breast cancer detection tools, specifically for mammography-based studies.

iii. Breast Cancer Wisconsin (Diagnostic) Dataset

  • Size: Comparatively smaller, with only a few thousand images.
  • Significance: Although it is not as large as the BreakHis dataset, it is widely used because of its well-defined, high-quality attributes, which make it a crucial resource for breast cancer diagnosis.

iv. Dunya Women's Cancer Dataset

  • Size: This is comparatively smaller, consisting of a few thousand images.
  • Importance: Its small size makes it better suited to specific or focused studies than to high-level modeling purposes.

v. Invasive Ductal Carcinoma (IDC)

  • Size: This dataset encompasses a small number of images.
  • Importance: In spite of its relatively small size, it is invaluable for focused research on invasive ductal carcinoma, a prominent type of breast cancer.

vi. UFPE DB and DMR-IR DB

  • Size: Both of these datasets are very small, probably fewer than 1,000 images.
  • Importance: These datasets are likely utilized for small-scale or preliminary research because of their smaller size.

4.1.2 Key Observations

BreakHis and VinDr-Mammo are two very large datasets, which makes them ideal for data-driven approaches such as DL. Small-scale datasets, like the Dunya Women's Cancer Dataset, IDC, UFPE DB, and DMR-IR DB, are suitable for focused research or pilot studies. Dataset size plays a significant role in determining its applications: large-scale datasets support advanced techniques, while smaller ones facilitate targeted research. This distribution highlights the importance of selecting datasets according to the study objective, with larger datasets like BreakHis being essential for high-level research and smaller ones serving focused studies. Figure 8 shows the size of image-based datasets.

 Figure 8 

Size of image-based datasets.

 Figure 9 

Most popular ML approaches in the literature.


4.2 Most Popular ML Approaches

Approaches such as SVM, RF, DTs, and KNN are widely applied to the binary classification of breast tumors, as shown in Figure 9.

CNNs have been shown to be state of the art in medical imaging data analysis and are well suited to image classification.

  • RF algorithm (20 papers): This classifier is the most cited one in breast cancer detection research. Its importance stems from its ensemble learning abilities, which offer robustness and high accuracy when classifying complex datasets.
  • KNN (16 papers): KNN is the second most widely used algorithm, probably because of its simplicity and efficacy on small-to-medium datasets.
  • LR (15 papers): This conventional statistical classifier retains its significance, specifically on less complex or smaller datasets where interpretability is critical.
  • SVM (14 papers): SVM is commonly favored for its strong performance in binary classification and its ability to handle high-dimensional data effectively.
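
As a rough illustration of how the four most cited classifiers can be compared on a tabular dataset, the sketch below runs a stratified 5-fold cross-validation on the copy of WDBC that ships with scikit-learn. The hyperparameters, the 5-fold setting, and the use of feature scaling are assumptions made for this example and are not taken from the reviewed papers.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# scikit-learn bundles a copy of the WDBC data (binary target).
X, y = load_breast_cancer(return_X_y=True)

# Illustrative settings only; distance- and margin-based models get feature scaling.
models = {
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "LR": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC(kernel="rbf")),
}

# The same stratified folds are reused for every model so the comparison is fair.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(model, X, y, cv=cv, scoring="accuracy").mean()
    for name, model in models.items()
}

for name, acc in scores.items():
    print(f"{name}: mean CV accuracy = {acc:.3f}")
```

Wrapping the scaler inside each pipeline keeps the scaling statistics from leaking between training and validation folds.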

4.2.1 Moderately Popular Methods

Besides the widely used models like RF, KNN, etc., some ML models are moderately used in the existing literature.

  • NB (12 papers): The probabilistic and less complex nature of this model makes it efficient for initial research in breast cancer diagnosis.
  • GB (16 papers): This model constructs complex and powerful classifiers by combining weak or basic learners. It is widely embraced because of its robustness in handling imbalanced or high-dimensional datasets.

4.2.2 Emerging or Less Commonly Used Approaches

A few ML models have been used in only a small number of studies for breast cancer detection.

  • DL models (3 papers), CNN (4 papers), and ANN (4 papers): Although these are the least commonly referenced, these approaches are rapidly gaining popularity for breast cancer detection using image-based datasets, particularly with advances in medical imaging methodologies.
  • GA (7 papers): Traditionally applied for feature selection or dimensionality reduction, this approach helps improve the performance and efficacy of classification models.
  • Particle Swarm Optimization (PSO) (2 papers): It is a metaheuristic methodology mostly applied for optimizing model parameters and for feature selection.
  • AB (11 papers): AdaBoost is an ensemble approach widely applied for enhancing the performance of fundamental or simpler classifiers.
  • Recursive feature elimination (3 papers): Mainly employed for feature reduction or elimination tasks, it helps refine the features supplied to classifier models.
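
The recursive feature elimination mentioned in the last item can be sketched with scikit-learn's `RFE` class. The choice of logistic regression as the ranking model and the target of 10 features are assumptions for illustration only.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import RFE
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
# Scale so that the logistic-regression coefficients used for ranking are comparable.
# (For an honest accuracy estimate, scaling and selection belong inside a CV pipeline.)
X = StandardScaler().fit_transform(data.data)
y = data.target

# Repeatedly fit the model and drop the weakest feature until 10 remain.
selector = RFE(LogisticRegression(max_iter=1000), n_features_to_select=10, step=1)
selector.fit(X, y)

selected = [name for name, keep in zip(data.feature_names, selector.support_) if keep]
print("Selected features:", selected)
```

`selector.ranking_` additionally records the order in which the discarded features were eliminated (rank 1 marks the kept ones).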

4.2.3 Applications of the Methods

  • Image Processing and Preprocessing: Techniques such as CLAHE, YOLOv5, and autoencoders are employed to enhance image clarity or to detect breast tumors in mammographic images.
  • Feature Selection and Dimensionality Reduction: Approaches such as GA, PCA, and the Relief algorithm are used to refine the input features and enhance model performance.
  • Classification and Diagnosis: Methodologies like SVM, RF, KNN, and LR are employed to classify breast tumors as benign or malignant.
  • Optimization: Algorithms such as PSO and GA are used to tune the parameters of the model or classifier to enhance prediction and detection accuracy.
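
To make the optimization use case concrete, the following is a minimal particle swarm sketch that tunes the `C` and `gamma` parameters of an RBF SVM on scikit-learn's copy of WDBC. The swarm size, iteration count, search bounds, and the inertia and acceleration weights are illustrative assumptions, not settings reported in the reviewed studies.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)


def fitness(position):
    """Mean 3-fold CV accuracy of an RBF SVM; position = (log10 C, log10 gamma)."""
    log_c, log_gamma = position
    model = make_pipeline(StandardScaler(), SVC(C=10.0 ** log_c, gamma=10.0 ** log_gamma))
    return cross_val_score(model, X, y, cv=3).mean()


# Search box in log10 space (assumed bounds).
low, high = np.array([-2.0, -5.0]), np.array([3.0, 0.0])
n_particles, n_iters = 8, 5
w, c1, c2 = 0.7, 1.5, 1.5  # inertia, cognitive weight, social weight

pos = rng.uniform(low, high, size=(n_particles, 2))
vel = np.zeros_like(pos)
pbest_pos = pos.copy()
pbest_fit = np.array([fitness(p) for p in pos])
best = int(pbest_fit.argmax())
gbest_pos, gbest_fit = pbest_pos[best].copy(), pbest_fit[best]

for _ in range(n_iters):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    # Each particle is pulled toward its own best and the swarm's best position.
    vel = w * vel + c1 * r1 * (pbest_pos - pos) + c2 * r2 * (gbest_pos - pos)
    pos = np.clip(pos + vel, low, high)
    fit = np.array([fitness(p) for p in pos])
    improved = fit > pbest_fit
    pbest_pos[improved] = pos[improved]
    pbest_fit[improved] = fit[improved]
    if pbest_fit.max() > gbest_fit:
        best = int(pbest_fit.argmax())
        gbest_pos, gbest_fit = pbest_pos[best].copy(), pbest_fit[best]

print(f"best C={10 ** gbest_pos[0]:.3g}, gamma={10 ** gbest_pos[1]:.3g}, "
      f"CV accuracy={gbest_fit:.3f}")
```

Searching in log space is a design choice: `C` and `gamma` span several orders of magnitude, so uniform steps in the exponent explore the range more evenly.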

4.3 Machine Learning Techniques in Breast Cancer Detection

Using ML to automate diagnostic processes has vastly advanced breast cancer detection. ML models are able to process massive amounts of medical imaging and patient data to find signs of malignancy. ML techniques broadly fall into two categories: segmentation-based classification and feature selection-based classification. Analyzing medical datasets with these two approaches offers distinct advantages in understanding cancer, and their integration into clinical workflows can bring transformative improvements in cancer care.

4.3.1 Segmentation-based Classification

Segmentation is the task of isolating regions of interest (e.g., tumors) from medical images, and it is an important preprocessing step. Accurate segmentation focuses the analysis on relevant anatomical structures and therefore enhances the reliability of subsequent diagnostic processes.

4.3.2 Key Techniques and Applications

  • CNNs: CNNs can capture spatial hierarchies, making them the method of choice for image segmentation tasks. Architectures such as U-Net and Mask R-CNN have shown notable success in segmenting mammograms and MRI scans.
  • Fully Convolutional Networks (FCNs): FCNs are built for pixel-wise classification tasks and are particularly suitable for medical image segmentation. They can handle images of different resolutions and can scale and adapt to different datasets.
  • Semi-supervised and Unsupervised Methods: Semi-supervised and unsupervised techniques (e.g., Generative Adversarial Networks (GANs) and unsupervised clustering) can be used to segment regions of interest when labeled data is scarce.
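
Of the techniques listed above, unsupervised clustering is the simplest to sketch without a deep learning framework. The example below clusters pixel intensities with k-means on a synthetic image; the image, its size, and the bright disk standing in for a lesion are all fabricated for illustration and do not come from any of the reviewed datasets.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic 64x64 grayscale "scan": dark noisy background plus one brighter disk.
height = width = 64
rows, cols = np.mgrid[0:height, 0:width]
truth = (rows - 32) ** 2 + (cols - 40) ** 2 <= 10 ** 2
image = rng.normal(0.2, 0.05, size=(height, width))
image[truth] += 0.5

# Cluster pixel intensities into two groups; no labels are used.
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(image.reshape(-1, 1)).reshape(height, width)

# Treat the cluster with the higher mean intensity as the region of interest.
roi_cluster = int(np.argmax(kmeans.cluster_centers_.ravel()))
mask = labels == roi_cluster

# Dice overlap with the known synthetic ground truth.
dice = 2 * np.logical_and(mask, truth).sum() / (mask.sum() + truth.sum())
print(f"Dice coefficient: {dice:.3f}")
```

Real mammograms or histopathology tiles have far less contrast than this toy image, which is one reason the surveyed studies rely on learned models such as U-Net rather than intensity clustering alone.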

Radwan et al. [17] used the YOLOv5 and MedSAM segmentation models and the contrast-limited adaptive histogram equalization (CLAHE) algorithm along with a Gaussian blur and an ensemble deep random vector functional link neural network algorithm for breast cancer diagnosis. Sarfaraz et al. [16] applied H and E staining, nuclei segmentation, and nuclei-based instance segmentation, as well as PCA and PSO for feature selection, and RF, LR, NB, KNN, SVM, digital image analysis, and CNN for breast cancer detection on the WDBC and WSI datasets.

4.3.3 Feature Selection-based Classification

Feature selection is the process of identifying the most relevant features in the multi-dimensional datasets available in public digital repositories. This process improves the overall performance of ML classification models. The approach is especially valuable for confidential patient data involving histopathology-based features, statistics, and gene-related data.

4.3.4 Major Techniques and Applications

i. Recursive Feature Elimination: This technique repeatedly removes the least significant features, improving the performance of the ML model and ensuring that only the important attributes are selected for classification.

ii. Principal Component Analysis: PCA is a dimensionality reduction technique that transforms the data into a set of uncorrelated components while retaining most of the variance. It has been employed effectively to reduce the dimensionality and redundancy of breast cancer datasets.

iii. Evolutionary Feature Selection: EFS is a technique used in ML to improve the performance of classification models. It applies evolutionary algorithms (e.g., GA, PSO, ICA) to identify the subset of attributes that contributes most to the predictive accuracy of the classification model. By modeling Darwinian natural selection (survival of the fittest), EFS iteratively selects and combines relevant attributes to search for the combination that maximizes classification performance while reducing computational complexity.
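
The select-and-combine loop described for evolutionary feature selection can be sketched as a small genetic algorithm over boolean feature masks. Everything specific here is an assumption for illustration: the population size, number of generations, mutation rate, per-feature penalty, tournament selection, one-point crossover, and logistic regression as the wrapped classifier.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
n_features = X.shape[1]
rng = np.random.default_rng(0)

# Illustrative settings, kept small so the sketch runs in seconds.
POP_SIZE, N_GENERATIONS, MUTATION_RATE, FEATURE_PENALTY = 12, 6, 0.05, 0.001


def fitness(mask):
    """3-fold CV accuracy on the selected columns, minus a small cost per feature."""
    if not mask.any():
        return 0.0
    model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
    accuracy = cross_val_score(model, X[:, mask], y, cv=3).mean()
    return accuracy - FEATURE_PENALTY * mask.sum()


def tournament(population, fits):
    """Pick two random individuals and return the fitter one."""
    i, j = rng.integers(0, len(population), size=2)
    return population[i] if fits[i] >= fits[j] else population[j]


population = rng.random((POP_SIZE, n_features)) < 0.5  # random boolean masks
best_mask, best_fit = None, -np.inf

for generation in range(N_GENERATIONS):
    fits = np.array([fitness(ind) for ind in population])
    if fits.max() > best_fit:
        best_fit = float(fits.max())
        best_mask = population[fits.argmax()].copy()

    children = []
    for _ in range(POP_SIZE):
        parent_a = tournament(population, fits)
        parent_b = tournament(population, fits)
        cut = rng.integers(1, n_features)                 # one-point crossover
        child = np.concatenate([parent_a[:cut], parent_b[cut:]])
        flips = rng.random(n_features) < MUTATION_RATE    # bit-flip mutation
        children.append(np.logical_xor(child, flips))
    population = np.array(children)

print(f"selected {int(best_mask.sum())} of {n_features} features, fitness {best_fit:.3f}")
```

Because the fitness is computed on the same data that guides the search, the reported score is optimistic; a held-out test split would be needed for an unbiased estimate.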

Saeed et al. [82] used ensemble classification based on an MLP neural network and evolutionary algorithms (GA, PSO, and ICA) on the original WBCD dataset, achieving a classification accuracy of about 98.74%. Roger and co-authors [18] applied GA and SVM for breast cancer detection on thermal image datasets, namely the Database for Mastology Research with Infrared Image (DMR-IR) and the private thermal image database of the Federal University of Pernambuco (UFPE), achieving an accuracy of 97.18%. Sahar A. [74] employed GA, RFE, rough set feature selection, and PCA for feature selection along with DT, KNN, ANN, SVM, RF, and relief methods for classifying breast tumors. Khatereh [86] used GA for feature selection on the BreakHis dataset and CNN for classification.

By exploiting segmentation-based and feature selection-based methodologies, ML classification techniques for breast cancer diagnosis have become robust and trustworthy. These techniques complement each other, providing extensive solutions for the analysis of both image-based and feature-based datasets.

4.4 Explainable AI and its Necessity for Breast Cancer Diagnosis

As ML and DL approaches have grown more complex, one of the crucial challenges to their real-world acceptance is the lack of interpretability and explainability. Explainable AI (XAI) denotes a collection of approaches developed to make AI-led decisions more transparent, interpretable, and understandable for clinicians and patients.

4.4.1 Why is Explainable AI Required?

  • Confidence and Trustworthiness: Healthcare professionals are more likely to embrace AI-driven diagnostic approaches if they can comprehend and verify the logic behind the predictions. Black-box DL models such as CNNs mostly lack transparency, making physicians reluctant to depend on them for crucial decisions.
  • Regulatory Compliance: Various healthcare regulatory bodies require that AI-driven methodologies used in clinical diagnostics be explainable and accountable. XAI can help ensure compliance with such rules.
  • Error Detection and Bias Mitigation: Understanding how AI methodologies derive predictions enables researchers to recognize potential biases, rectify errors, and enhance model fairness across distinct patient populations.
  • Enhanced Patient Communication: Providing clear justifications for AI-based diagnostics allows clinicians to communicate efficiently with patients, strengthening informed decision-making and the adoption of AI-based clinical solutions.

4.4.2 Techniques of XAI

i. Analysis of Important Features: Approaches like local interpretable model-agnostic explanations (LIME) and Shapley additive explanations (SHAP) can help highlight the relevant features (e.g., tumor size, shape, or density) that affected a model's prediction.

ii. Attention Mechanisms in DL Approaches: Models such as attention-driven neural networks offer visual interpretation, which makes it simpler to explain how a DL model handles clinical images.

iii. Rule-Driven Models and DTs: Although DL techniques provide high accuracies, more straightforward rule-based approaches or DTs can be used in conjunction with them to enhance interpretability.

iv. Saliency Maps and Heatmaps: CNNs can produce heatmaps for the visualization of the regions within histopathological images or mammograms that play an essential part in the classification decision.
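
The idea behind SHAP-style feature attribution can be illustrated without the `shap` package by computing exact Shapley values through brute-force enumeration of feature coalitions. This is only feasible for a handful of features, so the sketch restricts scikit-learn's copy of WDBC to four columns; the column choice, the logistic regression model, and the use of the dataset mean as the baseline are assumptions for this example. Libraries such as SHAP approximate the same quantity efficiently for larger models.

```python
from itertools import combinations
from math import factorial

import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
cols = [0, 1, 4, 7]  # four features, chosen only to keep the enumeration cheap
names = [str(data.feature_names[c]) for c in cols]
X = StandardScaler().fit_transform(data.data[:, cols])
y = data.target  # in scikit-learn's copy, class 1 is "benign"

model = LogisticRegression(max_iter=1000).fit(X, y)
baseline = X.mean(axis=0)  # reference point: the "average" sample


def value(x, subset):
    """Predicted P(class 1) when only features in `subset` take x's values."""
    z = baseline.copy()
    for j in subset:
        z[j] = x[j]
    return model.predict_proba(z.reshape(1, -1))[0, 1]


def shapley_values(x):
    """Exact Shapley values: weighted marginal contributions over all coalitions."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                phi[i] += weight * (value(x, subset + (i,)) - value(x, subset))
    return phi


x = X[0]
phi = shapley_values(x)
for name, contribution in zip(names, phi):
    print(f"{name}: {contribution:+.4f}")
```

By construction the contributions sum to the difference between the prediction for `x` and the prediction at the baseline, which is the additivity property that makes such explanations easy to read.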

4.4.3 Real-World Applications of SHAP and LIME

In real clinical settings, SHAP and LIME are mostly used to pinpoint the most influential features or image regions behind a model's prediction. For example, in mammography or biopsy image analysis, SHAP can visually highlight which parts of a tumor contributed most to a malignancy prediction, helping radiologists validate or question the AI's outcome.

4.5 Performance Metrics for the Comparison

The ML models employed for breast cancer diagnosis are evaluated with a number of performance metrics. These metrics help in understanding how accurately the algorithms classify cancerous and non-cancerous tumors. The most popular of them is accuracy. The following discussion is based on Figure 10.

i. Accuracy

  • Definition: This metric is the proportion of correctly classified samples (true positives and true negatives) in the total number of samples [25, 26, 32].
\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \qquad (1) \]

where TP means true positive, TN is true negative, FP is false positive and FN is false negative. The accuracy of various algorithms ranges between 85 and 100%.

  • Top Performing Algorithms: GB achieved an accuracy of 100%, which makes it an efficient algorithm for classifying breast tumors as benign or malignant. Models such as RF, Xception, and SVM with RF integration attained accuracies of approximately 97%, as depicted in Figure 10, which reflects their stability and robustness for classification. Figure 10 shows the accuracy of each method on different datasets.

ii. Sensitivity (Recall or Rate of True Positive)

  • Definition: This metric measures the model's ability to identify cancer cases correctly [28, 29, 34].
\[ \text{Sensitivity} = \frac{TP}{TP + FN} \qquad (2) \]
 Figure 10 

Accuracy for each classification model.

  • Significance in Breast Cancer Diagnosis: High sensitivity ensures that malignant cases are not missed, which is critical for early diagnosis and proper treatment. Approaches such as CNNs, ensemble techniques, and the GB algorithm generally achieve good sensitivity because of their ability to capture complicated patterns, especially in image-based datasets.

iii. Specificity (Rate of True Negative)

  • Definition: This metric measures the model's ability to correctly identify negative or benign (non-cancerous) cases [39, 40, 41].
\[ \text{Specificity} = \frac{TN}{TN + FP} \qquad (3) \]
  • Significance in Breast Cancer Diagnosis: High specificity minimizes the chance of false positives, ultimately reducing unnecessary biopsies and the strain on patients. Ensemble approaches such as RF and GB are generally strong at balancing sensitivity and specificity.

iv. Precision

  • Definition: This metric measures the proportion of predicted positive cases that are actually positive [30, 31, 32].
\[ \text{Precision} = \frac{TP}{TP + FP} \qquad (4) \]
  • Importance in Breast Cancer Diagnosis: High precision minimizes the number of false positives, which ensures that only truly malignant cases are flagged for further investigation.

v. F1 Score

  • Definition: The F1 score is the harmonic mean of precision and recall (sensitivity), balancing the trade-off between the two [30, 31, 32].
\[ \text{F1} = \frac{2 \times \text{Precision} \times \text{Sensitivity}}{\text{Precision} + \text{Sensitivity}} \qquad (5) \]
  • Relevance to Breast Cancer Diagnosis: A high score indicates balanced performance in identifying true positive cases and limiting false positives. Approaches with strong generalization abilities, like GB and the Xception algorithm, mostly have a good F1 score.
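
The five metrics defined above can be computed directly from the four confusion-matrix counts. The sketch below uses hypothetical counts (100 malignant and 100 benign cases) chosen only to make the arithmetic easy to follow; the function name is likewise an assumption.

```python
def classification_metrics(tp, tn, fp, fn):
    """Return the five metrics from confusion-matrix counts (malignant = positive)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    sensitivity = tp / (tp + fn)   # recall / true positive rate
    specificity = tn / (tn + fp)   # true negative rate
    precision = tp / (tp + fp)
    f1 = 2 * precision * sensitivity / (precision + sensitivity)
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical screening result: 90 of 100 malignant cases caught, 5 false alarms.
metrics = classification_metrics(tp=90, tn=95, fp=5, fn=10)
for name, score in metrics.items():
    print(f"{name}: {score:.4f}")
```

With these counts the accuracy is 0.925 while the sensitivity is 0.90, which shows how a single accuracy figure can hide the 10 missed malignant cases.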

4.5.1 Performance Metrics Evaluation for Particular Techniques

  • GB (100% accuracy): It probably surpasses other models in sensitivity and specificity and has an excellent F1 score. Its iterative methodology enables it to perform effectively on various datasets, particularly structured datasets such as WDBC or WBCD, as shown in Figures 11 and 12.
  • Xception algorithm and RF (97% accuracy): Both of these models display top performance, where RF handles features well while the Xception algorithm excels on image-based datasets. RF performs well across the IDC and BreakHis datasets, while Xception achieves high accuracy on IDC and VinDr-Mammo.
  • SVM (93.95% accuracy): This algorithm shows exceptional performance in binary classification tasks, particularly on large datasets such as WDBC. Other performance metrics such as sensitivity and specificity are usually high, but the values can vary with the tuning of the model parameters.
  • CNN and Other DL models: These models attain high recall because of their ability to learn complicated patterns, particularly in image-based datasets. Precision may show minor trade-offs when the sensitivity of the algorithm is high.
  • LR (93% accuracy): LR exhibits reliable performance on low-dimensional datasets, with good sensitivity and specificity. It is easier to interpret than DL approaches or ensemble techniques.

Note: For Figures 10 and 11, accuracy values are mainly derived from 10-fold cross-validation or test datasets, as reported by the respective authors. For Figure 12, performance metrics show test dataset accuracies or average cross-validation scores based on the source publications.
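
The 10-fold cross-validation protocol mentioned in the note can be sketched as follows for a gradient boosting classifier on scikit-learn's copy of WDBC. The default hyperparameters and the random seed are assumptions; the scores it prints are not the values reported in the figures.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import StratifiedKFold, cross_validate

X, y = load_breast_cancer(return_X_y=True)
# scikit-learn encodes malignant as 0; flip so that malignant is the positive class
# and "recall" corresponds to sensitivity for cancer cases.
y_malignant = 1 - y

cv = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
results = cross_validate(
    GradientBoostingClassifier(random_state=0),
    X,
    y_malignant,
    cv=cv,
    scoring=["accuracy", "recall", "precision", "f1"],
)

# cross_validate returns one array per metric, keyed as "test_<metric>".
summary = {
    key[len("test_"):]: scores.mean()
    for key, scores in results.items()
    if key.startswith("test_")
}
for metric, score in summary.items():
    print(f"{metric}: {score:.3f}")
```

Reporting the fold-wise standard deviation alongside the mean makes it easier to judge whether differences between models are meaningful.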

4.6 Discussions

4.6.1 RQ1: How is the performance of various ML models affected by the choice of dataset, such as WDBC, BreakHis, etc., in the diagnosis of breast cancer?

In breast cancer detection research, the performance of ML algorithms is substantially affected by the choice of dataset. Datasets such as BreakHis, WDBC, and WPBC vary enormously in size, complexity, and structure, which may strongly influence the performance benchmarks of certain models.

Large datasets such as BreakHis, used in [48] and [28], comprise a large repository of histopathological images, which provides plenty of data for training DL models like CNNs that achieve accuracies of more than 99 percent.

In contrast, smaller datasets such as WDBC require careful feature selection and are more suitable for conventional ML algorithms like Random Forest (RF) and Support Vector Machines (SVM). These datasets benefit from hybrid approaches that involve preprocessing steps like feature selection prior to classification, often reaching accuracies between 96 percent and 99 percent.

When small datasets such as WPBC are used with complex models, challenges such as overfitting arise. In such cases, hybrid or ensemble methods that combine feature selection with classification are critical to enhance generalization.

In conclusion, the performance of ML models is closely tied to dataset characteristics. Larger datasets are suitable for DL models, while small tabular datasets favor hybrid approaches. Future research should emphasize dataset diversity and structure-aware techniques to attain optimal performance.

4.6.2 RQ2: Are the research results biased because of the excessive use of prevailing datasets, and how can this be alleviated?

There is considerable bias in breast cancer detection research because of the prevalent use of particular datasets like BreakHis and WDBC. This over-dependency limits the generalizability of models, as they often fail when applied to different or unseen forms of data.

 Figure 11 

Accuracy for different models for different datasets.

 Figure 12 

Accuracy for different models using WBCD and WDBC datasets.


Datasets such as BreakHis continue to dominate the research arena because of their huge size and image-based characteristics. However, models trained specifically on such datasets face performance issues on other datasets, such as clinical or thermal imaging data. For example, CNN-based models that perform best on BreakHis struggle with WDBC due to the differences in format and structure of the data [28].

To mitigate this bias, researchers should incorporate a variety of datasets and validate ML models across them. Integrating BreakHis with WPBC or thermal imaging datasets improves generalizability. Generative models like GANs can also be used to create synthetic samples, helping to address class imbalance and the scarcity of rare cases.

One study showed excellent outcomes when training on BreakHis, but the same model failed to replicate that performance on WDBC, highlighting the significance of cross-dataset validation. Merging datasets like WDBC and WPBC, as done in [74], proves helpful.
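
The structure of a cross-dataset validation can be sketched as training on one source and reporting accuracy separately on an internal hold-out and on an external set. Since no second dataset is bundled with scikit-learn, the example below splits WDBC into two pseudo-sites and perturbs the second one to mimic a different acquisition protocol; the 15% calibration drift and the noise level are invented for illustration and do not model any real dataset pair.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
rng = np.random.default_rng(0)

# Two pseudo-sites: A is used for development, B plays the role of an external dataset.
X_a, X_b, y_a, y_b = train_test_split(X, y, test_size=0.5, stratify=y, random_state=0)

# Hypothetical shift at site B: multiplicative drift plus per-feature measurement noise.
noise = rng.normal(0, 0.05 * X.std(axis=0), size=X_b.shape)
X_b_shifted = X_b * 1.15 + noise

# Internal validation: hold out part of site A.
X_train, X_val, y_train, y_val = train_test_split(
    X_a, y_a, test_size=0.3, stratify=y_a, random_state=0
)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

acc_internal = accuracy_score(y_val, model.predict(X_val))
acc_external = accuracy_score(y_b, model.predict(X_b_shifted))
print(f"internal hold-out accuracy: {acc_internal:.3f}")
print(f"external (shifted) accuracy: {acc_external:.3f}")
```

Reporting both numbers side by side is the point of the exercise: a gap between them signals that the internal estimate does not transfer to data collected elsewhere.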

In a nutshell, excessive dependency on a few datasets can skew the outcomes and reduce medical relevance. Introducing dataset diversity, carrying out validation across various datasets, and harmonizing benchmark standards can significantly enhance the fairness and robustness of ML models in breast cancer detection.

4.6.3 RQ 3: What factors influence researchers' selection of algorithms for breast cancer detection?

Many factors influence the selection of an algorithm for breast cancer detection. These include the type of dataset (image or tabular), dataset size, feature structure, model complexity, and resource availability.

Convolutional Neural Networks (CNNs) are mostly selected for image-based datasets like BreakHis because of their strong feature extraction capabilities. In contrast, algorithms like SVM and RF are prioritized for structured datasets like WDBC, where tabular features are usually relevant.

In studies that prioritize sensitivity, RF has shown strong performance on tabular datasets. Similarly, CNNs have shown high accuracy on image datasets. However, resource-intensive models like GANs may not be a good choice in environments with limited computing power. In such cases, simpler models like Decision Trees (DT) and Logistic Regression (LR) provide more practical alternatives.

In clinical settings where interpretability is crucial, models such as SVM and DT are often preferred over complex DL models. For example, [31] used CNNs to achieve 99.78 percent accuracy on image-based data, whereas [29] employed SVM on WDBC, reaching 96.82 percent accuracy with engineered features.

Ensemble methodologies, such as combining segmentation or feature extraction with DL models, further improve performance. Therefore, model selection must take into account not only accuracy but also deployment and resource constraints.

In conclusion, factors such as dataset type, desired performance metrics, interpretability, and computational cost together guide algorithm selection. Using hybrid strategies, AutoML frameworks, and dataset-aware techniques can enhance diagnostic performance while keeping robustness intact.

4.6.4 RQ 4: Is there any trade-off between interpretation and accuracy while selecting the algorithm for diagnosis of breast cancer

In breast cancer detection, attaining high accuracy is crucial, however, interpretability is equally important, especially in clinical environment. This often results in a trade- off where interpretable models miss accuracy, and highly accurate models are not transparent.

DL models such as CNNs trained on BreakHis can achieve accuracies above 99 percent, but they are regarded as “black boxes.” This opacity restricts their acceptance in clinical settings. In contrast, simpler models trained on tabular datasets may offer lower accuracy but are easier for clinicians to comprehend.
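To make concrete what an interpretable model on tabular data can look like, the sketch below fits a depth-limited decision tree and prints its learned rules as plain text. It assumes scikit-learn and its bundled copy of WDBC; the depth limit of 3 and the 75/25 split are arbitrary illustrative choices, not settings reported in the reviewed literature.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, stratify=data.target, random_state=0
)

# A depth-limited tree stays small enough to read as a handful of if/else rules.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# Render the fitted tree as indented threshold rules over named features.
rules = export_text(tree, feature_names=list(data.feature_names))
test_accuracy = tree.score(X_test, y_test)

print(rules)
print(f"held-out accuracy: {test_accuracy:.3f}")
```

Each printed branch is a threshold on a named feature, so a clinician can trace exactly why a sample was labeled benign or malignant, which is not directly possible with the weights of a CNN.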

This trade-off poses challenges for model deployment. Healthcare professionals may hesitate to trust non-transparent models, even if they are accurate. Conversely, relying solely on interpretable models could result in false positives or false negatives if accuracy is compromised.
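The effect of false positives and false negatives on reported performance can be worked through with a small calculation. The sketch below derives accuracy, sensitivity, specificity, precision, and F1 score from confusion-matrix counts, treating malignant as the positive class. The function name `diagnostic_metrics` and the two sets of counts are hypothetical and chosen only for illustration; they do not come from any study in this review.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Compute common diagnostic metrics from confusion-matrix counts.

    Malignant is treated as the positive class.
    """
    total = tp + fp + tn + fn
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on malignant cases
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on benign cases
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if (precision + sensitivity) else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "precision": precision,
        "f1": f1,
    }


# Hypothetical counts for 100 malignant and 900 benign cases.
model_a = diagnostic_metrics(tp=70, fp=10, tn=890, fn=30)
model_b = diagnostic_metrics(tp=95, fp=60, tn=840, fn=5)

for name, m in (("A", model_a), ("B", model_b)):
    print(f"model {name}: accuracy={m['accuracy']:.3f}, "
          f"sensitivity={m['sensitivity']:.3f}, f1={m['f1']:.3f}")
```

With these hypothetical counts, model A has the higher accuracy (0.960 versus 0.935) yet misses 30 of 100 malignant cases, whereas model B misses only 5. Accuracy alone can therefore hide clinically important false negatives.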

To tackle this issue, hybrid and ensemble approaches have been explored. These models combine several algorithms to balance accuracy and interpretability. For example, in [33], Random Forests achieve 100 percent accuracy and are relatively interpretable. Other studies [28, 29, 32] report top accuracies but do not provide model explanation techniques.

Explainable AI (XAI) tools such as SHAP and LIME can help make DL models more interpretable. However, their adoption in research and clinical environments remains scarce. More work is required to make these tools standard practice.
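As a rough illustration of post-hoc explanation, the sketch below uses permutation importance, a simpler model-agnostic technique available in scikit-learn, rather than SHAP or LIME themselves, which are separate packages with their own APIs. It assumes scikit-learn and its bundled copy of WDBC; the Random Forest settings, the 75/25 split, and `n_repeats=5` are illustrative assumptions.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

data = load_breast_cancer()
X_train, X_test, y_train, y_test = train_test_split(
    data.data, data.target, test_size=0.25, stratify=data.target, random_state=0
)

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_train, y_train)

# Shuffle one feature at a time on held-out data and record the drop in accuracy.
result = permutation_importance(
    model, X_test, y_test, n_repeats=5, random_state=0, scoring="accuracy"
)

# Rank features by mean accuracy drop, largest first, and keep the top five.
order = result.importances_mean.argsort()[::-1]
top_features = [
    (str(data.feature_names[i]), float(result.importances_mean[i]))
    for i in order[:5]
]

for name, drop in top_features:
    print(f"{name}: mean accuracy drop = {drop:.4f}")
```

Permutation importance gives a global ranking of features for the fitted model, whereas SHAP and LIME can also attribute individual predictions. It is used here only because it needs no additional dependency.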

In summary, addressing the accuracy-interpretability trade-off is critical. Hybrid methodologies, ensemble models, and XAI integration provide feasible paths toward reliable and trustworthy diagnostic systems for breast cancer.

4.7 Limitations of this Study

While the referenced literature on breast cancer detection techniques provides valuable information, it is important to acknowledge certain limitations and challenges of this review.

  • Restricted Scope of Dataset Usage: This review primarily targets the datasets most frequently used in the literature and leaves out emerging datasets that could provide valuable insights into breast cancer research.
  • Bias in the Chosen Methodologies: The focus on specific techniques, such as segmentation and feature selection-based techniques, might lead to overlooking other nascent methodologies such as reinforcement learning.
  • Explanation of ML Models: Various ML approaches produced varying results for breast cancer detection. This study only examined their overall performance, such as accuracy and F1 score, without examining the reasons for their good or poor performance. Examining their working mechanisms might provide better insights.

5. Conclusion and Future Direction

This study concludes that the performance and generalizability of ML models for breast cancer diagnosis are affected by the choice of dataset, such as WDBC or BreakHis. Large image-based datasets such as BreakHis favor DL algorithms, with which an accuracy of more than 99% was attained, whereas small tabular datasets such as WDBC require careful feature engineering and hybrid approaches to reach sufficient accuracy. However, over-dependence on these datasets introduces bias, which restricts clinical application. Future research should prioritize dataset diversity and incorporate multimodal data, such as image-based and genome-based medical data, to improve model robustness. To advance the field of ML, future studies should target the following:

  • Explainable AI Development: Algorithms that provide transparency and interpretability should be preferred, enabling healthcare professionals to understand and trust ML-driven predictions and solutions.
  • Enhancement of Dataset Heterogeneity: Incorporate more heterogeneous and diversified datasets that cover varying demographics, medical imaging procedures, and tumor properties.
  • Multi-Dimensional Data Integration: Image-based data should be combined with genomic, protein-based, and medical data for a comprehensive approach to breast cancer detection.
  • Optimization of Resource Efficiency: Engineer compact ML models that are better suited to deployment in resource-scarce environments.
  • Nurturing Collaborative Research: Promote a diverse setting in which multidisciplinary cooperation among data scientists, radiologists, and oncologists narrows the gap between healthcare and technology.

Alongside the above dimensions, another research question arises: what will be the future trend of this research? In particular, how do simpler algorithms such as LR compare with more complex models such as DL in terms of computational cost and accuracy?

In addition, the trade-off between accuracy and interpretability will persist for elementary models, which provide clarity but do not offer the higher accuracies of DL models. Adopting XAI, cross-dataset validation, and streamlined approaches would be an important step forward. This survey provides a roadmap for advancing breast cancer diagnosis through rational, accurate, and understandable ML approaches.

Acknowledgements

Funding

This study was funded by the Universidad Europea del Atlantico.

Author contributions

AS conceptualization, data curation, writing - the original draft.

MU formal analysis, conceptualization, writing - the original draft.

MTN methodology, data curation, formal analysis.

MZ software, project administration, methodology.

SAO funding acquisition, visualization, investigation.

RCI investigation, validation, resources.

SH visualization, software, formal analysis.

IA supervision, validation, writing - review and editing.

All authors read and approved the final manuscript.

Competing Interests

The authors have declared that no competing interest exists.

References

1. Alturki N, Umer M, Ishaq A, Abuzinadah N, Alnowaiser K, Mohamed A. et al. Combining CNN features with voting classifiers for optimizing performance of brain tumor classification. Cancers. 2023;15(6):1767

2. Ahmed KT, Rustam F, Mehmood A, Ashraf I, Choi GS. Predicting skin cancer melanoma using stacked convolutional neural networks model. Multimedia Tools and Applications. 2024;83(4):9503-9522

3. Sung H, Ferlay J, Siegel RL, Laversanne M, Soerjomataram I, Jemal A. et al. Global cancer statistics 2020: GLOBOCAN estimates of incidence and mortality worldwide for 36 cancers in 185 countries. CA: a cancer journal for clinicians. 2021;71(3):209-249

4. Xu Y, Gong M, Wang Y, Yang Y, Liu S, Zeng Q. Global trends and forecasts of breast cancer incidence and deaths. Scientific data. 2023;10(1):334

5. Health at Hand. Breast cancer awareness month. 2020. Available from: https://www.myhealthathand.com/breast-cancer-awareness-2020/.

6. Siegel RL, Miller KD, Wagle NS, Jemal A. Cancer statistics, 2023. CA: a cancer journal for clinicians. 2023;73(1):17-48

7. Abuzinadah N, Kumar Posa S, Alarfaj AA, Alabdulqader EA, Umer M, Kim TH. et al. Improved prediction of ovarian cancer using ensemble classifier and Shapley explainable AI. Cancers. 2023;15(24):5793

8. Chen X, Aljrees T, Umer M, Saidani O, Almuqren L, Mzoughi O. et al. Cervical cancer detection using K nearest neighbor imputer and stacked ensemble learning model. Digital Health. 2023;9:20552076231203802

9. DAWN. Pakistan has highest rate of breast cancer cases in Asia: expert. Available from: https://www.dawn.com/news/1872838.

10. Shafi I, Din S, Khan A, Díez IDLT, Casanova RJP, Pifarre KT. et al. An effective method for lung cancer diagnosis from CT scan using deep learning-based support vector network. Cancers. 2022;14(21):5457

11. Chaganti R, Rustam F, De La Torre Díez I, Mazón JLV, Rodríguez CL, Ashraf I. Thyroid disease prediction using selective features and machine learning techniques. Cancers. 2022;14(16):3914

12. Shafique R, Rustam F, Choi GS, Díez IDLT, Mahmood A, Lipari V, Velasco CLR, Ashraf I. Breast cancer prediction using fine needle aspiration features and upsampling with supervised machine learning. Cancers. 2023;15(3):681

13. Umer M, Naveed M, Alrowais F, Ishaq A, Hejaili AA, Alsubai S, Eshmawi A, Mohamed A, Ashraf I. Breast cancer detection using convoluted features and ensemble machine learning algorithm. Cancers. 2022;14(23):6015

14. Karamti H, Alharthi R, Anizi AA, Alhebshi RM, Eshmawi A, Alsubai S, Umer M. Improving prediction of cervical cancer using KNN imputed SMOTE features and multi-model ensemble learning approach. Cancers. 2023;15(17):4412

15. Rupapara V, Rustam F, Aljedaani W, Shahzad HF, Lee E. et al. Blood cancer prediction using leukemia microarray gene data and hybrid logistic vector trees model. Scientific Reports. 2022;12(1):1000

16. Gopal VN, Al-Turjman F, Kumar R, Anand L, Rajesh M. Feature selection and classification in breast cancer prediction using IoT and machine learning. Measurement. 2021;178:109442

17. Qasrawi R, Daraghmeh O, Qdaih I, Thwib S, Polo SV, Owienah H, Al-Halawa DA, Atari S. Hybrid ensemble deep learning model for advancing breast cancer detection and classification in clinical applications. Heliyon. 2024;10(19)

18. Resmini R, Silva L, Araujo AS, Medeiros P, Muchaluat-Saade D, Conci A. Combining genetic algorithms and SVM for breast cancer diagnosis using infrared thermography. Sensors. 2021;21(14):4802

19. Chugh G, Kumar S, Singh N. Survey on machine learning and deep learning applications in breast cancer diagnosis. Cognitive Computation. 2021;13(6):1451-1470

20. Yadav RK, Singh P, Kashtriya P. Diagnosis of breast cancer using machine learning techniques - a survey. Procedia Computer Science. 2023;218:1434-1443

21. Priyanka KS. A review paper on breast cancer detection using deep learning. IOP Conference Series: Materials Science and Engineering. 2021;1022:012071

22. Meenalochini G, Ramkumar S. Survey of machine learning algorithms for breast cancer detection using mammogram images. Materials Today: Proceedings. 2021;37:2738-2743

23. Fatima N, Liu L, Hong S, Ahmed H. Prediction of breast cancer, comparative review of machine learning techniques, and their analysis. IEEE Access. 2020;8:150360-150376

24. Rautela K, Kumar D, Kumar V. A systematic review on breast cancer detection using deep learning techniques. Archives of Computational Methods in Engineering. 2022;29(7):4599-4629

25. Dhahri H, Rahmany I, Mahmood A, Al Maghayreh E, Elkilani W. Tabu search and machine-learning classification of benign and malignant proliferative breast lesions. Biomed Research International. 2020;2020:4671349

26. Abdollahi J, Keshandehghan A, Gardaneh M, Panahi Y, Gardaneh M. Accurate detection of breast cancer metastasis using a hybrid model of artificial intelligence algorithm. Archives of Breast Cancer. 2020;7(1):22-28

27. Punitha S, Al-Turjman F, Stephan T. An automated breast cancer diagnosis using feature selection and parameter optimization in ANN. Computers & Electrical Engineering. 2021;90:106958

28. Abunasser BS, Al-Hiealy MRJ, Zaqout IS, Abu-Naser SS. Breast cancer detection and classification using deep learning Xception algorithm. International Journal of Advanced Computer Science and Applications. 2022;13(7):566-574

29. Abunasser BS, Al-Hiealy MRJ, Zaqout IS, Abu-Naser SS. Convolution neural network for breast cancer detection and classification using deep learning. Asian Pacific Journal of Cancer Prevention. 2023;24(2):531-537

30. Singh LK, Khanna M, Singh R. An enhanced soft-computing based strategy for efficient feature selection for timely breast cancer prediction: Wisconsin diagnostic breast cancer dataset case. Multimedia Tools and Applications. 2024;83(31):76607-76672

31. Kadhim RR, Kamil MY. Comparison of breast cancer classification models on Wisconsin dataset. International Journal of Reconfigurable and Embedded Systems. 2022;11(1):49-55

32. Alshayeji MH, Ellethy H, Abed S, Gupta R. Computer-aided detection of breast cancer on the Wisconsin dataset: An artificial neural networks approach. Biomedical Signal Processing and Control. 2022;71:103141

33. Yusuf A, Dima R, Aina S. Optimized breast cancer classification using feature selection and outliers detection. Journal of the Nigerian Society of Physical Sciences. 2021;3(4):298-307

34. Hasan R, Shafi A. Feature selection based breast cancer prediction. International Journal of Image, Graphics and Signal Processing. 2021;13(2):13-21

35. Ibrahim S, Nazir S, Velastin SA. Feature selection using correlation analysis and principal component analysis for accurate breast cancer diagnosis. Journal of Imaging. 2021;7(11):225

36. Huang Z, Chen D. A breast cancer diagnosis method based on VIM feature selection and hierarchical clustering random forest algorithm. IEEE Access. 2021;10:3284-3293

37. Mushtaq Z, Yaqub A, Sani S, Khalid A. Effective k-nearest neighbor classifications for Wisconsin breast cancer datasets. Journal of the Chinese Institute of Engineers. 2020;43(1):80-92

38. Ibrahim MM, Salem DA, Seoud RAAAA. Deep learning hybrid with binary dragonfly feature selection for the Wisconsin breast cancer dataset. International Journal of Advanced Computer Science and Applications. 2021;12(3):74-82

39. Akkur E, Türk F, Eroğul O. Breast cancer classification using a novel hybrid feature selection approach. Neural Network World. 2023;33(2):77-94

40. Krishnan VG, Saradhi MV, Deepa J, Priya KH, Selvaraj D, Divya V. An ensemble filter-based feature selection with deep learning classification for breast cancer prediction using IoT. Journal of Ambient Intelligence and Humanized Computing. 2022;13(7):3239-3254

41. Al Tawil A, Almazaydeh L, Alqudah B, Abualkishik AZ, Alwan AA. Predictive modeling for breast cancer based on machine learning algorithms and feature selection methods. International Journal of Electrical and Computer Engineering. 2024;14(2):1103-1111

42. Hossin MM, Shamrat FJM, Bhuiyan MR, Hira RA, Khan T, Molla S. Breast cancer detection: an effective comparison of different machine learning algorithms on the Wisconsin dataset. Bulletin of Electrical Engineering and Informatics. 2023;12(4):2446-2456

43. Sundar R, Srinivasulu C, Anusha MB, Brahmaiah M, Srikanth T, Gupta KG. Enhancing breast cancer detection from histopathology images: A novel ensemble approach with deep learning-based feature extraction. MATEC Web of Conferences. 2024;392:01139

44. Veena S, Aravindhar DJ. Detection of breast cancer using support vector machine algorithm with fine-tuning and optimization. African Journal of Biomedical Research. 2024;27(3):2256-2261

45. Youssef D, Atef H, Gamal S, El-Azab J, Ismail T. Early breast cancer prediction using thermal images and hybrid feature extraction-based system. IEEE Access. 2025;13:1-10

46. Das SC, Tasnim W, Rana HK, Acharjee UK, Islam MM, Khatun R. Comprehensive bioinformatics and machine learning analyses for breast cancer staging using TCGA dataset. Briefings in Bioinformatics. 2025;26(1):628

47. Jagetiya A, Dadhech P. Optimizing breast cancer prognosis with machine learning for enhanced clinical decision-making. Bio-Algorithms and Med-Systems. 2024;20(1):37-48

48. Wuniri Q, Huangfu W, Liu Y, Lin X, Liu L, Yu Z. A generic-driven wrapper embedded with feature-type-aware hybrid Bayesian classifier for breast cancer classification. IEEE Access. 2019;7:119931-119942

49. Ali SH, Shehata M. A new breast cancer discovery strategy: A combined outlier rejection technique and an ensemble classification method. Bioengineering. 2024;11(11):1148

50. Dehghan MJ, Azizi A. A hybrid intelligent approach to breast cancer diagnosis and treatment using grey wolf optimization algorithm. Jundishapur Journal of Natural Pharmaceutical Products. 2024;18(4):e1345

51. Aamir S, Rahim A, Aamir Z, Abbasi SF, Khan MS, Alhaisoni M. et al. Predicting breast cancer leveraging supervised machine learning techniques. Computational and Mathematical Methods in Medicine. 2022;2022(1):5869529

52. Asare M. Evaluating feature selection methods in machine learning with class imbalance. Master's Thesis, The University of Texas Rio Grande Valley. 2024

53. Chen H, Mei K, Zhou Y, Wang N, Cai G. Auxiliary diagnosis of breast cancer based on machine learning and hybrid strategy. IEEE Access. 2023;11:96374-96386

54. Haq AU, Li JP, Saboor A, Khan J, Wali S, Ahmad S. et al. Detection of breast cancer through clinical data using supervised and unsupervised feature selection techniques. IEEE Access. 2021;9:22090-22105

55. Batool A, Byun YC. Toward improving breast cancer classification using an adaptive voting ensemble learning algorithm. IEEE Access. 2024;12:12869-12882

56. Rastogi M, Vijarania M, Goel N, Agrawal A, Biamba CN, Iwendi C. Conv1d-LSTM: Autonomous breast cancer detection using a one-dimensional convolutional neural network with long short-term memory. IEEE Access. 2024;12:104221

57. Naseem U, Rashid J, Ali L, Kim J, Haq QEU, Awan MJ. et al. An automatic detection of breast cancer diagnosis and prognosis based on machine learning using ensemble of classifiers. IEEE Access. 2022;10:78242-78252

58. Sharmin S, Ahammad T, Talukder MA, Ghose P. A hybrid dependable deep feature extraction and ensemble-based machine learning approach for breast cancer detection. IEEE Access. 2023;11:87694-87708

59. Zheng J, Lin D, Gao Z, Wang S, He M, Fan J. Deep learning assisted efficient AdaBoost algorithm for breast cancer detection and early diagnosis. IEEE Access. 2020;8:96946-96954

60. Rahman MA, Hamada M, Sharmin S, Rimi TA, Talukder AS, Imran N. et al. Enhancing early breast cancer detection through advanced data analysis. IEEE Access. 2024;12:104569

61. Jebarani PE, Umadevi N, Dang H, Pomplun M. A novel hybrid K-means and GMM machine learning model for breast cancer detection. IEEE Access. 2021;9:146153-146162

62. Routray N, Rout SK, Sahu B, Panda SK, Godavarthi D. Ensemble learning with symbiotic organism search optimization algorithm for breast cancer classification and risk identification of other organs on histopathological images. IEEE Access. 2023;11:110544-110557

63. Elkorany AS, Marey M, Almustafa KM, Elsharkawy ZF. Breast cancer diagnosis using support vector machines optimized by whale optimization and dragonfly algorithms. IEEE Access. 2022;10:69688-69699

64. Reshan MSA, Amin S, Zeb MA, Sulaiman A, Alshahrani H, Azar AT. et al. Enhancing breast cancer detection and classification using advanced multi-model features and ensemble machine learning techniques. Life. 2023;13(10):2093

65. Chen H, Wang N, Zhou Y, Mei K, Tang M, Cai G. Breast cancer prediction based on differential privacy and logistic regression optimization model. Applied Sciences. 2023;13(19):10755

66. Ak MF. A comparative analysis of breast cancer detection and diagnosis using data visualization and machine learning applications. Healthcare. 2020;8:111

67. Khalid A, Mehmood A, Alabrah A, Alkhamees BF, Amin F, AlSalman H. et al. Breast cancer detection and prevention using machine learning. Diagnostics. 2023;13(19):3113

68. Avci H, Karakaya J. A novel medical image enhancement algorithm for breast cancer detection on mammography images using machine learning. Diagnostics. 2023;13(3):348

69. Rasool A, Bunterngchit C, Tiejian L, Islam MR, Qu Q, Jiang Q. Improved machine learning-based predictive models for breast cancer diagnosis. International Journal of Environmental Research and Public Health. 2022;19(6):3211

70. Sureshkumar V, Prasad RSN, Balasubramaniam S, Jagannathan D, Daniel J, Dhanasekaran S. Breast cancer detection and analytics using hybrid CNN and extreme learning machine. Journal of Personalized Medicine. 2024;14(8):792

71. Nahid AA, Raihan MJ, Bulbul AAM. Breast cancer classification along with feature prioritization using machine learning algorithms. Health and Technology. 2022;12(6):1061-1069

72. Botlagunta M, Botlagunta MD, Myneni MB, Lakshmi D, Nayyar A, Gullapalli JS. et al. Classification and diagnostic prediction of breast cancer metastasis on clinical data using machine learning algorithms. Scientific Reports. 2023;13(1):485

73. Mohammed SA, Darrab S, Noaman SA, Saake G. Analysis of breast cancer detection using different machine learning techniques. Lecture Notes in Computer Science - Data Mining and Big Data. 2020;5:108-117

74. El Rahman SA. Predicting breast cancer survivability based on machine learning and features selection algorithms: a comparative study. Journal of Ambient Intelligence and Humanized Computing. 2021;12(8):8585-8623

75. Ajlan I, Murad H, Salim A, Yousif A. Extreme learning machine algorithm for breast cancer diagnosis. Multimedia Tools and Applications. 2024;83:1-20

76. Taghizadeh E, Heydarheydari S, Saberi A, JafarpoorNesheli S, Rezaeijo SM. Breast cancer prediction with transcriptome profiling using feature selection and machine learning methods. BMC Bioinformatics. 2022;23(1):410

77. Almarri B, Gupta G, Kumar R, Vandana V, Asiri F, Khan SB. The BCPM method: decoding breast cancer with machine learning. BMC Medical Imaging. 2024;24(1):248

78. Alanazi SA, Kamruzzaman M, Islam Sarker MN, Alruwaili M, Alhwaiti Y, Alshammari N. et al. Boosting breast cancer detection using convolutional neural network. Journal of Healthcare Engineering. 2021;2021:5528622

79. Chen H, Wang N, Du X, Mei K, Zhou Y, Cai G. Classification prediction of breast cancer based on machine learning. Computational Intelligence and Neuroscience. 2023;2023:6530719

80. Mohammed SA, Abeysinghe SD, Ralescu AL. Feature selection and comparative analysis of breast cancer prediction using clinical data and histopathological whole slide images. Advances in Artificial Intelligence and Machine Learning. 2023;3(3):1494-1525

81. Mashudi NA, Rossli SA, Ahmad N, Noor NM. Breast cancer classification: features investigation using machine learning approaches. International Journal of Integrated Engineering. 2021;13(5):107-118

82. Talatian Azad S, Ahmadi G, Rezaeipanah A. An intelligent ensemble classification method based on multi-layer perceptron neural network and evolutionary algorithms for breast cancer diagnosis. Journal of Experimental & Theoretical Artificial Intelligence. 2022;34(6):949-969

83. Hardani D, Nugroho H. Feature selection using rough set theory algorithm for breast cancer diagnosis. IOP Conference Series: Materials Science and Engineering. 2020;771:012017

84. Dada EG, Oyewola DO, Misra S. Computer-aided diagnosis of breast cancer from mammogram images using deep learning algorithms. Journal of Electrical Systems and Information Technology. 2024;11(1):38

85. Singh K, Shastri S, Kumar S, Mansotra V. BC-Net: Early diagnostics of breast cancer using nested ensemble technique of machine learning. Automatic Control and Computer Sciences. 2023;57(6):646-659

86. Davoudi K, Thulasiraman P. Evolving convolutional neural network parameters through the genetic algorithm for the breast cancer classification problem. Simulation. 2021;97(8):511-527

Author contact

Corresponding authors: E-mail(s): nmtahirac.kr; imranashrafac.kr.



This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/). See https://ivyspring.com/terms for full terms and conditions.