Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 2054

Search results for: memetic algorithms

1454 Customer Churn Prediction by Using Four Machine Learning Algorithms Integrating Features Selection and Normalization in the Telecom Sector

Authors: Alanoud Moraya Aldalan, Abdulaziz Almaleh

Abstract:

A crucial component of maintaining a customer-oriented business, as in the telecom industry, is understanding the reasons and factors that lead to customer churn. Competition between telecom companies has greatly increased in recent years. It has become more important to understand customers’ needs in this highly competitive market, especially for those who are looking to switch service providers. Churn prediction is therefore now a mandatory requirement for retaining those customers, and machine learning can be used to accomplish it. Churn prediction has become a very important machine learning classification topic in the telecommunications industry. Understanding the factors of customer churn and how they behave is very important to building an effective churn prediction model. This paper aims to predict churn and identify the factors of customers’ churn based on their past service usage history. To this end, the study makes use of feature selection, normalization, and feature engineering. The study then compares the performance of four machine learning algorithms on the Orange dataset: Logistic Regression, Random Forest, Decision Tree, and Gradient Boosting. Performance was evaluated using the F1 score and ROC-AUC. Compared with existing models, the approach in this study produced better results. Gradient Boosting with feature selection performed best, achieving a 99% F1-score and 99% AUC, and all other experiments achieved good results as well.
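The two evaluation metrics named in this abstract can be illustrated with a minimal pure-Python sketch. The labels, scores, and 0.5 threshold below are made-up illustration data, not values from the paper, and the function names are hypothetical helpers rather than the authors' code.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def roc_auc(y_true, scores):
    """ROC-AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical churn labels (1 = churned) and model scores, for illustration only.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.2, 0.6, 0.4, 0.5, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed decision threshold of 0.5

f1 = f1_score(y_true, y_pred)
auc = roc_auc(y_true, scores)
```

F1 depends on the chosen threshold, whereas ROC-AUC depends only on how the scores rank churners against non-churners, which is one reason the two are often reported together.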

Keywords: machine learning, gradient boosting, logistic regression, churn, random forest, decision tree, ROC, AUC, F1-score

Procedia PDF Downloads 134

1453 The Benefits of End-To-End Integrated Planning from the Mine to Client Supply for Minimizing Penalties

Authors: G. Martino, F. Silva, E. Marchal

Abstract:

Control over the characteristics of the delivered iron ore blend is one of the most important aspects of the mining business. The iron ore price is a function of its composition, which is the outcome of the beneficiation process. End-to-end integrated planning of mine operations can therefore reduce the risk of penalties on the iron ore price. In a standard iron mining company, the production chain is composed of mining, ore beneficiation, and client supply. When mine planning and client supply decisions are made without coordination, the beneficiation plant struggles to deliver the best blend possible. Technological improvements in several fields have allowed bridging the gap between departments and boosting integrated decision-making processes. Clusterization and classification algorithms applied to historical production data generate reasonable forecasts of the quality and volume of iron ore produced for each pile of run-of-mine (ROM) processed. Mathematical modeling can use those deterministic relations to propose iron ore blends that better fit specifications within a delivery schedule. Additionally, a model capable of representing the whole production chain can clearly compare the overall impact of different decisions in the process. This study shows how flexibilization combined with a planning optimization model between the mine and the ore beneficiation processes can reduce the risk of out-of-specification deliveries. The model capabilities are illustrated on a hypothetical iron ore mine with a magnetic separation process. Finally, this study shows ways of reducing cost or increasing profit by optimizing process indicators across the production chain and integrating the different plans with the sales decisions.

Keywords: clusterization and classification algorithms, integrated planning, mathematical modeling, optimization, penalty minimization

Procedia PDF Downloads 124

1452 Peril´s Environment of Energetic Infrastructure Complex System, Modelling by the Crisis Situation Algorithms

Authors: Jiří F. Urbánek, Alena Oulehlová, Hana Malachová, Jiří J. Urbánek Jr.

Abstract:

The investigation and modelling of crisis situations are introduced and carried out within the complex system of energetic critical infrastructure operating in peril environments. Every crisis situation and peril originates in the occurrence of an emergency/crisis event, and each requires an assessment of critical/crisis interfaces. Emergency events may be expected, in which case the pertinent organizational crisis management authorities can prepare crisis scenarios in advance for coping with them, or unexpected, with no scenario prepared in advance. Both kinds nevertheless require operational coping by means of crisis management. The operation, forms, characteristics, behaviour and utilization of crisis management vary in quality, depending on the perils facing the real critical infrastructure organization and on its prevention training processes. The aim is always better security and continuity of the organization; achieving it requires finding and investigating critical/crisis zones and functions in models of the critical infrastructure organization operating in the pertinent peril environment. Our DYVELOP (Dynamic Vector Logistics of Processes) method is available for this purpose. It requires deriving and creating an algorithm for identifying critical/crisis interfaces. The locations of critical/crisis interfaces flag crisis situations in models of the critical infrastructure organization. The model of the crisis situation is then demonstrated on a real Czech energetic crisis infrastructure organization in a real peril environment. Such efficient measures are necessary for infrastructure protection. They are derived for peril mitigation, for coping with crisis situations, and for the environmentally friendly survival, continuity and sustainable development of the organization.

Keywords: algorithms, energetic infrastructure complex system, modelling, peril´s environment

Procedia PDF Downloads 403

1451 Finding a Set of Long Common Substrings with Repeats from m Input Strings

Authors: Tiantian Li, Lusheng Wang, Zhaohui Zhan, Daming Zhu

Abstract:

In this paper, we propose two string problems and study algorithms and the complexity of various versions of those problems. Let S = {s₁, s₂, . . . , sₘ} be a set of m strings. A common substring of S is a substring appearing in every string in S. Given a set of m strings S = {s₁, s₂, . . . , sₘ} and a positive integer k, we want to find a set C of k common substrings of S such that the k common substrings in C appear in the same order and have no overlap among the m input strings in S, and the total length of the k common substrings in C is maximized. This problem is referred to as the longest total length of k common substrings from m input strings (LCSS(k, m) for short). The other problem we study here is called the longest total length of a set of common substrings with length more than l from m input strings (LSCSS(l, m) for short). Given a set of m strings S = {s₁, s₂, . . . , sₘ} and a positive integer l, for LSCSS(l, m), we want to find a set of common substrings of S, each of length more than l, such that the total length of all the common substrings is maximized. We show that both problems are NP-hard when k and m are variables. We propose dynamic programming algorithms with time complexity O(k n₁n₂) and O(n₁n₂) to solve LCSS(k, 2) and LSCSS(l, 2), respectively, where n₁ and n₂ are the lengths of the two input strings. We then design an algorithm for LSCSS(l, m) when every length > l common substring appears once in each of the m − 1 input strings. The running time is O(n₁²m), where n₁ is the length of the input string with no restriction on length > l common substrings. Finally, we propose a fixed parameter algorithm for LSCSS(l, m), where each length > l common substring appears m − 1 + c times among the m − 1 input strings (other than s₁). In other words, each length > l common substring may repeatedly appear at most c times among the m − 1 input strings {s₂, s₃, . . . , sₘ}.
The running time of the proposed algorithm is O((n₁2ᶜ)²m), where n₁ is the length of the input string with no restriction on repeats. The LSCSS(l, m) problem is proposed to handle whole chromosome sequence alignment for different strains of the same species, where more than 98% of letters in core regions are identical.
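For the two-string case LSCSS(l, 2), one possible dynamic program running in O(n₁n₂) time can be sketched as follows. This is an illustrative recurrence, not necessarily the authors' algorithm. It assumes that the chosen substrings must appear in the same order and without overlap in both strings (as stated for LCSS), and the function name `lscss2` and the table names are hypothetical.

```python
def lscss2(a: str, b: str, l: int) -> int:
    """Max total length of ordered, non-overlapping common substrings, each longer than l."""
    n1, n2 = len(a), len(b)
    neg_inf = float("-inf")
    # com[i][j]: length of the longest common suffix of a[:i] and b[:j]
    com = [[0] * (n2 + 1) for _ in range(n1 + 1)]
    # blk[i][j]: best total when a chosen substring ends exactly at a[i-1] and b[j-1]
    blk = [[neg_inf] * (n2 + 1) for _ in range(n1 + 1)]
    # dp[i][j]: best total using only the prefixes a[:i] and b[:j]
    dp = [[0] * (n2 + 1) for _ in range(n1 + 1)]

    for i in range(1, n1 + 1):
        for j in range(1, n2 + 1):
            if a[i - 1] == b[j - 1]:
                com[i][j] = com[i - 1][j - 1] + 1
            if com[i][j] > l:
                # Open a new substring of the minimum admissible length l + 1 ...
                start_new = dp[i - l - 1][j - l - 1] + l + 1
                # ... or extend by one letter a substring that ended at (i-1, j-1).
                extend = blk[i - 1][j - 1] + 1
                blk[i][j] = max(start_new, extend)
            dp[i][j] = max(dp[i - 1][j], dp[i][j - 1], blk[i][j])
    return dp[n1][n2]


# Two common blocks "abcd" and "efgh", separated by a mismatching letter.
total = lscss2("abcdXefgh", "abcdYefgh", 2)
```

Each table cell is filled in constant time, which gives the O(n₁n₂) bound for this sketch; tracking `blk` separately avoids looping over every admissible substring length at each cell.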

Keywords: dynamic programming, algorithm, common substrings, string

Procedia PDF Downloads 22

1450 Numerical Iteration Method to Find New Formulas for Nonlinear Equations

Authors: Kholod Mohammad Abualnaja

Abstract:

A new algorithm is presented to find some new iterative methods for solving nonlinear equations F(x)=0 by using the variational iteration method. The efficiency of the considered method is illustrated by example. The results show that the proposed iteration technique, without linearization or small perturbation, is very effective and convenient.

Keywords: variational iteration method, nonlinear equations, Lagrange multiplier, algorithms

Procedia PDF Downloads 545

1449 Development of Academic Software for Medial Axis Determination of Porous Media from High-Resolution X-Ray Microtomography Data

Authors: S. Jurado, E. Pazmino

Abstract:

Determination of the medial axis of a porous media sample is a non-trivial problem of interest for several disciplines, e.g., hydrology, fluid dynamics, contaminant transport, filtration, and oil extraction. However, the computational tools available to researchers are limited and restricted. The primary aim of this work was to develop a series of algorithms to extract porosity, medial axis structure, and pore-throat size distributions from porous media domains. A complementary objective was to provide the algorithms as free computational software available to the academic community, comprising researchers and students interested in 3D data processing. The burn algorithm was tested on porous media data obtained from High-Resolution X-Ray Microtomography (HRXMT) and on idealized computer-generated domains. The real data and idealized domains were discretized into voxel domains of 550³ elements and binarized to denote solid and void regions in order to determine porosity. Subsequently, the algorithm identifies the layer of void voxels next to the solid boundaries. An iterative process removes or 'burns' void voxels layer by layer until all the void space is characterized. Multiple strategies were tested to optimize execution time and memory use, i.e., segmentation of the overall domain into subdomains, vectorization of operations, and extraction of single burn layer data during the iterative process. The medial axis was determined by identifying regions where burnt layers collide. The final medial axis structure was refined to avoid concave-grain effects and used to determine the pore-throat size distribution. Graphical user interface software was developed to encompass all these algorithms, including the generation of idealized porous media domains. The software accepts HRXMT data as input to calculate porosity, medial axis, and pore-throat size distribution, and provides output in tabular and graphical formats.
Preliminary tests of the software developed during this study achieved medial axis, pore-throat size distribution, and porosity determination of 100³, 320³, and 550³ voxel porous media domains in 2, 22, and 45 minutes, respectively, on a personal computer (Intel i7 processor, 16 GB RAM). These results indicate that the software is a practical and accessible tool for postprocessing HRXMT data for the academic community.
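The layer-by-layer burn described in this abstract can be illustrated on a small 2D binary grid. The sketch below is a simplified stand-in, not the authors' software: it assumes 4-connectivity, a tiny hand-made domain instead of HRXMT voxels, and a hypothetical function name `burn_layers`.

```python
from collections import deque


def burn_layers(grid):
    """Return (porosity, burn) for a binary grid where 1 = solid and 0 = void.

    burn[r][c] is 0 for solid cells, the burn layer number for void cells
    (1 = touching solid), and None for void cells no solid cell can reach.
    """
    rows, cols = len(grid), len(grid[0])
    void_count = sum(1 for row in grid for cell in row if cell == 0)
    porosity = void_count / (rows * cols)

    burn = [[0 if grid[r][c] == 1 else None for c in range(cols)] for r in range(rows)]
    # Multi-source breadth-first search: every solid cell starts the burn at once.
    queue = deque((r, c) for r in range(rows) for c in range(cols) if grid[r][c] == 1)
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and burn[nr][nc] is None:
                burn[nr][nc] = burn[r][c] + 1  # next layer inward
                queue.append((nr, nc))
    return porosity, burn


# Hypothetical domain: a solid frame enclosing a 3 x 3 pore.
domain = [
    [1, 1, 1, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 1, 1, 1],
]
porosity, burn = burn_layers(domain)
```

In this toy domain the centre cell burns last, so it is where the fronts advancing from opposite walls meet. Cells with locally maximal burn numbers are one simple way to flag medial axis candidates, though the refinement steps mentioned in the abstract are not reproduced here.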

Keywords: medial axis, pore-throat distribution, porosity, porous media

Procedia PDF Downloads 116

1448 A Graph Library Development Based on the Service-Oriented Architecture: Used for Representation of the Biological Systems in the Computer Algorithms

Authors: Mehrshad Khosraviani, Sepehr Najjarpour

Abstract:

Considering the usage of graph-based approaches in systems and synthetic biology, and the various types of graphs they employ, a comprehensive graph library based on the three-tier architecture (3TA) was previously introduced for full representation of biological systems. Despite having proposed a 3TA-based graph library, the following three reasons motivated us to redesign the graph library based on the service-oriented architecture (SOA): (1) Maintaining the accuracy of the data related to an input graph (including its edges, its vertices, its topology, etc.) without involving the end user: With 3TA, the library files are available to the end users, so they may be used incorrectly, and consequently invalid graph data may be provided to the computer algorithms. With SOA, however, graph registration is specified as a service that encapsulates the library files. In other words, all the control operations needed for registering valid data are the responsibility of the services. (2) Partitioning of the library product into different parts: With 3TA, the library was provided as a single whole product. Here, the product can be divided into smaller ones, such as an AND/OR graph drawing service, and each one can be provided individually. As a result, the end user can select any parts of the library product, instead of all features, to add to a project. (3) Reduction of complexity: With 3TA, several other libraries must be added to connect to the database, whereas in the SOA-based graph library the services themselves are responsible for providing the needed library resources. Therefore, the end user who wants to use the graph library is not exposed to its complexity.
In the end, in order to make the library easier to control in the system, and to restrict the end user from accessing the files, the service-oriented architecture (SOA) was preferred over the three-tier architecture (3TA), and the previously proposed graph library was redeveloped based on it.

Keywords: Bio-Design Automation, Biological System, Graph Library, Service-Oriented Architecture, Systems and Synthetic Biology

Procedia PDF Downloads 311

1447 Machine Learning in Patent Law: How Genetic Breeding Algorithms Challenge Modern Patent Law Regimes

Authors: Stefan Papastefanou

Abstract:

Artificial intelligence (AI) is an interdisciplinary field of computer science with the aim of creating intelligent machine behavior. Early approaches to AI have been configured to operate in very constrained environments where the behavior of the AI system was previously determined by formal rules. Knowledge was presented as a set of rules that allowed the AI system to determine the results for specific problems; as a structure of if-else rules that could be traversed to find a solution to a particular problem or question. However, such rule-based systems typically have not been able to generalize beyond the knowledge provided. All over the world and especially in IT-heavy industries such as the United States, the European Union, Singapore, and China, machine learning has developed to be an immense asset, and its applications are becoming more and more significant. It has to be examined how such products of machine learning models can and should be protected by IP law and for the purpose of this paper patent law specifically, since it is the IP law regime closest to technical inventions and computing methods in technical applications. Genetic breeding models are currently less popular than recursive neural network method and deep learning, but this approach can be more easily described by referring to the evolution of natural organisms, and with increasing computational power; the genetic breeding method as a subset of the evolutionary algorithms models is expected to be regaining popularity. The research method focuses on patentability (according to the world’s most significant patent law regimes such as China, Singapore, the European Union, and the United States) of AI inventions and machine learning. Questions of the technical nature of the problem to be solved, the inventive step as such, and the question of the state of the art and the associated obviousness of the solution arise in the current patenting processes. 
Most importantly, the key focus of this paper is the problem of patenting inventions that are themselves developed through machine learning. Under the current legal situation in most patent law regimes, the inventor named in a patent application must be a natural person or a group of persons. In order to be considered an 'inventor', a person must actually have developed part of the inventive concept. The mere application of machine learning or an AI algorithm to a particular problem should not be construed as the algorithm contributing to a part of the inventive concept. However, when machine learning or the AI algorithm has contributed to a part of the inventive concept, there is currently a lack of clarity regarding the ownership of artificially created inventions. Since not only all European patent law regimes but also the Chinese and Singaporean patent law approaches include identical terms, this paper ultimately offers a comparative analysis of the most relevant patent law regimes.

Keywords: algorithms, inventor, genetic breeding models, machine learning, patentability

Procedia PDF Downloads 109

1446 Predicting Football Player Performance: Integrating Data Visualization and Machine Learning

Authors: Saahith M. S., Sivakami R.

Abstract:

In the realm of football analytics, particularly focusing on predicting football player performance, the ability to forecast player success accurately is of paramount importance for teams, managers, and fans. This study introduces an elaborate examination of predicting football player performance through the integration of data visualization methods and machine learning algorithms. The research entails the compilation of an extensive dataset comprising player attributes, conducting data preprocessing, feature selection, model selection, and model training to construct predictive models. The analysis within this study will involve delving into feature significance using methodologies like Select Best and Recursive Feature Elimination (RFE) to pinpoint pertinent attributes for predicting player performance. Various machine learning algorithms, including Random Forest, Decision Tree, Linear Regression, Support Vector Regression (SVR), and Artificial Neural Networks (ANN), will be explored to develop predictive models. The evaluation of each model's performance utilizing metrics such as Mean Squared Error (MSE) and R-squared will be executed to gauge their efficacy in predicting player performance. Furthermore, this investigation will encompass a top player analysis to recognize the top-performing players based on the anticipated overall performance scores. Nationality analysis will entail scrutinizing the player distribution based on nationality and investigating potential correlations between nationality and player performance. Positional analysis will concentrate on examining the player distribution across various positions and assessing the average performance of players in each position. Age analysis will evaluate the influence of age on player performance and identify any discernible trends or patterns associated with player age groups. 
The primary objective is to predict a football player's overall performance accurately based on their individual attributes, leveraging data-driven insights to enrich the comprehension of player success on the field. By amalgamating data visualization and machine learning methodologies, the aim is to furnish valuable tools for teams, managers, and fans to effectively analyze and forecast player performance. This research contributes to the progression of sports analytics by showcasing the potential of machine learning in predicting football player performance and offering actionable insights for diverse stakeholders in the football industry.
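The two regression metrics named in this abstract, Mean Squared Error and R-squared, can be written out in a few lines of pure Python. The ratings below are invented illustration values, not data from the study, and the function names are hypothetical.

```python
def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared prediction errors."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """R-squared: 1 minus residual sum of squares over total sum of squares."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Hypothetical overall ratings for three players and a model's predictions.
actual = [70, 80, 90]
predicted = [72, 78, 91]

error = mse(actual, predicted)
fit = r_squared(actual, predicted)
```

MSE is expressed in squared rating units, while R-squared is unitless and compares the model against always predicting the mean rating, so the two answer slightly different questions about model quality.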

Keywords: football analytics, player performance prediction, data visualization, machine learning algorithms, random forest, decision tree, linear regression, support vector regression, artificial neural networks, model evaluation, top player analysis, nationality analysis, positional analysis

Procedia PDF Downloads 39

1445 An Energy-Balanced Clustering Method on Wireless Sensor Networks

Authors: Yu-Ting Tsai, Chiun-Chieh Hsu, Yu-Chun Chu

Abstract:

In recent years, due to the development of wireless network technology, many researchers have devoted themselves to the study of wireless sensor networks. Applications of wireless sensor networks mainly use the sensor nodes to collect the required information and send it back to the users. Since the sensed area is difficult to reach, there are many restrictions on the design of the sensor nodes, the most important of which is their limited energy. Because of this limited energy, researchers have proposed a number of ways to reduce energy consumption and balance the load of sensor nodes in order to increase the network lifetime. In this paper, we propose the Energy-Balanced Clustering method with Auxiliary Members on Wireless Sensor Networks (EBCAM), based on cluster routing. The main purpose is to balance the energy consumption over the sensed area and even out the distribution of dead nodes, in order to avoid excessive energy consumption caused by increasing transmission distance. In addition, we use the residual energy and average energy consumption of the nodes within the cluster to choose the cluster heads, use multi-hop transmission to deliver the data, and dynamically adjust the transmission radius according to the load conditions. We also use the auxiliary cluster members to change the delivery path according to the residual energy of the cluster head in order to reduce its load. Finally, we compare the proposed method with related algorithms via simulated experiments and analyze the results. The results reveal that the proposed method outperforms the other algorithms in the number of rounds used and the average energy consumption.

Keywords: auxiliary nodes, cluster, load balance, routing algorithm, wireless sensor network

Procedia PDF Downloads 275

1444 An Explanatory Study Approach Using Artificial Intelligence to Forecast Solar Energy Outcome

Authors: Agada N. Ihuoma, Nagata Yasunori

Abstract:

Artificial intelligence (AI) techniques play a crucial role in predicting the expected energy outcome and its performance, and in the analysis, modeling, and control of renewable energy. Renewable energy is becoming more popular for economic and environmental reasons. In the face of global energy consumption and the increasing depletion of most fossil fuels, the world faces the challenge of meeting ever-increasing energy demands. Therefore, incorporating artificial intelligence to predict solar radiation outcomes from intermittent sunlight is crucial to balance the supply and demand of energy on loads, predict the performance and outcome of solar energy, enhance production planning and energy management, and ensure proper sizing of parameters when generating clean energy. However, one of the major problems of forecasting is that the algorithms used to control, model, and predict the performance of energy systems are complicated and involve large computing power, differential equations, and time series. Also, unreliable (poor-quality) solar radiation data for a geographical location, as well as insufficiently long series, can be a bottleneck. To overcome these problems, this study employs Anaconda Navigator (Jupyter Notebook) for machine learning, which can combine larger amounts of data with fast, iterative processing and intelligent algorithms, allowing the software to learn automatically from patterns or features to predict the performance and outcome of solar energy. This in turn enables the balance of supply and demand on loads and enhances production planning and energy management.

Keywords: artificial intelligence, backward elimination, linear regression, solar energy

Procedia PDF Downloads 158

1443 Multiscale Hub: An Open-Source Framework for Practical Atomistic-To-Continuum Coupling

Authors: Masoud Safdari, Jacob Fish

Abstract:

Despite the vast amount of existing theoretical knowledge, the implementation of a universal multiscale modeling, analysis, and simulation software framework remains challenging. Existing multiscale software and solutions are often domain-specific and closed-source, and they demand a high level of experience and skill in both multiscale analysis and programming. Furthermore, existing tools for Atomistic-to-Continuum (AtC) multiscaling are developed under assumptions such as users having access to high-performance computing facilities. These issues, along with many other challenges, have reduced the adoption of multiscale methods in academia and especially in industry. In the current work, we introduce Multiscale Hub (MsHub), an effort towards making AtC more accessible through cloud services. As a joint effort between academia and industry, MsHub provides a universal web-enabled framework for practical multiscaling. Developed on top of the universally acclaimed scientific programming language Python, the package currently provides an open-source, comprehensive, easy-to-use framework for AtC coupling. MsHub offers an easy-to-use interface to prominent molecular dynamics and multiphysics continuum mechanics packages such as LAMMPS and MFEM (a free, lightweight, scalable C++ library for finite element methods). In this work, we first report on the design philosophy of MsHub, the challenges identified, and the issues faced in its implementation. MsHub takes advantage of a comprehensive set of tools and algorithms developed for AtC that can be used for a variety of governing physics. We then briefly report the key AtC algorithms implemented in MsHub. Finally, we conclude with a few examples illustrating the capabilities of the package and its future directions.

Keywords: atomistic, continuum, coupling, multiscale

Procedia PDF Downloads 177

1442 Artificial Law: Legal AI Systems and the Need to Satisfy Principles of Justice, Equality and the Protection of Human Rights

Authors: Begum Koru, Isik Aybay, Demet Celik Ulusoy

Abstract:

The discipline of law is quite complex and has its own terminology. Apart from written legal rules, there is also living law, which refers to legal practice. Basic legal rules aim at the happiness of individuals in social life and have different characteristics in different branches such as public or private law. On the other hand, law is a national phenomenon. The law of one nation and the legal system applied on the territory of another nation may be completely different. People who are experts in a particular field of law in one country may have insufficient expertise in the law of another country. Today, in addition to the local nature of law, international and even supranational law rules are applied in order to protect basic human values and ensure the protection of human rights around the world. Systems that offer algorithmic solutions to legal problems using artificial intelligence (AI) tools will perhaps serve to produce very meaningful results in terms of human rights. However, algorithms to be used should not be developed by only computer experts, but also need the contribution of people who are familiar with law, values, judicial decisions, and even the social and political culture of the society to which it will provide solutions. Otherwise, even if the algorithm works perfectly, it may not be compatible with the values of the society in which it is applied. The latest developments involving the use of AI techniques in legal systems indicate that artificial law will emerge as a new field in the discipline of law. More AI systems are already being applied in the field of law, with examples such as predicting judicial decisions, text summarization, decision support systems, and classification of documents. Algorithms for legal systems employing AI tools, especially in the field of prediction of judicial decisions and decision support systems, have the capacity to create automatic decisions instead of judges. 
When the judge is removed from this equation, artificial intelligence-made law created by an intelligent algorithm on its own emerges, whether the domain is national or international law. In this work, the aim is to make a general analysis of this new topic. Such an analysis needs both a literature survey and a perspective from computer experts' and lawyers' point of view. In some societies, the use of prediction or decision support systems may be useful to integrate international human rights safeguards. In this case, artificial law can serve to produce more comprehensive and human rights-protective results than written or living law. In non-democratic countries, it may even be thought that direct decisions and artificial intelligence-made law would be more protective instead of a decision "support" system. Since the values of law are directed towards "human happiness or well-being", it requires that the AI algorithms should always be capable of serving this purpose and based on the rule of law, the principle of justice and equality, and the protection of human rights.

Keywords: AI and law, artificial law, protection of human rights, AI tools for legal systems

Procedia PDF Downloads 76

1441 Signs, Signals and Syndromes: Algorithmic Surveillance and Global Health Security in the 21st Century

Authors: Stephen L. Roberts

Abstract:

This article offers a critical analysis of the rise of syndromic surveillance systems for the advanced detection of pandemic threats within contemporary global health security frameworks. The article traces the iterative evolution and ascendancy of three such novel syndromic surveillance systems for the strengthening of health security initiatives over the past two decades: 1) The Program for Monitoring Emerging Diseases (ProMED-mail); 2) The Global Public Health Intelligence Network (GPHIN); and 3) HealthMap. This article demonstrates how each newly introduced syndromic surveillance system has become increasingly oriented towards the integration of digital algorithms into core surveillance capacities to continually harness and forecast upon infinitely generating sets of digital, open-source data, potentially indicative of forthcoming pandemic threats. This article argues that the increased centrality of the algorithm within these next-generation syndromic surveillance systems produces a new and distinct form of infectious disease surveillance for the governing of emergent pathogenic contingencies. Conceptually, the article also shows how the rise of this algorithmic mode of infectious disease surveillance produces divergences in the governmental rationalities of global health security, leading to the rise of an algorithmic governmentality within contemporary contexts of Big Data and these surveillance systems. Empirically, this article demonstrates how this new form of algorithmic infectious disease surveillance has been rapidly integrated into diplomatic, legal, and political frameworks to strengthen the practice of global health security – producing subtle, yet distinct shifts in the outbreak notification and reporting transparency of states, increasingly scrutinized by the algorithmic gaze of syndromic surveillance.

Keywords: algorithms, global health, pandemic, surveillance

Procedia PDF Downloads 187

1440 Regret-Regression for Multi-Armed Bandit Problem

Authors: Deyadeen Ali Alshibani

Abstract:

In the literature, the multi-armed bandit problem is treated as a statistical decision model of an agent trying to optimize his decisions while improving his information at the same time. There are several different algorithms, models, and applications for this problem. In this paper, we evaluate Regret-regression by comparing it with the Q-learning method. A simulation on determining an optimal treatment regime is presented in detail.

Keywords: optimal, bandit problem, optimization, dynamic programming

Procedia PDF Downloads 453

1439 Through Additive Manufacturing. A New Perspective for the Mass Production of Made in Italy Products

Authors: Elisabetta Cianfanelli, Paolo Pupparo, Maria Claudia Coppola

Abstract:

Recent developments in innovation processes and in the intrinsic tendencies of the product development process lead to new considerations on the design flow. The instability and complexity of contemporary life define new problems in the production of products, while stimulating the adoption of new solutions across the entire design process. The advent of Additive Manufacturing, but also of IoT and AI technologies, continuously confronts us with new paradigms regarding design as a social activity. From the point of view of application, these technologies together raise a whole series of problems and considerations immanent to design thinking. Addressing these problems may require some initial intuition and the use of some provisional set of rules or plausible strategies, i.e., heuristic reasoning. At the same time, however, the evolution of digital technology and the computational speed of new design tools describe a new and contrary design framework in which to operate. It is therefore interesting to understand the opportunities and boundaries of the new man-algorithm relationship. The contribution investigates the man-algorithm relationship starting from the state of the art of the Made in Italy model: the best-known fields of application are described, and the focus then turns to specific cases in which the mutual relationship between man and AI becomes a new driving force of innovation for entire production chains. On the other hand, the use of algorithms could engulf many design phases, such as the definition of shape, dimensions, proportions, materials, static verifications, and simulations. Operating in this context, therefore, becomes a strategic action, capable of defining fundamental choices for the design of product systems in the near future. If there is a human-algorithm combination within a new integrated system, quantitative values can be controlled in relation to qualitative and material values.
The trajectory that is described therefore becomes a new design horizon in which to operate, where it is interesting to highlight the good practices that already exist. In this context, the designer developing new forms can experiment with ways still unexpressed in the project and can define a new synthesis and simplification of algorithms, so that each artifact carries a signature that defines it in all its parts, emotional and structural. This signature of the designer, a combination of values and design culture, will be internal to the algorithms and able to relate to digital technologies, creating a generative dialogue for design purposes. The envisaged result indicates a new vision of digital technologies, no longer understood only as custodians of vast quantities of information, but also as a valid integrated tool in close relationship with the design culture.

Keywords: decision making, design heuristics, product design, product design process, design paradigms

Procedia PDF Downloads 119

1438 Optimal Operation of Bakhtiari and Roudbar Dam Using Differential Evolution Algorithms

Authors: Ramin Mansouri

Abstract:

Because the discharge regime of rivers is at odds with water demand, one of the best ways to use water resources is to construct dams that regulate the natural flow of rivers and supply water needs. Optimal utilization of reservoirs, with multiple important goals considered at the same time, is of very high importance. To analyze this approach, statistical data of the Bakhtiari and Roudbar dams over 46 years (1955 until 2001) are used. Initially, an appropriate objective function was specified, and the rule curve was developed using the DE algorithm. Next, the operation policy using rule curves was compared to the standard comparative operation policy. The proposed method distributed the shortage over the whole year, and the lowest damage was inflicted on the system. The standard deviation of the monthly shortfall of each year was smaller with the proposed algorithm than with the other two methods. The results show that median values for the coefficients F and Cr provide the optimum situation and keep the DE algorithm from being trapped in a local optimum. The best values are 0.6 and 0.5 for the F and Cr coefficients, respectively. After finding the best combination of the coefficient values F and Cr, algorithms for solving the independent populations were examined. For this purpose, populations of 4, 25, 50, 100, 500, and 1000 members were studied in two generations (G=50 and 100). The result indicates that the generation number 200 is suitable for optimizing. The increase in time with population size has an almost linear trend, which indicates the effect of population on the runtime of the algorithm. Hence, specifying a suitable population to obtain optimal results is very important. The standard operation policy had a better reversibility percentage but inflicts severe vulnerability on the system. The proposed method performed very well in years of low rainfall compared to the other methods.
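
The role of the F and Cr coefficients mentioned in the abstract above can be illustrated with a minimal sketch of a differential evolution loop. This is not the authors' implementation: the DE/rand/1/bin variant, the function name differential_evolution, the bound handling by clipping, and the sphere test objective are all assumptions chosen for illustration; only the values F = 0.6 and Cr = 0.5 come from the abstract.

```python
import random


def differential_evolution(objective, bounds, pop_size=25, generations=100,
                           F=0.6, Cr=0.5, seed=0):
    """Minimize `objective` with a basic DE/rand/1/bin scheme (illustrative only)."""
    if pop_size < 4:
        raise ValueError("DE/rand/1 needs at least 4 population members")
    rng = random.Random(seed)
    dim = len(bounds)
    # Random initial population inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    cost = [objective(x) for x in pop]
    for _ in range(generations):
        for i in range(pop_size):
            # Three distinct members, all different from the target i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated gene
            trial = []
            for j in range(dim):
                if rng.random() < Cr or j == j_rand:
                    # Mutation scaled by F, then clipped to the bounds.
                    v = pop[a][j] + F * (pop[b][j] - pop[c][j])
                    lo, hi = bounds[j]
                    v = min(max(v, lo), hi)
                else:
                    v = pop[i][j]
                trial.append(v)
            trial_cost = objective(trial)
            # Greedy selection: keep the trial only if it is not worse.
            if trial_cost <= cost[i]:
                pop[i], cost[i] = trial, trial_cost
    best = min(range(pop_size), key=cost.__getitem__)
    return pop[best], cost[best]


def sphere(x):
    """Hypothetical stand-in objective; a reservoir study would use a shortfall measure."""
    return sum(v * v for v in x)


best_x, best_cost = differential_evolution(sphere, [(-5.0, 5.0)] * 3)
print(best_x, best_cost)
```

In this sketch, F scales the difference vector added during mutation and Cr sets the probability that each coordinate is taken from the mutant, which is why mid-range values of both are often a reasonable starting point.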

Keywords: reservoirs, differential evolution, dam, Optimal operation

Procedia PDF Downloads 78

1437 Recent Advances in Data Warehouse

Authors: Fahad Hanash Alzahrani

Abstract:

This paper describes some recent advances in a quickly developing area of data storing and processing based on Data Warehouses and Data Mining techniques, which are associated with software, hardware, data mining algorithms and visualisation techniques having common features for any specific problems and tasks of their implementation.

Keywords: data warehouse, data mining, knowledge discovery in databases, on-line analytical processing

Procedia PDF Downloads 404

1436 Refining Scheme Using Amphibious Epistemologies

Authors: David Blaine, George Raschbaum

Abstract:

The evaluation of DHCP has synthesized SCSI disks, and current trends suggest that the exploration of e-business that would allow for further study into robots will soon emerge. Given the current status of embedded algorithms, hackers worldwide obviously desire the exploration of replication, which embodies the confusing principles of programming languages. In our research we concentrate our efforts on arguing that erasure coding can be made "fuzzy", encrypted, and game-theoretic.

Keywords: SCSI disks, robot, algorithm, hacking, programming language

Procedia PDF Downloads 430

1435 Dimensionality Reduction in Modal Analysis for Structural Health Monitoring

Authors: Elia Favarelli, Enrico Testi, Andrea Giorgetti

Abstract:

Autonomous structural health monitoring (SHM) of many structures and bridges has become a topic of paramount importance for maintenance purposes and safety reasons. This paper proposes a set of machine learning (ML) tools to perform automatic feature selection and detection of anomalies in a bridge from vibrational data and compares different feature extraction schemes to increase the accuracy and reduce the amount of data collected. As a case study, the Z-24 bridge is considered because of the extensive database of accelerometric data in both standard and damaged conditions. The proposed framework starts from the first four fundamental frequencies extracted through operational modal analysis (OMA) and clustering, followed by density-based time-domain filtering (tracking). The fundamental frequencies extracted are then fed to a dimensionality reduction block implemented through two different approaches: feature selection (intelligent multiplexer), which tries to estimate the most reliable frequencies based on the evaluation of some statistical features (i.e., mean value, variance, kurtosis), and feature extraction (auto-associative neural network (ANN)), which combines the fundamental frequencies to extract new damage-sensitive features in a low-dimensional feature space. Finally, one class classifier (OCC) algorithms perform anomaly detection; they are trained with standard condition points and tested with normal and anomalous ones. In particular, a new anomaly detection strategy is proposed, namely one class classifier neural network two (OCCNN2), which exploits the classification capability of standard classifiers in an anomaly detection problem, finding the standard class (the boundary of the feature space in normal operating conditions) through a two-step approach: coarse and fine boundary estimation.
The coarse estimation uses classical OCC techniques, while the fine estimation is performed through a feedforward neural network (NN) trained to exploit the boundaries estimated in the coarse step. The detection algorithms are then compared with known methods based on principal component analysis (PCA), kernel principal component analysis (KPCA), and auto-associative neural network (ANN). In many cases, the proposed solution increases the performance with respect to the standard OCC algorithms in terms of F1 score and accuracy. In particular, by evaluating the correct features, the anomaly can be detected with accuracy and an F1 score greater than 96% with the proposed method.

Keywords: anomaly detection, frequencies selection, modal analysis, neural network, sensor network, structural health monitoring, vibration measurement

Procedia PDF Downloads 124

1434 Early Gastric Cancer Prediction from Diet and Epidemiological Data Using Machine Learning in Mizoram Population

Authors: Brindha Senthil Kumar, Payel Chakraborty, Senthil Kumar Nachimuthu, Arindam Maitra, Prem Nath

Abstract:

Gastric cancer is predominantly caused by demographic and diet factors as compared to other cancer types. The aim of the study is to predict Early Gastric Cancer (EGC) from diet and lifestyle factors using supervised machine learning algorithms. For this study, 160 healthy individuals and 80 cases were selected who had been followed for 3 years (2016-2019) at Civil Hospital, Aizawl, Mizoram. A dataset containing 11 features that are core risk factors for gastric cancer was extracted. Supervised machine learning algorithms (Logistic Regression, Naive Bayes, Support Vector Machine (SVM), Multilayer perceptron, and Random Forest) were used to analyze the dataset using Python Jupyter Notebook Version 3. The classification results were evaluated using the following metrics: minimum_false_positives, brier_score, accuracy, precision, recall, F1_score, and the Receiver Operating Characteristics (ROC) curve. Data analysis results showed Naive Bayes - 88, 0.11; Random Forest - 83, 0.16; SVM - 77, 0.22; Logistic Regression - 75, 0.25; and Multilayer perceptron - 72, 0.27 with respect to accuracy (in percent) and brier_score. The Naive Bayes algorithm outperforms the others, with very low false positive rates and brier_score as well as good accuracy. The Naive Bayes classification results in predicting EGC were very satisfactory using only diet and lifestyle factors, which will be very helpful for physicians in educating patients and the public, so that gastric cancer mortality can be reduced or avoided with this knowledge mining work.
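
The two metrics reported per classifier in the abstract above, accuracy and brier_score, can be computed as in the following minimal sketch. The function names, the 0.5 decision threshold, and the four-sample toy data are assumptions for illustration and are not taken from the study.

```python
def brier_score(y_true, y_prob):
    """Mean squared difference between predicted probability and the 0/1 outcome."""
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - y) ** 2 for y, p in zip(y_true, y_prob)) / len(y_true)


def accuracy(y_true, y_prob, threshold=0.5):
    """Fraction of samples whose thresholded prediction matches the label."""
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(int(p >= threshold) == y for y, p in zip(y_true, y_prob))
    return hits / len(y_true)


# Hypothetical labels (1 = case, 0 = healthy) and predicted probabilities.
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.4]
print(brier_score(labels, probs), accuracy(labels, probs))
```

A lower Brier score indicates better-calibrated probabilities, whereas accuracy only looks at the thresholded decision, so the two metrics can rank classifiers differently.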

Keywords: Early Gastric cancer, Machine Learning, Diet, Lifestyle Characteristics

Procedia PDF Downloads 164

1433 Sensor Registration in Multi-Static Sonar Fusion Detection

Authors: Longxiang Guo, Haoyan Hao, Xueli Sheng, Hanjun Yu, Jingwei Yin

Abstract:

In order to prevent target splitting and ensure the accuracy of fusion, system error registration is an important step in a multi-static sonar fusion detection system. To eliminate the inherent system errors, including the distance error and angle error of each sonar in detection, this paper uses an offline estimation method for error registration. Suppose several sonars from different platforms work together to detect a target. The target position detected by each sonar is based on that sonar's own reference coordinate system. Based on the two-dimensional stereo projection method, this paper uses the real-time quality control (RTQC) method and the least squares (LS) method to estimate sensor biases. The RTQC method takes the average value of each sonar's data as the observation value, and the LS method applies least squares processing to each sonar's data to get the observation value. A MATLAB simulation is carried out for the underwater acoustic environment, and the simulation results show that both algorithms can estimate the distance and angle errors of the sonar system. The performance of the two algorithms is also compared through the root mean square error, and the influence of measurement noise on registration accuracy is explored by simulation. The system error convergence of the RTQC method is rapid, but the distribution of targets has a serious impact on its performance. The LS method is not affected by target distribution, but an increase in random noise slows down its convergence rate. The LS method is an improvement of the RTQC method, which is widely used in two-dimensional registration. The improved method can be used for underwater multi-target detection registration.

Keywords: data fusion, multi-static sonar detection, offline estimation, sensor registration problem

Procedia PDF Downloads 169

1432 Evotrader: Bitcoin Trading Using Evolutionary Algorithms on Technical Analysis and Social Sentiment Data

Authors: Martin Pellon Consunji

Abstract:

Due to the rise in popularity of Bitcoin and other crypto assets as a store of wealth and speculative investment, there is an ever-growing demand for automated trading tools, such as bots, in order to gain an advantage over the market. Traditionally, trading in the stock market was done by professionals with years of training who understood patterns and exploited market opportunities in order to gain a profit. However, nowadays a larger portion of market participants are at minimum aided by market-data processing bots, which can generally generate more stable signals than the average human trader. The rise in trading bot usage can be attributed to the inherent advantages that bots have over humans in terms of processing large amounts of data, lack of emotions of fear or greed, and predicting market prices using past data and artificial intelligence; hence, a growing number of approaches have been brought forward to tackle this task. However, the general limitation of these approaches still comes down to the fact that limited historical data doesn’t always determine the future, and that a lot of market participants are still human emotion-driven traders. Moreover, developing markets such as those of the cryptocurrency space have even less historical data to interpret than most other well-established markets. Due to this, some human traders have gone back to the tried-and-tested traditional technical analysis tools for exploiting market patterns and simplifying the broader spectrum of data that is involved in making market predictions. This paper proposes a method which uses neuro-evolution techniques on both sentiment data and the more traditionally human-consumed technical analysis data in order to gain a more accurate forecast of future market behavior and account for the way both automated bots and human traders affect the market prices of Bitcoin and other cryptocurrencies.
This study’s approach uses evolutionary algorithms to automatically develop increasingly improved populations of bots which, by using the latest inflows of market analysis and sentiment data, evolve to efficiently predict future market price movements. The effectiveness of the approach is validated by testing the system in a simulated historical trading scenario, a real Bitcoin market live trading scenario, and testing its robustness in other cryptocurrency and stock market scenarios. Experimental results during a 30-day period show that this method outperformed the buy and hold strategy by over 260% in terms of net profits, even when taking into consideration standard trading fees.

Keywords: neuro-evolution, Bitcoin, trading bots, artificial neural networks, technical analysis, evolutionary algorithms

Procedia PDF Downloads 124

1431 Medicompills Architecture: A Mathematical Precise Tool to Reduce the Risk of Diagnosis Errors on Precise Medicine

Authors: Adriana Haulica

Abstract:

Powered by Machine Learning, Precise medicine is by now tailored to use genetic and molecular profiling, with the aim of optimizing the therapeutic benefits for cohorts of patients. As the majority of Machine Learning algorithms come from heuristics, the outputs have contextual validity. This is not very restrictive in the sense that medicine itself is not an exact science. Meanwhile, the progress made in Molecular Biology, Bioinformatics, Computational Biology, and Precise Medicine, correlated with the huge amount of human biology data and the increase in computational power, opens new healthcare challenges. A more accurate diagnosis is needed along with real-time treatments by processing as much as possible of the available information. The purpose of this paper is to present a deeper vision for the future of Artificial Intelligence in Precise medicine. In fact, current Machine Learning algorithms use standard mathematical knowledge, mostly Euclidean metrics and standard computation rules. The loss of information arising from the classical methods prevents obtaining 100% evidence in the diagnosis process. To overcome these problems, we introduce MEDICOMPILLS, a new architectural concept tool for information processing in Precise medicine that delivers diagnosis and therapy advice. This tool processes poly-field digital resources: global knowledge related to biomedicine in a direct or indirect manner but also technical databases, Natural Language Processing algorithms, and strong class optimization functions. As the name suggests, the heart of this tool is a compiler. The approach is completely new, tailored for omics and clinical data. Firstly, the intrinsic biological intuition is different from the well-known “a needle in a haystack” approach usually used when Machine Learning algorithms have to process differential genomic or molecular data to find biomarkers.
Also, even if the input is drawn from various types of data, the working engine inside MEDICOMPILLS does not search for patterns as an integrative tool. This approach deciphers the biological meaning of input data up to the metabolic and physiologic mechanisms, based on a compiler with grammars derived from bio-algebra-inspired mathematics. It translates input data into bio-semantic units with the help of contextual information iteratively until Bio-Logical operations can be performed on the basis of the “common denominator” rule. The rigor of MEDICOMPILLS comes from the structure of the contextual information on functions, built to be analogous to mathematical “proofs”. The major impact of this architecture is expressed by the high accuracy of the diagnosis. Delivered as a multiple conditions diagnostic, constituted by some main diseases along with unhealthy biological states, this format is highly suitable for therapy proposal and disease prevention. The use of the MEDICOMPILLS architecture is highly beneficial for the healthcare industry. The expectation is to generate a strategic trend in Precise medicine, making medicine more like an exact science and reducing the considerable risk of errors in diagnostics and therapies. The tool can be used by pharmaceutical laboratories for the discovery of new cures. It will also contribute to better design of clinical trials and speed them up.

Keywords: bio-semantic units, multiple conditions diagnosis, NLP, omics

Procedia PDF Downloads 70

1430 Advancing Urban Sustainability through Data-Driven Machine Learning Solutions

Authors: Nasim Eslamirad, Mahdi Rasoulinezhad, Francesco De Luca, Sadok Ben Yahia, Kimmo Sakari Lylykangas, Francesco Pilla

Abstract:

With the ongoing urbanization, cities face increasing environmental challenges impacting human well-being. To tackle these issues, data-driven approaches in urban analysis have gained prominence, leveraging urban data to promote sustainability. Integrating Machine Learning techniques enables researchers to analyze and predict complex environmental phenomena like Urban Heat Island occurrences in urban areas. This paper demonstrates the implementation of data-driven approach and interpretable Machine Learning algorithms with interpretability techniques to conduct comprehensive data analyses for sustainable urban design. The developed framework and algorithms are demonstrated for Tallinn, Estonia to develop sustainable urban strategies to mitigate urban heat waves. Geospatial data, preprocessed and labeled with UHI levels, are used to train various ML models, with Logistic Regression emerging as the best-performing model based on evaluation metrics to derive a mathematical equation representing the area with UHI or without UHI effects, providing insights into UHI occurrences based on buildings and urban features. The derived formula highlights the importance of building volume, height, area, and shape length to create an urban environment with UHI impact. The data-driven approach and derived equation inform mitigation strategies and sustainable urban development in Tallinn and offer valuable guidance for other locations with varying climates.

Keywords: data-driven approach, machine learning transparent models, interpretable machine learning models, urban heat island effect

Procedia PDF Downloads 41

1429 Multivariate Data Analysis for Automatic Atrial Fibrillation Detection

Authors: Zouhair Haddi, Stephane Delliaux, Jean-Francois Pons, Ismail Kechaf, Jean-Claude De Haro, Mustapha Ouladsine

Abstract:

Atrial fibrillation (AF) is considered the most common cardiac arrhythmia and a major public health burden associated with significant morbidity and mortality. Nowadays, telemedical approaches targeting cardiac outpatients place AF among the most challenging medical issues. Automatic, early, and fast AF detection is still a major concern for healthcare professionals. Several algorithms based on univariate analysis have been developed to detect atrial fibrillation. However, the published results do not show satisfactory classification accuracy. This work aimed to resolve this shortcoming by proposing multivariate data analysis methods for automatic AF detection. Four publicly accessible sets of clinical data (AF Termination Challenge Database, MIT-BIH AF, Normal Sinus Rhythm RR Interval Database, and MIT-BIH Normal Sinus Rhythm Databases) were used for assessment. All time series were segmented into 1-min RR interval windows, and then four specific features were calculated. Two pattern recognition methods, i.e., Principal Component Analysis (PCA) and a Learning Vector Quantization (LVQ) neural network, were used to develop classification models. PCA, as a feature reduction method, was employed to find important features to discriminate between AF and Normal Sinus Rhythm. Despite its very simple structure, the results show that the LVQ model performs better on the analyzed databases than existing algorithms, with high sensitivity and specificity (99.19% and 99.39%, respectively). The proposed AF detection method has several interesting properties and can be implemented with just a few arithmetical operations, which makes it a suitable choice for telecare applications.

Keywords: atrial fibrillation, multivariate data analysis, automatic detection, telemedicine

Procedia PDF Downloads 269

1428 Parametric Analysis of Lumped Devices Modeling Using Finite-Difference Time-Domain

Authors: Felipe M. de Freitas, Icaro V. Soares, Lucas L. L. Fortes, Sandro T. M. Gonçalves, Úrsula D. C. Resende

Abstract:

SPICE-based simulators are quite robust and widely used for the simulation of electronic circuits; their algorithms support linear and non-linear lumped components, and they can handle a significant number of encapsulated elements. Despite the great potential of these simulators in the analysis of quasi-static electromagnetic field interaction, that is, at low frequency, they are limited when applied to microwave hybrid circuits in which there are both lumped and distributed elements. Usually, the spatial discretization of the FDTD (Finite-Difference Time-Domain) method is done according to the actual size of the element under analysis. After spatial discretization, the Courant Stability Criterion gives the maximum temporal discretization accepted for such spatial discretization and for the propagation velocity of the wave. This criterion guarantees the stability conditions for the leapfrogging of the Yee algorithm; however, it is known that, for the field update, the stability of the complete FDTD procedure depends on factors other than just the stability of the Yee algorithm, because the FDTD program needs other algorithms in order to be useful in engineering problems. Examples of these algorithms are Absorbing Boundary Conditions (ABCs), excitation sources, subcellular techniques, lumped elements, and non-uniform or non-orthogonal meshes. In this work, the influence of the stability of the FDTD method on the modeling of lumped elements such as resistive sources, resistors, capacitors, inductors, and diodes will be evaluated. This paper therefore proposes the electromagnetic modeling of electronic components in order to create models that satisfy the needs of circuit simulations at ultra-wide frequencies.
The models of the resistive source, the resistor, the capacitor, the inductor, and the diode will be evaluated, among the mathematical models for lumped components in the LE-FDTD method (Lumped-Element Finite-Difference Time-Domain), through a parametric analysis of the size of the Yee cells that discretize the lumped components. In this way, an ideal cell size is sought so that the analysis in the FDTD environment is in greater agreement with the expected circuit behavior, while maintaining the stability conditions of this method. Based on the mathematical models and the theoretical basis of the required extensions of the FDTD method, the computational implementation of the models in the Matlab® environment is carried out. The Mur boundary condition is used as the absorbing boundary of the FDTD method. The model is validated by comparing the results obtained by the FDTD method, through the electric field values and the currents in the components, with the analytical results using circuit parameters.
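
The Courant stability criterion referred to in the abstract above bounds the time step by the cell size and the wave velocity. The following minimal sketch uses the standard form for a uniform Yee grid, dt <= 1 / (v * sqrt(1/dx^2 + 1/dy^2 + 1/dz^2)). The function name courant_dt_max and the 1 mm cell are assumptions for illustration, and the sketch covers only the basic Yee update, not the additional stability constraints that lumped elements may introduce.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def courant_dt_max(*spacings, velocity=C0):
    """Largest stable time step (s) for cell sizes given in metres (1D, 2D, or 3D)."""
    if not 1 <= len(spacings) <= 3:
        raise ValueError("pass one, two, or three cell sizes")
    if any(d <= 0 for d in spacings) or velocity <= 0:
        raise ValueError("cell sizes and velocity must be positive")
    return 1.0 / (velocity * math.sqrt(sum(1.0 / d ** 2 for d in spacings)))


# Hypothetical cubic Yee cell of 1 mm per side.
dx = 1e-3
print(courant_dt_max(dx, dx, dx))
```

Halving the cell size halves the admissible time step, which is one reason a parametric study of cell size also changes the runtime of a simulation.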

Keywords: hybrid circuits, LE-FDTD, lumped element, parametric analysis

Procedia PDF Downloads 155

1427 Optimizing Parallel Computing Systems: A Java-Based Approach to Modeling and Performance Analysis

Authors: Maher Ali Rusho, Sudipta Halder

Abstract:

The purpose of the study is to develop optimal solutions for models of parallel computing systems using the Java language. During the study, programmes were written for the examined models of parallel computing systems. The result of the parallel sorting code is the output of a sorted array of random numbers. When processing data in parallel, the time spent on processing and the first elements of the list of squared numbers are displayed. When processing requests asynchronously, processing completion messages are displayed for each task with a slight delay. The main results include the development of optimisation methods for algorithms and processes, such as the division of tasks into subtasks, the use of non-blocking algorithms, effective memory management, and load balancing, as well as the construction of diagrams and comparison of these methods by characteristics, including descriptions, implementation examples, and advantages. In addition, various specialised libraries were analysed to improve the performance and scalability of the models. The results of the work performed showed a substantial improvement in response time, bandwidth, and resource efficiency in parallel computing systems. Scalability and load analysis assessments were conducted, demonstrating how the system responds to an increase in data volume or the number of threads. Profiling tools were used to analyse performance in detail and identify bottlenecks in models, which improved the architecture and implementation of parallel computing systems. The obtained results emphasise the importance of choosing the right methods and tools for optimising parallel computing systems, which can substantially improve their performance and efficiency.

Keywords: algorithm optimisation, memory management, load balancing, performance profiling, asynchronous programming.

Procedia PDF Downloads 14

1426 Predicting Blockchain Technology Installation Cost in Supply Chain System through Supervised Learning

Authors: Hossein Havaeji, Tony Wong, Thien-My Dao

Abstract:

1. Research Problems and Research Objectives: A Blockchain Technology-enabled Supply Chain System (BT-enabled SCS) is a system that uses BT to drive SCS transparency, security, durability, and process integrity, as SCS data is not always visible, available, or trusted. The costs of operating BT in the SCS are a common problem in several organizations. The costs must be estimated as they can impact existing cost control strategies. To account for system and deployment costs, it is necessary to overcome the following hurdle: the costs of developing and running BT in SCS are not yet clear in most cases. Many industries aiming to use BT pay special attention to BT installation cost, which has a direct impact on the total costs of SCS. Predicting BT installation cost in SCS may help managers decide whether BT offers an economic advantage. The first purpose of the research is to identify the main BT installation cost components in SCS needed for deeper cost analysis. We then identify and categorize the main groups of cost components in more detail to utilize them in the prediction process. The second objective is to determine a suitable Supervised Learning technique in order to predict the costs of developing and running BT in SCS in a particular case study. The last aim is to investigate how the cost of running BT contributes to the total cost of SCS. 2. Work Performed: Applied successfully in various fields, Supervised Learning is a method to set the data frame, treat the data, and train/practice the method sort. It is a learning model directed to make predictions of an outcome measurement based on a set of unseen input data. The following steps must be conducted to pursue our objectives. The first step is to conduct a literature review to identify the different cost components of BT installation in SCS.
Based on the literature review, we then choose some Supervised Learning methods that are suitable for BT installation cost prediction in SCS. According to the literature review, Supervised Learning algorithms that provide a powerful tool to classify BT installation components and predict BT installation cost include the Support Vector Regression (SVR) algorithm, the Back Propagation (BP) neural network, and the Artificial Neural Network (ANN). Choosing a case study to feed data into the models is the third step. Finally, we will identify the model with the best predictive performance to find the minimum BT installation costs in SCS. 3. Expected Results and Conclusion: This study intends to propose a cost prediction of BT installation in SCS with the help of Supervised Learning algorithms. As a first step, we will select a case study in the field of BT-enabled SCS, and then use some Supervised Learning algorithms to predict BT installation cost in SCS. We then seek the best predictive performance for developing and running BT in SCS. Finally, the paper will be presented at the conference.

Keywords: blockchain technology, blockchain technology-enabled supply chain system, installation cost, supervised learning

1425 Comparison of Deep Learning and Machine Learning Algorithms to Diagnose and Predict Breast Cancer

Authors: F. Ghazalnaz Sharifonnasabi, Iman Makhdoom

Abstract:

Breast cancer is a serious health concern that affects many people around the world. According to a study published in the Breast journal, the global burden of breast cancer is expected to increase significantly over the next few decades. The number of deaths from breast cancer has been increasing over the years, but the age-standardized mortality rate has decreased in some countries. It is important to be aware of the risk factors for breast cancer and to get regular check-ups to catch it early if it does occur. Machine learning techniques have been used to aid in the early detection and diagnosis of breast cancer. These techniques, which have been shown to be effective in predicting and diagnosing the disease, have become a research hotspot. In this study, we consider two deep learning approaches: Multi-Layer Perceptron (MLP) and Convolutional Neural Network (CNN). We also consider five machine learning algorithms: Decision Tree (C4.5), Naïve Bayesian (NB), Support Vector Machine (SVM), K-Nearest Neighbors (KNN), and XGBoost (eXtreme Gradient Boosting). All are applied to the Breast Cancer Wisconsin Diagnostic dataset. Evaluating and comparing the classifiers involved selecting appropriate metrics for classifier performance and an appropriate tool to quantify that performance. The main purpose of the study is to predict and diagnose breast cancer by applying the mentioned algorithms, and to discover the most effective one with respect to the confusion matrix, accuracy, and precision. The results show that CNN outperformed all other classifiers and achieved the highest accuracy (0.982456). The work is implemented in the Anaconda environment using the Python programming language.
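
The evaluation criteria named in the abstract (confusion matrix, accuracy, and precision) can be computed as in the minimal sketch below. It is not the authors' code: the function names and the label lists are hypothetical and chosen for illustration, with 1 assumed to mark the positive (malignant) class.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, fn, tn) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1  # correctly predicted positive
        elif pred == positive:
            fp += 1  # predicted positive, actually negative
        elif truth == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # correctly predicted negative
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


def precision(tp, fp):
    """Fraction of positive predictions that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


# Hypothetical labels for illustration only; not data from the study.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)
print("confusion (tp, fp, fn, tn):", (tp, fp, fn, tn))
print("accuracy:", accuracy(tp, fp, fn, tn))
print("precision:", precision(tp, fp))
```

Running the same three functions over each classifier's test-set predictions gives a like-for-like comparison across the seven models.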

Keywords: breast cancer, multi-layer perceptron, Naïve Bayesian, SVM, decision tree, convolutional neural network, XGBoost, KNN
