Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 13

Search results for: Shapley value

13 Node Pair Selection Scheme in Relay-Aided Communication Based on Stable Marriage Problem

Authors: Tetsuki Taniguchi, Yoshio Karasawa

Abstract:

This paper describes a node pair selection scheme in relay-aided multiple source multiple destination communication system based on stable marriage problem. A general case is assumed in which all of source, relay and destination nodes are equipped with multiantenna and carry out multistream transmission. Based on several metrics introduced from inter-node channel condition, the preference order is determined about all source-relay and relay-destination relations, and then the node pairs are determined using Gale-Shapley algorithm. The computer simulations show that the effectiveness of node pair selection is larger in multihop communication. Some additional aspects which are different from relay-less case are also investigated.

Keywords: relay, multiple input multiple output (MIMO), multiuser, amplify and forward, stable marriage problem, Gale-Shapley algorithm

Procedia PDF Downloads 394

12 Optimality of Shapley Value Mechanism under Sybil Strategies

Authors: Bruno Mazorra Roig

Abstract:

In the realm of cost-sharing mechanisms, the vulnerability to Sybil strategies, where agents can create fake identities to manipulate outcomes, has not yet been studied. In this paper, we delve into the intricacies of different cost-sharing mechanisms proposed in the literature, highlighting its non-Sybil-resistance nature. Furthermore, we prove that under mild conditions, a Sybil-proof cost-sharing mechanism for public excludable goods is at least (n/2 + 1)−approximate. This finding reveals an exponential increase in the worst-case social cost in environments where agents are restricted from using Sybil strategies. We introduce the concept of Sybil Welfare Invariant mechanisms, where a mechanism maintains its worst-case welfare under Sybil strategies for every set of prior beliefs with full support even when the mechanism is not Sybil-proof. Finally, we prove that the Shapley value mechanism for public excludable goods holds this property and so deduce that the worst-case social cost of this mechanism is the nth harmonic number Hn under the equilibrium of the game with Sybil strategies, matching the worst-case social cost bound for cost-sharing mechanisms. This finding carries important implications for decentralized autonomous organizations (DAOs), indicating that they are capable of funding public excludable goods efficiently, even when the total number of agents is unknown.

Keywords: game theory, mechanism design, cost sharing, false-name proofness

Procedia PDF Downloads 58

11 Using Machine Learning to Classify Human Fetal Health and Analyze Feature Importance

Authors: Yash Bingi, Yiqiao Yin

Abstract:

Reduction of child mortality is an ongoing struggle and a commonly used factor in determining progress in the medical field. The under-5 mortality number is around 5 million around the world, with many of the deaths being preventable. In light of this issue, Cardiotocograms (CTGs) have emerged as a leading tool to determine fetal health. By using ultrasound pulses and reading the responses, CTGs help healthcare professionals assess the overall health of the fetus to determine the risk of child mortality. However, interpreting the results of the CTGs is time-consuming and inefficient, especially in underdeveloped areas where an expert obstetrician is hard to come by. Using a support vector machine (SVM) and oversampling, this paper proposed a model that classifies fetal health with an accuracy of 99.59%. To further explain the CTG measurements, an algorithm based on Randomized Input Sampling for Explanation ((RISE) of Black-box Models was created, called Feature Alteration for explanation of Black Box Models (FAB), and compared the findings to Shapley Additive Explanations (SHAP) and Local Interpretable Model Agnostic Explanations (LIME). This allows doctors and medical professionals to classify fetal health with high accuracy and determine which features were most influential in the process.

Keywords: machine learning, fetal health, gradient boosting, support vector machine, Shapley values, local interpretable model agnostic explanations

Procedia PDF Downloads 141

10 An Owen Value for Cooperative Games with Pairwise a Priori Incompatibilities

Authors: Jose M. Gallardo, Nieves Jimenez, Andres Jimenez-Losada, Esperanza Lebron

Abstract:

A game with a priori incompatibilities is a triple (N,v,g) where (N,v) is a cooperative game, and (N,g) is a graph which establishes initial incompatibilities between some players. In these games, the negotiation has two stages. In the first stage, players can only negotiate with others with whom they are compatible. In the second stage, the grand coalition will be formed. We introduce a value for these games. Given a game with a priori incompatibility (N,v,g), we consider the family of coalitions without incompatibility relations among their players. This family is a normal set system or coalition configuration Ig. Therefore, we can assign to each game with a priori incompatibilities (N,v,g) a game with coalition configuration (N,v, Ig). Now, in order to obtain a payoff vector for (N,v,g), it suffices to calculate a payoff vector for (N,v, Ig). To this end, we apply a value for games with coalition configuration. In our case, we will use the dual configuration value, which has been studied in the literature. With this method, we obtain a value for games with a priori incompatibilities, which is called the Owen value for a priori incompatibilities. We provide a characterization of this value.

Keywords: cooperative game, game with coalition configuration, graph, independent set, Owen value, Shapley value

Procedia PDF Downloads 123

9 Machine Learning for Feature Selection and Classification of Systemic Lupus Erythematosus

Authors: H. Zidoum, A. AlShareedah, S. Al Sawafi, A. Al-Ansari, B. Al Lawati

Abstract:

Systemic lupus erythematosus (SLE) is an autoimmune disease with genetic and environmental components. SLE is characterized by a wide variability of clinical manifestations and a course frequently subject to unpredictable flares. Despite recent progress in classification tools, the early diagnosis of SLE is still an unmet need for many patients. This study proposes an interpretable disease classification model that combines the high and efficient predictive performance of CatBoost and the model-agnostic interpretation tools of Shapley Additive exPlanations (SHAP). The CatBoost model was trained on a local cohort of 219 Omani patients with SLE as well as other control diseases. Furthermore, the SHAP library was used to generate individual explanations of the model's decisions as well as rank clinical features by contribution. Overall, we achieved an AUC score of 0.945, F1-score of 0.92 and identified four clinical features (alopecia, renal disorders, cutaneous lupus, and hemolytic anemia) along with the patient's age that was shown to have the greatest contribution on the prediction.

Keywords: feature selection, classification, systemic lupus erythematosus, model interpretation, SHAP, Catboost

Procedia PDF Downloads 77

8 Enhanced Extra Trees Classifier for Epileptic Seizure Prediction

Authors: Maurice Ntahobari, Levin Kuhlmann, Mario Boley, Zhinoos Razavi Hesabi

Abstract:

For machine learning based epileptic seizure prediction, it is important for the model to be implemented in small implantable or wearable devices that can be used to monitor epilepsy patients; however, current state-of-the-art methods are complex and computationally intensive. We use Shapley Additive Explanation (SHAP) to find relevant intracranial electroencephalogram (iEEG) features and improve the computational efficiency of a state-of-the-art seizure prediction method based on the extra trees classifier while maintaining prediction performance. Results for a small contest dataset and a much larger dataset with continuous recordings of up to 3 years per patient from 15 patients yield better than chance prediction performance (p < 0.004). Moreover, while the performance of the SHAP-based model is comparable to that of the benchmark, the overall training and prediction time of the model has been reduced by a factor of 1.83. It can also be noted that the feature called zero crossing value is the best EEG feature for seizure prediction. These results suggest state-of-the-art seizure prediction performance can be achieved using efficient methods based on optimal feature selection.

Keywords: machine learning, seizure prediction, extra tree classifier, SHAP, epilepsy

Procedia PDF Downloads 106

7 Feature Analysis of Predictive Maintenance Models

Authors: Zhaoan Wang

Abstract:

Research in predictive maintenance modeling has improved in the recent years to predict failures and needed maintenance with high accuracy, saving cost and improving manufacturing efficiency. However, classic prediction models provide little valuable insight towards the most important features contributing to the failure. By analyzing and quantifying feature importance in predictive maintenance models, cost saving can be optimized based on business goals. First, multiple classifiers are evaluated with cross-validation to predict the multi-class of failures. Second, predictive performance with features provided by different feature selection algorithms are further analyzed. Third, features selected by different algorithms are ranked and combined based on their predictive power. Finally, linear explainer SHAP (SHapley Additive exPlanations) is applied to interpret classifier behavior and provide further insight towards the specific roles of features in both local predictions and global model behavior. The results of the experiments suggest that certain features play dominant roles in predictive models while others have significantly less impact on the overall performance. Moreover, for multi-class prediction of machine failures, the most important features vary with type of machine failures. The results may lead to improved productivity and cost saving by prioritizing sensor deployment, data collection, and data processing of more important features over less importance features.

Keywords: automated supply chain, intelligent manufacturing, predictive maintenance machine learning, feature engineering, model interpretation

Procedia PDF Downloads 125

6 Understanding Tourism Innovation through Fuzzy Measures

Authors: Marcella De Filippo, Delio Colangelo, Luca Farnia

Abstract:

In recent decades, the hyper-competition of tourism scenario has implicated the maturity of many businesses, attributing a central role to innovative processes and their dissemination in the economy of company management. At the same time, it has defined the need for monitoring the application of innovations, in order to govern and improve the performance of companies and destinations. The study aims to analyze and define the innovation in the tourism sector. The research actions have concerned, on the one hand, some in-depth interviews with experts, identifying innovation in terms of process and product, digitalization, sustainability policies and, on the other hand, to evaluate the interaction between these factors, in terms of substitutability and complementarity in management scenarios, in order to identify which one is essential to be competitive in the global scenario. Fuzzy measures and Choquet integral were used to elicit Experts’ preferences. This method allows not only to evaluate the relative importance of each pillar, but also and more interestingly, the level of interaction, ranging from complementarity to substitutability, between pairs of factors. The results of the survey are the following: in terms of Shapley values, Experts assert that Innovation is the most important factor (32.32), followed by digitalization (31.86), Network (20.57) and Sustainability (15.25). In terms of Interaction indices, given the low degree of consensus among experts, the interaction between couples of criteria on average could be ignored; however, it is worth to note that the factors innovations and digitalization are those in which experts express the highest degree of interaction. However for some of them, these factors have a moderate level of complementarity (with a pick of 57.14), and others consider them moderately substitutes (with a pick of -39.58). Another example, although outlier is the interaction between network and digitalization, in which an expert consider them markedly substitutes (-77.08).

Keywords: innovation, business model, tourism, fuzzy

Procedia PDF Downloads 261

5 Efficiency and Equity in Italian Secondary School

Authors: Giorgia Zotti

Abstract:

This research comprehensively investigates the multifaceted interplay determining school performance, individual backgrounds, and regional disparities within the landscape of Italian secondary education. Leveraging data gleaned from the INVALSI 2021-2022 database, the analysis meticulously scrutinizes two fundamental distributions of educational achievements: the standardized Invalsi test scores and official grades in Italian and Mathematics, focusing specifically on final-year secondary school students in Italy. Applying a comprehensive methodology, the study initially employs Data Envelopment Analysis (DEA) to assess school performances. This methodology involves constructing a production function encompassing inputs (hours spent at school) and outputs (Invalsi scores in Italian and Mathematics, along with official grades in Italian and Math). The DEA approach is applied in both of its versions: traditional and conditional. The latter incorporates environmental variables such as school type, size, demographics, technological resources, and socio-economic indicators. Additionally, the analysis delves into regional disparities by leveraging the Theil Index, providing insights into disparities within and between regions. Moreover, in the frame of the inequality of opportunity theory, the study quantifies the inequality of opportunity in students' educational achievements. The methodology applied is the Parametric Approach in the ex-ante version, considering diverse circumstances like parental education and occupation, gender, school region, birthplace, and language spoken at home. Consequently, a Shapley decomposition is applied to understand how much each circumstance affects the outcomes. The outcomes of this comprehensive investigation unveil pivotal determinants of school performance, notably highlighting the influence of school type (Liceo) and socioeconomic status. The research unveils regional disparities, elucidating instances where specific schools outperform others in official grades compared to Invalsi scores, shedding light on the intricate nature of regional educational inequalities. Furthermore, it emphasizes a heightened inequality of opportunity within the distribution of Invalsi test scores in contrast to official grades, underscoring pronounced disparities at the student level. This analysis provides insights for policymakers, educators, and stakeholders, fostering a nuanced understanding of the complexities within Italian secondary education.

Keywords: inequality, education, efficiency, DEA approach

Procedia PDF Downloads 71

4 Machine Learning Framework: Competitive Intelligence and Key Drivers Identification of Market Share Trends among Healthcare Facilities

Authors: Anudeep Appe, Bhanu Poluparthi, Lakshmi Kasivajjula, Udai Mv, Sobha Bagadi, Punya Modi, Aditya Singh, Hemanth Gunupudi, Spenser Troiano, Jeff Paul, Justin Stovall, Justin Yamamoto

Abstract:

The necessity of data-driven decisions in healthcare strategy formulation is rapidly increasing. A reliable framework which helps identify factors impacting a healthcare provider facility or a hospital (from here on termed as facility) market share is of key importance. This pilot study aims at developing a data-driven machine learning-regression framework which aids strategists in formulating key decisions to improve the facility’s market share which in turn impacts in improving the quality of healthcare services. The US (United States) healthcare business is chosen for the study, and the data spanning 60 key facilities in Washington State and about 3 years of historical data is considered. In the current analysis, market share is termed as the ratio of the facility’s encounters to the total encounters among the group of potential competitor facilities. The current study proposes a two-pronged approach of competitor identification and regression approach to evaluate and predict market share, respectively. Leveraged model agnostic technique, SHAP, to quantify the relative importance of features impacting the market share. Typical techniques in literature to quantify the degree of competitiveness among facilities use an empirical method to calculate a competitive factor to interpret the severity of competition. The proposed method identifies a pool of competitors, develops Directed Acyclic Graphs (DAGs) and feature level word vectors, and evaluates the key connected components at the facility level. This technique is robust since its data-driven, which minimizes the bias from empirical techniques. The DAGs factor in partial correlations at various segregations and key demographics of facilities along with a placeholder to factor in various business rules (for ex. quantifying the patient exchanges, provider references, and sister facilities). Identified are the multiple groups of competitors among facilities. Leveraging the competitors' identified developed and fine-tuned Random Forest Regression model to predict the market share. To identify key drivers of market share at an overall level, permutation feature importance of the attributes was calculated. For relative quantification of features at a facility level, incorporated SHAP (SHapley Additive exPlanations), a model agnostic explainer. This helped to identify and rank the attributes at each facility which impacts the market share. This approach proposes an amalgamation of the two popular and efficient modeling practices, viz., machine learning with graphs and tree-based regression techniques to reduce the bias. With these, we helped to drive strategic business decisions.

Keywords: competition, DAGs, facility, healthcare, machine learning, market share, random forest, SHAP

Procedia PDF Downloads 86

3 Early Impact Prediction and Key Factors Study of Artificial Intelligence Patents: A Method Based on LightGBM and Interpretable Machine Learning

Authors: Xingyu Gao, Qiang Wu

Abstract:

Patents play a crucial role in protecting innovation and intellectual property. Early prediction of the impact of artificial intelligence (AI) patents helps researchers and companies allocate resources and make better decisions. Understanding the key factors that influence patent impact can assist researchers in gaining a better understanding of the evolution of AI technology and innovation trends. Therefore, identifying highly impactful patents early and providing support for them holds immeasurable value in accelerating technological progress, reducing research and development costs, and mitigating market positioning risks. Despite the extensive research on AI patents, accurately predicting their early impact remains a challenge. Traditional methods often consider only single factors or simple combinations, failing to comprehensively and accurately reflect the actual impact of patents. This paper utilized the artificial intelligence patent database from the United States Patent and Trademark Office and the Len.org patent retrieval platform to obtain specific information on 35,708 AI patents. Using six machine learning models, namely Multiple Linear Regression, Random Forest Regression, XGBoost Regression, LightGBM Regression, Support Vector Machine Regression, and K-Nearest Neighbors Regression, and using early indicators of patents as features, the paper comprehensively predicted the impact of patents from three aspects: technical, social, and economic. These aspects include the technical leadership of patents, the number of citations they receive, and their shared value. The SHAP (Shapley Additive exPlanations) metric was used to explain the predictions of the best model, quantifying the contribution of each feature to the model's predictions. The experimental results on the AI patent dataset indicate that, for all three target variables, LightGBM regression shows the best predictive performance. Specifically, patent novelty has the greatest impact on predicting the technical impact of patents and has a positive effect. Additionally, the number of owners, the number of backward citations, and the number of independent claims are all crucial and have a positive influence on predicting technical impact. In predicting the social impact of patents, the number of applicants is considered the most critical input variable, but it has a negative impact on social impact. At the same time, the number of independent claims, the number of owners, and the number of backward citations are also important predictive factors, and they have a positive effect on social impact. For predicting the economic impact of patents, the number of independent claims is considered the most important factor and has a positive impact on economic impact. The number of owners, the number of sibling countries or regions, and the size of the extended patent family also have a positive influence on economic impact. The study primarily relies on data from the United States Patent and Trademark Office for artificial intelligence patents. Future research could consider more comprehensive data sources, including artificial intelligence patent data, from a global perspective. While the study takes into account various factors, there may still be other important features not considered. In the future, factors such as patent implementation and market applications may be considered as they could have an impact on the influence of patents.

Keywords: patent influence, interpretable machine learning, predictive models, SHAP

Procedia PDF Downloads 39

2 The Proposal for a Framework to Face Opacity and Discrimination ‘Sins’ Caused by Consumer Creditworthiness Machines in the EU

Authors: Diogo José Morgado Rebelo, Francisco António Carneiro Pacheco de Andrade, Paulo Jorge Freitas de Oliveira Novais

Abstract:

Not everything in AI-power consumer credit scoring turns out to be a wonder. When using AI in Creditworthiness Assessment (CWA), opacity and unfairness ‘sins’ must be considered to the task be deemed Responsible. AI software is not always 100% accurate, which can lead to misclassification. Discrimination of some groups can be exponentiated. A hetero personalized identity can be imposed on the individual(s) affected. Also, autonomous CWA sometimes lacks transparency when using black box models. However, for this intended purpose, human analysts ‘on-the-loop’ might not be the best remedy consumers are looking for in credit. This study seeks to explore the legality of implementing a Multi-Agent System (MAS) framework in consumer CWA to ensure compliance with the regulation outlined in Article 14(4) of the Proposal for an Artificial Intelligence Act (AIA), dated 21 April 2021 (as per the last corrigendum by the European Parliament on 19 April 2024), Especially with the adoption of Art. 18(8)(9) of the EU Directive 2023/2225, of 18 October, which will go into effect on 20 November 2026, there should be more emphasis on the need for hybrid oversight in AI-driven scoring to ensure fairness and transparency. In fact, the range of EU regulations on AI-based consumer credit will soon impact the AI lending industry locally and globally, as shown by the broad territorial scope of AIA’s Art. 2. Consequently, engineering the law of consumer’s CWA is imperative. Generally, the proposed MAS framework consists of several layers arranged in a specific sequence, as follows: firstly, the Data Layer gathers legitimate predictor sets from traditional sources; then, the Decision Support System Layer, whose Neural Network model is trained using k-fold Cross Validation, provides recommendations based on the feeder data; the eXplainability (XAI) multi-structure comprises Three-Step-Agents; and, lastly, the Oversight Layer has a 'Bottom Stop' for analysts to intervene in a timely manner. From the analysis, one can assure a vital component of this software is the XAY layer. It appears as a transparent curtain covering the AI’s decision-making process, enabling comprehension, reflection, and further feasible oversight. Local Interpretable Model-agnostic Explanations (LIME) might act as a pillar by offering counterfactual insights. SHapley Additive exPlanation (SHAP), another agent in the XAI layer, could address potential discrimination issues, identifying the contribution of each feature to the prediction. Alternatively, for thin or no file consumers, the Suggestion Agent can promote financial inclusion. It uses lawful alternative sources such as the share of wallet, among others, to search for more advantageous solutions to incomplete evaluation appraisals based on genetic programming. Overall, this research aspires to bring the concept of Machine-Centered Anthropocentrism to the table of EU policymaking. It acknowledges that, when put into service, credit analysts no longer exert full control over the data-driven entities programmers have given ‘birth’ to. With similar explanatory agents under supervision, AI itself can become self-accountable, prioritizing human concerns and values. AI decisions should not be vilified inherently. The issue lies in how they are integrated into decision-making and whether they align with non-discrimination principles and transparency rules.

Keywords: creditworthiness assessment, hybrid oversight, machine-centered anthropocentrism, EU policymaking

Procedia PDF Downloads 30

1 Towards Dynamic Estimation of Residential Building Energy Consumption in Germany: Leveraging Machine Learning and Public Data from England and Wales

Authors: Philipp Sommer, Amgad Agoub

Abstract:

The construction sector significantly impacts global CO₂ emissions, particularly through the energy usage of residential buildings. To address this, various governments, including Germany's, are focusing on reducing emissions via sustainable refurbishment initiatives. This study examines the application of machine learning (ML) to estimate energy demands dynamically in residential buildings and enhance the potential for large-scale sustainable refurbishment. A major challenge in Germany is the lack of extensive publicly labeled datasets for energy performance, as energy performance certificates, which provide critical data on building-specific energy requirements and consumption, are not available for all buildings or require on-site inspections. Conversely, England and other countries in the European Union (EU) have rich public datasets, providing a viable alternative for analysis. This research adapts insights from these English datasets to the German context by developing a comprehensive data schema and calibration dataset capable of predicting building energy demand effectively. The study proposes a minimal feature set, determined through feature importance analysis, to optimize the ML model. Findings indicate that ML significantly improves the scalability and accuracy of energy demand forecasts, supporting more effective emissions reduction strategies in the construction industry. Integrating energy performance certificates into municipal heat planning in Germany highlights the transformative impact of data-driven approaches on environmental sustainability. The goal is to identify and utilize key features from open data sources that significantly influence energy demand, creating an efficient forecasting model. Using Extreme Gradient Boosting (XGB) and data from energy performance certificates, effective features such as building type, year of construction, living space, insulation level, and building materials were incorporated. These were supplemented by data derived from descriptions of roofs, walls, windows, and floors, integrated into three datasets. The emphasis was on features accessible via remote sensing, which, along with other correlated characteristics, greatly improved the model's accuracy. The model was further validated using SHapley Additive exPlanations (SHAP) values and aggregated feature importance, which quantified the effects of individual features on the predictions. The refined model using remote sensing data showed a coefficient of determination (R²) of 0.64 and a mean absolute error (MAE) of 4.12, indicating predictions based on efficiency class 1-100 (G-A) may deviate by 4.12 points. This R² increased to 0.84 with the inclusion of more samples, with wall type emerging as the most predictive feature. After optimizing and incorporating related features like estimated primary energy consumption, the R² score for the training and test set reached 0.94, demonstrating good generalization. The study concludes that ML models significantly improve prediction accuracy over traditional methods, illustrating the potential of ML in enhancing energy efficiency analysis and planning. This supports better decision-making for energy optimization and highlights the benefits of developing and refining data schemas using open data to bolster sustainability in the building sector. The study underscores the importance of supporting open data initiatives to collect similar features and support the creation of comparable models in Germany, enhancing the outlook for environmental sustainability.

Keywords: machine learning, remote sensing, residential building, energy performance certificates, data-driven, heat planning

Procedia PDF Downloads 49