Search results for: predictive data mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 25287

24387 A New DIDS Design Based on a Combination Feature Selection Approach

Authors: Adel Sabry Eesa, Adnan Mohsin Abdulazeez Brifcani, Zeynep Orman

Abstract:

Feature selection has been used in many fields, such as classification, data mining, and object recognition, and has proven effective at removing irrelevant and redundant features from the original data set. This paper proposes a new design for a distributed intrusion detection system that uses a combined feature selection model based on the bees algorithm and a decision tree. The bees algorithm serves as the search strategy to find the optimal subset of features, whereas the decision tree is used to judge the selected features. Both the selected features and the generated rules are used by a Decision Making Mobile Agent to decide whether there is an attack on the network. The Decision Making Mobile Agent migrates through the network, moving from one node to another; if it finds an attack on one of the nodes, it either alerts the user through the User Interface Agent or takes action through the Action Mobile Agent. The KDD Cup 99 data set is used to test the effectiveness of the proposed system. The results show that, even with only four features, the proposed system performs better than when all 41 features are used.

Keywords: distributed intrusion detection system, mobile agent, feature selection, bees algorithm, decision tree
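The wrapper idea in this abstract (a search strategy proposes feature subsets, an evaluator scores them) can be sketched roughly as follows. This is only an illustration under stated assumptions: plain random search stands in for the bees algorithm, and the hypothetical `toy_score` callback stands in for the accuracy of a decision tree trained on the candidate subset. Neither is the authors' implementation.

```python
import random
from typing import Callable, FrozenSet, Tuple


def random_subset_search(
    n_features: int,
    subset_size: int,
    score: Callable[[FrozenSet[int]], float],
    iterations: int = 200,
    seed: int = 0,
) -> Tuple[FrozenSet[int], float]:
    """Wrapper-style feature selection with random search.

    Assumption: random search is a simplified stand-in for the bees
    algorithm; `score` plays the role of the decision-tree evaluator.
    """
    rng = random.Random(seed)
    best_subset: FrozenSet[int] = frozenset()
    best_score = float("-inf")
    for _ in range(iterations):
        # Propose a candidate subset of feature indices.
        candidate = frozenset(rng.sample(range(n_features), subset_size))
        candidate_score = score(candidate)
        # Keep the best-scoring subset seen so far.
        if candidate_score > best_score:
            best_subset, best_score = candidate, candidate_score
    return best_subset, best_score


def toy_score(subset: FrozenSet[int]) -> float:
    """Hypothetical evaluator: pretends only features 0 and 3 are informative."""
    return len(subset & {0, 3}) / 2.0


best, value = random_subset_search(n_features=6, subset_size=2, score=toy_score)
print(sorted(best), value)
```

In a real setting the `score` callback would train and validate a classifier on the chosen columns, which is where most of the run time goes.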

Procedia PDF Downloads 388
24386 Charting Sentiments with Naive Bayes and Logistic Regression

Authors: Jummalla Aashrith, N. L. Shiva Sai, K. Bhavya Sri

Abstract:

The rapid progress of web technology has produced a vast and fast-growing volume of internet data. The internet has become a dynamic hub for online education, idea dissemination, and opinion sharing. In particular, the widely used social networking platform Twitter is expanding considerably, allowing users to share viewpoints, take part in discussions across diverse communities, and broadcast messages on a global scale. This rise in online engagement has sparked significant interest in subjectivity analysis, particularly of Twitter data. This research investigates sentiment analysis with a specific focus on Twitter. It aims to offer insight into interpreting the information in tweets, where opinions are expressed in a highly unstructured and diverse manner, ranging from positive to negative and occasionally neutral. We present a comprehensive exploration and comparative assessment of modern approaches to opinion mining, applying machine learning algorithms such as Naive Bayes and Logistic Regression to Twitter data streams. We also discuss the overarching challenges and applications of subjectivity analysis on Twitter.

Keywords: machine learning, sentiment analysis, visualisation, python
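To make the Naive Bayes side of this comparison concrete, here is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing. The four training sentences are invented toy data, not tweets from the study, and the whitespace tokenizer is a simplifying assumption.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs):
    """Count documents per label and words per label from (text, label) pairs."""
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        label_counts[label] += 1
        for word in text.lower().split():  # naive whitespace tokenizer
            word_counts[label][word] += 1
            vocab.add(word)
    return label_counts, word_counts, vocab


def predict(text, model):
    """Return the label with the highest log-posterior for `text`."""
    label_counts, word_counts, vocab = model
    total_docs = sum(label_counts.values())
    best_label, best_logp = None, float("-inf")
    for label, n_docs in label_counts.items():
        logp = math.log(n_docs / total_docs)  # log prior
        total_words = sum(word_counts[label].values())
        for word in text.lower().split():
            if word not in vocab:
                continue  # ignore words never seen in training
            # Add-one smoothing keeps unseen (label, word) pairs non-zero.
            logp += math.log(
                (word_counts[label][word] + 1) / (total_words + len(vocab))
            )
        if logp > best_logp:
            best_label, best_logp = label, logp
    return best_label


# Hypothetical toy training data.
toy_docs = [
    ("great phone love it", "positive"),
    ("love this great day", "positive"),
    ("terrible service hate it", "negative"),
    ("hate this awful day", "negative"),
]
model = train_naive_bayes(toy_docs)
print(predict("love great", model))
print(predict("awful terrible", model))
```

Logistic Regression would replace the counting step with a learned weight per word, which is the contrast the abstract's comparison turns on.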

Procedia PDF Downloads 37
24385 Monitoring Large-Coverage Forest Canopy Height by Integrating LiDAR and Sentinel-2 Images

Authors: Xiaobo Liu, Rakesh Mishra, Yun Zhang

Abstract:

Continuous monitoring of forest canopy height with large coverage is essential for obtaining forest carbon stocks and emissions, quantifying biomass estimation, analyzing vegetation coverage, and determining biodiversity. LiDAR can be used to collect accurate woody vegetation structure such as canopy height. However, LiDAR’s coverage is usually limited because of its high cost and limited maneuverability, which constrains its use for dynamic and large area forest canopy monitoring. On the other hand, optical satellite images, like Sentinel-2, have the ability to cover large forest areas with a high repeat rate, but they do not have height information. Hence, exploring the solution of integrating LiDAR data and Sentinel-2 images to enlarge the coverage of forest canopy height prediction and increase the prediction repeat rate has been an active research topic in the environmental remote sensing community. In this study, we explore the potential of training a Random Forest Regression (RFR) model and a Convolutional Neural Network (CNN) model, respectively, to develop two predictive models for predicting and validating the forest canopy height of the Acadia Forest in New Brunswick, Canada, with a 10m ground sampling distance (GSD), for the year 2018 and 2021. Two 10m airborne LiDAR-derived canopy height models, one for 2018 and one for 2021, are used as ground truth to train and validate the RFR and CNN predictive models. To evaluate the prediction performance of the trained RFR and CNN models, two new predicted canopy height maps (CHMs), one for 2018 and one for 2021, are generated using the trained RFR and CNN models and 10m Sentinel-2 images of 2018 and 2021, respectively. The two 10m predicted CHMs from Sentinel-2 images are then compared with the two 10m airborne LiDAR-derived canopy height models for accuracy assessment. 
The validation results show that, for 2018, the mean absolute error (MAE) is 2.93 m for the RFR model and 1.71 m for the CNN model, while for 2021 the MAE is 3.35 m for the RFR model and 3.78 m for the CNN model. These results demonstrate the feasibility of using the RFR and CNN models developed in this research to predict large-coverage forest canopy height at 10 m spatial resolution with a high revisit rate.

Keywords: remote sensing, forest canopy height, LiDAR, Sentinel-2, artificial intelligence, random forest regression, convolutional neural network
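The accuracy figures in this abstract are mean absolute errors between LiDAR-derived and predicted canopy heights. A minimal sketch of that metric follows; the height values are made-up illustration data, not the study's pixels.

```python
def mean_absolute_error(reference, predicted):
    """Average of |reference - predicted| over paired samples (same units, e.g. metres)."""
    if len(reference) != len(predicted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(r - p) for r, p in zip(reference, predicted)) / len(reference)


# Hypothetical canopy heights in metres for four pixels.
lidar_heights = [10.0, 12.0, 20.0, 5.0]
predicted_heights = [12.0, 11.0, 17.0, 5.0]
mae = mean_absolute_error(lidar_heights, predicted_heights)
print(mae)  # absolute errors are 2, 1, 3, 0, so the mean is 1.5
```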

Procedia PDF Downloads 68
24384 Measurement and Prediction of Speed of Sound in Petroleum Fluids

Authors: S. Ghafoori, A. Al-Harbi, B. Al-Ajmi, A. Al-Shaalan, A. Al-Ajmi, M. Ali Juma

Abstract:

Seismic methods play an important role in the exploration for hydrocarbon reservoirs. However, the success of these methods depends strongly on the reliability of the measured or predicted velocity of sound in the media. Speed of sound has also been used to study the thermodynamic properties of fluids. In this study, experimental data on the speed of sound in a binary mixture of toluene and octane are reported and analyzed. A three-factor, three-level Box-Behnken design is used to determine the significance of each factor, the synergetic effects of the factors, and the factors with the greatest effect on speed of sound. The developed mathematical model and statistical analysis provide a critical analysis of the simultaneous interactive effects of the independent variables and indicate that the developed quadratic models are highly accurate and predictive.

Keywords: experimental design, octane, speed of sound, toluene
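For readers unfamiliar with the experimental design named here, the sketch below generates the coded runs of a three-factor Box-Behnken design: each pair of factors takes the levels -1 and +1 while the remaining factor sits at its mid level 0, plus replicated center runs. The number of center points is an assumption, since the abstract does not state it. The pairwise construction is written with three factors in mind.

```python
from itertools import combinations, product


def box_behnken(n_factors=3, center_points=1):
    """Coded runs (-1, 0, +1) of a pairwise Box-Behnken design."""
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            run = [0] * n_factors  # all other factors at the mid level
            run[i], run[j] = a, b
            runs.append(tuple(run))
    # Replicated center runs are used to estimate pure error.
    runs.extend([(0,) * n_factors] * center_points)
    return runs


design = box_behnken(3, center_points=3)
for run in design:
    print(run)
```

With three factors this gives 12 edge-midpoint runs plus the chosen number of center runs, enough to fit a full quadratic model.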

Procedia PDF Downloads 255
24383 Analysis of Big Data

Authors: Sandeep Sharma, Sarabjit Singh

Abstract:

With rising user demand and the growth of large volumes of free data, protecting, storing, and retrieving data is becoming increasingly challenging for storage solutions. The day may not be far off when storage companies and organizations start refusing to store our valuable data, or start charging large amounts for its storage and protection. At the same time, environmental concerns such as global warming make it challenging to establish and maintain new data warehouses and data centers. The challenge of small data is over; the challenge now is how to manage the exponential growth of data. In this paper, we analyze the growth trend of big data and its future implications. We also examine the impact of unstructured data on various concerns and suggest some possible remedies to streamline big data.

Keywords: big data, unstructured data, volume, variety, velocity

Procedia PDF Downloads 527
24382 A Study of the Performance Parameter for Recommendation Algorithm Evaluation

Authors: C. Rana, S. K. Jain

Abstract:

In the past few years, the enormous amount of Web data has made it difficult to use efficiently. A range of techniques has been applied to tackle this problem, prominent among them personalization and recommender systems. These tools assist users in finding relevant information on the web, and most e-commerce websites apply them in one way or another. In the past decade, a large number of recommendation algorithms have been proposed. However, there has not been much research on evaluation criteria for these algorithms. As a result, traditional accuracy and classification metrics, which provide only a static view, are still used for evaluation. This paper studies how the evolution of user preferences over time can be captured in a recommender system using a new evaluation methodology that explicitly uses the time dimension. We also present the different types of experimental setup generally used for recommender system evaluation. Furthermore, we discuss in detail the major accuracy metrics, as well as metrics beyond accuracy that have been researched in the past few years.

Keywords: collaborative filtering, data mining, evolutionary, clustering, algorithm, recommender systems
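One simple way to bring the time dimension into evaluation is to train on interactions before a cutoff and measure accuracy on interactions after it. The sketch below shows such a temporal split together with precision at k. It is an illustration of the general idea, not the methodology proposed in the paper, and the interaction tuples are hypothetical.

```python
def temporal_split(interactions, cutoff):
    """Split (user, item, timestamp) tuples into past (train) and future (test)."""
    train = [x for x in interactions if x[2] < cutoff]
    test = [x for x in interactions if x[2] >= cutoff]
    return train, test


def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are in the relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


# Hypothetical interactions for one user: (user, item, timestamp).
interactions = [("u1", "a", 1), ("u1", "b", 2), ("u1", "c", 5), ("u1", "d", 6)]
train, test = temporal_split(interactions, cutoff=4)
relevant = {item for _, item, _ in test}  # what the user did later
print(precision_at_k(["c", "x", "d"], relevant, k=2))
```

Repeating the split at several cutoffs gives a picture of how a recommender tracks changing preferences, rather than a single static score.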

Procedia PDF Downloads 394
24381 Quantitative Structure-Activity Relationship Study of Some Quinoline Derivatives as Antimalarial Agents

Authors: M. Ouassaf, S. Belaid

Abstract:

A series of quinoline derivatives with antimalarial activity were subjected to two-dimensional quantitative structure-activity relationship (2D-QSAR) studies. Three models were built using multiple linear regression (MLR), partial least squares (PLS) regression, and multiple nonlinear regression (MNLR) to see which descriptors are closely related to the biological activity. Principal component analysis (PCA) was also used. Based on our results, a comparison of the quality of the MLR, PLS, and MNLR models shows that the MNLR model (R = 0.914, R² = 0.835, RCV = 0.853) has substantially better predictive capability than the MLR model (R = 0.835, R² = 0.752, RCV = 0.601) and the PLS model (R = 0.742, R² = 0.552, RCV = 0.550). The MNLR model gave statistically significant results and showed good stability to data variation in leave-one-out cross-validation. The results suggest that the proposed MNLR model may be useful for predicting the biological activity of quinoline derivatives.

Keywords: antimalarial, quinoline, QSAR, PCA, MLR, MNLR
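Leave-one-out cross-validation, mentioned in this abstract, refits the model once per compound with that compound held out and then predicts it. The sketch below does this for a single-descriptor linear model and reports a cross-validated Q² computed as 1 - PRESS / SS_tot. Both the one-descriptor model and the data are assumptions for illustration; the abstract's RCV may follow a different convention.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def loo_q2(xs, ys):
    """Leave-one-out Q^2 = 1 - PRESS / SS_tot (SS_tot uses the overall mean)."""
    n = len(ys)
    mean_y = sum(ys) / n
    press = 0.0
    for i in range(n):
        # Refit without sample i, then predict the held-out sample.
        slope, intercept = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / ss_tot


# Hypothetical descriptor values and activities.
descriptor = [1.0, 2.0, 3.0, 4.0, 5.0]
activity = [3.1, 4.9, 7.2, 8.8, 11.1]
print(loo_q2(descriptor, activity))
```

A Q² well below the fitted R² is the usual sign that a model is overfitting the training compounds.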

Procedia PDF Downloads 138
24380 Genome-Wide Mining of Potential Guide RNAs for Streptococcus pyogenes and Neisseria meningitides CRISPR-Cas Systems for Genome Engineering

Authors: Farahnaz Sadat Golestan Hashemi, Mohd Razi Ismail, Mohd Y. Rafii

Abstract:

The clustered regularly interspaced short palindromic repeats (CRISPR) and CRISPR-associated protein (Cas) system can facilitate targeted genome editing in organisms. A dual or single guide RNA (gRNA) can program the Cas9 nuclease to cut target DNA at particular sites, introducing precise mutations either via error-prone non-homologous end-joining repair or by incorporating foreign DNA through homologous recombination between donor DNA and the target area. Despite high demand for this promising technology, developing a well-organized procedure for reliably mining potential gRNA target sites in large genomic data is still challenging. Hence, we aimed to perform high-throughput detection of target sites by specific PAMs, not only for the common Streptococcus pyogenes (SpCas9) system but also for the Neisseria meningitides (NmCas9) CRISPR-Cas system. Previous research has confirmed the successful application of such RNA-guided Cas9 orthologs for effective gene targeting and subsequent genome manipulation. However, each Cas9 ortholog needs its particular PAM sequence for DNA cleavage activity. Activity levels depend on the sequence of the protospacer and on specific combinations of favorable PAM bases. Therefore, based on the specific length and sequence of the PAM, followed by a constant target-site length for each of the two Cas9 orthologs, we created a reliable procedure to explore possible gRNA sequences. To mine CRISPR target sites, four different modes of searching for sgRNA binding to the target DNA strand were applied: i) coding strand searching, ii) anti-coding strand searching, iii) both strand searching, and iv) paired-gRNA searching. Finally, a complete list of all potential gRNAs, along with their locations, strands, and PAM sequence orientation, can be provided for both SpCas9 and another potential Cas9 ortholog (NmCas9).
The artificial design of potential gRNAs in a genome of interest can accelerate functional genomic studies. Consequently, the application of this novel genome editing tool (CRISPR/Cas technology) will be enhanced by its increased versatility and efficiency.

Keywords: CRISPR/Cas9 genome editing, gRNA mining, SpCas9, NmCas9
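The coding-strand and both-strand search modes described above can be sketched as a pattern scan for a protospacer followed by a PAM. The PAM patterns and protospacer lengths below (NGG with 20 nt for SpCas9, NNNNGATT with 24 nt for NmCas9) are commonly cited values and are assumptions here, since the abstract does not state them. The function name and output layout are hypothetical, and paired-gRNA searching is not covered.

```python
import re

COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Assumed PAM patterns and protospacer lengths (not given in the abstract).
SPCAS9 = {"pam": "[ACGT]GG", "length": 20}
NMCAS9 = {"pam": "[ACGT]{4}GATT", "length": 24}


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_targets(sequence, pam, length, both_strands=True):
    """Return (start, strand, protospacer, pam) for every protospacer + PAM site.

    `start` is 0-based on the scanned strand, so "-" hits are positions on
    the reverse-complemented sequence.
    """
    sequence = sequence.upper()
    strands = [("+", sequence)]
    if both_strands:
        strands.append(("-", reverse_complement(sequence)))
    # The lookahead lets overlapping candidate sites all be reported.
    pattern = re.compile(r"(?=([ACGT]{%d})(%s))" % (length, pam))
    hits = []
    for strand, seq in strands:
        for match in pattern.finditer(seq):
            hits.append((match.start(), strand, match.group(1), match.group(2)))
    return hits


toy_sequence = "A" * 20 + "TGG"  # hypothetical 23-nt sequence
sp_hits = find_targets(toy_sequence, SPCAS9["pam"], SPCAS9["length"])
print(sp_hits)
```

Passing `both_strands=False` corresponds to coding-strand-only searching. An anti-coding-only mode would scan just the reverse complement.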

Procedia PDF Downloads 240
24379 Knowledge, Attitude and Practice of Anemia among Females Attending Bolan Medical Complex Quetta, Balochistan

Authors: A. Abdullah, N. ul Haq, A. Nasim

Abstract:

Objectives: This study aimed to assess the knowledge, attitude, and practice (KAP) of anemia among females attending Bolan Medical Complex Quetta, Balochistan. Methods: A quantitative cross-sectional study was conducted using a questionnaire with three dimensions: knowledge (15 questions), attitude (5 questions), and practice (4 questions). All females attending Bolan Medical Complex Quetta, Balochistan were approached for the study. Descriptive statistics were used to describe the demographic and KAP-related characteristics of the females regarding anemia. All data were analyzed using SPSS (Statistical Package for the Social Sciences) version 20.0. Results: Data were collected from six hundred and thirteen (613) participants. The largest group of respondents (n=180, 29.4%) fell in the 29-33 years age group. Knowledge regarding anemia was reported for n=564 (91.9%), attitude for n=516 (84.0%), and practice for n=437 (71.3%). Multivariate analysis revealed a negative correlation between attitude and practice (P= -0.040), and a significant value (0.001) was found between knowledge and attitude. Occupation and reason for diagnosis were not predictive of better KAP. Conclusions: Knowledge, attitude, and practice of anemia showed a satisfactory response in this study. Furthermore, the study findings indicate the need for health promotion among females. Improving nutritional knowledge and information related to anemia can result in better control and management.

Keywords: anemia, knowledge attitude and practice, females, college

Procedia PDF Downloads 179
24378 Biofilm Text Classifiers Developed Using Natural Language Processing and Unsupervised Learning Approach

Authors: Kanika Gupta, Ashok Kumar

Abstract:

Biofilms are dense, highly hydrated cell clusters that are irreversibly attached to a substratum, to an interface, or to each other, and are embedded in a self-produced gelatinous matrix composed of extracellular polymeric substances. Research in the biofilm field has become very significant, as biofilms have shown high mechanical resilience and resistance to antibiotic treatment and constitute a significant problem in healthcare and in other industries related to microorganisms. The information, both stated and hidden, in the biofilm literature is growing exponentially, so it is not feasible for researchers and practitioners to extract and relate information from different written resources by hand. The current work therefore proposes and discusses the use of text mining techniques to extract information from a biofilm literature corpus containing 34306 documents. It is very difficult and expensive to obtain annotated material for biomedical literature because the literature is unstructured, i.e., free text. We therefore adopted an unsupervised approach, in which no annotated training data are necessary, and used it to develop a system that classifies text on the basis of growth and development, drug effects, radiation effects, classification, and physiology of biofilms. For this, a two-step structure was used: the first step extracts keywords from the biofilm literature using a metathesaurus and standard natural language processing tools such as Rapid Miner_v5.3, and the second step discovers relations between the genes extracted from the whole set of biofilm literature using pubmed.mineR_v1.0.11. We applied the unsupervised approach, which is the machine learning task of inferring a function to describe hidden structure from 'unlabeled' data, to the extracted datasets to develop classifiers using the WinPython-64 bit_v3.5.4.0Qt5 and R studio_v0.99.467 packages, which automatically classify the text using the mentioned sets.
The developed classifiers were tested on a large data set of biofilm literature which showed that the unsupervised approach proposed is promising as well as suited for a semi-automatic labeling of the extracted relations. The entire information was stored in the relational database which was hosted locally on the server. The generated biofilm vocabulary and genes relations will be significant for researchers dealing with biofilm research, making their search easy and efficient as the keywords and genes could be directly mapped with the documents used for database development.

Keywords: biofilms literature, classifiers development, text mining, unsupervised learning approach, unstructured data, relational database

Procedia PDF Downloads 151
24377 A Hierarchical Method for Multi-Class Probabilistic Classification Vector Machines

Authors: P. Byrnes, F. A. DiazDelaO

Abstract:

The Support Vector Machine (SVM) has become widely recognised as one of the leading algorithms in machine learning for both regression and binary classification. It expresses predictions in terms of a linear combination of kernel functions, referred to as support vectors. Despite its popularity amongst practitioners, SVM has some limitations, with the most significant being the generation of point prediction as opposed to predictive distributions. Stemming from this issue, a probabilistic model namely, Probabilistic Classification Vector Machines (PCVM), has been proposed which respects the original functional form of SVM whilst also providing a predictive distribution. As physical system designs become more complex, an increasing number of classification tasks involving industrial applications consist of more than two classes. Consequently, this research proposes a framework which allows for the extension of PCVM to a multi class setting. Additionally, the original PCVM framework relies on the use of type II maximum likelihood to provide estimates for both the kernel hyperparameters and model evidence. In a high dimensional multi class setting, however, this approach has been shown to be ineffective due to bad scaling as the number of classes increases. Accordingly, we propose the application of Markov Chain Monte Carlo (MCMC) based methods to provide a posterior distribution over both parameters and hyperparameters. The proposed framework will be validated against current multi class classifiers through synthetic and real life implementations.

Keywords: probabilistic classification vector machines, multi class classification, MCMC, support vector machines

Procedia PDF Downloads 213
24376 The Problem of the Use of Learning Analytics in Distance Higher Education: An Analytical Study of the Open and Distance University System in Mexico

Authors: Ismene Ithai Bras-Ruiz

Abstract:

Learning Analytics (LA) is employed by universities not only as a tool but as a specialized field to support students and professors. However, not all academic programs apply LA with the same goal or use the same tools. In fact, LA comprises five main fields of study (academic analytics, action research, educational data mining, recommender systems, and personalized systems). These fields can help not just to inform academic authorities about the situation of a program, but also to detect at-risk students, professors with needs, or general problems. The highest level applies Artificial Intelligence techniques to support learning practices. LA has adopted different techniques: statistics, ethnography, data visualization, machine learning, natural language processing, and data mining. Each academic program is expected to decide which field to use on the basis of its academic interests and also its capacities in terms of professors, administrators, systems, logistics, data analysts, and academic goals. The Open and Distance University System (SUAYED in Spanish) of the National Autonomous University of Mexico (UNAM) has been working for forty years as an alternative to traditional programs; one of its main supports has been the use of new information and communications technologies (ICT). Today, UNAM has one of the largest networks of higher education programs, with twenty-six academic programs in different faculties. This means that every faculty works with heterogeneous populations and academic problems. In this sense, every program has developed its own Learning Analytics techniques to address academic issues. In this context, an investigation was carried out to determine the state of LA application across all the academic programs in the different faculties.
The premise of the study was that not all the faculties have used advanced LA techniques and that they probably do not know which field of study is closest to their program goals. Consequently, not all the programs know about LA, but this does not mean they do not work with LA in a veiled or less explicit sense. It is very important to know the level of knowledge about LA for two reasons: 1) it allows the work of the administration to improve the quality of teaching to be appreciated, and 2) it shows whether other LA techniques can be improved. For this purpose, three instruments were designed to determine experience and knowledge in LA. These were applied to ten faculty coordinators and their personnel; thirty members were consulted (academic secretary, systems manager or data analyst, and program coordinator). The final report showed that almost all the programs work with basic statistical tools and techniques. This helps the administration only to know what is happening inside the academic program; the programs are not ready to move up to the next level, which means applying Artificial Intelligence or Recommender Systems to reach a personalized learning system. This situation is not related to knowledge of LA, but to the clarity of the long-term goals.

Keywords: academic improvements, analytical techniques, learning analytics, personnel expertise

Procedia PDF Downloads 113
24375 Character Development Outcomes: A Predictive Model for Behaviour Analysis in Tertiary Institutions

Authors: Rhoda N. Kayongo

Abstract:

As behavior analysts in education continue to debate on how higher institutions can continue to benefit from their social and academic related programs, higher education is facing challenges in the area of character development. This is manifested in the percentages of college completion rates, teen pregnancies, drug abuse, sexual abuse, suicide, plagiarism, lack of academic integrity, and violence among their students. Attending college is a perceived opportunity to positively influence the actions and behaviors of the next generation of society; thus colleges and universities have to provide opportunities to develop students’ values and behaviors. Prior studies were mainly conducted in private institutions and more so in developed countries. However, with the complexity of the nature of student body currently due to the changing world, a multidimensional approach combining multiple factors that enhance character development outcomes is needed to suit the changing trends. The main purpose of this study was to identify opportunities in colleges and develop a model for predicting character development outcomes. A survey questionnaire composed of 7 scales including in-classroom interaction, out-of-classroom interaction, school climate, personal lifestyle, home environment, and peer influence as independent variables and character development outcomes as the dependent variable was administered to a total of five hundred and one students of 3rd and 4th year level in selected public colleges and universities in the Philippines and Rwanda. Using structural equation modelling, a predictive model explained 57% of the variance in character development outcomes. Findings from the results of the analysis showed that in-classroom interactions have a substantial direct influence on character development outcomes of the students (r = .75, p < .05). 
In addition, out-of-classroom interaction, school climate, and home environment contributed to students’ character development outcomes, but only indirectly. The study concluded that the classroom offers many opportunities for teachers to teach, model, and integrate character development among their students. Thus, public colleges and universities are advised to deliberately strengthen and implement experiences that cultivate character within the classroom. These may contribute tremendously to students' character development outcomes and hence yield effective models of behaviour analysis in higher education.

Keywords: character development, tertiary institutions, predictive model, behavior analysis

Procedia PDF Downloads 120
24374 The Predictive Power of Successful Scientific Theories: An Explanatory Study on Their Substantive Ontologies through Theoretical Change

Authors: Damian Islas

Abstract:

Debates on realism in science concern two different questions: (I) whether the unobservable entities posited by theories can be known; and (II) whether any knowledge we have of them is objective or not. Question (I) arises from the doubt that, since observation is the basis of all our factual knowledge, unobservable entities cannot be known. Question (II) arises from the doubt that, since scientific representations are inextricably laden with the subjective, idiosyncratic, and a priori features of human cognition and scientific practice, they cannot convey any reliable information on how their objects are in themselves. One way of understanding scientific realism is through three lines of inquiry: ontological, semantic, and epistemological. Ontologically, scientific realism asserts the existence of a world independent of the human mind. Semantically, scientific realism assumes that theoretical claims about reality have truth values and, thus, should be construed literally. Epistemologically, scientific realism holds that theoretical claims offer us knowledge of the world. Nowadays, the literature on scientific realism has proceeded rather far beyond the realism versus antirealism debate. Structural realism (SR) represents a middle-ground position between the two, according to which science can attain justified true beliefs concerning relational facts about the unobservable realm but cannot attain justified true beliefs concerning the intrinsic nature of any objects occupying that realm. That is, the structural content of scientific theories about the unobservable can be known, but facts about the intrinsic nature of the entities that figure as place-holders in those structures cannot be known. There are two possible versions of SR: Epistemological Structural Realism (ESR) and Ontic Structural Realism (OSR).
On ESR, an agnostic stance is preserved with respect to the natures of unobservable entities, but the possibility of knowing the relations obtaining between those entities is affirmed. OSR includes the rather striking claim that when it comes to the unobservables theorized about within fundamental physics, relations exist, but objects do not. Focusing on ESR, questions arise concerning its ability to explain the empirical success of a theory. Empirical success certainly involves predictive success, and predictive success implies a theory’s power to make accurate predictions. But a theory’s power to make any predictions at all seems to derive precisely from its core axioms or laws concerning unobservable entities and mechanisms, and not simply from the sort of structural relations often expressed in equations. The specific challenge to ESR concerns its ability to explain the explanatory and predictive power of successful theories without appealing to their substantive ontologies, which are often not preserved by their successors. The response to this challenge will depend on the various, subtly different versions of the ESR and OSR stances, which show a progression from eliminativist OSR to moderate OSR, with a gradual increase in the ontological status accorded to objects. Knowing the relations between unobserved entities is methodologically identical to asserting that these relations between unobserved entities exist.

Keywords: eliminativist ontic structural realism, epistemological structuralism, moderate ontic structural realism, ontic structuralism

Procedia PDF Downloads 103
24373 Risk Based Maintenance Planning for Loading Equipment in Underground Hard Rock Mine: Case Study

Authors: Sidharth Talan, Devendra Kumar Yadav, Yuvraj Singh Rajput, Subhajit Bhattacharjee

Abstract:

The mining industry is known for its appetite to spend sizeable capital on mine equipment. However, in the current scenario, the industry is challenged by non-uniform geological conditions, uneven ore grade, uncontrollable and volatile mineral commodity prices, and the ever-increasing drive to optimize capital and operational costs. Thus, equipment reliability and maintenance planning play a significant role in increasing equipment availability for operations and, in turn, boosting mine productivity. This paper presents the Risk Based Maintenance (RBM) planning conducted on mine loading equipment, namely Load Haul Dumpers (LHDs), at Sindesar Khurd Mines, an underground zinc and lead mine in Dariba, Rajasthan, India, operated by Hindustan Zinc Limited, a subsidiary of Vedanta Resources Ltd. The mining equipment at the location is maintained by the Original Equipment Manufacturers (OEMs), namely Sandvik and Atlas Copco, who carry out the maintenance and inspection operations. Downtime data extracted for the equipment fleet over the 6 months from 1st January 2017 to 30th June 2017 revealed that three downtime issues, related to Engine, Hydraulics, and Transmission, contributed significantly and were common across the loading equipment fleet, as substantiated by Pareto Analysis. Further scrutiny of these factors through Bubble Matrix Analysis revealed the major influence of specific issues, namely Overheating, No Load Taken (NTL) issues, Gear Changing issues, and Hose Puncture and leakage issues. Using the equipment-wise analysis of all the downtime factors, the spares consumed, and the alarm logs extracted from the machines, technical design changes to the equipment and a pre-shift critical alarms checklist were proposed for equipment maintenance.
The given analysis is beneficial to allow OEMs or mine management to focus on the critical issues hampering the reliability of mine equipment and design necessary maintenance strategies to mitigate them.

Keywords: bubble matrix analysis, LHDs, OEMs, Pareto chart analysis, spares consumption matrix, critical alarms checklist

Procedia PDF Downloads 133
24372 The Curse of Natural Resources: An Empirical Analysis Applied to the Case of Copper Mining in Zambia

Authors: Chomba Kalunga

Abstract:

Many developing countries have a rich endowment of natural resources. Yet, amidst that wealth, living standards remain poor. At the same time, international markets have seen a surge in copper prices over the last twenty years. This paper presents findings on the causal economic impact of copper mining on living standards in Zambia, a country in sub-Saharan Africa endowed with vast copper deposits, using household data from 1996 to 2010 and exploiting an episode in which copper prices on the international market were rising. Using an Instrumental Variable approach and controlling for constituency-level and microeconomic factors, the results show a significant impact of copper production on living standards. After splitting the constituencies into those close to and those far away from the nearest mine, the results document that constituencies close to the mines benefited significantly from the increase in copper production, compared to their counterparts, through increased levels of employment. Finally, the results are not consistent with the natural resource curse hypothesis; findings show a positive causal relationship between the presence of natural resources and socioeconomic outcomes in less developed countries, particularly for constituencies close to the mines in Zambia. Some key policy implications follow from the findings. The finding that increased copper production led to an increase in employment suggests that, in Zambia's context, policies that promote local employment may be more beneficial to residents. This means that government policies can help improve living standards, and that the government needs to work towards making this impact more substantial.

Keywords: copper prices, local development, mining, natural resources

Procedia PDF Downloads 196
24371 Development of a Data-Driven Method for Diagnosing the State of Health of Battery Cells, Based on the Use of an Electrochemical Aging Model, with a View to Their Use in Second Life

Authors: Desplanches Maxime

Abstract:

Accurate estimation of the remaining useful life of lithium-ion batteries for electronic devices is crucial. Data-driven methodologies encounter challenges related to data volume and acquisition protocols, particularly in capturing a comprehensive range of aging indicators. To address these limitations, we propose a hybrid approach that integrates an electrochemical model with state-of-the-art data analysis techniques, yielding a comprehensive database. Our methodology involves infusing an aging phenomenon into a Newman model, leading to the creation of an extensive database capturing various aging states based on non-destructive parameters. This database serves as a robust foundation for subsequent analysis. Leveraging advanced data analysis techniques, notably principal component analysis and t-Distributed Stochastic Neighbor Embedding, we extract pivotal information from the data. This information is harnessed to construct a regression function using either random forest or support vector machine algorithms. The resulting predictor demonstrates a 5% error margin in estimating remaining battery life, providing actionable insights for optimizing usage. Furthermore, the database was built from the Newman model calibrated for aging and performance using data from a European project called Teesmat. The model was then initialized numerous times with different aging values, for instance, with varying thicknesses of SEI (Solid Electrolyte Interphase). This comprehensive approach ensures a thorough exploration of battery aging dynamics, enhancing the accuracy and reliability of our predictive model. Of particular importance is our reliance on the database generated through the integration of the electrochemical model. This database serves as a crucial asset in advancing our understanding of aging states. 
Beyond its capability for precise remaining life predictions, this database-driven approach offers valuable insights for optimizing battery usage and adapting the predictor to various scenarios. This underscores the practical significance of our method in facilitating better decision-making regarding lithium-ion battery management.

Keywords: Li-ion battery, aging, diagnostics, data analysis, prediction, machine learning, electrochemical model, regression

Procedia PDF Downloads 54
24370 Quantitative Structure Activity Relationship Model for Predicting the Aromatase Inhibition Activity of 1,2,3-Triazole Derivatives

Authors: M. Ouassaf, S. Belaidi

Abstract:

Aromatase is an estrogen biosynthetic enzyme belonging to the cytochrome P450 family, which catalyzes the limiting step in the conversion of androgens to estrogens, and it is therefore relevant to the promotion of tumor cell growth. A set of thirty 1,2,3-triazole derivatives was used in a quantitative structure activity relationship (QSAR) study using multiple linear regression (MLR). The data were divided into training and testing groups. The results showed a good predictive ability of the MLR model: the model was statistically robust internally (R² = 0.982), and its predictability was tested by several parameters, including external criteria (R²pred = 0.851, CCC = 0.946). The knowledge gained in this study should provide relevant information on the origins of aromatase inhibitory activity and, therefore, facilitate our ongoing quest for aromatase inhibitors with robust properties.
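
For readers unfamiliar with the reported statistics, the sketch below computes an ordinary coefficient of determination and Lin's concordance correlation coefficient (CCC) for a set of observed and predicted activities. It is an illustration under assumptions: the abstract does not state which exact definition of R²pred the authors used, and the function names and sample values here are hypothetical.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


def concordance_cc(observed, predicted):
    """Lin's concordance correlation coefficient (population moments)."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    return 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


# Hypothetical activity values (made up; not data from the study).
observed = [1.0, 2.0, 3.0, 4.0]
shifted = [2.0, 3.0, 4.0, 5.0]  # perfectly correlated but offset by 1

print(r_squared(observed, observed), concordance_cc(observed, observed))
print(r_squared(observed, shifted), concordance_cc(observed, shifted))
```

The shifted example shows why CCC is reported alongside a correlation-style measure: a constant offset leaves the correlation perfect but lowers the concordance, because CCC penalizes disagreement in location as well as scatter.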

Keywords: aromatase inhibitors, QSAR, MLR, 1,2,3-triazole

Procedia PDF Downloads 99
24369 Cleaning of Scientific References in Large Patent Databases Using Rule-Based Scoring and Clustering

Authors: Emiel Caron

Abstract:

Patent databases contain patent-related data, organized in a relational data model, and are used to produce various patent statistics. These databases store raw data about scientific references cited by patents. For example, Patstat holds references to tens of millions of scientific journal publications and conference proceedings. These references might be used to connect patent databases with bibliographic databases, e.g. to study the relation between science, technology, and innovation in various domains. A problem in such studies is the low data quality of the references, i.e. they are often ambiguous, unstructured, and incomplete. Moreover, a complete bibliographic reference is stored in only one attribute. Therefore, a computerized cleaning and disambiguation method for large patent databases is developed in this work. The method uses rule-based scoring and clustering. The rules are based on bibliographic metadata, retrieved from the raw data by regular expressions, and are transparent and adaptable. The rules, in combination with string similarity measures, are used to detect pairs of records that are potential duplicates. Thanks to the scoring, different rules can be combined to join scientific references, i.e. the rules reinforce each other. The scores are based on expert knowledge and initial method evaluation. After the scoring, pairs of scientific references that score above a certain threshold are clustered by means of a single-linkage clustering algorithm to form connected components. The method is designed to disambiguate all the scientific references in the Patstat database. The performance evaluation of the clustering method, on a large golden set with highly cited papers, shows on average a 99% precision and a 95% recall. The method is therefore accurate but careful, i.e. it weighs precision over recall. Consequently, separate clusters of high precision are sometimes formed when there is not enough evidence for connecting scientific references, e.g. in the case of missing year and journal information for a reference. The clusters produced by the method can be used to directly link the Patstat database with bibliographic databases such as the Web of Science or Scopus.
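
The pipeline described in this abstract (regular-expression metadata extraction, rule-based pair scoring, then single-linkage clustering into connected components) can be sketched as follows. Everything specific here is an assumption made for illustration: the regular expressions, the three rules and their weights, the threshold of 4, and the sample reference strings are hypothetical and are not the rules used by the author. The string similarity measures mentioned in the abstract are omitted to keep the sketch short.

```python
import re
from itertools import combinations

# Hypothetical extraction patterns; real reference strings need many more rules.
YEAR = re.compile(r"\b(19|20)\d{2}\b")
PAGES = re.compile(r"\b(\d+)\s*-\s*(\d+)\b")


def extract(ref):
    """Pull a few metadata fields out of a raw reference string."""
    year = YEAR.search(ref)
    pages = PAGES.search(ref)
    words = ref.split()
    return {
        "author": words[0].strip(",.").lower() if words else "",
        "year": year.group(0) if year else None,
        "first_page": pages.group(1) if pages else None,
    }


def score(a, b):
    """Sum the weights of all rules that fire; missing fields never match."""
    total = 0
    if a["author"] and a["author"] == b["author"]:
        total += 2
    if a["year"] and a["year"] == b["year"]:
        total += 1
    if a["first_page"] and a["first_page"] == b["first_page"]:
        total += 2
    return total


def cluster(refs, threshold=4):
    """Link every pair scoring at or above the threshold; return connected components."""
    parent = list(range(len(refs)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    meta = [extract(r) for r in refs]
    for i, j in combinations(range(len(refs)), 2):
        if score(meta[i], meta[j]) >= threshold:
            parent[find(i)] = find(j)

    groups = {}
    for i in range(len(refs)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Made-up reference strings for illustration only.
refs = [
    "Smith J, Nature, 1999, vol 401, 123-130",
    "SMITH, J. (1999) Nature 401: 123-130",
    "Smith J, Science, 2005, 55-60",
    "Jones A, Cell, 1999, 123-130",
]
print(cluster(refs))
```

With these weights, no single rule reaches the threshold on its own, which mirrors the abstract's point that rules reinforce each other and that the method prefers precision: a record with a missing year or page range stays in its own cluster rather than being merged on weak evidence.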

Keywords: clustering, data cleaning, data disambiguation, data mining, patent analysis, scientometrics

Procedia PDF Downloads 174
24368 Constructing a Physics Guided Machine Learning Neural Network to Predict Tonal Noise Emitted by a Propeller

Authors: Arthur D. Wiedemann, Christopher Fuller, Kyle A. Pascioni

Abstract:

With the introduction of electric motors, small unmanned aerial vehicle designers have to consider trade-offs between acoustic noise and thrust generated. Currently, there are few low-computational tools available for predicting acoustic noise emitted by a propeller into the far-field. Artificial neural networks offer a highly non-linear and adaptive model for predicting isolated and interactive tonal noise. But neural networks require large data sets, exceeding practical considerations in modeling experimental results. A methodology known as physics guided machine learning has been applied in this study to reduce the required data set to train the network. After building and evaluating several neural networks, the best model is investigated to determine how the network successfully predicts the acoustic waveform. Lastly, a post-network transfer function is developed to remove discontinuity from the predicted waveform. Overall, methodologies from physics guided machine learning show a notable improvement in prediction performance, but additional loss functions are necessary for constructing predictive networks on small datasets.

Keywords: aeroacoustics, machine learning, propeller, rotor, neural network, physics guided machine learning

Procedia PDF Downloads 201
24367 Educase–Intelligent System for Pedagogical Advising Using Case-Based Reasoning

Authors: Elionai Moura, José A. Cunha, César Analide

Abstract:

This work proposes a scheme for an Intelligent System applied to Pedagogical Advising using Case-Based Reasoning, which finds consolidated solutions used before and applies them to new problems, making the task of advising students easier for the pedagogical staff. Through this work, we intend to introduce the motivation behind the choices made for the system structure, justifying the development of an incremental and smart web system that learns the best solutions for new cases as it is used, and showing the techniques and technology involved.

Keywords: case-based reasoning, pedagogical advising, educational data-mining (EDM), machine learning

Procedia PDF Downloads 403
24366 6-Degree-Of-Freedom Spacecraft Motion Planning via Model Predictive Control and Dual Quaternions

Authors: Omer Burak Iskender, Keck Voon Ling, Vincent Dubanchet, Luca Simonini

Abstract:

This paper presents a Guidance and Control (G&C) strategy to approach and synchronize with potentially rotating targets. The proposed strategy generates and tracks a safe trajectory for space servicing missions, including tasks like approaching, inspecting, and capturing. The main objective of this paper is to validate the G&C laws using a Hardware-In-the-Loop (HIL) setup with realistic rendezvous and docking equipment. Throughout this work, the assumption of full relative state feedback is relaxed by using onboard sensors that bring realistic errors and delays, while the proposed closed-loop approach demonstrates robustness to the above-mentioned challenge. Moreover, G&C blocks are unified via the Model Predictive Control (MPC) paradigm, and the coupling between translational motion and rotational motion is addressed via a dual quaternion based kinematic description. In this work, G&C is formulated as a convex optimization problem where constraints such as thruster limits and the output constraints are explicitly handled. Furthermore, the Monte-Carlo method is used to evaluate the robustness of the proposed method to the initial condition errors, the uncertainty of the target's motion and attitude, and actuator errors. A capture scenario is tested with the robotic test bench that has onboard sensors which estimate the position and orientation of a drifting satellite through camera imagery. Finally, the approach is compared with currently used robust H-infinity controllers and the guidance profile provided by the industrial partner.
The HIL experiments demonstrate that the proposed strategy is a potential candidate for future space servicing missions because 1) the algorithm is real-time implementable, as convex programming offers deterministic convergence properties and guarantees a finite-time solution, 2) critical physical and output constraints are respected, 3) robustness to sensor errors and uncertainties in the system is proven, and 4) translational motion is coupled with rotational motion.

Keywords: dual quaternion, model predictive control, real-time experimental test, rendezvous and docking, spacecraft autonomy, space servicing

Procedia PDF Downloads 128
24365 Research of Data Cleaning Methods Based on Dependency Rules

Authors: Yang Bao, Shi Wei Deng, WangQun Lin

Abstract:

This paper introduces the concept and principle of data cleaning, analyzes the types and causes of dirty data, and proposes several key steps of a typical cleaning process. It puts forward a scalable and versatile data cleaning framework. For data with attribute dependency relations, it designs several violation data discovery algorithms, expressed as formal formulas, which can find the data inconsistent with conditional attribute dependencies across all target columns, whether the data is structured (SQL) or unstructured (NoSQL). It then gives 6 data cleaning methods based on these algorithms.
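
To make the idea of violation discovery concrete, the sketch below checks a plain functional dependency (one set of columns determining another column) over a list of records. This is a simplified stand-in, not the paper's algorithms, which are not given in the abstract; the function name `find_violations` and the sample rows are hypothetical.

```python
from collections import defaultdict


def find_violations(rows, determinant, dependent):
    """Return determinant values that map to more than one dependent value.

    rows: iterable of dicts; determinant: list of column names;
    dependent: a single column name.
    """
    seen = defaultdict(set)
    for row in rows:
        key = tuple(row[column] for column in determinant)
        seen[key].add(row[dependent])
    return {key: values for key, values in seen.items() if len(values) > 1}


# Made-up records: the rule under test is zip -> city.
rows = [
    {"zip": "10001", "city": "New York"},
    {"zip": "10001", "city": "Boston"},    # conflicts with the row above
    {"zip": "02108", "city": "Boston"},
    {"zip": "02108", "city": "Boston"},
]
print(find_violations(rows, ["zip"], "city"))
```

Because the records are plain dictionaries, the same check applies to rows fetched from a SQL table or to documents from a NoSQL store, as long as the columns of interest are present.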

Keywords: data cleaning, dependency rules, violation data discovery, data repair

Procedia PDF Downloads 546
24364 A Comparative Analysis of Machine Learning Techniques for PM10 Forecasting in Vilnius

Authors: Mina Adel Shokry Fahim, Jūratė Sužiedelytė Visockienė

Abstract:

Concern over air pollution (AP) is growing, and the issue has gained more prominence than ever before. Awareness has increased, and those who understand the problem now have a duty to pass that knowledge on to others. This realisation often comes after an understanding of how poor air quality, as reflected in air quality indices (AQI), damages human health. The study focuses on assessing air pollution prediction models specifically for Lithuania, addressing a substantial need for empirical research within the region. Concentrating on Vilnius, it examines concentrations of particulate matter 10 micrometers or less in diameter (PM10). Utilizing Gaussian Process Regression (GPR), Regression Tree Ensemble, and Regression Tree methodologies, predictive forecasting models are validated and tested using hourly data from January 2020 to December 2022. The study explores the classification of AP data into anthropogenic and natural sources, the impact of AP on human health, and its connection to cardiovascular diseases. The study revealed varying levels of accuracy among the models, with GPR achieving the highest accuracy, indicated by an RMSE of 4.14 in validation and 3.89 in testing.
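
The accuracy figures above are root mean square errors (RMSE). As a reminder of what that metric measures, here is a minimal sketch; the function name and the sample values are hypothetical, and the abstract does not state the unit of the reported RMSE.

```python
import math


def rmse(observed, predicted):
    """Root mean square error between two equally long sequences."""
    if len(observed) != len(predicted) or not observed:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared / len(observed))


# Made-up hourly PM10 readings and forecasts (not data from the study).
observed = [20.0, 25.0, 30.0, 18.0]
predicted = [22.0, 24.0, 27.0, 18.0]
print(round(rmse(observed, predicted), 3))
```

RMSE is expressed in the same unit as the measured quantity and weights large misses more heavily than small ones, which is why a lower value on the held-out test period is read as better forecasting.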

Keywords: air pollution, anthropogenic and natural sources, machine learning, Gaussian process regression, tree ensemble, forecasting models, particulate matter

Procedia PDF Downloads 36
24363 Cotton Crops Vegetative Indices Based Assessment Using Multispectral Images

Authors: Muhammad Shahzad Shifa, Amna Shifa, Muhammad Omar, Aamir Shahzad, Rahmat Ali Khan

Abstract:

Many applications of remote sensing to vegetation and crop response depend on spectral properties of individual leaves and plants. Vegetation indices are usually determined to estimate crop biophysical parameters, such as crop canopy and leaf area index, with the help of remote sensing. Cotton crop assessment is performed with the help of vegetative indices. Remotely sensed images from an optical multispectral radiometer MSR5 are used in this study. The interpretation is based on the fact that different materials reflect and absorb light differently at different wavelengths. Non-normalized and normalized forms of these datasets are analyzed using two complementary data mining algorithms: K-means and K-nearest neighbor (KNN). Our analysis shows that normalized reflectance data and vegetative indices are suitable for automated assessment and decision making.
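
As an illustration of how a vegetation index can feed a KNN assessment, the sketch below computes the Normalized Difference Vegetation Index (NDVI) and classifies a sample by majority vote among its nearest labeled neighbors. NDVI is used here only as one common index; the abstract does not say which indices the authors computed. The labels, reflectance values, and function names are hypothetical.

```python
from collections import Counter


def ndvi(nir, red):
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def knn_predict(train, query, k=3):
    """Majority label among the k training points closest to the query value.

    train: list of (index_value, label) pairs.
    """
    nearest = sorted(train, key=lambda item: abs(item[0] - query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Made-up labeled index values (not measurements from the study).
train = [
    (0.80, "healthy"), (0.70, "healthy"), (0.75, "healthy"),
    (0.30, "stressed"), (0.20, "stressed"), (0.25, "stressed"),
]

sample = ndvi(nir=0.5, red=0.1)
print(round(sample, 3), knn_predict(train, sample))
```

Because the index is a ratio of differences to sums, it is already normalized against overall brightness, which is one reason normalized inputs tend to suit distance-based methods such as KNN and K-means.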

Keywords: cotton, condition assessment, KNN algorithm, clustering, MSR5, vegetation indices

Procedia PDF Downloads 317
24362 Enhancing Early Detection of Coronary Heart Disease Through Cloud-Based AI and Novel Simulation Techniques

Authors: Md. Abu Sufian, Robiqul Islam, Imam Hossain Shajid, Mahesh Hanumanthu, Jarasree Varadarajan, Md. Sipon Miah, Mingbo Niu

Abstract:

Coronary Heart Disease (CHD) remains a principal cause of global morbidity and mortality, characterized by atherosclerosis, the build-up of fatty deposits inside the arteries. The study introduces an innovative methodology that leverages cloud-based platforms like AWS Live Streaming and Artificial Intelligence (AI) to detect CHD symptoms early and help prevent them through web applications. By employing novel simulation processes and AI algorithms, this research aims to significantly mitigate the health and societal impacts of CHD. Methodology: This study introduces a novel simulation process alongside a multi-phased model development strategy. Initially, health-related data, including heart rate variability, blood pressure, lipid profiles, and ECG readings, were collected through user interactions with web-based applications as well as API integration. The novel simulation process involved creating synthetic datasets that mimic early-stage CHD symptoms, allowing for the refinement and training of AI algorithms under controlled conditions without compromising patient privacy. AWS Live Streaming was utilized to capture real-time health data, which was then processed and analysed using advanced AI techniques. The novel aspect of our methodology lies in the simulation of CHD symptom progression, which provides a dynamic training environment for our AI models, enhancing their predictive accuracy and robustness. Model Development: A machine learning model was developed and trained on both real and simulated datasets, incorporating a variety of algorithms, including neural networks and ensemble learning models, to identify early signs of CHD. The model's continuous learning mechanism allows it to evolve, adapting to new data inputs and improving its predictive performance over time. Results and Findings: The deployment of our model yielded promising results. In the validation phase, it achieved an accuracy of 92% in predicting early CHD symptoms, surpassing existing models.
The precision and recall metrics stood at 89% and 91% respectively, indicating a high level of reliability in identifying at-risk individuals. These results underscore the effectiveness of combining live data streaming with AI in the early detection of CHD. Societal Implications: The implementation of cloud-based AI for CHD symptom detection represents a significant step forward in preventive healthcare. By facilitating early intervention, this approach has the potential to reduce the incidence of CHD-related complications, decrease healthcare costs, and improve patient outcomes. Moreover, the accessibility and scalability of cloud-based solutions democratize advanced health monitoring, making it available to a broader population. This study illustrates the transformative potential of integrating technology and healthcare, setting a new standard for the early detection and management of chronic diseases.
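
For reference, accuracy, precision, and recall are all derived from the four cells of a binary confusion matrix. The sketch below shows the arithmetic; the counts are invented so that the ratios are easy to follow and are not the study's data.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return accuracy, precision, and recall from confusion-matrix counts."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),  # share of flagged cases that are real
        "recall": tp / (tp + fn),     # share of real cases that are flagged
    }


# Hypothetical validation counts (made up for illustration).
metrics = classification_metrics(tp=91, fp=11, fn=9, tn=89)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

In a screening setting, recall reflects how many at-risk individuals are caught, while precision reflects how many alerts are warranted, so the two are usually reported together.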

Keywords: coronary heart disease, cloud-based ai, machine learning, novel simulation techniques, early detection, preventive healthcare

Procedia PDF Downloads 47
24361 Educational Leadership and Artificial Intelligence

Authors: Sultan Ghaleb Aldaihani

Abstract:

- The environment in which educational leadership takes place is becoming increasingly complex due to factors like globalization and rapid technological change.
- This is creating a "leadership gap" where the complexity of the environment outpaces the ability of leaders to effectively respond.
- Educational leadership involves guiding teachers and the broader school system towards improved student learning and achievement.

2. Implications of Artificial Intelligence (AI) in Educational Leadership:
- AI has great potential to enhance education, such as through intelligent tutoring systems and automating routine tasks to free up teachers.
- AI can also have significant implications for educational leadership by providing better information and data-driven decision-making capabilities.
- Computer-adaptive testing can provide detailed, individualized data on student learning that leaders can use for instructional decisions and accountability.

3. Enhancing Decision-Making Processes:
- Statistical models and data mining techniques can help identify at-risk students earlier, allowing for targeted interventions.
- Probability-based models can diagnose students likely to drop out, enabling proactive support.
- These data-driven approaches can make resource allocation and decision-making more effective.

4. Improving Efficiency and Productivity:
- AI systems can automate tasks and change processes to improve the efficiency of educational leadership and administration.
- Integrating AI can free up leaders to focus more on their role's human, interactive elements.

Keywords: Education, Leadership, Technology, Artificial Intelligence

Procedia PDF Downloads 13
24360 Multi-Criteria Inventory Classification Process Based on Logical Analysis of Data

Authors: Diana López-Soto, Soumaya Yacout, Francisco Ángel-Bello

Abstract:

Although inventories are considered stocks of money sitting on shelves, they are needed in order to secure constant and continuous production. Therefore, companies need to control the amount of inventory in order to find the balance between excess and shortage. The classification of items according to certain criteria, such as price, usage rate, and lead time before arrival, allows a company to concentrate its investment in inventory according to a ranking or priority of items. This makes the decision-making process for inventory management easier and more justifiable. The purpose of this paper is to present a new approach for the classification of new items based on the already existing criteria. This approach is called the Logical Analysis of Data (LAD). It is used in this paper to assist the process of ABC item classification based on multiple criteria. LAD is a data mining technique based on Boolean theory that is used for pattern recognition. This technique has been tested in medicine, industry, credit risk analysis, and engineering with remarkable results. An application to ABC inventory classification is presented for the first time, and the results are compared with those obtained when using the well-known AHP technique and the ANN technique. The results show that LAD achieved very good classification accuracy.

Keywords: ABC multi-criteria inventory classification, inventory management, multi-class LAD model, multi-criteria classification

Procedia PDF Downloads 858
24359 A Prospective Neurosurgical Registry Evaluating the Clinical Care of Traumatic Brain Injury Patients Presenting to Mulago National Referral Hospital in Uganda

Authors: Benjamin J. Kuo, Silvia D. Vaca, Joao Ricardo Nickenig Vissoci, Catherine A. Staton, Linda Xu, Michael Muhumuza, Hussein Ssenyonjo, John Mukasa, Joel Kiryabwire, Lydia Nanjula, Christine Muhumuza, Henry E. Rice, Gerald A. Grant, Michael M. Haglund

Abstract:

Background: Traumatic Brain Injury (TBI) is disproportionally concentrated in low- and middle-income countries (LMICs), with the odds of dying from TBI in Uganda more than 4 times higher than in high income countries (HICs). The disparities in the injury incidence and outcome between LMICs and resource-rich settings have led to increased health outcomes research for TBIs and their associated risk factors in LMICs. While there have been increasing TBI studies in LMICs over the last decade, there is still a need for more robust prospective registries. In Uganda, a trauma registry implemented in 2004 at the Mulago National Referral Hospital (MNRH) showed that RTI is the major contributor (60%) of overall mortality in the casualty department. While the prior registry provides information on injury incidence and burden, it’s limited in scope and doesn’t follow patients longitudinally throughout their hospital stay nor does it focus specifically on TBIs. And although these retrospective analyses are helpful for benchmarking TBI outcomes, they make it hard to identify specific quality improvement initiatives. The relationship among epidemiology, patient risk factors, clinical care, and TBI outcomes are still relatively unknown at MNRH. Objective: The objectives of this study are to describe the processes of care and determine risk factors predictive of poor outcomes for TBI patients presenting to a single tertiary hospital in Uganda. Methods: Prospective data were collected for 563 TBI patients presenting to a tertiary hospital in Kampala from 1 June – 30 November 2016. Research Electronic Data Capture (REDCap) was used to systematically collect variables spanning 8 categories. Univariate and multivariate analysis were conducted to determine significant predictors of mortality. Results: 563 TBI patients were enrolled from 1 June – 30 November 2016. 
102 patients (18%) received surgery, 29 patients (5.1%) intended for surgery failed to receive it, and 251 patients (45%) received non-operative management. Overall mortality was 9.6%, which ranged from 4.7% for mild and moderate TBI to 55% for severe TBI patients with GCS 3-5. Within each TBI severity category, mortality differed by management pathway. Variables predictive of mortality were TBI severity, more than one intracranial bleed, failure to receive surgery, high dependency unit admission, ventilator support outside of surgery, and hospital arrival delayed by more than 4 hours. Conclusions: The overall mortality rate of 9.6% in Uganda for TBI is high, and likely underestimates the true TBI mortality. Furthermore, the wide-ranging mortality (3-82%), high ICU fatality, and negative impact of care delays suggest shortcomings with the current triaging practices. Lack of surgical intervention when needed was highly predictive of mortality in TBI patients. Further research into the determinants of surgical interventions, quality of step-up care, and prolonged care delays are needed to better understand the complex interplay of variables that affect patient outcome. These insights guide the development of future interventions and resource allocation to improve patient outcomes.

Keywords: care continuum, global neurosurgery, Kampala Uganda, LMIC, Mulago, prospective registry, traumatic brain injury

Procedia PDF Downloads 215
24358 Evaluation of the Urban Regeneration Project: Land Use Transformation and SNS Big Data Analysis

Authors: Ju-Young Kim, Tae-Heon Moon, Jung-Hun Cho

Abstract:

Urban regeneration projects have been actively promoted in Korea. In particular, Jeonju Hanok Village is regarded as one of the representative cases of utilizing local cultural heritage sites in an urban regeneration project. Recently, however, there has been growing concern in this area due to ‘gentrification’ caused by excessive commercialization and surging numbers of tourists. This trend has changed land and building use and resulted in the loss of the region's identity. In this regard, this study analyzed the land use transformation between 2010 and 2016 to identify the commercialization trend in Jeonju Hanok Village. In addition, it conducted SNS big data analysis on Jeonju Hanok Village from February 14th, 2016 to March 31st, 2016 to identify visitors’ awareness of the village. The study results demonstrate that rapid commercialization was underway, unlike the initial intention, so planners and officials in the city government should reconsider the project direction and rebuild deliberate management strategies. This study is meaningful in that it analyzed the land use transformation and SNS big data to identify the current situation in an urban regeneration area. Furthermore, it is expected that the study results will contribute to the vitalization of the regeneration area.

Keywords: land use, SNS, text mining, urban regeneration

Procedia PDF Downloads 280