Search results for: multidimensional process mining
15993 Hybrid Reliability-Similarity-Based Approach for Supervised Machine Learning
Authors: Walid Cherif
Abstract:
Data mining has, over recent years, seen big advances because of the spread of internet, which generates everyday a tremendous volume of data, and also the immense advances in technologies which facilitate the analysis of these data. In particular, classification techniques are a subdomain of Data Mining which determines in which group each data instance is related within a given dataset. It is used to classify data into different classes according to desired criteria. Generally, a classification technique is either statistical or machine learning. Each type of these techniques has its own limits. Nowadays, current data are becoming increasingly heterogeneous; consequently, current classification techniques are encountering many difficulties. This paper defines new measure functions to quantify the resemblance between instances and then combines them in a new approach which is different from actual algorithms by its reliability computations. Results of the proposed approach exceeded most common classification techniques with an f-measure exceeding 97% on the IRIS Dataset.Keywords: data mining, knowledge discovery, machine learning, similarity measurement, supervised classification
Procedia PDF Downloads 46415992 Emergence of Information Centric Networking and Web Content Mining: A Future Efficient Internet Architecture
Authors: Sajjad Akbar, Rabia Bashir
Abstract:
With the growth of the number of users, the Internet usage has evolved. Due to its key design principle, there is an incredible expansion in its size. This tremendous growth of the Internet has brought new applications (mobile video and cloud computing) as well as new user’s requirements i.e. content distribution environment, mobility, ubiquity, security and trust etc. The users are more interested in contents rather than their communicating peer nodes. The current Internet architecture is a host-centric networking approach, which is not suitable for the specific type of applications. With the growing use of multiple interactive applications, the host centric approach is considered to be less efficient as it depends on the physical location, for this, Information Centric Networking (ICN) is considered as the potential future Internet architecture. It is an approach that introduces uniquely named data as a core Internet principle. It uses the receiver oriented approach rather than sender oriented. It introduces the naming base information system at the network layer. Although ICN is considered as future Internet architecture but there are lot of criticism on it which mainly concerns that how ICN will manage the most relevant content. For this Web Content Mining(WCM) approaches can help in appropriate data management of ICN. To address this issue, this paper contributes by (i) discussing multiple ICN approaches (ii) analyzing different Web Content Mining approaches (iii) creating a new Internet architecture by merging ICN and WCM to solve the data management issues of ICN. From ICN, Content-Centric Networking (CCN) is selected for the new architecture, whereas, Agent-based approach from Web Content Mining is selected to find most appropriate data.Keywords: agent based web content mining, content centric networking, information centric networking
Procedia PDF Downloads 47515991 the fairness of meritocracy and Korean Democracy-What makes the Korean youth accept the fairness of meritocracy??
Authors: WooJin KANG
Abstract:
Contrary to the ideal, in the cartelized democracy, meritocracy is revealed to be a system that gives arrogance to the winners and humiliation to the losers, and more and more studies are asserting the upper-class bias of meritocracy. However, only some studies have analyzed the determinants of the perception of meritocracy and fairness among young people. This article was an attempt to fill this gap. According to the empirical results of this article, the determinants of fairness of the meritocracy in the youth were multidimensional. The social status model, the political ideology model, and the future prospect model all significantly impacted the perception of meritocracy fairness among young people. Contrary to the predictions of the system justification theory and the compensatory control theory of previous studies, the lower-class youth were critical of meritocracy. In addition, the more negative the future outlook, the less they accepted the fairness of meritocracy. In addition, ideological debates over solutions to inequality of opportunity, which began in earnest during the 20th presidential election, turned out to be a variable that significantly influenced the perception of fairness based on meritocracy among young people. The results of the empirical analysis in this article reaffirmed the multidimensional structure of the youth. This suggests the need for policy responses leading to education tailored to various subgroups within the youth.Keywords: Meritocracy, Exam-Meritocracy, Fairness, Multiple-inequality
Procedia PDF Downloads 6215990 Focus-Latent Dirichlet Allocation for Aspect-Level Opinion Mining
Authors: Mohsen Farhadloo, Majid Farhadloo
Abstract:
Aspect-level opinion mining that aims at discovering aspects (aspect identification) and their corresponding ratings (sentiment identification) from customer reviews have increasingly attracted attention of researchers and practitioners as it provides valuable insights about products/services from customer's points of view. Instead of addressing aspect identification and sentiment identification in two separate steps, it is possible to simultaneously identify both aspects and sentiments. In recent years many graphical models based on Latent Dirichlet Allocation (LDA) have been proposed to solve both aspect and sentiment identifications in a single step. Although LDA models have been effective tools for the statistical analysis of document collections, they also have shortcomings in addressing some unique characteristics of opinion mining. Our goal in this paper is to address one of the limitations of topic models to date; that is, they fail to directly model the associations among topics. Indeed in many text corpora, it is natural to expect that subsets of the latent topics have higher probabilities. We propose a probabilistic graphical model called focus-LDA, to better capture the associations among topics when applied to aspect-level opinion mining. Our experiments on real-life data sets demonstrate the improved effectiveness of the focus-LDA model in terms of the accuracy of the predictive distributions over held out documents. Furthermore, we demonstrate qualitatively that the focus-LDA topic model provides a natural way of visualizing and exploring unstructured collection of textual data.Keywords: aspect-level opinion mining, document modeling, Latent Dirichlet Allocation, LDA, sentiment analysis
Procedia PDF Downloads 9415989 Q-Map: Clinical Concept Mining from Clinical Documents
Authors: Sheikh Shams Azam, Manoj Raju, Venkatesh Pagidimarri, Vamsi Kasivajjala
Abstract:
Over the past decade, there has been a steep rise in the data-driven analysis in major areas of medicine, such as clinical decision support system, survival analysis, patient similarity analysis, image analytics etc. Most of the data in the field are well-structured and available in numerical or categorical formats which can be used for experiments directly. But on the opposite end of the spectrum, there exists a wide expanse of data that is intractable for direct analysis owing to its unstructured nature which can be found in the form of discharge summaries, clinical notes, procedural notes which are in human written narrative format and neither have any relational model nor any standard grammatical structure. An important step in the utilization of these texts for such studies is to transform and process the data to retrieve structured information from the haystack of irrelevant data using information retrieval and data mining techniques. To address this problem, the authors present Q-Map in this paper, which is a simple yet robust system that can sift through massive datasets with unregulated formats to retrieve structured information aggressively and efficiently. It is backed by an effective mining technique which is based on a string matching algorithm that is indexed on curated knowledge sources, that is both fast and configurable. The authors also briefly examine its comparative performance with MetaMap, one of the most reputed tools for medical concepts retrieval and present the advantages the former displays over the latter.Keywords: information retrieval, unified medical language system, syntax based analysis, natural language processing, medical informatics
Procedia PDF Downloads 13315988 The Role of Strategic Alliances, Innovation Capability, Cost Reduction in Enhancing Customer Loyalty and Firm’s Competitive Advantage
Authors: Soebowo Musa
Abstract:
Mining industries are known to be very volatile due to their sensitive nature toward changes in the environment, particularly coal mining. Heavy equipment distributors and coal mining contractors are among heavily affected by such volatility. They are facing more uncertainty on the sustainability of the coal mining industry. Strategic alliances and organizational capabilities such as innovation capability have long been seen as ways to stay competitive with a focus more on the strategic alliances partner-to-partner in serving their customers. In today’s rapid change in the environment, a shift in consumer behaviors, and the human-centric business approach, this study looks at the strategic alliance partner-to-customer relationship in both the industrial organization and resource-based theories. This study was conducted based on 250 respondents from the strategic alliances partner-to-customer between heavy equipment distributors and coal mining contractors in Indonesia. This study finds strategic alliances have the highest association toward cost reduction, a proxy of operational efficiency followed by its association toward innovation capability. Further, strategic alliances and innovation capability have a positive relationship with customer loyalty, while innovation capability and customer loyalty have no significant relationships toward the firm’s competitive advantage. This study also indicates that cost reduction is not a condition to develop customer loyalty in the strategic alliance partner-to-customer relationship. It confirms strategic alliances are a strategy that creates a firm’s operational efficiency, innovation capability that develops customer loyalty, and competitive advantage.Keywords: strategic alliance, innovation capability, cost reduction, customer loyalty, competitive advantage
Procedia PDF Downloads 11915987 Multicriteria for Optimal Land Use after Mining
Authors: Carla Idely Palencia-Aguilar
Abstract:
Mining in Colombia represents around 2% of the GDP (USD 8 billion in 2018), with main productions represented by coal, nickel, gold, silver, emeralds, iron, limestone, gypsum, among others. Sand and Gravel had been decreasing its participation of the GDP with a reduction of 33.2 million m3 in 2015, to 27.4 in 2016, 22.7 in 2017 and 15.8 in 2018, with a consumption of approximately 3 tons/inhabitant. However, with the new government policies it is expected to increase in the following years. Mining causes temporary environmental impacts, once restoration and rehabilitation takes place, social, environmental and economic benefits are higher than the initial state. A way to demonstrate how the mining interventions had contributed to improve the characteristics of the region after sand and gravel mining, the NDVI (Normalized Difference Vegetation Index) from MODIS and ASTER were employed. The histograms show not only increments of vegetation in the area (8 times higher), but also topographies similar to the ones before the intervention, according to the application for sustainable development selected: either agriculture, forestry, cattle raising, artificial wetlands or do nothing. The decision was based upon a Multicriteria analysis for optimal land use, with three main variables: geostatistics, evapotranspiration and groundwater characteristics. The use of remote sensing, meteorological stations, piezometers, sunphotometers, geoelectric analysis among others; provide the information required for the multicriteria decision. For cattle raising and agricultural applications (where various crops were implemented), conservation of products were tested by means of nanotechnology. The results showed a duration of 2 years with no chemicals added for preservation and concentration of vitamins of the tested products.Keywords: ASTER, Geostatistics, MODIS, Multicriteria
Procedia PDF Downloads 12615986 Reuse of Huge Industrial Areas
Authors: Martina Perinkova, Lenka Kolarcikova, Marketa Twrda
Abstract:
Brownfields are one of the most important problems that must be solved by today's cities. The topic of this article is description of developing a comprehensive transformation of post-industrial area of the former iron factory national cultural heritage Lower Vítkovice. City of Ostrava used to be industrial superpower of the Czechoslovak Republic, especially in the area of coal mining and iron production, after declining industrial production and mining in the 80s left many unused areas of former factories generally brownfields and backfields. Since the late 90s we are observing how the city officials or private entities seeking to remedy this situation. Regeneration of brownfields is a very expensive and long-term process. The area is now rebuilt for tourists and residents of the city in the entertainment, cultural, and social center. It was necessary do the reconstruction of the industrial monuments. Equally important was the construction of new buildings, which helped reusing of the entire complex. This is a unique example of transformation of technical monuments and completion of necessary new objects, so that the area could start working again and reintegrate back into the urban system.Keywords: brown fields, conversion, historical and industrial buildings, reconstruction
Procedia PDF Downloads 32915985 Data Mining of Students' Performance Using Artificial Neural Network: Turkish Students as a Case Study
Authors: Samuel Nii Tackie, Oyebade K. Oyedotun, Ebenezer O. Olaniyi, Adnan Khashman
Abstract:
Artificial neural networks have been used in different fields of artificial intelligence, and more specifically in machine learning. Although, other machine learning options are feasible in most situations, but the ease with which neural networks lend themselves to different problems which include pattern recognition, image compression, classification, computer vision, regression etc. has earned it a remarkable place in the machine learning field. This research exploits neural networks as a data mining tool in predicting the number of times a student repeats a course, considering some attributes relating to the course itself, the teacher, and the particular student. Neural networks were used in this work to map the relationship between some attributes related to students’ course assessment and the number of times a student will possibly repeat a course before he passes. It is the hope that the possibility to predict students’ performance from such complex relationships can help facilitate the fine-tuning of academic systems and policies implemented in learning environments. To validate the power of neural networks in data mining, Turkish students’ performance database has been used; feedforward and radial basis function networks were trained for this task; and the performances obtained from these networks evaluated in consideration of achieved recognition rates and training time.Keywords: artificial neural network, data mining, classification, students’ evaluation
Procedia PDF Downloads 61315984 Application of Association Rule Using Apriori Algorithm for Analysis of Industrial Accidents in 2013-2014 in Indonesia
Authors: Triano Nurhikmat
Abstract:
Along with the progress of science and technology, the development of the industrialized world in Indonesia took place very rapidly. This leads to a process of industrialization of society Indonesia faster with the establishment of the company and the workplace are diverse. Development of the industry relates to the activity of the worker. Where in these work activities do not cover the possibility of an impending crash on either the workers or on a construction project. The cause of the occurrence of industrial accidents was the fault of electrical damage, work procedures, and error technique. The method of an association rule is one of the main techniques in data mining and is the most common form used in finding the patterns of data collection. In this research would like to know how relations of the association between the incidence of any industrial accidents. Therefore, by using methods of analysis association rule patterns associated with combination obtained two iterations item set (2 large item set) when every factor of industrial accidents with a West Jakarta so industrial accidents caused by the occurrence of an electrical value damage = 0.2 support and confidence value = 1, and the reverse pattern with value = 0.2 support and confidence = 0.75.Keywords: association rule, data mining, industrial accidents, rules
Procedia PDF Downloads 29915983 Arabic Light Stemmer for Better Search Accuracy
Authors: Sahar Khedr, Dina Sayed, Ayman Hanafy
Abstract:
Arabic is one of the most ancient and critical languages in the world. It has over than 250 million Arabic native speakers and more than twenty countries having Arabic as one of its official languages. In the past decade, we have witnessed a rapid evolution in smart devices, social network and technology sector which led to the need to provide tools and libraries that properly tackle the Arabic language in different domains. Stemming is one of the most crucial linguistic fundamentals. It is used in many applications especially in information extraction and text mining fields. The motivation behind this work is to enhance the Arabic light stemmer to serve the data mining industry and leverage it in an open source community. The presented implementation works on enhancing the Arabic light stemmer by utilizing and enhancing an algorithm that provides an extension for a new set of rules and patterns accompanied by adjusted procedure. This study has proven a significant enhancement for better search accuracy with an average 10% improvement in comparison with previous works.Keywords: Arabic data mining, Arabic Information extraction, Arabic Light stemmer, Arabic stemmer
Procedia PDF Downloads 30815982 Feature Selection for Production Schedule Optimization in Transition Mines
Authors: Angelina Anani, Ignacio Ortiz Flores, Haitao Li
Abstract:
The use of underground mining methods have increased significantly over the past decades. This increase has also been spared on by several mines transitioning from surface to underground mining. However, determining the transition depth can be a challenging task, especially when coupled with production schedule optimization. Several researchers have simplified the problem by excluding operational features relevant to production schedule optimization. Our research objective is to investigate the extent to which operational features of transition mines accounted for affect the optimal production schedule. We also provide a framework for factors to consider in production schedule optimization for transition mines. An integrated mixed-integer linear programming (MILP) model is developed that maximizes the NPV as a function of production schedule and transition depth. A case study is performed to validate the model, with a comparative sensitivity analysis to obtain operational insights.Keywords: underground mining, transition mines, mixed-integer linear programming, production schedule
Procedia PDF Downloads 16915981 Corpus Stylistics and Multidimensional Analysis for English for Specific Purposes Teaching and Assessment
Authors: Svetlana Strinyuk, Viacheslav Lanin
Abstract:
Academic English has become lingua franca for international scientific community which stimulates universities to introduce English for Specific Purposes (EAP) courses into curriculum. Teaching L2 EAP students might be fulfilled with corpus technologies and digital stylistics. A special software developed to reach the manifold task of teaching, assessing and researching academic writing of L2 students on basis of digital stylistics and multidimensional analysis was created. A set of annotations (style markers) – grammar, lexical and syntactic features most significant of academic writing was built. Contrastive comparison of two corpora “model corpus”, subject domain limited papers published by competent writers in leading academic journals, and “students’ corpus”, subject domain limited papers written by last year students allows to receive data about the features of academic writing underused or overused by L2 EAP student. Both corpora are tagged with a special software created in GATE Developer. Style markers within the framework of research might be replaced depending on the relevance and validity of the result which is achieved from research corpora. Thus, selecting relevant (high frequency) style markers and excluding less relevant, i.e. less frequent annotations, high validity of the model is achieved. Software allows to compare the data received from processing model corpus to students’ corpus and get reports which can be used in teaching and assessment. The less deviation from the model corpus students demonstrates in their writing the higher is academic writing skill acquisition. The research showed that several style markers (hedging devices) were underused by L2 EAP students whereas lexical linking devices were used excessively. A special software implemented into teaching of EAP courses serves as a successful visual aid, makes assessment more valid; it is indicative of the degree of writing skill acquisition, and provides data for further research.Keywords: corpus technologies in EAP teaching, multidimensional analysis, GATE Developer, corpus stylistics
Procedia PDF Downloads 19915980 Effect of Bacillus Pumilus Strains on Heavy Metal Accumulation in Lettuce Grown on Contaminated Soil
Authors: Sabeen Alam, Mehboob Alam
Abstract:
The research work entitled “Effect of Bacillus pumilus strains on heavy metal accumulation in lettuce grown on contaminated soil” focused on functional role of Bacillus pumilus strains inoculated with lettuce seed in mitigating heavy metal in chromite mining soil. In this experiment, factor A was three Bacillus pumilus strains (sequence C-2PMW-8, C-1 SSK-8 and C-1 PWK-7) while soil used for this experiment was collected from Prang Ghar mining site and lettuce seeds were grown in three levels of chromite mining soil (2.27, 4.65 and 7.14 %). For mining soil minimum days to germinate noted in lettuce grown on garden soil inoculated with sequence. Maximum germination percentage noted was for C-1 SSK-8 grown on garden soil, maximum lettuce height for sequence C-2 PWM-8, fresh leaf weight for C-1 PWK-7 inoculated lettuce, dry weight of lettuce leaf for lettuce inoculated with C-1 SSK-8 and C-1 PWK-7 strains, number of leaves per plant for lettuce inoculated with C-1 SSK-8, leaf area for C-2 PMW-8 inoculated lettuce, survival percentage for C-1 SSK-8 treated lettuce and chlorophyll content for C-2 PMW-8. Results related to heavy metals accumulation showed that minimum chromium was in lettuce and in soil for all three sequences, cadmium (Cd) in lettuce and in soil for all three sequences, manganese (Mn) in lettuce and in soil for three sequences, lead (Pb) in lettuce and in soil for three sequences. It can be concluded that chromite mining soil significantly reduced the growth and survival of lettuce, but when lettuce was inoculated with Bacillus.pumilus strains, it enhances growth and survival. Similarly, minimum heavy metal accumulation in plant and soil, regardless of type of Bacillus pumilus used, all three sequences has same mitigating effect on heavy metal in both soil and lettuce. All the three Bacillus pumilus strains ensured reduction in heavy metals content (Mn, Cd, Cr) in lettuce, below the maximum permissible limits of WHO 2011.Keywords: bacillus pumilus, heavy metals, permissible limits, lettuce, chromite mining soil, mitigating effect
Procedia PDF Downloads 5915979 The Human Right to a Safe, Clean and Healthy Environment in Corporate Social Responsibility's Strategies: An Approach to Understanding Mexico's Mining Sector
Authors: Thalia Viveros-Uehara
Abstract:
The virtues of Corporate Social Responsibility (CSR) are explored widely in the academic literature. However, few studies address its link to human rights, per se; specifically, the right to a safe, clean and healthy environment. Fewer still are the research works in this area that relate to developing countries, where a number of areas are biodiversity hotspots. In Mexico, despite the rise and evolution of CSR schemes, grave episodes of pollution persist, especially those caused by the mining industry. These cases set up the question of the correspondence between the current CSR practices of mining companies in the country and their responsibility to respect the right to a safe, clean and healthy environment. The present study approaches precisely such a bridge, which until now has not been fully tackled in light of Mexico's 2011 constitutional human rights amendment and the United Nation's Guiding Principles on Business and Human Rights (UN Guiding Principles), adopted by the Human Rights Council in 2011. To that aim, it initially presents a contextual framework; it then explores qualitatively the adoption of human rights’ language in the CSR strategies of the three main mining companies in Mexico, and finally, it examines their standing with respect to the UN Guiding Principles. The results reveal that human rights are included in the RSE strategies of the analysed businesses, at least at the rhetoric level; however, they do not embrace the right to a safe, clean and healthy environment as such. Moreover, we conclude that despite the finding that corporations publicly express their commitment to respect human rights, some operational weaknesses that hamper the exercise of such responsibility persist; for example, the systematic lack of human rights impact assessments per mining unit, the denial of actual and publicly-known negative episodes on the environment linked directly to their operations, and the absence of effective mechanisms to remediate adverse impacts.Keywords: corporate social responsibility, environmental impacts, human rights, right to a safe, clean and healthy environment, mining industry
Procedia PDF Downloads 32915978 Computerized Adaptive Testing for Ipsative Tests with Multidimensional Pairwise-Comparison Items
Authors: Wen-Chung Wang, Xue-Lan Qiu
Abstract:
Ipsative tests have been widely used in vocational and career counseling (e.g., the Jackson Vocational Interest Survey). Pairwise-comparison items are a typical item format of ipsative tests. When the two statements in a pairwise-comparison item measure two different constructs, the item is referred to as a multidimensional pairwise-comparison (MPC) item. A typical MPC item would be: Which activity do you prefer? (A) playing with young children, or (B) working with tools and machines. These two statements aim at the constructs of social interest and investigative interest, respectively. Recently, new item response theory (IRT) models for ipsative tests with MPC items have been developed. Among them, the Rasch ipsative model (RIM) deserves special attention because it has good measurement properties, in which the log-odds of preferring statement A to statement B are defined as a competition between two parts: the sum of a person’s latent trait to which statement A is measuring and statement A’s utility, and the sum of a person’s latent trait to which statement B is measuring and statement B’s utility. The RIM has been extended to polytomous responses, such as preferring statement A strongly, preferring statement A, preferring statement B, and preferring statement B strongly. To promote the new initiatives, in this study we developed computerized adaptive testing algorithms for MFC items and evaluated their performance using simulations and two real tests. Both the RIM and its polytomous extension are multidimensional, which calls for multidimensional computerized adaptive testing (MCAT). A particular issue in MCAT for MPC items is the within-person statement exposure (WPSE); that is, a respondent may keep seeing the same statement (e.g., my life is empty) for many times, which is certainly annoying. In this study, we implemented two methods to control the WPSE rate. In the first control method, items would be frozen when their statements had been administered more than a prespecified times. In the second control method, a random component was added to control the contribution of the information at different stages of MCAT. The second control method was found to outperform the first control method in our simulation studies. In addition, we investigated four item selection methods: (a) random selection (as a baseline), (b) maximum Fisher information method without WPSE control, (c) maximum Fisher information method with the first control method, and (d) maximum Fisher information method with the second control method. These four methods were applied to two real tests: one was a work survey with dichotomous MPC items and the other is a career interests survey with polytomous MPC items. There were three dependent variables: the bias and root mean square error across person measures, and measurement efficiency which was defined as the number of items needed to achieve the same degree of test reliability. Both applications indicated that the proposed MCAT algorithms were successful and there was no loss in measurement proficiency when the control methods were implemented, and among the four methods, the last method performed the best.Keywords: computerized adaptive testing, ipsative tests, item response theory, pairwise comparison
Procedia PDF Downloads 24615977 Opinion Mining to Extract Community Emotions on Covid-19 Immunization Possible Side Effects
Authors: Yahya Almurtadha, Mukhtar Ghaleb, Ahmed M. Shamsan Saleh
Abstract:
The world witnessed a fierce attack from the Covid-19 virus, which affected public life socially, economically, healthily and psychologically. The world's governments tried to confront the pandemic by imposing a number of precautionary measures such as general closure, curfews and social distancing. Scientists have also made strenuous efforts to develop an effective vaccine to train the immune system to develop antibodies to combat the virus, thus reducing its symptoms and limiting its spread. Artificial intelligence, along with researchers and medical authorities, has accelerated the vaccine development process through big data processing and simulation. On the other hand, one of the most important negatives of the impact of Covid 19 was the state of anxiety and fear due to the blowout of rumors through social media, which prompted governments to try to reassure the public with the available means. This study aims to proposed using Sentiment Analysis (AKA Opinion Mining) and deep learning as efficient artificial intelligence techniques to work on retrieving the tweets of the public from Twitter and then analyze it automatically to extract their opinions, expression and feelings, negatively or positively, about the symptoms they may feel after vaccination. Sentiment analysis is characterized by its ability to access what the public post in social media within a record time and at a lower cost than traditional means such as questionnaires and interviews, not to mention the accuracy of the information as it comes from what the public expresses voluntarily.Keywords: deep learning, opinion mining, natural language processing, sentiment analysis
Procedia PDF Downloads 17115976 Knowledge-Driven Decision Support System Based on Knowledge Warehouse and Data Mining by Improving Apriori Algorithm with Fuzzy Logic
Authors: Pejman Hosseinioun, Hasan Shakeri, Ghasem Ghorbanirostam
Abstract:
In recent years, we have seen an increasing importance of research and study on knowledge source, decision support systems, data mining and procedure of knowledge discovery in data bases and it is considered that each of these aspects affects the others. In this article, we have merged information source and knowledge source to suggest a knowledge based system within limits of management based on storing and restoring of knowledge to manage information and improve decision making and resources. In this article, we have used method of data mining and Apriori algorithm in procedure of knowledge discovery one of the problems of Apriori algorithm is that, a user should specify the minimum threshold for supporting the regularity. Imagine that a user wants to apply Apriori algorithm for a database with millions of transactions. Definitely, the user does not have necessary knowledge of all existing transactions in that database, and therefore cannot specify a suitable threshold. Our purpose in this article is to improve Apriori algorithm. To achieve our goal, we tried using fuzzy logic to put data in different clusters before applying the Apriori algorithm for existing data in the database and we also try to suggest the most suitable threshold to the user automatically.Keywords: decision support system, data mining, knowledge discovery, data discovery, fuzzy logic
Procedia PDF Downloads 33515975 Investigating Data Normalization Techniques in Swarm Intelligence Forecasting for Energy Commodity Spot Price
Authors: Yuhanis Yusof, Zuriani Mustaffa, Siti Sakira Kamaruddin
Abstract:
Data mining is a fundamental technique in identifying patterns from large data sets. The extracted facts and patterns contribute in various domains such as marketing, forecasting, and medical. Prior to that, data are consolidated so that the resulting mining process may be more efficient. This study investigates the effect of different data normalization techniques, which are Min-max, Z-score, and decimal scaling, on Swarm-based forecasting models. Recent swarm intelligence algorithms employed includes the Grey Wolf Optimizer (GWO) and Artificial Bee Colony (ABC). Forecasting models are later developed to predict the daily spot price of crude oil and gasoline. Results showed that GWO works better with Z-score normalization technique while ABC produces better accuracy with the Min-Max. Nevertheless, the GWO is more superior that ABC as its model generates the highest accuracy for both crude oil and gasoline price. Such a result indicates that GWO is a promising competitor in the family of swarm intelligence algorithms.Keywords: artificial bee colony, data normalization, forecasting, Grey Wolf optimizer
Procedia PDF Downloads 47515974 Text Mining of Veterinary Forums for Epidemiological Surveillance Supplementation
Authors: Samuel Munaf, Kevin Swingler, Franz Brülisauer, Anthony O’Hare, George Gunn, Aaron Reeves
Abstract:
Web scraping and text mining are popular computer science methods deployed by public health researchers to augment traditional epidemiological surveillance. However, within veterinary disease surveillance, such techniques are still in the early stages of development and have not yet been fully utilised. This study presents an exploration into the utility of incorporating internet-based data to better understand the smallholder farming communities within Scotland by using online text extraction and the subsequent mining of this data. Web scraping of the livestock fora was conducted in conjunction with text mining of the data in search of common themes, words, and topics found within the text. Results from bi-grams and topic modelling uncover four main topics of interest within the data pertaining to aspects of livestock husbandry: feeding, breeding, slaughter, and disposal. These topics were found amongst both the poultry and pig sub-forums. Topic modeling appears to be a useful method of unsupervised classification regarding this form of data, as it has produced clusters that relate to biosecurity and animal welfare. Internet data can be a very effective tool in aiding traditional veterinary surveillance methods, but the requirement for human validation of said data is crucial. This opens avenues of research via the incorporation of other dynamic social media data, namely Twitter and Facebook/Meta, in addition to time series analysis to highlight temporal patterns.Keywords: veterinary epidemiology, disease surveillance, infodemiology, infoveillance, smallholding, social media, web scraping, sentiment analysis, geolocation, text mining, NLP
Procedia PDF Downloads 9815973 Lead and Cadmium Spatial Pattern and Risk Assessment around Coal Mine in Hyrcanian Forest, North Iran
Authors: Mahsa Tavakoli, Seyed Mohammad Hojjati, Yahya Kooch
Abstract:
In this study, the effect of coal mining activities on lead and cadmium concentrations and distribution in soil was investigated in Hyrcanian forest, North Iran. 16 plots (20×20 m2) were established by systematic-randomly (60×60 m2) in an area of 4 ha (200×200 m2-mine entrance placed at center). An area adjacent to the mine was not affected by the mining activity; considered as the controlled area. In order to investigate soil lead and cadmium concentration, one sample was taken from the 0-10 cm in each plot. To study the spatial pattern of soil properties and lead and cadmium concentrations in the mining area, an area of 80×80m2 (the mine as the center) was considered and 80 soil samples were systematic-randomly taken (10 m intervals). Geostatistical analysis was performed via Kriging method and GS+ software (version 5.1). In order to estimate the impact of coal mining activities on soil quality, pollution index was measured. Lead and cadmium concentrations were significantly higher in mine area (Pb: 10.97±0.30, Cd: 184.47±6.26 mg.kg-1) in comparison to control area (Pb: 9.42±0.17, Cd: 131.71±15.77 mg.kg-1). The mean values of the PI index indicate that Pb (1.16) and Cd (1.77) presented slightly polluted. Results of the NIPI index showed that Pb (1.44) and Cd (2.52) presented slight pollution and moderate pollution respectively. Results of variography and kriging method showed that it is possible to prepare interpolation maps of lead and cadmium around the mining areas in Hyrcanian forest. According to results of pollution and risk assessments, forest soil was contaminated by heavy metals (lead and cadmium); therefore, using reclamation and remediation techniques in these areas is necessary.Keywords: traditional coal mining, heavy metals, pollution indicators, geostatistics, Caspian forest
Procedia PDF Downloads 17815972 Norm Evolution through Contestation: Role of Legality from Humanitarian Intervention to Responsibility to Protect
Authors: Nazlı Üstünes Demirhan
Abstract:
International norms are subject to pressures of change through contestation during the course of their lifetimes. The nature of the contestation is one of the factors that are likely to have a determinative role in the direction of this change towards a stronger or weaker norm. This paper aims to understand the relation between the legality of contestation and the direction of change in norm strength. Based on a multidimensional norm strength conceptualization, it is hypothesized that use of legal logic and rhetoric of argumentation would have a positive influence for norm strength, whereas non-legal nature of contestation would lack this and weaken the norm. In order to show this, the evolution of the human protection norm between 1999 and 2018 will be examined with reference to two major contestation periods; Kosovo intervention of 1999, which led to the development of R2P doctrine, and Libya intervention of 2011, which is followed by the demise of the norm. The comparative analysis will be conducted through process tracing method with a document analysis on the Security Council meeting minutes, resolutions, and press releases. This study aims to contribute to the norm contestation literature with the introduction of legal process analysis. It also relates to further questions in IR/IL nexus, relating to the value added of norm legality as well as the politics of legalization.Keywords: humanitarian intervention, legality, norm contestation, norm dynamics, norm strength, responsibility to protect
Procedia PDF Downloads 15915971 Study and Analysis of the Factors Affecting Road Safety Using Decision Tree Algorithms
Authors: Naina Mahajan, Bikram Pal Kaur
Abstract:
The purpose of traffic accident analysis is to find the possible causes of an accident. Road accidents cannot be totally prevented but by suitable traffic engineering and management the accident rate can be reduced to a certain extent. This paper discusses the classification techniques C4.5 and ID3 using the WEKA Data mining tool. These techniques use on the NH (National highway) dataset. With the C4.5 and ID3 technique it gives best results and high accuracy with less computation time and error rate.Keywords: C4.5, ID3, NH(National highway), WEKA data mining tool
Procedia PDF Downloads 33815970 Phillips Curve Estimation in an Emerging Economy: Evidence from Sub-National Data of Indonesia
Authors: Harry Aginta
Abstract:
Using Phillips curve framework, this paper seeks for new empirical evidence on the relationship between inflation and output in a major emerging economy. By exploiting sub-national data, the contribution of this paper is threefold. First, it resolves the issue of using on-target national inflation rates that potentially causes weakening inflation-output nexus. This is very relevant for Indonesia as its central bank has been adopting inflation targeting framework based on national consumer price index (CPI) inflation. Second, the study tests the relevance of mining sector in output gap estimation. The test for mining sector is important to control for the effects of mining regulation and nominal effects of coal prices on real economic activities. Third, the paper applies panel econometric method by incorporating regional variation that help to improve model estimation. The results from this paper confirm the strong presence of Phillips curve in Indonesia. Positive output gap that reflects excess demand condition gives rise to the inflation rates. In addition, the elasticity of output gap is higher if the mining sector is excluded from output gap estimation. In addition to inflation adaptation, the dynamics of exchange rate and international commodity price are also found to affect inflation significantly. The results are robust to the alternative measurement of output gapKeywords: Phillips curve, inflation, Indonesia, panel data
Procedia PDF Downloads 12215969 Research of the Three-Dimensional Visualization Geological Modeling of Mine Based on Surpac
Authors: Honggang Qu, Yong Xu, Rongmei Liu, Zhenji Gao, Bin Wang
Abstract:
Today's mining industry is advancing gradually toward digital and visual direction. The three-dimensional visualization geological modeling of mine is the digital characterization of mineral deposits and is one of the key technology of digital mining. Three-dimensional geological modeling is a technology that combines geological spatial information management, geological interpretation, geological spatial analysis and prediction, geostatistical analysis, entity content analysis and graphic visualization in a three-dimensional environment with computer technology and is used in geological analysis. In this paper, the three-dimensional geological modeling of an iron mine through the use of Surpac is constructed, and the weight difference of the estimation methods between the distance power inverse ratio method and ordinary kriging is studied, and the ore body volume and reserves are simulated and calculated by using these two methods. Compared with the actual mine reserves, its result is relatively accurate, so it provides scientific bases for mine resource assessment, reserve calculation, mining design and so on.Keywords: three-dimensional geological modeling, geological database, geostatistics, block model
Procedia PDF Downloads 7715968 Application of Knowledge Discovery in Database Techniques in Cost Overruns of Construction Projects
Authors: Mai Ghazal, Ahmed Hammad
Abstract:
Cost overruns in construction projects are considered as worldwide challenges since the cost performance is one of the main measures of success along with schedule performance. To overcome this problem, studies were conducted to investigate the cost overruns' factors, also projects' historical data were analyzed to extract new and useful knowledge from it. This research is studying and analyzing the effect of some factors causing cost overruns using the historical data from completed construction projects. Then, using these factors to estimate the probability of cost overrun occurrence and predict its percentage for future projects. First, an intensive literature review was done to study all the factors that cause cost overrun in construction projects, then another review was done for previous researcher papers about mining process in dealing with cost overruns. Second, a proposed data warehouse was structured which can be used by organizations to store their future data in a well-organized way so it can be easily analyzed later. Third twelve quantitative factors which their data are frequently available at construction projects were selected to be the analyzed factors and suggested predictors for the proposed model.Keywords: construction management, construction projects, cost overrun, cost performance, data mining, data warehousing, knowledge discovery, knowledge management
Procedia PDF Downloads 37015967 Using Data Mining Technique for Scholarship Disbursement
Authors: J. K. Alhassan, S. A. Lawal
Abstract:
This work is on decision tree-based classification for the disbursement of scholarship. Tree-based data mining classification technique is used in other to determine the generic rule to be used to disburse the scholarship. The system based on the defined rules from the tree is able to determine the class (status) to which an applicant shall belong whether Granted or Not Granted. The applicants that fall to the class of granted denote a successful acquirement of scholarship while those in not granted class are unsuccessful in the scheme. An algorithm that can be used to classify the applicants based on the rules from tree-based classification was also developed. The tree-based classification is adopted because of its efficiency, effectiveness, and easy to comprehend features. The system was tested with the data of National Information Technology Development Agency (NITDA) Abuja, a Parastatal of Federal Ministry of Communication Technology that is mandated to develop and regulate information technology in Nigeria. The system was found working according to the specification. It is therefore recommended for all scholarship disbursement organizations.Keywords: classification, data mining, decision tree, scholarship
Procedia PDF Downloads 37515966 Assessing Carbon Stock and Sequestration of Reforestation Species on Old Mining Sites in Morocco Using the DNDC Model
Authors: Nabil Elkhatri, Mohamed Louay Metougui, Ngonidzashe Chirinda
Abstract:
Mining activities have left a legacy of degraded landscapes, prompting urgent efforts for ecological restoration. Reforestation holds promise as a potent tool to rehabilitate these old mining sites, with the potential to sequester carbon and contribute to climate change mitigation. This study focuses on evaluating the carbon stock and sequestration potential of reforestation species in the context of Morocco's mining areas, employing the DeNitrification-DeComposition (DNDC) model. The research is grounded in recognizing the need to connect theoretical models with practical implementation, ensuring that reforestation efforts are informed by accurate and context-specific data. Field data collection encompasses growth patterns, biomass accumulation, and carbon sequestration rates, establishing an empirical foundation for the study's analyses. By integrating the collected data with the DNDC model, the study aims to provide a comprehensive understanding of carbon dynamics within reforested ecosystems on old mining sites. The major findings reveal varying sequestration rates among different reforestation species, indicating the potential for species-specific optimization of reforestation strategies to enhance carbon capture. This research's significance lies in its potential to contribute to sustainable land management practices and climate change mitigation strategies. By quantifying the carbon stock and sequestration potential of reforestation species, the study serves as a valuable resource for policymakers, land managers, and practitioners involved in ecological restoration and carbon management. Ultimately, the study aligns with global objectives to rejuvenate degraded landscapes while addressing pressing climate challenges.Keywords: carbon stock, carbon sequestration, DNDC model, ecological restoration, mining sites, Morocco, reforestation, sustainable land management.
Procedia PDF Downloads 7615965 Integrating High-Performance Transport Modes into Transport Networks: A Multidimensional Impact Analysis
Authors: Sarah Pfoser, Lisa-Maria Putz, Thomas Berger
Abstract:
In the EU, the transport sector accounts for roughly one fourth of the total greenhouse gas emissions. In fact, the transport sector is one of the main contributors of greenhouse gas emissions. Climate protection targets aim to reduce the negative effects of greenhouse gas emissions (e.g. climate change, global warming) worldwide. Achieving a modal shift to foster environmentally friendly modes of transport such as rail and inland waterways is an important strategy to fulfill the climate protection targets. The present paper goes beyond these conventional transport modes and reflects upon currently emerging high-performance transport modes that yield the potential of complementing future transport systems in an efficient way. It will be defined which properties describe high-performance transport modes, which types of technology are included and what is their potential to contribute to a sustainable future transport network. The first step of this paper is to compile state-of-the-art information about high-performance transport modes to find out which technologies are currently emerging. A multidimensional impact analysis will be conducted afterwards to evaluate which of the technologies is most promising. This analysis will be performed from a spatial, social, economic and environmental perspective. Frequently used instruments such as cost-benefit analysis and SWOT analysis will be applied for the multidimensional assessment. The estimations for the analysis will be derived based on desktop research and discussions in an interdisciplinary team of researchers. For the purpose of this work, high-performance transport modes are characterized as transport modes with very fast and very high throughput connections that could act as efficient extension to the existing transport network. The recently proposed hyperloop system represents a potential high-performance transport mode which might be an innovative supplement for the current transport networks. The idea of hyperloops is that persons and freight are shipped in a tube at more than airline speed. Another innovative technology consists in drones for freight transport. Amazon already tests drones for their parcel shipments, they aim for delivery times of 30 minutes. Drones can, therefore, be considered as high-performance transport modes as well. The Trans-European Transport Networks program (TEN-T) addresses the expansion of transport grids in Europe and also includes high speed rail connections to better connect important European cities. These services should increase competitiveness of rail and are intended to replace aviation, which is known to be a polluting transport mode. In this sense, the integration of high-performance transport modes as described above facilitates the objectives of the TEN-T program. The results of the multidimensional impact analysis will reveal potential future effects of the integration of high-performance modes into transport networks. Building on that, a recommendation on the following (research) steps can be given which are necessary to ensure the most efficient implementation and integration processes.Keywords: drones, future transport networks, high performance transport modes, hyperloops, impact analysis
Procedia PDF Downloads 33215964 Using Textual Pre-Processing and Text Mining to Create Semantic Links
Authors: Ricardo Avila, Gabriel Lopes, Vania Vidal, Jose Macedo
Abstract:
This article offers a approach to the automatic discovery of semantic concepts and links in the domain of Oil Exploration and Production (E&P). Machine learning methods combined with textual pre-processing techniques were used to detect local patterns in texts and, thus, generate new concepts and new semantic links. Even using more specific vocabularies within the oil domain, our approach has achieved satisfactory results, suggesting that the proposal can be applied in other domains and languages, requiring only minor adjustments.Keywords: semantic links, data mining, linked data, SKOS
Procedia PDF Downloads 179