Search results for: multidimensional process mining
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 16260

Search results for: multidimensional process mining

15750 Decision Making Communication in the Process of Technologies Commercialization: Archival Analysis of the Process Content

Authors: Vaida Zemlickiene

Abstract:

Scientists around the world and practitioners are working to identify the factors that influence the results of technology commercialization and to propose the ideal model for the technology commercialization process. In other words, all stakeholders of technology commercialization seek to find a formula or set of rules to succeed in commercializing technologies in order to avoid unproductive investments. In this article, the process of commercialization technology is understood as the process of transforming inventions into marketable products, services, and processes, or the path from the idea of using an invention to a product that incorporates process from 1 to 9 technology readiness level (TRL). There are many publications in the field of management literature, which are aimed at managing the commercialization process. However, there is an apparent lack of research for communication in decision-making in the process of technology commercialization. Works were done in the past, and the last decade's global research analysis led to the unambiguous conclusion that the methodological framework is not mature enough to be of practical use in business. The process of technology commercialization and the decisions made in the process should be explored in-depth. An archival analysis is performed to find insights into decision-making communication in the process of technologies commercialization, to find out the content of technology commercialization process: decision-making stages and participants, to analyze the internal factors of technology commercialization, to perform their critical analysis, to analyze the concept of successful/unsuccessful technology commercialization.

Keywords: the process of technology commercialization, communication in decision-making process, the content of technology commercialization process, successful/unsuccessful technology commercialization

Procedia PDF Downloads 153
15749 Voice of Customer: Mining Customers' Reviews on On-Line Car Community

Authors: Kim Dongwon, Yu Songjin

Abstract:

This study identifies the business value of VOC (Voice of Customer) on the business. Precisely, we intend to demonstrate how much negative and positive sentiment of VOC has an influence on car sales market share in the unites states. We extract 7 emotions such as sadness, shame, anger, fear, frustration, delight and satisfaction from the VOC data, 23,204 pieces of opinions, that had been posted on car-related on-line community from 2007 to 2009(a part of data collection from 2007 to 2015), and intend to clarify the correlation between negative and positive sentimental keywords and contribution to market share. In order to develop a lexicon for each category of negative and positive sentiment, we took advantage of Corpus program, Antconc 3.4.1.w and on-line sentimental data, SentiWordNet and identified the part of speech(POS) information of words in the customers' opinion by using a part-of-speech tagging function provided by TextAnalysisOnline. For the purpose of this present study, a total of 45,741 pieces of customers' opinions of 28 car manufacturing companies had been collected including titles and status information. We conducted an experiment to examine whether the inclusion, frequency and intensity of terms with negative and positive emotions in each category affect the adoption of customer opinions for vehicle organizations' market share. In the experiment, we statistically verified that there is correlation between customer ideas containing negative and positive emotions and variation of marker share. Particularly, "Anger," a domain of negative domains, is significantly influential to car sales market share. The domain "Delight" and "Satisfaction" increased in proportion to growth of market share.

Keywords: data mining, opinion mining, sentiment analysis, VOC

Procedia PDF Downloads 214
15748 Performance Comparison of ADTree and Naive Bayes Algorithms for Spam Filtering

Authors: Thanh Nguyen, Andrei Doncescu, Pierre Siegel

Abstract:

Classification is an important data mining technique and could be used as data filtering in artificial intelligence. The broad application of classification for all kind of data leads to be used in nearly every field of our modern life. Classification helps us to put together different items according to the feature items decided as interesting and useful. In this paper, we compare two classification methods Naïve Bayes and ADTree use to detect spam e-mail. This choice is motivated by the fact that Naive Bayes algorithm is based on probability calculus while ADTree algorithm is based on decision tree. The parameter settings of the above classifiers use the maximization of true positive rate and minimization of false positive rate. The experiment results present classification accuracy and cost analysis in view of optimal classifier choice for Spam Detection. It is point out the number of attributes to obtain a tradeoff between number of them and the classification accuracy.

Keywords: classification, data mining, spam filtering, naive bayes, decision tree

Procedia PDF Downloads 411
15747 PitMod: The Lorax Pit Lake Hydrodynamic and Water Quality Model

Authors: Silvano Salvador, Maryam Zarrinderakht, Alan Martin

Abstract:

Open pits, which are the result of mining, are filled by water over time until the water reaches the elevation of the local water table and generates mine pit lakes. There are several specific regulations about the water quality of pit lakes, and mining operations should keep the quality of groundwater above pre-defined standards. Therefore, an accurate, acceptable numerical model predicting pit lakes’ water balance and water quality is needed in advance of mine excavation. We carry on analyzing and developing the model introduced by Crusius, Dunbar, et al. (2002) for pit lakes. This model, called “PitMod”, simulates the physical and geochemical evolution of pit lakes over time scales ranging from a few months up to a century or more. Here, a lake is approximated as one-dimensional, horizontally averaged vertical layers. PitMod calculates the time-dependent vertical distribution of physical and geochemical pit lake properties, like temperature, salinity, conductivity, pH, trace metals, and dissolved oxygen, within each model layer. This model considers the effect of pit morphology, climate data, multiple surface and subsurface (groundwater) inflows/outflows, precipitation/evaporation, surface ice formation/melting, vertical mixing due to surface wind stress, convection, background turbulence and equilibrium geochemistry using PHREEQC and linking that to the geochemical reactions. PitMod, which is used and validated in over 50 mines projects since 2002, incorporates physical processes like those found in other lake models such as DYRESM (Imerito 2007). However, unlike DYRESM PitMod also includes geochemical processes, pit wall runoff, and other effects. In addition, PitMod is actively under development and can be customized as required for a particular site.

Keywords: pit lakes, mining, modeling, hydrology

Procedia PDF Downloads 158
15746 The Clash between Environmental and Heritage Laws: An Australian Case Study

Authors: Andrew R. Beatty

Abstract:

The exploitation of Australia’s vast mineral wealth is regulated by a matrix of planning, environment and heritage legislation, and despite the desire for a ‘balance’ between economic, environmental and heritage values, Aboriginal objects and places are often detrimentally impacted by mining approvals. The Australian experience is not novel. There are other cases of clashes between the rights of traditional landowners and businesses seeking to exploit mineral or other resources on or beneath those lands, including in the United States, Canada, and Brazil. How one reconciles the rights of traditional owners with those of resource companies is an ongoing legal problem of general interest. In Australia, planning and environmental approvals for resource projects are ordinarily issued by State or Territory governments. Federal legislation such as the Aboriginal and Torres Strait Islander Heritage Protection Act 1984 (Cth) is intended to act as a safety net when State or Territory legislation is incapable of protecting Indigenous objects or places in the context of approvals for resource projects. This paper will analyse the context and effectiveness of legislation enacted to protect Indigenous heritage in the planning process. In particular, the paper will analyse how the statutory objects of such legislation need to be weighed against the statutory objects of competing legislation designed to facilitate and control resource exploitation. Using a current claim in the Federal Court of Australia for the protection of a culturally significant landscape as a case study, this paper will examine the challenges faced in ascribing value to cultural heritage within the wider context of environmental and planning laws. Our findings will reveal that there is an inherent difficulty in defining and weighing competing economic, environmental and heritage considerations. An alternative framework will be proposed to guide regulators towards making decisions that result in better protection of Indigenous heritage in the context of resource management.

Keywords: environmental law, heritage law, indigenous rights, mining

Procedia PDF Downloads 96
15745 Optimization of the Transfer Molding Process by Implementation of Online Monitoring Techniques for Electronic Packages

Authors: Burcu Kaya, Jan-Martin Kaiser, Karl-Friedrich Becker, Tanja Braun, Klaus-Dieter Lang

Abstract:

Quality of the molded packages is strongly influenced by the process parameters of the transfer molding. To achieve a better package quality and a stable transfer molding process, it is necessary to understand the influence of the process parameters on the package quality. This work aims to comprehend the relationship between the process parameters, and to identify the optimum process parameters for the transfer molding process in order to achieve less voids and wire sweep. To achieve this, a DoE is executed for process optimization and a regression analysis is carried out. A systematic approach is represented to generate models which enable an estimation of the number of voids and wire sweep. Validation experiments are conducted to verify the model and the results are presented.

Keywords: dielectric analysis, electronic packages, epoxy molding compounds, transfer molding process

Procedia PDF Downloads 382
15744 Optimal Performance of Plastic Extrusion Process Using Fuzzy Goal Programming

Authors: Abbas Al-Refaie

Abstract:

This study optimized the performance of plastic extrusion process of drip irrigation pipes using fuzzy goal programming. Two main responses were of main interest; roll thickness and hardness. Four main process factors were studied. The L18 array was then used for experimental design. The individual-moving range control charts were used to assess the stability of the process, while the process capability index was used to assess process performance. Confirmation experiments were conducted at the obtained combination of optimal factor setting by fuzzy goal programming. The results revealed that process capability was improved significantly from -1.129 to 0.8148 for roll thickness and from 0.0965 to 0.714 and hardness. Such improvement results in considerable savings in production and quality costs.

Keywords: fuzzy goal programming, extrusion process, process capability, irrigation plastic pipes

Procedia PDF Downloads 267
15743 Embedding Knowledge Management in Business Process

Authors: Paul Ihuoma Oluikpe

Abstract:

The purpose of this paper is to explore and highlight the process of creating value for strategy management by embedding knowledge management in the business process. Knowledge management can be seen from a three-dimensional perspective of content, connections and competencies. These dimensions can be embedded in the knowledge processes (create, capture, share, and apply) and operationalized within a business process to effectively create a scenario where knowledge can be focused on enabling a process and the process in turn generates outcomes. The application of knowledge management on business processes of organizations is rare and underreported. Few researches have explored this paradigm although researches have tended to reinforce the notion that competitive advantage sits within the internal aspects of the firm. Given this notion, it is surprising that knowledge management research and practice have not focused sufficiently on the business process which is the basic unit of organizational decision implementation. This research serves to generate understanding on applying KM in business process using a large multinational in Sub-Saharan Africa.

Keywords: knowledge management, business process, strategy, multinational

Procedia PDF Downloads 693
15742 Model for Introducing Products to New Customers through Decision Tree Using Algorithm C4.5 (J-48)

Authors: Komol Phaisarn, Anuphan Suttimarn, Vitchanan Keawtong, Kittisak Thongyoun, Chaiyos Jamsawang

Abstract:

This article is intended to analyze insurance information which contains information on the customer decision when purchasing life insurance pay package. The data were analyzed in order to present new customers with Life Insurance Perfect Pay package to meet new customers’ needs as much as possible. The basic data of insurance pay package were collect to get data mining; thus, reducing the scattering of information. The data were then classified in order to get decision model or decision tree using Algorithm C4.5 (J-48). In the classification, WEKA tools are used to form the model and testing datasets are used to test the decision tree for the accurate decision. The validation of this model in classifying showed that the accurate prediction was 68.43% while 31.25% were errors. The same set of data were then tested with other models, i.e. Naive Bayes and Zero R. The results showed that J-48 method could predict more accurately. So, the researcher applied the decision tree in writing the program used to introduce the product to new customers to persuade customers’ decision making in purchasing the insurance package that meets the new customers’ needs as much as possible.

Keywords: decision tree, data mining, customers, life insurance pay package

Procedia PDF Downloads 427
15741 Modeling of International Financial Integration: A Multicriteria Decision

Authors: Zouari Ezzeddine, Tarchoun Monaem

Abstract:

Despite the multiplicity of advanced approaches, the concept of financial integration couldn’t be an explicit analysis. Indeed, empirical studies appear that the measures of international financial integration are one-dimensional analyses. For the ambivalence of the concept and its multiple determinants, it must be analyzed in multidimensional level. The interest of this research is a proposal of a decision support by multicriteria approach for determining the positions of countries according to their international and financial dependencies links with the behavior of financial actors (trying to make governance decisions or diversification strategies of international portfolio ...

Keywords: financial integration, decision support, behavior, multicriteria approach, governance and diversification

Procedia PDF Downloads 526
15740 Optimization of Electrocoagulation Process Using Duelist Algorithm

Authors: Totok R. Biyanto, Arif T. Mardianto, M. Farid R. R., Luthfi Machmudi, kandi mulakasti

Abstract:

The main objective of this research is optimizing the electrocoagulation process design as a post-treatment for biologically vinasse effluent process. The first principle model with three independent variables that affect the energy consumption of electrocoagulation process i.e. current density, electrode distance, and time of treatment process are chosen as optimized variables. The process condition parameters were determined with the value of pH, electrical conductivity, and temperature of vinasse about 6.5, 28.5 mS/cm, 52 oC, respectively. Aluminum was chosen as the electrode material of electrocoagulation process. Duelist algorithm was used as optimization technique due to its capability to reach a global optimum. The optimization results show that the optimal process can be reached in the conditions of current density of 2.9976 A/m2, electrode distance of 1.5 cm and electrolysis time of 119 min. The optimized energy consumption during process is 34.02 Wh.

Keywords: optimization, vinasse effluent, electrocoagulation, energy consumption

Procedia PDF Downloads 469
15739 Improve Student Performance Prediction Using Majority Vote Ensemble Model for Higher Education

Authors: Wade Ghribi, Abdelmoty M. Ahmed, Ahmed Said Badawy, Belgacem Bouallegue

Abstract:

In higher education institutions, the most pressing priority is to improve student performance and retention. Large volumes of student data are used in Educational Data Mining techniques to find new hidden information from students' learning behavior, particularly to uncover the early symptom of at-risk pupils. On the other hand, data with noise, outliers, and irrelevant information may provide incorrect conclusions. By identifying features of students' data that have the potential to improve performance prediction results, comparing and identifying the most appropriate ensemble learning technique after preprocessing the data, and optimizing the hyperparameters, this paper aims to develop a reliable students' performance prediction model for Higher Education Institutions. Data was gathered from two different systems: a student information system and an e-learning system for undergraduate students in the College of Computer Science of a Saudi Arabian State University. The cases of 4413 students were used in this article. The process includes data collection, data integration, data preprocessing (such as cleaning, normalization, and transformation), feature selection, pattern extraction, and, finally, model optimization and assessment. Random Forest, Bagging, Stacking, Majority Vote, and two types of Boosting techniques, AdaBoost and XGBoost, are ensemble learning approaches, whereas Decision Tree, Support Vector Machine, and Artificial Neural Network are supervised learning techniques. Hyperparameters for ensemble learning systems will be fine-tuned to provide enhanced performance and optimal output. The findings imply that combining features of students' behavior from e-learning and students' information systems using Majority Vote produced better outcomes than the other ensemble techniques.

Keywords: educational data mining, student performance prediction, e-learning, classification, ensemble learning, higher education

Procedia PDF Downloads 107
15738 Reduction in Hot Metal Silicon through Statistical Analysis at G-Blast Furnace, Tata Steel Jamshedpur

Authors: Shoumodip Roy, Ankit Singhania, Santanu Mallick, Abhiram Jha, M. K. Agarwal, R. V. Ramna, Uttam Singh

Abstract:

The quality of hot metal at any blast furnace is judged by the silicon content in it. Lower hot metal silicon not only enhances process efficiency at steel melting shops but also reduces hot metal costs. The Hot metal produced at G-Blast furnace Tata Steel Jamshedpur has a significantly higher Si content than Benchmark Blast furnaces. The higher content of hot metal Si is mainly due to inferior raw material quality than those used in benchmark blast furnaces. With minimum control over raw material quality, the only option left to control hot metal Si is via optimizing the furnace parameters. Therefore, in order to identify the levers to reduce hot metal Si, Data mining was carried out, and multiple regression models were developed. The statistical analysis revealed that Slag B3{(CaO+MgO)/SiO2}, Slag Alumina and Hot metal temperature are key controllable parameters affecting hot metal silicon. Contour Plots were used to determine the optimum range of levels identified through statistical analysis. A trial plan was formulated to operate relevant parameters, at G blast furnace, in the identified range to reduce hot metal silicon. This paper details out the process followed and subsequent reduction in hot metal silicon by 15% at G blast furnace.

Keywords: blast furnace, optimization, silicon, statistical tools

Procedia PDF Downloads 223
15737 Progress in Accuracy, Reliability and Safety in Firedamp Detection

Authors: José Luis Lorenzo Bayona, Ljiljana Medic-Pejic, Isabel Amez Arenillas, Blanca Castells Somoza

Abstract:

The communication presents the study results carried out by the Official Laboratory J. M. Madariaga (LOM) of the Polytechnic University of Madrid to analyze the reliability of methane detection systems used in underground mining. Poor firedamp control in work can cause from production stoppages to fatal accidents and since there is currently a great variety of equipment with different functional characteristics, a study is needed to indicate which measurement principles have the highest degree of confidence. For the development of the project, a series of fixed, transportable and portable methane detectors with different measurement principles have been selected to subject them to laboratory tests following the methods described in the applicable regulations. The test equipment has been the one usually used in the certification and calibration of these devices, subject to the LOM quality system, and the tests have been carried out on detectors accessible in the market. The conclusions establish the main advantages and disadvantages of the equipment according to the measurement principle used; catalytic combustion, interferometry and infrared absorption.

Keywords: ATEX standards, gas detector, methane meter, mining safety

Procedia PDF Downloads 137
15736 Student Performance and Confidence Analysis on Education Virtual Environments through Different Assessment Strategies

Authors: Rubén Manrique, Delio Balcázar, José Parrado, Sebastián Rodríguez

Abstract:

Hand in hand with the evolution of technology, education systems have moved to virtual environments to provide increased coverage and facilitate the access to education. However, measuring student performance in virtual environments presents significant challenges to ensure students are acquiring the expected skills. In this study, the confidence and performance of engineering students in virtual environments is analyzed through different evaluation strategies. The effect of the assessment strategy in student confidence is identified using educational data mining techniques. Four assessment strategies were used. First, a conventional multiple choice test; second, a multiple choice test with feedback; third, a multiple choice test with a second chance; and fourth; a multiple choice test with feedback and second chance. Our results show that applying testing with online feedback strategies can influence positively student confidence.

Keywords: assessment strategies, educational data mining, student performance, student confidence

Procedia PDF Downloads 354
15735 Hierarchical Piecewise Linear Representation of Time Series Data

Authors: Vineetha Bettaiah, Heggere S. Ranganath

Abstract:

This paper presents a Hierarchical Piecewise Linear Approximation (HPLA) for the representation of time series data in which the time series is treated as a curve in the time-amplitude image space. The curve is partitioned into segments by choosing perceptually important points as break points. Each segment between adjacent break points is recursively partitioned into two segments at the best point or midpoint until the error between the approximating line and the original curve becomes less than a pre-specified threshold. The HPLA representation achieves dimensionality reduction while preserving prominent local features and general shape of time series. The representation permits course-fine processing at different levels of details, allows flexible definition of similarity based on mathematical measures or general time series shape, and supports time series data mining operations including query by content, clustering and classification based on whole or subsequence similarity.

Keywords: data mining, dimensionality reduction, piecewise linear representation, time series representation

Procedia PDF Downloads 275
15734 Data Analysis to Uncover Terrorist Attacks Using Data Mining Techniques

Authors: Saima Nazir, Mustansar Ali Ghazanfar, Sanay Muhammad Umar Saeed, Muhammad Awais Azam, Saad Ali Alahmari

Abstract:

Terrorism is an important and challenging concern. The entire world is threatened by only few sophisticated terrorist groups and especially in Gulf Region and Pakistan, it has become extremely destructive phenomena in recent years. Predicting the pattern of attack type, attack group and target type is an intricate task. This study offers new insight on terrorist group’s attack type and its chosen target. This research paper proposes a framework for prediction of terrorist attacks using the historical data and making an association between terrorist group, their attack type and target. Analysis shows that the number of attacks per year will keep on increasing, and Al-Harmayan in Saudi Arabia, Al-Qai’da in Gulf Region and Tehreek-e-Taliban in Pakistan will remain responsible for many future terrorist attacks. Top main targets of each group will be private citizen & property, police, government and military sector under constant circumstances.

Keywords: data mining, counter terrorism, machine learning, SVM

Procedia PDF Downloads 408
15733 Assessing Lithium Recovery from Secondary Sources

Authors: Carolina A. Santos, Alexandra B. Ribeiro

Abstract:

Climate change and environmental degradation are threats to humanity. Europe has been addressing these problems, namely through the Green Deal, with the use of batteries in mobility and energy fields. However, these require the use of critical raw materials, like lithium, which demand is estimated to grow 60 times in the next 30 years. Thus, it is fundamental to promote a circular economy with lithium recovery from secondary resources. These are nowadays key topics, which will be even more relevant in the future, so a new way to approach them is needed and must be encouraged. Therefore, one of our main goals is to analyse two methods of lithium retrieval from secondary sources, bioleaching, and electrodialysis, and assess them regarding their sustainability. The latest results show good efficiency of removal with both methods, even though there are some matrix interferences. Hence, further investment and research are needed in order to make this process sustainable and our society more circular.

Keywords: lithium, sustainable mining, social license to operate, bioleaching, electrodialysis

Procedia PDF Downloads 130
15732 Data-Driven Performance Evaluation of Surgical Doctors Based on Fuzzy Analytic Hierarchy Processes

Authors: Yuguang Gao, Qiang Yang, Yanpeng Zhang, Mingtao Deng

Abstract:

To enhance the safety, quality and efficiency of healthcare services provided by surgical doctors, we propose a comprehensive approach to the performance evaluation of individual doctors by incorporating insights from performance data as well as views of different stakeholders in the hospital. Exploratory factor analysis was first performed on collective multidimensional performance data of surgical doctors, where key factors were extracted that encompass assessment of professional experience and service performance. A two-level indicator system was then constructed, for which we developed a weighted interval-valued spherical fuzzy analytic hierarchy process to analyze the relative importance of the indicators while handling subjectivity and disparity in the decision-making of multiple parties involved. Our analytical results reveal that, for the key factors identified as instrumental for evaluating surgical doctors’ performance, the overall importance of clinical workload and complexity of service are valued more than capacity of service and professional experience, while the efficiency of resource consumption ranks comparatively the lowest in importance. We also provide a retrospective case study to illustrate the effectiveness and robustness of our quantitative evaluation model by assigning meaningful performance ratings to individual doctors based on the weights developed through our approach.

Keywords: analytic hierarchy processes, factor analysis, fuzzy logic, performance evaluation

Procedia PDF Downloads 58
15731 Remediation of Heavy Metal Contaminated Soil with Vivianite Nanoparticles

Authors: Shinen B., Bavor J., Dorjkhand B., Suvd B., Maitsetseg B.

Abstract:

A number of remediation techniques are available for the treatment of soils and sediments contaminated by heavy metals. However, some of these techniques are expensive and environmentally disruptive. Nanomaterials are used in the environment as environmental catalysts to convert toxic substances from water, soil, and sediment into environmentally benign compounds. This study was carried out to scrutinize the feasibility of vivianite nanoparticles for remediation of soils contaminated with heavy metals. Column experiments were performed in the laboratory to examine nanoparticle sequestration of metal in soil amended with vivianite nanoparticle suspension. The effect of environmental parameters such as temperature, pH and redox potential on metal leachability and bioavailability of soil amended with nanoparticle suspension was examined and compared with non-amended soils. The vivianite was effective in reducing the leachability of metals in soils. It is suggested that vivianite nanoparticles could be applied for the remediation of contaminated sites polluted by heavy metals due to mining activities, particularly in Mongolia, where mining industries have been developing rapidly in the last decade.

Keywords: bioavailability, heavy metals, nanoparticles, remediation

Procedia PDF Downloads 190
15730 Bayesian Inference for High Dimensional Dynamic Spatio-Temporal Models

Authors: Sofia M. Karadimitriou, Kostas Triantafyllopoulos, Timothy Heaton

Abstract:

Reduced dimension Dynamic Spatio-Temporal Models (DSTMs) jointly describe the spatial and temporal evolution of a function observed subject to noise. A basic state space model is adopted for the discrete temporal variation, while a continuous autoregressive structure describes the continuous spatial evolution. Application of such a DSTM relies upon the pre-selection of a suitable reduced set of basic functions and this can present a challenge in practice. In this talk, we propose an online estimation method for high dimensional spatio-temporal data based upon DSTM and we attempt to resolve this issue by allowing the basis to adapt to the observed data. Specifically, we present a wavelet decomposition in order to obtain a parsimonious approximation of the spatial continuous process. This parsimony can be achieved by placing a Laplace prior distribution on the wavelet coefficients. The aim of using the Laplace prior, is to filter wavelet coefficients with low contribution, and thus achieve the dimension reduction with significant computation savings. We then propose a Hierarchical Bayesian State Space model, for the estimation of which we offer an appropriate particle filter algorithm. The proposed methodology is illustrated using real environmental data.

Keywords: multidimensional Laplace prior, particle filtering, spatio-temporal modelling, wavelets

Procedia PDF Downloads 427
15729 RA-Apriori: An Efficient and Faster MapReduce-Based Algorithm for Frequent Itemset Mining on Apache Flink

Authors: Sanjay Rathee, Arti Kashyap

Abstract:

Extraction of useful information from large datasets is one of the most important research problems. Association rule mining is one of the best methods for this purpose. Finding possible associations between items in large transaction based datasets (finding frequent patterns) is most important part of the association rule mining. There exist many algorithms to find frequent patterns but Apriori algorithm always remains a preferred choice due to its ease of implementation and natural tendency to be parallelized. Many single-machine based Apriori variants exist but massive amount of data available these days is above capacity of a single machine. Therefore, to meet the demands of this ever-growing huge data, there is a need of multiple machines based Apriori algorithm. For these types of distributed applications, MapReduce is a popular fault-tolerant framework. Hadoop is one of the best open-source software frameworks with MapReduce approach for distributed storage and distributed processing of huge datasets using clusters built from commodity hardware. However, heavy disk I/O operation at each iteration of a highly iterative algorithm like Apriori makes Hadoop inefficient. A number of MapReduce-based platforms are being developed for parallel computing in recent years. Among them, two platforms, namely, Spark and Flink have attracted a lot of attention because of their inbuilt support to distributed computations. Earlier we proposed a reduced- Apriori algorithm on Spark platform which outperforms parallel Apriori, one because of use of Spark and secondly because of the improvement we proposed in standard Apriori. Therefore, this work is a natural sequel of our work and targets on implementing, testing and benchmarking Apriori and Reduced-Apriori and our new algorithm ReducedAll-Apriori on Apache Flink and compares it with Spark implementation. Flink, a streaming dataflow engine, overcomes disk I/O bottlenecks in MapReduce, providing an ideal platform for distributed Apriori. Flink's pipelining based structure allows starting a next iteration as soon as partial results of earlier iteration are available. Therefore, there is no need to wait for all reducers result to start a next iteration. We conduct in-depth experiments to gain insight into the effectiveness, efficiency and scalability of the Apriori and RA-Apriori algorithm on Flink.

Keywords: apriori, apache flink, Mapreduce, spark, Hadoop, R-Apriori, frequent itemset mining

Procedia PDF Downloads 294
15728 Research on the Landscape of Xi'an Ancient City Based on the Poetry Text of Tang Dynasty

Authors: Zou Yihui

Abstract:

The integration of the traditional landscape of the ancient city and the poet's emotions and symbolization into ancient poetry is the unique cultural gene and spiritual core of the historical city, and re-understanding the historical landscape pattern from the poetry is conducive to continuing the historical city context and improving the current situation of the gradual decline of the poetry of the modern historical urban landscape. Starting from Tang poetry uses semantic analysis methods、combined with text mining technology, entry mining, word frequency analysis, and cluster analysis of the landscape information of Tang Chang'an City were carried out, and the method framework for analyzing the urban landscape form based on poetry text was constructed. Nearly 160 poems describing the landscape of Tang Chang'an City were screened, and the poetic landscape characteristics of Tang Chang'an City were sorted out locally in order to combine with modern urban spatial development to continue the urban spatial context.

Keywords: Tang Chang'an City, poetic texts, semantic analysis, historical landscape

Procedia PDF Downloads 63
15727 Delineating Concern Ground in Block Caving – Underground Mine Using Ground Penetrating Radar

Authors: Eric Sitorus, Septian Prahastudhi, Turgod Nainggolan, Erwin Riyanto

Abstract:

Mining by block or panel caving is a mining method that takes advantage of fractures within an ore body, coupled with gravity, to extract material from a predetermined column of ore. The caving column is weakened from beneath through the use of undercutting, after which the ore breaks up and is extracted from below in a continuous cycle. The nature of this method induces cyclical stresses on the pillars of excavations as stress is built up and released over time, which has a detrimental effect on both the installed ground support and the rock mass itself. Ground support capacity, especially on the production where excavation void ratio is highest, is subjected to heavy loading. Strain above threshold of the elongation of support capacity can yield resulting in damage to excavations. Geotechnical engineers must evaluate not only the remnant capacity of ground support systems but also investigate depth of rock mass yield within pillars, backs and floors. Ground Penetrating Radar (GPR) is a geophysical method that has the ability to evaluate rock mass damage using electromagnetic waves. This paper illustrates a case study from the Grasberg mining complex where non-invasive information on the depth of damage and condition of the remaining rock mass was required. GPR with 100 MHz antenna resolution was used to obtain images of the subsurface to determine rehabilitation requirements prior to recommencing production activities. The GPR surveys were used to calibrate the reflection coefficient response of varying rock mass conditions to known Rock Quality Designation (RQD) parameters observed at the mine. The calibrated GPR survey allowed site engineers to map subsurface conditions and plan rehabilitation accordingly.

Keywords: block caving, ground penetrating radar, reflectivity, RQD

Procedia PDF Downloads 134
15726 Social Media and Internet Celebrity for Social Commerce Intentional and Behavioral Recommendations

Authors: Shu-Hsien Liao, Yao-Hsuan Yang

Abstract:

Social media is a virtual community and online platform that people use to create, share, and exchange opinions/experiences. Internet celebrities are people who become famous on the Internet, increasing their popularity through their social networking or video websites. Social commerce (s-ecommerce) is the combination of social relations and commercial transaction activities. The combination of social media and Internet celebrities is an emerging model for the development of s-ecommerce. With recent advances in system sciences, recommendation systems are gradually moving to develop intentional and behavioral recommendations. This background leads to the research issues regarding digital and social media in enterprises. Thus, this study implements data mining analytics, including clustering analysis and association rules, to investigate Taiwanese users (n=2,102) to investigate social media and Internet celebrities’ preferences to find knowledge profiles/patterns/rules for s-ecommerce intentional and behavioral recommendations.

Keywords: social media, internet celebrity, social commerce (s-ecommerce), data mining analytics, intentional and behavioral recommendations

Procedia PDF Downloads 30
15725 Medical Knowledge Management since the Integration of Heterogeneous Data until the Knowledge Exploitation in a Decision-Making System

Authors: Nadjat Zerf Boudjettou, Fahima Nader, Rachid Chalal

Abstract:

Knowledge management is to acquire and represent knowledge relevant to a domain, a task or a specific organization in order to facilitate access, reuse and evolution. This usually means building, maintaining and evolving an explicit representation of knowledge. The next step is to provide access to that knowledge, that is to say, the spread in order to enable effective use. Knowledge management in the medical field aims to improve the performance of the medical organization by allowing individuals in the care facility (doctors, nurses, paramedics, etc.) to capture, share and apply collective knowledge in order to make optimal decisions in real time. In this paper, we propose a knowledge management approach based on integration technique of heterogeneous data in the medical field by creating a data warehouse, a technique of extracting knowledge from medical data by choosing a technique of data mining, and finally an exploitation technique of that knowledge in a case-based reasoning system.

Keywords: data warehouse, data mining, knowledge discovery in database, KDD, medical knowledge management, Bayesian networks

Procedia PDF Downloads 395
15724 A Novel Heuristic for Analysis of Large Datasets by Selecting Wrapper-Based Features

Authors: Bushra Zafar, Usman Qamar

Abstract:

Large data sample size and dimensions render the effectiveness of conventional data mining methodologies. A data mining technique are important tools for collection of knowledgeable information from variety of databases and provides supervised learning in the form of classification to design models to describe vital data classes while structure of the classifier is based on class attribute. Classification efficiency and accuracy are often influenced to great extent by noisy and undesirable features in real application data sets. The inherent natures of data set greatly masks its quality analysis and leave us with quite few practical approaches to use. To our knowledge first time, we present a new approach for investigation of structure and quality of datasets by providing a targeted analysis of localization of noisy and irrelevant features of data sets. Machine learning is based primarily on feature selection as pre-processing step which offers us to select few features from number of features as a subset by reducing the space according to certain evaluation criterion. The primary objective of this study is to trim down the scope of the given data sample by searching a small set of important features which may results into good classification performance. For this purpose, a heuristic for wrapper-based feature selection using genetic algorithm and for discriminative feature selection an external classifier are used. Selection of feature based on its number of occurrence in the chosen chromosomes. Sample dataset has been used to demonstrate proposed idea effectively. A proposed method has improved average accuracy of different datasets is about 95%. Experimental results illustrate that proposed algorithm increases the accuracy of prediction of different diseases.

Keywords: data mining, generic algorithm, KNN algorithms, wrapper based feature selection

Procedia PDF Downloads 316
15723 A Near-Optimal Domain Independent Approach for Detecting Approximate Duplicates

Authors: Abdelaziz Fellah, Allaoua Maamir

Abstract:

We propose a domain-independent merging-cluster filter approach complemented with a set of algorithms for identifying approximate duplicate entities efficiently and accurately within a single and across multiple data sources. The near-optimal merging-cluster filter (MCF) approach is based on the Monge-Elkan well-tuned algorithm and extended with an affine variant of the Smith-Waterman similarity measure. Then we present constant, variable, and function threshold algorithms that work conceptually in a divide-merge filtering fashion for detecting near duplicates as hierarchical clusters along with their corresponding representatives. The algorithms take recursive refinement approaches in the spirit of filtering, merging, and updating, cluster representatives to detect approximate duplicates at each level of the cluster tree. Experiments show a high effectiveness and accuracy of the MCF approach in detecting approximate duplicates by outperforming the seminal Monge-Elkan’s algorithm on several real-world benchmarks and generated datasets.

Keywords: data mining, data cleaning, approximate duplicates, near-duplicates detection, data mining applications and discovery

Procedia PDF Downloads 387
15722 Synergy Effect of Energy and Water Saving in China's Energy Sectors: A Multi-Objective Optimization Analysis

Authors: Yi Jin, Xu Tang, Cuiyang Feng

Abstract:

The ‘11th five-year’ and ‘12th five-year’ plans have clearly put forward to strictly control the total amount and intensity of energy and water consumption. The synergy effect of energy and water has rarely been considered in the process of energy and water saving in China, where its contribution cannot be maximized. Energy sectors consume large amounts of energy and water when producing massive energy, which makes them both energy and water intensive. Therefore, the synergy effect in these sectors is significant. This paper assesses and optimizes the synergy effect in three energy sectors under the background of promoting energy and water saving. Results show that: From the perspective of critical path, chemical industry, mining and processing of non-metal ores and smelting and pressing of metals are coupling points in the process of energy and water flowing to energy sectors, in which the implementation of energy and water saving policies can bring significant synergy effect. Multi-objective optimization shows that increasing efforts on input restructuring can effectively improve synergy effects; relatively large synergetic energy saving and little water saving are obtained after solely reducing the energy and water intensity of coupling sectors. By optimizing the input structure of sectors, especially the coupling sectors, the synergy effect of energy and water saving can be improved in energy sectors under the premise of keeping economy running stably.

Keywords: critical path, energy sector, multi-objective optimization, synergy effect, water

Procedia PDF Downloads 360
15721 Fuzzy Expert Approach for Risk Mitigation on Functional Urban Areas Affected by Anthropogenic Ground Movements

Authors: Agnieszka A. Malinowska, R. Hejmanowski

Abstract:

A number of European cities are strongly affected by ground movements caused by anthropogenic activities or post-anthropogenic metamorphosis. Those are mainly water pumping, current mining operation, the collapse of post-mining underground voids or mining-induced earthquakes. These activities lead to large and small-scale ground displacements and a ground ruptures. The ground movements occurring in urban areas could considerably affect stability and safety of structures and infrastructures. The complexity of the ground deformation phenomenon in relation to the structures and infrastructures vulnerability leads to considerable constraints in assessing the threat of those objects. However, the increase of access to the free software and satellite data could pave the way for developing new methods and strategies for environmental risk mitigation and management. Open source geographical information systems (OS GIS), may support data integration, management, and risk analysis. Lately, developed methods based on fuzzy logic and experts methods for buildings and infrastructure damage risk assessment could be integrated into OS GIS. Those methods were verified base on back analysis proving their accuracy. Moreover, those methods could be supported by ground displacement observation. Based on freely available data from European Space Agency and free software, ground deformation could be estimated. The main innovation presented in the paper is the application of open source software (OS GIS) for integration developed models and assessment of the threat of urban areas. Those approaches will be reinforced by analysis of ground movement based on free satellite data. Those data would support the verification of ground movement prediction models. Moreover, satellite data will enable our mapping of ground deformation in urbanized areas. Developed models and methods have been implemented in one of the urban areas hazarded by underground mining activity. Vulnerability maps supported by satellite ground movement observation would mitigate the hazards of land displacements in urban areas close to mines.

Keywords: fuzzy logic, open source geographic information science (OS GIS), risk assessment on urbanized areas, satellite interferometry (InSAR)

Procedia PDF Downloads 159