Search results for: Frequent itemset mining.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 721

61 Multi-Level Air Quality Classification in China Using Information Gain and Support Vector Machine

Authors: Bingchun Liu, Pei-Chann Chang, Natasha Huang, Dun Li

Abstract:

Machine learning and data mining are two important tools for extracting useful information and knowledge from large datasets. In machine learning, classification is a widely used technique to predict qualitative variables and is generally preferred over regression from an operational point of view. Due to the enormous increase in air pollution in various countries, especially China, air quality classification has become one of the most important topics in air quality research and modelling. This study introduces a hybrid classification model based on information theory and the Support Vector Machine (SVM), using the air quality data of four cities in China, namely Beijing, Guangzhou, Shanghai and Tianjin, from Jan 1, 2014 to April 30, 2016. China's Ministry of Environmental Protection classifies daily air quality into six levels, namely Serious Pollution, Severe Pollution, Moderate Pollution, Light Pollution, Good and Excellent, based on their respective Air Quality Index (AQI) values. Using information theory, information gain (IG) is calculated and feature selection is performed for both categorical features and continuous numeric features. The SVM machine learning algorithm is then applied to the selected features with cross-validation. The final evaluation reveals that the IG and SVM hybrid model performs better than the SVM (alone), Artificial Neural Network (ANN) and K-Nearest Neighbours (KNN) models in terms of accuracy as well as complexity.
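
A minimal sketch of the feature-selection-plus-SVM pipeline described above, using scikit-learn; the file name, column names and number of retained features are illustrative assumptions, not details from the paper.

```python
# Sketch: information-gain (mutual information) feature selection followed by an SVM
# classifier with cross-validation. Column names and thresholds are assumptions.
import pandas as pd
from sklearn.feature_selection import mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

df = pd.read_csv("air_quality.csv")          # hypothetical table of pollutant readings
X, y = df.drop(columns=["AQI_level"]), df["AQI_level"]

# Information gain of each feature with respect to the six AQI classes.
ig = pd.Series(mutual_info_classif(X, y, random_state=0), index=X.columns)
selected = ig.sort_values(ascending=False).head(8).index   # keep the top-ranked features

# SVM on the selected features, evaluated with 10-fold cross-validation.
model = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
scores = cross_val_score(model, X[selected], y, cv=10)
print(f"mean CV accuracy: {scores.mean():.3f}")
```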

Keywords: Machine learning, air quality classification, air quality index, information gain, support vector machine, cross-validation.

60 Comparative Study of Seismic Isolation as Retrofit Method for Historical Constructions

Authors: Carlos H. Cuadra

Abstract:

Seismic isolation can be used as a retrofit method for historical buildings, with the advantage that minimum intervention on the superstructure is required. However, the selection of isolation devices depends on the weight and stiffness of the upper structure. In this study, two buildings are analysed to evaluate the applicability of this retrofitting methodology. Both buildings are located in Akita prefecture in the northern part of Japan. One building is a wooden structure that corresponds to the old council meeting hall of Noshiro city. The second building is a brick masonry structure that was used as the house of a foreign mining engineer and is located in Ani town. Ambient vibration measurements were performed on both buildings to estimate their dynamic characteristics. A target period of vibration of 3 seconds is then selected for the isolated systems to estimate the required stiffness of the isolation devices. For the wooden structure, which is a light construction, it was found that natural rubber isolators in combination with friction bearings are suitable for seismic isolation. In the case of the masonry building, elastomeric isolators can be used for its seismic isolation. Lumped mass systems are used for the seismic response analysis, and it is verified in both cases that seismic isolation can be used as a retrofitting method for historical constructions. However, in the case of the light building, most of the weight corresponds to the reinforced concrete slab that is required to install the isolation devices.

Keywords: Historical building, finite element method, masonry structure, seismic isolation, wooden structure.

59 Status of Thyroid Function and Iron Overload in Adolescents and Young Adults with Beta-Thalassemia Major Treated with Deferoxamine in Jordan

Authors: Fawzi Irshaid, Kamal Mansi

Abstract:

Thyroid dysfunction is one of the most frequently reported complications of chronic blood transfusion therapy in patients with beta-thalassemia major (BTM). However, the occurrence of thyroid dysfunction and its possible association with iron overload in BTM patients is still under debate. Therefore, this study aimed to investigate the status of thyroid function and iron overload in adolescent and young adult patients with BTM in the Jordanian population. Thirty-six BTM patients aged 12-28 years and matched controls were included in this study. All patients had been receiving frequent blood transfusions to maintain a pretransfusion hemoglobin concentration above 10 g dl⁻¹ and deferoxamine at a dose of 45 mg kg⁻¹ day⁻¹ (8 h, 5-7 days/week) by subcutaneous infusion. Blood samples were drawn from patients and controls. The status of thyroid function and iron overload was evaluated by measurements of serum free thyroxine (FT4), triiodothyronine (FT3), thyrotropin (TSH) and serum ferritin level. A number of hematological and biochemical parameters were also measured. It was found that the mean values of hematocrit, serum ferritin, hemoglobin, FT3, zinc and copper were significantly higher in the patients than in the controls (P< 0.05). On the other hand, the mean values of leukocytes, FT4 and TSH were similar to those of the controls. In addition, our data also indicated that all of the above examined parameters were not significantly affected by the patients' age and gender. The deferoxamine approach for removing excess iron from our BTM patients did not normalize the values of serum ferritin, copper and zinc, suggesting poor compliance with deferoxamine chelation therapy. Thus, we recommend the use of a combination of deferoxamine and deferiprone to reduce the risk of excess iron in our patients. Furthermore, thyroid dysfunction appears to be a rare complication, because our patients showed normal mean levels of serum TSH and FT4. However, high mean levels of serum ferritin, zinc and copper might be seen as potential risk factors for the initiation and development of thyroid dysfunction and other diseases. Therefore, further studies must be carried out at yearly intervals with larger sample numbers to detect subclinical cases of thyroid dysfunction.

Keywords: beta-thalassemia major, deferoxamine, iron overload, triiodothyronine, zinc.

58 Measuring the Structural Similarity of Web-based Documents: A Novel Approach

Authors: Matthias Dehmer, Frank Emmert Streib, Alexander Mehler, Jürgen Kilian

Abstract:

Most known methods for measuring the structural similarity of document structures are based on, e.g., tag measures, path metrics and tree measures in terms of their DOM-trees. Other methods measure the similarity in the framework of the well-known vector space model. In contrast to these, we present a new approach to measuring the structural similarity of web-based documents represented by so-called generalized trees, which are more general than DOM-trees because they are not restricted to directed rooted trees. We design a new similarity measure for graphs representing web-based hypertext structures. Our similarity measure is mainly based on a novel representation of a graph as linear integer strings, whose components represent structural properties of the graph. The similarity of two graphs is then defined as the optimal alignment of the underlying property strings. In this paper we apply the well-known technique of sequence alignment to solve a novel and challenging problem: measuring the structural similarity of generalized trees. More precisely, we first transform our graphs, considered as high-dimensional objects, into linear structures. Then we derive similarity values from the alignments of the property strings in order to measure the structural similarity of generalized trees. Hence, we transform a graph similarity problem into a string similarity problem. We demonstrate that our similarity measure captures important structural information by applying it to two different test sets consisting of graphs representing web-based documents.
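
A minimal sketch of the alignment idea, assuming each generalized tree has already been reduced to a string (sequence) of integer structural properties; the gap and mismatch costs and the example sequences are illustrative, not the authors' parameters.

```python
# Sketch: global alignment of two integer "property strings" (Needleman-Wunsch style),
# turned into a similarity score. Costs are illustrative assumptions.
def align_cost(a, b, gap=1.0):
    n, m = len(a), len(b)
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = i * gap
    for j in range(1, m + 1):
        dp[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = abs(a[i - 1] - b[j - 1])          # mismatch cost: property difference
            dp[i][j] = min(dp[i - 1][j - 1] + sub,  # align a[i-1] with b[j-1]
                           dp[i - 1][j] + gap,      # gap in b
                           dp[i][j - 1] + gap)      # gap in a
    return dp[n][m]

def similarity(a, b):
    # Map the optimal alignment cost into (0, 1]: identical strings give 1.0.
    return 1.0 / (1.0 + align_cost(a, b))

# Example: out-degree sequences of two small hypertext graphs (illustrative only).
print(similarity([3, 1, 0, 2], [3, 1, 1, 0, 2]))
```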

Keywords: Graph similarity, hierarchical and directed graphs, hypertext, generalized trees, web structure mining.

57 Calibration of the Discrete Element Method Using a Large Shear Box

Authors: Corné J. Coetzee, Etienne Horn

Abstract:

One of the main challenges in using the Discrete Element Method (DEM) is to specify the correct input parameter values. In general, the models are sensitive to the input parameter values and accurate results can only be achieved if the correct values are specified. For the linear contact model, micro-parameters such as the particle density, stiffness, coefficient of friction, as well as the particle size and shape distributions are required. There is a need for a procedure to accurately calibrate these parameters before any attempt can be made to accurately model a complete bulk materials handling system. Since DEM is often used to model applications in the mining and quarrying industries, a calibration procedure was developed for materials that consist of relatively large (up to 40 mm in size) particles. A coarse crushed aggregate was used as the test material. Using a specially designed large shear box with a diameter of 590 mm, the confined Young’s modulus (bulk stiffness) and internal friction angle of the material were measured by means of the confined compression test and the direct shear test respectively. DEM models of the experimental setup were developed and the input parameter values were varied iteratively until a close correlation between the experimental and numerical results was achieved. The calibration process was validated by modelling the pull-out of an anchor from a bed of material. The model results compared well with experimental measurement.

Keywords: Discrete Element Method (DEM), calibration, shear box, anchor pull-out.

56 Real-Time Data Stream Partitioning over a Sliding Window in Real-Time Spatial Big Data

Authors: Sana Hamdi, Emna Bouazizi, Sami Faiz

Abstract:

In recent years, real-time spatial applications, like location-aware services and traffic monitoring, have become more and more important. Such applications result in dynamic environments where data as well as queries are continuously moving. As a result, there is a tremendous amount of real-time spatial data generated every day. The growth of the data volume seems to outpace the advance of our computing infrastructure. For instance, in real-time spatial Big Data, users expect to receive the results of each query within a short time period without taking into account the load on the system. But with a huge amount of real-time spatial data generated, system performance degrades rapidly, especially in overload situations. To solve this problem, we propose the use of data partitioning as an optimization technique. Traditional horizontal and vertical partitioning can increase the performance of the system and simplify data management, but they remain insufficient for real-time spatial Big Data; they cannot deal with real-time and stream queries efficiently. Thus, in this paper, we propose a novel data partitioning approach for real-time spatial Big Data named VPA-RTSBD (Vertical Partitioning Approach for Real-Time Spatial Big Data). This contribution is an implementation of the Matching algorithm for traditional vertical partitioning. We first find the optimal attribute sequence by means of the Matching algorithm. Then, we propose a new cost model for database partitioning, used to keep the data volume of each partition balanced and to provide parallel execution guarantees for the most frequent queries. VPA-RTSBD aims to obtain a real-time partitioning scheme and deals with stream data. It improves the performance of query execution by maximizing the degree of parallel execution. This improves QoS (Quality of Service) in real-time spatial Big Data, especially with a huge volume of stream data. The performance of our contribution is evaluated via simulation experiments. The results show that the proposed algorithm is both efficient and scalable, and that it outperforms comparable algorithms.
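
A minimal sketch of the attribute-grouping idea behind vertical partitioning, assuming each attribute is described by a binary usage vector over the most frequent queries; Hamming distance (as in the keywords) drives the grouping, and the attribute names, usage vectors and distance limit are illustrative assumptions rather than the paper's algorithm.

```python
# Sketch: group attributes whose query-usage vectors are close in Hamming distance, so that
# attributes read by the same frequent queries land in the same vertical partition.
from itertools import combinations

def hamming(u, v):
    return sum(a != b for a, b in zip(u, v))

# usage[attr][q] = 1 if frequent query q reads attribute attr (illustrative data).
usage = {
    "obj_id":    [1, 1, 1, 1],
    "latitude":  [1, 1, 0, 0],
    "longitude": [1, 1, 0, 0],
    "speed":     [0, 0, 1, 1],
    "timestamp": [0, 1, 1, 1],
}

def partition(usage, max_distance=1):
    parts = []
    for attr, vec in usage.items():
        for part in parts:
            rep = usage[part[0]]                 # representative attribute of the partition
            if hamming(vec, rep) <= max_distance:
                part.append(attr)
                break
        else:
            parts.append([attr])                 # start a new vertical partition
    return parts

print(partition(usage))   # [['obj_id', 'timestamp'], ['latitude', 'longitude'], ['speed']]
```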

Keywords: Real-Time Spatial Big Data, Quality Of Service, Vertical partitioning, Horizontal partitioning, Matching algorithm, Hamming distance, Stream query.

55 Applying Theory of Inventive Problem Solving to Develop Innovative Solutions: A Case Study

Authors: Y. H. Wang, C. C. Hsieh

Abstract:

Good service design can increase organization revenue and consumer satisfaction while reducing labor and time costs. The problems facing consumers in the original service model of the eyewear and optical industry include the following issues: (1) insufficient information on eyewear products; (2) passive dependence on recommendations and insufficient selection; (3) incomplete records on the progression of vision conditions; and (4) a lack of complete customer records. This study investigates the case of Kobayashi Optical, applying the Theory of Inventive Problem Solving (TRIZ) to develop innovative solutions for the eyewear and optical industry. The analysis results lead to the following conclusions and management implications: in order to provide customers with improved professional information and recommendations, Kobayashi Optical is advised to establish customer purchasing records. Overall service efficiency can be enhanced by applying data mining techniques to analyze past consumer preferences and purchase histories. Furthermore, Kobayashi Optical should continue to develop a 3D virtual trial service which allows customers to easily browse different frame styles and colors. This 3D virtual trial service will reduce customer waiting times during peak service hours at stores.

Keywords: Theory of inventive problem solving, service design, augmented reality, eyewear and optical industry.

54 Human Digital Twin for Personal Conversation Automation Using Supervised Machine Learning Approaches

Authors: Aya Salama

Abstract:

Digital Twin has emerged as a compelling research area, capturing the attention of scholars over the past decade. It finds applications across diverse fields, including smart manufacturing and healthcare, offering significant time and cost savings. Notably, it often intersects with other cutting-edge technologies such as Data Mining, Artificial Intelligence, and Machine Learning. However, the concept of a Human Digital Twin (HDT) is still in its infancy and requires further demonstration of its practicality. HDT takes the notion of Digital Twin a step further by extending it to living entities, notably humans, who are vastly different from inanimate physical objects. The primary objective of this research was to create an HDT capable of automating real-time human responses by simulating human behavior. To achieve this, the study delved into various areas, including clustering, supervised classification, topic extraction, and sentiment analysis. The paper successfully demonstrated the feasibility of HDT for generating personalized responses in social messaging applications. Notably, the proposed approach achieved an overall accuracy of 63%, a highly promising result that could pave the way for further exploration of the HDT concept. The methodology employed Random Forest for clustering the question database and matching new questions, while K-nearest neighbor was utilized for sentiment analysis.
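
A minimal sketch of the sentiment-analysis step using K-nearest neighbours over TF-IDF features, as one plausible reading of the pipeline above; the toy messages, labels and neighbour count are illustrative and not the paper's data or settings.

```python
# Sketch: K-nearest-neighbour sentiment classification of short messages over TF-IDF
# features with scikit-learn. The toy training messages and labels are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline

messages = ["thanks, that was really helpful", "this is so annoying",
            "great, see you tomorrow", "I hate waiting this long"]
labels = ["positive", "negative", "positive", "negative"]

model = make_pipeline(TfidfVectorizer(), KNeighborsClassifier(n_neighbors=3))
model.fit(messages, labels)

print(model.predict(["that sounds great, thank you"]))   # expected: ['positive']
```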

Keywords: Human Digital twin, sentiment analysis, topic extraction, supervised machine learning, unsupervised machine learning, classification and clustering.

53 Relationship-Centred Care in Cross-Linguistic Medical Encounters

Authors: Nami Matsumoto

Abstract:

This study explores the experiences of patients in cross-linguistic medical encounters and their views of receiving language support therein, with a particular focus on Japanese-English cases. The aim of this study is to investigate, from a Japanese perspective, the reason for the frequent use of a spouse as a communication mediator, through a comparison with that of English speakers. This study conducts an empirical qualitative analysis of the accounts of informants. A total of 31 informants who have experienced Japanese-English cross-linguistic medical encounters were recruited in Australia and Japan for semi-structured in-depth interviews: 15 English speakers and 16 Japanese speakers. In order to obtain further insight into the collected data, additional interviews were held with 4 Australian doctors who are familiar with using interpreters. This study was approved by the Australian National University Human Research Ethics Committee, and written consent to participate in this study was obtained from all participants. The interviews lasted up to an hour or longer. They were audio-recorded and subsequently transcribed by the author. Japanese transcriptions were translated into English by the author. An analysis of the interview data found that patients value relationships in communication. In particular, Japanese informants who have an English-speaking spouse value trust-based communication interventions by their spouse, regardless of the spouse's language proficiency. In Australia, health care interpreters are required to abide by the national code of ethics for interpreters. The Code defines the role of an interpreter exclusively as language rendition and enshrines the tenets of accuracy, confidentiality and professional role boundaries. However, the analysis found that an interpreter who strictly complies with the Code sometimes fails to render the real intentions of the patient and their doctor. Findings from the study suggest that an interpreter should not be detached from the context and should be more engaged with the needs of patients. Their needs are not always communicated by an interpreter who simply follows a professional code of ethics. The concept of relationship-centred care should be incorporated into the professional practice of health care interpreters.

Keywords: Health care, Japanese-English medical encounters, language barriers, trust.

52 Pattern Discovery from Student Feedback: Identifying Factors to Improve Student Emotions in Learning

Authors: Angelina A. Tzacheva, Jaishree Ranganathan

Abstract:

Interest in Science, Technology, Engineering and Mathematics (STEM) education, especially Computer Science education, has increased drastically across the country. This fuels efforts towards recruiting and admitting a diverse population of students. The changing conditions in terms of student population, diversity and the expected teaching and learning outcomes thus provide a platform for the use of innovative teaching models and technologies. It is necessary that the methods adopted also concentrate on raising the quality of such innovations and have a positive impact on student learning. Light-Weight Team is an active learning pedagogy which is considered a low-stakes activity and has very little or no direct impact on student grades. Emotion plays a major role in students' motivation to learn. In this work we use student feedback data with emotion classification, collected through surveys at a public research institution in the United States. We use the Actionable Pattern Discovery method for this purpose. Actionable patterns are patterns that provide suggestions in the form of rules to help the user achieve better outcomes. The proposed method provides meaningful insight in terms of changes that can be incorporated in the Light-Weight Team activities and the resources utilized in the course. The results suggest how to enhance student emotions to a more positive state, with a particular focus on the emotions 'Trust' and 'Joy'.

Keywords: Actionable pattern discovery, education, emotion, data mining.

51 Evolutionary Approach for Automated Discovery of Censored Production Rules

Authors: Kamal K. Bharadwaj, Basheer M. Al-Maqaleh

Abstract:

In the recent past, there has been an increasing interest in applying evolutionary methods to Knowledge Discovery in Databases (KDD), and a number of successful applications of Genetic Algorithms (GA) and Genetic Programming (GP) to KDD have been demonstrated. The most predominant representation of the discovered knowledge is the standard Production Rule (PR) in the form If P Then D. PRs, however, are unable to handle exceptions and do not exhibit variable precision. Censored Production Rules (CPRs), an extension of PRs proposed by Michalski and Winston, exhibit variable precision and support an efficient mechanism for handling exceptions. A CPR is an augmented production rule of the form: If P Then D Unless C, where C (the Censor) is an exception to the rule. Such rules are employed in situations in which the conditional statement 'If P Then D' holds frequently and the assertion C holds rarely. By using a rule of this type we are free to ignore the exception conditions when the resources needed to establish their presence are tight or there is simply no information available as to whether they hold or not. Thus, the 'If P Then D' part of the CPR expresses important information, while the Unless C part acts only as a switch that changes the polarity of D to ~D. This paper presents a classification algorithm based on an evolutionary approach that discovers comprehensible rules with exceptions in the form of CPRs. The proposed approach has a flexible chromosome encoding, where each chromosome corresponds to a CPR. Appropriate genetic operators are suggested and a fitness function is proposed that incorporates the basic constraints on CPRs. Experimental results are presented to demonstrate the performance of the proposed algorithm.
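
A minimal sketch of how a censored production rule "If P Then D Unless C" might be scored against a dataset inside a GA fitness function; the rule representation (attribute=value tests) and the coverage/confidence weights are illustrative assumptions, not the authors' encoding.

```python
# Sketch: evaluating a censored production rule "If P Then D Unless C" on a list of record
# dicts. The chromosome here is simply (P, D, C) as attribute=value tests; weights are assumed.
def matches(tests, record):
    return all(record.get(attr) == val for attr, val in tests.items())

def cpr_fitness(P, D, C, data, w_cov=0.3, w_conf=0.7):
    covered = [r for r in data if matches(P, r)]
    if not covered:
        return 0.0
    correct = sum(1 for r in covered
                  if (matches(D, r) and not matches(C, r))      # D holds, censor absent
                  or (not matches(D, r) and matches(C, r)))     # censor flips D to ~D
    coverage = len(covered) / len(data)
    confidence = correct / len(covered)
    return w_cov * coverage + w_conf * confidence

data = [
    {"bird": 1, "flies": 1, "penguin": 0},
    {"bird": 1, "flies": 1, "penguin": 0},
    {"bird": 1, "flies": 0, "penguin": 1},
]
# "If bird Then flies Unless penguin" scores a perfect fitness on this toy data.
print(cpr_fitness({"bird": 1}, {"flies": 1}, {"penguin": 1}, data))  # 1.0
```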

Keywords: Censored Production Rule, Data Mining, Machine Learning, Evolutionary Algorithms.

50 Steady State Rolling and Dynamic Response of a Tire at Low Frequency

Authors: Md Monir Hossain, Anne Staples, Kuya Takami, Tomonari Furukawa

Abstract:

Tire noise has a significant impact on ride quality and vehicle interior comfort, even at low frequency. Reduction of tire noise is especially important due to strict state and federal environmental regulations. The primary sources of tire noise are the low-frequency structure-borne noise and the noise that originates from the release of trapped air between the tire tread and road surface during each revolution of the tire. The frequency response of the tire changes at low and high frequency: at low frequency, tension and bending moment become dominant, while the internal structure and local deformation become dominant at higher frequencies. Here, we analyze tire response in terms of deformation and rolling velocity at low revolution frequency. An Abaqus finite element model is used to calculate the static and dynamic response of a rolling tire under different rolling conditions. The natural frequencies and mode shapes of a deformed tire are calculated with the FEA package, where the subspace-based steady state dynamic analysis calculates the dynamic response of the tire subjected to harmonic excitation. The analysis was conducted on the dynamic response at the road nodes (contact point of tire and road surface) and side nodes of a static and a rolling tire when the tire was excited with a 200 N vertical load over a frequency range of 20 to 200 Hz. The results show that frequency has little effect on tire deformation up to 80 Hz, but between 80 and 200 Hz the radial and lateral components of displacement of the road and side nodes exhibit significant oscillation. For the static analysis, the fluctuation was sharp and frequent and decreased with frequency. In contrast, the fluctuation was periodic in nature for the dynamic response of the rolling tire. In addition to the dynamic analysis, a steady state rolling analysis was also performed on the tire traveling at ground velocity with a constant angular motion. The purpose of the computation was to demonstrate the effect of rotating motion on deformation and rolling velocity with respect to a fixed Newtonian reference point. The analysis showed a significant variation in deformation and rolling velocity due to centrifugal and Coriolis acceleration with respect to a fixed Newtonian point on the ground.

Keywords: Natural frequency, rotational motion, steady state rolling, subspace-based steady state dynamic analysis.

49 An Automated Approach to the Nozzle Configuration of Polycrystalline Diamond Compact Drill Bits for Effective Cuttings Removal

Authors: R. Suresh, Pavan Kumar Nimmagadda, Ming Zo Tan, Shane Hart, Sharp Ugwuocha

Abstract:

Polycrystalline diamond compact (PDC) drill bits are extensively used in the oil and gas industry as well as the mining industry. Industry engineers continually improve upon PDC drill bit designs and hydraulic conditions. Optimized injection nozzles play a key role in improving the drilling performance and efficiency of these ever-changing PDC drill bits. In the first part of this study, computational fluid dynamics (CFD) modelling is performed to investigate the hydrodynamic characteristics of drilling fluid flow around the PDC drill bit. The open-source CFD software OpenFOAM simulates the flow around the drill bit, based on the field input data. A specifically developed console application integrates the entire CFD process, including domain extraction, meshing, solving the governing equations and post-processing. The results from the OpenFOAM solver are then compared with those of the ANSYS Fluent software, and the data from both software programs agree. The second part of the paper describes the parametric study of the PDC drill bit nozzle to determine the effect of parameters such as the number of nozzles, nozzle velocity, nozzle radial position and orientation on the flow field characteristics and bit washing patterns. After analyzing a series of nozzle configurations, the best configuration is identified and recommendations are made for modifying the PDC bit design.

Keywords: ANSYS Fluent, computational fluid dynamics, nozzle configuration, OpenFOAM, PDC drill bit.

48 A Growing Neural Gas Approach for Evaluating Quality of Software Modules

Authors: Parvinder S. Sandhu, Sandeep Khimta, Kiranpreet Kaur

Abstract:

The prediction of software quality during the development life cycle of a software project helps the development organization to make efficient use of available resources and to produce a product of the highest quality. A 'whether a module is faulty or not' approach can be used to predict the quality of a software module. There are a number of software quality prediction models described in the literature based upon genetic algorithms, artificial neural networks and other data mining algorithms. One of the promising approaches for quality prediction is based on clustering techniques. Most quality prediction models that are based on clustering techniques make use of K-Means, Mixture-of-Gaussians, Self-Organizing Map, Neural Gas or fuzzy K-Means algorithms for prediction. All of these techniques require a predefined structure, that is, the number of neurons or clusters must be known before the clustering process starts. In the case of Growing Neural Gas, however, there is no need to predetermine the number of neurons or the topology of the structure: it starts with a minimal neuron structure that is incremented during training until it reaches a user-defined maximum number of clusters. Hence, in this work we use Growing Neural Gas as the underlying clustering algorithm, which produces an initial set of labeled clusters from the training data set; this set of clusters is then used to predict the quality of the software modules in the test data set. The best testing results show 80% accuracy in evaluating the quality of software modules. Hence, the proposed technique can be used by programmers in evaluating the quality of modules during software development.
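
A minimal sketch of the "label the clusters, then predict from the nearest cluster" step described above. Since scikit-learn ships no Growing Neural Gas implementation, MiniBatchKMeans stands in for the clusterer here purely to illustrate the labeling and prediction logic; the module metrics and labels are illustrative.

```python
# Sketch: cluster training modules, label each cluster as faulty/not-faulty by majority vote,
# then predict a test module's quality from its nearest cluster centre. MiniBatchKMeans is a
# stand-in for Growing Neural Gas, used only to illustrate the labeling/prediction step.
import numpy as np
from sklearn.cluster import MiniBatchKMeans

def fit_labeled_clusters(X_train, y_train, n_clusters=5):
    km = MiniBatchKMeans(n_clusters=n_clusters, random_state=0).fit(X_train)
    cluster_label = {}
    for c in range(n_clusters):
        members = y_train[km.labels_ == c]
        cluster_label[c] = int(round(members.mean())) if len(members) else 0  # majority vote
    return km, cluster_label

def predict(km, cluster_label, X_test):
    return np.array([cluster_label[c] for c in km.predict(X_test)])

# Illustrative module metrics (e.g. size, complexity) and fault labels.
X = np.array([[10, 2], [12, 3], [200, 40], [180, 35], [15, 2], [190, 38]], dtype=float)
y = np.array([0, 0, 1, 1, 0, 1])
km, labels = fit_labeled_clusters(X, y, n_clusters=2)
print(predict(km, labels, np.array([[11.0, 2.0], [185.0, 36.0]])))  # expected: [0 1]
```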

Keywords: Growing Neural Gas, data clustering, fault prediction.

47 Effective Stacking of Deep Neural Models for Automated Object Recognition in Retail Stores

Authors: Ankit Sinha, Soham Banerjee, Pratik Chattopadhyay

Abstract:

Automated product recognition in retail stores is an important real-world application in the domain of Computer Vision and Pattern Recognition. In this paper, we consider the problem of automatically identifying the classes of the products placed on racks in retail stores from an image of the rack and information about the query/product images. We improve upon the existing approaches in terms of effectiveness and memory requirement by developing a two-stage object detection and recognition pipeline comprising a Faster-RCNN-based object localizer that detects the object regions in the rack image and a ResNet-18-based image encoder that classifies the detected regions into the appropriate classes. Each of the models is fine-tuned using appropriate data sets for better prediction, and data augmentation is performed on each query image to prepare an extensive gallery set for fine-tuning the ResNet-18-based product recognition model. This encoder is trained using a triplet loss function following the strategy of online hard-negative mining for improved prediction. The proposed models are lightweight and can be connected in an end-to-end manner during deployment to automatically identify each product object placed in a rack image. Extensive experiments using the Grozi-32k and GP-180 data sets verify the effectiveness of the proposed model.
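
A minimal sketch of training an image encoder with a triplet loss in PyTorch, as described for the ResNet-18 branch above; the embedding size, margin, learning rate and the random tensors standing in for the anchor/positive/negative batches are illustrative assumptions.

```python
# Sketch: fine-tuning a ResNet-18 image encoder with a triplet margin loss in PyTorch.
# The anchor/positive/negative batches are assumed to come from a hard-negative-mining sampler.
import torch
import torch.nn as nn
from torchvision.models import resnet18

encoder = resnet18()                                    # randomly initialized for the sketch
encoder.fc = nn.Linear(encoder.fc.in_features, 128)     # 128-d embedding head (assumed size)

criterion = nn.TripletMarginLoss(margin=0.2)
optimizer = torch.optim.Adam(encoder.parameters(), lr=1e-4)

def train_step(anchor, positive, negative):
    """anchor/positive/negative: image batches of shape (B, 3, 224, 224)."""
    optimizer.zero_grad()
    loss = criterion(encoder(anchor), encoder(positive), encoder(negative))
    loss.backward()
    optimizer.step()
    return loss.item()

# Smoke test with random tensors standing in for cropped product regions.
b = torch.randn(4, 3, 224, 224)
print(train_step(b, torch.randn_like(b), torch.randn_like(b)))
```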

Keywords: Retail stores, Faster-RCNN, object localization, ResNet-18, triplet loss, data augmentation, product recognition.

46 SUPAR: System for User-Centric Profiling of Association Rules in Streaming Data

Authors: Sarabjeet Kaur Kochhar

Abstract:

With the surge of stream processing applications, novel techniques are required for the generation and analysis of association rules in streams. Traditional rule mining solutions cannot handle streams because they generally require multiple passes over the data and do not guarantee results within a predictable, short time. Though researchers have been proposing algorithms for the generation of rules from streams, there has not been much focus on their analysis. We propose association rule profiling, a user-centric process for analyzing association rules and attaching suitable profiles to them depending on their changing frequency behavior over a previous snapshot of time in a data stream. Association rule profiles provide insights into the changing nature of associations and can be used to characterize the associations. We discuss the importance of characteristics such as the predictability of linkages present in the data and propose a metric to quantify it. We also show how association rule profiles can aid in the generation of user-specific, more understandable and actionable rules. The framework is implemented as SUPAR: System for User-centric Profiling of Association Rules in streaming data. The proposed system offers the following capabilities: i) continuous monitoring of the frequency of streaming item-sets and detection of significant changes therein for association rule profiling; ii) computation of metrics for quantifying the predictability of associations present in the data; iii) user-centric control of the characterization process: the user can control the framework through a) constraint specification and b) non-interesting rule elimination.
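
A minimal sketch of the frequency-monitoring capability (i): counting a watched item-set's support over a sliding window of transactions and flagging a significant change against the previous snapshot. The window size, change threshold and profile names are illustrative assumptions, not SUPAR's actual parameters.

```python
# Sketch: track the support of a watched item-set over a sliding window of transactions and
# flag significant frequency changes versus the previous snapshot. Thresholds are assumptions.
from collections import deque

class ItemsetMonitor:
    def __init__(self, itemset, window=100, change_threshold=0.2):
        self.itemset = frozenset(itemset)
        self.window = deque(maxlen=window)     # 1 if the item-set occurred in the transaction
        self.prev_support = None
        self.change_threshold = change_threshold

    def add(self, transaction):
        self.window.append(1 if self.itemset <= set(transaction) else 0)

    def support(self):
        return sum(self.window) / len(self.window) if self.window else 0.0

    def snapshot(self):
        """Compare current support with the previous snapshot; return a profile string."""
        cur, prev = self.support(), self.prev_support
        self.prev_support = cur
        if prev is None:
            return "new"
        if abs(cur - prev) >= self.change_threshold:
            return "rising" if cur > prev else "falling"
        return "stable"

m = ItemsetMonitor({"bread", "milk"}, window=4, change_threshold=0.25)
for t in [{"bread", "milk"}, {"milk"}, {"bread", "milk", "eggs"}, {"eggs"}]:
    m.add(t)
print(m.support(), m.snapshot())   # 0.5 new
```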

Keywords: Data Streams, User subjectivity, Change detection, Association rule profiles, Predictability.

45 Upgraded Rough Clustering and Outlier Detection Method on Yeast Dataset by Entropy Rough K-Means Method

Authors: P. Ashok, G. M. Kadhar Nawaz

Abstract:

Rough set theory is used to handle uncertainty and incomplete information by applying two approximation sets, the lower approximation and the upper approximation. In this paper, rough clustering algorithms are improved by adopting similarity-, dissimilarity-similarity- and entropy-based initial centroid selection methods: three clustering algorithms, namely Entropy-based Rough K-Means (ERKM), Similarity-based Rough K-Means (SRKM) and Dissimilarity-Similarity-based Rough K-Means (DSRKM), were developed and executed on the yeast dataset. The rough clustering algorithms are validated by cluster validity indexes, namely the Rand and Adjusted Rand indexes. The experimental results show that the ERKM clustering algorithm performs effectively and delivers better results than the other clustering methods. Outlier detection is an important task in data mining; outliers are objects that are very different from the rest of the objects in the clusters. The Entropy-based Rough Outlier Factor (EROF) method appears to detect outliers effectively for the yeast dataset. In the rough K-Means method, tuning the epsilon (ε) value from 0.8 to 1.08 detects outliers in the boundary region, and the RKM algorithm delivers better results when the value of epsilon (ε) is chosen within this range. The experimental results show that the EROF method performed very well with the clustering algorithms and is suitable for detecting outliers effectively for all datasets. Further, the experimental readings show that the ERKM clustering method outperformed the other methods.
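
A minimal sketch of a rough K-Means assignment step, following the common Lingras-style rule: a point whose second-nearest centroid is almost as close as its nearest one (distance ratio within ε) is placed in the upper approximations of both clusters, otherwise in the lower approximation of the nearest cluster. The ε value, data and centroids below are illustrative, not the paper's settings.

```python
# Sketch: rough K-Means assignment. Ambiguous points go to the boundary region (upper
# approximations of both nearest clusters); clear-cut points go to the lower approximation.
import numpy as np

def rough_assign(X, centroids, eps=1.08):
    lower = [[] for _ in centroids]
    upper = [[] for _ in centroids]
    for i, x in enumerate(X):
        d = np.linalg.norm(centroids - x, axis=1)
        order = np.argsort(d)
        nearest, second = order[0], order[1]
        if d[second] <= eps * d[nearest]:          # ambiguous point -> boundary region
            upper[nearest].append(i)
            upper[second].append(i)
        else:                                      # clear-cut point -> lower approximation
            lower[nearest].append(i)
            upper[nearest].append(i)               # lower approximation lies inside the upper
    return lower, upper

X = np.array([[0.0, 0.0], [0.1, 0.0], [1.0, 1.0], [0.5, 0.5]])
centroids = np.array([[0.0, 0.0], [1.0, 1.0]])
# The point [0.5, 0.5] falls in the boundary region of both clusters.
print(rough_assign(X, centroids))
```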

Keywords: Clustering, Entropy, Outlier, Rough K-Means, validity index.

44 Comparative Evaluation of Accuracy of Selected Machine Learning Classification Techniques for Diagnosis of Cancer: A Data Mining Approach

Authors: Rajvir Kaur, Jeewani Anupama Ginige

Abstract:

With recent trends in Big Data and advancements in Information and Communication Technologies, the healthcare industry is in transition from clinician-oriented to technology-oriented care. Many people around the world die of cancer because the disease was not diagnosed at an early stage. Nowadays, computational methods in the form of Machine Learning (ML) are used to develop automated decision support systems that can diagnose cancer with high confidence in a timely manner. This paper carries out a comparative evaluation of a selected set of ML classifiers on two existing datasets: breast cancer and cervical cancer. The ML classifiers compared in this study are Decision Tree (DT), Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression, Ensemble (Bagged Tree) and Artificial Neural Networks (ANN). The evaluation is carried out based on the standard evaluation metrics Precision (P), Recall (R), F1-score and Accuracy. The experimental results based on these evaluation metrics show that ANN achieved the highest accuracy (99.4%) when tested on the breast cancer dataset. On the other hand, when these ML classifiers are tested on the cervical cancer dataset, the Ensemble (Bagged Tree) technique gives better accuracy (93.1%) in comparison to the other classifiers.
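
A minimal sketch of the metric-based comparison, using scikit-learn stand-ins for some of the classifiers; the built-in breast-cancer dataset, train/test split and classifier settings are illustrative assumptions, not the paper's exact setup.

```python
# Sketch: compare several classifiers with precision, recall, F1-score and accuracy on a
# held-out split. sklearn's built-in breast-cancer data stands in for the paper's datasets.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_score, recall_score, f1_score, accuracy_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

classifiers = {
    "DT": DecisionTreeClassifier(random_state=0),
    "SVM": SVC(),
    "k-NN": KNeighborsClassifier(),
    "LogReg": LogisticRegression(max_iter=5000),
}
for name, clf in classifiers.items():
    y_pred = clf.fit(X_tr, y_tr).predict(X_te)
    print(f"{name}: P={precision_score(y_te, y_pred):.3f} "
          f"R={recall_score(y_te, y_pred):.3f} F1={f1_score(y_te, y_pred):.3f} "
          f"Acc={accuracy_score(y_te, y_pred):.3f}")
```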

Keywords: Artificial neural networks, breast cancer, cancer dataset, classifiers, cervical cancer, F-score, logistic regression, machine learning, precision, recall, support vector machine.

43 Using Data Mining in Automotive Safety

Authors: Carine Cridelich, Pablo Juesas Cano, Emmanuel Ramasso, Noureddine Zerhouni, Bernd Weiler

Abstract:

Safety is one of the most important considerations when buying a new car. While active safety aims at avoiding accidents, passive safety systems such as airbags and seat belts protect the occupant in case of an accident. In addition to legal regulations, organizations like Euro NCAP provide consumers with an independent assessment of the safety performance of cars and drive the development of safety systems in the automobile industry. Those ratings are mainly based on injury assessment reference values derived from physical parameters measured in dummies during a car crash test. The components and sub-systems of a safety system are designed to achieve the required restraint performance. Sled tests and other types of tests are then carried out by car makers and their suppliers to confirm the protection level of the safety system. A Knowledge Discovery in Databases (KDD) process is proposed in order to minimize the number of tests. The KDD process is based on the data emerging from sled tests according to Euro NCAP specifications. About 30 parameters of the passive safety systems from different data sources (crash data, dummy protocol) are first analysed together with experts' opinions. A procedure is proposed to manage missing data and validated on real data sets. Finally, a procedure is developed to estimate a set of rough initial parameters of the passive system before testing, aiming at reducing the number of tests.

Keywords: KDD process, passive safety systems, sled test, dummy injury assessment reference values, frontal impact

42 Analyzing Factors Impacting COVID-19 Vaccination Rates

Authors: Dongseok Cho, Mitchell Driedger, Sera Han, Noman Khan, Mohammed Elmorsy, Mohamad El-Hajj

Abstract:

Since the approval of the COVID-19 vaccine in late 2020, vaccination rates have varied around the globe. Access to a vaccine supply, mandated vaccination policy, and vaccine hesitancy contribute to these rates. This study used COVID-19 vaccination data from Our World in Data and the Multilateral Leaders Task Force on COVID-19 to create two COVID-19 vaccination indices. The first index is the Vaccine Utilization Index (VUI), which measures how effectively each country has utilized its vaccine supply to doubly vaccinate its population. The second index is the Vaccination Acceleration Index (VAI), which evaluates how efficiently each country vaccinated its population within its first 150 days. Pearson correlations were computed between these indices and country indicators obtained from the World Bank. The results of these correlations show that countries with stronger health indicators, such as lower mortality rates, lower age-dependency ratios, and higher rates of immunization against other diseases, display higher VUI and VAI scores than countries with weaker indicators. VAI scores are also positively correlated with governance and economic indicators, such as regulatory quality, control of corruption, and GDP per capita. As represented by the VUI, proper utilization of the COVID-19 vaccine supply is observed in countries that display excellence in health practices. A country's motivation to accelerate its vaccination rates within the first 150 days of vaccinating, as represented by the VAI, was largely a product of the governing body's effectiveness and economic status, as well as overall excellence in health practices.
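
A minimal sketch of correlating the two vaccination indices with country indicators, assuming they have been merged into one table; the file name and column names are illustrative assumptions about the layout, not the study's actual data schema.

```python
# Sketch: Pearson correlation between the vaccination indices (VUI, VAI) and country
# indicators, assuming they sit together in one merged table. Column names are assumptions.
import pandas as pd
from scipy.stats import pearsonr

df = pd.read_csv("countries.csv")   # hypothetical merged table of indices and indicators

indicators = ["mortality_rate", "age_dependency_ratio", "gdp_per_capita", "control_of_corruption"]
for index_col in ["VUI", "VAI"]:
    for ind in indicators:
        sub = df[[index_col, ind]].dropna()
        r, p = pearsonr(sub[index_col], sub[ind])
        print(f"{index_col} vs {ind}: r={r:.2f} (p={p:.3f})")
```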

Keywords: Data mining, Pearson Correlation, COVID-19, vaccination rates, hesitancy.

41 Comparative Study Using Weka for Red Blood Cells Classification

Authors: Jameela Ali Alkrimi, Hamid A. Jalab, Loay E. George, Abdul Rahim Ahmad, Azizah Suliman, Karim Al-Jashamy

Abstract:

Red blood cells (RBCs) are the most common type of blood cells and are the most intensively studied in cell biology. A lack of RBCs is a condition in which the hemoglobin level is lower than normal and is referred to as "anemia". Abnormalities in RBCs will affect the exchange of oxygen. This paper presents a comparative study of various techniques for classifying RBCs as normal or abnormal (anemic) using WEKA. WEKA is an open-source tool consisting of different machine learning algorithms for data mining applications. The algorithms tested are the Radial Basis Function neural network, the Support Vector Machine, and the K-Nearest Neighbors algorithm. Two sets of combined features were utilized for the classification of blood cell images. The first set, consisting exclusively of geometrical features, was used to identify whether the tested blood cell is spherical or non-spherical. The second set, consisting mainly of textural features, was used to recognize the types of the spherical cells. We provide an evaluation based on applying these classification methods to our RBC image dataset, which was obtained from Serdang Hospital, Malaysia, and measuring the accuracy of the test results. The best achieved classification rates are 97%, 98%, and 79% for Support Vector Machines, the Radial Basis Function neural network, and the K-Nearest Neighbors algorithm, respectively.

Keywords: K-Nearest Neighbors, Neural Network, Radial Basis Function, Red blood cells, Support vector machine.

40 An Architectural Study on the Railway Station Buildings in Malaysia during British Era, 1885-1957

Authors: Nor Hafizah Anuar, M. Gul Akdeniz

Abstract:

This paper focuses on the façade elements of railway station buildings. Station buildings were an essential part of the transportation system and reflected the technology of their time. A comparative analysis of architectural styles is also made between the railway station buildings of Malaysia and railway station buildings elsewhere that share similarities. The Malay Peninsula, strategically situated between the Straits of Malacca and the South China Sea, is an ideal location for trade. Malacca became an important trading port where merchants from around the world stopped over to exchange various products. The Portuguese ruled Malacca for 130 years (1511–1641), and for the next century and a half (1641–1824) the Dutch endeavoured to maintain an economic monopoly along the coasts of Malaya. Malacca came permanently under British rule under the Anglo-Dutch Treaty of 1824. Up to Malaysian independence in 1957, Malaya saw a great influx of Chinese and Indian migrants as workers to support its growing industrial needs facilitated by the British. The growing tin mining and rubber industries drove the development of the railways, given the urgency of transporting these commodities from one place to another. Railway transportation became more significant as cities started to bloom and the British started to build grand buildings with different functions: administrative buildings, town and city halls, railway stations, public works departments, courts, and post offices.

Keywords: Malaysia, railway station, architectural design, façade elements.

39 Co-Disposal of Coal Ash with Mine Tailings in Surface Paste Disposal Practices: A Gold Mining Case Study

Authors: M. L. Dinis, M. C. Vila, A. Fiúza, A. Futuro, C. Nunes

Abstract:

The present paper describes a study of paste tailings prepared in the laboratory using gold tailings, produced in a Finnish gold mine, with the incorporation of coal ash. Natural leaching tests were conducted with the original materials (tailings, fly and bottom ashes) and also with paste mixtures that were prepared with different percentages of tailings and ashes. After leaching, the solid wastes were physically and chemically characterized and the results were compared to those selected as blank (the unleached samples). The tailings and the coal ash, as well as the prepared mixtures, were characterized, in addition to the textural parameters, by the following measurements: grain size distribution, chemical composition and pH. Mixtures were also tested in order to characterize their mechanical behavior by measuring the flexural strength, the compressive strength and the consistency. The original tailings samples presented an alkaline pH because, during their processing, they had previously been submitted to pressure oxidation with destruction of the sulfides. Therefore, it was not possible to ascertain the effect of the coal ashes on acid mine drainage. However, it was possible to verify that the paste reactivity was affected mostly by the bottom ash and that the tailings blended with bottom ash present lower mechanical strength than when blended with a combination of fly and bottom ash. Surface paste disposal offers an attractive alternative to traditional methods, in addition to the environmental benefits of incorporating large-volume wastes (e.g. bottom ash). However, a comprehensive characterization of the paste mixtures is crucial to optimize paste design in order to enhance engineering and environmental properties.

Keywords: Coal ash, gold tailings, paste, surface disposal.

38 The Quality of Fishery Product on the Moldovan Market, Regulations, National Institutions, Controls and Non-Compliant Products

Authors: Mihaela Munteanu (Pila), Silvius Stanciu

Abstract:

This paper presents aspects of the official control of fishery products in the Republic of Moldova. Currently, the regulations and the activity of the national institutions with responsibilities in the field of food quality are in a process of harmonization with European rules, aiming at European integration, quality improvement and a higher level of food safety. The National Agency for Food Safety is the main national body with responsibilities in the field of food safety. In the field of fishery products, the Agency carries out an intensive activity of informing citizens and controlling the products marketed. The paper presents the dangers related to the consumption of fish and fishery products traded on the national market, the sanitary-veterinary inspections conducted by the responsible institution and the non-compliant situations identified. The national market for fishery products depends largely on imports, mainly of ocean fish. The research has shown that during the period 2011-2018, following inspections of fishery products traded on the national market, a number of non-compliances were identified. Indigenous products were frequently found with sensory characteristics unfit for consumption, commercialized in inappropriate locations, or contaminated with chemical pollutants. Among the imported products controlled, the most frequent non-compliances were unsuitable sensory characteristics and parasite contamination. Taking into account the specific aspects of aquatic products, including their high perishability, special conditions of growth, marketing, culinary preparation and consumption are necessary in order to decrease the risk of disease in the population. Certificates, attestations and other documents certifying the quality of batches, complemented by additional laboratory examinations, are necessary in order to increase the level of confidence in the quality of products marketed in the Republic. The implementation of various control procedures and mechanisms at the national level, correlated with the focused activity of the specialized institutions, can decrease the risk of contamination and avoid cases of disease in the population due to the consumption of fishery products.

Keywords: Fishery products, food safety, insurance, inspection, Republic of Moldova.

37 Discovery of Quantified Hierarchical Production Rules from Large Set of Discovered Rules

Authors: Tamanna Siddiqui, M. Afshar Alam

Abstract:

Automated rule discovery is, due to its applicability, one of the most fundamental and important methods in KDD and has been an active research area in the recent past. Hierarchical representation allows us to easily manage the complexity of knowledge, to view the knowledge at different levels of detail, and to focus our attention on the interesting aspects only. One such efficient and easy-to-understand system is the Hierarchical Production Rule (HPR) system. A HPR, a standard production rule augmented with generality and specificity information, is of the following form: Decision If <condition> Generality <general information> Specificity <specific information>. HPR systems are capable of handling the taxonomical structures inherent in knowledge about the real world. This paper focuses on the issue of mining quantified rules with a crisp hierarchical structure using a Genetic Programming (GP) approach to knowledge discovery. The post-processing scheme presented in this work uses quantified production rules as the initial individuals of GP and discovers the hierarchical structure. In the proposed approach, rules are quantified using Dempster-Shafer theory. Suitable genetic operators are proposed for the suggested encoding. Based on the Subsumption Matrix (SM), an appropriate fitness function is suggested. Finally, Quantified Hierarchical Production Rules (HPRs) are generated from the discovered hierarchy, using Dempster-Shafer theory. Experimental results are presented to demonstrate the performance of the proposed algorithm.
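
A minimal sketch of Dempster's rule of combination, the Dempster-Shafer step used above to attach quantified belief to rules; the frame of discernment and the mass values below are illustrative, not taken from the paper.

```python
# Sketch: Dempster's rule of combination for two basic mass assignments over the same frame
# of discernment, as one way to quantify belief in a rule's decision D. Masses are illustrative.
def combine(m1, m2):
    combined, conflict = {}, 0.0
    for a, ma in m1.items():
        for b, mb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + ma * mb
            else:
                conflict += ma * mb
    if conflict >= 1.0:
        raise ValueError("total conflict: sources cannot be combined")
    return {s: v / (1.0 - conflict) for s, v in combined.items()}

D, notD = frozenset({"D"}), frozenset({"~D"})
theta = D | notD                              # the whole frame (ignorance)
m1 = {D: 0.6, theta: 0.4}                     # evidence from one discovered rule
m2 = {D: 0.5, notD: 0.2, theta: 0.3}          # evidence from another source
print(combine(m1, m2))                        # combined, normalized masses
```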

Keywords: Knowledge discovery in databases, quantification, Dempster-Shafer theory, genetic programming, hierarchy, subsumption matrix.

36 Management of Meskit (Prosopis juliflora) Tree in Oman: The Case of Using Meskit (Prosopis juliflora) Pods for Feeding Omani Sheep

Authors: S. Al-Khalasi, O. Mahgoub, H. Yaakub

Abstract:

This study evaluated the use of raw or processed Prosopis juliflora (Meskit) pods as a major ingredient in a formulated ration to provide an alternative non-conventional concentrate for livestock feeding in Oman. Dry Meskit pods were chopped to lengths of 0.5-1.0 cm to ensure thorough mixing into three diets. Meskit pods were subjected to two types of treatment: roasting and soaking. Pods were roasted at 150 °C for 30 minutes using a locally made roasting device (a 40 kg barrel container rotated by an electric motor and heated by a flame gas cooker). Chopped pods were soaked in tap water for 24 hours and dried for 2 days under the sun with frequent turning. The Meskit-pod-based diets (MPBD) were formulated and pelleted from 500 g/kg ground Meskit pods, 240 g/kg wheat bran, 200 g/kg barley grain, 50 g/kg local dried sardines and 10 g/kg of salt. Twenty-four 10-month-old intact Omani male lambs with an average body weight of 27.3 kg (± 0.5 kg) were used in a feeding trial for 84 days. They were divided (on a body weight basis) and allocated to four diet combination groups: Rhodes grass hay (RGH) plus a general ruminant concentrate (GRC); RGH plus a raw Meskit pod (RMP) based concentrate; RGH plus a roasted Meskit pod (ROMP) based concentrate; and RGH plus a soaked Meskit pod (SMP) based concentrate. Daily feed intakes and bi-weekly body weights were recorded. MPBD had higher contents of crude protein (CP), acid detergent fibre (ADF) and neutral detergent fibre (NDF) than the GRC. Animals fed the various types of MPBD did not show signs of ill health. There was a significant effect of feeding ROMP on the performance of Omani sheep compared to RMP and SMP. The ROMP-fed animals had similar performance to those fed the GRC in terms of feed intake, body weight gain and feed conversion ratio (FCR). This study indicated that a roasted-Meskit-pod-based diet may be used instead of the commercial concentrate for feeding Omani sheep without adverse effects on performance. It offers a cheap alternative source of protein and energy for feeding Omani sheep. Also, it might help in controlling the spread of Meskit trees, maintaining the ecosystem and preserving the local tree species.

Keywords: Growth, Meskit, Omani sheep, Prosopis juliflora.

35 Rare Earth Elements in Soils of Jharia Coal Field

Authors: R. E. Masto, L. C. Ram, S. K. Verma, V. A. Selvi, J. George, R. C. Tripathi, N. K. Srivastava, D. Mohanty, S. K.Jha, A. K. Sinha, A. Sinha

Abstract:

There are many sources through which soil becomes enriched in and contaminated with rare earth elements (REEs). The determination of REEs in environmental samples has been limited because of the lack of sensitive analytical techniques. Soil samples were collected from four sites, including an open cast coal mine, a natural coal burning site, a coal washery and a control, in the coal field located in Dhanbad, India. Total concentrations of REEs were determined using inductively coupled plasma atomic absorption spectrometry in order to assess their enrichment status in the coal field. Results showed that the mean concentrations of La, Pr, Eu, Tb, Ho, and Tm in the open cast mine and natural coal burning sites were elevated compared to the reference concentrations, while Ce, Nd, Sm, and Gd were elevated in the coal washery site. When compared to the reference soil, heavy REEs (HREEs) were enriched in the open cast mine and natural coal burning affected soils; however, the HREEs were depleted in the coal washery site. The chondrite-normalization diagram showed significant enrichment of light REEs (LREEs) in all the soils. The high concentrations of Pr, Eu, Tb, Ho, Tm, and Lu in the coal mining and coal burning sites may pose human health risks. Factor analysis showed that the distribution and relative abundance of REEs at the coal washery site are comparable with the control. Eventually, washing or cleaning of coal could significantly decrease the emission of REEs from coal into the environment.

Keywords: Rare earth elements, coal, soil, factor analysis

34 Exploring Influence Range of Tainan City Using Electronic Toll Collection Big Data

Authors: Chen Chou, Feng-Tyan Lin

Abstract:

Big Data has attracted a lot of attention in many fields for analyzing research issues based on large volumes of source data. Electronic Toll Collection (ETC) is one of the Intelligent Transportation System (ITS) applications in Taiwan, used to record the starting point, end point, distance and travel time of vehicles on the national freeway. This study, taking advantage of ETC big data combined with urban planning theory, attempts to explore various phenomena of inter-city transportation activities. ETC data, as one of the government's open data sources, are voluminous, complete and frequently updated. One may recall that living areas have been delimited by location, population, area and subjective consciousness. However, these factors cannot appropriately reflect people's movement paths in daily life. In this study, the concept of "Living Area" is replaced by "Influence Range" to show the dynamics and variation with the time and purpose of activities. This study uses data mining with Python and Excel, and visualizes the number of trips with GIS, to explore the influence range of Tainan City and the purposes of trips, and to discuss the living area as currently delimited. It creates a dialogue between the concepts of "Central Place Theory" and "Living Area", presents a new point of view, and integrates the application of big data, urban planning and transportation. The findings will be valuable for resource allocation and land apportionment in spatial planning.
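
A minimal sketch of the trip-counting step behind the influence-range analysis, assuming the ETC records have been aggregated into rows with origin and destination city columns; the file name and column names are illustrative assumptions about the layout, not the actual ETC schema.

```python
# Sketch: count trips entering/leaving Tainan from ETC origin-destination records with pandas.
# The file name and column names are assumptions about the record layout.
import pandas as pd

etc = pd.read_csv("etc_trips.csv")   # columns assumed: origin_city, destination_city, travel_time

inbound  = etc[etc["destination_city"] == "Tainan"].groupby("origin_city").size()
outbound = etc[etc["origin_city"] == "Tainan"].groupby("destination_city").size()

# Cities exchanging many trips with Tainan approximate its influence range.
influence = inbound.add(outbound, fill_value=0).sort_values(ascending=False)
print(influence.head(10))
```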

Keywords: Big Data, ITS, influence range, living area, central place theory, visualization.

33 Privacy Concerns and Law Enforcement Data Collection to Tackle Domestic and Sexual Violence

Authors: Francesca Radice

Abstract:

It has been observed that violent or coercive behaviour can be apparent from initial conversations on dating apps like Tinder. Child pornography, stalking, and coercive control are some of the criminal offences arising from dating apps, including cases of women murdered after finding partners through Tinder. Police databases and predictive policing are novel approaches taken to prevent crime before harm is done. This research will investigate how police databases can be used in a privacy-preserving way to characterise users in terms of their potential for violent crime. Using the COPS database of the NSW Police, we will explore how a past criminal record can be interpreted to yield a category of potential danger for each dating app user. It is up to the judgement of each subscriber what degree of potential danger they are prepared to accept. Sentiment analysis is an area where research into natural language processing has made great progress over the last decade. This research will investigate how sentiment analysis can be used to interpret exchanges between dating app users to detect manipulative or coercive sentiments. These can be used to alert law enforcement if they continue for a defined number of communications. One of the potential problems of this approach is the prejudice a categorisation can cause. Another drawback is the possibility of misinterpreting communications and involving law enforcement without reason. The approach will be thoroughly tested with cross-checks by human readers who verify both the level of danger predicted by the interpretation of the criminal record and the sentiment detected from personal messages. Even if only a few violent crimes can be prevented, the approach will have a tangible value for real people.

Keywords: Sentiment Analysis, data mining, predictive policing, virtual manipulation.

32 Evaluation of Ensemble Classifiers for Intrusion Detection

Authors: M. Govindarajan

Abstract:

One of the major developments in machine learning in the past decade is the ensemble method, which builds a highly accurate classifier by combining many moderately accurate component classifiers. In this research work, new ensemble classification methods are proposed: a homogeneous ensemble classifier using bagging and a heterogeneous ensemble classifier using arcing, and their performances are analyzed in terms of accuracy. A classifier ensemble is designed using the Radial Basis Function (RBF) and Support Vector Machine (SVM) as base classifiers. The feasibility and the benefits of the proposed approaches are demonstrated by means of standard intrusion detection datasets. The main originality of the proposed approach is based on three main parts: a preprocessing phase, a classification phase, and a combining phase. A wide range of comparative experiments is conducted on standard intrusion detection datasets. The performance of the proposed homogeneous and heterogeneous ensemble classifiers is compared to the performance of other standard homogeneous and heterogeneous ensemble methods: the standard homogeneous ensemble methods include error-correcting output codes and Dagging, and the heterogeneous ensemble methods include majority voting and stacking. The proposed ensemble methods provide a significant improvement in accuracy compared to individual classifiers; the proposed bagged RBF and SVM perform significantly better than ECOC and Dagging, and the proposed hybrid RBF-SVM performs significantly better than voting and stacking. Also, the heterogeneous models exhibit better results than the homogeneous models on standard intrusion detection datasets.
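
A minimal sketch of the homogeneous bagged ensemble described above, with an RBF-kernel SVM as the base classifier; a synthetic dataset stands in for the intrusion-detection data, and the number of estimators and cross-validation folds are illustrative assumptions.

```python
# Sketch: a homogeneous bagging ensemble with an RBF-kernel SVM base classifier, compared to a
# single SVM, using scikit-learn. A synthetic dataset stands in for the intrusion-detection data.
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

single = SVC(kernel="rbf")
bagged = BaggingClassifier(SVC(kernel="rbf"), n_estimators=10, random_state=0)

print("single SVM :", cross_val_score(single, X, y, cv=5).mean())
print("bagged SVM :", cross_val_score(bagged, X, y, cv=5).mean())
```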

Keywords: Data mining, ensemble, radial basis function, support vector machine, accuracy.
