Search results for: sequence mining
68 Weighted-Distance Sliding Windows and Cooccurrence Graphs for Supporting Entity-Relationship Discovery in Unstructured Text
Authors: Paolo Fantozzi, Luigi Laura, Umberto Nanni
Abstract:
The problem of Entity relation discovery in structured data, a well covered topic in literature, consists in searching within unstructured sources (typically, text) in order to find connections among entities. These can be a whole dictionary, or a specific collection of named items. In many cases machine learning and/or text mining techniques are used for this goal. These approaches might be unfeasible in computationally challenging problems, such as processing massive data streams. A faster approach consists in collecting the cooccurrences of any two words (entities) in order to create a graph of relations - a cooccurrence graph. Indeed each cooccurrence highlights some grade of semantic correlation between the words because it is more common to have related words close each other than having them in the opposite sides of the text. Some authors have used sliding windows for such problem: they count all the occurrences within a sliding windows running over the whole text. In this paper we generalise such technique, coming up to a Weighted-Distance Sliding Window, where each occurrence of two named items within the window is accounted with a weight depending on the distance between items: a closer distance implies a stronger evidence of a relationship. We develop an experiment in order to support this intuition, by applying this technique to a data set consisting in the text of the Bible, split into verses.Keywords: Cooccurrence graph, entity relation graph, unstructured text, weighted distance.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 68467 Breast Cancer Survivability Prediction via Classifier Ensemble
Authors: Mohamed Al-Badrashiny, Abdelghani Bellaachia
Abstract:
This paper presents a classifier ensemble approach for predicting the survivability of the breast cancer patients using the latest database version of the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute. The system consists of two main components; features selection and classifier ensemble components. The features selection component divides the features in SEER database into four groups. After that it tries to find the most important features among the four groups that maximizes the weighted average F-score of a certain classification algorithm. The ensemble component uses three different classifiers, each of which models different set of features from SEER through the features selection module. On top of them, another classifier is used to give the final decision based on the output decisions and confidence scores from each of the underlying classifiers. Different classification algorithms have been examined; the best setup found is by using the decision tree, Bayesian network, and Na¨ıve Bayes algorithms for the underlying classifiers and Na¨ıve Bayes for the classifier ensemble step. The system outperforms all published systems to date when evaluated against the exact same data of SEER (period of 1973-2002). It gives 87.39% weighted average F-score compared to 85.82% and 81.34% of the other published systems. By increasing the data size to cover the whole database (period of 1973-2014), the overall weighted average F-score jumps to 92.4% on the held out unseen test set.Keywords: Classifier ensemble, breast cancer survivability, data mining, SEER.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 167166 Multi-Level Air Quality Classification in China Using Information Gain and Support Vector Machine
Authors: Bingchun Liu, Pei-Chann Chang, Natasha Huang, Dun Li
Abstract:
Machine Learning and Data Mining are the two important tools for extracting useful information and knowledge from large datasets. In machine learning, classification is a wildly used technique to predict qualitative variables and is generally preferred over regression from an operational point of view. Due to the enormous increase in air pollution in various countries especially China, Air Quality Classification has become one of the most important topics in air quality research and modelling. This study aims at introducing a hybrid classification model based on information theory and Support Vector Machine (SVM) using the air quality data of four cities in China namely Beijing, Guangzhou, Shanghai and Tianjin from Jan 1, 2014 to April 30, 2016. China's Ministry of Environmental Protection has classified the daily air quality into 6 levels namely Serious Pollution, Severe Pollution, Moderate Pollution, Light Pollution, Good and Excellent based on their respective Air Quality Index (AQI) values. Using the information theory, information gain (IG) is calculated and feature selection is done for both categorical features and continuous numeric features. Then SVM Machine Learning algorithm is implemented on the selected features with cross-validation. The final evaluation reveals that the IG and SVM hybrid model performs better than SVM (alone), Artificial Neural Network (ANN) and K-Nearest Neighbours (KNN) models in terms of accuracy as well as complexity.
Keywords: Machine learning, air quality classification, air quality index, information gain, support vector machine, cross-validation.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 94865 Comparative Study of Seismic Isolation as Retrofit Method for Historical Constructions
Authors: Carlos H. Cuadra
Abstract:
Seismic isolation can be used as a retrofit method for historical buildings with the advantage that minimum intervention on super-structure is required. However, selection of isolation devices depends on weight and stiffness of upper structure. In this study, two buildings are considered for analyses to evaluate the applicability of this retrofitting methodology. Both buildings are located at Akita prefecture in the north part of Japan. One building is a wooden structure that corresponds to the old council meeting hall of Noshiro city. The second building is a brick masonry structure that was used as house of a foreign mining engineer and it is located at Ani town. Ambient vibration measurements were performed on both buildings to estimate their dynamic characteristics. Then, target period of vibration of isolated systems is selected as 3 seconds is selected to estimate required stiffness of isolation devices. For wooden structure, which is a light construction, it was found that natural rubber isolators in combination with friction bearings are suitable for seismic isolation. In case of masonry building elastomeric isolator can be used for its seismic isolation. Lumped mass systems are used for seismic response analysis and it is verified in both cases that seismic isolation can be used as retrofitting method of historical construction. However, in the case of the light building, most of the weight corresponds to the reinforced concrete slab that is required to install isolation devices.
Keywords: Historical building, finite element method, masonry structure, seismic isolation, wooden structure.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 72564 Calibration of the Discrete Element Method Using a Large Shear Box
Authors: Corné J. Coetzee, Etienne Horn
Abstract:
One of the main challenges in using the Discrete Element Method (DEM) is to specify the correct input parameter values. In general, the models are sensitive to the input parameter values and accurate results can only be achieved if the correct values are specified. For the linear contact model, micro-parameters such as the particle density, stiffness, coefficient of friction, as well as the particle size and shape distributions are required. There is a need for a procedure to accurately calibrate these parameters before any attempt can be made to accurately model a complete bulk materials handling system. Since DEM is often used to model applications in the mining and quarrying industries, a calibration procedure was developed for materials that consist of relatively large (up to 40 mm in size) particles. A coarse crushed aggregate was used as the test material. Using a specially designed large shear box with a diameter of 590 mm, the confined Young’s modulus (bulk stiffness) and internal friction angle of the material were measured by means of the confined compression test and the direct shear test respectively. DEM models of the experimental setup were developed and the input parameter values were varied iteratively until a close correlation between the experimental and numerical results was achieved. The calibration process was validated by modelling the pull-out of an anchor from a bed of material. The model results compared well with experimental measurement.Keywords: Discrete Element Method (DEM), calibration, shear box, anchor pull-out.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 267263 Applying Theory of Inventive Problem Solving to Develop Innovative Solutions: A Case Study
Authors: Y. H. Wang, C. C. Hsieh
Abstract:
Good service design can increase organization revenue and consumer satisfaction while reducing labor and time costs. The problems facing consumers in the original serve model for eyewear and optical industry includes the following issues: 1. Insufficient information on eyewear products 2. Passively dependent on recommendations, insufficient selection 3. Incomplete records on progression of vision conditions 4. Lack of complete customer records. This study investigates the case of Kobayashi Optical, applying the Theory of Inventive Problem Solving (TRIZ) to develop innovative solutions for eyewear and optical industry. Analysis results raise the following conclusions and management implications: In order to provide customers with improved professional information and recommendations, Kobayashi Optical is suggested to establish customer purchasing records. Overall service efficiency can be enhanced by applying data mining techniques to analyze past consumer preferences and purchase histories. Furthermore, Kobayashi Optical should continue to develop a 3D virtual trial service which can allow customers for easy browsing of different frame styles and colors. This 3D virtual trial service will save customer waiting times in during peak service times at stores.Keywords: Theory of inventive problem solving, service design, augmented reality, eyewear and optical industry.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 167262 Human Digital Twin for Personal Conversation Automation Using Supervised Machine Learning Approaches
Authors: Aya Salama
Abstract:
Digital Twin has emerged as a compelling research area, capturing the attention of scholars over the past decade. It finds applications across diverse fields, including smart manufacturing and healthcare, offering significant time and cost savings. Notably, it often intersects with other cutting-edge technologies such as Data Mining, Artificial Intelligence, and Machine Learning. However, the concept of a Human Digital Twin (HDT) is still in its infancy and requires further demonstration of its practicality. HDT takes the notion of Digital Twin a step further by extending it to living entities, notably humans, who are vastly different from inanimate physical objects. The primary objective of this research was to create an HDT capable of automating real-time human responses by simulating human behavior. To achieve this, the study delved into various areas, including clustering, supervised classification, topic extraction, and sentiment analysis. The paper successfully demonstrated the feasibility of HDT for generating personalized responses in social messaging applications. Notably, the proposed approach achieved an overall accuracy of 63%, a highly promising result that could pave the way for further exploration of the HDT concept. The methodology employed Random Forest for clustering the question database and matching new questions, while K-nearest neighbor was utilized for sentiment analysis.
Keywords: Human Digital twin, sentiment analysis, topic extraction, supervised machine learning, unsupervised machine learning, classification and clustering.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 18861 The Magnetic Susceptibility of the Late Quaternary Loess in North-East of Iran and Its Correlation with Other Palaeoclimatical Parameters
Authors: Fereshteh M. Haskouei, Habib Alimohammadian
Abstract:
Magnetic susceptibility (χ) is operational to identify of late quaternary glacial-interglacial cycles in loess-paleosol sequences. It is well accepted that many loess-paleosol sequences bear witness to cold-dry/warm-humid periods, well known as glacial-interglacial cycles, respectively. For this study, loess-paleosol sequence of north-east of Iran was magnetically investigated. The study area is situated at about 8 km away of Neka city, on the main road of Sari-Behshahr, in Mazandaran Province, north of Iran. The youngest deposits of study area are the late Quaternary wind-blown accumulations. In this study, the total number of 117 samples was collected from loess-paleosols units. After that, the natural remnant magnetization (NRM) and magnetic susceptibility (MS) of the samples were measured. Variation of MS of more than 110 loess samples was plotted to reveal the correlation of the MS and paleoclimatic changes. This study aims reconstruction of climatic changes (glacial-interglacial and stadials-interstadials cycles). To confirm our results we compared MS (χ) and the curves of other investigations in paleoclimatology. This correspondence abled us to recognize worldly events in the study area such as: Younger Dryas, the Last Glacial Maximum (LGM), deglaciation of Northern Hemisphere etc. The obtained magnetic data indicate that during almost 50 ka, at least two glacial-interglacial periods occurred in north-east of Iran. Further, variation of χ values revealed short period of climatically cycles known as stadials-interstadials. We recognized 4 stadials and a single stadial as colder sub-periods for S0 (recently soil-paleosol) and S2 (lower paleosol), respectively, Moreover, we recognized 6 warmer sub-periods (interstadials) for L1 (upper loess) and one interstadial L2 (lower loess).
Keywords: Glacial-interglacial cycles, Iran, last glacial maximum, loess, magnetic susceptibility (χ), Neka, Stadials-Interstadials sub-periods, younger dryas.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 64160 Pattern Discovery from Student Feedback: Identifying Factors to Improve Student Emotions in Learning
Authors: Angelina A. Tzacheva, Jaishree Ranganathan
Abstract:
Interest in (STEM) Science Technology Engineering Mathematics education especially Computer Science education has seen a drastic increase across the country. This fuels effort towards recruiting and admitting a diverse population of students. Thus the changing conditions in terms of the student population, diversity and the expected teaching and learning outcomes give the platform for use of Innovative Teaching models and technologies. It is necessary that these methods adapted should also concentrate on raising quality of such innovations and have positive impact on student learning. Light-Weight Team is an Active Learning Pedagogy, which is considered to be low-stake activity and has very little or no direct impact on student grades. Emotion plays a major role in student’s motivation to learning. In this work we use the student feedback data with emotion classification using surveys at a public research institution in the United States. We use Actionable Pattern Discovery method for this purpose. Actionable patterns are patterns that provide suggestions in the form of rules to help the user achieve better outcomes. The proposed method provides meaningful insight in terms of changes that can be incorporated in the Light-Weight team activities, resources utilized in the course. The results suggest how to enhance student emotions to a more positive state, in particular focuses on the emotions ‘Trust’ and ‘Joy’.Keywords: Actionable pattern discovery, education, emotion, data mining.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 52659 Assessment of Wastewater Reuse Potential for an Enamel Coating Industry
Authors: Guclu Insel, Efe Gumuslu, Gulten Yuksek, Nilay Sayi Ucar, Emine Ubay Cokgor, Tugba Olmez Hanci, Didem Okutman Tas, Fatos Germirli Babuna, Derya Firat Ertem, Okmen Yildirim, Ozge Erturan, Betul Kirci
Abstract:
In order to eliminate water scarcity problems, effective precautions must be taken. Growing competition for water is increasingly forcing facilities to tackle their own water scarcity problems. At this point, application of wastewater reclamation and reuse results in considerable economic advantageous. In this study, an enamel coating facility, which is one of the high water consumed facilities, is evaluated in terms of its wastewater reuse potential. Wastewater reclamation and reuse can be defined as one of the best available techniques for this sector. Hence, process and pollution profiles together with detailed characterization of segregated wastewater sources are appraised in a way to find out the recoverable effluent streams arising from enamel coating operations. Daily, 170 m3 of process water is required and 160 m3 of wastewater is generated. The segregated streams generated by two enamel coating processes are characterized in terms of conventional parameters. Relatively clean segregated wastewater streams (reusable wastewaters) are separately collected and experimental treatability studies are conducted on it. The results reflected that the reusable wastewater fraction has an approximate amount of 110 m3/day that accounts for 68% of the total wastewaters. The need for treatment applicable on reusable wastewaters is determined by considering water quality requirements of various operations and characterization of reusable wastewater streams. Ultra-filtration (UF), Nano-filtration (NF) and Reverse Osmosis (RO) membranes are subsequently applied on reusable effluent fraction. Adequate organic matter removal is not obtained with the mentioned treatment sequence.Keywords: enamel coating, membrane, reuse, wastewater
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 149058 Evolutionary Approach for Automated Discovery of Censored Production Rules
Authors: Kamal K. Bharadwaj, Basheer M. Al-Maqaleh
Abstract:
In the recent past, there has been an increasing interest in applying evolutionary methods to Knowledge Discovery in Databases (KDD) and a number of successful applications of Genetic Algorithms (GA) and Genetic Programming (GP) to KDD have been demonstrated. The most predominant representation of the discovered knowledge is the standard Production Rules (PRs) in the form If P Then D. The PRs, however, are unable to handle exceptions and do not exhibit variable precision. The Censored Production Rules (CPRs), an extension of PRs, were proposed by Michalski & Winston that exhibit variable precision and supports an efficient mechanism for handling exceptions. A CPR is an augmented production rule of the form: If P Then D Unless C, where C (Censor) is an exception to the rule. Such rules are employed in situations, in which the conditional statement 'If P Then D' holds frequently and the assertion C holds rarely. By using a rule of this type we are free to ignore the exception conditions, when the resources needed to establish its presence are tight or there is simply no information available as to whether it holds or not. Thus, the 'If P Then D' part of the CPR expresses important information, while the Unless C part acts only as a switch and changes the polarity of D to ~D. This paper presents a classification algorithm based on evolutionary approach that discovers comprehensible rules with exceptions in the form of CPRs. The proposed approach has flexible chromosome encoding, where each chromosome corresponds to a CPR. Appropriate genetic operators are suggested and a fitness function is proposed that incorporates the basic constraints on CPRs. Experimental results are presented to demonstrate the performance of the proposed algorithm.Keywords: Censored Production Rule, Data Mining, MachineLearning, Evolutionary Algorithms.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 188157 The Genesis of the Anomalous Sernio Fan, Valtellina, Northern Italy
Authors: E. De Finis, P. Gattinoni, L. Scesi
Abstract:
Massive rock avalanches formed some of the largest landslide deposits on Earth and they represent one of the major geohazards in high-relief mountains. This paper interprets a very large sedimentary fan (the Sernio fan, Valtellina, Northern Italy), located 20 Km SW from Val Pola Rock avalanche (1987), as the deposit of a partial collapse of a Deep Seated Gravitational Slope Deformation (DSGSD), afterwards eroded and buried by debris flows. The proposed emplacement sequence has been reconstructed based on geomorphological, structural and mechanical evidences. The Sernio fan is actually considered anomalous with reference to the very high ratio between the fan area (≈ 4.5km2) and the basin area (≈ 3km2). The morphology of the fan area is characterised by steep slopes (dip ≈ 20%) and the fan apex is extended for 1.8 km inside the small catchment basin. This sedimentary fan was originated by a landslide that interested a part of a large deep-seated gravitational slope deformation, involving a wide area of about 55 km². The main controlling factor is tectonic and it is related to the proximity to regional fault systems and the consequent occurrence of fault weak rocks (GSI locally lower than 10 with compressive stress lower than 20MPa). Moreover, the fan deposit shows sedimentary evidences of recent debris flow events. The best current explanation of the Sernio fan involves an initial failure of some hundreds of Mm3. The run-out was quite limited because of the morphology of Valtellina’s valley floor, and the deposit filled the main valley forming a landslide dam, as confirmed by the lacustrine deposits detected upstream the fan. Nowadays the debris flow events represent the main hazard in the study area.Keywords: Anomalous sedimentary fans, debris flow, deep seated gravitational slope deformation, Italy, rock avalanche.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 175156 An Automated Approach to the Nozzle Configuration of Polycrystalline Diamond Compact Drill Bits for Effective Cuttings Removal
Authors: R. Suresh, Pavan Kumar Nimmagadda, Ming Zo Tan, Shane Hart, Sharp Ugwuocha
Abstract:
Polycrystalline diamond compact (PDC) drill bits are extensively used in the oil and gas industry as well as the mining industry. Industry engineers continually improve upon PDC drill bit designs and hydraulic conditions. Optimized injection nozzles play a key role in improving the drilling performance and efficiency of these ever changing PDC drill bits. In the first part of this study, computational fluid dynamics (CFD) modelling is performed to investigate the hydrodynamic characteristics of drilling fluid flow around the PDC drill bit. An Open-source CFD software – OpenFOAM simulates the flow around the drill bit, based on the field input data. A specifically developed console application integrates the entire CFD process including, domain extraction, meshing, and solving governing equations and post-processing. The results from the OpenFOAM solver are then compared with that of the ANSYS Fluent software. The data from both software programs agree. The second part of the paper describes the parametric study of the PDC drill bit nozzle to determine the effect of parameters such as number of nozzles, nozzle velocity, nozzle radial position and orientations on the flow field characteristics and bit washing patterns. After analyzing a series of nozzle configurations, the best configuration is identified and recommendations are made for modifying the PDC bit design.
Keywords: ANSYS Fluent, computational fluid dynamics, nozzle configuration, OpenFOAM, PDC dill bit.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 98655 A Growing Natural Gas Approach for Evaluating Quality of Software Modules
Authors: Parvinder S. Sandhu, Sandeep Khimta, Kiranpreet Kaur
Abstract:
The prediction of Software quality during development life cycle of software project helps the development organization to make efficient use of available resource to produce the product of highest quality. “Whether a module is faulty or not" approach can be used to predict quality of a software module. There are numbers of software quality prediction models described in the literature based upon genetic algorithms, artificial neural network and other data mining algorithms. One of the promising aspects for quality prediction is based on clustering techniques. Most quality prediction models that are based on clustering techniques make use of K-means, Mixture-of-Guassians, Self-Organizing Map, Neural Gas and fuzzy K-means algorithm for prediction. In all these techniques a predefined structure is required that is number of neurons or clusters should be known before we start clustering process. But in case of Growing Neural Gas there is no need of predetermining the quantity of neurons and the topology of the structure to be used and it starts with a minimal neurons structure that is incremented during training until it reaches a maximum number user defined limits for clusters. Hence, in this work we have used Growing Neural Gas as underlying cluster algorithm that produces the initial set of labeled cluster from training data set and thereafter this set of clusters is used to predict the quality of test data set of software modules. The best testing results shows 80% accuracy in evaluating the quality of software modules. Hence, the proposed technique can be used by programmers in evaluating the quality of modules during software development.
Keywords: Growing Neural Gas, data clustering, fault prediction.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 186554 Effective Stacking of Deep Neural Models for Automated Object Recognition in Retail Stores
Authors: Ankit Sinha, Soham Banerjee, Pratik Chattopadhyay
Abstract:
Automated product recognition in retail stores is an important real-world application in the domain of Computer Vision and Pattern Recognition. In this paper, we consider the problem of automatically identifying the classes of the products placed on racks in retail stores from an image of the rack and information about the query/product images. We improve upon the existing approaches in terms of effectiveness and memory requirement by developing a two-stage object detection and recognition pipeline comprising of a Faster-RCNN-based object localizer that detects the object regions in the rack image and a ResNet-18-based image encoder that classifies the detected regions into the appropriate classes. Each of the models is fine-tuned using appropriate data sets for better prediction and data augmentation is performed on each query image to prepare an extensive gallery set for fine-tuning the ResNet-18-based product recognition model. This encoder is trained using a triplet loss function following the strategy of online-hard-negative-mining for improved prediction. The proposed models are lightweight and can be connected in an end-to-end manner during deployment to automatically identify each product object placed in a rack image. Extensive experiments using Grozi-32k and GP-180 data sets verify the effectiveness of the proposed model.
Keywords: Retail stores, Faster-RCNN, object localization, ResNet-18, triplet loss, data augmentation, product recognition.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 58353 SUPAR: System for User-Centric Profiling of Association Rules in Streaming Data
Authors: Sarabjeet Kaur Kochhar
Abstract:
With a surge of stream processing applications novel techniques are required for generation and analysis of association rules in streams. The traditional rule mining solutions cannot handle streams because they generally require multiple passes over the data and do not guarantee the results in a predictable, small time. Though researchers have been proposing algorithms for generation of rules from streams, there has not been much focus on their analysis. We propose Association rule profiling, a user centric process for analyzing association rules and attaching suitable profiles to them depending on their changing frequency behavior over a previous snapshot of time in a data stream. Association rule profiles provide insights into the changing nature of associations and can be used to characterize the associations. We discuss importance of characteristics such as predictability of linkages present in the data and propose metric to quantify it. We also show how association rule profiles can aid in generation of user specific, more understandable and actionable rules. The framework is implemented as SUPAR: System for Usercentric Profiling of Association Rules in streaming data. The proposed system offers following capabilities: i) Continuous monitoring of frequency of streaming item-sets and detection of significant changes therein for association rule profiling. ii) Computation of metrics for quantifying predictability of associations present in the data. iii) User-centric control of the characterization process: user can control the framework through a) constraint specification and b) non-interesting rule elimination.Keywords: Data Streams, User subjectivity, Change detection, Association rule profiles, Predictability.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 145852 Upgraded Rough Clustering and Outlier Detection Method on Yeast Dataset by Entropy Rough K-Means Method
Authors: P. Ashok, G. M. Kadhar Nawaz
Abstract:
Rough set theory is used to handle uncertainty and incomplete information by applying two accurate sets, Lower approximation and Upper approximation. In this paper, the rough clustering algorithms are improved by adopting the Similarity, Dissimilarity–Similarity and Entropy based initial centroids selection method on three different clustering algorithms namely Entropy based Rough K-Means (ERKM), Similarity based Rough K-Means (SRKM) and Dissimilarity-Similarity based Rough K-Means (DSRKM) were developed and executed by yeast dataset. The rough clustering algorithms are validated by cluster validity indexes namely Rand and Adjusted Rand indexes. An experimental result shows that the ERKM clustering algorithm perform effectively and delivers better results than other clustering methods. Outlier detection is an important task in data mining and very much different from the rest of the objects in the clusters. Entropy based Rough Outlier Factor (EROF) method is seemly to detect outlier effectively for yeast dataset. In rough K-Means method, by tuning the epsilon (ᶓ) value from 0.8 to 1.08 can detect outliers on boundary region and the RKM algorithm delivers better results, when choosing the value of epsilon (ᶓ) in the specified range. An experimental result shows that the EROF method on clustering algorithm performed very well and suitable for detecting outlier effectively for all datasets. Further, experimental readings show that the ERKM clustering method outperformed the other methods.
Keywords: Clustering, Entropy, Outlier, Rough K-Means, validity index.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 141251 Comparative Evaluation of Accuracy of Selected Machine Learning Classification Techniques for Diagnosis of Cancer: A Data Mining Approach
Authors: Rajvir Kaur, Jeewani Anupama Ginige
Abstract:
With recent trends in Big Data and advancements in Information and Communication Technologies, the healthcare industry is at the stage of its transition from clinician oriented to technology oriented. Many people around the world die of cancer because the diagnosis of disease was not done at an early stage. Nowadays, the computational methods in the form of Machine Learning (ML) are used to develop automated decision support systems that can diagnose cancer with high confidence in a timely manner. This paper aims to carry out the comparative evaluation of a selected set of ML classifiers on two existing datasets: breast cancer and cervical cancer. The ML classifiers compared in this study are Decision Tree (DT), Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression, Ensemble (Bagged Tree) and Artificial Neural Networks (ANN). The evaluation is carried out based on standard evaluation metrics Precision (P), Recall (R), F1-score and Accuracy. The experimental results based on the evaluation metrics show that ANN showed the highest-level accuracy (99.4%) when tested with breast cancer dataset. On the other hand, when these ML classifiers are tested with the cervical cancer dataset, Ensemble (Bagged Tree) technique gave better accuracy (93.1%) in comparison to other classifiers.Keywords: Artificial neural networks, breast cancer, cancer dataset, classifiers, cervical cancer, F-score, logistic regression, machine learning, precision, recall, support vector machine.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 155350 Toward Indoor and Outdoor Surveillance Using an Improved Fast Background Subtraction Algorithm
Authors: A. El Harraj, N. Raissouni
Abstract:
The detection of moving objects from a video image sequences is very important for object tracking, activity recognition, and behavior understanding in video surveillance. The most used approach for moving objects detection / tracking is background subtraction algorithms. Many approaches have been suggested for background subtraction. But, these are illumination change sensitive and the solutions proposed to bypass this problem are time consuming. In this paper, we propose a robust yet computationally efficient background subtraction approach and, mainly, focus on the ability to detect moving objects on dynamic scenes, for possible applications in complex and restricted access areas monitoring, where moving and motionless persons must be reliably detected. It consists of three main phases, establishing illumination changes invariance, background/foreground modeling and morphological analysis for noise removing. We handle illumination changes using Contrast Limited Histogram Equalization (CLAHE), which limits the intensity of each pixel to user determined maximum. Thus, it mitigates the degradation due to scene illumination changes and improves the visibility of the video signal. Initially, the background and foreground images are extracted from the video sequence. Then, the background and foreground images are separately enhanced by applying CLAHE. In order to form multi-modal backgrounds we model each channel of a pixel as a mixture of K Gaussians (K=5) using Gaussian Mixture Model (GMM). Finally, we post process the resulting binary foreground mask using morphological erosion and dilation transformations to remove possible noise. For experimental test, we used a standard dataset to challenge the efficiency and accuracy of the proposed method on a diverse set of dynamic scenes.
Keywords: Video surveillance, background subtraction, Contrast Limited Histogram Equalization, illumination invariance, object tracking, object detection, behavior understanding, dynamic scenes.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 208749 Using Data Mining in Automotive Safety
Authors: Carine Cridelich, Pablo Juesas Cano, Emmanuel Ramasso, Noureddine Zerhouni, Bernd Weiler
Abstract:
Safety is one of the most important considerations when buying a new car. While active safety aims at avoiding accidents, passive safety systems such as airbags and seat belts protect the occupant in case of an accident. In addition to legal regulations, organizations like Euro NCAP provide consumers with an independent assessment of the safety performance of cars and drive the development of safety systems in automobile industry. Those ratings are mainly based on injury assessment reference values derived from physical parameters measured in dummies during a car crash test. The components and sub-systems of a safety system are designed to achieve the required restraint performance. Sled tests and other types of tests are then carried out by car makers and their suppliers to confirm the protection level of the safety system. A Knowledge Discovery in Databases (KDD) process is proposed in order to minimize the number of tests. The KDD process is based on the data emerging from sled tests according to Euro NCAP specifications. About 30 parameters of the passive safety systems from different data sources (crash data, dummy protocol) are first analysed together with experts opinions. A procedure is proposed to manage missing data and validated on real data sets. Finally, a procedure is developed to estimate a set of rough initial parameters of the passive system before testing aiming at reducing the number of tests.
Keywords: KDD process, passive safety systems, sled test, dummy injury assessment reference values, frontal impact
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 284448 Analyzing Factors Impacting COVID-19 Vaccination Rates
Authors: Dongseok Cho, Mitchell Driedger, Sera Han, Noman Khan, Mohammed Elmorsy, Mohamad El-Hajj
Abstract:
Since the approval of the COVID-19 vaccine in late 2020, vaccination rates have varied around the globe. Access to a vaccine supply, mandated vaccination policy, and vaccine hesitancy contribute to these rates. This study used COVID-19 vaccination data from Our World in Data and the Multilateral Leaders Task Force on COVID-19 to create two COVID-19 vaccination indices. The first index is the Vaccine Utilization Index (VUI), which measures how effectively each country has utilized its vaccine supply to doubly vaccinate its population. The second index is the Vaccination Acceleration Index (VAI), which evaluates how efficiently each country vaccinated their populations within their first 150 days. Pearson correlations were created between these indices and country indicators obtained from the World Bank. Results of these correlations identify countries with stronger Health indicators such as lower mortality rates, lower age-dependency ratios, and higher rates of immunization to other diseases display higher VUI and VAI scores than countries with lesser values. VAI scores are also positively correlated to Governance and Economic indicators, such as regulatory quality, control of corruption, and GDP per capita. As represented by the VUI, proper utilization of the COVID-19 vaccine supply by country is observed in countries that display excellence in health practices. A country’s motivation to accelerate its vaccination rates within the first 150 days of vaccinating, as represented by the VAI, was largely a product of the governing body’s effectiveness and economic status, as well as overall excellence in health practises.
Keywords: Data mining, Pearson Correlation, COVID-19, vaccination rates, hesitancy.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 34547 Comparative Study Using Weka for Red Blood Cells Classification
Authors: Jameela Ali Alkrimi, Hamid A. Jalab, Loay E. George, Abdul Rahim Ahmad, Azizah Suliman, Karim Al-Jashamy
Abstract:
Red blood cells (RBC) are the most common types of blood cells and are the most intensively studied in cell biology. The lack of RBCs is a condition in which the amount of hemoglobin level is lower than normal and is referred to as “anemia”. Abnormalities in RBCs will affect the exchange of oxygen. This paper presents a comparative study for various techniques for classifying the RBCs as normal or abnormal (anemic) using WEKA. WEKA is an open source consists of different machine learning algorithms for data mining applications. The algorithms tested are Radial Basis Function neural network, Support vector machine, and K-Nearest Neighbors algorithm. Two sets of combined features were utilized for classification of blood cells images. The first set, exclusively consist of geometrical features, was used to identify whether the tested blood cell has a spherical shape or non-spherical cells. While the second set, consist mainly of textural features was used to recognize the types of the spherical cells. We have provided an evaluation based on applying these classification methods to our RBCs image dataset which were obtained from Serdang Hospital - Malaysia, and measuring the accuracy of test results. The best achieved classification rates are 97%, 98%, and 79% for Support vector machines, Radial Basis Function neural network, and K-Nearest Neighbors algorithm respectively.
Keywords: K-Nearest Neighbors, Neural Network, Radial Basis Function, Red blood cells, Support vector machine.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 299546 An Architectural Study on the Railway Station Buildings in Malaysia during British Era, 1885-1957
Authors: Nor Hafizah Anuar, M. Gul Akdeniz
Abstract:
This paper attempted on emphasize on the station buildings façade elements. Station buildings were essential part of the transportation that reflected the technology. Comparative analysis on architectural styles will also be made between the railway station buildings of Malaysia and any railway station buildings which have similarities. The Malay Peninsula which is strategically situated between the Straits of Malacca and the South China Sea makes it an ideal location for trade. Malacca became an important trading port whereby merchants from around the world stopover to exchange various products. The Portuguese ruled Malacca for 130 years (1511–1641) and for the next century and a half (1641–1824), the Dutch endeavoured to maintain an economic monopoly along the coasts of Malaya. Malacca came permanently under British rule under the Anglo-Dutch Treaty, 1824. Up to Malaysian independence in 1957, Malaya saw a great influx of Chinese and Indian migrants as workers to support its growing industrial needs facilitated by the British. The growing tin ore mining and rubber industry resulted as the reason of the development of the railways as urgency to transport it from one place to another. The existence of railway transportation becomes more significant when the city started to bloom and the British started to build grandeur buildings that have different functions; administrative buildings, town and city halls, railway stations, public works department, courts, and post offices.
Keywords: Malaysia, railway station, architectural design, façade elements.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 166345 Co-Disposal of Coal Ash with Mine Tailings in Surface Paste Disposal Practices: A Gold Mining Case Study
Authors: M. L. Dinis, M. C. Vila, A. Fiúza, A. Futuro, C. Nunes
Abstract:
The present paper describes the study of paste tailings prepared in laboratory using gold tailings, produced in a Finnish gold mine with the incorporation of coal ash. Natural leaching tests were conducted with the original materials (tailings, fly and bottom ashes) and also with paste mixtures that were prepared with different percentages of tailings and ashes. After leaching, the solid wastes were physically and chemically characterized and the results were compared to those selected as blank – the unleached samples. The tailings and the coal ash, as well as the prepared mixtures, were characterized, in addition to the textural parameters, by the following measurements: grain size distribution, chemical composition and pH. Mixtures were also tested in order to characterize their mechanical behavior by measuring the flexural strength, the compressive strength and the consistency. The original tailing samples presented an alkaline pH because during their processing they were previously submitted to pressure oxidation with destruction of the sulfides. Therefore, it was not possible to ascertain the effect of the coal ashes in the acid mine drainage. However, it was possible to verify that the paste reactivity was affected mostly by the bottom ash and that the tailings blended with bottom ash present lower mechanical strength than when blended with a combination of fly and bottom ash. Surface paste disposal offer an attractive alternative to traditional methods in addition to the environmental benefits of incorporating large-volume wastes (e.g. bottom ash). However, a comprehensive characterization of the paste mixtures is crucial to optimize paste design in order to enhance engineer and environmental properties.Keywords: Coal ash, gold tailings, paste, surface disposal.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 144344 Systematic Identification and Quantification of Substrate Specificity Determinants in Human Protein Kinases
Authors: Manuel A. Alonso-Tarajano, Roberto Mosca, Patrick Aloy
Abstract:
Protein kinases participate in a myriad of cellular processes of major biomedical interest. The in vivo substrate specificity of these enzymes is a process determined by several factors, and despite several years of research on the topic, is still far from being totally understood. In the present work, we have quantified the contributions to the kinase substrate specificity of i) the phosphorylation sites and their surrounding residues in the sequence and of ii) the association of kinases to adaptor or scaffold proteins. We have used position-specific scoring matrices (PSSMs), to represent the stretches of sequences phosphorylated by 93 families of kinases. We have found negative correlations between the number of sequences from which a PSSM is generated and the statistical significance and the performance of that PSSM. Using a subset of 22 statistically significant PSSMs, we have identified specificity determinant residues (SDRs) for 86% of the corresponding kinase families. Our results suggest that different SDRs can function as positive or negative elements of substrate recognition by the different families of kinases. Additionally, we have found that human proteins with known function as adaptors or scaffolds (kAS) tend to interact with a significantly large fraction of the substrates of the kinases to which they associate. Based on this characteristic we have identified a set of 279 potential adaptors/scaffolds (pAS) for human kinases, which is enriched in Pfam domains and functional terms tightly related to the proposed function. Moreover, our results show that for 74.6% of the kinase–pAS association found, the pAS colocalize with the substrates of the kinases they are associated to. Finally, we have found evidence suggesting that the association of kinases to adaptors and scaffolds, may contribute significantly to diminish the in vivo substrate crossed-specificity of protein kinases. In general, our results indicate the relevance of several SDRs for both the positive and negative selection of phosphorylation sites by kinase families and also suggest that the association of kinases to pAS proteins may be an important factor for the localization of the enzymes with their set of substrates.
Keywords: Kinase, phosphorylation, substrate specificity, adaptors, scaffolds, cellular colocalization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 153543 The Antibacterial and Anticancer Activity of Marine Actinomycete Strain HP411 Isolated in the Northern Coast of Vietnam
Authors: Huyen T. Pham, Nhue P. Nguyen, Tien Q. Phi, Phuong T. Dang, Hy G. Le
Abstract:
Since the marine environmental conditions are extremely different from the other ones, marine actinomycetes might produce novel bioactive compounds. Therefore, actinomycete strains were screened from marine water and sediment samples collected from the coastal areas of Northern Vietnam. Ninety-nine actinomycete strains were obtained on starch-casein agar media by dilution technique, only seven strains, named HP112, HP12, HP411, HPN11, HP 11, HPT13 and HPX12, showed significant antibacterial activity against both gram-positive and gram-negative bacteria (Bacillus subtilis ATCC 6633, Staphylococcus epidemidis ATCC 12228, Escherichia coli ATCC 11105). Further studies were carried out with the most active HP411 strain against Candida albicans ATCC 10231. This strain could grow rapidly on starch casein agar and other media with high salt containing 7-10% NaCl at 28-30oC. Spore-chain of HP411 showed an elongated and circular shape with 10 to 30 spores/chain. Identification of the strain was carried out by employing the taxonomical studies including the 16S rRNA sequence. Based on phylogenetic and phenotypic evidence it is proposed that HP411 to be belongs to species Streptomyces variabilis. The potent of the crude extract of fermentation broth of HP411 that are effective against wide range of pathogens: both grampositive, gram-negative and fungi. Further studies revealed that the crude extract HP411 could obtain the anticancer activity for cancer cell lines: Hep-G2 (liver cancer cell line); RD (cardiac and skeletal muscle letters cell line); FL (membrane of the uterus cancer cell line). However, the actinomycetes from marine ecosystem will be useful for the discovery of new drugs in the future.
Keywords: Marine actinomycetes, antibacterial, anticancer, Streptomyces variabilis.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 361342 Discovery of Quantified Hierarchical Production Rules from Large Set of Discovered Rules
Authors: Tamanna Siddiqui, M. Afshar Alam
Abstract:
Automated discovery of Rule is, due to its applicability, one of the most fundamental and important method in KDD. It has been an active research area in the recent past. Hierarchical representation allows us to easily manage the complexity of knowledge, to view the knowledge at different levels of details, and to focus our attention on the interesting aspects only. One of such efficient and easy to understand systems is Hierarchical Production rule (HPRs) system. A HPR, a standard production rule augmented with generality and specificity information, is of the following form: Decision If < condition> Generality
Keywords: Knowledge discovery in database, quantification, dempster shafer theory, genetic programming, hierarchy, subsumption matrix.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 152741 Rare Earth Elements in Soils of Jharia Coal Field
Authors: R. E. Masto, L. C. Ram, S. K. Verma, V. A. Selvi, J. George, R. C. Tripathi, N. K. Srivastava, D. Mohanty, S. K.Jha, A. K. Sinha, A. Sinha
Abstract:
There are many sources trough which the soil get enriched and contaminated with REEs. The determination of REEs in environmental samples has been limited because of the lack of sensitive analytical techniques. Soil samples were collected from four sites including open cast coal mine, natural coal burning, coal washery and control in the coal field located in Dhanbad, India. Total concentrations of rare earth elements (REEs) were determined using the inductively coupled plasma atomic absorption spectrometry in order to assess enrichment status in the coal field. Results showed that the mean concentrations of La, Pr, Eu, Tb, Ho, and Tm in open cast mine and natural coal burning sites were elevated compared to the reference concentrations, while Ce, Nd, Sm, and Gd were elevated in coal washery site. When compared to reference soil, heavy REEs (HREEs) were enriched in open cast mines and natural coal burning affected soils, however, the HREEs were depleted in the coal washery sites. But, the Chondrite-normalization diagram showed significant enrichment for light REEs (LREEs) in all the soils. High concentration of Pr, Eu, Tb, Ho, Tm, and Lu in coal mining and coal burning sites may pose human health risks. Factor analysis showed that distribution and relative abundance of REEs of the coal washery site is comparable with the control. Eventually washing or cleaning of coal could significantly decrease the emission of REEs from coal into the environment.Keywords: Rare earth elements, coal, soil, factor analysis
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 283040 Exploring Influence Range of Tainan City Using Electronic Toll Collection Big Data
Authors: Chen Chou, Feng-Tyan Lin
Abstract:
Big Data has been attracted a lot of attentions in many fields for analyzing research issues based on a large number of maternal data. Electronic Toll Collection (ETC) is one of Intelligent Transportation System (ITS) applications in Taiwan, used to record starting point, end point, distance and travel time of vehicle on the national freeway. This study, taking advantage of ETC big data, combined with urban planning theory, attempts to explore various phenomena of inter-city transportation activities. ETC, one of government's open data, is numerous, complete and quick-update. One may recall that living area has been delimited with location, population, area and subjective consciousness. However, these factors cannot appropriately reflect what people’s movement path is in daily life. In this study, the concept of "Living Area" is replaced by "Influence Range" to show dynamic and variation with time and purposes of activities. This study uses data mining with Python and Excel, and visualizes the number of trips with GIS to explore influence range of Tainan city and the purpose of trips, and discuss living area delimited in current. It dialogues between the concepts of "Central Place Theory" and "Living Area", presents the new point of view, integrates the application of big data, urban planning and transportation. The finding will be valuable for resource allocation and land apportionment of spatial planning.
Keywords: Big Data, ITS, influence range, living area, central place theory, visualization.
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 97639 Development System for Emotion Detection Based on Brain Signals and Facial Images
Authors: Suprijanto, Linda Sari, Vebi Nadhira , IGN. Merthayasa. Farida I.M
Abstract:
Detection of human emotions has many potential applications. One of application is to quantify attentiveness audience in order evaluate acoustic quality in concern hall. The subjective audio preference that based on from audience is used. To obtain fairness evaluation of acoustic quality, the research proposed system for multimodal emotion detection; one modality based on brain signals that measured using electroencephalogram (EEG) and the second modality is sequences of facial images. In the experiment, an audio signal was customized which consist of normal and disorder sounds. Furthermore, an audio signal was played in order to stimulate positive/negative emotion feedback of volunteers. EEG signal from temporal lobes, i.e. T3 and T4 was used to measured brain response and sequence of facial image was used to monitoring facial expression during volunteer hearing audio signal. On EEG signal, feature was extracted from change information in brain wave, particularly in alpha and beta wave. Feature of facial expression was extracted based on analysis of motion images. We implement an advance optical flow method to detect the most active facial muscle form normal to other emotion expression that represented in vector flow maps. The reduce problem on detection of emotion state, vector flow maps are transformed into compass mapping that represents major directions and velocities of facial movement. The results showed that the power of beta wave is increasing when disorder sound stimulation was given, however for each volunteer was giving different emotion feedback. Based on features derived from facial face images, an optical flow compass mapping was promising to use as additional information to make decision about emotion feedback.
Keywords: Multimodal Emotion Detection, EEG, Facial Image, Optical Flow, compass mapping, Brain Wave
Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2292