Search results for: data mining technique
29845 Blood Glucose Measurement and Analysis: Methodology
Authors: I. M. Abd Rahim, H. Abdul Rahim, R. Ghazali
Abstract:
There is numerous non-invasive blood glucose measurement technique developed by researchers, and near infrared (NIR) is the potential technique nowadays. However, there are some disagreements on the optimal wavelength range that is suitable to be used as the reference of the glucose substance in the blood. This paper focuses on the experimental data collection technique and also the analysis method used to analyze the data gained from the experiment. The selection of suitable linear and non-linear model structure is essential in prediction system, as the system developed need to be conceivably accurate.Keywords: linear, near-infrared (NIR), non-invasive, non-linear, prediction system
Procedia PDF Downloads 46129844 Planning Urban Sprawl in Mining Areas in Africa: How to Ensure Coherent Development
Authors: Pascal Rey, Anaïs Weber
Abstract:
Many mining projects are being developed in Africa the last decades. Due to the economic opportunities they offer, these projects result in a massive and rapid influx of migrants to the surrounding area. In areas where central government representation is low and local administration lack financial resources, urban development is often anarchical, beyond all public control. It leads to socio-spatial segregation, insecurity and the risk of social conflicts rising. Aware that their economic development is very correlated with local situation, mining companies get more and more involved in regional planning in setting up tools and Strategic Directions document. One of the commonly used tools in this regard is the “Influx Management Plan”. It consists in looking at the region’s absorption capacities in order to ensure its coherent development and by developing several urban centers than one macrocephalic city. It includes many other measures such as urban governance support, skills transfer, creation of strategic guidelines, financial support (local taxes, mining taxes, development funds etc.) local development projects. Through various examples of mining projects in Guinea, A country that is host to many large mining projects, we will look at the implications of regional and urban planning of which mining companies are key playor as well as public authorities. While their investment capacity offers advantages and accelerates development, their actions raise questions of the unilaterality of interests and local governance. By interfering in public affairs are mining companies not increasing the risk of central and local government shirking their responsibilities in terms of regional development, or even calling their legitimacy into question? Is such public-private collaboration really sustainable for the region as a whole and for all stakeholders?Keywords: Africa, guinea, mine, urban planning
Procedia PDF Downloads 49929843 Feature Based Unsupervised Intrusion Detection
Authors: Deeman Yousif Mahmood, Mohammed Abdullah Hussein
Abstract:
The goal of a network-based intrusion detection system is to classify activities of network traffics into two major categories: normal and attack (intrusive) activities. Nowadays, data mining and machine learning plays an important role in many sciences; including intrusion detection system (IDS) using both supervised and unsupervised techniques. However, one of the essential steps of data mining is feature selection that helps in improving the efficiency, performance and prediction rate of proposed approach. This paper applies unsupervised K-means clustering algorithm with information gain (IG) for feature selection and reduction to build a network intrusion detection system. For our experimental analysis, we have used the new NSL-KDD dataset, which is a modified dataset for KDDCup 1999 intrusion detection benchmark dataset. With a split of 60.0% for the training set and the remainder for the testing set, a 2 class classifications have been implemented (Normal, Attack). Weka framework which is a java based open source software consists of a collection of machine learning algorithms for data mining tasks has been used in the testing process. The experimental results show that the proposed approach is very accurate with low false positive rate and high true positive rate and it takes less learning time in comparison with using the full features of the dataset with the same algorithm.Keywords: information gain (IG), intrusion detection system (IDS), k-means clustering, Weka
Procedia PDF Downloads 29629842 Clustering Ethno-Informatics of Naming Village in Java Island Using Data Mining
Authors: Atje Setiawan Abdullah, Budi Nurani Ruchjana, I. Gede Nyoman Mindra Jaya, Eddy Hermawan
Abstract:
Ethnoscience is used to see the culture with a scientific perspective, which may help to understand how people develop various forms of knowledge and belief, initially focusing on the ecology and history of the contributions that have been there. One of the areas studied in ethnoscience is etno-informatics, is the application of informatics in the culture. In this study the science of informatics used is data mining, a process to automatically extract knowledge from large databases, to obtain interesting patterns in order to obtain a knowledge. While the application of culture described by naming database village on the island of Java were obtained from Geographic Indonesia Information Agency (BIG), 2014. The purpose of this study is; first, to classify the naming of the village on the island of Java based on the structure of the word naming the village, including the prefix of the word, syllable contained, and complete word. Second to classify the meaning of naming the village based on specific categories, as well as its role in the community behavioral characteristics. Third, how to visualize the naming of the village to a map location, to see the similarity of naming villages in each province. In this research we have developed two theorems, i.e theorems area as a result of research studies have collected intersection naming villages in each province on the island of Java, and the composition of the wedge theorem sets the provinces in Java is used to view the peculiarities of a location study. The methodology in this study base on the method of Knowledge Discovery in Database (KDD) on data mining, the process includes preprocessing, data mining and post processing. The results showed that the Java community prioritizes merit in running his life, always working hard to achieve a more prosperous life, and love as well as water and environmental sustainment. Naming villages in each location adjacent province has a high degree of similarity, and influence each other. Cultural similarities in the province of Central Java, East Java and West Java-Banten have a high similarity, whereas in Jakarta-Yogyakarta has a low similarity. This research resulted in the cultural character of communities within the meaning of the naming of the village on the island of Java, this character is expected to serve as a guide in the behavior of people's daily life on the island of Java.Keywords: ethnoscience, ethno-informatics, data mining, clustering, Java island culture
Procedia PDF Downloads 28329841 Discerning Divergent Nodes in Social Networks
Authors: Mehran Asadi, Afrand Agah
Abstract:
In data mining, partitioning is used as a fundamental tool for classification. With the help of partitioning, we study the structure of data, which allows us to envision decision rules, which can be applied to classification trees. In this research, we used online social network dataset and all of its attributes (e.g., Node features, labels, etc.) to determine what constitutes an above average chance of being a divergent node. We used the R statistical computing language to conduct the analyses in this report. The data were found on the UC Irvine Machine Learning Repository. This research introduces the basic concepts of classification in online social networks. In this work, we utilize overfitting and describe different approaches for evaluation and performance comparison of different classification methods. In classification, the main objective is to categorize different items and assign them into different groups based on their properties and similarities. In data mining, recursive partitioning is being utilized to probe the structure of a data set, which allow us to envision decision rules and apply them to classify data into several groups. Estimating densities is hard, especially in high dimensions, with limited data. Of course, we do not know the densities, but we could estimate them using classical techniques. First, we calculated the correlation matrix of the dataset to see if any predictors are highly correlated with one another. By calculating the correlation coefficients for the predictor variables, we see that density is strongly correlated with transitivity. We initialized a data frame to easily compare the quality of the result classification methods and utilized decision trees (with k-fold cross validation to prune the tree). The method performed on this dataset is decision trees. Decision tree is a non-parametric classification method, which uses a set of rules to predict that each observation belongs to the most commonly occurring class label of the training data. Our method aggregates many decision trees to create an optimized model that is not susceptible to overfitting. When using a decision tree, however, it is important to use cross-validation to prune the tree in order to narrow it down to the most important variables.Keywords: online social networks, data mining, social cloud computing, interaction and collaboration
Procedia PDF Downloads 16029840 The Clustering of Multiple Sclerosis Subgroups through L2 Norm Multifractal Denoising Technique
Authors: Yeliz Karaca, Rana Karabudak
Abstract:
Multifractal Denoising techniques are used in the identification of significant attributes by removing the noise of the dataset. Magnetic resonance (MR) image technique is the most sensitive method so as to identify chronic disorders of the nervous system such as Multiple Sclerosis. MRI and Expanded Disability Status Scale (EDSS) data belonging to 120 individuals who have one of the subgroups of MS (Relapsing Remitting MS (RRMS), Secondary Progressive MS (SPMS), Primary Progressive MS (PPMS)) as well as 19 healthy individuals in the control group have been used in this study. The study is comprised of the following stages: (i) L2 Norm Multifractal Denoising technique, one of the multifractal technique, has been used with the application on the MS data (MRI and EDSS). In this way, the new dataset has been obtained. (ii) The new MS dataset obtained from the MS dataset and L2 Multifractal Denoising technique has been applied to the K-Means and Fuzzy C Means clustering algorithms which are among the unsupervised methods. Thus, the clustering performances have been compared. (iii) In the identification of significant attributes in the MS dataset through the Multifractal denoising (L2 Norm) technique using K-Means and FCM algorithms on the MS subgroups and control group of healthy individuals, excellent performance outcome has been yielded. According to the clustering results based on the MS subgroups obtained in the study, successful clustering results have been obtained in the K-Means and FCM algorithms by applying the L2 norm of multifractal denoising technique for the MS dataset. Clustering performance has been more successful with the MS Dataset (L2_Norm MS Data Set) K-Means and FCM in which significant attributes are obtained by applying L2 Norm Denoising technique.Keywords: clinical decision support, clustering algorithms, multiple sclerosis, multifractal techniques
Procedia PDF Downloads 17129839 Comparison of Different Methods of Microorganism's Identification from a Copper Mining in Pará, Brazil
Authors: Louise H. Gracioso, Marcela P.G. Baltazar, Ingrid R. Avanzi, Bruno Karolski, Luciana J. Gimenes, Claudio O. Nascimento, Elen A. Perpetuo
Abstract:
Introduction: Higher copper concentrations promote a selection pressure on organisms such as plants, fungi and bacteria, which allows surviving only the resistant organisms to the contaminated site. This selective pressure keeps only the organisms most resistant to a specific condition and subsequently increases their bioremediation potential. Despite the bacteria importance for biosphere maintenance, it is estimated that only a small fraction living microbial species has been described and characterized. Due to the molecular biology development, tools based on analysis 16S ribosomal RNA or another specific gene are making a new scenario for the characterization studies and identification of microorganisms in the environment. News identification of microorganisms methods have also emerged like Biotyper (MALDI / TOF), this method mass spectrometry is subject to the recognition of spectroscopic patterns of conserved and features proteins for different microbial species. In view of this, this study aimed to isolate bacteria resistant to copper present in a Copper Processing Area (Sossego Mine, Canaan, PA) and identifies them in two different methods: Recent (spectrometry mass) and conventional. This work aimed to use them for a future bioremediation of this Mining. Material and Methods: Samples were collected at fifteen different sites of five periods of times. Microorganisms were isolated from mining wastes by culture enrichment technique; this procedure was repeated 4 times. The isolates were inoculated into MJS medium containing different concentrations of chloride copper (1mM, 2.5mM, 5mM, 7.5mM and 10 mM) and incubated in plates for 72 h at 28 ºC. These isolates were subjected to mass spectrometry identification methods (Biotyper – MALDI/TOF) and 16S gene sequencing. Results: A total of 105 strains were isolated in this area, bacterial identification by mass spectrometry method (MALDI/TOF) achieved 74% agreement with the conventional identification method (16S), 31% have been unsuccessful in MALDI-TOF and 2% did not obtain identification sequence the 16S. These results show that Biotyper can be a very useful tool in the identification of bacteria isolated from environmental samples, since it has a better value for money (cheap and simple sample preparation and MALDI plates are reusable). Furthermore, this technique is more rentable because it saves time and has a high performance (the mass spectra are compared to the database and it takes less than 2 minutes per sample).Keywords: copper mining area, bioremediation, microorganisms, identification, MALDI/TOF, RNA 16S
Procedia PDF Downloads 37829838 The Reduction of Post-Blast Fumes to Improve Productivity and Safety: A Review Paper
Authors: Nhleko Monique Chiloane
Abstract:
The gold mining industry has predominantly used ammonium nitrate fuel oil (ANFO) explosives for decades, although these are known to be “gassier” and their detonation results in toxic fumes, for example, carbon monoxide (CO), nitrogen oxides (NOx) and ammonia. Re-entry into underground workings too soon after blasting can lead to fatal exposure to toxic fumes. It is, therefore, required that the polluted air be removed from the affected areas within a reasonable period before employees' re-entry into the working area. Post-blast re-entry times have therefore been described as a productivity bottleneck. The known causes of post-blast fumes are water ingress, incorrect fuel to oxygen ratio, confinement, explosive additives etc. To prevent or minimize post-blast fumes, some researchers have used neutralization, re-burning technique and non-explosive products or different oxidizing agents. The use of commercial explosives without nitrate oxidizing agents can also minimize the production of blasting fumes and thereby reduce the time needed for the clearance of these fumes to allow workers to re-enter the underground workings safely. The reduction in non-production time directly contributes to an increase in the available time per shift for productive work, thus leading to continuous mining. However, owing to its low cost and ease of use, ANFO is still widely used in South African underground blasting operations.Keywords: post-blast fumes, continuous mining, ammonium nitrate explosive, non-explosive blasting, re-entry period
Procedia PDF Downloads 18329837 Developing an Advanced Algorithm Capable of Classifying News, Articles and Other Textual Documents Using Text Mining Techniques
Authors: R. B. Knudsen, O. T. Rasmussen, R. A. Alphinas
Abstract:
The reason for conducting this research is to develop an algorithm that is capable of classifying news articles from the automobile industry, according to the competitive actions that they entail, with the use of Text Mining (TM) methods. It is needed to test how to properly preprocess the data for this research by preparing pipelines which fits each algorithm the best. The pipelines are tested along with nine different classification algorithms in the realm of regression, support vector machines, and neural networks. Preliminary testing for identifying the optimal pipelines and algorithms resulted in the selection of two algorithms with two different pipelines. The two algorithms are Logistic Regression (LR) and Artificial Neural Network (ANN). These algorithms are optimized further, where several parameters of each algorithm are tested. The best result is achieved with the ANN. The final model yields an accuracy of 0.79, a precision of 0.80, a recall of 0.78, and an F1 score of 0.76. By removing three of the classes that created noise, the final algorithm is capable of reaching an accuracy of 94%.Keywords: Artificial Neural network, Competitive dynamics, Logistic Regression, Text classification, Text mining
Procedia PDF Downloads 12229836 Structural Damage Detection via Incomplete Model Data Using Output Data Only
Authors: Ahmed Noor Al-qayyim, Barlas Özden Çağlayan
Abstract:
Structural failure is caused mainly by damage that often occurs on structures. Many researchers focus on obtaining very efficient tools to detect the damage in structures in the early state. In the past decades, a subject that has received considerable attention in literature is the damage detection as determined by variations in the dynamic characteristics or response of structures. This study presents a new damage identification technique. The technique detects the damage location for the incomplete structure system using output data only. The method indicates the damage based on the free vibration test data by using “Two Points - Condensation (TPC) technique”. This method creates a set of matrices by reducing the structural system to two degrees of freedom systems. The current stiffness matrices are obtained from optimization of the equation of motion using the measured test data. The current stiffness matrices are compared with original (undamaged) stiffness matrices. High percentage changes in matrices’ coefficients lead to the location of the damage. TPC technique is applied to the experimental data of a simply supported steel beam model structure after inducing thickness change in one element. Where two cases are considered, the method detects the damage and determines its location accurately in both cases. In addition, the results illustrate that these changes in stiffness matrix can be a useful tool for continuous monitoring of structural safety using ambient vibration data. Furthermore, its efficiency proves that this technique can also be used for big structures.Keywords: damage detection, optimization, signals processing, structural health monitoring, two points–condensation
Procedia PDF Downloads 36529835 Image Steganography Using Least Significant Bit Technique
Authors: Preeti Kumari, Ridhi Kapoor
Abstract:
In any communication, security is the most important issue in today’s world. In this paper, steganography is the process of hiding the important data into other data, such as text, audio, video, and image. The interest in this topic is to provide availability, confidentiality, integrity, and authenticity of data. The steganographic technique that embeds hides content with unremarkable cover media so as not to provoke eavesdropper’s suspicion or third party and hackers. In which many applications of compression, encryption, decryption, and embedding methods are used for digital image steganography. Due to compression, the nose produces in the image. To sustain noise in the image, the LSB insertion technique is used. The performance of the proposed embedding system with respect to providing security to secret message and robustness is discussed. We also demonstrate the maximum steganography capacity and visual distortion.Keywords: steganography, LSB, encoding, information hiding, color image
Procedia PDF Downloads 47529834 Use of Quasi-3D Inversion of VES Data Based on Lateral Constraints to Characterize the Aquifer and Mining Sites of an Area Located in the North-East of Figuil, North Cameroon
Authors: Fofie Kokea Ariane Darolle, Gouet Daniel Hervé, Koumetio Fidèle, Yemele David
Abstract:
The electrical resistivity method is successfully used in this paper in order to have a clearer picture of the subsurface of the North-East ofFiguil in northern Cameroon. It is worth noting that this method is most often used when the objective of the study is to image the shallow subsoils by considering them as a set of stratified ground layers. The problem to be solved is very often environmental, and in this case, it is necessary to perform an inversion of the data in order to have a complete and accurate picture of the parameters of the said layers. In the case of this work, thirty-three (33) Schlumberger VES have been carried out on an irregular grid to investigate the subsurface of the study area. The 1D inversion applied as a preliminary modeling tool and in correlation with the mechanical drillings results indicates a complex subsurface lithology distribution mainly consisting of marbles and schists. Moreover, the quasi-3D inversion with lateral constraint shows that the misfit between the observed field data and the model response is quite good and acceptable with a value low than 10%. The method also reveals existence of two water bearing in the considered area. The first is the schist or weathering aquifer (unsuitable), and the other is the marble or the fracturing aquifer (suitable). The final quasi 3D inversion results and geological models indicate proper sites for groundwaters prospecting and for mining exploitation, thus allowing the economic development of the study area.Keywords: electrical resistivity method, 1D inversion, quasi 3D inversion, groundwaters, mining
Procedia PDF Downloads 15729833 Implementation of 4-Bit Direct Charge Transfer Switched Capacitor DAC with Mismatch Shaping Technique
Authors: Anuja Askhedkar, G. H. Agrawal, Madhu Gudgunti
Abstract:
Direct Charge Transfer Switched Capacitor (DCT-SC) DAC is the internal DAC used in Delta-Sigma (∆∑) DAC which works on Over-Sampling concept. The Switched Capacitor DAC mainly suffers from mismatch among capacitors. Mismatch among capacitors in DAC, causes non linearity between output and input. Dynamic Element Matching (DEM) technique is used to match the capacitors. According to element selection logic there are many types. In this paper, Data Weighted Averaging (DWA) technique is used for mismatch shaping. In this paper, the 4 bit DCT-SC-DAC with DWA-DEM technique is implemented using WINSPICE simulation software in 180nm CMOS technology. DNL for DAC with DWA is ±0.03 LSB and INL is ± 0.02LSB.Keywords: ∑-Δ DAC, DCT-SC-DAC, mismatch shaping, DWA, DEM
Procedia PDF Downloads 35129832 The Use of Classifiers in Image Analysis of Oil Wells Profiling Process and the Automatic Identification of Events
Authors: Jaqueline Maria Ribeiro Vieira
Abstract:
Different strategies and tools are available at the oil and gas industry for detecting and analyzing tension and possible fractures in borehole walls. Most of these techniques are based on manual observation of the captured borehole images. While this strategy may be possible and convenient with small images and few data, it may become difficult and suitable to errors when big databases of images must be treated. While the patterns may differ among the image area, depending on many characteristics (drilling strategy, rock components, rock strength, etc.). Previously we developed and proposed a novel strategy capable of detecting patterns at borehole images that may point to regions that have tension and breakout characteristics, based on segmented images. In this work we propose the inclusion of data-mining classification strategies in order to create a knowledge database of the segmented curves. These classifiers allow that, after some time using and manually pointing parts of borehole images that correspond to tension regions and breakout areas, the system will indicate and suggest automatically new candidate regions, with higher accuracy. We suggest the use of different classifiers methods, in order to achieve different knowledge data set configurations.Keywords: image segmentation, oil well visualization, classifiers, data-mining, visual computer
Procedia PDF Downloads 30429831 Value Analysis of Islamic Banking and Conventional Banking to Measure Value Co-Creation
Authors: Amna Javed, Hisashi Masuda, Youji Kohda
Abstract:
This study examines the value analysis in Islamic and conventional banking services in Pakistan. Many scholars have focused on co-creation of values in services but mainly economic values not non-economic. As Islamic banking is based on Islamic principles that are more concerned with non-economic values (well-being, partnership, fairness, trust worthy, and justice) than economic values as money in terms of interest. This study is important to know the providers point of view about the co-created values, because, it may be more sustainable and appropriate for today’s unpredictable socioeconomic environment. Data were collected from 4 banks (2 Islamic and 2 conventional banks). Text mining technique is applied for data analysis, and values with 100% occurrences in Islamic banking are chosen. The results reflect that Islamic banking is more centric towards non-economic values than economic values and it promotes team work and partnership concept by applying Islamic spirit and trust worthiness concept.Keywords: economic values, Islamic banking, non-economic values, value system
Procedia PDF Downloads 46429830 Hybrid Energy System for the German Mining Industry: An Optimized Model
Authors: Kateryna Zharan, Jan C. Bongaerts
Abstract:
In recent years, economic attractiveness of renewable energy (RE) for the mining industry, especially for off-grid mines, and a negative environmental impact of fossil energy are stimulating to use RE for mining needs. Being that remote area mines have higher energy expenses than mines connected to a grid, integration of RE may give a mine economic benefits. Regarding the literature review, there is a lack of business models for adopting of RE at mine. The main aim of this paper is to develop an optimized model of RE integration into the German mining industry (GMI). Hereby, the GMI with amount of around 800 mill. t. annually extracted resources is included in the list of the 15 major mining country in the world. Accordingly, the mining potential of Germany is evaluated in this paper as a perspective market for RE implementation. The GMI has been classified in order to find out the location of resources, quantity and types of the mines, amount of extracted resources, and access of the mines to the energy resources. Additionally, weather conditions have been analyzed in order to figure out where wind and solar generation technologies can be integrated into a mine with the highest efficiency. Despite the fact that the electricity demand of the GMI is almost completely covered by a grid connection, the hybrid energy system (HES) based on a mix of RE and fossil energy is developed due to show environmental and economic benefits. The HES for the GMI consolidates a combination of wind turbine, solar PV, battery and diesel generation. The model has been calculated using the HOMER software. Furthermore, the demonstrated HES contains a forecasting model that predicts solar and wind generation in advance. The main result from the HES such as CO2 emission reduction is estimated in order to make the mining processing more environmental friendly.Keywords: diesel generation, German mining industry, hybrid energy system, hybrid optimization model for electric renewables, optimized model, renewable energy
Procedia PDF Downloads 34629829 Study of the Stability of Underground Mines by Numerical Method: The Mine Chaabet El Hamra, Algeria
Authors: Nakache Radouane, M. Boukelloul, M. Fredj
Abstract:
Method room and pillar sizes are key factors for safe mining and their recovery in open-stop mining. This method is advantageous due to its simplicity and requirement of little information to be used. It is probably the most representative method among the total load approach methods although it also remains a safe design method. Using a finite element software (PLAXIS 3D), analyses were carried out with an elasto-plastic model and comparisons were made with methods based on the total load approach. The results were presented as the optimization for improving the ore recovery rate while maintaining a safe working environment.Keywords: room and pillar, mining, total load approach, elasto-plastic
Procedia PDF Downloads 33029828 Evaluation of the Performance of ACTIFLO® Clarifier in the Treatment of Mining Wastewaters: Case Study of Costerfield Mining Operations, Victoria, Australia
Authors: Seyed Mohsen Samaei, Shirley Gato-Trinidad
Abstract:
A pre-treatment stage prior to reverse osmosis (RO) is very important to ensure the long-term performance of the RO membranes in any wastewater treatment using RO. This study aims to evaluate the application of the Actiflo® clarifier as part of a pre-treatment unit in mining operations. It involves performing analytical testing on RO feed water before and after installation of Actiflo® unit. Water samples prior to RO plant stage were obtained on different dates from Costerfield mining operations in Victoria, Australia. Tests were conducted in an independent laboratory to determine the concentration of various compounds in RO feed water before and after installation of Actiflo® unit during the entire evaluated period from December 2015 to June 2018. Water quality analysis shows that the quality of RO feed water has remarkably improved since installation of Actiflo® clarifier. Suspended solids (SS) and turbidity removal efficiencies has been improved by 91 and 85 percent respectively in pre-treatment system since the installation of Actiflo®. The Actiflo® clarifier proved to be a valuable part of pre-treatment system prior to RO. It has the potential to conveniently condition the mining wastewater prior to RO unit, and reduce the risk of RO physical failure and irreversible fouling. Consequently, reliable and durable operation of RO unit with minimum requirement for RO membrane replacement is expected with Actiflo® in use.Keywords: ACTIFLO ® clarifier, mining wastewater, reverse osmosis, water treatment
Procedia PDF Downloads 19429827 Machine Learning Facing Behavioral Noise Problem in an Imbalanced Data Using One Side Behavioral Noise Reduction: Application to a Fraud Detection
Authors: Salma El Hajjami, Jamal Malki, Alain Bouju, Mohammed Berrada
Abstract:
With the expansion of machine learning and data mining in the context of Big Data analytics, the common problem that affects data is class imbalance. It refers to an imbalanced distribution of instances belonging to each class. This problem is present in many real world applications such as fraud detection, network intrusion detection, medical diagnostics, etc. In these cases, data instances labeled negatively are significantly more numerous than the instances labeled positively. When this difference is too large, the learning system may face difficulty when tackling this problem, since it is initially designed to work in relatively balanced class distribution scenarios. Another important problem, which usually accompanies these imbalanced data, is the overlapping instances between the two classes. It is commonly referred to as noise or overlapping data. In this article, we propose an approach called: One Side Behavioral Noise Reduction (OSBNR). This approach presents a way to deal with the problem of class imbalance in the presence of a high noise level. OSBNR is based on two steps. Firstly, a cluster analysis is applied to groups similar instances from the minority class into several behavior clusters. Secondly, we select and eliminate the instances of the majority class, considered as behavioral noise, which overlap with behavior clusters of the minority class. The results of experiments carried out on a representative public dataset confirm that the proposed approach is efficient for the treatment of class imbalances in the presence of noise.Keywords: machine learning, imbalanced data, data mining, big data
Procedia PDF Downloads 13229826 Voice of Customer: Mining Customers' Reviews on On-Line Car Community
Authors: Kim Dongwon, Yu Songjin
Abstract:
This study identifies the business value of VOC (Voice of Customer) on the business. Precisely, we intend to demonstrate how much negative and positive sentiment of VOC has an influence on car sales market share in the unites states. We extract 7 emotions such as sadness, shame, anger, fear, frustration, delight and satisfaction from the VOC data, 23,204 pieces of opinions, that had been posted on car-related on-line community from 2007 to 2009(a part of data collection from 2007 to 2015), and intend to clarify the correlation between negative and positive sentimental keywords and contribution to market share. In order to develop a lexicon for each category of negative and positive sentiment, we took advantage of Corpus program, Antconc 3.4.1.w and on-line sentimental data, SentiWordNet and identified the part of speech(POS) information of words in the customers' opinion by using a part-of-speech tagging function provided by TextAnalysisOnline. For the purpose of this present study, a total of 45,741 pieces of customers' opinions of 28 car manufacturing companies had been collected including titles and status information. We conducted an experiment to examine whether the inclusion, frequency and intensity of terms with negative and positive emotions in each category affect the adoption of customer opinions for vehicle organizations' market share. In the experiment, we statistically verified that there is correlation between customer ideas containing negative and positive emotions and variation of marker share. Particularly, "Anger," a domain of negative domains, is significantly influential to car sales market share. The domain "Delight" and "Satisfaction" increased in proportion to growth of market share.Keywords: data mining, opinion mining, sentiment analysis, VOC
Procedia PDF Downloads 21529825 A Heart Arrhythmia Prediction Using Machine Learning’s Classification Approach and the Concept of Data Mining
Authors: Roshani S. Golhar, Neerajkumar S. Sathawane, Snehal Dongre
Abstract:
Background and objectives: As the, cardiovascular illnesses increasing and becoming cause of mortality worldwide, killing around lot of people each year. Arrhythmia is a type of cardiac illness characterized by a change in the linearity of the heartbeat. The goal of this study is to develop novel deep learning algorithms for successfully interpreting arrhythmia using a single second segment. Because the ECG signal indicates unique electrical heart activity across time, considerable changes between time intervals are detected. Such variances, as well as the limited number of learning data available for each arrhythmia, make standard learning methods difficult, and so impede its exaggeration. Conclusions: The proposed method was able to outperform several state-of-the-art methods. Also proposed technique is an effective and convenient approach to deep learning for heartbeat interpretation, that could be probably used in real-time healthcare monitoring systemsKeywords: electrocardiogram, ECG classification, neural networks, convolutional neural networks, portable document format
Procedia PDF Downloads 7229824 Research on the Correlation between College Students' Physical Fitness and Running Habits: Data Mining of Smart Phone Sports App
Authors: Mingming Guo, Xiaozan Wang
Abstract:
Introduction: The purpose of this study is to examine the correlation between the physical fitness of Chinese college students and their daily running habits (RH). Methods: A total of 718 college students from East China Normal University participated in this study (385 boys and 333 girls). Each participant participated in the Chinese Students’ Physical Fitness Test during the 2018-2019 school year. In addition, each student is also required to use the app to record all their running results during each run during the 2018-2019 school year. Researchers can query and export all running records through the app's management platform. Results: (1) The total number of kilometers run by the students showed a significant negative correlation with their vital capacity (VC), sitting body flexion (SBF), and long jump (LJ) (rᵥKeywords: college students, physical fitness, running habits, data mining
Procedia PDF Downloads 14129823 Multi-Source Data Fusion for Urban Comprehensive Management
Authors: Bolin Hua
Abstract:
In city governance, various data are involved, including city component data, demographic data, housing data and all kinds of business data. These data reflects different aspects of people, events and activities. Data generated from various systems are different in form and data source are different because they may come from different sectors. In order to reflect one or several facets of an event or rule, data from multiple sources need fusion together. Data from different sources using different ways of collection raised several issues which need to be resolved. Problem of data fusion include data update and synchronization, data exchange and sharing, file parsing and entry, duplicate data and its comparison, resource catalogue construction. Governments adopt statistical analysis, time series analysis, extrapolation, monitoring analysis, value mining, scenario prediction in order to achieve pattern discovery, law verification, root cause analysis and public opinion monitoring. The result of Multi-source data fusion is to form a uniform central database, which includes people data, location data, object data, and institution data, business data and space data. We need to use meta data to be referred to and read when application needs to access, manipulate and display the data. A uniform meta data management ensures effectiveness and consistency of data in the process of data exchange, data modeling, data cleansing, data loading, data storing, data analysis, data search and data delivery.Keywords: multi-source data fusion, urban comprehensive management, information fusion, government data
Procedia PDF Downloads 39429822 Mapping of Arenga Pinnata Tree Using Remote Sensing
Authors: Zulkiflee Abd Latif, Sitinor Atikah Nordin, Alawi Sulaiman
Abstract:
Different tree species possess different and various benefits. Arenga Pinnata tree species own several potential uses that is valuable for the economy and the country. Mapping vegetation using remote sensing technique involves various process, techniques and consideration. Using satellite imagery, this method enables the access of inaccessible area and with the availability of near infra-red band; it is useful in vegetation analysis, especially in identifying tree species. Pixel-based and object-based classification technique is used as a method in this study. Pixel-based classification technique used in this study divided into unsupervised and supervised classification. Object based classification technique becomes more popular another alternative method in classification process. Using spectral, texture, color and other information, to classify the target make object-based classification is a promising technique for classification. Classification of Arenga Pinnata trees is overlaid with elevation, slope and aspect, soil and river data and several other data to give information regarding the tree character and living environment. This paper will present the utilization of remote sensing technique in order to map Arenga Pinnata tree speciesKeywords: Arenga Pinnata, pixel-based classification, object-based classification, remote sensing
Procedia PDF Downloads 38129821 A Fuzzy Kernel K-Medoids Algorithm for Clustering Uncertain Data Objects
Authors: Behnam Tavakkol
Abstract:
Uncertain data mining algorithms use different ways to consider uncertainty in data such as by representing a data object as a sample of points or a probability distribution. Fuzzy methods have long been used for clustering traditional (certain) data objects. They are used to produce non-crisp cluster labels. For uncertain data, however, besides some uncertain fuzzy k-medoids algorithms, not many other fuzzy clustering methods have been developed. In this work, we develop a fuzzy kernel k-medoids algorithm for clustering uncertain data objects. The developed fuzzy kernel k-medoids algorithm is superior to existing fuzzy k-medoids algorithms in clustering data sets with non-linearly separable clusters.Keywords: clustering algorithm, fuzzy methods, kernel k-medoids, uncertain data
Procedia PDF Downloads 21629820 Determination of the Bank's Customer Risk Profile: Data Mining Applications
Authors: Taner Ersoz, Filiz Ersoz, Seyma Ozbilge
Abstract:
In this study, the clients who applied to a bank branch for loan were analyzed through data mining. The study was composed of the information such as amounts of loans received by personal and SME clients working with the bank branch, installment numbers, number of delays in loan installments, payments available in other banks and number of banks to which they are in debt between 2010 and 2013. The client risk profile was examined through Classification and Regression Tree (CART) analysis, one of the decision tree classification methods. At the end of the study, 5 different types of customers have been determined on the decision tree. The classification of these types of customers has been created with the rating of those posing a risk for the bank branch and the customers have been classified according to the risk ratings.Keywords: client classification, loan suitability, risk rating, CART analysis
Procedia PDF Downloads 33829819 Sustainable and Responsible Mining - Lundin Mining’s Subsidiary in Portugal, Sociedade Mineira de Neves-Corvo Case
Authors: Jose Daniel Braga Alves, Joaquim Gois, Alexandre Leite
Abstract:
This abstract presents the responsible and sustainable mining case study of a Portuguese mine operation, highlighting how mine exploitation can sustainably exist in balance with the environment, aligned with all stakeholders. The mining operation is remotely located in a United Nations (UN) biodiversity reserve, away from major industrial centers or logistical ports, and presents an interesting investigation to assess the balanced mine operation in alignment with all key stakeholders, which presents unique opportunities as well as challenges. Based on the sustainable mining framework, it is intended to detail examples of best practices from Sociedade Mineira de Neves-Corvo (SOMINCOR), demonstrating social acceptance by the local community, health, and safety at work, reduction of environmental impacts and management of mining waste, which directly influence the acceptance and recognition of a sustainable operation. The case study aims to present the SOMINCOR approach to sustainable mining, focusing on social responsibility, considering materials provided by Lundin Mining Corporation (LMC) and SOMINCOR and the socially responsible approach of the mining operations., referencing related international guidelines, UN Sustainable Development Goals. The researchers reviewed LMC's annual Sustainability Reports (2019, 2020 and 2021) and updated information regarding material topics of the most significant interest to internal and external stakeholders. These material topics formed the basis of the corporation-wide sustainability strategy. LMC's Responsible Mining Policy (RMP) was reviewed, focusing on the commitment that guides the approach to responsible operation and management of the Company's business. Social performance, compliance, environmental management, governance, human rights, and economic contribution are principles of the RMP. The Human Rights Risk Impact Assessment (HRRIA), based on frameworks including UN Guiding Principles (UNGP), Voluntary Principles on Security and Human Rights, and a community engagement program implemented (SLO index), was part of this research. The program consists of ongoing surveys and perceptions studies using behavioural science insights, data from which was not available within the timeframe of completing this research. LMC stakeholder engagement standards and grievance mechanisms were also reviewed. Stakeholder engagement and the community's perception are key to this operation to ensure social license to operate (SLO). Preliminary surveys with local communities provided input data for the local development strategy. After the implementation of several initiatives, subsequent surveys were performed to assess acceptance and trust from the local communities and changes to the SLO index. SOMINCOR's operation contributes to 12 out of 17 sustainable development goals. From the assessed and available data, local communities and social engagement are priorities to SOMINCOR. Experience to date shows that the continual engagement with local communities and the grievance mechanisms in place are respected and followed for all concerns presented by any stakeholder. It can be concluded that this underground mine in Portugal complies with applicable regulations and goes beyond them with regard to sustainable development and engagement with key stakeholders.Keywords: sustainable mining, development goals, portuguese mining, zinc copper
Procedia PDF Downloads 7729818 Grid and Market Integration of Large Scale Wind Farms using Advanced Predictive Data Mining Techniques
Authors: Umit Cali
Abstract:
The integration of intermittent energy sources like wind farms into the electricity grid has become an important challenge for the utilization and control of electric power systems, because of the fluctuating behaviour of wind power generation. Wind power predictions improve the economic and technical integration of large amounts of wind energy into the existing electricity grid. Trading, balancing, grid operation, controllability and safety issues increase the importance of predicting power output from wind power operators. Therefore, wind power forecasting systems have to be integrated into the monitoring and control systems of the transmission system operator (TSO) and wind farm operators/traders. The wind forecasts are relatively precise for the time period of only a few hours, and, therefore, relevant with regard to Spot and Intraday markets. In this work predictive data mining techniques are applied to identify a statistical and neural network model or set of models that can be used to predict wind power output of large onshore and offshore wind farms. These advanced data analytic methods helps us to amalgamate the information in very large meteorological, oceanographic and SCADA data sets into useful information and manageable systems. Accurate wind power forecasts are beneficial for wind plant operators, utility operators, and utility customers. An accurate forecast allows grid operators to schedule economically efficient generation to meet the demand of electrical customers. This study is also dedicated to an in-depth consideration of issues such as the comparison of day ahead and the short-term wind power forecasting results, determination of the accuracy of the wind power prediction and the evaluation of the energy economic and technical benefits of wind power forecasting.Keywords: renewable energy sources, wind power, forecasting, data mining, big data, artificial intelligence, energy economics, power trading, power grids
Procedia PDF Downloads 51929817 Analysis Mechanized Boring (TBM) of Tehran Subway Line 7
Authors: Shahin Shabani, Pouya Pourmadadi
Abstract:
Tunnel boring machines (TBMs) have been used for the construction of various tunnels for mining projects for the purpose of access, conveyance of ore and waste, drainage, exploration, water supply and water diversion. Several mining projects have seen the successful and economic beneficial use of TBMs, and there is an increasing awareness of the benefits of TBMs for mining projects. Key technical considerations for the use of TBMs for the construction of tunnels for mining projects include geological issues (rock type, rock alteration, rock strength, rock abrasivity, durability, ground water inflows), depth of cover and the potential for overstressing/rockbursts, site access and terrain, portal locations, TBM constraints, minimum tunnel size, tunnel support requirements, contractor and labor experience, and project schedule demands. This study focuses on tunnelling mining, with the goal to develop methods and tools to be used to gain understanding of these processes, and to analyze metro of Tehran. The Metro Line 7 of Tehran is one of the Longest (26 Km) and deepest (27m) of projects that’s under implementation. Because of major differences like passing under all geotechnical layers of the town and encountering part of it with underground water table and also using mechanized excavation system, is one of special metro projects.Keywords: TBM, tunnel boring machines economic, metro, line 7
Procedia PDF Downloads 38429816 Parallel Genetic Algorithms Clustering for Handling Recruitment Problem
Authors: Walid Moudani, Ahmad Shahin
Abstract:
This research presents a study to handle the recruitment services system. It aims to enhance a business intelligence system by embedding data mining in its core engine and to facilitate the link between job searchers and recruiters companies. The purpose of this study is to present an intelligent management system for supporting recruitment services based on data mining methods. It consists to apply segmentation on the extracted job postings offered by the different recruiters. The details of the job postings are associated to a set of relevant features that are extracted from the web and which are based on critical criterion in order to define consistent clusters. Thereafter, we assign the job searchers to the best cluster while providing a ranking according to the job postings of the selected cluster. The performance of the proposed model used is analyzed, based on a real case study, with the clustered job postings dataset and classified job searchers dataset by using some metrics.Keywords: job postings, job searchers, clustering, genetic algorithms, business intelligence
Procedia PDF Downloads 329