Search results for: allele mining
849 A Comparative Analysis of Classification Models with Wrapper-Based Feature Selection for Predicting Student Academic Performance
Authors: Abdullah Al Farwan, Ya Zhang
Abstract:
In today’s educational arena, it is critical to understand educational data and to be able to evaluate its important aspects, particularly data on student achievement. Educational Data Mining (EDM) is a research area that focuses on uncovering patterns and information in data from educational institutions. Teachers who are able to predict their students' class performance can use this information to improve their teaching. Such predictions have evolved into valuable knowledge that can be used for a wide range of objectives; for example, they can inform a strategic plan for delivering high-quality education. Based on previous data, this paper recommends employing data mining techniques to forecast students' final grades. In this study, five data mining methods, Decision Tree, JRip, Naïve Bayes, Multi-layer Perceptron, and Random Forest with wrapper feature selection, were used on two datasets relating to Portuguese language and mathematics classes. The results showed the effectiveness of data mining methodologies in predicting student academic success. The classification accuracy achieved with the selected algorithms lies in the range of 80-94%. Among all the selected classification algorithms, the lowest accuracy is achieved by the Multi-layer Perceptron algorithm, which is close to 70.45%, and the highest accuracy is achieved by the Random Forest algorithm, which is close to 94.10%. The proposed work can help educational administrators identify poorly performing students at an early stage and implement motivational interventions to improve their academic success and prevent dropout.
Keywords: classification algorithms, decision tree, feature selection, multi-layer perceptron, Naïve Bayes, random forest, students’ academic performance
Procedia PDF Downloads 166
848 Association of Social Data as a Tool to Support Government Decision Making
Authors: Diego Rodrigues, Marcelo Lisboa, Elismar Batista, Marcos Dias
Abstract:
Based on data on child labor, this work raises questions about how to understand and locate the factors that make up child labor rates, and which properties are important for analyzing these cases. Data mining techniques were used to discover valid patterns in Brazilian social databases, evaluating child labor data from the State of Tocantins (located in the north of Brazil, with a territory of 277,000 km² that comprises 139 counties). This work aims to detect the factors that determine the practice of child labor and their relationships with financial, educational, regional and social indicators, generating information that is not explicit in the government database and thus enabling better monitoring and updating of policies for this purpose.
Keywords: social data, government decision making, association of social data, data mining
Procedia PDF Downloads 369
847 Multiscale Connected Component Labelling and Applications to Scientific Microscopy Image Processing
Authors: Yayun Hsu, Henry Horng-Shing Lu
Abstract:
In this paper, a new method is proposed to extend connected component labeling from processing binary images to multi-scale modeling of images. By using an adaptive threshold on multi-scale attributes, this approach minimizes the possibility of missing important components with weak intensities. In addition, the computational cost of this approach remains similar to that of the typical approach to component labeling. The methodology is then applied to grain boundary detection and Drosophila Brain-bow neuron segmentation. These applications demonstrate the feasibility of the proposed approach in the analysis of challenging microscopy images for scientific discovery.
Keywords: microscopic image processing, scientific data mining, multi-scale modeling, data mining
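The idea of labeling components at more than one threshold can be sketched as follows. This is only a minimal illustration, not the authors' adaptive multi-scale method: the tiny `image` grid, the fixed thresholds, and the function names are all assumptions made for the example. It shows how a component with weak intensity is missed at a high threshold but recovered at a lower one.

```python
from collections import deque


def label_components(mask):
    """Label 4-connected regions of a binary grid; return (labels, count)."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and labels[r][c] == 0:
                # Start a new component and flood-fill it breadth-first.
                count += 1
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:
                    y, x = queue.popleft()
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        inside = 0 <= ny < rows and 0 <= nx < cols
                        if inside and mask[ny][nx] and labels[ny][nx] == 0:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


def components_per_threshold(image, thresholds):
    """Count components found when the image is binarized at each threshold."""
    result = {}
    for t in thresholds:
        mask = [[pixel >= t for pixel in row] for row in image]
        _, n = label_components(mask)
        result[t] = n
    return result


# Hypothetical intensities: one bright blob (9) and one faint blob (3).
image = [
    [9, 9, 0, 0, 0],
    [9, 9, 0, 3, 3],
    [0, 0, 0, 3, 3],
]
counts = components_per_threshold(image, [2, 5])
print(counts)
```

With the sample grid, the faint blob is only counted at the lower threshold, which is the failure mode a single global threshold would hide.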
Procedia PDF Downloads 435
846 Exploring Twitter Data on Human Rights Activism on Olympics Stage through Social Network Analysis and Mining
Authors: Teklu Urgessa, Joong Seek Lee
Abstract:
Social media is becoming the primary choice of activists for making their voices heard. Two main reasons account for this. The first is the emergence of Web 2.0, which gave users the opportunity to become content creators rather than passive recipients. The second is the control of mainstream mass media outlets by governments and by individuals with political and economic interests. This paper aims to explore Twitter data from network actors talking about the marathon silver medalist at Rio 2016, who showed solidarity with the Oromo protesters in Ethiopia at the finish line of the marathon race when he won silver. The aim is to discover important insights using social network analysis and mining. The hashtag #FeyisaLelisa was used for the Twitter network search. The actors’ network was visualized and analyzed. It showed that the central influencers during the first 10 days of August were international media outlets, while in September they shifted to individual activists. The degree distribution of the network is scale-free: the frequency of degrees decays according to a power law. Text mining was also used to arrive at meaningful themes from the tweet corpus about the event selected for analysis. The semantic network indicated important clusters of concepts (15) that provided different insights regarding the why, who, where, and how of the situation related to the event. The sentiments of the words in the tweets were also analyzed and indicated that 95% of the opinions in the tweets were either positive or neutral. Overall, the findings showed that the marathoner's Olympic stage protest brought the issue of the Oromo protest to the global stage. A new research framework for event-based social network analysis and mining is proposed, based on the practical procedures followed in this research for event-based social media sense making.
Keywords: human rights, Olympics, social media, network analysis, social network mining
Procedia PDF Downloads 257
845 The Reduction of Post-Blast Fumes to Improve Productivity and Safety: A Review Paper
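The degree distribution mentioned in this abstract starts from a simple tally: count how many connections each actor has, then count how many actors share each degree. The sketch below assumes an undirected edge list with made-up account names; it does not use the study's data and does not fit a power law, it only produces the histogram such a fit would start from.

```python
from collections import Counter


def degree_distribution(edges):
    """Return a Counter mapping degree -> number of nodes with that degree."""
    degree = Counter()
    for a, b in edges:
        # Each undirected edge adds one to the degree of both endpoints.
        degree[a] += 1
        degree[b] += 1
    return Counter(degree.values())


# Hypothetical mention network: one hub account and a few peripheral ones.
edges = [("hub", "a"), ("hub", "b"), ("hub", "c"), ("hub", "d"), ("a", "b")]
dist = degree_distribution(edges)
for k in sorted(dist):
    print(f"degree {k}: {dist[k]} node(s)")
```

In a scale-free network, plotting these counts against degree on log-log axes gives an approximately straight line, with many low-degree actors and a few highly connected hubs.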
Authors: Nhleko Monique Chiloane
Abstract:
The gold mining industry has predominantly used ammonium nitrate fuel oil (ANFO) explosives for decades, although these are known to be “gassier” and their detonation results in toxic fumes, for example, carbon monoxide (CO), nitrogen oxides (NOx) and ammonia. Re-entry into underground workings too soon after blasting can lead to fatal exposure to toxic fumes. It is, therefore, required that the polluted air be removed from the affected areas within a reasonable period before employees' re-entry into the working area. Post-blast re-entry times have therefore been described as a productivity bottleneck. The known causes of post-blast fumes are water ingress, incorrect fuel to oxygen ratio, confinement, explosive additives, etc. To prevent or minimize post-blast fumes, some researchers have used neutralization, re-burning techniques and non-explosive products or different oxidizing agents. The use of commercial explosives without nitrate oxidizing agents can also minimize the production of blasting fumes and thereby reduce the time needed for the clearance of these fumes to allow workers to re-enter the underground workings safely. The reduction in non-production time directly contributes to an increase in the available time per shift for productive work, thus leading to continuous mining. However, owing to its low cost and ease of use, ANFO is still widely used in South African underground blasting operations.
Keywords: post-blast fumes, continuous mining, ammonium nitrate explosive, non-explosive blasting, re-entry period
Procedia PDF Downloads 183
844 Comparative Analysis of Classification Methods in Determining Non-Active Student Characteristics in Indonesia Open University
Authors: Dewi Juliah Ratnaningsih, Imas Sukaesih Sitanggang
Abstract:
Classification is a data mining technique that aims to discover, from training data, a model that assigns records to the appropriate category or class. Data mining classification methods can be applied in education, for example, to classify non-active students at Indonesia Open University. This paper presents a comparison of three classification methods: Naïve Bayes, Bagging, and C4.5. The criteria used to evaluate the performance of the three methods are stratified cross-validation, the confusion matrix, the area under the ROC curve (AUC), Recall, Precision, and F-measure. The data used for this paper are from non-active Indonesia Open University students in the registration periods 2004.1 to 2012.2. The target analysis requires non-active students to be divided into 3 groups: C1, C2, and C3. Data from 4173 students were analyzed. The results of the study show that: (1) the Bagging method gave a higher degree of classification accuracy than Naïve Bayes and C4.5; (2) the Bagging classification accuracy rate is 82.99%, while those of Naïve Bayes and C4.5 are 80.04% and 82.74%, respectively; (3) the Bagging classification tree has a large number of nodes, which makes it quite difficult to use in decision making; (4) the characteristics of non-active Indonesia Open University students are classified using the C4.5 algorithm; (5) based on the C4.5 algorithm, there are 5 interesting rules which can describe the characteristics of non-active Indonesia Open University students.
Keywords: comparative analysis, data mining, classification, Bagging, Naïve Bayes, C4.5, non-active students, Indonesia Open University
Procedia PDF Downloads 316
843 Implementation of an IoT Sensor Data Collection and Analysis Library
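Precision, Recall, and F-measure, three of the evaluation criteria listed above, can all be read off the confusion matrix for a given class. The following is a minimal sketch with made-up labels; the class names C1 to C3 are borrowed from the abstract, but the predictions are purely illustrative and not taken from the study.

```python
def per_class_metrics(actual, predicted, positive):
    """Compute precision, recall and F-measure for one class label."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall:
        # F-measure is the harmonic mean of precision and recall.
        f_measure = 2 * precision * recall / (precision + recall)
    else:
        f_measure = 0.0
    return precision, recall, f_measure


# Hypothetical true and predicted groups for six students.
actual = ["C1", "C1", "C2", "C2", "C3", "C3"]
predicted = ["C1", "C2", "C2", "C2", "C3", "C1"]
precision, recall, f_measure = per_class_metrics(actual, predicted, "C1")
print(precision, recall, f_measure)
```

For a three-class problem such as this one, the function would be called once per class and the results averaged or reported separately.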
Authors: Jihyun Song, Kyeongjoo Kim, Minsoo Lee
Abstract:
Due to the development of information technology and wireless Internet technology, a wide variety of data is being generated in many fields. These data are advantageous in that they provide real-time information to the users themselves. However, when the data are accumulated and analyzed, a greater variety of information can be extracted. In addition, the development and spread of boards such as Arduino and Raspberry Pi have made it easy to test various sensors, and sensor data can be collected directly by using database tools such as MySQL. These directly collected data can be used for various research and can be useful as data for data mining. However, collecting data with such boards involves many difficulties, particularly when the user is not a computer programmer or is using the board for the first time. Even if data are collected, a lack of expert knowledge or experience may cause difficulties in data analysis and visualization. In this paper, we aim to construct a library for sensor data collection and analysis to overcome these problems.
Keywords: clustering, data mining, DBSCAN, k-means, k-medoids, sensor data
Procedia PDF Downloads 378
842 Environmental Impact Assessment in Mining Regions with Remote Sensing
Authors: Carla Palencia-Aguilar
Abstract:
Calculations of Net Carbon Balance can be obtained by means of Net Biome Productivity (NBP), Net Ecosystem Productivity (NEP), and Net Primary Production (NPP). The latter is an important component of the biosphere carbon cycle and is easily obtained from MODIS MOD17A3HGF data; however, the results are only available yearly. To overcome this limitation in data availability, bands 33 to 36 from MODIS MYD021KM (obtained on a daily basis) were analyzed and compared with NPP data from the years 2000 to 2021 at 7 sites where surface mining takes place in the Colombian territory. Coal, gold, iron, and limestone were the minerals of interest. Scales and units, as well as thermal anomalies, were considered for the net carbon balance per location. The NPP time series from the satellite images were filtered by using two Matlab filters: First Order and Discrete Transfer. After filtering the NPP time series, comparing the graph results with the satellite image values, and running a linear regression, the results showed R² values from 0.72 to 0.85. To establish comparable units between NPP and bands 33 to 36, the Greenhouse Gas Equivalencies Calculator by EPA was used. The comparison was established in two ways: one by the sum of all the data per point per year, and the other by the average of 46 weeks and finding the percentage that the value represented with respect to NPP. The former underestimated the total CO2 emissions. The results also showed that coal and gold mining in the last 22 years had lower CO2 emissions than limestone, with a yearly average of 143 kton CO2 eq for gold, 152 kton CO2 eq for coal, and 287 kton CO2 eq for iron. Limestone emissions varied from 206 to 441 kton CO2 eq. The maximum emission values from unfiltered data correspond to 165 kton CO2 eq for gold, 188 kton CO2 eq for coal, and 310 kton CO2 eq for iron, with limestone varying from 231 to 490 kton CO2 eq.
If the most polluting limestone site improves its production technology, limestone would have a maximum of 318 kton CO2 eq emissions per year, a value very similar to that of iron. Gathering such data is important for establishing benchmarks toward the 2050 zero-emissions goal.
Keywords: carbon dioxide, NPP, MODIS, mining
Procedia PDF Downloads 104
841 Improvement of Microstructure, Wear and Mechanical Properties of Modified G38NiCrMo8-4-4 Steel Used in Mining Industry
Authors: Mustafa Col, Funda Gul Koc, Merve Yangaz, Eylem Subasi, Can Akbasoglu
Abstract:
G38NiCrMo8-4-4 steel is widely used in the mining industry, machine parts, and gears owing to its high strength and toughness. In this study, the microstructure, wear and mechanical properties of boron-modified G38NiCrMo8-4-4 steel used in the mining industry were investigated. For this purpose, cast materials were alloyed by melting in an induction furnace to include boron at rates of 0 ppm, 15 ppm, and 50 ppm (wt.) and were formed to dimensions of 150x200x150 mm by casting into a sand mould. Homogenization heat treatment was applied to the specimens at 1150°C for 7 hours. All specimens were then austenitized at 930°C for 1 hour, quenched in a polymer solution and tempered at 650°C for 1 hour. The microstructures of the specimens were investigated using a light microscope and SEM to determine the effect of boron and heat treatment conditions. The microstructure investigations and hardness tests showed changes in microstructure and material hardness with increasing boron content and with heat treatment conditions. Wear tests were carried out using a pin-on-disc tribometer under dry sliding conditions. The Charpy V-notch impact test was performed to determine the toughness of the specimens. Fracture and worn surfaces were investigated with a scanning electron microscope (SEM). The results show that boron has a positive effect on the hardness and wear properties of G38NiCrMo8-4-4 steel.
Keywords: G38NiCrMo8-4-4 steel, boron, heat treatment, microstructure, wear, mechanical properties
Procedia PDF Downloads 195
840 Impact of Coal Mining on River Sediment Quality in the Sydney Basin, Australia
Authors: A. Ali, V. Strezov, P. Davies, I. Wright, T. Kan
Abstract:
The environmental impacts arising from mining activities affect air, water, and soil quality. These impacts may result in unexpected and adverse environmental outcomes. This study reports on the impact of coal production on sediment in the Sydney region of Australia. Sediment samples were taken upstream and downstream from the discharge points of three mines, and 80 parameters were tested. The results were assessed against sediment quality guidelines based on the presence of metals. The study revealed increased metal content in the sediment downstream of the reference locations. In many cases, the sediment was above the Australian and New Zealand Environment and Conservation Council and international sediment quality guidelines value (SQGV). The major outliers to the guidelines were nickel (Ni) and zinc (Zn).
Keywords: coal mine, environmental impact, produced water, sediment quality guidelines value (SQGV)
Procedia PDF Downloads 304
839 Road Traffic Accidents Analysis in Mexico City through Crowdsourcing Data and Data Mining Techniques
Authors: Gabriela V. Angeles Perez, Jose Castillejos Lopez, Araceli L. Reyes Cabello, Emilio Bravo Grajales, Adriana Perez Espinosa, Jose L. Quiroz Fabian
Abstract:
Road traffic accidents are among the principal causes of traffic congestion, causing human losses, damage to health and the environment, economic losses and material damage. Traditional studies of road traffic accidents in urban zones require a very high investment of time and money; additionally, their results are not current. Nowadays, however, crowdsourced GPS-based traffic and navigation apps have emerged in many countries as an important low-cost source of information for studies of road traffic accidents and the urban congestion they cause. In this article, we identified the zones, roads and specific times in CDMX in which the largest number of road traffic accidents was concentrated during 2016. We built a database compiling information obtained from the social network known as Waze. The methodology employed was Knowledge Discovery in Databases (KDD), used to discover patterns in the accident reports. Furthermore, data mining techniques were applied with the help of Weka. The selected algorithms were Expectation-Maximization (EM), to obtain the ideal number of clusters for the data, and k-means as the grouping method. Finally, the results were visualized with the Geographic Information System QGIS.
Keywords: data mining, k-means, road traffic accidents, Waze, Weka
Procedia PDF Downloads 418
838 The Impact of Mining Activities on the Surface Water Quality: A Case Study of the Kaap River in Barberton, Mpumalanga
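The k-means grouping step named in this abstract alternates between assigning each point to its nearest centroid and moving each centroid to the mean of its points. The sketch below is a plain-Python illustration under stated assumptions: the coordinates are invented, the number of clusters is fixed by hand (the study used EM in Weka to choose it), and the first `k` points serve as deterministic starting centroids.

```python
def kmeans(points, k, iterations=20):
    """Cluster 2-D points with k-means; return (centroids, assignment)."""
    # Assumption for reproducibility: start from the first k points.
    centroids = [points[i] for i in range(k)]
    assignment = []
    for _ in range(iterations):
        # Assignment step: index of the nearest centroid for every point.
        assignment = [
            min(
                range(k),
                key=lambda j: (p[0] - centroids[j][0]) ** 2
                + (p[1] - centroids[j][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = (
                    sum(x for x, _ in members) / len(members),
                    sum(y for _, y in members) / len(members),
                )
    return centroids, assignment


# Hypothetical accident locations forming two well-separated groups.
points = [(0.0, 0.0), (10.0, 10.0), (0.0, 1.0), (1.0, 0.0), (10.0, 11.0), (11.0, 10.0)]
centroids, assignment = kmeans(points, k=2)
print(assignment)
print(centroids)
```

Real accident reports would use geographic coordinates and possibly time of day as features, and the result depends on the starting centroids, which is why tools such as Weka expose a random seed.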
Authors: M. F. Mamabolo
Abstract:
Mining activities are identified as the most significant source of heavy metal contamination in river basins, because inadequate disposal of mining waste results in acid mine drainage. Waste materials generated from gold mining and processing have severe and widespread impacts on water resources. Therefore, a total of 30 water samples were collected from Fig Tree Creek, the Kaap River, the Sheba mine stream and the Suid Kaap River to investigate the impact of gold mines on the Kaap River system. Physicochemical parameters (pH, EC and TDS) were measured using a BANTE 900P portable water quality meter. The concentrations of Fe, Cu, Co, and SO₄²⁻ in the water samples were analysed using Inductively Coupled Plasma Mass Spectrometry (ICP-MS) at 0.01 mg/L. The results were compared to the regulatory guidelines of the World Health Organization (WHO) and the South African National Standards (SANS). It was found that Fe, Cu and Co were below the guideline values, while SO₄²⁻ detected in the Sheba mine stream exceeded the 250 mg/L limit in both seasons, which is attributed to mine wastewater. SO₄²⁻ was higher in the wet season due to high evaporation rates and greater interaction between rocks and water. The pH of all the streams was within the limit (≥5 to ≤9.7); however, the EC of the Sheba mine stream, the Suid Kaap River and the point where the tributary connects with Fig Tree Creek exceeded 1700 uS/m, due to dissolved material. The TDS of the Sheba mine stream exceeded 1000 mg/L, which is attributed to the high SO₄²⁻ concentration, while the tributary connecting to Fig Tree Creek exceeded the value due to pollution from household waste, agricultural runoff, etc. In conclusion, the water from all sampled streams was safe for consumption due to low concentrations of physicochemical parameters. However, the elevated concentration of SO₄²⁻ should be monitored and managed to avoid water quality deterioration in the Kaap River system.
Keywords: Kaap river system, mines, heavy metals, sulphate
Procedia PDF Downloads 81
837 Statistical Analysis to Select Evacuation Route
Authors: Zaky Musyarof, Dwi Yono Sutarto, Dwima Rindy Atika, R. B. Fajriya Hakim
Abstract:
Each country should be responsible for the safety of its people, especially those living in disaster-prone areas. One of the services this entails is providing evacuation routes for them. So far, however, the selection of evacuation routes does not seem to be well organized: when a disaster happens, large numbers of people accumulate along the evacuation route. That condition is dangerous because it hampers the evacuation process. Using several methods of statistical analysis, the authors suggest how to prepare an evacuation route that is organized and based on people's habits. Those methods are association rules, sequential pattern mining, hierarchical cluster analysis and fuzzy logic.
Keywords: association rules, sequential pattern mining, cluster analysis, fuzzy logic, evacuation route
Procedia PDF Downloads 504
836 AniMoveMineR: Animal Behavior Exploratory Analysis Using Association Rules Mining
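Association rules, the first method listed above, are usually ranked by two measures: support (how often an itemset appears) and confidence (how often the consequent appears when the antecedent does). The sketch below computes both for a toy set of "habit" records. The records and item names are invented for illustration and are not taken from the study.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(1 for t in transactions if items <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) from the transactions."""
    s_antecedent = support(transactions, antecedent)
    if s_antecedent == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / s_antecedent


# Hypothetical records: where a person usually is, and the road they use.
transactions = [
    {"market", "main_road"},
    {"market", "main_road"},
    {"market", "side_road"},
    {"home", "side_road"},
]
rule_support = support(transactions, {"market", "main_road"})
rule_confidence = confidence(transactions, {"market"}, {"main_road"})
print(rule_support, rule_confidence)
```

A rule such as "market implies main_road" with high support and confidence would suggest where crowding is likely, which is the kind of habit-based evidence the abstract proposes to use.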
Authors: Suelane Garcia Fontes, Silvio Luiz Stanzani, Pedro L. Pizzigatti Corrêa, Ronaldo G. Morato
Abstract:
Environmental changes and major natural disasters are becoming more prevalent in the world due to the damage that humanity has caused to nature, and this damage directly affects the lives of animals. Thus, the study of animal behavior and animals' interactions with the environment can provide knowledge that guides researchers and public agencies in preservation and conservation actions. Exploratory analysis of animal movement can determine patterns of animal behavior, and with technological advances, the ability to track animals and, consequently, behavioral studies have expanded. There is a lot of research on animal movement and behavior, but we note the lack of a proposal that combines resources, allows for exploratory analysis of animal movement, and provides statistical measures of individual animal behavior and its interaction with the environment. The contribution of this paper is to present the AniMoveMineR framework, a unified solution that aggregates trajectory analysis and data mining techniques to explore animal movement data and provide a first step in answering questions about individual animal behavior and animals' interactions with other animals over time and space. We evaluated the framework using data from monitored jaguars in the city of Miranda, Pantanal, Brazil, in order to verify whether AniMoveMineR makes it possible to identify the level of interaction between these jaguars. The results were positive and provided indications about the individual behavior of the jaguars and about which jaguars have the highest or lowest correlation.
Keywords: data mining, data science, trajectory, animal behavior
Procedia PDF Downloads 144
835 Exploration of RFID in Healthcare: A Data Mining Approach
Authors: Shilpa Balan
Abstract:
Radio Frequency Identification, popularly known as RFID, is used to automatically identify and track tags attached to items. This study focuses on the application of RFID in healthcare. RFID is a crucial technology for patient safety and inventory management in healthcare. Data from RFID tags are used to identify the locations of patients and inventory in real time. Medical errors are thought to be a prominent cause of loss of life and injury. The major advantage of applying RFID in the healthcare industry is the reduction of medical errors. The healthcare industry has generated huge amounts of data. By discovering patterns and trends within the data, big data analytics can help improve patient care and lower healthcare costs. The increasing number of research publications leading to innovations in RFID applications shows the importance of this technology. This study explores the current state of research on RFID in healthcare using a text mining approach. No study has yet examined the current state of RFID research in healthcare using a data mining approach. In this study, related articles on RFID were collected from healthcare journals and news articles. The articles collected were from the years 2000 to 2015. Significant keywords on the topic of focus are identified and analyzed using open source data analytics software such as RapidMiner. These analytical tools help extract pertinent information from massive volumes of data. The main benefits of adopting RFID technology in healthcare are seen to include tracking medicines and equipment, upholding patient safety, and improving security. The real-time tracking features of RFID allow for enhanced supply chain management. By using big data productively, healthcare organizations can gain significant benefits. Big data analytics in healthcare enables improved decisions by extracting insights from large volumes of data.
Keywords: RFID, data mining, data analysis, healthcare
Procedia PDF Downloads 233
834 Virtual Dimension Analysis of Hyperspectral Imaging to Characterize a Mining Sample
Authors: L. Chevez, A. Apaza, J. Rodriguez, R. Puga, H. Loro, Juan Z. Davalos
Abstract:
The Virtual Dimension (VD) procedure is used to analyze Hyperspectral Image (HSI) treatment data in order to estimate the abundance of the mineral components of a mining sample. Hyperspectral images derived from reflectance spectra (NIR region) are pre-treated using the Standard Normal Variance (SNV) and Minimum Noise Fraction (MNF) methodologies. The endmember components are identified by the Simplex Growing Algorithm (SGA) and then fitted to the reflectance spectra of reference databases using the Simulated Annealing (SA) methodology. The mineral abundances obtained for the sample studied are very close to those obtained using XRD, with a total relative error of 2%.
Keywords: hyperspectral imaging, minimum noise fraction, MNF, simplex growing algorithm, SGA, standard normal variance, SNV, virtual dimension, XRD
Procedia PDF Downloads 158
833 Seasonal Variation of the Impact of Mining Activities on Ga-Selati River in Limpopo Province, South Africa
Authors: Joshua N. Edokpayi, John O. Odiyo, Patience P. Shikwambana
Abstract:
Water is a very scarce natural resource in South Africa. The Ga-Selati River is used for both domestic and industrial purposes. This study was carried out in order to assess the quality of the Ga-Selati River in a mining area of Phalaborwa, Limpopo Province. The pH, Electrical Conductivity (EC) and Total Dissolved Solids (TDS) were determined using a Crinson multimeter, while turbidity was measured using a Labcon Turbidimeter. The concentrations of Al, Ca, Cd, Cr, Fe, K, Mg, Mn, Na and Pb were analysed in triplicate using a Varian 520 flame atomic absorption spectrometer (AAS) supplied by PerkinElmer, after acid digestion with nitric acid in a fume cupboard. The average pH of the river from eight different sampling sites was 8.00 in the wet season and 9.38 in the dry season. Higher EC values were determined in the dry season (138.7 mS/m) than in the wet season (96.93 mS/m). Similarly, TDS values were higher in the dry season (929.29 mg/L) than in the wet season (640.72 mg/L). These values exceeded the recommended guideline of the South African Department of Water Affairs and Forestry (DWAF) for domestic water use (70 mS/m) and that of the World Health Organization (WHO) (600 mS/m), respectively. Turbidity varied between 1.78-5.20 NTU in the wet season and 0.95-2.37 NTU in the dry season. Total hardness, expressed as the concentration of CaCO3, was computed as 312.50 mg/L in the wet season and 297.75 mg/L in the dry season, and the river water was categorised as very hard. The mean concentrations of the metals studied in the wet and the dry seasons, respectively, are: Na (94.06 mg/L and 196.3 mg/L), K (11.79 mg/L and 13.62 mg/L), Ca (45.60 mg/L and 41.30 mg/L), Mg (48.41 mg/L and 44.71 mg/L), Al (0.31 mg/L and 0.38 mg/L), Cd (0.01 mg/L and 0.01 mg/L), Cr (0.02 mg/L and 0.09 mg/L), Pb (0.05 mg/L and 0.06 mg/L), Mn (0.31 mg/L and 0.11 mg/L) and Fe (0.76 mg/L and 0.69 mg/L).
The results of this study reveal that most of the metals were present in concentrations higher than the recommended guidelines of DWAF and WHO for domestic use and the protection of aquatic life.
Keywords: contamination, mining activities, surface water, trace metals
Procedia PDF Downloads 318
832 Design of Personal Job Recommendation Framework on Smartphone Platform
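Total hardness expressed as CaCO3 is commonly estimated from the calcium and magnesium concentrations. The abstract does not state which formula or rounding was used, so the conversion factors below (2.497 for Ca and 4.118 for Mg, derived from molar mass ratios) are an assumption, and the function name is hypothetical. The sketch applies them to the wet-season means quoted above.

```python
# Assumed conversion factors: molar mass of CaCO3 divided by that of Ca or Mg.
CA_FACTOR = 2.497
MG_FACTOR = 4.118


def total_hardness_as_caco3(ca_mg_per_l, mg_mg_per_l):
    """Estimate total hardness (mg/L as CaCO3) from Ca and Mg in mg/L."""
    return CA_FACTOR * ca_mg_per_l + MG_FACTOR * mg_mg_per_l


# Wet-season mean concentrations quoted in the abstract (mg/L).
wet_hardness = total_hardness_as_caco3(45.60, 48.41)
print(round(wet_hardness, 1))
```

The abstract's own figures may rest on different factors or on per-site rather than mean concentrations, so this should be read as an illustration of the calculation rather than a reproduction of the reported values.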
Authors: Chayaporn Kaensar
Abstract:
Recently, Job Recommender Systems have gained much attention in industry since they solve the problem of information overload on recruiting websites. Therefore, we propose an Extended Personalized Job System that is capable of providing appropriate jobs for job seekers and recommending suitable information to them using Data Mining Techniques and a Dynamic User Profile. On the other hand, companies can also interact with the system to publish and update job information. The system supports various platforms, such as a web application and an Android mobile application. In this paper, User Profiles, Implicit User Action, User Feedback, and Clustering Techniques from the WEKA libraries were implemented for this application. In addition, open source tools such as the Yii Web Application Framework, the Bootstrap Front End Framework and Android Mobile Technology were also applied.
Keywords: recommendation, user profile, data mining, web and mobile technology
Procedia PDF Downloads 313
831 Mining User-Generated Contents to Detect Service Failures with Topic Model
Authors: Kyung Bae Park, Sung Ho Ha
Abstract:
Online user-generated contents (UGC) significantly change the way customers behave (e.g., shop, travel), and the pressing need to handle the overwhelming amount of varied UGC is one of the paramount issues for management. However, current approaches (e.g., sentiment analysis) are often ineffective for leveraging textual information to detect the problems or issues that a given management faces. In this paper, we apply text mining with Latent Dirichlet Allocation (LDA) to a popular online review site dedicated to complaints from users. We find that the employed LDA efficiently detects customer complaints, and that a further inspection with a visualization technique is effective for categorizing the problems or issues. As such, management can identify the issues at stake and prioritize them accordingly in a timely manner given limited resources. The findings provide managerial insights into how analytics on social media can help maintain and improve reputation management. Our interdisciplinary approach also highlights several insights gained by applying machine learning techniques in the marketing research domain. On a broader technical note, this paper illustrates the details of how to implement LDA in R from beginning (data collection in R) to end (LDA analysis in R), since the procedure is still largely undocumented. In this regard, it will help lower the barrier for interdisciplinary researchers to conduct related research.
Keywords: latent dirichlet allocation, R program, text mining, topic model, user generated contents, visualization
Procedia PDF Downloads 187
830 Integration of Educational Data Mining Models to a Web-Based Support System for Predicting High School Student Performance
Authors: Sokkhey Phauk, Takeo Okazaki
Abstract:
The challenging task in educational institutions is to maximize the high performance of students and minimize the failure rate of poor-performing students. An effective method to leverage this task is to know student learning patterns with highly influencing factors and get an early prediction of student learning outcomes at the timely stage for setting up policies for improvement. Educational data mining (EDM) is an emerging disciplinary field of data mining, statistics, and machine learning concerned with extracting useful knowledge and information for the sake of improvement and development in the education environment. The study is of this work is to propose techniques in EDM and integrate it into a web-based system for predicting poor-performing students. A comparative study of prediction models is conducted. Subsequently, high performing models are developed to get higher performance. The hybrid random forest (Hybrid RF) produces the most successful classification. For the context of intervention and improving the learning outcomes, a feature selection method MICHI, which is the combination of mutual information (MI) and chi-square (CHI) algorithms based on the ranked feature scores, is introduced to select a dominant feature set that improves the performance of prediction and uses the obtained dominant set as information for intervention. By using the proposed techniques of EDM, an academic performance prediction system (APPS) is subsequently developed for educational stockholders to get an early prediction of student learning outcomes for timely intervention. Experimental outcomes and evaluation surveys report the effectiveness and usefulness of the developed system. The system is used to help educational stakeholders and related individuals for intervening and improving student performance.Keywords: academic performance prediction system, educational data mining, dominant factors, feature selection method, prediction model, student performance
Procedia PDF Downloads 106
829 Development of New Technology Evaluation Model by Using Patent Information and Customers' Review Data
Authors: Kisik Song, Kyuwoong Kim, Sungjoo Lee
Abstract:
Many global firms and corporations derive new technologies and opportunities by identifying vacant technologies through patent analysis. However, previous studies failed to focus on technologies that promise continuous growth in industrial fields, and most studies that derive new technology opportunities do not test their practical effectiveness. Because previous studies depended on expert judgment, evaluating new technologies based on patent analysis was costly and time-consuming. Therefore, this research suggests a quantitative and systematic approach to technology evaluation indicators using patent data and data from customer communities. The first step involves collecting the two types of data. The data are then used to construct evaluation indicators, and these indicators are applied to the evaluation of new technologies. This type of data mining enables a new method of technology evaluation and a better predictor of how new technologies are adopted.
Keywords: data mining, evaluating new technology, technology opportunity, patent analysis
Procedia PDF Downloads 377
828 Decision Making System for Clinical Datasets
Authors: P. Bharathiraja
Abstract:
A computer-aided decision making system is used to enhance the diagnosis and prognosis of diseases and to assist clinicians and junior doctors in clinical decision making. Medical data used for decision making should be definite and consistent. Data mining and soft computing techniques are used for cleaning the data and for incorporating human reasoning in decision making systems. A fuzzy rule-based inference technique can be used for classification in order to incorporate human reasoning in the decision making process. In this work, missing values are imputed using the mean or mode of the attribute. The data are normalized using min-max normalization to improve the design and efficiency of the fuzzy inference system. The fuzzy inference system is used to handle the uncertainties that exist in the medical data. Equal-width partitioning is used to partition the attribute values into appropriate fuzzy intervals. Fuzzy rules are generated using a class-based associative rule mining algorithm. The system is trained and tested using the heart disease data set from the University of California at Irvine (UCI) Machine Learning Repository. The data were split into training and testing sets using a hold-out approach. The experimental results indicate that classification using the fuzzy inference system performs better than trivial IF-THEN rule-based classification approaches. Furthermore, the use of fuzzy logic and a fuzzy inference mechanism handles uncertainty and also resembles human decision making. The system can be used in the absence of a clinical expert to assist junior doctors and clinicians in clinical decision making.
Keywords: decision making, data mining, normalization, fuzzy rule, classification
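The preprocessing steps named in this abstract (mean imputation, min-max normalization, equal-width partitioning) can be illustrated with a minimal Python sketch. The abstract gives no implementation details, so the function names, the sample column, and the choice of three intervals are all assumptions made for illustration only.

```python
from statistics import mean


def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max(values):
    """Rescale values linearly to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def equal_width_bins(values, k):
    """Map each value in [0, 1] to one of k equal-width intervals (0..k-1)."""
    return [min(int(v * k), k - 1) for v in values]


# Hypothetical attribute column with one missing reading (not from the paper).
raw = [100, None, 130, 160, 200]
filled = impute_mean(raw)             # the missing value becomes 147.5
scaled = min_max(filled)              # 100 maps to 0.0, 200 maps to 1.0
labels = equal_width_bins(scaled, 3)  # interval index per value
```

In a fuzzy system the interval indices would typically be mapped to linguistic terms such as low, medium, and high; that mapping is left out here because the abstract does not describe it.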
Procedia PDF Downloads 517
827 Feature Based Unsupervised Intrusion Detection
Authors: Deeman Yousif Mahmood, Mohammed Abdullah Hussein
Abstract:
The goal of a network-based intrusion detection system is to classify network traffic activities into two major categories: normal and attack (intrusive) activities. Nowadays, data mining and machine learning play an important role in many sciences, including intrusion detection systems (IDS), using both supervised and unsupervised techniques. One of the essential steps of data mining is feature selection, which helps improve the efficiency, performance, and prediction rate of the proposed approach. This paper applies the unsupervised K-means clustering algorithm with information gain (IG) for feature selection and reduction to build a network intrusion detection system. For our experimental analysis, we used the new NSL-KDD dataset, a modified version of the KDDCup 1999 intrusion detection benchmark dataset. With a split of 60.0% for the training set and the remainder for the testing set, a two-class classification (Normal, Attack) was implemented. The Weka framework, a Java-based open-source software package consisting of a collection of machine learning algorithms for data mining tasks, was used in the testing process. The experimental results show that the proposed approach is very accurate, with a low false positive rate and a high true positive rate, and that it takes less learning time than using the full feature set of the dataset with the same algorithm.
Keywords: information gain (IG), intrusion detection system (IDS), k-means clustering, Weka
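As a rough illustration of information-gain feature ranking, the sketch below computes IG for categorical features in plain Python. It is not the Weka procedure used in the paper; the feature names and the four toy records are hypothetical and chosen only so the ranking is easy to check by hand.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


# Hypothetical toy records (not taken from NSL-KDD).
labels = ["normal", "normal", "attack", "attack"]
features = {
    "flag": ["SF", "S0", "SF", "S0"],          # tells us nothing about the class
    "protocol": ["tcp", "tcp", "udp", "udp"],  # separates the classes perfectly
}

# Rank features from most to least informative.
ranking = sorted(
    features,
    key=lambda name: information_gain(features[name], labels),
    reverse=True,
)
```

A feature-reduction step would keep only the top-ranked features before clustering; the cut-off is a design choice the abstract does not specify.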
Procedia PDF Downloads 296
826 Clustering Ethno-Informatics of Naming Village in Java Island Using Data Mining
Authors: Atje Setiawan Abdullah, Budi Nurani Ruchjana, I. Gede Nyoman Mindra Jaya, Eddy Hermawan
Abstract:
Ethnoscience views culture from a scientific perspective, which may help to understand how people develop various forms of knowledge and belief, initially focusing on ecology and the history of existing contributions. One of the areas studied in ethnoscience is ethno-informatics, the application of informatics to culture. In this study, the informatics discipline used is data mining, a process that automatically extracts knowledge from large databases to obtain interesting patterns. The cultural application is described by a database of village names on the island of Java, obtained from the Geographic Indonesia Information Agency (BIG) in 2014. The purpose of this study is threefold: first, to classify village names on the island of Java based on the structure of the name, including the prefix, the syllables contained, and the complete word; second, to classify the meaning of village names based on specific categories, as well as their role in the behavioral characteristics of the community; and third, to visualize village names on a location map, in order to see the similarity of village names in each province. In this research we have developed two theorems: an area theorem, resulting from the collected intersections of village names across the provinces on the island of Java, and a theorem on the composition of intersecting sets of the provinces in Java, used to view the peculiarities of a study location. The methodology of this study is based on the Knowledge Discovery in Databases (KDD) method in data mining; the process includes preprocessing, data mining, and postprocessing. The results showed that the community of Java prioritizes merit in the conduct of life, always works hard to achieve a more prosperous life, and values love as well as water and environmental sustainability. Village names in adjacent provinces have a high degree of similarity and influence each other. The provinces of Central Java, East Java, and West Java-Banten show high cultural similarity, whereas Jakarta-Yogyakarta shows low similarity. This research identified the cultural character of communities through the meaning of village names on the island of Java; this character is expected to serve as a guide for people's daily behavior on the island of Java.
Keywords: ethnoscience, ethno-informatics, data mining, clustering, Java island culture
Procedia PDF Downloads 283
825 Text Mining Analysis of the Reconstruction Plans after the Great East Japan Earthquake
Authors: Minami Ito, Akihiro Iijima
Abstract:
On March 11, 2011, the Great East Japan Earthquake occurred off the coast of Sanriku, Japan. It is important to build a sustainable society through the reconstruction process rather than simply restoring the infrastructure. To compare the goals of the reconstruction plans of quake-stricken municipalities, Japanese-language morphological analysis was performed using text mining techniques. Frequently used nouns were sorted into four main categories: “life”, “disaster prevention”, “economy”, and “harmony with environment”. Because Soma City was affected by the nuclear accident, sentences tagged “harmony with environment” tended to be more frequent than in the other municipalities. Results from cluster analysis and principal component analysis clearly indicated that the local government is reinforcing its efforts to reduce risks from radiation exposure as a top priority.
Keywords: eco-friendly reconstruction, harmony with environment, decontamination, nuclear disaster
Procedia PDF Downloads 220
824 A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data
Authors: Mais Nijim, Rama Devi Chennuboyina, Waseem Al Aqqad
Abstract:
Advances in the spatial and spectral resolution of satellite images have led to tremendous growth in large image databases. The data acquired through satellites, radars, and sensors contain important geographical information that can be used for remote sensing applications such as regional planning and disaster management. Spatial data classification and object recognition are important tasks for many applications. However, classifying and identifying objects manually from images is a difficult task. Object recognition is often treated as a classification problem, so the task can be performed using machine-learning techniques. Although many machine-learning algorithms are available, the classification here is done using supervised classifiers such as Support Vector Machines (SVM), as the area of interest is known. We propose a classification method that considers neighboring pixels in a region for feature extraction and evaluates classifications precisely according to neighboring classes for semantic interpretation of the region of interest (ROI). A dataset was created for training and testing purposes; we generated the attributes by considering pixel intensity values and mean reflectance values. We demonstrate the benefits of knowledge discovery and data-mining techniques, which can be applied to image data for accurate information extraction and classification from high spatial resolution remote sensing imagery.
Keywords: remote sensing, object recognition, classification, data mining, waterbody identification, feature extraction
Procedia PDF Downloads 340
823 Implementation of Dozer Push Measurement under Payment Mechanism in Mining Operation
Authors: Anshar Ajatasatru
Abstract:
The decline of coal prices over the past years has significantly increased awareness of the need for effective mining operations. Viable steps must be undertaken to become more cost competitive while striving for best mining practice, especially at the Melak Coal Mine in East Kalimantan, Indonesia. This paper aims to show how an effective dozer push measurement method can be implemented when it is controlled by a contract rate on a unit basis of USD ($) per bcm. The method emerges from the idea of daily dozer push activity that continually shifts the overburden until it reaches the final target design set by mine planning. The volume is then calculated each time overburden is removed within a determined distance, using the cut and fill method with a high-precision GNSS system that is fitted to the dozer as guidance to ensure the optimum result of overburden removal. The accumulated daily to weekly dozer push volume is found to be 95 bcm, which is multiplied by the average sell rate of $0.95, so the monthly revenue amounts to $90.25. Furthermore, the payment mechanism is based on push distance and push grade. The push distance interval determines the rates, which vary from $0.90 to $2.69 per bcm and are influenced by the push slope grade, from -25% to +25%. The rates payable for the dozer push operation specifically follow currency adjustment and are added to the monthly overburden volume claim; therefore, the sell rate of overburden volume per bcm may fluctuate depending on the real-time exchange rate of the Jakarta Interbank Spot Dollar Rate (JISDOR). The result indicates that dozer push measurement can be one of the surface mining alternatives, since it has made it possible to refine the method of work, operating cost, and productivity improvement, apart from exposing the risk of low performance of rented equipment. In addition, a payment mechanism of contract rates based on dozer push operation scheduling will ultimately deliver clients a cost reduction of almost 45% in the form of low and consistent costs.
Keywords: contract rate, cut-fill method, dozer push, overburden volume
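The revenue figure quoted in this abstract follows from multiplying the accumulated volume by the average sell rate. A minimal Python check of that arithmetic is sketched below; the variable names are illustrative, and only the two input figures come from the abstract.

```python
from decimal import Decimal

# Figures quoted in the abstract.
volume_bcm = Decimal("95")           # accumulated dozer push volume, in bcm
rate_usd_per_bcm = Decimal("0.95")   # average sell rate, in USD per bcm

# Revenue = volume x rate; Decimal avoids binary floating-point rounding.
revenue_usd = volume_bcm * rate_usd_per_bcm
print(revenue_usd)  # 90.25
```

The distance- and grade-dependent rate table is not given in the abstract, so it is not modelled here.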
Procedia PDF Downloads 316
822 Unravelling the Relationship Between Maternal and Fetal ACE2 Gene Polymorphism and Preeclampsia Risk
Authors: Sonia Tamanna, Akramul Hassan, Mohammad Shakil Mahmood, Farzana Ansari, Gowhar Rashid, Mir Fahim Faisal, M. Zakir Hossain Howlader
Abstract:
Background: Preeclampsia (PE), a pregnancy-specific hypertensive disorder, significantly impacts maternal and fetal health. It is particularly prevalent in underdeveloped countries and is linked to preterm delivery and fetal growth. The renin-angiotensin system (RAS) plays a crucial role in ensuring a successful pregnancy outcome, with angiotensin-converting enzyme 2 (ACE2) being a key component. ACE2 converts Ang II to Ang-(1-7), offering protection against Ang II-induced stress and inflammation while regulating blood pressure and osmotic balance during pregnancy. The reduced maternal plasma ACE2 seen in preeclampsia might contribute to its pathogenesis. However, there has been a dearth of comprehensive research into the association between ACE2 gene polymorphism and preeclampsia. In the South Asian population, hypertension is strongly linked to two SNPs: rs2285666 and rs879922. These SNPs were therefore considered, and the possible association of maternal and fetal ACE2 gene polymorphism with preeclampsia within the Bangladeshi population was evaluated. Method: DNA was extracted from peripheral white blood cells (WBCs) using the organic method, and SNP genotyping was done via PCR-RFLP. Odds ratios (OR) with 95% confidence intervals (95% CI) were calculated using logistic regression to determine relative risk. Result: A comprehensive case-control study was conducted on 51 PE patients and their infants, along with 56 control subjects and their infants. Analysis of the maternal single nucleotide polymorphism (SNP) rs2285666 revealed a strong association between the TT genotype and preeclampsia, with a four-fold increased risk in mothers (P=0.024, OR=4.00, 95% CI=1.36-11.37) compared with the ancestral CC genotype. However, the CT genotype (rs2285666) showed no significant difference (P=0.46, OR=1.54, 95% CI=0.57-4.14). Notably, no significant correlation was found in infants, regardless of their gender. For rs879922, no significant association was observed in either mothers or infants. This pioneering study suggests that mothers carrying the ACE2 gene variant rs2285666 (TT genotype) may be at higher risk for preeclampsia, potentially influencing hypertension characteristics, whereas rs879922 does not appear to be associated with developing preeclampsia. Conclusion: This study sheds light on the role of ACE2 gene polymorphism, particularly the rs2285666 TT genotype, in maternal susceptibility to preeclampsia. However, rs879922 does not appear to be linked to the risk of PE. This research contributes to our understanding of the genetic underpinnings of preeclampsia, offering insights into potential avenues for prevention and management.
Keywords: ACE2, PCR-RFLP, preeclampsia, single nucleotide polymorphisms (SNPs)
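To show how an odds ratio and its 95% confidence interval are obtained, the sketch below uses the simple 2x2-table estimate with a Wald interval on the log scale. This is a swapped-in simplification: the study reports using logistic regression, and the genotype counts are not given in the abstract, so the counts below are hypothetical and will not reproduce the published figures.

```python
from math import exp, log, sqrt


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio and Wald confidence interval from a 2x2 table.

    a: cases with the genotype      b: controls with the genotype
    c: cases without the genotype   d: controls without the genotype
    z: normal quantile (1.96 gives an approximate 95% interval)
    """
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(estimate) - z * se)
    upper = exp(log(estimate) + z * se)
    return estimate, lower, upper


# Hypothetical counts, chosen only for illustration (not the study's data).
estimate, lower, upper = odds_ratio_ci(10, 5, 5, 10)
```

An interval that includes 1 indicates that the association is not statistically significant at the chosen level, which is how the CT genotype result in the abstract should be read.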
Procedia PDF Downloads 61
821 Fake News Detection for Korean News Using Machine Learning Techniques
Authors: Tae-Uk Yun, Pullip Chung, Kee-Young Kwahk, Hyunchul Ahn
Abstract:
Fake news is defined as news articles that are intentionally and verifiably false and could mislead readers. The spread of fake news may provoke anxiety, chaos, fear, or irrational decisions among the public. Thus, detecting fake news and preventing its spread has become a very important issue in our society. However, due to the huge amount of fake news produced every day, it is almost impossible for humans to identify it. In this context, researchers have tried to develop automated fake news detection using machine learning techniques over the past years. However, to the best of our knowledge, no prior study has proposed an automated fake news detection method for Korean news. In this study, we aim to detect Korean fake news using text mining and machine learning techniques. Our proposed method consists of two steps. In the first step, the news content to be analyzed is converted to quantified values using various text mining techniques (topic modeling, TF-IDF, and so on). In the second step, classifiers are trained using the values produced in step 1. As classifiers, machine learning techniques such as logistic regression, backpropagation networks, support vector machines, and deep neural networks can be applied. To validate the effectiveness of the proposed method, we collected about 200 short Korean news items from Seoul National University’s FactCheck, which provides detailed analysis reports from 20 media outlets and links to source documents for each case. Using this dataset, we will identify which text features are important as well as which classifiers are effective in detecting Korean fake news.
Keywords: fake news detection, Korean news, machine learning, text mining
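The first step of the method, turning text into quantified values, can be sketched with a plain-Python TF-IDF. This is an assumption-laden illustration: it uses whitespace tokenization and English placeholder sentences, whereas Korean text would normally need morphological analysis first, and the abstract does not say which TF-IDF variant the authors use.

```python
from collections import Counter
from math import log


def tf_idf(documents):
    """Return one {term: weight} dictionary per document.

    Term frequency is count / document length; inverse document
    frequency is log(N / number of documents containing the term).
    """
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Placeholder sentences (hypothetical, not from the FactCheck dataset).
docs = [
    "minister denies report",
    "minister confirms report",
    "aliens confirm report",
]
vectors = tf_idf(docs)
```

A term that occurs in every document ("report" here) receives weight zero, while rarer terms receive larger weights; in the second step these vectors would be passed to a classifier.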
Procedia PDF Downloads 275
820 Mining Riding Patterns in Bike-Sharing System Connecting with Public Transportation
Authors: Chong Zhang, Guoming Tang, Bin Ge, Jiuyang Tang
Abstract:
With fast-growing road traffic and increasingly severe traffic congestion, more and more citizens choose public transportation for daily travel. Meanwhile, shared bikes provide a convenient option for the first and last mile to public transit. As of 2016, over one thousand cities around the world had deployed bike-sharing systems. The combination of these two modes of transportation has stimulated the development of each and made a significant contribution to the reduction of the carbon footprint. A lot of work has been done on mining riding behaviors in various bike-sharing systems. Most of it, however, treated the bike-sharing system as an isolated system, and thus the results provide little reference for public transit construction and optimization. In this work, we treat bike-sharing and public transit as a whole and investigate customers’ bike-and-ride behaviors. Specifically, we develop a spatio-temporal traffic delivery model to study the riding patterns between the two transportation systems and explore the traffic characteristics (e.g., distributions of customer arrivals/departures and traffic peak hours) along the time and space dimensions. During model construction and evaluation, we make use of large open datasets from real-world bike-sharing systems (CitiBike in New York, GoBike in San Francisco, and BIXI in Montreal) along with corresponding public transit information. The developed two-dimensional traffic model, as well as the mined bike-and-ride behaviors, can provide great help for the deployment of next-generation intelligent transportation systems.
Keywords: riding pattern mining, bike-sharing system, public transportation, bike-and-ride behavior
Procedia PDF Downloads 783