Search results for: sequential pattern mining
3725 A Web Service-Based Framework for Mining E-Learning Data
Authors: Felermino D. M. A. Ali, S. C. Ng
Abstract:
E-learning is an evolved form of distance learning that has improved over time as new technologies have emerged. Efforts are still being made to integrate emerging technologies into E-learning systems to make them better. Among these advancements, Educational Data Mining (EDM) is gaining popularity because of its wide application in improving the teaching-learning process in online settings. However, although EDM promises many benefits to the education industry in general and to E-learning environments in particular, its principal drawback is the lack of easy-to-use tools. Current EDM tools usually require users to have additional technical expertise to perform EDM tasks effectively. In response to these limitations, this study designs and implements an EDM application framework that aims to automate and simplify the development of EDM in E-learning environments. The framework introduces a Service-Oriented Architecture (SOA) that hides the complexity of technical details and enables users to perform EDM in an automated fashion. It was designed based on the principles of abstraction, extensibility, and interoperability. The implementation consists of three major modules. The first module provides an abstraction for data gathering, implemented by extending the source code of the Moodle LMS (Learning Management System). The second module provides data mining methods and techniques as services, implemented by converting the Weka API into a set of Web services. The third module acts as an intermediary between the first two; it contains a user-friendly interface for dynamically locating data provider services and running knowledge discovery tasks on data mining services. An experiment combining simulation and implementation was conducted to evaluate the overhead of the proposed framework. The experiments showed that the overhead introduced by the SOA mechanism is relatively small; it was therefore concluded that a service-oriented architecture can be used effectively to facilitate educational data mining in E-learning environments.
Keywords: educational data mining, e-learning, distributed data mining, moodle, service-oriented architecture, Weka
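The three-module split can be sketched as a registry plus a broker that looks services up by name at run time. This is a minimal in-process sketch, assuming plain Python callables in place of the paper's Moodle data-provider services and Weka-based Web services; the class names, service names, and the toy `mean_of_fields` task are hypothetical.

```python
from typing import Callable, Dict, List

Record = Dict[str, float]
DataProvider = Callable[[], List[Record]]
MiningService = Callable[[List[Record]], Dict[str, float]]


class ServiceRegistry:
    """Holds data-provider and mining services under lookup names."""

    def __init__(self) -> None:
        self.providers: Dict[str, DataProvider] = {}
        self.miners: Dict[str, MiningService] = {}

    def register_provider(self, name: str, service: DataProvider) -> None:
        self.providers[name] = service

    def register_miner(self, name: str, service: MiningService) -> None:
        self.miners[name] = service


class Broker:
    """Intermediary module: locates services at run time and chains them."""

    def __init__(self, registry: ServiceRegistry) -> None:
        self.registry = registry

    def run_task(self, provider: str, miner: str) -> Dict[str, float]:
        data = self.registry.providers[provider]()  # KeyError if not registered
        return self.registry.miners[miner](data)


def course_logs() -> List[Record]:
    # Hypothetical activity records standing in for an LMS export.
    return [{"logins": 12, "grade": 71}, {"logins": 3, "grade": 48}, {"logins": 20, "grade": 88}]


def mean_of_fields(rows: List[Record]) -> Dict[str, float]:
    # Toy "mining" task: the average of every field.
    return {key: sum(r[key] for r in rows) / len(rows) for key in rows[0]}


registry = ServiceRegistry()
registry.register_provider("course-101-logs", course_logs)
registry.register_miner("field-means", mean_of_fields)
result = Broker(registry).run_task("course-101-logs", "field-means")
```

Keeping the broker ignorant of concrete services is what lets new providers or algorithms be registered without changing it.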
3724 Trace Logo: A Notation for Representing Control-Flow of Operational Process
Authors: M. V. Manoj Kumar, Likewin Thomas, Annappa
Abstract:
The process mining discipline bridges the gap between data mining and business process modeling and analysis; it offers process-centric, end-to-end methods and techniques for analyzing information about real-world processes recorded in operational event logs. In this paper, we propose a notation called a trace logo for graphically representing the control-flow perspective (the order of execution of activities) of a process. A trace logo consists of a stack of activity names at each position; the size of each activity name indicates its frequency in the traces, and the total height of the stack depicts the information content of the position. A trace logo is created from a set of aligned traces generated using the Multiple Trace Alignment technique.
Keywords: consensus trace, process mining, multiple trace alignment, trace logo
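A minimal sketch of how the stack heights and letter sizes of a trace logo could be computed from aligned traces, assuming the sequence-logo convention: stack height is log2(K) minus the Shannon entropy of the column (K = number of distinct symbols), and each activity's letter height is its frequency times the stack height. Treating the gap symbol `-` as an ordinary symbol and encoding activities as single letters are assumptions, not details from the paper.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def trace_logo(aligned: List[str]) -> List[Tuple[Dict[str, float], float]]:
    """Return, per aligned position, (activity frequencies, stack height in bits)."""
    length = len(aligned[0])
    if any(len(t) != length for t in aligned):
        raise ValueError("traces must be aligned to the same length")
    alphabet_size = len(set("".join(aligned)))  # '-' gaps count as a symbol
    max_bits = math.log2(alphabet_size) if alphabet_size > 1 else 0.0
    logo = []
    for pos in range(length):
        counts = Counter(t[pos] for t in aligned)
        freqs = {a: c / len(aligned) for a, c in counts.items()}
        entropy = -sum(p * math.log2(p) for p in freqs.values())
        logo.append((freqs, max_bits - entropy))
    return logo


# Hypothetical aligned traces over activities A-D.
logo = trace_logo(["ABCD", "ABD-", "ACBD"])
```

A fully conserved position gets the maximum height, while a position where several activities occur gets a shorter stack.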
3723 Data-Mining Approach to Analyzing Industrial Process Information for Real-Time Monitoring
Authors: Seung-Lock Seo
Abstract:
This work presents an empirical data-mining monitoring scheme for industrial processes with partially unbalanced data. Measurement data from normal operation are relatively easy to gather, but during unusual special events or faults it is generally difficult to collect process information, and analyzing noisy industrial process data can be nearly impossible. In such cases, noise filtering techniques can be used to enhance real-time process monitoring performance. In addition, pre-processing raw process data helps eliminate unwanted variation in industrial process data. In this work, the performance of various monitoring schemes was tested and demonstrated on discrete batch process data. The results showed that monitoring performance improved significantly in terms of the monitoring success rate for given process faults.
Keywords: data mining, process data, monitoring, safety, industrial processes
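As a generic illustration of filtering followed by monitoring, the sketch below smooths a signal with a trailing moving average and flags samples outside mean ± 3 standard deviations estimated from normal-operation data (Shewhart-style limits). It is not the paper's scheme; the window size, the 3-sigma rule, and the sample values are assumptions.

```python
from statistics import mean, stdev
from typing import List, Tuple


def moving_average(signal: List[float], window: int) -> List[float]:
    """Trailing moving average; the first window-1 points average fewer samples."""
    out = []
    for i in range(len(signal)):
        chunk = signal[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def control_limits(normal: List[float], k: float = 3.0) -> Tuple[float, float]:
    """Lower/upper limits mean -/+ k*std estimated from normal-operation data."""
    m, s = mean(normal), stdev(normal)
    return m - k * s, m + k * s


def flag_faults(signal: List[float], limits: Tuple[float, float]) -> List[int]:
    """Indices of samples outside the control limits."""
    lo, hi = limits
    return [i for i, x in enumerate(signal) if x < lo or x > hi]


# Hypothetical measurements: a normal run, then a run with a drift fault.
normal_run = [10.0, 10.2, 9.9, 10.1, 9.8, 10.0, 10.3, 9.7, 10.1, 9.9]
limits = control_limits(moving_average(normal_run, 3))
test_run = [10.0, 10.1, 9.9, 10.0, 12.5, 12.8, 13.0]
alarms = flag_faults(moving_average(test_run, 3), limits)
```

Filtering the reference data before estimating the limits keeps the limits consistent with the filtered signal being monitored.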
3722 Growth Pattern and Condition Factor of Oreochromis niloticus and Sarotherodon galilaeus in Epe Lagoon, Lagos State, Nigeria
Authors: Ahmed Bolaji Alarape, Oluwatobi Damilola Aba
Abstract:
The growth pattern of Oreochromis niloticus and Sarotherodon galilaeus in Epe Lagoon, Lagos State, was investigated. One hundred (100) samples of each species were collected from fishermen at the landing site and transported to the Fisheries Laboratory of the National Institute of Oceanography for identification, sexing, and morphometric measurement. The results showed that 58.0% of O. niloticus and 56.0% of S. galilaeus were female, while 42.0% and 44.0%, respectively, were male. The length-weight relationship of O. niloticus showed a strong correlation coefficient for combined sexes (r = 0.944, p < 0.05), females (r = 0.901, p < 0.05), and males (r = 0.985, p < 0.05), with b-values of 2.5, 3.1, and 2.8, respectively. S. galilaeus also showed strong correlation coefficients for combined sexes (r = 0.970, p < 0.05), females (r = 0.953, p < 0.05), and males (r = 0.979, p < 0.05), with b-values of 3.4, 3.1, and 3.6, respectively. O. niloticus showed an isometric growth pattern in both males and females. The condition factors of O. niloticus were 1.93 and 1.95 for males and females, respectively, while that of S. galilaeus was 1.95 for both sexes. Positive allometric growth was observed in both species, except for male O. niloticus, which showed a negative allometric growth pattern. The growth patterns of the two species indicate a healthy environment.
Keywords: Epe Lagoon, length-weight relationship, Oreochromis niloticus, Sarotherodon galilaeus
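Length-weight studies typically fit W = aL^b as a straight line on log-transformed data, and a common condition factor is Fulton's K = 100W/L^3 (W in g, L in cm). A minimal sketch assuming those standard forms; the abstract does not state which condition-factor formula was used, and the fixed tolerance in `growth_type` is a simplification of the usual t-test of b against 3.

```python
import math
from typing import List, Tuple


def length_weight_fit(lengths_cm: List[float], weights_g: List[float]) -> Tuple[float, float, float]:
    """Fit W = a * L**b by least squares on log10 data; return (a, b, r)."""
    x = [math.log10(v) for v in lengths_cm]
    y = [math.log10(v) for v in weights_g]
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx                      # slope of log W on log L
    a = 10 ** (my - b * mx)            # back-transformed intercept
    r = sxy / math.sqrt(sxx * syy)     # Pearson correlation of the logs
    return a, b, r


def fulton_k(length_cm: float, weight_g: float) -> float:
    """Fulton's condition factor K = 100 * W / L**3."""
    return 100.0 * weight_g / length_cm ** 3


def growth_type(b: float, tol: float = 0.1) -> str:
    # Simplification: real studies test b against 3 with a t-test.
    if abs(b - 3.0) <= tol:
        return "isometric"
    return "positive allometric" if b > 3.0 else "negative allometric"


# Synthetic fish that grow exactly isometrically (a = 0.02, b = 3).
lengths = [10.0, 12.0, 15.0, 18.0, 20.0]
a, b, r = length_weight_fit(lengths, [0.02 * L ** 3 for L in lengths])
```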
3721 Retina Registration for Biometrics Based on Characterization of Retinal Feature Points
Authors: Nougrara Zineb
Abstract:
The unique structure of the blood vessels in the retina has been used for biometric identification. The retinal blood vessel pattern is unique to each individual, and it is almost impossible to forge. The advantages of retinal biometrics include high distinctiveness, universality, and stability of the blood vessel pattern over time. Once the creases have been extracted from the images, a registration stage is necessary, because the position of the retinal vessel structure can change between acquisitions due to eye movements. Image registration consists of the following steps: feature detection, feature matching, transform model estimation, and image resampling and transformation. In this paper, we present a registration algorithm based on the characterization of retinal feature points. Retinal images from the DRIVE database were used in the experiments. The proposed methodology generally achieves good registration results.
Keywords: fovea, optic disc, registration, retinal images
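For the transform model estimation step, a common choice is an affine model fitted by least squares to matched feature points. The sketch assumes an affine model and already-matched point pairs (the abstract does not state the paper's transform model); feature detection, outlier rejection, and resampling are omitted.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _solve3(m: List[List[float]], v: List[float]) -> List[float]:
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    a = [row[:] + [rhs] for row, rhs in zip(m, v)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        if abs(a[col][col]) < 1e-12:
            raise ValueError("degenerate (e.g. collinear) point configuration")
        for r in range(col + 1, 3):
            f = a[r][col] / a[col][col]
            for c in range(col, 4):
                a[r][c] -= f * a[col][c]
    x = [0.0] * 3
    for r in range(2, -1, -1):
        x[r] = (a[r][3] - sum(a[r][c] * x[c] for c in range(r + 1, 3))) / a[r][r]
    return x


def estimate_affine(src: Sequence[Point], dst: Sequence[Point]) -> List[List[float]]:
    """Least-squares 2x3 affine matrix [[a11, a12, tx], [a21, a22, ty]].

    Needs at least 3 non-collinear matches; solves the normal equations
    separately for the x and y output coordinates.
    """
    ata = [[0.0] * 3 for _ in range(3)]
    atu = [0.0] * 3
    atv = [0.0] * 3
    for (x, y), (u, v) in zip(src, dst):
        row = (x, y, 1.0)
        for i in range(3):
            for j in range(3):
                ata[i][j] += row[i] * row[j]
            atu[i] += row[i] * u
            atv[i] += row[i] * v
    return [_solve3(ata, atu), _solve3(ata, atv)]


def apply_affine(t: List[List[float]], p: Point) -> Point:
    x, y = p
    return (t[0][0] * x + t[0][1] * y + t[0][2], t[1][0] * x + t[1][1] * y + t[1][2])
```

In practice the matches would come from the feature-matching step and would be filtered (for example with RANSAC) before fitting.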
3720 Process Mining as an Ecosystem Platform to Mitigate a Deficiency of Processes Modelling
Authors: Yusra Abdulsalam Alqamati, Ahmed Alkilany
Abstract:
The teaching staff are a distinct group that influences the educational process and plays an important role in enhancing the quality of academic education. To improve the management effectiveness of the academy, the Teaching Staff Management System (TSMS) proposes digitizing all teacher processes. Although the BPMN approach can describe processes accurately, it lacks a clear picture of the process flow map, which the process mining approach provides by extracting information from event logs for discovery, monitoring, and model enhancement. Therefore, the two methodologies were combined to create the most accurate representation of system operations: event records are extracted and mined, the processes are reconstructed as a Petri net, and the Petri net is then converted into a BPMN model for a more in-depth view of the process flow. Additionally, the TSMS processes are orchestrated to handle all requests within a guaranteed short time through the integration of the Google Cloud Platform (GCP) and the BPM engine, which also allows business owners to take part throughout the entire TSMS project development lifecycle.
Keywords: process mining, BPM, business process model and notation, Petri net, teaching staff, Google Cloud Platform
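Discovery algorithms such as the alpha algorithm start from the directly-follows relation in the event log before building a Petri net. A minimal sketch of that first step, assuming events are (case id, activity, timestamp) tuples; the activity names in the test are hypothetical, and the Petri-net construction and BPMN conversion are not shown.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

Event = Tuple[str, str, int]  # (case_id, activity, timestamp)


def traces_from_log(events: Iterable[Event]) -> List[List[str]]:
    """Group events by case and order each case by timestamp."""
    cases: Dict[str, List[Tuple[int, str]]] = defaultdict(list)
    for case_id, activity, ts in events:
        cases[case_id].append((ts, activity))
    return [[a for _, a in sorted(evts)] for evts in cases.values()]


def directly_follows(traces: List[List[str]]) -> Counter:
    """Count how often activity a is immediately followed by activity b."""
    dfg: Counter = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            dfg[(a, b)] += 1
    return dfg


def footprint(dfg: Counter) -> Dict[Tuple[str, str], str]:
    """Alpha-style relations for observed pairs: '->' causal, '||' parallel.

    Pairs never observed in either order (the '#' choice relation) are omitted.
    """
    return {(a, b): ("||" if (b, a) in dfg else "->") for (a, b) in dfg}
```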
3719 Optimization of Air Pollution Control Model for Mining
Authors: Zunaira Asif, Zhi Chen
Abstract:
Sustainable air quality management is recognized as one of the most serious environmental concerns in mining regions. Mining operations emit various types of pollutants that have significant impacts on the environment. This study presents a stochastic control strategy by developing an air pollution control model to achieve a cost-effective solution. The optimization is formulated as a linear program with an objective function and multiple constraints to predict the cost of treatment. The constraints focus mainly on two factors: metal production should not exceed the available resources, and air quality should meet the standard criteria for each pollutant. The applicability of the model is explored through a case study of an open-pit metal mine in Utah, USA. The method also uses meteorological data as a dispersion transfer function to reflect practical local conditions. The probabilistic analysis of uncertainties in meteorological conditions is performed by Monte Carlo simulation. Reasonable results were obtained for selecting the optimal treatment technology for PM2.5, PM10, NOx, and SO2. A comparison analysis shows that a baghouse is the least-cost option for particulate matter compared with electrostatic precipitators and wet scrubbers, whereas non-selective catalytic reduction and dry flue gas desulfurization are suitable for NOx and SO2 reduction, respectively. Thus, the model can help planners reduce these pollutants at marginal cost by suggesting pollution control devices while accounting for dynamic meteorological conditions and mining activities.
Keywords: air pollution, linear programming, mining, optimization, treatment technologies
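The sketch below illustrates how Monte Carlo sampling of weather can enter a least-cost choice, using a brute-force comparison over a few discrete devices instead of the paper's linear program. All device names, efficiencies, costs, the lognormal dispersion model, and the 95% compliance target are hypothetical.

```python
import random
from typing import List, Optional, Tuple

# Hypothetical devices: (name, removal efficiency, annual cost in $k).
DEVICES: List[Tuple[str, float, float]] = [
    ("device_A", 0.90, 120.0),
    ("device_B", 0.97, 200.0),
    ("device_C", 0.995, 350.0),
]


def compliance_rate(emission: float, efficiency: float, standard: float,
                    n_samples: int = 10_000, seed: int = 0) -> float:
    """Fraction of sampled weather conditions in which the standard is met.

    The dispersion factor (ambient concentration per unit emission) is drawn
    from a lognormal distribution as a stand-in for meteorological variability.
    """
    rng = random.Random(seed)
    ok = 0
    for _ in range(n_samples):
        dispersion = rng.lognormvariate(0.0, 0.5)
        if emission * (1.0 - efficiency) * dispersion <= standard:
            ok += 1
    return ok / n_samples


def cheapest_compliant(emission: float, standard: float,
                       target: float = 0.95) -> Optional[str]:
    """Least-cost device whose compliance rate reaches the target, if any."""
    feasible = [(cost, name) for name, eff, cost in DEVICES
                if compliance_rate(emission, eff, standard) >= target]
    return min(feasible)[1] if feasible else None
```

A linear program becomes preferable once devices can be combined, production levels vary continuously, or several pollutants share one budget.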
3718 A Hybrid System for Boreholes Soil Sample
Authors: Ali Ulvi Uzer
Abstract:
Data reduction is an important topic in pattern recognition applications. The basic concept is to reduce large amounts of data to their meaningful parts. Principal Component Analysis (PCA) is frequently used for data reduction. A Support Vector Machine (SVM) is a discriminative classifier formally defined by a separating hyperplane: given labeled training data, the algorithm outputs an optimal hyperplane that categorizes new examples. This study offers a hybrid approach that uses PCA for data reduction and SVM for classification. To measure the accuracy of the proposed system, soil samples taken from two boreholes were used. The classification accuracies for this dataset were obtained using ten-fold cross-validation. The results suggest that this system, which relies on data reduction, is feasible for faster recognition of the dataset, and the results appear very promising.
Keywords: feature selection, sequential forward selection, support vector machines, soil sample
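A dependency-free sketch of the PCA step (leading eigenvectors of the covariance matrix by power iteration with deflation) plus a ten-fold split of sample indices. The SVM classifier is not shown, and the helper names are hypothetical; in a real pipeline PCA should be fitted on the training folds only so that the held-out fold does not leak into the projection.

```python
import math
from typing import List

Matrix = List[List[float]]


def center(data: Matrix) -> Matrix:
    n, d = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n for j in range(d)]
    return [[row[j] - means[j] for j in range(d)] for row in data]


def covariance(x: Matrix) -> Matrix:
    """Sample covariance of already-centered data."""
    n, d = len(x), len(x[0])
    return [[sum(r[i] * r[j] for r in x) / (n - 1) for j in range(d)] for i in range(d)]


def top_components(cov: Matrix, k: int, iters: int = 500) -> Matrix:
    """Leading k eigenvectors by power iteration, deflating after each one."""
    d = len(cov)
    c = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        v = [1.0 / math.sqrt(d)] * d
        for _ in range(iters):
            w = [sum(c[i][j] * v[j] for j in range(d)) for i in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        lam = sum(v[i] * sum(c[i][j] * v[j] for j in range(d)) for i in range(d))
        comps.append(v)
        # Remove the found direction so the next pass converges to the next one.
        c = [[c[i][j] - lam * v[i] * v[j] for j in range(d)] for i in range(d)]
    return comps


def project(x: Matrix, comps: Matrix) -> Matrix:
    return [[sum(r[j] * p[j] for j in range(len(r))) for p in comps] for r in x]


def k_fold_indices(n: int, k: int = 10) -> List[List[int]]:
    """Split range(n) into k contiguous folds of near-equal size (no shuffling)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for s in sizes:
        folds.append(list(range(start, start + s)))
        start += s
    return folds
```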
3717 Financial Assessment of the Hard Coal Mining in the Chosen Region in the Czech Republic: Real Options Methodology Application
Authors: Miroslav Čulík, Petr Gurný
Abstract:
This paper presents a financial assessment of hard coal mining in a given region by applying real options methodology. Hard coal mining at this mine has generated net losses for the owner in recent years because of long-term unfavourable mining conditions and a significant drop in coal prices. Management intends to shut down the operation and abandon the project to reduce the company's losses. The goal is to assess whether shutting down the operation is the only correct solution to the problem. Given the uncertainty in future hard coal prices, production might be restarted if the price rises enough to cover the cost of production. Real options methodology is applied for the assessment because it captures two important aspects of financial decision-making: risk and flexibility. The paper is structured as follows: first, the current state is described and the problem is analysed; next, the real options methodology is described; finally, the project is evaluated using this methodology, the results are discussed, and recommendations are provided.
Keywords: real option, investment, option to abandon, option to shut down and restart, risk, flexibility
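The option to abandon can be valued on a Cox-Ross-Rubinstein binomial lattice: at each node the owner keeps the larger of the continuation value and the salvage value. A minimal sketch, assuming the project value follows geometric Brownian motion with no intermediate cash flows; the parameter values are hypothetical, and the shut-down-and-restart option analysed in the paper needs a switching model and is not covered.

```python
import math


def abandonment_option(v0: float, salvage: float, sigma: float, r: float,
                       t_years: float, steps: int) -> float:
    """Premium of an American-style option to abandon a project for a fixed salvage.

    Returns (project value with the abandonment flexibility) - v0.
    """
    dt = t_years / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    growth = math.exp(r * dt)
    p = (growth - d) / (u - d)  # risk-neutral probability of an up move
    if not 0.0 < p < 1.0:
        raise ValueError("parameters give an invalid risk-neutral probability")
    # Terminal nodes (j = number of up moves): operate or abandon.
    values = [max(v0 * u ** j * d ** (steps - j), salvage) for j in range(steps + 1)]
    for i in range(steps - 1, -1, -1):
        values = [
            max((p * values[j + 1] + (1 - p) * values[j]) / growth, salvage)
            for j in range(i + 1)
        ]
    return values[0] - v0


# Hypothetical inputs: project worth 100, salvage 80, 30% volatility, 1 year.
premium = abandonment_option(v0=100.0, salvage=80.0, sigma=0.3, r=0.05,
                             t_years=1.0, steps=50)
```

With zero salvage the premium collapses to zero, and with a salvage above every node value the project is abandoned immediately; both are useful sanity checks.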
3716 Building Biodiversity Conservation Plans Robust to Human Land Use Uncertainty
Authors: Yingxiao Ye, Christopher Doehring, Angelos Georghiou, Hugh Robinson, Phebe Vayanos
Abstract:
Human development is a threat to biodiversity, and conservation organizations (COs) purchase land to protect areas for biodiversity preservation. However, COs have limited budgets and thus face hard prioritization decisions that are confounded by uncertainty in future human land use. This research proposes a data-driven sequential planning model to help COs choose land parcels that minimize the uncertain human impact on biodiversity. The proposed model is robust to uncertain development, and the sequential decision-making process is adaptive, allowing land purchase decisions to adapt to human land use as it unfolds. A cellular automata model is used to simulate land use development based on climate data, land characteristics, and a development threat index from the NASA Socioeconomic Data and Applications Center; this simulation models the uncertainty in the problem. The research uses state-of-the-art techniques from the robust optimization literature to propose a computationally tractable reformulation of the model, which can be solved routinely by off-the-shelf solvers such as Gurobi or CPLEX. Numerical results based on real data for the jaguar in Central and South America show that the proposed method reduces conservation loss by 19.46% on average compared with standard approaches used in practice for biodiversity conservation, such as MARXAN. The method may better guide land acquisition decisions and thereby allow conservation organizations to maximize the impact of limited resources.
Keywords: data-driven robust optimization, biodiversity conservation, uncertainty simulation, adaptive sequential planning
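A cellular automaton of the kind mentioned can be sketched as a grid in which undeveloped cells develop with a probability that grows with a threat index and with the number of developed neighbours. The transition rule, parameters, and grid are hypothetical; running it with many seeds yields land-use scenarios of the sort a robust planning model could take as its uncertainty set.

```python
import random
from typing import List

Grid = List[List[int]]


def step(grid: Grid, threat: List[List[float]], base_rate: float,
         rng: random.Random) -> Grid:
    """One synchronous update: developed cells (1) stay developed; an
    undeveloped cell develops with probability
    min(1, base_rate * threat * (1 + developed Moore neighbours))."""
    rows, cols = len(grid), len(grid[0])
    new = [row[:] for row in grid]
    for i in range(rows):
        for j in range(cols):
            if grid[i][j]:
                continue
            nbrs = sum(
                grid[i + di][j + dj]
                for di in (-1, 0, 1) for dj in (-1, 0, 1)
                if (di or dj) and 0 <= i + di < rows and 0 <= j + dj < cols
            )
            prob = min(1.0, base_rate * threat[i][j] * (1 + nbrs))
            if rng.random() < prob:
                new[i][j] = 1
    return new


def simulate(grid: Grid, threat: List[List[float]], base_rate: float,
             steps: int, seed: int) -> List[Grid]:
    """Return the grid after each step (one scenario per seed)."""
    rng = random.Random(seed)
    history = [grid]
    for _ in range(steps):
        history.append(step(history[-1], threat, base_rate, rng))
    return history
```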
3715 Satellite Data to Understand Changes in Carbon Dioxide for Surface Mining and Green Zone
Authors: Carla Palencia-Aguilar
Abstract:
To attain the 2050 zero-emissions goal, it is necessary to know how carbon dioxide changes over time, from emissions to attenuation, in the mining industry versus in green zones, in order to set realistic goals and redirect efforts to reduce greenhouse effects. Two methods were used to compute the amount of CO2 (in tons) in specific mining zones in Colombia. The first used net primary production (NPP) from MODIS MOD17A3HGF for the years 2000 to 2021. The second used MODIS MYD021KM bands 33 to 36, with a maximum of 644 data points distributed across 7 sites corresponding to surface mining of coal, nickel, iron, and limestone. The green zones selected were located near the studied sites but more than 1 km away, to avoid information overlap. The year 2012 was selected for method 2 in order to compare the results with data provided by the Colombian government and determine the range of values. Some data were compared with 2022 MODIS energy values and converted to kton of CO2 using the EPA Greenhouse Gas Equivalencies Calculator. The results showed that nickel mining was the least polluting, with 81 kton CO2e per year on average and a maximum of 102 kton CO2e per year, while the green zones attenuated 103 kton of CO2 per year on average and 125 kton at most over the last 22 years. Nickel was followed by coal, with an average of 152 and a maximum of 188 kton of CO2 per year, values very similar to those of the nearby green zones (average 157 and maximum 190 kton of CO2). Iron gave results similar to three limestone sites, with average values of 287 kton of CO2 for mining and 310 kton for green zones, and maximum values of 310 kton for iron mining and 356 kton for green zones.
One of the limestone sites exceeded the others, with an average of 441 kton per year and a maximum of 490 kton per year, even though its green zones attenuated more than those of a nearby limestone site (3.5 km away): 371 kton versus 281 kton on average, and 416 kton versus 323 kton at maximum. Because this vegetation contribution is not enough, the manufacturing process at the most polluting site should be improved. Comparing bands 33 to 36 for 2012 and 2022, from January to August, shows that on average the kton of CO2 were similar for mining sites and green zones, indicating an average yearly balance between carbon dioxide emissions and attenuation. However, improvements to the manufacturing process are needed to counter the effects of carbon dioxide, especially during emission peaks, because the surrounding vegetation cannot fully attenuate them.
Keywords: carbon dioxide, MODIS, surface mining, vegetation
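Method 1 converts net primary production into CO2 using the mass ratio of CO2 to carbon (44/12). A minimal sketch, assuming NPP values already in kg C per m² per year (i.e. the product's scale factor has been applied) and a pixel area supplied by the caller; the 500 m × 500 m footprint and the NPP values in the usage are illustrative, not values from the paper.

```python
from typing import Sequence

# Mass of CO2 per unit mass of carbon (molar masses of about 44 and 12 g/mol).
CO2_PER_C = 44.0 / 12.0


def npp_to_kton_co2(npp_kg_c_per_m2: Sequence[float], pixel_area_m2: float) -> float:
    """Carbon fixed over a set of equal-area pixels, expressed as kton of CO2."""
    total_kg_c = sum(v * pixel_area_m2 for v in npp_kg_c_per_m2)
    return total_kg_c * CO2_PER_C / 1e6  # 1 kton = 1e6 kg


# Three hypothetical pixels with a nominal 500 m x 500 m footprint.
example_kton = npp_to_kton_co2([0.5, 0.6, 0.7], 500.0 * 500.0)
```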
3714 A Simple Fluid Dynamic Model for Slippery Pulse Pattern in Traditional Chinese Pulse Diagnosis
Authors: Yifang Gong
Abstract:
Pulse diagnosis is one of the most important diagnostic methods in traditional Chinese medicine. It is also the most difficult method to learn; it is said that it can only be sensed, not explained. This poses a serious threat to the survival of this diagnostic method. However, a large body of experience has been accumulated over several thousand years of practice by Chinese doctors. A pulse pattern called the 'slippery pulse' is one of the indications of pregnancy. A simple fluid dynamic model is proposed to simulate the effects of the presence of a placenta. The placenta is modeled as an extra plenum in an extremely simplified fluid network model. It is found that, because of the extra plenum, the pulse pattern indeed shows a secondary peak within one pulse period. To the author's knowledge, this work is the first to show a link between pulse diagnosis and basic physical principles. Key parameters that might affect the pattern are also investigated.
Keywords: Chinese medicine, flow network, pregnancy, pulse
3713 Design of an Air and Land Multi-Element Expression Pattern of Navigation Electronic Map for Ground Vehicles under United Navigation Mechanism
Authors: Rui Liu, Pengyu Cui, Nan Jiang
Abstract:
At present, there is much research on the centralized management and cross-integration of basic geographic information. However, the idea of integrating and sharing information among land, sea, and air navigation targets has not been applied in depth to research on navigation information services, especially to information expression. To address this problem, this paper studies the expression pattern of navigation electronic maps for ground vehicles under an air and land united navigation mechanism. First, supported by multi-source information fusion of GIS vector data, RS data, GPS data, and other sources, an air and land united information expression pattern is designed for a specific navigation task: emergency rescue after an earthquake. Then, the characteristics and specifications of the united expression of air and land navigation information under map load constraints are summarized and transformed into expression rules in the rule bank. Finally, a navigation experiment is conducted to evaluate the effect of the expression pattern. The experiment uses navigation task completion time and navigation error rate as the main evaluation indices and compares the results with the traditional single-information expression pattern. In summary, the research improves the theory of navigation electronic maps and lays a foundation for the design and realization of united navigation systems with respect to real-time navigation information delivery.
Keywords: navigation electronic map, united navigation, multi-element expression pattern, multi-source information fusion
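One simple way to turn a map-load constraint into a display rule is greedy selection by task priority under a load budget. The sketch is hypothetical throughout: the element fields, priority values, integer load units, and the greedy rule itself are assumptions, not the paper's rule bank.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class MapElement:
    name: str
    source: str    # "air" or "land" (hypothetical labels)
    priority: int  # higher means more important for the current task
    load: int      # display cost in arbitrary integer units


def select_for_display(elements: List[MapElement], max_load: int) -> List[str]:
    """Keep the highest-priority elements that fit within the load budget.

    Equal priorities prefer the cheaper element; a lower-priority element is
    still kept if it fits after a larger one has been skipped.
    """
    chosen: List[str] = []
    used = 0
    for el in sorted(elements, key=lambda e: (-e.priority, e.load)):
        if used + el.load <= max_load:
            chosen.append(el.name)
            used += el.load
    return chosen
```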
3712 Anticandidal and Antibacterial Silver and Silver(Core)-Gold(Shell) Bimetallic Nanoparticles by Fusarium graminearum
Authors: Dipali Nagaonkar, Mahendra Rai
Abstract:
Nanotechnology has seen significant developments in engineered nanomaterials with a core-shell arrangement. Nanomaterials with nanolayers of silver and gold are of primary interest because of their wide applications in catalytic and biomedical fields. Furthermore, mycosynthesis of nanoparticles has proved to be a sustainable synthetic approach in nanobiotechnology. In this context, we synthesized silver and silver (core)-gold (shell) bimetallic nanoparticles by sequential reduction using a fungal extract of Fusarium graminearum. The core-shell deposition was confirmed by a red shift in the surface plasmon resonance from 434 nm to 530 nm, measured with a UV-Visible spectrophotometer. Nanoparticle tracking analysis gave mean particle sizes of 37 nm for Ag and 50 nm for Ag-Au nanoparticles. TEM analysis showed fairly polydisperse, spherical nanoparticles. The mycosynthesized bimetallic nanoparticles were tested against several pathogenic bacteria and Candida sp. The antimicrobial analysis confirmed that the bimetallic nanoparticles have greater anticandidal and antibacterial potential than their monometallic counterparts.
Keywords: bimetallic nanoparticles, core-shell arrangement, mycosynthesis, sequential reduction
3711 Bayesian Network and Feature Selection for Rank Deficient Inverse Problem
Authors: Kyugneun Lee, Ikjin Lee
Abstract:
Parameter estimation via inverse problems often suffers from unfavorable conditions in the real world. Useless data and many input parameters make the problem complicated or insoluble. Data refinement and reformulation of the problem can overcome these difficulties. In this research, a method for solving the rank-deficient inverse problem is proposed. A multi-physics system whose rank deficiency is caused by response correlation is treated. Impeding information is removed, and the problem is reformulated as sequential estimations using a Bayesian network (BN) and subset groups. First, the responses are grouped into subsets using feature selection with singular value decomposition (SVD). Next, BN inference is used for sequential conditional estimation according to the group hierarchy. The directed acyclic graph (DAG) structure is organized to maximize estimation ability. The ratio of response variance to noise variance is used to pair the estimable parameters with each response.
Keywords: Bayesian network, feature selection, rank deficiency, statistical inverse analysis
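Rank deficiency caused by correlated responses can be detected by keeping only those rows of a sensitivity matrix (responses × parameters) that add a new direction. The sketch uses greedy Gram-Schmidt orthogonalization as a simple stand-in for the SVD-based feature selection described; the tolerance and the example matrix are assumptions.

```python
from typing import List


def independent_rows(matrix: List[List[float]], tol: float = 1e-9) -> List[int]:
    """Indices of a maximal set of linearly independent rows, chosen greedily.

    A row is kept only if, after removing its projections onto the rows
    already kept, a component larger than the relative tolerance remains.
    """
    basis: List[List[float]] = []
    kept: List[int] = []
    for idx, row in enumerate(matrix):
        r = [float(x) for x in row]
        for b in basis:
            coef = sum(x * y for x, y in zip(r, b))
            r = [x - coef * y for x, y in zip(r, b)]
        norm = sum(x * x for x in r) ** 0.5
        if norm > tol * max(1.0, sum(x * x for x in row) ** 0.5):
            basis.append([x / norm for x in r])
            kept.append(idx)
    return kept


# Hypothetical sensitivities of 4 responses to 3 parameters: response 1 is
# 2x response 0 and response 3 is response 0 + response 2, so rank is 2 < 3.
S = [[1.0, 2.0, 0.0], [2.0, 4.0, 0.0], [0.0, 1.0, 1.0], [1.0, 3.0, 1.0]]
```

With rank 2 and three parameters, only two independent parameter combinations can be estimated from these responses, which is the situation the grouping and sequential estimation are meant to handle.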
3710 A Framework of Product Information Service System Using Mobile Image Retrieval and Text Mining Techniques
Authors: Mei-Yi Wu, Shang-Ming Huang
Abstract:
Online shoppers today often search for product information on the Internet using product keywords. To use this kind of search, shoppers need a preliminary understanding of the products they are interested in and must choose the correct keywords. However, when a product is encountered for the first time (for example, clothes or a backpack worn by other passengers, whose brand you do not know), it cannot be retrieved because of insufficient information. In this paper, we discuss and study E-commerce applications of image retrieval and text mining techniques. We design an E-commerce application system with a three-layer architecture that provides users with product information. The system automatically searches for and retrieves similar images and the corresponding web pages on the Internet based on target pictures taken by users. Text mining techniques are then applied to extract important keywords from the retrieved web pages, and a web crawler uses these keywords to search for prices in different online shopping stores. Finally, users obtain product information, including photos and prices of their favorite products. The experiments show the efficiency of the proposed system.
Keywords: mobile image retrieval, text mining, product information service system, online marketing
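The keyword-extraction step can be approximated with TF-IDF, which favours terms that are frequent in one page but rare across the retrieved set. A minimal sketch; the paper does not name its text-mining method, and the stop-word list, tokenizer, and idf variant log(N/df) are assumptions.

```python
import math
import re
from collections import Counter
from typing import List

STOPWORDS = {"the", "a", "an", "and", "of", "for", "with", "in", "on", "to", "is"}


def tokenize(text: str) -> List[str]:
    """Lower-case alphanumeric tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def top_keywords(pages: List[str], k: int = 3) -> List[List[str]]:
    """Top-k TF-IDF terms per page; ties are broken alphabetically."""
    docs = [tokenize(p) for p in pages]
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    result = []
    for doc in docs:
        tf = Counter(doc)
        scores = {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        result.append(sorted(scores, key=lambda t: (-scores[t], t))[:k])
    return result


# Hypothetical snippets from three retrieved pages.
pages = ["red leather backpack", "blue canvas backpack", "red canvas tote"]
```

The extracted terms would then be passed to the crawler as price-search queries.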
3709 Real-Time Mine Safety System with the Internet of Things
Authors: Şakir Bingöl, Bayram İslamoğlu, Ebubekir Furkan Tepeli, Fatih Mehmet Karakule, Fatih Küçük, Merve Sena Arpacık, Mustafa Taha Kabar, Muhammet Metin Molak, Osman Emre Turan, Ömer Faruk Yesir, Sıla İnanır
Abstract:
This study introduces an IoT-based real-time safety system for mining that addresses global safety challenges. The wearable device, integrated into miners' jackets, uses LoRa technology for communication and provides real-time monitoring of vital health and environmental data. Its features include an LCD panel for immediate information display and sound-based location tracking for emergency response. The methodology involves sensor integration, data transmission, and ethical testing. Validation confirms the system's effectiveness in diverse mining scenarios. The study calls for further research to adapt the system to different mining contexts, emphasizing its potential to significantly enhance safety standards in the industry.
Keywords: mining safety, internet of things, wearable technology, LoRa, RFID tracking, real-time safety system, safety alerts, safety measures
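LoRa frames carry small payloads, so sensor readings are usually packed into a compact binary format before transmission. The field layout, scaling, and alarm thresholds below are hypothetical illustrations, not the device's actual protocol or any regulatory limits.

```python
import struct
from typing import Dict

# Hypothetical 10-byte big-endian layout:
# device_id u16 | heart_rate u8 | temp_c*100 i16 | ch4_ppm u16 | co_ppm u16 | flags u8
PAYLOAD_FMT = ">HBhHHB"

# Hypothetical alarm thresholds for illustration only.
CO_ALARM_PPM = 50
HR_ALARM_BPM = 150


def encode_reading(device_id: int, heart_rate: int, temp_c: float,
                   ch4_ppm: int, co_ppm: int) -> bytes:
    """Pack one reading, setting alarm bits on the device before sending."""
    flags = 0
    if co_ppm >= CO_ALARM_PPM:
        flags |= 0x01
    if heart_rate >= HR_ALARM_BPM:
        flags |= 0x02
    return struct.pack(PAYLOAD_FMT, device_id, heart_rate, round(temp_c * 100),
                       ch4_ppm, co_ppm, flags)


def decode_reading(payload: bytes) -> Dict[str, object]:
    """Unpack a payload at the gateway side."""
    device_id, hr, temp, ch4, co, flags = struct.unpack(PAYLOAD_FMT, payload)
    return {"device_id": device_id, "heart_rate": hr, "temp_c": temp / 100,
            "ch4_ppm": ch4, "co_ppm": co,
            "co_alarm": bool(flags & 0x01), "hr_alarm": bool(flags & 0x02)}
```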
3708 Large-Capacity Image Information Reduction Based on Single-Cue Saliency Map for Retinal Prosthesis System
Authors: Yili Chen, Xiaokun Liang, Zhicheng Zhang, Yaoqin Xie
Abstract:
To restore visual perception in retinal diseases, an electronic retinal prosthesis with thousands of electrodes has been developed. The image processing strategy of a retinal prosthesis system converts the original camera images into a stimulus pattern that can be interpreted by the brain. In practice, the original images have a much higher resolution (256x256) than the stimulus pattern (e.g. 25x25), which makes large-capacity image information reduction a technical image processing challenge. In this paper, we develop an efficient stimulus pattern extraction algorithm that uses a single-cue saliency map to extract salient objects in the image with an optimal trimming threshold. Experimental results show that the proposed stimulus pattern extraction algorithm performs well across different scenes. In the algorithm performance experiment, the proposed SCSPE algorithm scored almost five times higher than Boyle's algorithm. The experiments suggest that when there are salient objects in the scene (for example, when a blind person meets or talks with people), the trimming threshold should be set at around 0.4max; in other situations, it can be set between 0.2max and 0.4max to give a satisfactory stimulus pattern.
Keywords: retinal prosthesis, image processing, region of interest, saliency map, trimming threshold selection
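Given a saliency map, the trimming step and the reduction from 256x256 to a 25x25 stimulus pattern can be sketched as thresholding at a fraction of the maximum followed by block averaging. Computing the single-cue saliency map itself is not shown, and average pooling is an assumed downsampling method, not necessarily the one used in SCSPE.

```python
from typing import List

Image = List[List[float]]


def trim(saliency: Image, fraction: float) -> Image:
    """Zero out pixels below fraction * max (e.g. fraction in 0.2-0.4)."""
    peak = max(max(row) for row in saliency)
    cut = fraction * peak
    return [[v if v >= cut else 0.0 for v in row] for row in saliency]


def block_downsample(img: Image, out_h: int, out_w: int) -> Image:
    """Average-pool an image to out_h x out_w (e.g. 256x256 -> 25x25).

    Block edges use integer arithmetic, so sizes that do not divide evenly
    (256 / 25) still cover every input pixel with blocks of 10 or 11.
    """
    h, w = len(img), len(img[0])
    if out_h > h or out_w > w:
        raise ValueError("output must not be larger than the input")
    out = []
    for i in range(out_h):
        r0, r1 = i * h // out_h, (i + 1) * h // out_h
        row = []
        for j in range(out_w):
            c0, c1 = j * w // out_w, (j + 1) * w // out_w
            block = [img[r][c] for r in range(r0, r1) for c in range(c0, c1)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


# Hypothetical saliency map: one bright square object on a dim background.
saliency = [[1.0 if 100 <= r < 150 and 100 <= c < 150 else 0.1 for c in range(256)]
            for r in range(256)]
stimulus = block_downsample(trim(saliency, 0.4), 25, 25)
```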
3707 Mining Coupled to Agriculture: Systems Thinking in Scalable Food Production
Authors: Jason West
Abstract:
Low profitability in agricultural production, along with increasing scrutiny of its environmental effects, is limiting food production at scale. In contrast, the mining sector offers access to resources, including energy, water, transport, and chemicals, for food production at low marginal cost. Scalable agricultural production can benefit from the nexus of resources (water, energy, transport) offered by mining activity in remote locations. A decision-support bioeconomic model for controlled-environment vertical farms was used, built from four submodels (crop structure, nutrient requirements, resource-crop integration, and economics) that aggregate into a macro mathematical model. A demonstrable dynamic systems framework is needed to prove that productive outcomes are feasible. We demonstrate a generalized bioeconomic macro model for controlled-environment production systems at minesites using system dynamics modeling methodology. Despite the complexity of bioeconomic modelling of resource-agricultural dynamic processes and interactions, the economic potential is greater than general economic models would assume. Scalability of production as an input becomes a key success factor.
Keywords: crop production systems, mathematical model, mining, agriculture, dynamic systems
3706 Using New Machine Algorithms to Classify Iranian Musical Instruments According to Temporal, Spectral and Coefficient Features
Authors: Ronak Khosravi, Mahmood Abbasi Layegh, Siamak Haghipour, Avin Esmaili
Abstract:
In this paper, we study the classification of woodwind musical instruments using a small set of features selected, by the sequential forward selection method, from a broad range of extracted features. First, 42 features are extracted for each record in a music database of 402 sound files belonging to five groups: flutes (end-blown and internal duct), single-reed, double-reed (exposed and capped), triple-reed, and quadruple-reed. Then, sequential forward selection is used to choose the best feature set in order to achieve very high classification accuracy. Two classification techniques, support vector machines and relevance vector machines, were tested, and an accuracy of up to 96% was achieved using 21 time, frequency, and coefficient features with a relevance vector machine and a Gaussian kernel function.
Keywords: coefficient features, relevance vector machines, spectral features, support vector machines, temporal features
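Sequential forward selection starts from an empty set and greedily adds the feature that most improves a score, stopping when no addition helps. A minimal sketch, assuming the caller supplies a scoring function (in practice, cross-validated SVM or RVM accuracy on the chosen subset); the feature names and toy additive score in the test are hypothetical.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def sequential_forward_selection(
    features: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
    max_features: int,
) -> Tuple[List[str], float]:
    """Greedy SFS: repeatedly add the feature that most improves `score`.

    Stops early when no remaining feature improves the current best score.
    """
    selected: List[str] = []
    best_score = float("-inf")
    remaining = list(features)
    while remaining and len(selected) < max_features:
        candidate, cand_score = max(
            ((f, score(frozenset(selected + [f]))) for f in remaining),
            key=lambda pair: pair[1],
        )
        if cand_score <= best_score:
            break
        selected.append(candidate)
        remaining.remove(candidate)
        best_score = cand_score
    return selected, best_score
```

Each round calls the scorer once per remaining feature, so selecting m of n features costs on the order of n·m classifier evaluations, which is why SFS is preferred over exhaustive search with 42 candidates.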
3705 Analytical Study of Data Mining Techniques for Software Quality Assurance
Authors: Mariam Bibi, Rubab Mehboob, Mehreen Sirshar
Abstract:
Satisfying customer requirements is the ultimate goal of producing or developing any product. The quality of a product is judged by the level of customer satisfaction. The survey identified several techniques that enhance product quality through software defect prediction and by locating missing software requirements. Some mining techniques have been proposed to assess individual performance indicators in a collaborative environment in order to reduce errors at the individual level. The basic intention is to produce a product with zero or few defects and thereby the best possible quality. The survey analyzes techniques such as genetic algorithms, artificial neural networks, classification and clustering techniques, and decision trees. The analysis found that these techniques contribute substantially to improving and enhancing product quality.
Keywords: data mining, defect prediction, missing requirements, software quality
Procedia PDF Downloads 467
3704 Data Mining Spatial: Unsupervised Classification of Geographic Data
Authors: Chahrazed Zouaoui
Abstract:
In recent years, the volume of geospatial information has increased due to the evolution of information and communication technologies. This information is often presented by geographic information systems (GIS) and stored in spatial databases (BDS). Classical data mining has revealed a weakness in extracting knowledge from these enormous amounts of data because of the particular nature of spatial entities, which are characterized by their interdependence (the first law of geography). This gave rise to spatial data mining. Spatial data mining is a process of analyzing geographic data that allows the extraction of knowledge and spatial relationships from geospatial data; among its methods, we distinguish monothematic and thematic ones. Geo-clustering is one of the main tasks of spatial data mining and belongs to the monothematic methods. It groups similar geospatial entities into the same class and assigns more dissimilar ones to different classes; in other words, it maximizes intra-class similarity and minimizes inter-class similarity, taking into account the particular nature of geospatial data. Two approaches to geo-clustering exist: dynamic processing of the data, which applies algorithms designed for the direct treatment of spatial data, and an approach based on spatial data pre-processing, which applies classic clustering algorithms to pre-processed data (by integrating spatial relationships).
This approach (based on pre-processing) is quite complex in different cases, so the search for approximate solutions involves the use of approximation algorithms. Among these, we are interested in dedicated approaches (partitioning and density-based clustering methods) and the bees approach (a biomimetic approach). Our study proposes a design for this problem that uses different algorithms for automatically detecting geospatial neighborhoods in order to implement geo-clustering by pre-processing, and applies the bees algorithm to this problem for the first time in the geospatial field.
Keywords: mining, GIS, geo-clustering, neighborhood
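The pre-processing route to geo-clustering (detect each entity's spatial neighborhood, fold it into the data, then run an ordinary clustering algorithm) can be illustrated with a small Python sketch. Both choices here are illustrative assumptions, and the abstract does not say which neighborhood-detection algorithms were used. Neighborhoods are defined by a fixed distance threshold `radius`, and spatial relationships are integrated as a "spatial lag", meaning the mean attribute value of an entity's neighbors. The function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def distance_neighbors(points: Sequence[Point], radius: float) -> List[List[int]]:
    """For each point, list the indices of the other points within `radius`."""
    neighbors = []
    for i, (xi, yi) in enumerate(points):
        near = [j for j, (xj, yj) in enumerate(points)
                if j != i and math.hypot(xi - xj, yi - yj) <= radius]
        neighbors.append(near)
    return neighbors


def spatially_enriched(values: Sequence[float],
                       neighbors: Sequence[Sequence[int]]) -> List[Tuple[float, float]]:
    """Pair each entity's own attribute with the mean attribute of its neighbors.

    Isolated entities (no neighbors) reuse their own value as the spatial lag.
    The resulting rows can be passed to a classic, non-spatial clustering method
    such as a partitioning or density-based algorithm.
    """
    rows = []
    for v, near in zip(values, neighbors):
        lag = sum(values[j] for j in near) / len(near) if near else v
        rows.append((v, lag))
    return rows


points = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
nb = distance_neighbors(points, radius=1.5)
print(nb)  # → [[1], [0], []]
print(spatially_enriched([2.0, 4.0, 9.0], nb))  # → [(2.0, 4.0), (4.0, 2.0), (9.0, 9.0)]
```

The brute-force neighbor search is quadratic in the number of entities. For large spatial databases, a spatial index (for example a k-d tree or R-tree) would normally replace the double loop.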
Procedia PDF Downloads 375
3703 An Exploratory Sequential Design: A Mixed Methods Model for the Statistics Learning Assessment with a Bayesian Network Representation
Authors: Zhidong Zhang
Abstract:
This study established a mixed methods model for assessing statistics learning with Bayesian network models. Exploratory sequential designs have three variants. One of them consists of three linked steps: qualitative data collection and analysis; development of a quantitative measure, instrument, or intervention; and quantitative data collection and analysis. The study used a scoring model of analysis of variance (ANOVA) as the content domain and examined students’ learning in both semantic and performance aspects at a fine-grained level. The ANOVA score model, y = α + βx1 + γx1 + ε, was used as a cognitive task to collect data during the student learning process. When the learning processes were decomposed into multiple steps in both semantic and performance aspects, a hierarchical Bayesian network was established. This is a theory-driven process: the hierarchical structure was derived from qualitative cognitive analysis. Data from students’ learning of the ANOVA score model were used, through the evidential variables, as evidence for the hierarchical Bayesian network model. Finally, the assessment results of students’ ANOVA score model learning were reported. In brief, this was a mixed methods research design applied to statistics learning assessment. Mixed methods designs expand the possibilities for researchers to establish advanced quantitative models, starting from a theory-driven qualitative mode.
Keywords: exploratory sequential design, ANOVA score model, Bayesian network model, mixed methods research design, cognitive analysis
Procedia PDF Downloads 178
3702 Decision Support System in Air Pollution Using Data Mining
Authors: E. Fathallahi Aghdam, V. Hosseini
Abstract:
Environmental pollution is not limited to a specific region or country; that is why sustainable development, as a necessary process for improvement, pays attention to issues such as the destruction of natural resources, degradation of biological systems, global pollution, and climate change worldwide, especially in developing countries. According to the World Health Organization, Tehran (the capital of Iran), a developing city, is one of the most polluted cities in the world in terms of air pollution. In this study, three pollutants, namely particulate matter smaller than 10 microns (PM10), nitrogen oxides, and sulfur dioxide, were evaluated in Tehran using data mining techniques following the CRISP approach. Data from 21 air pollution monitoring stations in different areas of Tehran were collected from 1999 to 2013, and the commercial software Clementine was selected for the analysis. Using this software, Tehran was divided into distinct clusters in terms of the mentioned pollutants. Because clustering is usually used as a prologue to other analyses, the similarity of the clusters was then evaluated by analyzing local conditions, traffic behavior, and industrial activities. The results of this research can support decision-making systems, help managers improve performance and decision making, and assist in urban studies.
Keywords: data mining, clustering, air pollution, crisp approach
Procedia PDF Downloads 427
3701 Pattern Identification in Statistical Process Control Using Artificial Neural Networks
Authors: M. Pramila Devi, N. V. N. Indra Kiran
Abstract:
Control charts, predominantly in the form of X-bar charts, are important tools in statistical process control (SPC). They are useful for determining whether a process is behaving as intended or whether there are unnatural causes of variation. A process is out of control if a point falls outside the control limits or if a series of points exhibits an unnatural pattern. In this paper, four training algorithms for control chart pattern (CCP) recognition are studied. For each algorithm, the optimal network structure is identified. The algorithms are then compared on type I and type II errors for generalization, with and without early stopping, and the best one is proposed.
Keywords: control chart pattern recognition, neural network, backpropagation, generalization, early stopping
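The two out-of-control conditions named above can be made concrete with a short Python sketch. It is a minimal illustration under stated assumptions, not the paper's method (the paper trains neural networks to recognize patterns rather than applying fixed rules). The sketch assumes a known process mean `center`, a known standard deviation `sigma`, and subgroup size `n`, and it treats a run of `run_length` consecutive subgroup means on one side of the center line as the "unnatural pattern". Eight is a value similar to the run rules used in SPC practice and is chosen here for illustration.

```python
import math
from typing import List, Sequence, Tuple


def xbar_limits(center: float, sigma: float, n: int) -> Tuple[float, float]:
    """Return (LCL, UCL) for an X-bar chart: center ± 3·sigma/√n."""
    half_width = 3 * sigma / math.sqrt(n)
    return center - half_width, center + half_width


def out_of_control(means: Sequence[float], center: float, sigma: float, n: int,
                   run_length: int = 8) -> List[int]:
    """Flag subgroup indices that break either of two simple rules:

    1. the subgroup mean falls outside the 3-sigma control limits;
    2. it completes a run of `run_length` consecutive means on one side of the center line.
    """
    lcl, ucl = xbar_limits(center, sigma, n)
    flagged = []
    run, side = 0, 0
    for i, m in enumerate(means):
        s = (m > center) - (m < center)  # +1 above, -1 below, 0 exactly on the line
        if s != 0 and s == side:
            run += 1                     # run continues on the same side
        else:
            run = 1 if s != 0 else 0     # run restarts (or breaks on the center line)
        side = s
        if m < lcl or m > ucl or run >= run_length:
            flagged.append(i)
    return flagged


print(xbar_limits(10.0, 2.0, 4))  # → (7.0, 13.0)
means = [10.5, 9.0, 14.0, 10.2, 10.1, 10.3, 10.4, 10.6, 10.2, 10.1, 10.3]
print(out_of_control(means, 10.0, 2.0, 4))  # → [2, 9, 10]
```

Index 2 is flagged because it lies above the upper limit. Indices 9 and 10 are flagged because they complete a run of eight or more means above the center line, which starts at index 2.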
Procedia PDF Downloads 372
3700 Organic Matter Distribution in Bazhenov Source Rock: Insights from Sequential Extraction and Molecular Geochemistry
Authors: Margarita S. Tikhonova, Alireza Baniasad, Anton G. Kalmykov, Georgy A. Kalmykov, Ralf Littke
Abstract:
The pore structure of organic-rich rocks is highly complex because it combines inter-particle porosity from inorganic mineral matter with ultrafine intra-particle porosity from both organic matter and clay minerals. Fluids are retained in that pore space, but there are major uncertainties about how and where the fluids are stored and to what extent they are accessible or trapped in 'closed' pores. A high degree of tortuosity may lead to fractionation of organic matter, so that lighter and more flexible compounds diffuse to the reservoir whereas more complex compounds may be locked in place. Additionally, part of the hydrocarbons could be bound to solid organic matter (kerogen) and to the mineral matrix during expulsion and migration. Larger compounds can occupy thin channels, so that clogging or oil and gas entrapment occurs. Sequential extraction with different solvents is a powerful tool for obtaining more information about the distribution of trapped organic matter. The Upper Jurassic–Lower Cretaceous Bazhenov shale is one of the most petroliferous source rocks in West Siberia, Russia. Owing to its variable mineral composition, pore space distribution, and thermal maturity, there are high uncertainties in the distribution and composition of organic matter in this formation. To address this issue, the geological and geochemical properties of 30 samples were considered. These included mineral composition (XRD and XRF), structure and texture (thin-section microscopy), organic matter content, type, and thermal maturity (Rock-Eval), as well as the molecular composition (GC-FID and GC-MS) of the different materials extracted during sequential extraction. Sequential extraction was performed in a Soxhlet apparatus using different solvents, i.e., n-hexane, chloroform, and ethanol-benzene (1:1 v:v), first on core plugs and later on pulverized material.
The results indicate that the studied samples are mainly composed of type II kerogen, with TOC contents ranging from 5 to 25%. Thermal maturity ranged from immature to the late oil window. Whereas clay content decreased with increasing maturity, the amount of silica increased in the studied samples. According to the molecular geochemistry, hydrocarbons stored in open and closed pore space have different geochemical fingerprints. The results improve our understanding of hydrocarbon expulsion and migration in the organic-rich Bazhenov shale and therefore allow a better estimation of the hydrocarbon potential of this formation.
Keywords: Bazhenov formation, bitumen, molecular geochemistry, sequential extraction
Procedia PDF Downloads 170
3699 Hybrid Reliability-Similarity-Based Approach for Supervised Machine Learning
Authors: Walid Cherif
Abstract:
Data mining has seen big advances in recent years because of the spread of the Internet, which generates a tremendous volume of data every day, and because of immense advances in the technologies that facilitate the analysis of these data. In particular, classification techniques are a subdomain of data mining that determine to which group each data instance within a given dataset belongs; they are used to classify data into different classes according to desired criteria. Generally, a classification technique is either statistical or based on machine learning, and each type has its own limits. Nowadays, data are becoming increasingly heterogeneous; consequently, current classification techniques encounter many difficulties. This paper defines new measure functions to quantify the resemblance between instances and then combines them in a new approach that differs from existing algorithms in its reliability computations. The proposed approach outperformed most common classification techniques, with an F-measure exceeding 97% on the Iris dataset.
Keywords: data mining, knowledge discovery, machine learning, similarity measurement, supervised classification
Procedia PDF Downloads 464
3698 Emergence of Information Centric Networking and Web Content Mining: A Future Efficient Internet Architecture
Authors: Sajjad Akbar, Rabia Bashir
Abstract:
With the growth in the number of users, Internet usage has evolved, and owing to its key design principles the Internet has expanded enormously in size. This tremendous growth has brought new applications (e.g., mobile video and cloud computing) as well as new user requirements, such as a content distribution environment, mobility, ubiquity, security, and trust. Users are more interested in content than in the peer nodes they communicate with. The current Internet architecture follows a host-centric networking approach, which is not suitable for certain types of applications. With the growing use of multiple interactive applications, the host-centric approach is considered less efficient because it depends on physical location; for this reason, Information Centric Networking (ICN) is considered a potential future Internet architecture. ICN introduces uniquely named data as a core Internet principle, uses a receiver-oriented rather than a sender-oriented approach, and introduces a name-based information system at the network layer. Although ICN is considered a future Internet architecture, it has received considerable criticism, mainly concerning how ICN will manage the most relevant content. Web Content Mining (WCM) approaches can help with appropriate data management in ICN. To address this issue, this paper contributes by (i) discussing multiple ICN approaches, (ii) analyzing different Web Content Mining approaches, and (iii) creating a new Internet architecture by merging ICN and WCM to solve the data management issues of ICN. From ICN, Content-Centric Networking (CCN) is selected for the new architecture, whereas the agent-based approach from Web Content Mining is selected to find the most appropriate data.
Keywords: agent based web content mining, content centric networking, information centric networking
Procedia PDF Downloads 475
3697 A Parallel Implementation of Artificial Bee Colony Algorithm within CUDA Architecture
Authors: Selcuk Aslan, Dervis Karaboga, Celal Ozturk
Abstract:
The Artificial Bee Colony (ABC) algorithm is one of the most successful swarm-intelligence-based metaheuristics. It has been applied to a number of constrained and unconstrained numerical and combinatorial optimization problems. In this paper, we present a parallelized version of the ABC algorithm that adapts the employed and onlooker bee phases to the Compute Unified Device Architecture (CUDA) platform, NVIDIA's graphics processing unit (GPU) programming environment. The execution speed and results of the proposed approach and of the sequential version of the ABC algorithm are compared on functions commonly used as benchmarks for optimization algorithms. Tests on standard benchmark functions with different colony sizes and numbers of parameters showed that the proposed parallelization approach decreases the total execution time of the employed and onlooker bee phases and achieves similar or better quality results compared to the standard sequential implementation of the ABC algorithm.
Keywords: Artificial Bee Colony algorithm, GPU computing, swarm intelligence, parallelization
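For reference, the sequential ABC loop that serves as the baseline can be sketched in pure Python. This is a minimal sketch of the commonly described ABC scheme (employed, onlooker, and scout phases), not the authors' sequential or CUDA code. The defaults for colony size, the abandonment `limit`, and the number of cycles are illustrative assumptions. The employed and onlooker loops correspond to the two phases the abstract reports adapting to CUDA.

```python
import random
from typing import Callable, List, Tuple


def abc_minimize(
    f: Callable[[List[float]], float],
    dim: int,
    lb: float,
    ub: float,
    colony: int = 20,
    limit: int = 50,
    cycles: int = 200,
    seed: int = 0,
) -> Tuple[List[float], float]:
    """Minimize f over the box [lb, ub]^dim with a basic sequential ABC loop."""
    rng = random.Random(seed)
    sn = colony // 2  # food sources; half the colony are employed bees (needs sn >= 2)

    def random_solution() -> List[float]:
        return [rng.uniform(lb, ub) for _ in range(dim)]

    def fitness(cost: float) -> float:
        # Usual ABC fitness transform: larger is better, monotone in the cost.
        return 1.0 / (1.0 + cost) if cost >= 0 else 1.0 + abs(cost)

    foods = [random_solution() for _ in range(sn)]
    costs = [f(x) for x in foods]
    trials = [0] * sn
    best_i = min(range(sn), key=costs.__getitem__)
    best_x, best_cost = foods[best_i][:], costs[best_i]

    def try_neighbor(i: int) -> None:
        # Move one coordinate of source i relative to a random partner k; keep it if better.
        nonlocal best_x, best_cost
        k = rng.choice([s for s in range(sn) if s != i])
        j = rng.randrange(dim)
        candidate = foods[i][:]
        candidate[j] += rng.uniform(-1.0, 1.0) * (foods[i][j] - foods[k][j])
        candidate[j] = min(max(candidate[j], lb), ub)
        cost = f(candidate)
        if cost < costs[i]:  # greedy selection
            foods[i], costs[i], trials[i] = candidate, cost, 0
            if cost < best_cost:
                best_x, best_cost = candidate[:], cost
        else:
            trials[i] += 1

    for _ in range(cycles):
        for i in range(sn):  # employed bee phase: one trial per food source
            try_neighbor(i)
        fits = [fitness(c) for c in costs]
        for _ in range(sn):  # onlooker bee phase: fitness-proportional source choice
            try_neighbor(rng.choices(range(sn), weights=fits)[0])
        worst = max(range(sn), key=trials.__getitem__)
        if trials[worst] > limit:  # scout bee phase: abandon an exhausted source
            foods[worst] = random_solution()
            costs[worst] = f(foods[worst])
            trials[worst] = 0
            if costs[worst] < best_cost:
                best_x, best_cost = foods[worst][:], costs[worst]
    return best_x, best_cost


def sphere(x: List[float]) -> float:
    """A standard benchmark function with its minimum 0 at the origin."""
    return sum(v * v for v in x)


print(abc_minimize(sphere, dim=2, lb=-5.12, ub=5.12))
```

In this sequential form, each employed-bee trial modifies only its own food source and reads one partner. That independence is the property a parallel version can exploit, although how the authors handle concurrent reads and writes on the GPU is not described in the abstract.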
Procedia PDF Downloads 378
3696 Robust Pattern Recognition via Correntropy Generalized Orthogonal Matching Pursuit
Authors: Yulong Wang, Yuan Yan Tang, Cuiming Zou, Lina Yang
Abstract:
This paper presents a novel sparse representation method for robust pattern classification. Generalized orthogonal matching pursuit (GOMP) is a recently proposed, efficient sparse representation technique. However, GOMP adopts the mean square error (MSE) criterion and assigns the same weight to all measurements, both severely and slightly corrupted ones. To overcome this limitation, we propose an information-theoretic GOMP (ITGOMP) method that exploits the correntropy induced metric. The results show that ITGOMP can adaptively assign small weights to severely contaminated measurements and large weights to clean ones. An ITGOMP-based classifier is further developed for robust pattern classification. Experiments on public real-world datasets demonstrate the efficacy of the proposed approach.
Keywords: correntropy induced metric, matching pursuit, pattern classification, sparse representation
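The adaptive weighting described above comes from the Gaussian kernel inside correntropy: a measurement with residual e gets weight exp(−e²/(2σ²)), so large residuals are down-weighted almost to zero, whereas MSE weights every measurement equally. The Python sketch below illustrates only this reweighting idea. It is an assumption for illustration, not ITGOMP itself, and it applies the weights to the simplest possible problem (estimating one location parameter by iterative reweighting) with a hand-picked kernel width `sigma`. The matching-pursuit part of ITGOMP is not reproduced here.

```python
import math
from typing import List, Sequence


def correntropy_weights(residuals: Sequence[float], sigma: float) -> List[float]:
    """Gaussian-kernel weights w_i = exp(-e_i^2 / (2 sigma^2)) for each residual e_i."""
    return [math.exp(-(e * e) / (2.0 * sigma * sigma)) for e in residuals]


def robust_mean(samples: Sequence[float], sigma: float, iters: int = 20) -> float:
    """Estimate a location parameter by iteratively reweighting with correntropy weights.

    Starts from the ordinary (MSE-optimal) mean; assumes the weights never all
    underflow to zero, which holds when sigma is not tiny relative to the residuals.
    """
    mu = sum(samples) / len(samples)
    for _ in range(iters):
        w = correntropy_weights([x - mu for x in samples], sigma)
        mu = sum(wi * x for wi, x in zip(w, samples)) / sum(w)
    return mu


data = [1.0, 1.1, 0.9, 10.0]  # hypothetical measurements; the last one is severely corrupted
print(sum(data) / len(data))  # plain mean, pulled toward the outlier
print(robust_mean(data, sigma=1.0))  # stays close to 1.0
```

The kernel width `sigma` controls how quickly a residual stops counting. A very large `sigma` makes every weight close to 1 and recovers ordinary MSE behavior, while a very small one can discard clean measurements too.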
Procedia PDF Downloads 355