Search results for: Blocks Mining

620 PTFE Capillary-Based DNA Amplification within an Oscillatory Thermal Cycling Device

Authors: Jyh J. Chen, Fu H. Yang, Ming H. Liao

Abstract:

This study describes a capillary-based device integrated with heating and cooling modules for polymerase chain reaction (PCR). The device consists of a polytetrafluoroethylene (PTFE) reaction capillary and aluminum blocks, and is equipped with two cartridge heaters, a thermoelectric (TE) cooler, a fan, and several thermocouples for temperature control. The cartridge heaters are placed into the heating blocks and maintained at two different temperatures to achieve the denaturation and extension steps. Thermocouples inserted into the capillary are used to obtain the transient temperature profiles of the reaction sample during thermal cycles. A 483-bp DNA template is amplified successfully in both the designed system and a traditional thermal cycler. This work should be of interest to those involved in high-temperature reactions and in genomics or cell analysis.

Keywords: Polymerase chain reaction, thermal cycles, capillary, TE cooler.

Downloads: 2036

619 Data Mining for Cancer Management in Egypt Case Study: Childhood Acute Lymphoblastic Leukemia

Authors: Nevine M. Labib, Michael N. Malek

Abstract:

Data mining aims at discovering knowledge from data and presenting it in a form that is easily comprehensible to humans. One of its useful applications in Egypt is cancer management, especially the management of Acute Lymphoblastic Leukemia (ALL), which is the most common type of cancer in children. This paper discusses the process of designing a prototype that can help in the management of childhood ALL, which is of great significance in the health care field. It also has a social impact by decreasing the rate of infection in children in Egypt, and it provides valuable information about the distribution and segmentation of ALL in Egypt, which may be linked to possible risk factors. Undirected Knowledge Discovery is used since, in this research project, there is no target field, as the data provided is mainly subjective. This is done in order to quantify the subjective variables. The computer is therefore asked to identify significant patterns in the provided medical data about ALL. This is achieved by collecting the data necessary for the system, determining the data mining technique to be used, and choosing the most suitable implementation tool for the domain. The research uses a data mining tool, Clementine, to apply the Decision Trees technique. It is fed with data extracted from real-life cases taken from specialized cancer institutes. Relevant details of medical cases, such as patient medical history and diagnosis, are analyzed, classified, and clustered in order to improve disease management.

Keywords: Data Mining, Decision Trees, Knowledge Discovery, Leukemia.

Downloads: 2202

618 Data Mining Approach for Commercial Data Classification and Migration in Hybrid Storage Systems

Authors: Mais Haj Qasem, Maen M. Al Assaf, Ali Rodan

Abstract:

Parallel hybrid storage systems consist of a hierarchy of storage devices that differ in data reading speed. Reading speed increases as we ascend the hierarchy. Thus, migrating the application's important data, which will be accessed in the near future, to the uppermost level reduces the application's I/O waiting time and hence its elapsed execution time. In this research, we implement a trace-driven, two-level parallel hybrid storage system prototype that consists of HDDs and SSDs. The prototype uses data mining techniques to classify the application's data in order to determine its near-future data accesses, in parallel with its on-demand requests. The important data (i.e., the data that the application will access in the near future) are continuously migrated to the uppermost level of the hierarchy. Our simulation results show that our data migration approach, integrated with data mining techniques, reduces the application's elapsed execution time by at least 22% across a variety of traces.

Keywords: Data mining, hybrid storage system, recurrent neural network, support vector machine.

Downloads: 1727

617 Laser Ultrasonic Imaging Based on Synthetic Aperture Focusing Technique Algorithm

Authors: Sundara Subramanian Karuppasamy, Che Hua Yang

Abstract:

In this work, the laser ultrasound technique is used for analyzing and imaging inner defects in metal blocks. To detect defects in blocks, researchers have traditionally used piezoelectric transducers for the generation and reception of ultrasonic signals. These transducers can be configured as sparse or phased arrays. Both configurations have drawbacks, including the need for many transducers, time-consuming calculations, limited bandwidth, and confined image resolution. Here, we focus on a non-contact method for generating and receiving ultrasound to examine inner defects in aluminum blocks. A Q-switched pulsed laser is used for generation, and reception is done with a Laser Doppler Vibrometer (LDV). Based on the Doppler effect, the LDV provides a rapid, high-spatial-resolution way of sensing ultrasonic waves. From the LDV, a series of scanning points is selected to serve as the phased array elements. A side-drilled hole of 10 mm diameter with a depth of 25 mm is introduced, and the defect is interrogated by the linear array of scanning points obtained from the LDV. With the aid of the Synthetic Aperture Focusing Technique (SAFT) algorithm, which is based on the time-shifting principle, the inspected images are generated from the A-scan data acquired from the 1-D linear phased array elements. The defect can thus be precisely detected with good resolution.

Keywords: Laser ultrasonics, linear phased array, nondestructive testing, synthetic aperture focusing technique, ultrasonic imaging.

Downloads: 932

616 Estimation Model of Dry Docking Duration Using Data Mining

Authors: Isti Surjandari, Riara Novita

Abstract:

Maintenance is one of the most important activities in the shipyard industry. However, it is sometimes not supported by adequate services from the shipyard, and inaccuracy in estimating the duration of ship maintenance is still common. This makes the estimation of ship maintenance duration crucial. This study uses a data mining approach, CART (Classification and Regression Tree), to estimate the duration of ship maintenance limited to dock works, known as dry docking. Using the volume of dock works as an input to estimate the maintenance duration, 4 classes of dry docking duration were obtained, each with a different linear model and job criteria. These linear models can then be used to estimate the duration of dry docking based on job criteria.

Keywords: Classification and regression tree (CART), data mining, dry docking, maintenance duration.

Downloads: 2426

615 A New Algorithm for Cluster Initialization

Authors: Moth'd Belal. Al-Daoud

Abstract:

Clustering is a very well known technique in data mining. One of the most widely used clustering techniques is the k-means algorithm. Solutions obtained from this technique are dependent on the initialization of cluster centers. In this article we propose a new algorithm to initialize the clusters. The proposed algorithm is based on finding a set of medians extracted from a dimension with maximum variance. The algorithm has been applied to different data sets and good results are obtained.
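The abstract gives only the outline of the initialization (medians taken along the dimension of maximum variance), so the following Python sketch is one plausible reading rather than the authors' published procedure. The function name `median_init`, the equal-size split into k groups, and the use of the lower median are all assumptions made for illustration.

```python
from statistics import pvariance


def median_init(points, k):
    """Pick k initial cluster centers from a list of equal-length tuples.

    Assumed reading of the abstract: sort the points along the dimension
    with the largest variance, cut the sorted list into k contiguous
    groups, and use the median point of each group as a center.
    """
    if not 1 <= k <= len(points):
        raise ValueError("k must be between 1 and len(points)")
    dims = len(points[0])
    # Dimension whose values vary the most.
    axis = max(range(dims), key=lambda d: pvariance([p[d] for p in points]))
    ordered = sorted(points, key=lambda p: p[axis])
    n = len(ordered)
    centers = []
    for i in range(k):
        lo = i * n // k            # first index of group i
        hi = (i + 1) * n // k      # one past the last index of group i
        centers.append(ordered[(lo + hi - 1) // 2])  # lower median of the group
    return centers


data = [(0, 0), (1, 10), (2, 20), (3, 30), (4, 40), (5, 50)]
centers = median_init(data, 2)
```

The returned centers would then be passed to an ordinary k-means loop in place of random starting points.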

Keywords: clustering, k-means, data mining.

Downloads: 2095

614 Low Complexity Hybrid Scheme for PAPR Reduction in OFDM Systems Based on SLM and Clipping

Authors: V. Sudha, D. Sriram Kumar

Abstract:

In this paper, we present a low-complexity hybrid scheme using conventional selective mapping (C-SLM) and clipping algorithms to reduce the high peak-to-average power ratio (PAPR) of orthogonal frequency division multiplexing (OFDM) signals. In the proposed scheme, the input data sequence (X) is divided into two sub-blocks; the clipping algorithm is applied to the first sub-block and the C-SLM algorithm to the second, in order to reduce both computational complexity and PAPR. The resultant time-domain OFDM signal is obtained by combining the outputs of the two sub-blocks. The simulation results show that the proposed hybrid scheme provides a 0.45 dB PAPR reduction gain at a CCDF value of 10^-2 and a 52% reduction in computational complexity compared to the C-SLM scheme, at the expense of a slight degradation in bit error rate (BER) performance.
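As a small illustration of the quantities involved, the Python sketch below computes the PAPR of a block of complex time-domain samples and applies amplitude clipping that preserves phase. It covers only the clipping half of the scheme; the C-SLM branch and the sub-block combination are not shown, and the sample values and the threshold `a_max` are made up for the example.

```python
import math


def papr_db(samples):
    """Peak-to-average power ratio of complex samples, in dB."""
    power = [abs(s) ** 2 for s in samples]
    return 10 * math.log10(max(power) / (sum(power) / len(power)))


def clip(samples, a_max):
    """Limit each sample's magnitude to a_max while keeping its phase."""
    return [s if abs(s) <= a_max else a_max * s / abs(s) for s in samples]


# Hypothetical time-domain samples with one large peak.
x = [1 + 0j, 1j, -1 + 0j, 3 + 0j]
before = papr_db(x)
after = papr_db(clip(x, 2.0))
```

Clipping lowers the peak power more than the mean power, so `after` is smaller than `before`; the cost in a real system is in-band distortion, which is consistent with the BER degradation the abstract mentions.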

Keywords: CCDF, Clipping, OFDM, PAPR, SLM.

Downloads: 1255

613 Arsenic Mobility from Mining Tailings of Monte San Nicolas to Presa de Mata in Guanajuato, Mexico

Authors: I. Cano-Aguilera, B. E. Rubio-Campos, G. De la Rosa, A. F. Aguilera-Alvarado

Abstract:

Mining tailings are a source of material rich in heavy metals and a potential danger to public health and the environment, since these metals can, under certain conditions, leach and contaminate aqueous systems that serve as sources of potable water. The strategy for this work is based on observation, experimentation, and simulation, combining the real responses of the hydrodynamic behavior of metals leached from mining tailings with the applied mathematics that provides the logical structure to decipher the individual effects of the general physicochemical phenomenon. The case study presented here focuses on mining tailings deposits located in Monte San Nicolas, Guanajuato, Mexico, an abandoned mine. This was considered the contamination source that, under certain physicochemical conditions, can favor metal leaching and its transport towards aqueous systems. In addition, the cartography, meteorology, geology, and hydrodynamic and hydrological characteristics of the place will help determine the way and the time in which these systems can interact. Preliminary results demonstrated that arsenic is highly mobile, since it was identified in several surface aqueous systems of the micro watershed, as well as in sediments, in concentrations that exceed the maximum limits established in the official norms. Variations in pH and oxidation-reduction potential were also recorded, conditions that favor the presence of different species of this element, its solubility, and therefore its mobility.

Keywords: Arsenic, mining tailings, transport.

Downloads: 1677

612 Time and Distance Dependence of Protons Energy Loss for Laser (pw-ps) Fusion Driven Ion Acceleration

Authors: B. Malekynia

Abstract:

The anomalous generation of plasma blocks by interaction of petawatt-picosecond laser pulses permits side-on ignition of uncompressed solid fusion fuel following an improved application of the hydrodynamic Chu-model for deuterium-tritium. The new possibility of side-on laser ignition depends on accelerated ions and produced ions beams of high energy particles by the nonlinear ponderomotive force of the laser pulse in the plasma block, a re-evaluation of the early hydrodynamic analysis for ignition of inertial fusion by including inhibition factor, collective effect of stopping power of alpha particles and the energy loss rate reabsorption to plasma by the protons of plasma blocks being reduced by about a factor 40.

Keywords: Block ignition, Charged particles, Reabsorption, Skin layer ponderomotive acceleration.

Downloads: 1524

611 Association of Smoking with Chest Radiographic and Lung Function Findings in Retired Bauxite Mining Workers

Authors: L. R. Ferreira, R. C. G. Bianchi, L. C. R. Ferreira, C. M. Galhardi, E. P. Baciuk, L. H. Oliveira

Abstract:

Inhalation hazards are associated with potentially injurious exposure and increased risk of lung diseases within the bauxite mining industry, especially for smelter workers. Smoking is related to decreased lung function and leads to chronic lung diseases. The objective of this study was to evaluate whether smoking is related to functional and radiographic respiratory changes in retired bauxite mining workers. Methods: This was a retrospective, cross-sectional study involving the analysis of database information on 140 retired bauxite mining workers from Poços de Caldas-MG, evaluated at the Worker's Health Reference Center and at the Social Security Brazilian National Institute from July 1st, 2015 until June 30th, 2016. The workers were divided into three groups: non-smokers (n = 47), ex-smokers (n = 46), and smokers (n = 47). The data included age, gender, spirometry results, and the presence or absence of pulmonary pleural and/or parenchymal changes in chest radiographs. The Chi-squared test was used (p < 0.05). Results: In the smokers' group, 83% of spirometry tests and 64% of chest x-rays were altered. In the non-smokers' group, 19% of spirometry tests and 13% of chest x-rays were altered. In the ex-smokers' group, 35% of spirometry tests and 30% of chest x-rays were altered. Most of the results were statistically significant. Results demonstrated a significant difference between the smokers' and non-smokers' groups with regard to spirometric and radiographic pulmonary alterations. The ex-smokers' and non-smokers' groups showed better results than the smokers' group in relation to altered spirometry and radiograph findings. These data may contribute to planning strategies to enhance smoking cessation programs within the bauxite mining industry.

Keywords: Bauxite mining, spirometry, chest radiography, smoking.

Downloads: 681

610 Recycled Aggregates from Construction and Demolition Waste in the Production of Concrete Blocks

Authors: Juan A. Ferriz-Papi, Simon Thomas

Abstract:

The construction industry generates large amounts of waste, usually mixed, which can be composed of materials of different origins, most of them catalogued as non-hazardous. The European Union targets for this waste for 2020 have already been achieved by the UK, but mainly through downcycling processes (backfilling), whereas upcycling (such as recycling into new concrete batches) remains at a low percentage. The aim of this paper is to explore further the use of recycled aggregates from construction and demolition waste (CDW) in concrete mixes so as to improve upcycling. The most recent research and legislation applied in the UK regarding the production of concrete blocks is reviewed. As a case study, initial tests were carried out with a CDW recycled aggregate sample from a CDW plant in Swansea. The composition of two samples was determined by visual inspection and sieving tests and compared to original aggregates. More than 70% consisted of soil waste from excavation, and the rest was a mix of waste from mortar, concrete, and ceramics with small traces of plaster, glass, and organic matter. Two concrete mixes were made with 80% replacement by recycled aggregates and different water/cement ratios. Tests were carried out for slump, absorption, density, and compression strength. The results were compared to a reference sample and showed a substantial reduction in quality in both mixes. Despite that, the discussion identifies different aspects to resolve, such as heterogeneity and composition, and analyzes them with a view to the successful use of these recycled aggregates in the production of concrete blocks. The conclusions obtained can help increase the share of upcycling processes that use mixed CDW as recycled aggregates in concrete mixes.

Keywords: Recycled aggregate, concrete, concrete block, construction and demolition waste, recycling.

Downloads: 2007

609 Applying Fuzzy FP-Growth to Mine Fuzzy Association Rules

Authors: Chien-Hua Wang, Wei-Hsuan Lee, Chin-Tzong Pang

Abstract:

In data mining, association rules are used to find associations between the different items of a transaction database. As data are collected and stored, valuable rules can be found through association rule mining, which can be applied to help managers execute marketing strategies and establish sound market frameworks. This paper aims to use Fuzzy Frequent Pattern growth (FFP-growth) to derive fuzzy association rules. First, we apply fuzzy partition methods and decide a membership function of quantitative value for each transaction item. Next, we implement FFP-growth to carry out the data mining process. In addition, in order to understand the impact of the Apriori algorithm and the FFP-growth algorithm on the execution time and the number of generated association rules, experiments are performed using different database sizes and thresholds. Lastly, the experimental results show that the FFP-growth algorithm is more efficient than other existing methods.

Keywords: Data mining, association rule, fuzzy frequent pattern growth.

Downloads: 1794

608 Applications of Genetic Programming in Data Mining

Authors: Saleh Mesbah Elkaffas, Ahmed A. Toony

Abstract:

This paper details the application of a genetic programming framework for induction of useful classification rules from a database of income statements, balance sheets, and cash flow statements for North American public companies. Potentially interesting classification rules are discovered. Anomalies in the discovery process merit further investigation of the application of genetic programming to the dataset for the problem domain.

Keywords: Genetic programming, data mining, classification rule.

Downloads: 1534

607 UB-Tree Indexing for Semantic Query Optimization of Range Queries

Authors: S. Housseno, A. Simonet, M. Simonet

Abstract:

Semantic query optimization consists of restricting the search space in order to reduce the set of objects of interest for a query. This paper presents an indexing method based on UB-trees and a static analysis of the constraints associated with the views of the database and with any constraint expressed on attributes. The result of the static analysis is a partitioning of the object space into disjoint blocks. Through Space Filling Curve (SFC) techniques, each fragment (block) of the partition is assigned a unique identifier, enabling the efficient indexing of fragments by UB-trees. The search space corresponding to a range query is restricted to a subset of the blocks of the partition. This approach has been developed in the context of a KB-DBMS, but it can be applied to any relational system.
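To make the space-filling-curve step concrete, the Python sketch below computes a Z-order (Morton) address for a two-dimensional cell by interleaving coordinate bits; the Z-curve is the ordering UB-trees are built on. The two-dimensional grid, the `bits` width, and the function name `z_address` are assumptions for illustration; the paper's blocks come from its static constraint analysis, which is not reproduced here.

```python
def z_address(x, y, bits=16):
    """Interleave the bits of x and y into a single Z-order address.

    Bit i of x goes to position 2*i, bit i of y to position 2*i + 1,
    so cells that are close in space tend to get nearby addresses.
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


# Order the cells of a 2 x 2 grid along the Z-curve.
cells = [(1, 1), (0, 1), (1, 0), (0, 0)]
ordered = sorted(cells, key=lambda c: z_address(*c))
```

Because every cell maps to one integer, a one-dimensional index such as a B-tree can store the blocks, and a range query becomes a set of address intervals to scan.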

Keywords: Index, Range query, UB-tree, Space Filling Curve, Query optimization, Views, Database, Integrity Constraint, Classification.

Downloads: 1491

606 Utilization of Rice Husk Ash with Clay to Produce Lightweight Coarse Aggregates for Concrete

Authors: Shegufta Zahan, Muhammad A. Zahin, Muhammad M. Hossain, Raquib Ahsan

Abstract:

Rice Husk Ash (RHA) is one of the agricultural waste byproducts widely available in the world and contains a large amount of silica. In Bangladesh, stones cannot be used as coarse aggregate in infrastructure works, as they are not available locally and need to be imported. As a result, bricks are mostly used as coarse aggregates in concrete, as they are cheaper and easily produced locally. Clay is the raw material for producing brick. Due to rapid urban growth and the industrial revolution, demand for brick is increasing, which has led to a decrease in topsoil. This study aims to produce lightweight block aggregates with sufficient strength using RHA at low cost and to use them as an ingredient of concrete. RHA, because of its pozzolanic behavior, can be used to produce better quality block aggregates at lower cost, replacing clay content in the bricks. The study is divided into three parts. In the first part, characterization tests on RHA and clay were performed to determine their properties. Six different types of RHA from different mills were characterized by XRD and SEM analysis. Their fineness was determined by a fineness test. The XRD results confirmed the amorphous state of RHA. The characterization test for clay identified the sample as "silty clay" with a specific gravity of 2.59 and 14% optimum moisture content. In the second part, blocks were produced with six different types of RHA in different combinations by volume with clay. The mixtures were manually compacted in molds and then oven-dried at 120 °C for 7 days. After that, the dried blocks were placed in a furnace at 1200 °C to produce the final blocks. Loss on ignition, apparent density, crushing strength, efflorescence, and absorption tests were conducted on the blocks to compare their performance with that of bricks. For 40% RHA, the crushing strength was found to be 60 MPa, whereas the crushing strength of brick was 48.1 MPa.
In the third part, the crushed blocks were used as coarse aggregate in concrete cylinders and compared with brick concrete cylinders. Specimens were cured for 7 days and 28 days. The highest compressive strength of the block cylinders was 26.1 MPa after 7 days of curing and 34 MPa after 28 days. For brick cylinders, the compressive strength after 7 days and 28 days of curing was 20 MPa and 30 MPa, respectively. These findings can help ease the increasing demand for topsoil and also turn a waste product into a valuable one.

Keywords: Characterization, furnace, pozzolanic behavior, rice husk ash.

Downloads: 457

605 Novelty as a Measure of Interestingness in Knowledge Discovery

Authors: Vasudha Bhatnagar, Ahmed Sultan Al-Hegami, Naveen Kumar

Abstract:

Rule Discovery is an important technique for mining knowledge from large databases. Use of objective measures for discovering interesting rules leads to another data mining problem, although of reduced complexity. Data mining researchers have studied subjective measures of interestingness to reduce the volume of discovered rules to ultimately improve the overall efficiency of KDD process. In this paper we study novelty of the discovered rules as a subjective measure of interestingness. We propose a hybrid approach based on both objective and subjective measures to quantify novelty of the discovered rules in terms of their deviations from the known rules (knowledge). We analyze the types of deviation that can arise between two rules and categorize the discovered rules according to the user specified threshold. We implement the proposed framework and experiment with some public datasets. The experimental results are promising.

Keywords: Knowledge Discovery in Databases (KDD), Interestingness, Subjective Measures, Novelty Index.

Downloads: 1798

604 Hybrid Reliability-Similarity-Based Approach for Supervised Machine Learning

Authors: Walid Cherif

Abstract:

Data mining has seen big advances in recent years because of the spread of the internet, which generates a tremendous volume of data every day, and because of immense advances in the technologies that facilitate the analysis of these data. In particular, classification techniques are a subdomain of data mining that determine the group to which each data instance in a given dataset belongs. They are used to classify data into different classes according to desired criteria. Generally, a classification technique is either statistical or based on machine learning, and each type has its own limits. Data are becoming increasingly heterogeneous; consequently, current classification techniques are encountering many difficulties. This paper defines new measure functions to quantify the resemblance between instances and then combines them in a new approach that differs from existing algorithms in its reliability computations. The proposed approach outperformed the most common classification techniques, with an f-measure exceeding 97% on the IRIS dataset.

Keywords: Data mining, knowledge discovery, machine learning, similarity measurement, supervised classification.

Downloads: 1518

603 Comparative Analysis of Two Approaches to Joint Signal Detection, ToA and AoA Estimation in Multi-Element Antenna Arrays

Authors: Olesya Bolkhovskaya, Alexey Davydov, Alexander Maltsev

Abstract:

In this paper, two approaches to joint signal detection and time of arrival (ToA) and angle of arrival (AoA) estimation in a multi-element antenna array are investigated. Two scenarios are considered: in the first, the waveform of the useful signal is known a priori; in the second, the waveform of the desired signal is unknown. For the first scenario, antenna array signal processing based on multi-element matched filtering (MF), followed by a non-coherent detection scheme and maximum likelihood (ML) parameter estimation blocks, is used. For the second scenario, signal processing based on estimation of the covariance matrix of the antenna array elements, followed by eigenvector analysis and ML parameter estimation blocks, is applied. The performance characteristics of both signal processing schemes are thoroughly investigated and compared for different useful signals and noise parameters.

Keywords: Antenna array, signal detection, ToA, AoA estimation.

Downloads: 2038

602 Generation of Photo-Mosaic Images through Block Matching and Color Adjustment

Authors: Hae-Yeoun Lee

Abstract:

Mosaic refers to a technique that makes an image by gathering many small pieces of material in various colors. This paper presents an automatic algorithm that makes a photo-mosaic image from photos. The algorithm is composed of 4 steps: partition and feature extraction, block matching, redundancy removal, and color adjustment. The input image is partitioned into small blocks from which features are extracted. Each block is matched to a similar photo in the database by comparing similarity using the Euclidean difference between blocks. The intensity of the block is adjusted to enhance the similarity of the image by replacing its light and dark values with those of the relevant block. Further, the quality of the image is improved by minimizing the redundancy of tiles in adjacent blocks. Experimental results show that the proposed algorithm performs well in both quantitative and qualitative analysis.
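The block-matching step can be sketched in Python as a nearest-neighbour search under Euclidean distance. The abstract does not say which feature is extracted from each block, so the mean RGB color used here is an assumption, as are the function names and the tiny photo library.

```python
import math


def mean_color(block):
    """Average (r, g, b) of a block given as a list of (r, g, b) pixels."""
    n = len(block)
    return tuple(sum(pixel[c] for pixel in block) / n for c in range(3))


def best_match(block, library):
    """Return the name of the library photo whose feature is closest.

    `library` maps a photo name to its precomputed mean color; closeness
    is the Euclidean distance between the two feature vectors.
    """
    target = mean_color(block)
    return min(library, key=lambda name: math.dist(target, library[name]))


# Hypothetical photo database reduced to one feature vector per photo.
library = {"sunset": (255, 0, 0), "ocean": (0, 0, 255)}
block = [(200, 10, 10), (220, 30, 20)]
choice = best_match(block, library)
```

The redundancy-removal and intensity-adjustment steps would run after this choice and are not shown.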

Keywords: Photo-mosaic, Euclidean distance, Block matching, Intensity adjustment.

Downloads: 3558

601 A Cumulative Learning Approach to Data Mining Employing Censored Production Rules (CPRs)

Authors: Rekha Kandwal, Kamal K. Bharadwaj

Abstract:

Knowledge is indispensable, but voluminous knowledge becomes a bottleneck for efficient processing. A great challenge for data mining is the generation of a large number of potential rules as a result of the mining process. In fact, the result size is sometimes comparable to the original data. Traditional data mining pruning measures such as support do not sufficiently reduce the huge rule space. Moreover, many practical applications are characterized by continual change of data and knowledge, thereby making knowledge voluminous with each change. The most predominant representation of the discovered knowledge is the standard Production Rule (PR) of the form If P Then D. Michalski & Winston proposed Censored Production Rules (CPRs), an extension of production rules that exhibit variable precision and support an efficient mechanism for handling exceptions. A CPR is an augmented production rule of the form If P Then D Unless C, where C (Censor) is an exception to the rule. Such rules are employed in situations in which the conditional statement 'If P Then D' holds frequently and the assertion C holds rarely. With a rule of this type, we are free to ignore the exception condition when the resources needed to establish its presence are tight or there is simply no information available as to whether it holds. Thus the 'If P Then D' part of the CPR expresses important information, while the Unless C part acts only as a switch that changes the polarity of D to ~D. In this paper, a scheme based on a Dempster-Shafer Theory (DST) interpretation of a CPR is suggested for discovering CPRs from the discovered flat PRs. The discovery of CPRs from flat rules would result in a considerable reduction of the already discovered rules. The proposed scheme incrementally incorporates new knowledge and also reduces the size of the knowledge base considerably with each episode. Examples are given to demonstrate the behaviour of the proposed scheme.
The suggested cumulative learning scheme would be useful in mining data streams.
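The rule form If P Then D Unless C can be illustrated with a short Python sketch. It shows only the switching behaviour of the censor described in the abstract; the Dempster-Shafer interpretation and the discovery of CPRs from flat rules are not modelled, and the function name and the bird example are hypothetical.

```python
def apply_cpr(p, c=None):
    """Evaluate a censored production rule "If P Then D Unless C".

    Returns True for D, False for ~D, and None when the premise P does
    not hold and the rule says nothing. Passing c=None models the case
    where the censor was not checked (for example, when resources are
    tight), in which case the rule falls back to "If P Then D".
    """
    if not p:
        return None
    if c:
        return False  # the censor holds, so the polarity flips to ~D
    return True


# Hypothetical rule: If bird Then flies Unless penguin.
quick_answer = apply_cpr(p=True)             # censor ignored
careful_answer = apply_cpr(p=True, c=True)   # censor checked and found to hold
```

The two calls show the variable-precision trade-off: the quick answer costs less but can be wrong in the rare cases where the censor holds.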

Keywords: Censored production rules, cumulative learning, data mining, machine learning.

Downloads: 1475

600 Improving the Performance of Proxy Server by Using Data Mining Technique

Authors: P. Jomsri

Abstract:

Currently, web usage generates a huge amount of data from user activity. In general, a proxy server is a system that supports users' web usage and can be managed using hit rates. This research tries to improve hit rates in a proxy system by applying a data mining technique. The data set was collected from proxy servers in the university, and relationships were investigated based on several features. The model is used to predict future website accesses. The association rule technique is applied to obtain the relations among Date, Time, Main Group web, Sub Group web, and Domain name in order to create the model. The results showed that this technique can predict web content for the next day; moreover, the future accesses of websites increased from 38.15% to 85.57%. This model can predict web page accesses, which tends to increase the efficiency of proxy servers as a result. In addition, the performance of internet access will be improved, which helps to reduce network traffic.

Keywords: Association rule, proxy server, data mining.

Downloads: 3049

599 Classifier Based Text Mining for Neural Network

Authors: M. Govindarajan, R. M. Chandrasekaran

Abstract:

Applying knowledge discovery techniques to unstructured text is termed knowledge discovery in text (KDT), text data mining, or text mining. In neural networks that address classification problems, the training set, the testing set, and the learning rate are key considerations: respectively, the collection of input/output patterns used to train the network, the collection used to assess network performance, and the rate of adjustments. This paper describes a proposed back propagation neural net classifier that performs cross validation for the original neural network in order to optimize classification accuracy and training time. The feasibility and benefits of the proposed approach are demonstrated by means of five data sets: contact-lenses, cpu, weather symbolic, Weather, and labor-nega-data. It is shown that, compared to the existing neural network, training is more than 10 times faster when the dataset is larger than cpu or the network has many hidden units, while accuracy ('percent correct') was the same for all datasets except contact-lenses, which is the only one with missing attributes. For contact-lenses, the accuracy with the proposed neural network was on average around 0.3% lower than with the original neural network. This algorithm is independent of specific data sets, so many ideas and solutions can be transferred to other classifier paradigms.

Keywords: Back propagation, classification accuracy, text mining, time complexity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4206

598 Text Mining Technique for Data Mining Application

Authors: M. Govindarajan

Abstract:

Applying knowledge discovery techniques to unstructured text is termed knowledge discovery in text (KDT), text data mining, or text mining. The decision tree approach is most useful in classification problems. With this technique, a tree is constructed to model the classification process. There are two basic steps in the technique: building the tree and applying the tree to the database. This paper describes a proposed C5.0 classifier that performs rulesets, cross validation, and boosting for the original C5.0 in order to reduce the error ratio. The feasibility and the benefits of the proposed approach are demonstrated by means of a medical data set, hypothyroid. It is shown that the performance of a classifier on the training cases from which it was constructed gives a poor estimate of its accuracy on new cases. Accuracy can instead be estimated by sampling or by using a separate test file; either way, the classifier is evaluated on cases that were not used to build it, although the estimate is reliable only when the numbers of cases used to build and evaluate the classifier are both large. If the cases in hypothyroid.data and hypothyroid.test were to be shuffled and divided into a new 2772-case training set and a 1000-case test set, C5.0 might construct a different classifier with a lower or higher error rate on the test cases. An important feature of See5 is its ability to generate classifiers called rulesets. The ruleset has an error rate of 0.5% on the test cases. The standard errors of the means provide an estimate of the variability of results. One way to get a more reliable estimate of predictive accuracy is f-fold cross-validation. The error rate of a classifier produced from all the cases is estimated as the ratio of the total number of errors on the hold-out cases to the total number of cases. The Boost option with x trials instructs See5 to construct up to x classifiers in this manner. Trials over numerous datasets, large and small, show that on average 10-classifier boosting reduces the error rate for test cases by about 25%.

Keywords: C5.0, Error Ratio, text mining, training data, test data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2477
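
The cross-validation estimate described in the abstract above (total errors on the hold-out cases divided by the total number of cases) can be sketched as follows. This is only an illustration: the learner is a stand-in that predicts the most common training label, not C5.0 or See5, and the labels and the round-robin fold assignment are assumptions.

```python
from collections import Counter


def train_majority(labels):
    """Stand-in learner: always predicts the most common training label."""
    return Counter(labels).most_common(1)[0][0]


def cross_validation_error(labels, folds):
    """Estimate error as total hold-out errors divided by total cases."""
    n = len(labels)
    errors = 0
    for fold in range(folds):
        # Case i is held out in fold (i mod folds); the rest are training cases.
        held_out = [i for i in range(n) if i % folds == fold]
        training = [labels[i] for i in range(n) if i % folds != fold]
        prediction = train_majority(training)
        errors += sum(1 for i in held_out if labels[i] != prediction)
    return errors / n


# Hypothetical class labels: 8 negative cases followed by 2 positive ones.
labels = ["neg"] * 8 + ["pos"] * 2
error = cross_validation_error(labels, folds=5)
print(f"estimated error rate: {error:.2f}")
```

Every case is held out exactly once, so the denominator is the full number of cases rather than the size of a single fold.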

597 Analysis of a Population of Diabetic Patients Databases with Classifiers

Authors: Murat Koklu, Yavuz Unal

Abstract:

Data mining can be described as a technique for extracting information from data. It is the process of obtaining hidden information and then turning it into qualified knowledge through statistical and artificial intelligence techniques. One of its application areas is medicine, where it is used to form decision support systems for diagnosis by deriving meaningful information from given medical data. In this study, a decision support system for the diagnosis of illness is built that makes use of data mining and three different artificial intelligence classifier algorithms, namely Multilayer Perceptron, Naive Bayes Classifier, and J.48. The Pima Indian dataset from the UCI Machine Learning Repository was used. This dataset includes urinary and blood test results of 768 patients. These test results consist of 8 different features. The classification results obtained were compared with previous studies, and suggestions for future studies are presented.

Keywords: Artificial Intelligence, Classifiers, Data Mining, Diabetic Patients.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5418

596 On the Optimality of Blocked Main Effects Plans

Authors: Rita SahaRay, Ganesh Dutta

Abstract:

In this article, experimental situations are considered where a main effects plan is to be used to study m two-level factors using n runs which are partitioned into b blocks, not necessarily of the same size. Assuming the block sizes to be even for all blocks, for the case n ≡ 2 (mod 4), optimal designs are obtained with respect to type 1 and type 2 optimality criteria in the class of designs providing estimation of all main effects orthogonal to the block effects. In practice, such orthogonal estimation of main effects is often a desirable condition. In the wider class of all available m two-level even-sized blocked main effects plans, where the factors do not occur at high and low levels equally often in each block, E-optimal designs are also characterized. Simple construction methods based on Hadamard matrices and the Kronecker product for these optimal designs are presented.

Keywords: Design matrix, Hadamard matrix, Kronecker product, type 1 criteria, type 2 criteria.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1047
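
The two building blocks named in the abstract above, Hadamard matrices and the Kronecker product, can be illustrated with the standard Sylvester construction. This sketch does not reproduce the authors' optimal designs (their n ≡ 2 (mod 4) case needs more than a plain Hadamard matrix); it only shows how the Kronecker product yields a ±1 matrix with mutually orthogonal rows. The function names are chosen here for illustration.

```python
def kronecker(a, b):
    """Kronecker product of two matrices given as lists of rows."""
    return [
        [x * y for x in row_a for y in row_b]
        for row_a in a
        for row_b in b
    ]


def sylvester_hadamard(order):
    """Build a Hadamard matrix whose order is a power of two."""
    h = [[1]]
    while len(h) < order:
        h = kronecker([[1, 1], [1, -1]], h)
    if len(h) != order:
        raise ValueError("order must be a power of two for this construction")
    return h


def is_hadamard(h):
    """Check that H times its transpose equals n times the identity."""
    n = len(h)
    return all(
        sum(h[i][k] * h[j][k] for k in range(n)) == (n if i == j else 0)
        for i in range(n)
        for j in range(n)
    )


h4 = sylvester_hadamard(4)
for row in h4:
    print(row)
print("Hadamard:", is_hadamard(h4))
```

Reading the rows as runs and the columns after the first as two-level factors (+1 high, -1 low) gives a small orthogonal main effects plan, which is the usual reason Hadamard matrices appear in this setting.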

595 Multi-Dimensional Concerns Mining for Web Applications via Concept-Analysis

Authors: Carlo Bellettini, Alessandro Marchetto, Andrea Trentini

Abstract:

Web applications have become very complex and crucial, especially when combined with areas such as CRM (Customer Relationship Management) and BPR (Business Process Reengineering). The scientific community has therefore focused attention on Web application design, development, analysis, and testing by studying and proposing methodologies and tools. This paper proposes an approach to automatic multi-dimensional concern mining for Web applications, based on concepts analysis, impact analysis, and token-based concern identification. This approach lets the user analyse and traverse Web software relevant to a particular concern (concept, goal, purpose, etc.) via multi-dimensional separation of concerns, in order to document, understand, and test Web applications. This technique was developed in the context of the WAAT (Web Applications Analysis and Testing) project. A semi-automatic tool to support this technique is currently under development.

Keywords: Concepts Analysis, Concerns Mining, Multi-Dimensional Separation of Concerns, Impact Analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1460

594 Ensemble Approach for Predicting Student's Academic Performance

Authors: L. A. Muhammad, M. S. Argungu

Abstract:

Educational data mining (EDM) has received substantial attention. Data mining techniques of one kind or another have been proposed to uncover hidden knowledge in educational data. The results of such studies assist academic institutions in further enhancing their learning processes and their methods of passing knowledge to students. Consequently, student performance is boosted and educational products are undoubtedly enhanced. This study adopted a student performance prediction model premised on data mining techniques with Students' Essential Features (SEF). SEF are linked to the learner's interactivity with the e-learning management system. The performance of the student prediction model is assessed by a set of classifiers, viz. Bayes Network, Logistic Regression, and Reduced Error Pruning (REP) Tree. Subsequently, the ensemble methods of Bagging, Boosting, and Random Forest (RF) are applied to improve the performance of these single classifiers. The results reveal a strong relationship between learners' behaviors and their academic attainment. They also show that the REP Tree and its ensemble record the highest accuracy, 83.33%, using SEF. In terms of the Receiver Operating Characteristic (ROC) curve, the boosting method of REP Tree records 0.903, which is the best. This result further demonstrates the dependability of the proposed model.

Keywords: Ensemble, bagging, Random Forest, boosting, data mining, classifiers, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 733
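
Bagging, one of the ensemble methods named in the abstract above, can be sketched in a few lines. The base learner here is a one-threshold decision stump standing in for the classifiers used in the study, and the (activity score, outcome) rows are made up for illustration. Each stump is trained on a bootstrap sample and the ensemble predicts by majority vote.

```python
import random
from collections import Counter


def train_stump(rows):
    """Fit a one-threshold classifier on (score, label) rows."""
    best_errors, best_model = None, None
    for threshold in sorted({score for score, _ in rows}):
        left = [label for score, label in rows if score < threshold]
        right = [label for score, label in rows if score >= threshold]
        right_label = Counter(right).most_common(1)[0][0]
        # With nothing below the threshold, reuse the label from the right side.
        left_label = Counter(left).most_common(1)[0][0] if left else right_label
        errors = sum(label != left_label for label in left)
        errors += sum(label != right_label for label in right)
        if best_errors is None or errors < best_errors:
            best_errors, best_model = errors, (threshold, left_label, right_label)
    return best_model


def predict_stump(model, score):
    threshold, left_label, right_label = model
    return right_label if score >= threshold else left_label


def bagging(rows, n_models, seed=0):
    """Train each stump on a bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    return [train_stump([rng.choice(rows) for _ in rows]) for _ in range(n_models)]


def predict_bagged(models, score):
    """Majority vote over the individual stump predictions."""
    votes = Counter(predict_stump(model, score) for model in models)
    return votes.most_common(1)[0][0]


# Hypothetical rows: (e-learning activity score, attainment level).
rows = [(1, "low"), (2, "low"), (3, "low"), (4, "low"),
        (6, "high"), (7, "high"), (8, "high"), (9, "high")]
models = bagging(rows, n_models=11)
print(predict_bagged(models, 1), predict_bagged(models, 9))
```

An odd number of models avoids tied votes in a two-class problem; the seed only makes the bootstrap samples reproducible.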

593 Arabic Light Stemmer for Better Search Accuracy

Authors: Sahar Khedr, Dina Sayed, Ayman Hanafy

Abstract:

Arabic is one of the most ancient and critical languages in the world. It has more than 250 million native speakers, and more than twenty countries have Arabic as one of their official languages. In the past decade, we have witnessed a rapid evolution in smart devices, social networks, and the technology sector, which has created a need for tools and libraries that properly handle the Arabic language in different domains. Stemming is one of the most crucial linguistic fundamentals. It is used in many applications, especially in the information extraction and text mining fields. The motivation behind this work is to enhance the Arabic light stemmer to serve the data mining industry and to make it available to the open source community. The presented implementation enhances the Arabic light stemmer by utilizing and improving an algorithm that provides an extension for a new set of rules and patterns, accompanied by an adjusted procedure. This study demonstrates a significant enhancement in search accuracy, with an average 10% improvement in comparison with previous works.

Keywords: Arabic data mining, Arabic Information extraction, Arabic Light stemmer, Arabic stemmer.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1484
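
Light stemming, the technique the abstract above builds on, strips a small set of common prefixes and suffixes without attempting to find the root. The sketch below is a generic illustration and not the authors' enhanced stemmer: the affix lists, the minimum stem length, and the rule of removing at most one prefix and one suffix are all assumptions.

```python
# Illustrative affix lists (assumed here; longer prefixes are tried first).
PREFIXES = ("وال", "بال", "كال", "فال", "لل", "ال", "و")
SUFFIXES = ("ها", "ان", "ات", "ون", "ين", "ية", "ه", "ة", "ي")


def light_stem(word, min_length=3):
    """Strip at most one prefix and one suffix, keeping at least min_length letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_length:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_length:
            word = word[:-len(suffix)]
            break
    return word


for word in ("والكتاب", "المعلمون", "كتاب"):
    print(word, "->", light_stem(word))
```

Indexing and querying with the same stemmer lets a search for one surface form match documents that contain another, which is how stemming affects search accuracy.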

592 Mining Network Data for Intrusion Detection through Naïve Bayesian with Clustering

Authors: Dewan Md. Farid, Nouria Harbi, Suman Ahmmed, Md. Zahidur Rahman, Chowdhury Mofizur Rahman

Abstract:

Network security attacks are violations of information security policy that have received much attention from the computational intelligence community in recent decades. Data mining has become a very useful technique for detecting network intrusions by extracting useful knowledge from large amounts of network data or logs. The naïve Bayesian classifier is one of the most popular data mining algorithms for classification, and it provides an optimal way to predict the class of an unknown example. It has been observed that one set of probabilities derived from the data is not enough to achieve a good classification rate. In this paper, we propose a new learning algorithm for mining network logs to detect network intrusions through a naïve Bayesian classifier, which first clusters the network logs into several groups based on the similarity of the logs, and then calculates the prior and conditional probabilities for each group of logs. To classify a new log, the algorithm checks which cluster the log belongs to and then uses that cluster's probability set to classify the new log. We tested the performance of our proposed algorithm on the KDD99 benchmark network intrusion detection dataset, and the experimental results showed that it improves detection rates and reduces false positives for different types of network intrusions.

Keywords: Clustering, detection rate, false positive, naïve Bayesian classifier, network intrusion detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5529
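
The two-stage idea in the abstract above (group the logs, estimate a separate probability set per group, then classify a new log with its own group's probabilities) can be sketched as follows. The abstract does not specify the similarity measure, so this sketch substitutes a deliberately simple grouping by protocol; the records, the feature layout, and the add-one smoothing are also assumptions.

```python
from collections import Counter, defaultdict


class NaiveBayes:
    """Categorical naive Bayes with add-one smoothing."""

    def __init__(self, rows):
        self.total = len(rows)
        self.class_counts = Counter(label for _, label in rows)
        self.value_counts = defaultdict(Counter)  # (position, label) -> value counts
        self.values = defaultdict(set)            # position -> distinct values seen
        for features, label in rows:
            for position, value in enumerate(features):
                self.value_counts[(position, label)][value] += 1
                self.values[position].add(value)

    def predict(self, features):
        scores = {}
        for label, count in self.class_counts.items():
            score = count / self.total  # prior probability of the class
            for position, value in enumerate(features):
                seen = self.value_counts[(position, label)][value]
                score *= (seen + 1) / (count + len(self.values[position]))
            scores[label] = score
        return max(scores, key=scores.get)


def train_clustered(rows, cluster_of):
    """Fit one naive Bayes model per cluster of records."""
    groups = defaultdict(list)
    for features, label in rows:
        groups[cluster_of(features)].append((features, label))
    return {key: NaiveBayes(group) for key, group in groups.items()}


def classify(models, cluster_of, features):
    """Use the probability set of the cluster the record falls into."""
    return models[cluster_of(features)].predict(features)


def by_protocol(features):
    # Stand-in for a real similarity-based clustering step.
    return features[0]


# Hypothetical records: ((protocol, service, flag), label).
rows = [
    (("tcp", "http", "SF"), "normal"),
    (("tcp", "http", "SF"), "normal"),
    (("tcp", "http", "S0"), "attack"),
    (("tcp", "telnet", "S0"), "attack"),
    (("udp", "dns", "SF"), "normal"),
    (("udp", "dns", "SF"), "normal"),
    (("udp", "other", "SF"), "attack"),
]
models = train_clustered(rows, by_protocol)
print(classify(models, by_protocol, ("tcp", "http", "S0")))
print(classify(models, by_protocol, ("udp", "dns", "SF")))
```

A record whose cluster was never seen in training raises a `KeyError` here; a fuller version would fall back to a model trained on all records.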

591 Investigating Crime Hotspot Places and their Implication to Urban Environmental Design: A Geographic Visualization and Data Mining Approach

Authors: Donna R. Tabangin, Jacqueline C. Flores, Nelson F. Emperador

Abstract:

Information is power. Geographical information is an emerging science that is advancing the development of knowledge to further help in understanding the relationship of "place" with other disciplines such as crime. The researchers used crime data for the years 2004 to 2007 from the Baguio City Police Office to determine the incidence and actual locations of crime hotspots. A combined qualitative and quantitative research methodology was employed through extensive fieldwork and observation, geographic visualization with Geographic Information Systems (GIS) and Global Positioning Systems (GPS), and data mining. The paper discusses emerging geographic visualization and data mining tools and methodologies that can be used to generate baseline data for environmental initiatives such as urban renewal and rejuvenation. The study demonstrated that crime hotspots can be computed and that they occur in select places in the Central Business District (CBD) of Baguio City. It was observed that some characteristics of the hotspot places, namely physical design and milieu, may play an important role in creating opportunities for crime. A list of these environmental attributes was generated. This derived information may be used to guide the design or redesign of the urban environment of the City in order to reduce crime and at the same time improve the City physically.

Keywords: Crime mapping, data mining, environmental design, geographic visualization, GIS.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2600
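
The abstract above states that crime hotspots can be computed but does not say how. One simple way to do it, shown here purely as an assumed illustration, is to bin incident coordinates into square grid cells and flag the cells whose incident count reaches a threshold. The coordinates, cell size, and threshold are hypothetical.

```python
import math
from collections import Counter


def hotspot_cells(points, cell_size, min_incidents):
    """Count incidents per grid cell and keep cells at or above the threshold."""
    counts = Counter(
        (math.floor(x / cell_size), math.floor(y / cell_size))
        for x, y in points
    )
    return {cell: n for cell, n in counts.items() if n >= min_incidents}


# Hypothetical incident locations in metres on a local grid.
incidents = [(10, 12), (40, 30), (45, 48), (20, 44),
             (120, 80), (130, 95), (260, 10)]
hotspots = hotspot_cells(incidents, cell_size=50, min_incidents=3)
print(hotspots)
```

The flagged cells can then be overlaid on a GIS layer and compared with field observations of each location's physical design.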