Search results for: association rule mining.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1151

Search results for: association rule mining.

911 Availability of Sports Facilities does not explain the Association between Economic Environment and Physical Inactivity in a Southern European city

Authors: Cruz Pascual, Enrique Regidor, Paloma Ortega, David Martínez, Paloma Astasio

Abstract:

This paper evaluates the association between economic environment in the districts of Madrid (Spain) and physical inactivity, using income per capita as indicator of economic environment. The analysis included 6,601 individuals aged 16 to 74 years. The measure of association estimated was the prevalence odds ratio for physical inactivity by income per capita. After adjusting for sex, age, and individual socioeconomic characteristics, people living in the districts with the lowest per capita income had an odds ratio for physical inactivity 1.58 times higher (95% confidence interval 1.35 to 1.85) than those living in districts with the highest per capita income. Additional adjustment for the availability of sports facilities in each district did not decrease the magnitude of the association. These findings show that the widely believed assumption that the availability of sports and recreational facilities, as a possible explanation for the relation between economic environment and physical inactivity, cannot be considered a universal observation.

Keywords: Economic environment, physical inactivity, sports facilities, districts, Madrid, Spain

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1850
910 Rule Based Architecture for Collaborative Multidisciplinary Aircraft Design Optimisation

Authors: Nickolay Jelev, Andy Keane, Carren Holden, András Sóbester

Abstract:

In aircraft design, the jump from the conceptual to preliminary design stage introduces a level of complexity which cannot be realistically handled by a single optimiser, be that a human (chief engineer) or an algorithm. The design process is often partitioned along disciplinary lines, with each discipline given a level of autonomy. This introduces a number of challenges including, but not limited to: coupling of design variables; coordinating disciplinary teams; handling of large amounts of analysis data; reaching an acceptable design within time constraints. A number of classical Multidisciplinary Design Optimisation (MDO) architectures exist in academia specifically designed to address these challenges. Their limited use in the industrial aircraft design process has inspired the authors of this paper to develop an alternative strategy based on well established ideas from Decision Support Systems. The proposed rule based architecture sacrifices possibly elusive guarantees of convergence for an attractive return in simplicity. The method is demonstrated on analytical and aircraft design test cases and its performance is compared to a number of classical distributed MDO architectures.

Keywords: Multidisciplinary design optimisation, rule based architecture, aircraft design, decision support system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1071
909 Data Mining for Cancer Management in Egypt Case Study: Childhood Acute Lymphoblastic Leukemia

Authors: Nevine M. Labib, Michael N. Malek

Abstract:

Data Mining aims at discovering knowledge out of data and presenting it in a form that is easily comprehensible to humans. One of the useful applications in Egypt is the Cancer management, especially the management of Acute Lymphoblastic Leukemia or ALL, which is the most common type of cancer in children. This paper discusses the process of designing a prototype that can help in the management of childhood ALL, which has a great significance in the health care field. Besides, it has a social impact on decreasing the rate of infection in children in Egypt. It also provides valubale information about the distribution and segmentation of ALL in Egypt, which may be linked to the possible risk factors. Undirected Knowledge Discovery is used since, in the case of this research project, there is no target field as the data provided is mainly subjective. This is done in order to quantify the subjective variables. Therefore, the computer will be asked to identify significant patterns in the provided medical data about ALL. This may be achieved through collecting the data necessary for the system, determimng the data mining technique to be used for the system, and choosing the most suitable implementation tool for the domain. The research makes use of a data mining tool, Clementine, so as to apply Decision Trees technique. We feed it with data extracted from real-life cases taken from specialized Cancer Institutes. Relevant medical cases details such as patient medical history and diagnosis are analyzed, classified, and clustered in order to improve the disease management.

Keywords: Data Mining, Decision Trees, Knowledge Discovery, Leukemia.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2215
908 Data Mining Approach for Commercial Data Classification and Migration in Hybrid Storage Systems

Authors: Mais Haj Qasem, Maen M. Al Assaf, Ali Rodan

Abstract:

Parallel hybrid storage systems consist of a hierarchy of different storage devices that vary in terms of data reading speed performance. As we ascend in the hierarchy, data reading speed becomes faster. Thus, migrating the application’ important data that will be accessed in the near future to the uppermost level will reduce the application I/O waiting time; hence, reducing its execution elapsed time. In this research, we implement trace-driven two-levels parallel hybrid storage system prototype that consists of HDDs and SSDs. The prototype uses data mining techniques to classify application’ data in order to determine its near future data accesses in parallel with the its on-demand request. The important data (i.e. the data that the application will access in the near future) are continuously migrated to the uppermost level of the hierarchy. Our simulation results show that our data migration approach integrated with data mining techniques reduces the application execution elapsed time when using variety of traces in at least to 22%.

Keywords: Data mining, hybrid storage system, recurrent neural network, support vector machine.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1736
907 Directors’ Duties, Civil Liability, and the Business Judgment Rule under the Portuguese Legal Framework

Authors: Marisa Catarina da Conceição Dinis

Abstract:

The commercial companies’ management has suffered an important material and legal transformation in the last years, mainly related to the changes in the Portuguese legal framework and because of the fact they were recently object of great expansion. In fact, next to the smaller family businesses, whose management is regularly assumed by partners, companies with social investment highly scattered, whose owners are completely out from administration, are now arising. In those particular cases, the business transactions are much more complex and require from the companies’ managers a highly technical knowledge and some specific professionals’ skills and abilities. This kind of administration carries a high-level risk that can both result in great success or in great losses. Knowing that the administration performance can result in important losses to the companies, the Portuguese legislator has created a legal structure to impute them some responsibilities and sanctions. The main goal of this study is to analyze the Portuguese law and some jurisprudence about companies’ management rules and about the conflicts between the directors and the company. In order to achieve these purposes we have to consider, on the one hand, the legal duties directly connected to the directors’ functions and on the other hand the disrespect for those same rules. The Portuguese law in this matter, influenced by the common law, determines that the directors’ attitude should be guided by loyalty and honesty. Consequently, we must reflect in which cases the administrators should respond to losses that they might cause to companies as a result of their duties’ disrespect. In this way is necessary to study the business judgment rule wich is a rule that refers to a liability exclusion rule. We intend, in the same way, to evaluate if the civil liability that results from the directors’ duties disrespect can extend itself to those who have elected them ignoring or even knowing that they don´t have the necessary skills or appropriate knowledge to the position they hold. To charge directors’, without ruining entrepreneurship, charging, in the same way, those who select them reinforces the need for more responsible and cautious attitudes which will lead consequently to more confidence in the markets.

Keywords: Duty of loyalty, duty of care, business judgment rule, civil liability of directors.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1252
906 On Preprocessing of Speech Signals

Authors: Ayaz Keerio, Bhargav Kumar Mitra, Philip Birch, Rupert Young, Chris Chatwin

Abstract:

Preprocessing of speech signals is considered a crucial step in the development of a robust and efficient speech or speaker recognition system. In this paper, we present some popular statistical outlier-detection based strategies to segregate the silence/unvoiced part of the speech signal from the voiced portion. The proposed methods are based on the utilization of the 3 σ edit rule, and the Hampel Identifier which are compared with the conventional techniques: (i) short-time energy (STE) based methods, and (ii) distribution based methods. The results obtained after applying the proposed strategies on some test voice signals are encouraging.

Keywords: STE based methods, Mahalanobis distance, 3 edit σ rule, Hampel Identifier.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1710
905 The Hybrid Knowledge Model for Product Development Management

Authors: Heejung Lee, Hyo-Won Suh

Abstract:

Hybrid knowledge model is suggested as an underlying framework for product development management. It can support such hybrid features as ontologies and rules. Effective collaboration in product development environment depends on sharing and reasoning product information as well as engineering knowledge. Many studies have considered product information and engineering knowledge. However, most previous research has focused either on building the ontology of product information or rule-based systems of engineering knowledge. This paper shows that F-logic based knowledge model can support such desirable features in a hybrid way.

Keywords: Ontology, rule, F-logic, product development.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1475
904 Two Cases of VACTERL Association in Pregnancy with Lymphocyte Therapy

Authors: Seyed Mazyar Mortazavi, Masod Memari, Hasan Ali Ahmadi, Zhaleh Abed

Abstract:

VACTERL association is a rare disorder with various congenital malformations. The aetiology remains unknown. Combination of at least three congenital anomalies of the following criteria is required for diagnosis: vertebral defects, anal atresia, cardiac anomalies, tracheo-esophageal fistula, renal anomalies, and limb defects. The first case was 1-day old male neonate with multiple congenital anomalies was bore from 28 years old mother. The mother had history of pregnancy with lymphocyte therapy. His anomalies included: defects in thoracic and lumbar vertebral, anal atresia, bilateral hydronephrosis, atrial septal defect, and lower limb abnormality. Other anomalies were cryptorchidism and nasal canal narrowing. The second case was born with 32 weeks gestational age from mother with history of pregnancy with lymphocyte therapy. He had thoracic vertebral defect, cardiac anomalies and renal defect. diagnosis based on clinical finding is VACTERL association. Early diagnosis is very important to investigation and treatment of other coexistence anomalies. VACTERL association in mothers with history of pregnancy with lymphocyte therapy has suggested possibly of relationship between VACTERL association and this method of pregnancy.

Keywords: Anal atresia, tracheo-esophageal fistula, atrial septal defect, lymphocyte therapy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2545
903 Estimation Model of Dry Docking Duration Using Data Mining

Authors: Isti Surjandari, Riara Novita

Abstract:

Maintenance is one of the most important activities in the shipyard industry. However, sometimes it is not supported by adequate services from the shipyard, where inaccuracy in estimating the duration of the ship maintenance is still common. This makes estimation of ship maintenance duration is crucial. This study uses Data Mining approach, i.e., CART (Classification and Regression Tree) to estimate the duration of ship maintenance that is limited to dock works or which is known as dry docking. By using the volume of dock works as an input to estimate the maintenance duration, 4 classes of dry docking duration were obtained with different linear model and job criteria for each class. These linear models can then be used to estimate the duration of dry docking based on job criteria.

Keywords: Classification and regression tree (CART), data mining, dry docking, maintenance duration.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2433
902 A New Algorithm for Cluster Initialization

Authors: Moth'd Belal. Al-Daoud

Abstract:

Clustering is a very well known technique in data mining. One of the most widely used clustering techniques is the k-means algorithm. Solutions obtained from this technique are dependent on the initialization of cluster centers. In this article we propose a new algorithm to initialize the clusters. The proposed algorithm is based on finding a set of medians extracted from a dimension with maximum variance. The algorithm has been applied to different data sets and good results are obtained.

Keywords: clustering, k-means, data mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2104
901 Arsenic Mobility from Mining Tailings of Monte San Nicolas to Presa de Mata in Guanajuato, Mexico

Authors: I. Cano-Aguilera, B. E. Rubio-Campos, G. De la Rosa, A. F. Aguilera-Alvarado

Abstract:

Mining tailings represent a generating source of rich heavy metal material with a potential danger the public health and the environment, since these metals, under certain conditions, can leach and contaminate aqueous systems that serve like supplying potable water sources. The strategy for this work is based on the observation, experimentation and the simulation that can be obtained by binding real answers of the hydrodynamic behavior of metals leached from mining tailings, and the applied mathematics that provides the logical structure to decipher the individual effects of the general physicochemical phenomenon. The case of study presented herein focuses on mining tailings deposits located in Monte San Nicolas, Guanajuato, Mexico, an abandoned mine. This was considered the contamination source that under certain physicochemical conditions can favor the metal leaching, and its transport towards aqueous systems. In addition, the cartography, meteorology, geology and the hydrodynamics and hydrological characteristics of the place, will be helpful in determining the way and the time in which these systems can interact. Preliminary results demonstrated that arsenic presents a great mobility, since this one was identified in several superficial aqueous systems of the micro watershed, as well as in sediments in concentrations that exceed the established maximum limits in the official norms. Also variations in pH and potential oxide-reduction were registered, conditions that favor the presence of different species from this element its solubility and therefore its mobility.

Keywords: Arsenic, mining tailings, transport.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1689
900 Chinese Event Detection Technique Based on Dependency Parsing and Rule Matching

Authors: Weitao Lin

Abstract:

To quickly extract adequate information from large-scale unstructured text data, this paper studies the representation of events in Chinese scenarios and performs the regularized abstraction. It proposes a Chinese event detection technique based on dependency parsing and rule matching. The method first performs dependency parsing on the original utterance, then performs pattern matching at the word or phrase granularity based on the results of dependent syntactic analysis, filters out the utterances with prominent non-event characteristics, and obtains the final results. The experimental results show the effectiveness of the method.

Keywords: Natural Language Processing, Chinese event detection, rules matching, dependency parsing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 175
899 Judicial Institutions in a Post-Conflict Society: Gaining Legitimacy through a Holistic Reform

Authors: Abdul Salim Amin

Abstract:

This paper focuses on how judiciaries in post-conflict societies can gain legitimacy through reformation. Legitimacy plays a pivotal role in shaping people’s behavior to submit to the law and verifies the rightfulness of an organ for taking binding decisions. Among various dynamics, judicial independence, access to justice and behavioral changes of the judicial officials broadly contribute to legitimation of judiciary in general, and the courts in particular. Increasing independence of judiciary through reform limits, inter alia, government interference in judicial issues and protects basic rights of the citizens. Judicial independence does not only matter in institutional terms, individual independence also influences the impartiality and integrity of judges, which can be increased through education and better administration of justice. Finally, access to justice as an intertwined concept both at the legal and moral spectrum of judicial reform avails justice to the citizens and increases the level of public trust and confidence. Efficient legal decisions on fostering such elements through holistic reform create a rule of law atmosphere. Citizens neither accept an illegitimate judiciary nor do they trust its decisions. Lack of such tolerance and confidence deters the rule of law and thus, undermines the democratic development of a society.

Keywords: Legitimacy, judicial reform, judicial independence, access to justice, legal training, informal justice, rule of law.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1543
898 Thematic Role Extraction Using Shallow Parsing

Authors: Mehrnoush Shamsfard, Maryam Sadr Mousavi

Abstract:

Extracting thematic (semantic) roles is one of the major steps in representing text meaning. It refers to finding the semantic relations between a predicate and syntactic constituents in a sentence. In this paper we present a rule-based approach to extract semantic roles from Persian sentences. The system exploits a twophase architecture to (1) identify the arguments and (2) label them for each predicate. For the first phase we developed a rule based shallow parser to chunk Persian sentences and for the second phase we developed a knowledge-based system to assign 16 selected thematic roles to the chunks. The experimental results of testing each phase are shown at the end of the paper.

Keywords: Natural Language Processing, Semantic RoleLabeling, Shallow parsing, Thematic Roles.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2027
897 Hybrid Reliability-Similarity-Based Approach for Supervised Machine Learning

Authors: Walid Cherif

Abstract:

Data mining has, over recent years, seen big advances because of the spread of internet, which generates everyday a tremendous volume of data, and also the immense advances in technologies which facilitate the analysis of these data. In particular, classification techniques are a subdomain of Data Mining which determines in which group each data instance is related within a given dataset. It is used to classify data into different classes according to desired criteria. Generally, a classification technique is either statistical or machine learning. Each type of these techniques has its own limits. Nowadays, current data are becoming increasingly heterogeneous; consequently, current classification techniques are encountering many difficulties. This paper defines new measure functions to quantify the resemblance between instances and then combines them in a new approach which is different from actual algorithms by its reliability computations. Results of the proposed approach exceeded most common classification techniques with an f-measure exceeding 97% on the IRIS Dataset.

Keywords: Data mining, knowledge discovery, machine learning, similarity measurement, supervised classification.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1529
896 Server Virtualization Using User Behavior Model Focus on Provisioning Concept

Authors: D. Prangchumpol

Abstract:

Server provisioning is one of the most attractive topics in virtualization systems. Virtualization is a method of running multiple independent virtual operating systems on a single physical computer. It is a way of maximizing physical resources to maximize the investment in hardware. Additionally, it can help to consolidate servers, improve hardware utilization and reduce the consumption of power and physical space in the data center. However, management of heterogeneous workloads, especially for resource utilization of the server, or so called provisioning becomes a challenge. In this paper, a new concept for managing workloads based on user behavior is presented. The experimental results show that user behaviors are different in each type of service workload and time. Understanding user behaviors may improve the efficiency of management in provisioning concept. This preliminary study may be an approach to improve management of data centers running heterogeneous workloads for provisioning in virtualization system.

Keywords: association rule, provisioning, server virtualization.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1723
895 Hypertension and Its Association with Oral Health Status in Adults: A Pilot Study in Padusunan Adults Community

Authors: Murniwati, Nurul Khairiyah, Putri Ovieza Maizar

Abstract:

The association between general and oral health is clearly important, particularly in adults with medical conditions. Many of the medical systemic conditions are either caused or aggravated by poor oral hygiene and vice versa. Hypertension is one of common medical systemic problem which has been a public health concern worldwide due to its known consequences. Those consequences must be related to oral health status as well, whether it may cause or worsen the oral health conditions. The objective of this study was to find out the association between hypertension and oral health status in adults. This study was an analytical observational study by using cross-sectional method. A total of 42 adults both male and female in Padusunan Village, Pariaman, West Sumatra, Indonesia were selected as subjects by using purposive sampling. Manual sphygmomanometer was used to measure blood pressure and dental examination was performed to calculate the decayed, missing, and filled teeth (DMFT) scores in order to represent oral health status. The data obtained was analyzed statistically using One Way ANOVA to determine the association between hypertensive adults and their oral health status. The result showed that majority age of the subjects was ranging from 51-70 years (40.5%). Based on blood pressure examination, 57.1% of subjects were classified to prehypertension. Overall, the mean of DMFT score calculated in normal, prehypertension and hypertension group was not considered statistically significant. There was no significant association (p>0.05) between hypertension and oral health status in adults.

Keywords: Blood pressure, hypertension, DMFT, oral health status.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1774
894 CSTR Control by Using Model Reference Adaptive Control and PSO

Authors: Neha Khanduja

Abstract:

This paper presents a comparative analysis of continuously stirred tank reactor (CSTR) control based on adaptive control and optimal tuning of PID control based on particle swarm optimization. In the design of adaptive control, Model reference adaptive control (MRAC) scheme is used, in which the adaptation law have been developed by MIT rule & Lyapunov’s rule. In PSO control parameters of PID controller is tuned by using the concept of particle swarm optimization to get optimized operating point for minimum integral square error (ISE) condition. The results show the adjustment of PID parameters converting into the optimal operating point and the good control response can be obtained by the PSO technique.

Keywords: Model reference adaptive control (MRAC), optimal control, particle swarm optimization (PSO).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2338
893 Emotion Classification by Incremental Association Language Features

Authors: Jheng-Long Wu, Pei-Chann Chang, Shih-Ling Chang, Liang-Chih Yu, Jui-Feng Yeh, Chin-Sheng Yang

Abstract:

The Major Depressive Disorder has been a burden of medical expense in Taiwan as well as the situation around the world. Major Depressive Disorder can be defined into different categories by previous human activities. According to machine learning, we can classify emotion in correct textual language in advance. It can help medical diagnosis to recognize the variance in Major Depressive Disorder automatically. Association language incremental is the characteristic and relationship that can discovery words in sentence. There is an overlapping-category problem for classification. In this paper, we would like to improve the performance in classification in principle of no overlapping-category problems. We present an approach that to discovery words in sentence and it can find in high frequency in the same time and can-t overlap in each category, called Association Language Features by its Category (ALFC). Experimental results show that ALFC distinguish well in Major Depressive Disorder and have better performance. We also compare the approach with baseline and mutual information that use single words alone or correlation measure.

Keywords: Association language features, Emotion Classification, Overlap-Category Feature, Nature Language Processing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1897
892 Implementation of Generalized Plasticity in Load-Deformation Behavior of Foundation with Emphasis on Localization Problem

Authors: A. H. Akhaveissy

Abstract:

Nonlinear finite element method with eight noded isoparametric quadrilateral element is used for prediction of loaddeformation behavior including bearing capacity of foundations. Modified generalized plasticity model with non-associated flow rule is applied for analysis of soil-footing system. Also Von Mises and Tresca criterions are used for simulation of soil behavior. Modified generalized plasticity model is able to simulate load-deformation including softening behavior. Localization phenomena are considered by different meshes. Localization phenomena have not been seen in the examples. Predictions by modified generalized plasticity model show good agreement with laboratory data and theoretical prediction in comparison the other models.

Keywords: Localization phenomena, Generalized plasticity, Non-associated Flow Rule

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1595
891 Classifier Based Text Mining for Neural Network

Authors: M. Govindarajan, R. M. Chandrasekaran

Abstract:

Text Mining is around applying knowledge discovery techniques to unstructured text is termed knowledge discovery in text (KDT), or Text data mining or Text Mining. In Neural Network that address classification problems, training set, testing set, learning rate are considered as key tasks. That is collection of input/output patterns that are used to train the network and used to assess the network performance, set the rate of adjustments. This paper describes a proposed back propagation neural net classifier that performs cross validation for original Neural Network. In order to reduce the optimization of classification accuracy, training time. The feasibility the benefits of the proposed approach are demonstrated by means of five data sets like contact-lenses, cpu, weather symbolic, Weather, labor-nega-data. It is shown that , compared to exiting neural network, the training time is reduced by more than 10 times faster when the dataset is larger than CPU or the network has many hidden units while accuracy ('percent correct') was the same for all datasets but contact-lences, which is the only one with missing attributes. For contact-lences the accuracy with Proposed Neural Network was in average around 0.3 % less than with the original Neural Network. This algorithm is independent of specify data sets so that many ideas and solutions can be transferred to other classifier paradigms.

Keywords: Back propagation, classification accuracy, textmining, time complexity.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4218
890 Text Mining Technique for Data Mining Application

Authors: M. Govindarajan

Abstract:

Text Mining is around applying knowledge discovery techniques to unstructured text is termed knowledge discovery in text (KDT), or Text data mining or Text Mining. In decision tree approach is most useful in classification problem. With this technique, tree is constructed to model the classification process. There are two basic steps in the technique: building the tree and applying the tree to the database. This paper describes a proposed C5.0 classifier that performs rulesets, cross validation and boosting for original C5.0 in order to reduce the optimization of error ratio. The feasibility and the benefits of the proposed approach are demonstrated by means of medial data set like hypothyroid. It is shown that, the performance of a classifier on the training cases from which it was constructed gives a poor estimate by sampling or using a separate test file, either way, the classifier is evaluated on cases that were not used to build and evaluate the classifier are both are large. If the cases in hypothyroid.data and hypothyroid.test were to be shuffled and divided into a new 2772 case training set and a 1000 case test set, C5.0 might construct a different classifier with a lower or higher error rate on the test cases. An important feature of see5 is its ability to classifiers called rulesets. The ruleset has an error rate 0.5 % on the test cases. The standard errors of the means provide an estimate of the variability of results. One way to get a more reliable estimate of predictive is by f-fold –cross- validation. The error rate of a classifier produced from all the cases is estimated as the ratio of the total number of errors on the hold-out cases to the total number of cases. The Boost option with x trials instructs See5 to construct up to x classifiers in this manner. Trials over numerous datasets, large and small, show that on average 10-classifier boosting reduces the error rate for test cases by about 25%.

Keywords: C5.0, Error Ratio, text mining, training data, test data.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2489
889 Analysis of a Population of Diabetic Patients Databases with Classifiers

Authors: Murat Koklu, Yavuz Unal

Abstract:

Data mining can be called as a technique to extract information from data. It is the process of obtaining hidden information and then turning it into qualified knowledge by statistical and artificial intelligence technique. One of its application areas is medical area to form decision support systems for diagnosis just by inventing meaningful information from given medical data. In this study a decision support system for diagnosis of illness that make use of data mining and three different artificial intelligence classifier algorithms namely Multilayer Perceptron, Naive Bayes Classifier and J.48. Pima Indian dataset of UCI Machine Learning Repository was used. This dataset includes urinary and blood test results of 768 patients. These test results consist of 8 different feature vectors. Obtained classifying results were compared with the previous studies. The suggestions for future studies were presented.

Keywords: Artificial Intelligence, Classifiers, Data Mining, Diabetic Patients.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5431
888 Multi-Dimensional Concerns Mining for Web Applications via Concept-Analysis

Authors: Carlo Bellettini, Alessandro Marchetto, Andrea Trentini

Abstract:

Web applications have become very complex and crucial, especially when combined with areas such as CRM (Customer Relationship Management) and BPR (Business Process Reengineering), the scientific community has focused attention to Web applications design, development, analysis, and testing, by studying and proposing methodologies and tools. This paper proposes an approach to automatic multi-dimensional concern mining for Web Applications, based on concepts analysis, impact analysis, and token-based concern identification. This approach lets the user to analyse and traverse Web software relevant to a particular concern (concept, goal, purpose, etc.) via multi-dimensional separation of concerns, to document, understand and test Web applications. This technique was developed in the context of WAAT (Web Applications Analysis and Testing) project. A semi-automatic tool to support this technique is currently under development.

Keywords: Concepts Analysis, Concerns Mining, Multi-Dimensional Separation of Concerns, Impact Analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1473
887 Ensemble Approach for Predicting Student's Academic Performance

Authors: L. A. Muhammad, M. S. Argungu

Abstract:

Educational data mining (EDM) has recorded substantial considerations. Techniques of data mining in one way or the other have been proposed to dig out out-of-sight knowledge in educational data. The result of the study got assists academic institutions in further enhancing their process of learning and methods of passing knowledge to students. Consequently, the performance of students boasts and the educational products are by no doubt enhanced. This study adopted a student performance prediction model premised on techniques of data mining with Students' Essential Features (SEF). SEF are linked to the learner's interactivity with the e-learning management system. The performance of the student's predictive model is assessed by a set of classifiers, viz. Bayes Network, Logistic Regression, and Reduce Error Pruning Tree (REP). Consequently, ensemble methods of Bagging, Boosting, and Random Forest (RF) are applied to improve the performance of these single classifiers. The study reveals that the result shows a robust affinity between learners' behaviors and their academic attainment. Result from the study shows that the REP Tree and its ensemble record the highest accuracy of 83.33% using SEF. Hence, in terms of the Receiver Operating Curve (ROC), boosting method of REP Tree records 0.903, which is the best. This result further demonstrates the dependability of the proposed model.

Keywords: Ensemble, bagging, Random Forest, boosting, data mining, classifiers, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 766
886 Arabic Light Stemmer for Better Search Accuracy

Authors: Sahar Khedr, Dina Sayed, Ayman Hanafy

Abstract:

Arabic is one of the most ancient and critical languages in the world. It has over than 250 million Arabic native speakers and more than twenty countries having Arabic as one of its official languages. In the past decade, we have witnessed a rapid evolution in smart devices, social network and technology sector which led to the need to provide tools and libraries that properly tackle the Arabic language in different domains. Stemming is one of the most crucial linguistic fundamentals. It is used in many applications especially in information extraction and text mining fields. The motivation behind this work is to enhance the Arabic light stemmer to serve the data mining industry and leverage it in an open source community. The presented implementation works on enhancing the Arabic light stemmer by utilizing and enhancing an algorithm that provides an extension for a new set of rules and patterns accompanied by adjusted procedure. This study has proven a significant enhancement for better search accuracy with an average 10% improvement in comparison with previous works.

Keywords: Arabic data mining, Arabic Information extraction, Arabic Light stemmer, Arabic stemmer.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1499
885 Mining Network Data for Intrusion Detection through Naïve Bayesian with Clustering

Authors: Dewan Md. Farid, Nouria Harbi, Suman Ahmmed, Md. Zahidur Rahman, Chowdhury Mofizur Rahman

Abstract:

Network security attacks are the violation of information security policy that received much attention to the computational intelligence society in the last decades. Data mining has become a very useful technique for detecting network intrusions by extracting useful knowledge from large number of network data or logs. Naïve Bayesian classifier is one of the most popular data mining algorithm for classification, which provides an optimal way to predict the class of an unknown example. It has been tested that one set of probability derived from data is not good enough to have good classification rate. In this paper, we proposed a new learning algorithm for mining network logs to detect network intrusions through naïve Bayesian classifier, which first clusters the network logs into several groups based on similarity of logs, and then calculates the prior and conditional probabilities for each group of logs. For classifying a new log, the algorithm checks in which cluster the log belongs and then use that cluster-s probability set to classify the new log. We tested the performance of our proposed algorithm by employing KDD99 benchmark network intrusion detection dataset, and the experimental results proved that it improves detection rates as well as reduces false positives for different types of network intrusions.

Keywords: Clustering, detection rate, false positive, naïveBayesian classifier, network intrusion detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 5536
884 Investigating Crime Hotspot Places and their Implication to Urban Environmental Design: A Geographic Visualization and Data Mining Approach

Authors: Donna R. Tabangin, Jacqueline C. Flores, Nelson F. Emperador

Abstract:

Information is power. Geographical information is an emerging science that is advancing the development of knowledge to further help in the understanding of the relationship of “place" with other disciplines such as crime. The researchers used crime data for the years 2004 to 2007 from the Baguio City Police Office to determine the incidence and actual locations of crime hotspots. Combined qualitative and quantitative research methodology was employed through extensive fieldwork and observation, geographic visualization with Geographic Information Systems (GIS) and Global Positioning Systems (GPS), and data mining. The paper discusses emerging geographic visualization and data mining tools and methodologies that can be used to generate baseline data for environmental initiatives such as urban renewal and rejuvenation. The study was able to demonstrate that crime hotspots can be computed and were seen to be occurring to some select places in the Central Business District (CBD) of Baguio City. It was observed that some characteristics of the hotspot places- physical design and milieu may play an important role in creating opportunities for crime. A list of these environmental attributes was generated. This derived information may be used to guide the design or redesign of the urban environment of the City to be able to reduce crime and at the same time improve it physically.

Keywords: Crime mapping, data mining, environmental design, geographic visualization, GIS.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2623
883 Knowledge-Driven Decision Support System Based on Knowledge Warehouse and Data Mining by Improving Apriori Algorithm with Fuzzy Logic

Authors: Pejman Hosseinioun, Hasan Shakeri, Ghasem Ghorbanirostam

Abstract:

In recent years, we have seen an increasing importance of research and study on knowledge source, decision support systems, data mining and procedure of knowledge discovery in data bases and it is considered that each of these aspects affects the others. In this article, we have merged information source and knowledge source to suggest a knowledge based system within limits of management based on storing and restoring of knowledge to manage information and improve decision making and resources. In this article, we have used method of data mining and Apriori algorithm in procedure of knowledge discovery one of the problems of Apriori algorithm is that, a user should specify the minimum threshold for supporting the regularity. Imagine that a user wants to apply Apriori algorithm for a database with millions of transactions. Definitely, the user does not have necessary knowledge of all existing transactions in that database, and therefore cannot specify a suitable threshold. Our purpose in this article is to improve Apriori algorithm. To achieve our goal, we tried using fuzzy logic to put data in different clusters before applying the Apriori algorithm for existing data in the database and we also try to suggest the most suitable threshold to the user automatically.

Keywords: Decision support system, data mining, knowledge discovery, data discovery, fuzzy logic.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2132
882 Discovering Complex Regularities by Adaptive Self Organizing Classification

Authors: A. Faro, D. Giordano, F. Maiorana

Abstract:

Data mining uses a variety of techniques each of which is useful for some particular task. It is important to have a deep understanding of each technique and be able to perform sophisticated analysis. In this article we describe a tool built to simulate a variation of the Kohonen network to perform unsupervised clustering and support the entire data mining process up to results visualization. A graphical representation helps the user to find out a strategy to optmize classification by adding, moving or delete a neuron in order to change the number of classes. The tool is also able to automatically suggest a strategy for number of classes optimization.The tool is used to classify macroeconomic data that report the most developed countries? import and export. It is possible to classify the countries based on their economic behaviour and use an ad hoc tool to characterize the commercial behaviour of a country in a selected class from the analysis of positive and negative features that contribute to classes formation.

Keywords: Unsupervised classification, Kohonen networks, macroeconomics, Visual data mining, cluster interpretation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1564