Search results for: frequent itemset
187 Frequent Itemset Mining Using Rough-Sets
Authors: Usman Qamar, Younus Javed
Abstract:
Frequent pattern mining is the process of finding patterns (sets of items, subsequences, substructures, etc.) that occur frequently in a data set. It was proposed in the context of frequent itemsets and association rule mining, and it is used to find inherent regularities in data, for example, which products are often purchased together. Its applications include basket data analysis, cross-marketing, catalog design, sales campaign analysis, Web log (click stream) analysis, and DNA sequence analysis. However, one bottleneck of frequent itemset mining is that, as the data grow, the time and resources required to mine them increase exponentially. In this investigation, a new algorithm is proposed that can be used as a pre-processor for frequent itemset mining. FASTER (FeAture SelecTion using Entropy and Rough sets) is a hybrid pre-processor algorithm that uses entropy and rough sets to carry out record reduction and feature (attribute) selection, respectively. Applied to frequent itemset mining, FASTER can produce a speed-up of 3.1 times compared to the original algorithm while maintaining an accuracy of 71%.
Keywords: Rough-sets, Classification, Feature Selection, Entropy, Outliers, Frequent itemset mining.
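FASTER's feature-selection step relies on rough sets, but the abstract does not spell out the procedure. As a minimal sketch, assuming labeled records with a decision attribute (as in classic rough-set feature selection), the code below computes the standard rough-set dependency degree and runs the greedy QuickReduct search. This is a well-known technique and not necessarily the one FASTER uses; the function names and toy records are hypothetical.

```python
from collections import defaultdict

def dependency(records, attrs, decision):
    """Rough-set dependency degree |POS_attrs(decision)| / |U|: the fraction of
    records whose equivalence class under `attrs` (records with identical
    values on attrs) contains a single decision value."""
    classes = defaultdict(list)
    for record in records:
        classes[tuple(record[a] for a in attrs)].append(record[decision])
    positive = sum(len(ds) for ds in classes.values() if len(set(ds)) == 1)
    return positive / len(records)

def quick_reduct(records, conditions, decision):
    """Greedy QuickReduct: repeatedly add the attribute that most increases the
    dependency until the subset is as informative as all condition attributes."""
    target = dependency(records, conditions, decision)
    reduct = []
    while dependency(records, reduct, decision) < target:
        best = max((a for a in conditions if a not in reduct),
                   key=lambda a: dependency(records, reduct + [a], decision))
        reduct.append(best)
    return reduct

# Hypothetical toy records: three condition attributes and a decision "play".
records = [
    {"outlook": "sunny", "windy": False, "humid": True, "play": "no"},
    {"outlook": "sunny", "windy": True, "humid": True, "play": "no"},
    {"outlook": "rain", "windy": True, "humid": False, "play": "no"},
    {"outlook": "rain", "windy": False, "humid": False, "play": "yes"},
    {"outlook": "overcast", "windy": False, "humid": True, "play": "yes"},
]
print(quick_reduct(records, ["outlook", "windy", "humid"], "play"))  # ['outlook', 'windy']
```

Here `humid` is dropped because `outlook` and `windy` together already determine the decision; a reduced attribute set of this kind is what would then be passed on to the frequent itemset miner.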
186 Generating Frequent Patterns through Intersection between Transactions
Authors: M. Jamali, F. Taghiyareh
Abstract:
This paper considers the problem of frequent itemset mining. A new technique is proposed to generate frequent patterns in large databases without time-consuming candidate generation. The technique focuses on transactions instead of itemsets: rather than creating itemsets and computing their frequencies, it takes the intersection between each transaction and the other transactions and computes the maximum set of items shared between them. Applied to real-life transactions, with some assumptions taken from real-life data, the technique achieves significant efficiency in generating association rules from databases.
Keywords: Association rules, data mining, frequent patterns, shared itemset.
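One possible reading of the intersection idea, sketched under stated assumptions: candidate patterns are taken from the pairwise intersections of transactions rather than generated level by level, and each shared itemset is then counted against the database. The helper `shared_itemsets` and the sample basket data are hypothetical, and the paper's exact procedure (for example, how it selects the maximum shared items) may differ.

```python
from itertools import combinations

def shared_itemsets(transactions, min_support):
    """Hypothetical sketch: derive candidate patterns from pairwise transaction
    intersections instead of enumerating itemsets level by level."""
    txs = [frozenset(t) for t in transactions]
    shared = set()
    for a, b in combinations(txs, 2):
        common = a & b              # items shared by this pair of transactions
        if common:
            shared.add(common)
    # Count, for each shared itemset, how many transactions contain it.
    result = {}
    for itemset in shared:
        support = sum(1 for t in txs if itemset <= t)
        if support >= min_support:
            result[itemset] = support
    return result

transactions = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"bread", "milk", "butter"},
    {"eggs", "butter"},
]
patterns = shared_itemsets(transactions, 2)
# {bread, milk} has support 3; {eggs} and {butter} each have support 2.
```

Pairwise intersection is quadratic in the number of transactions, and it only surfaces itemsets that are exactly the overlap of some pair (here {bread} alone is not reported even though its support is 3), so this sketch trades completeness for avoiding candidate enumeration.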
185 Mining Frequent Patterns with Functional Programming
Authors: Nittaya Kerdprasop, Kittisak Kerdprasop
Abstract:
Frequent patterns are patterns, such as sets of features or items, that appear frequently in data. Finding such frequent patterns has become an important data mining task because it reveals associations, correlations, and many other interesting relationships hidden in a dataset. Most proposed frequent pattern mining algorithms have been implemented in imperative programming languages such as C, Cµ, and Java. The imperative paradigm is significantly inefficient when the itemset is large and the frequent patterns are long. We suggest a high-level declarative style of programming using a functional language. Our supposition is that frequent pattern discovery can be implemented efficiently and concisely in a functional paradigm, since pattern matching is a fundamental feature supported by most functional languages. Our frequent pattern mining implementation in Haskell confirms our hypothesis about the conciseness of the program. Performance studies of speed and memory usage support our intuition about the efficiency of the functional language.
Keywords: Association, frequent pattern mining, functional programming, pattern matching.
184 Automata Theory Approach for Solving Frequent Pattern Discovery Problems
Authors: Renáta Iváncsy, István Vajk
Abstract:
The various types of frequent pattern discovery problems, namely the frequent itemset, sequence, and graph mining problems, are solved in different ways that are nevertheless similar in certain aspects. The main approaches to discovering such patterns fall into two classes: level-wise methods and database projection-based methods. Level-wise algorithms generally use clever indexing structures to discover the patterns. This paper proposes a new approach, based on the level-wise strategy, for efficiently discovering frequent sequences and tree-like patterns. Because level-wise algorithms spend much of their time on subpattern testing, the new approach uses automata theory to solve this problem.
Keywords: Frequent pattern discovery, graph mining, pushdown automaton, sequence mining, state machine, tree mining.
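The subpattern test that level-wise miners repeat for every candidate can be phrased as running an automaton. The paper targets sequences and trees (its keywords mention pushdown automata); the sketch below covers only the simplest case, a finite-state machine that decides whether a candidate sequence occurs as a gap-allowed subsequence. It is not the authors' construction, and the function names and toy database are illustrative.

```python
def subsequence_automaton(pattern):
    """Transition function of a finite-state machine that accepts any sequence
    containing `pattern` as a (not necessarily contiguous) subsequence.
    State k means the first k symbols of the pattern have been matched."""
    accept = len(pattern)

    def step(state, symbol):
        if state < accept and symbol == pattern[state]:
            return state + 1    # advance on the next expected symbol
        return state            # otherwise stay put (gaps are allowed)

    return step, accept

def contains(sequence, pattern):
    step, accept = subsequence_automaton(pattern)
    state = 0
    for symbol in sequence:
        state = step(state, symbol)
        if state == accept:
            return True
    return accept == 0          # the empty pattern is contained in everything

def support(database, pattern):
    """Number of sequences in the database that contain the candidate."""
    return sum(contains(seq, pattern) for seq in database)

db = ["abcd", "acbd", "bdca", "aabd"]
print(support(db, "abd"))  # 3 ("bdca" never reaches the accepting state)
```

Greedy advancement is safe for subsequence containment, which is why a single linear-size state machine per candidate is enough in this simple case.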
183 An Efficient Approach to Mining Frequent Itemsets on Data Streams
Authors: Sara Ansari, Mohammad Hadi Sadreddini
Abstract:
The increasing importance of data streams arising in a wide range of advanced applications has led to extensive study of mining frequent patterns over them. Mining data streams poses many new challenges, among which are the one-scan nature, the unbounded memory requirement, and the high arrival rate of data streams. In this paper, we propose a new approach for mining itemsets on data streams. Our approach, SFIDS, is based on the FIDS algorithm. The main aims were to keep some advantages of the previous approach while resolving some of its drawbacks, and consequently to improve run time and memory consumption. Our approach has the following advantages: it uses a lattice-like data structure to keep frequent itemsets; it separates regions from each other by deleting common nodes, which decreases search space, memory consumption, and run time; and, finally, considering CPU constraints, when an increasing data arrival rate overloads the system, SFIDS automatically detects this situation and discards some of the unprocessed data. Based on a probabilistic technique, we guarantee that the error of the results is bounded by a user-specified threshold. Final results show that SFIDS achieves about a 50% run-time improvement over the FIDS approach.
Keywords: Data stream, frequent itemset, stream mining.
182 Content Based Sampling over Transactional Data Streams
Authors: Mansour Tarafdar, Mohammad Saniee Abade
Abstract:
This paper investigates the problem of sampling from transactional data streams. We introduce CFISDS, a content-based sampling algorithm that works on a landmark window model of data streams and preserves a more informed sample in the sample space. The algorithm, which is based on closed frequent itemset mining, first initializes a concept lattice using initial data and then updates the lattice structure using an incremental mechanism. The incremental mechanism inserts, updates, and deletes nodes in or from the concept lattice in batch mode. The algorithm extracts the final samples on user demand. Experimental results on synthetic and real datasets show the accuracy of CFISDS, although CFISDS is not faster than existing sampling algorithms such as Z and DSS.
Keywords: Sampling, data streams, closed frequent item set mining.
181 A Patricia-Tree Approach for Frequent Closed Itemsets
Authors: Moez Ben Hadj Hamida, Yahya Slimani
Abstract:
In this paper, we propose an adaptation of the Patricia-Tree for sparse datasets to generate non-redundant association rules. Using this adaptation, we can generate frequent closed itemsets, which are more compact than the frequent itemsets used in the Apriori approach. This adaptation has been evaluated on a set of benchmark datasets.
Keywords: Data mining, Frequent itemsets, Frequent closed itemsets, Sparse datasets.
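Closed itemsets are the frequent itemsets that have no proper superset with the same support, which is why they give a more compact, lossless summary than the full frequent set. The sketch below illustrates that definition with brute-force counting on a toy database; it does not implement the Patricia-tree structure, and the function names are hypothetical.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Brute-force support counting over all itemsets (fine for tiny examples)."""
    txs = [frozenset(t) for t in transactions]
    items = sorted(set().union(*txs))
    result = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            support = sum(1 for t in txs if candidate <= t)
            if support >= min_support:
                result[candidate] = support
    return result

def closed_itemsets(frequent):
    """Keep itemsets that have no proper superset with the same support."""
    return {
        itemset: support
        for itemset, support in frequent.items()
        if not any(itemset < other and support == s for other, s in frequent.items())
    }

transactions = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "c"}, {"a", "d"}]
freq = frequent_itemsets(transactions, 2)
closed = closed_itemsets(freq)
print(len(freq), len(closed))  # 7 3
```

Here seven frequent itemsets reduce to three closed ones ({a}, {a, b}, {a, b, c}), and the support of every other frequent itemset equals the largest support among its closed supersets.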
180 On Pattern-Based Programming towards the Discovery of Frequent Patterns
Authors: Kittisak Kerdprasop, Nittaya Kerdprasop
Abstract:
The problem of frequent pattern discovery is defined as the process of searching for patterns, such as sets of features or items, that appear frequently in data. Finding such frequent patterns has become an important data mining task because it reveals associations, correlations, and many other interesting relationships hidden in a database. Most proposed frequent pattern mining algorithms have been implemented in imperative programming languages. Such a paradigm is inefficient when the set of patterns is large and the frequent patterns are long. We suggest applying a high-level declarative style of programming to the problem of frequent pattern discovery. We consider two languages: Haskell and Prolog. Our intuition is that frequent pattern discovery can be implemented efficiently and concisely in a declarative paradigm, since pattern matching is a fundamental feature supported by most functional languages and by Prolog. Our frequent pattern mining implementations in Haskell and Prolog confirm our hypothesis about the conciseness of the program. The paper reports comparative performance studies of declarative versus imperative programming in terms of lines of code, speed, and memory usage.
Keywords: Frequent pattern mining, functional programming, pattern matching, logic programming.
179 Approximate Frequent Pattern Discovery Over Data Stream
Authors: Kittisak Kerdprasop, Nittaya Kerdprasop
Abstract:
Frequent pattern discovery over data streams is a hard problem because the continuously generated nature of a stream does not allow each data element to be revisited. Furthermore, the pattern discovery process must be fast to produce timely results. Based on these requirements, we propose an approximate approach to discovering frequent patterns over continuous streams. Our approximation algorithm is intended to process a stream prior to the pattern discovery process. The results of approximate frequent pattern discovery are reported in the paper.
Keywords: Frequent pattern discovery, Approximate algorithm, Data stream analysis.
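The abstract does not describe its approximation algorithm. For orientation only, the sketch below implements a different, well-known one-pass approximation, Lossy Counting (Manku and Motwani), for single items: entries are pruned at bucket boundaries, each reported count underestimates the true count by at most epsilon * N, and every item whose true frequency is at least support * N is reported. The parameters `support` and `epsilon` are the standard ones for that algorithm, not values from the paper.

```python
import math

def lossy_count(stream, support, epsilon):
    """Lossy Counting for single items in one pass over the stream."""
    width = math.ceil(1 / epsilon)          # bucket width
    entries = {}                            # item -> [count, max_error]
    n = 0
    for item in stream:
        n += 1
        bucket = math.ceil(n / width)       # id of the current bucket
        if item in entries:
            entries[item][0] += 1
        else:
            entries[item] = [1, bucket - 1]
        if n % width == 0:                  # bucket boundary: drop rare entries
            entries = {k: v for k, v in entries.items() if v[0] + v[1] > bucket}
    # Report items whose estimated count clears the relaxed threshold.
    return {k: v[0] for k, v in entries.items() if v[0] >= (support - epsilon) * n}

print(lossy_count("aabacadaab", support=0.5, epsilon=0.25))  # {'a': 6}
```

Memory stays bounded because rare items are evicted every `width` elements; the price is that counts for items that were evicted and later re-inserted (such as `b` above) are underestimates.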
178 W3-Miner: Mining Weighted Frequent Subtree Patterns in a Collection of Trees
Authors: R. AliMohammadzadeh, M. Haghir Chehreghani, A. Zarnani, M. Rahgozar
Abstract:
Mining frequent tree patterns has many useful applications in XML mining, bioinformatics, network routing, etc. Most frequent subtree mining algorithms (e.g., FREQT, TreeMiner, and CMTreeMiner) use the anti-monotone property in the candidate subtree generation phase. However, none of these algorithms has verified the correctness of this property for tree-structured data. In this research, it is shown that anti-monotonicity does not generally hold when weighted support is used in tree pattern discovery. As a result, tree mining algorithms that rely on this property may miss some valid frequent subtree patterns in a collection of trees. In this paper, we investigate the correctness of the anti-monotone property for the problem of weighted frequent subtree mining. In addition, we propose W3-Miner, a new algorithm for the full extraction of frequent subtrees. The experimental results confirm that W3-Miner finds some frequent subtrees that previously proposed algorithms are not able to discover.
Keywords: Semi-Structured Data Mining, Anti-Monotone Property, Trees.
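A tiny counterexample shows how anti-monotonicity can break once support counts occurrences instead of trees. Assuming, for illustration, that weighted support counts every embedding of a pattern (W3-Miner's exact weighting may differ), the larger pattern A -> B occurs twice in a tree whose root A has two B children, while its subpattern A occurs only once. The helper functions are hypothetical.

```python
def count_nodes(tree, label):
    """Occurrences of a single-node pattern: nodes carrying `label`."""
    node_label, children = tree
    return (node_label == label) + sum(count_nodes(c, label) for c in children)

def count_parent_child(tree, parent, child):
    """Occurrences (embeddings) of the two-node pattern parent -> child."""
    node_label, children = tree
    here = sum(1 for c in children if c[0] == child) if node_label == parent else 0
    return here + sum(count_parent_child(c, parent, child) for c in children)

# A tree with root A and two children labeled B, written as (label, children).
tree = ("A", [("B", []), ("B", [])])
print(count_nodes(tree, "A"), count_parent_child(tree, "A", "B"))  # 1 2
```

Because the superpattern scores higher than its subpattern, a miner that prunes every extension of an infrequent pattern could discard A and never reach A -> B, which is the kind of loss the abstract describes.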
177 Anomaly Based On Frequent-Outlier for Outbreak Detection in Public Health Surveillance
Authors: Zalizah Awang Long, Abdul Razak Hamdan, Azuraliza Abu Bakar
Abstract:
Public health surveillance systems focus on outbreak detection and the data sources used. Variation or aberration in the frequency distribution of health data, compared to historical data, is often used to detect outbreaks. It is important to develop new techniques that improve the detection rate, thereby reducing the waste of public health resources. Thus, the objective is to develop a technique that applies frequent mining and outlier mining to outbreak detection. Fourteen datasets from the UCI repository were used to test the proposed technique. The effectiveness of each technique was measured using a t-test. The overall performance shows that DTK can be used to detect outliers within a frequent dataset. In conclusion, the anomaly-based outbreak detection technique built on frequent-outlier mining can be used to identify outliers within a frequent dataset.
Keywords: Outlier detection, frequent-outlier, outbreak, anomaly, surveillance, public health
176 Proposing an Efficient Method for Frequent Pattern Mining
Authors: Vaibhav Kant Singh, Vijay Shah, Yogendra Kumar Jain, Anupam Shukla, A.S. Thoke, Vinay Kumar Singh, Chhaya Dule, Vivek Parganiha
Abstract:
Data mining is the exploration of knowledge from large sets of data generated by various data processing activities. Frequent pattern mining is a very important task in data mining. Previous approaches to generating frequent sets generally adopt candidate generation and pruning techniques to achieve this objective. This paper shows how different approaches achieve the objective of frequent mining, along with the complexity each requires. It also explores a hardware approach based on cache coherence to improve the efficiency of this process. Data mining helps in building support systems for Management, Bioinformatics, Biotechnology, Medical Science, Statistics, Mathematics, Banking, Networking, and other computer-related applications. This paper proposes using both the upward and downward closure properties for the extraction of frequent itemsets, which reduces the total number of scans required to generate candidate sets.
Keywords: Data Mining, Candidate Sets, Frequent Item set, Pruning.
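The downward closure property (every subset of a frequent itemset is frequent; equivalently, every superset of an infrequent itemset is infrequent) is what lets Apriori-style miners discard candidates before scanning the database again. A minimal sketch of candidate generation with that pruning step follows; it uses a simplified union-based join rather than the classic prefix join, and it does not reproduce the paper's combined upward/downward scheme or its cache-coherence work.

```python
from itertools import combinations

def apriori_gen(frequent_k):
    """Join frequent k-itemsets into (k+1)-candidates, then prune every candidate
    that has an infrequent k-subset (downward closure)."""
    frequent_k = set(frequent_k)
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) == len(a) + 1:     # the two sets differ by one item
                candidates.add(union)
    return {
        c for c in candidates
        if all(frozenset(s) in frequent_k for s in combinations(c, len(c) - 1))
    }

frequent_pairs = {frozenset(p) for p in [("a", "b"), ("a", "c"), ("b", "c"), ("b", "d")]}
print(sorted(sorted(c) for c in apriori_gen(frequent_pairs)))  # [['a', 'b', 'c']]
```

Of the three joined candidates, {a, b, d} and {b, c, d} are pruned because {a, d} and {c, d} are not frequent, so only {a, b, c} would need to be counted in the next database scan.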
175 An Efficient Graph Query Algorithm Based on Important Vertices and Decision Features
Authors: Xiantong Li, Jianzhong Li
Abstract:
Graphs have become increasingly important in modeling complicated structures and schemaless data such as proteins, chemical compounds, and XML documents. Given a graph query, it is desirable to retrieve graphs quickly from a large database via graph-based indices. Unlike existing methods, our approach, called VFM (Vertex to Frequent Feature Mapping), uses vertices and decision features as the basic indexing features. VFM constructs two mappings between vertices and frequent features to answer graph queries. The VFM approach not only provides an elegant solution to the graph indexing problem, but also demonstrates how database indexing and query processing can benefit from data mining, especially frequent pattern mining. The results show that the proposed method not only avoids enumerating the subgraphs of the query graph, but also effectively reduces the number of subgraph isomorphism tests between the query graph and the graphs in the candidate answer set during the verification stage.
Keywords: Decision Feature, Frequent Feature, Graph Dataset, Graph Query
174 Discussion about Frequent Adjustment of Urban Master Planning in China: A Case Study of Changshou District, Chongqing City
Authors: Sun Ailu, Zhao Wanmin
Abstract:
Since the reform and opening-up, China's urbanization has entered a period of rapid development. In recent years, the authors participated in several urban master planning projects in China and found that rapidly urbanizing areas of China are experiencing frequent adjustments of their urban master plans. This phenomenon is not a natural part of urbanization; it may be caused by the differing roles of governments at different levels. Through investigation, data comparison, and a case study, this paper explores why rapidly urbanizing areas experience frequent adjustment of master planning and offers some solution strategies. First, taking Changshou District of Chongqing City as an example, the paper introduces the phenomenon of frequent adjustment in China. It then discusses the distinct roles played in this process by the national, provincial, and local governments of China. Finally, it puts forward preliminary solution strategies for this area from the perspectives of land use, intergovernmental cooperation, and so on.
Keywords: Urban master planning, frequent adjustment, urbanization development, problems and strategies, China.
173 Correction of Frequent English Writing Errors by Using Coded Indirect Corrective Feedback and Error Treatment
Authors: Chaiwat Tantarangsee
Abstract:
The purposes of this study are 1) to study the frequent English writing errors of students registered in the course Reading and Writing English for Academic Purposes II, and 2) to find out the results of writing error correction using coded indirect corrective feedback and writing error treatments. The sample comprised 28 second-year English major students in the Faculty of Education, Suan Sunandha Rajabhat University. The tool for the experimental study was the lesson plan of the course Reading and Writing English for Academic Purposes II, and the tool for data collection was four writing tests of short texts. The findings show that the frequent English writing errors in this course comprise seven types of grammatical errors, namely sentence fragments, subject-verb agreement, wrong verb tense form, singular or plural noun endings, run-on sentences, wrong verb pattern form, and lack of parallel structure. Moreover, writing error correction using coded indirect corrective feedback and error treatment led to an overall reduction in frequent English writing errors and an increase in students' achievement in writing short texts, significant at the .05 level.
Keywords: Coded indirect corrective feedback, error correction, error treatment, frequent English writing errors.
172 Association between ADHD Medication, Cannabis, and Nicotine Use, Mental Distress, and Other Psychoactive Substances
Authors: Nicole Scott, Emily Dwyer, Cara Patrissy, Samantha Bonventre, Lina Begdache
Abstract:
Across North America, the use and abuse of Attention Deficit Hyperactivity Disorder (ADHD) medication, cannabis, nicotine, and other psychoactive substances on college campuses has become an increasingly prevalent problem. Students frequently use these substances to aid their studying or to deal with mental health issues. However, it is still unknown which psychoactive substances are likely to be abused when college students illicitly use ADHD medication. In addition, it is not clear which psychoactive substance is associated with mental distress. Thus, the purpose of this study is to fill these gaps by assessing the use of different psychoactive substances when ADHD medication is used illicitly, and how this association relates to mental distress. A total of 702 undergraduate students from different college campuses in the US completed an anonymous online survey. The survey collected self-reported data on demographics; the use of ADHD medications, cannabis, nicotine, and other psychoactive drugs; mental distress; and feelings and opinions about the use of illicit study drugs. Mental distress was assessed using the Kessler Psychological Distress Scale (K6). Data were analyzed in SPSS, Version 25.0, using Pearson's correlation coefficient. Our results show that ADHD medication use, cannabis use (non-frequent and very frequent), and nicotine use (non-frequent and very frequent) had both statistically significant positive and negative correlations with specific psychoactive substances and their corresponding frequencies of use. Along the same lines, ADHD medication use, cannabis use (non-frequent and very frequent), and nicotine use (non-frequent and very frequent) had statistically significant positive and negative correlations with specific mental distress experiences. Taken together, these findings suggest a vicious cycle in which individuals who abuse psychoactive substances may or may not be inclined to use other psychoactive substances.
This may later inhibit brain function in the brain stem, amygdala, and prefrontal cortex, where this vicious cycle may or may not affect their mental distress. Addressing the impact of study drug abuse and its potential association with further substance abuse may provide an educational framework and support proactive approaches to raising awareness among college students.
Keywords: Stimulant, depressant, nicotine, ADHD medication, psychoactive substances, mental health, illicit, ecstasy, adrenochrome.
171 Revised PLWAP Tree with Non-frequent Items for Mining Sequential Pattern
Authors: R. Vishnu Priya, A. Vadivel
Abstract:
Sequential pattern mining is a challenging data mining task with many applications. One such application is mining patterns from weblogs. Weblogs are now highly dynamic, and some of their contents may become obsolete over time. In addition, users may frequently change the threshold value during the data mining process until they obtain the required output or mine interesting rules. Some recently proposed algorithms for mining weblogs build the tree with two scans and consume a large amount of time and space. In this paper, we build a Revised PLWAP tree with Non-frequent Items (RePLNI-tree) with a single scan over all items. While mining sequential patterns, the links related to non-frequent items are not considered. Hence, it is not necessary to delete or maintain node information when revising the tree to mine updated transactions. The algorithm supports both incremental and interactive mining: the patterns need not be recomputed each time the weblog is updated or the minimum support is changed. The proposed tree performs better even when the incremental database is more than 50% of the size of the existing one. For evaluation, we used a benchmark weblog dataset and found the performance of the proposed tree encouraging compared to some recently proposed approaches.
Keywords: Sequential pattern mining, weblog, frequent and non-frequent items, incremental and interactive mining.
170 Correction of Frequent English Writing Errors by Using Coded Indirect Corrective Feedback and Error Treatment: The Case of Reading and Writing English for Academic Purposes II
Authors: Chaiwat Tantarangsee
Abstract:
The purposes of this study are 1) to study the frequent English writing errors of students registered in the course Reading and Writing English for Academic Purposes II, and 2) to find out the results of writing error correction using coded indirect corrective feedback and writing error treatments. The sample comprised 28 second-year English major students in the Faculty of Education, Suan Sunandha Rajabhat University. The tool for the experimental study was the lesson plan of the course Reading and Writing English for Academic Purposes II, and the tool for data collection was four writing tests of short texts. The findings show that the frequent English writing errors in this course comprise seven types of grammatical errors, namely sentence fragments, subject-verb agreement, wrong verb tense form, singular or plural noun endings, run-on sentences, wrong verb pattern form, and lack of parallel structure. Moreover, writing error correction using coded indirect corrective feedback and error treatment led to an overall reduction in frequent English writing errors and an increase in students' achievement in writing short texts, significant at the .05 level.
Keywords: Coded indirect corrective feedback, error correction, error treatment.
169 A New Model for Discovering XML Association Rules from XML Documents
Authors: R. AliMohammadzadeh, M. Rahgozar, A. Zarnani
Abstract:
The inherent flexibility of XML in both structure and semantics makes mining XML data a complex task with more challenges than traditional association rule mining in relational databases. In this paper, we propose a new model for the effective extraction of generalized association rules from an XML document collection. We directly use frequent subtree mining techniques in the discovery process and do not ignore the tree structure of the data in the final rules. Frequent subtrees, found using a user-provided support threshold, are split into complementary subtrees to form the rules. We explain our model in multiple steps, from data preparation to rule generation.
Keywords: XML, Data Mining, Association Rule Mining.
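The rule-forming step, splitting a frequent pattern into a part and its complement, works the same way as rule generation from frequent itemsets. The sketch below shows that itemset analog, with items standing in for subtrees (an assumption; the paper operates on XML subtrees): a rule antecedent -> (pattern minus antecedent) is kept when support(pattern) / support(antecedent) meets a confidence threshold. The element names and counts are made up.

```python
from itertools import combinations

def rules_from_itemset(itemset, support, min_confidence):
    """Split a frequent itemset into antecedent/consequent complements and keep
    rules whose confidence support(itemset) / support(antecedent) is high enough.
    `support` maps frozensets to absolute support counts."""
    itemset = frozenset(itemset)
    rules = []
    for k in range(1, len(itemset)):
        for antecedent in map(frozenset, combinations(sorted(itemset), k)):
            consequent = itemset - antecedent       # the complement part
            confidence = support[itemset] / support[antecedent]
            if confidence >= min_confidence:
                rules.append((set(antecedent), set(consequent), confidence))
    return rules

# Hypothetical support counts for two XML element names and their combination.
support = {
    frozenset({"book", "author"}): 40,
    frozenset({"book"}): 50,
    frozenset({"author"}): 80,
}
for lhs, rhs, conf in rules_from_itemset({"book", "author"}, support, 0.6):
    print(sorted(lhs), "->", sorted(rhs), round(conf, 2))  # ['book'] -> ['author'] 0.8
```

The reverse rule author -> book is rejected because its confidence is only 40 / 80 = 0.5.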
168 Elimination of Redundant Links in Web Pages– Mathematical Approach
Authors: G. Poonkuzhali, K. Thiagarajan, K. Sarukesi
Abstract:
With the enormous growth of the web, users easily get lost in its rich hyperstructure. Thus, developing user-friendly, automated tools that provide users with relevant information, free of redundant links and tailored to their needs, is the primary task for website owners. Most existing web mining algorithms have concentrated on finding frequent patterns while neglecting the less frequent ones, which are likely to contain outlying data such as noise and irrelevant and redundant data. This paper proposes a new algorithm for mining web content by detecting redundant links in web documents using set-theoretic (classical mathematics) operations such as subset, union, and intersection. The redundant links are then removed from the original web content to obtain the information required by the user.
Keywords: Web documents, Web content mining, redundant link, outliers, set theory.
167 Comparing Academically Gifted and Non-Gifted Students' Supportive Environments in Jordan
Authors: Mustafa Qaseem Hielat, Ahmad Mohammad Al-Shabatat
Abstract:
Since 2001, Jordan has made many efforts to nurture its academically gifted students in special schools. During the nine years since these schools were launched, their learning and excellence environments were believed to be distinguished compared to public schools. This study investigated the environments of gifted students compared with non-gifted students, using a survey instrument that measures the dimensions of family, peers, teachers, school support, society, and resources, dimensions deeply rooted in supporting gifted education, learning, and achievement. A total of 109 students were selected from excellence schools for academically gifted students, and 119 non-gifted students were selected from public schools. Around 8.3% of the non-gifted students reported that they "Never" received any support from their surrounding environments, 14.9% reported "Seldom" support, 23.7% reported "Often" support, 26.0% reported "Frequent" support, and 32.8% reported "Very frequent" support. The gifted students, by contrast, reported "Never" support more often than the non-gifted students (11.3%), "Seldom" support at 15.4%, "Often" support at 26.6%, and "Frequent" support at 29.0%, and they reported "Very frequent" support less often than the non-gifted students (23.6%). Unexpectedly, statistically significant differences were found between the two groups, favoring the non-gifted students, in their perception of their surrounding environments in specific dimensions, namely school support, teachers, and society. No statistically significant differences were found in the other dimensions of the survey, namely family, peers, and resources. Because the differences were found in teachers, school support, and society, the nurturing environments of the excellence schools need to be revised to adopt more creative teaching styles, a rich school atmosphere and infrastructure, interactive guidance for students and their parents, promotion of the excellence environments, and rebuilt successful identification models.
Thus, families, schools, and society should increase their cooperation, communication, and awareness of gifted supportive environments. In addition, further studies investigating other aspects of promoting academic giftedness and excellence are recommended.
Keywords: Academic giftedness, Supportive environment, Excellence schools, Gifted grouping, Gifted nurturing.
166 Applying Fuzzy FP-Growth to Mine Fuzzy Association Rules
Authors: Chien-Hua Wang, Wei-Hsuan Lee, Chin-Tzong Pang
Abstract:
In data mining, association rules are used to find associations between different items in a transaction database. As data are collected and stored, valuable rules can be found through association rule mining and applied to help managers execute marketing strategies and establish sound market frameworks. This paper uses Fuzzy Frequent Pattern growth (FFP-growth) to derive fuzzy association rules. First, we apply fuzzy partition methods and determine a membership function of quantitative values for each transaction item. Next, we implement FFP-growth to carry out the data mining process. In addition, to understand the effect of the Apriori and FFP-growth algorithms on execution time and on the number of generated association rules, experiments are performed with different database sizes and thresholds. Finally, the experimental results show that the FFP-growth algorithm is more efficient than other existing methods.
Keywords: Data mining, association rule, fuzzy frequent pattern growth.
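The fuzzification step maps each quantitative item value to membership degrees in a few linguistic regions before FFP-growth runs. The paper's membership functions are not given here; the sketch below assumes triangular functions over a hypothetical quantity range of 0 to 10, plus one common definition of fuzzy support (average membership over transactions). Region names and boundaries are illustrative.

```python
def triangular(x, left, peak, right):
    """Triangular membership: 0 outside (left, right), 1 at `peak`, linear between.
    Setting peak == left or peak == right gives a shoulder at that end."""
    if x <= left or x >= right:
        return 1.0 if x == peak else 0.0
    if x <= peak:
        return (x - left) / (peak - left)
    return (right - x) / (right - peak)

# Hypothetical fuzzy partition of a purchase quantity on [0, 10].
REGIONS = {"low": (0, 0, 5), "middle": (0, 5, 10), "high": (5, 10, 10)}

def fuzzify(quantity):
    """Membership degree of one quantity in every fuzzy region."""
    return {name: triangular(quantity, *abc) for name, abc in REGIONS.items()}

def fuzzy_support(quantities, region):
    """Average membership of one fuzzy region over all transactions."""
    return sum(fuzzify(q)[region] for q in quantities) / len(quantities)

print(fuzzify(3))  # {'low': 0.4, 'middle': 0.6, 'high': 0.0}
```

A quantity of 3 thus counts partly toward "low" and partly toward "middle", and `fuzzy_support([3, 5, 8], "middle")` averages 0.6, 1.0, and 0.4; these fractional supports are what a fuzzy miner compares against its threshold.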
165 Signed Approach for Mining Web Content Outliers
Authors: G. Poonkuzhali, K. Thiagarajan, K. Sarukesi, G.V. Uma
Abstract:
The emergence of the Internet has revolutionized information storage and retrieval. As most of the data on the web is unstructured and contains a mix of text, video, audio, etc., there is a need to mine information that caters to users' specific needs without losing important hidden information. Thus, developing user-friendly, automated tools that provide relevant information quickly has become a major challenge in web mining research. Most existing web mining algorithms have concentrated on finding frequent patterns while neglecting the less frequent ones, which are likely to contain outlying data such as noise and irrelevant and redundant data. This paper focuses on a Signed approach and full-word matching against an organized domain dictionary for mining web content outliers. The Signed approach identifies relevant web documents as well as outlying web documents. Because the dictionary is organized by the number of characters in a word, searching and retrieving documents take less time and space.
Keywords: Outliers, Relevant document, Signed Approach, Web content mining, Web documents.
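The dictionary organization can be sketched as follows, assuming the domain dictionary is bucketed by word length so that a full-word match only looks at terms of the same length. The Signed approach itself is not specified in the abstract, so this code only computes the fraction of a document's words that fully match domain terms; the names and sample terms are hypothetical.

```python
from collections import defaultdict

def build_dictionary(domain_words):
    """Bucket domain terms by length so a lookup only touches equal-length words."""
    buckets = defaultdict(set)
    for word in domain_words:
        buckets[len(word)].add(word.lower())
    return buckets

def match_ratio(document, buckets):
    """Fraction of a document's words that fully match a domain term."""
    words = [w.strip(".,;:!?").lower() for w in document.split()]
    words = [w for w in words if w]
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in buckets.get(len(w), ()))
    return hits / len(words)

buckets = build_dictionary(["data", "mining", "outlier", "web", "content"])
ratio = match_ratio("Web content mining finds outlier data.", buckets)  # 5 of 6 words match
```

In Python a single `set` already gives constant-time lookups, so the length buckets mainly mirror the paper's description; they pay off more with structures that are scanned or searched, such as sorted lists. A low ratio could flag a document as a candidate outlier, but any such threshold would be an assumption.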
164 Constraint Based Frequent Pattern Mining Technique for Solving GCS Problem
Authors: G. M. Karthik, Ramachandra V. Pujeri
Abstract:
The Generalized Center String (GCS) problem generalizes the Common Approximate Substring and Common Substring problems. GCS is known to be NP-hard, with the difficulty lying in the explosion of potential candidates. Finding the longest center string is hard because, for any particular biological gene process, it is not known in advance which sequences may not contain any motifs. GCS can be solved by frequent pattern-mining techniques and is known to be fixed-parameter tractable with respect to the input sequence length and the symbol set size. Efficient methods known as Bpriori algorithms can solve GCS with reasonable time and space complexity. The Bpriori 2 and Bpriori 3-2 algorithms have been proposed to find center strings of any length, together with the positions of all their instances in the input sequences. In this paper, we reduce the time and space complexity of the Bpriori algorithm with the Constrained Based Frequent Pattern mining (CBFP) technique, which integrates the ideas of constraint-based mining and FP-tree mining. The CBFP mining technique solves the GCS problem not only for center strings of any length but also for the positions of all their mutated copies in the input sequences. It constructs a TRIE-like FP tree to represent the mutated copies of center strings of any length, along with constraints that restrain the growth of the consensus tree. The complexity of the CBFP mining technique and of the Bpriori algorithm is analyzed for both the worst case and the average case. The algorithm's correctness is shown by comparison with the Bpriori algorithm on artificial data.
Keywords: Constraint Based Mining, FP tree, Data mining, GCS problem, CBFP mining technique.
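The "mutated copies" in the abstract can be illustrated with a minimal sketch. It assumes mutations are substitutions counted by Hamming distance, with a hypothetical tolerance `d` and minimum support. It shows only the support test that a candidate center string must pass. It does not implement the CBFP trie or the Bpriori algorithms.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def mutated_copy_positions(center: str, sequence: str, d: int) -> list[int]:
    """Start positions of windows of `sequence` within Hamming distance d of `center`."""
    width = len(center)
    return [i for i in range(len(sequence) - width + 1)
            if hamming(center, sequence[i:i + width]) <= d]


def support(center: str, sequences: list[str], d: int) -> int:
    """Number of input sequences containing at least one mutated copy of `center`."""
    return sum(1 for s in sequences if mutated_copy_positions(center, s, d))


def is_center(center: str, sequences: list[str], d: int, min_support: int) -> bool:
    """Hypothetical support constraint: keep a candidate only if enough sequences contain it."""
    return support(center, sequences, d) >= min_support


sequences = ["TTACGAACGT", "GGGGGG", "ACCT"]
print(mutated_copy_positions("ACGT", sequences[0], 1))
print(support("ACGT", sequences, 1))
```

A window that matches an extended candidate within `d` mismatches also matches that candidate's prefix within `d`. Support can therefore only shrink as a candidate grows, and this is the property that lets Apriori-style and tree-based miners prune the candidate space.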
163 Design of a Pneumonia Ontology for Diagnosis Decision Support System
Authors: Sabrina Azzi, Michal Iglewski, Véronique Nabelsi
Abstract:
Diagnostic errors are frequent and are among the most important safety problems today. One of the main objectives of our work is to propose an ontological representation that takes the diagnostic criteria into account in order to improve diagnosis. We chose pneumonia because it is one of the common diseases affected by diagnostic errors, which have harmful effects on patients. To achieve our aim, we use a semi-automated method to integrate diverse knowledge sources, including publicly available pneumonia guidelines from international repositories, biomedical ontologies, and electronic health records. We follow the principles of the Open Biomedical Ontologies (OBO) Foundry. The resulting ontology covers symptoms and signs, all types of pneumonia, antecedents, pathogens, and diagnostic testing. The first evaluation results show that most of the terms are covered by the ontology. This work is still in progress and represents a first, major step toward the development of a diagnosis decision support system for pneumonia.
Keywords: Clinical decision support system, diagnostic errors, ontology, pneumonia.
162 Effects of Beak Trimming on Behavior and Agonistic Activity of Thai Native Pullets Raised in Floor Pens
Authors: Pongchan Na-Lampang
Abstract:
The effect of beak trimming on the behavior of two strains of Thai native pullets kept in floor pens was studied. Six general activities (standing, crouching, moving, comforting, roosting, and nesting), six beak-related activities (preening, feeding, drinking, pecking at inedible objects, feather pecking, and litter pecking), and four agonistic activities (head pecking, threatening, avoiding, and fighting) were measured twice a day for 15 consecutive days, starting when the pullets were 19 wk old. Beak-trimmed pullets drank more frequently (P<.01) but fed less frequently (P<.05), and showed fewer avoiding acts (P<.01), than intact pullets. Beak-trimmed pullets showed less of every kind of agonistic activity (P<.05). The genetic effect was significant (P<.01) for drinking, nesting, and agonistic activities. A genetic by beak trimming interaction was found only for avoiding behavior (P<.01).
Keywords: Agonistic Behavior, Beak Trimming, Behavior, Thai Native Pullet
161 Study on Discharge Current Phenomena of Epoxy Resin Insulator Specimen
Authors: Waluyo, Ngapuli I. Sinisuka, Suwarno, Maman A. Djauhari
Abstract:
This paper presents experimental results on discharge current phenomena of an epoxy resin specimen under various humidity, temperature, pressure, and pollution conditions. The specimen, with a leakage distance of 3 cm, was energized with high voltage. Pollution was applied with an artificial NaCl pollutant. The discharge current and the applied voltage were measured. The specimen was placed in a hermetically sealed chamber, and the current waveforms were analyzed with the FFT. The results indicate that under discharge conditions the fifth harmonic remained dominant over the third. The third harmonic tended to appear under low-pressure, heavily polluted conditions, followed by high-humidity, heavily polluted conditions. On the heavily polluted specimen, the discharge current peaks were higher and more frequent. Nevertheless, the specimen still behaved capacitively. In addition, discharge current peaks were generally more frequent. Low pressure remained the dominant factor in making discharge easier. Non-linear behavior appeared explicitly under low-pressure, heavily polluted conditions.
Keywords: discharge current, third harmonic, fifth harmonic, epoxy resin, non-linear.
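The harmonic comparison described above can be sketched by reading individual DFT bins of a sampled current. The example assumes a 50 Hz fundamental, a 5 kHz sampling rate and a synthetic test waveform. None of these values come from the paper, which does not state its acquisition settings. The record is also assumed to span a whole number of cycles so that each harmonic falls exactly on a bin.

```python
import cmath
import math


def harmonic_amplitude(samples: list[float], sample_rate: float,
                       f0: float, k: int) -> float:
    """Amplitude of the k-th harmonic of f0, read from a single DFT bin.

    Assumes the record spans a whole number of fundamental periods, so k*f0
    falls exactly on a DFT bin and does not leak into neighbouring bins.
    """
    n = len(samples)
    freq = k * f0
    acc = sum(x * cmath.exp(-2j * math.pi * freq * i / sample_rate)
              for i, x in enumerate(samples))
    return 2 * abs(acc) / n


def dominant_odd_harmonic(samples: list[float], sample_rate: float, f0: float) -> int:
    """Return 3 or 5, whichever of the third and fifth harmonics is larger."""
    third = harmonic_amplitude(samples, sample_rate, f0, 3)
    fifth = harmonic_amplitude(samples, sample_rate, f0, 5)
    return 5 if fifth > third else 3


# Synthetic current (hypothetical values): 50 Hz fundamental sampled at 5 kHz for
# 10 cycles, with a third harmonic of amplitude 0.2 and a fifth of amplitude 0.3.
FS, F0, N = 5000.0, 50.0, 1000
current = [math.sin(2 * math.pi * F0 * i / FS)
           + 0.2 * math.sin(2 * math.pi * 3 * F0 * i / FS)
           + 0.3 * math.sin(2 * math.pi * 5 * F0 * i / FS)
           for i in range(N)]
print(dominant_odd_harmonic(current, FS, F0))
```

Computing only the bins of interest keeps the sketch dependency-free. With measured data one would normally compute all bins at once with an FFT routine such as `numpy.fft.rfft` and window the record if it does not contain whole cycles.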
160 Biomarkers in a Post-Stroke Population: Allied to Health Care in Brazil
Authors: M. Ricardo Lang, A. Costa, I. Iesbik, K. Haag, L. Trindade Buffara, O. Reimann Junior, C. Auswaldt Steclan
Abstract:
Stroke affects not only the individual but also has significant impacts on the social and family context. It is therefore necessary to know the peculiarities of each region in order to contribute effectively to regional public health policies. The present study discusses biomarkers in a post-stroke population admitted to a reference stroke unit (U-stroke) in the southern region of Brazil. Biomarkers such as age, length of stay, mortality rate, survival time, risk factors, and family history of stroke were analyzed in patients after ischemic stroke. Comparing men and women in this population, men were more affected than women, whereas the women affected were older on average and also had the highest mortality rate and the shortest hospital stay. The risk factors identified were consistent with the global scenario: systemic arterial hypertension (SAH) was the most frequent, and in women the most frequent were those associated with a sedentary lifestyle (dyslipidemia, heart disease, and obesity). This shows the importance of studies that characterize populations regionally, strengthening the strategic planning of health care policies.
Keywords: Biomarkers, population, stroke, sex, stroke unit.
159 Semantic Preference across Research Articles: A Corpus-Based Study of Adjectives in English
Authors: Valdênia Carvalho e Almeida
Abstract:
The goal of the present study is to investigate the semantic preference of the most frequent adjectives in research articles through a corpus-based analysis of texts published in Applied Linguistics (AL) journals. The corpus contains texts published from 2014 to 2018 in three journals (Language Learning and Technology, English for Academic Purposes, and TESOL Quarterly) and totals more than one million words. A corpus-based analysis was carried out to identify the most frequent adjectives occurring in all three journals. The semantic preferences of each adjective were determined by examining its concordance lines and analyzing the words it associated with. The AL corpus analysis was then compared with an investigation of the same adjectives in a corpus of Chemistry articles. This second part of the study aimed to identify possible differences and similarities between the two corpora in the use of these adjectives in research articles from both areas. The results show that some preferences seem closely related not only to the academic genre of the texts but also to the specific domain of the discipline and, to a lesser extent, to the research context of each journal. This research illustrates a possible contribution of Corpus Linguistics to exploring the concept of semantic preference in more detail, given the complex nature of the phenomenon.
Keywords: Applied linguistics, corpus linguistics, chemistry, research article, semantic preference.
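The concordance-based procedure described in this abstract can be sketched with key-word-in-context (KWIC) lines and a count of the words that follow a node adjective. Counting right-hand collocates is an assumption made here because attributive adjectives in English usually precede the noun they modify. The study's actual window is not stated. Selecting the most frequent adjectives would also require a part-of-speech tagger, which is not shown, so the node word is supplied by hand.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case word tokens; hyphenated words are kept whole."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def concordance(tokens: list[str], node: str, width: int = 4) -> list[str]:
    """Key-word-in-context lines with up to `width` tokens on each side of the node."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{node}] {right}".strip())
    return lines


def right_collocates(tokens: list[str], node: str, span: int = 1) -> Counter:
    """Count the words occurring up to `span` tokens to the right of the node."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[i + 1:i + 1 + span])
    return counts


text = ("A significant difference was found. This significant effect "
        "and a significant difference remain.")
tokens = tokenize(text)
for line in concordance(tokens, "significant", width=2):
    print(line)
print(right_collocates(tokens, "significant").most_common())
```

Grouping the collocates of each adjective into semantic sets (for example, nouns denoting results or effects) is the interpretive step that turns these counts into a statement about semantic preference.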
158 Successful Management of a Boy with Mild Persistent Asthma (A Longitudinal Case)
Authors: Lubis A., Setiawati L., Setyoningrum A. R., Suryawan A., Irwanto
Abstract:
Asthma is a condition that causes chronic health problems in children. In addition to basic therapy against the disease, we must try to reduce the impact of chronic health problems and optimize the medical aspects of the child's growth and development. A boy with frequent episodes of mild asthma attacks showed no improvement with medical treatment, and his Asthma Control Test score was 11. Radiologic examination showed hyperaerated lungs and bilateral maxillary sinusitis. Skin tests showed allergies to house dust, food, and pets. He was also overweight, had poor school grades, and had psychological and environmental problems. We followed and evaluated this boy for 6 months and treated him holistically. Although we could do little about his environment, his psychological and school problems resolved, he reached a healthy body weight, and his Asthma Control Test score rose to 22. We report the case of a child with frequent episodes of mild asthma attacks. The clinical course of asthma shows no significant improvement when other predisposing factors are not well controlled, and a child's growth and development may be affected. The patient's condition can be improved through loving, caring nurturing by the parents and a supportive peer group. Therefore, continuous and consistent monitoring is required, because the prognosis of asthma is generally good when it is regularly and properly controlled.
Keywords: Asthma, chronic health problems, growth and development.