Search results for: frequent itemset
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 980

980 Frequent Itemset Mining Using Rough-Sets

Authors: Usman Qamar, Younus Javed

Abstract:

Frequent pattern mining is the process of finding patterns (sets of items, subsequences, substructures, etc.) that occur frequently in a data set. It was proposed in the context of frequent itemsets and association rule mining, and it is used to find inherent regularities in data, for example which products are often purchased together. Its applications include basket data analysis, cross-marketing, catalog design, sale campaign analysis, Web log (click stream) analysis, and DNA sequence analysis. However, one of the bottlenecks of frequent itemset mining is that, as the data grow, the time and resources required to mine them increase at an exponential rate. In this investigation, a new algorithm is proposed that can be used as a pre-processor for frequent itemset mining. FASTER (FeAture SelecTion using Entropy and Rough sets) is a hybrid pre-processor algorithm that uses entropy to carry out record reduction and rough sets to carry out feature (attribute) selection. FASTER for frequent itemset mining can produce a speed-up of 3.1 times compared with the original algorithm while maintaining an accuracy of 71%.
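The bottleneck described in this abstract can be seen in a minimal Python sketch of support counting. This is a brute-force illustration, not the FASTER pre-processor; the function name `frequent_itemsets`, the `min_support` parameter, and the toy baskets are assumptions made for the example.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Return {itemset: support} for every itemset with support >= min_support.

    Support is the fraction of transactions that contain the itemset.
    Candidates are enumerated by brute force, so the work grows
    exponentially with the number of distinct items.
    """
    n = len(transactions)
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count / n >= min_support:
                result[frozenset(candidate)] = count / n
                found_any = True
        if not found_any:
            # No frequent itemset of this size, so none of a larger size either.
            break
    return result


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
]
frequent = frequent_itemsets(baskets, min_support=0.5)
```

Removing records or attributes before this step shrinks both `transactions` and `items`, which is why a pre-processor can reduce mining time.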

Keywords: rough-sets, classification, feature selection, entropy, outliers, frequent itemset mining

Procedia PDF Downloads 437
979 An Enhanced MEIT Approach for Itemset Mining Using Levelwise Pruning

Authors: Tanvi P. Patel, Warish D. Patel

Abstract:

Association rule mining forms the core of data mining and is one of its best-known methodologies. The objective of mining is to find interesting correlations, frequent patterns, associations, or causal structures among sets of items in transaction databases or other data repositories. Hence, association rule mining must first mine patterns and then generate rules from them. For efficient targeted query processing, frequent pattern discovery, and itemset mining, an itemset tree structure named the Memory Efficient Itemset Tree (MEIT) can be generated. MEIT stores itemsets efficiently but takes more time than the traditional itemset tree (IT). The proposed strategy generates maximal frequent itemsets from the MEIT by using levelwise pruning: items are first pre-pruned based on a minimum support count, and the itemset tree is then reconstructed. With maximal frequent itemsets, fewer patterns are generated and the tree is smaller than the MEIT. The enhanced MEIT approach proposed here therefore helps to reduce main memory overhead as well as processing time.
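Two ideas from this abstract, pre-pruning items by minimum support count and keeping only maximal frequent itemsets, can be sketched in Python as follows. The sketch works on plain lists rather than an itemset tree, and the function names are hypothetical, not taken from the paper.

```python
from collections import Counter


def prune_infrequent_items(transactions, min_count):
    """Pre-pruning: drop items whose support count is below min_count.

    Each item is counted at most once per transaction. Transactions
    left empty after pruning are discarded.
    """
    counts = Counter(item for t in transactions for item in set(t))
    pruned = []
    for t in transactions:
        kept = frozenset(item for item in t if counts[item] >= min_count)
        if kept:
            pruned.append(kept)
    return pruned


def maximal_itemsets(frequent):
    """Keep only the itemsets that have no proper superset in `frequent`."""
    frequent = [frozenset(s) for s in frequent]
    return [s for s in frequent if not any(s < other for other in frequent)]
```

For example, if the frequent itemsets are {a}, {b}, {c}, {a, b}, and {a, c}, only {a, b} and {a, c} are maximal, so fewer patterns need to be stored.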

Keywords: association rule mining, itemset mining, itemset tree, meit, maximal frequent pattern

Procedia PDF Downloads 370
978 Efficient Frequent Itemset Mining Methods over Real-Time Spatial Big Data

Authors: Hamdi Sana, Emna Bouazizi, Sami Faiz

Abstract:

In recent years, there has been a huge increase in the use of spatio-temporal applications where data and queries are continuously moving. As a result, the need to process real-time spatio-temporal data is clear, and real-time stream data management has become a hot topic. The sliding window model and frequent itemset mining over dynamic data are among the most important problems in data mining. The sliding window model is widely used for frequent itemset mining over data streams because of its emphasis on recent data and its bounded memory requirement. Existing methods use the traditional transaction-based sliding window model, in which the window size is a fixed number of transactions. This model assumes that transactions arrive at a constant rate, which does not suit real-time applications, and using it in such applications endangers their performance. Based on these observations, this paper relaxes the notion of window size and proposes a timestamp-based sliding window model. In the proposed frequent itemset mining algorithm, support conditions are used to differentiate frequent and infrequent patterns. A tree is then built to incrementally maintain the essential information. We evaluate our contribution, and the preliminary results are quite promising.
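The difference between a transaction-based and a timestamp-based window can be illustrated with a small Python sketch. It tracks single items only and uses a deque instead of the tree described in the abstract; the class name `TimestampWindow` and its methods are assumptions for illustration, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import Counter, deque


class TimestampWindow:
    """Sliding window covering the time interval (now - span, now]."""

    def __init__(self, span):
        self.span = span
        self.window = deque()    # (timestamp, items) in arrival order
        self.counts = Counter()  # item -> number of transactions in the window

    def add(self, timestamp, items):
        items = frozenset(items)
        self.window.append((timestamp, items))
        self.counts.update(items)
        # Expire by age, not by position: the number of transactions
        # in the window varies with the arrival rate.
        while self.window and self.window[0][0] <= timestamp - self.span:
            _, old_items = self.window.popleft()
            self.counts.subtract(old_items)

    def frequent_items(self, min_support):
        n = len(self.window)
        if n == 0:
            return set()
        return {item for item, c in self.counts.items() if c / n >= min_support}
```

With `span=10`, a transaction that arrived at time 0 is dropped as soon as one arrives at time 10 or later, however many transactions came in between.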

Keywords: real-time spatial big data, frequent itemset, transaction-based sliding window model, timestamp-based sliding window model, weighted frequent patterns, tree, stream query

Procedia PDF Downloads 161
977 HPPDFIM-HD: Transaction Distortion and Connected Perturbation Approach for Hierarchical Privacy Preserving Distributed Frequent Itemset Mining over Horizontally-Partitioned Dataset

Authors: Fuad Ali Mohammed Al-Yarimi

Abstract:

Many algorithms have been proposed to preserve privacy in data mining. These protocols are based on two main approaches: the perturbation approach and the cryptographic approach. The first is based on perturbation of the valuable information, while the second uses cryptographic techniques. The perturbation approach is much more efficient but has reduced accuracy, while the cryptographic approach can provide solutions with perfect accuracy. However, the cryptographic approach is much slower and requires considerable computation and communication overhead. In this paper, a new scalable protocol is proposed that combines the advantages of perturbation and distortion with the cryptographic approach to perform privacy-preserving distributed frequent itemset mining on horizontally distributed data. Both the privacy and performance characteristics of the proposed protocol are studied empirically.

Keywords: anonymity data, data mining, distributed frequent itemset mining, gaussian perturbation, perturbation approach, privacy preserving data mining

Procedia PDF Downloads 505
976 Using Closed Frequent Itemsets for Hierarchical Document Clustering

Authors: Cheng-Jhe Lee, Chiun-Chieh Hsu

Abstract:

Due to the rapid development of the Internet and the increased availability of digital documents, the excessive information on the Internet has led to an information overflow problem. To solve this problem for effective information retrieval, document clustering in text mining has become a popular research topic. Clustering is the unsupervised classification of data items into groups without the need for training data. Many conventional document clustering methods perform inefficiently on large document collections because they were originally designed for relational databases. They are therefore impractical for real-world document clustering, which requires special handling for high dimensionality and high volume. We build on the FIHC (Frequent Itemset-based Hierarchical Clustering) method, a hierarchical clustering method developed for document clustering, whose intuition is that each cluster shares some common words. FIHC uses such words to cluster documents and builds a hierarchical topic tree. In this paper, we combine the FIHC algorithm with an ontology to solve the semantic problem and mine the meaning behind the words in documents. Furthermore, we use closed frequent itemsets instead of all frequent itemsets, which increases efficiency and scalability. The experimental results show that our method is more accurate than well-known document clustering algorithms.
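A closed frequent itemset is one with no proper superset of equal support. The Python sketch below filters an already-mined set of itemsets down to the closed ones; it is a naive quadratic check, and the function name and input layout are assumptions rather than the method used in the paper.

```python
def closed_itemsets(supports):
    """Keep itemsets that have no proper superset with the same support.

    `supports` maps frozenset(itemset) -> support count.
    """
    return {
        itemset: support
        for itemset, support in supports.items()
        if not any(
            itemset < other and support == other_support
            for other, other_support in supports.items()
        )
    }
```

If {b} and {a, b} both have support 2, then {b} is not closed, because every document containing b also contains a; dropping it loses no information and leaves fewer itemsets to cluster on.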

Keywords: FIHC, documents clustering, ontology, closed frequent itemset

Procedia PDF Downloads 399
975 RA-Apriori: An Efficient and Faster MapReduce-Based Algorithm for Frequent Itemset Mining on Apache Flink

Authors: Sanjay Rathee, Arti Kashyap

Abstract:

Extraction of useful information from large datasets is one of the most important research problems, and association rule mining is one of the best methods for this purpose. Finding possible associations between items in large transaction-based datasets (finding frequent patterns) is the most important part of association rule mining. Many algorithms exist for finding frequent patterns, but the Apriori algorithm remains a preferred choice because it is easy to implement and naturally lends itself to parallelization. Many single-machine Apriori variants exist, but the massive amount of data available these days exceeds the capacity of a single machine. Therefore, to meet the demands of this ever-growing data, a multi-machine Apriori algorithm is needed. For these types of distributed applications, MapReduce is a popular fault-tolerant framework. Hadoop is one of the best open-source software frameworks with a MapReduce approach for distributed storage and distributed processing of huge datasets using clusters built from commodity hardware. However, heavy disk I/O at each iteration of a highly iterative algorithm like Apriori makes Hadoop inefficient. A number of MapReduce-based platforms have been developed for parallel computing in recent years. Among them, two platforms, namely Spark and Flink, have attracted a lot of attention because of their built-in support for distributed computations. Earlier, we proposed a Reduced-Apriori algorithm on the Spark platform that outperforms parallel Apriori, first because of the use of Spark and second because of the improvement we proposed to standard Apriori. This work is a natural sequel: it implements, tests, and benchmarks Apriori, Reduced-Apriori, and our new algorithm ReducedAll-Apriori on Apache Flink and compares them with the Spark implementation. Flink, a streaming dataflow engine, overcomes disk I/O bottlenecks in MapReduce, providing an ideal platform for distributed Apriori. Flink's pipelining-based structure allows the next iteration to start as soon as partial results of the earlier iteration are available, so there is no need to wait for all reducers' results before starting the next iteration. We conduct in-depth experiments to gain insight into the effectiveness, efficiency, and scalability of the Apriori and RA-Apriori algorithms on Flink.
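For reference, the iterative structure of standard Apriori that this abstract refers to can be sketched on a single machine in Python. This is textbook Apriori, not Reduced-Apriori or RA-Apriori, and it uses no Spark or Flink API; the function name `apriori` and the `min_count` parameter are assumptions for the example.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Level-wise Apriori. Returns {itemset: support count}."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)

    k = 2
    while level:
        prev = set(level)
        # Join: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        # Count: one full pass over the data per level.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {s: c for s, c in counts.items() if c >= min_count}
        frequent.update(level)
        k += 1
    return frequent
```

Each pass of the `while` loop scans all transactions once. On Hadoop, each such pass is a separate job that reads from and writes to disk, which is the per-iteration I/O cost the abstract describes.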

Keywords: apriori, apache flink, Mapreduce, spark, Hadoop, R-Apriori, frequent itemset mining

Procedia PDF Downloads 294
974 Frequent Item Set Mining for Big Data Using MapReduce Framework

Authors: Tamanna Jethava, Rahul Joshi

Abstract:

Frequent item sets play an essential role in many data mining tasks that try to find interesting patterns in a database. A frequent item set is a set of items that frequently appear together in a transaction dataset. Several algorithms are used for frequent item set mining, yet most do not scale to the kind of data we are presented with today, so-called “Big Data”. Big Data is a collection of large data sets. Our approach is to perform frequent item set mining over large datasets in a scalable and speedy way. MapReduce, together with HDFS, is used to find frequent item sets in Big Data on a large cluster. This paper focuses on using pre-processing and a mining algorithm as a hybrid approach for big data on the Hadoop platform.
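The MapReduce pattern for counting item sets can be simulated in plain Python. The sketch below runs in one process and calls no Hadoop or HDFS API; the function names (`map_phase`, `shuffle`, `reduce_phase`, `count_itemsets`) are hypothetical and only mirror the three stages of the model.

```python
from collections import defaultdict
from itertools import chain, combinations


def map_phase(transaction, k):
    """Mapper: emit (itemset, 1) for every k-item subset of one transaction."""
    for itemset in combinations(sorted(transaction), k):
        yield itemset, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework does between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, min_count):
    """Reducer: sum the values per itemset and keep the frequent ones."""
    totals = {key: sum(values) for key, values in groups.items()}
    return {key: total for key, total in totals.items() if total >= min_count}


def count_itemsets(transactions, k, min_count):
    pairs = chain.from_iterable(map_phase(t, k) for t in transactions)
    return reduce_phase(shuffle(pairs), min_count)
```

Because each mapper sees only one transaction and each reducer only one key, both stages can be spread across machines without shared state.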

Keywords: frequent item set mining, big data, Hadoop, MapReduce

Procedia PDF Downloads 434
973 Hybrid Approximate Structural-Semantic Frequent Subgraph Mining

Authors: Montaceur Zaghdoud, Mohamed Moussaoui, Jalel Akaichi

Abstract:

Frequent subgraph mining usually refers to graph matching, and it is widely used when analyzing big data with large graphs. Much research has dealt with exact or inexact structural graph matching, but little attention has been paid to semantic matching when graph vertices and/or edges are attributed and typed. It is therefore of interest to integrate background knowledge into the analysis, so that extracted frequent subgraphs are pruned further by applying a new semantic filter instead of using only structural similarity in the graph matching process. Consequently, this paper focuses on developing a new hybrid approximate structural-semantic graph matching to discover a set of frequent subgraphs. It simultaneously uses an approximate structural similarity function based on a graph edit distance function and a possibilistic vertex similarity function based on an affinity function. The structural and semantic filters together prune the extracted frequent set. The new hybrid structural-semantic frequent subgraph mining approach should be suitable for several applications, such as community detection in social networks.

Keywords: approximate graph matching, hybrid frequent subgraph mining, graph mining, possibility theory

Procedia PDF Downloads 402
972 Frequent-Flyer Program: The Connection between Commercial Partners and Spin-off

Authors: Changmin Jiang

Abstract:

In this paper, we build a theoretical model to investigate the relationship between two recent trends in airline frequent-flyer programs (FFPs): the adoption of the “coalition” business model with other commercial partners, and the separation from airlines’ operations. We show that commercial partners benefit from teaming up with an FFP and that, while increasing the number of commercial partners increases the total profit, it reduces the average profit of the parties involved. Furthermore, we show that the number of commercial partners of an FFP is negatively related to the benefit of keeping the FFP in-house.

Keywords: frequent flyer program, coalition, commercial partners, spin-off

Procedia PDF Downloads 301
971 Discussion about Frequent Adjustment of Urban Master Planning in China: A Case Study of Changshou District, Chongqing City

Authors: Sun Ailu, Zhao Wanmin

Abstract:

Since the reform and opening, the urbanization process of China has entered a period of rapid development. In recent years, the authors participated in several urban master planning projects in China and found that rapidly urbanizing areas are experiencing frequent adjustment of their urban master plans. This phenomenon is not a natural part of urbanization development and may be caused by the different roles played by different levels of government. Through investigation, data comparison, and case study, this paper explores why rapidly urbanizing areas experience frequent adjustment of master planning and proposes some solution strategies. First, taking Changshou district of Chongqing city as an example, the paper introduces the phenomenon of frequent adjustment in China. It then discusses the distinct roles of the national, provincial, and local governments in the process. Finally, it puts forward preliminary solution strategies for this area from the aspects of land use, intergovernmental cooperation, and so on.

Keywords: urban master planning, frequent adjustment, urbanization development, problems and strategies, China

Procedia PDF Downloads 364
970 Efficient Recommendation System for Frequent and High Utility Itemsets over Incremental Datasets

Authors: J. K. Kavitha, D. Manjula, U. Kanimozhi

Abstract:

Mining frequent and high utility item sets has gained much significance in recent years. When data arrive sporadically, incremental and interactive rule mining and utility mining approaches can be adopted to handle users’ dynamic environmental needs and avoid redundancies by reusing previous data structures and mining results. The dependence on recommendation systems has risen exponentially since the advent of search engines. This paper proposes a model for building a recommendation system that suggests frequent and high utility item sets over dynamic datasets for a cluster-based location prediction strategy that predicts users’ trajectories, using the Efficient Incremental Rule Mining (EIRM) algorithm and the Fast Update Utility Pattern Tree (FUUP) algorithm. In comprehensive experimental evaluations, this scheme has been shown to deliver excellent performance.

Keywords: data sets, recommendation system, utility item sets, frequent item sets mining

Procedia PDF Downloads 293
969 Trends and Inequalities in Distance to and Use of Nearest Natural Space in the Context of the 20-Minute Neighbourhood: A 4-Wave National Repeat Crosssectional Study, 2013 to 2019

Authors: Jonathan R. Olsen, Natalie Nicholls, Jenna Panter, Hannah Burnett, Michael Tornow, Richard Mitchell

Abstract:

The 20-minute neighbourhood is a policy priority for governments worldwide, and a key feature of this policy is providing access to natural space within 800 meters of home. The study aims were to (1) examine the association between distance to nearest natural space and frequent use over time and (2) examine whether frequent use and changes in use were patterned by income and housing tenure over time. Bi-annual Scottish Household Survey data were obtained for 2013 to 2019 (n:42128 aged 16+). Adults were asked the walking distance to their nearest natural space, the frequency of visits to this space, and their housing tenure, as well as age, sex, and income. We examined the association between distance from home to the nearest natural space, housing tenure, and the likelihood of frequent natural space use (visited once a week or more). Two-way interaction terms were further applied to explore variation in the association between tenure and frequent natural space use over time. We found that 87% of respondents lived within a 10-minute walk of a natural space, meeting the policy specification for a 20-minute neighbourhood. Greater proximity to natural space was associated with increased use; individuals living a 6 to 10 minute walk and over a 10 minute walk away were respectively 53% and 78% less likely to report frequent natural space use than those living within a 5 minute walk. Housing tenure was an important predictor of frequent natural space use; private renters and homeowners were more likely to report frequent natural space use than social renters. Our findings provide evidence that proximity to natural space is a strong predictor of frequent use. Our study also provides important evidence that time-based access measures alone do not account for deep-rooted socioeconomic variation in the use of natural space. Policy makers should ensure a nuanced lens is applied to operationalising and monitoring the 20-minute neighbourhood to safeguard against exacerbating existing inequalities.

Keywords: natural space, housing, inequalities, 20-minute neighbourhood, urban design

Procedia PDF Downloads 120
968 Correction of Frequent English Writing Errors by Using Coded Indirect Corrective Feedback and Error Treatment

Authors: Chaiwat Tantarangsee

Abstract:

The purposes of this study are: 1) to study the frequent English writing errors of students registered in the course Reading and Writing English for Academic Purposes II, and 2) to find out the results of writing error correction using coded indirect corrective feedback and writing error treatments. The sample includes 28 second-year English Major students, Faculty of Education, Suan Sunandha Rajabhat University. The tool for the experimental study is the lesson plan of the course Reading and Writing English for Academic Purposes II, and the tool for data collection is a set of 4 writing tests of short texts. The research findings disclose that the frequent English writing errors found in this course comprise 7 types of grammatical errors, namely sentence fragments, subject-verb agreement, wrong form of verb tense, singular or plural noun endings, run-on sentences, wrong form of verb pattern, and lack of parallel structure. Moreover, writing error correction using coded indirect corrective feedback and error treatment led to an overall reduction of the frequent English writing errors and an increase in students’ achievement in the writing of short texts, significant at the .05 level.

Keywords: coded indirect corrective feedback, error correction, error treatment, frequent English writing errors

Procedia PDF Downloads 237
967 Association between Attention Deficit Hyperactivity Disorder Medication, Cannabis, and Nicotine Use, Mental Distress, and Other Psychoactive Substances

Authors: Nicole Scott, Emily Dwyer, Cara Patrissy, Samantha Bonventre, Lina Begdache

Abstract:

Across North America, the use and abuse of Attention Deficit Hyperactivity Disorder (ADHD) medication, cannabis, nicotine, and other psychoactive substances on college campuses have become an increasingly prevalent problem. Students frequently use these substances to aid their studying or deal with their mental health issues. However, it is still unknown which psychoactive substances are likely to be abused when college students illicitly use ADHD medication. In addition, it is not clear which psychoactive substance is associated with mental distress. Thus, the purpose of this study is to fill these gaps by assessing the use of different psychoactive substances when illicit ADHD medication is used, and how this association relates to mental distress. A total of 702 undergraduate students from different college campuses in the U.S. completed an anonymous survey distributed online. The survey collected self-reported data on demographics; the use of ADHD medications, cannabis, nicotine, and other psychoactive drugs; mental distress; and feelings and opinions on the use of illicit study drugs. Mental distress was assessed using the Kessler Psychological Distress 6 Scale. Data were analyzed in SPSS, Version 25.0, using Pearson’s Correlation Coefficient. Our results show that ADHD medication use, cannabis use (non-frequent and very frequent), and nicotine use (non-frequent and very frequent) each had statistically significant positive and negative correlations with specific psychoactive substances and their corresponding frequencies. Along the same lines, ADHD medication use, cannabis use (non-frequent and very frequent), and nicotine use (non-frequent and very frequent) had statistically significant positive and negative correlations with specific mental distress experiences. Taken together, these findings suggest that a vicious cycle can begin in which individuals who abuse psychoactive substances may or may not be inclined to use other psychoactive substances. This may later inhibit brain functions in the brain stem, amygdala, and prefrontal cortex, where this cycle may or may not affect their mental distress. Addressing the impact of study drug abuse and its potential association with further substance abuse may provide an educational framework and support proactive approaches to promote awareness among college students.

Keywords: stimulant, depressant, nicotine, ADHD medication, psychoactive substances, mental health, illicit, ecstasy, adrenochrome

Procedia PDF Downloads 62
966 On an Approach for Rule Generation in Association Rule Mining

Authors: B. Chandra

Abstract:

In Association Rule Mining, much attention has been paid to developing algorithms for large (frequent/closed/maximal) itemsets, but very little attention has been paid to improving the performance of rule generation algorithms. Rule generation is an important part of Association Rule Mining. In this paper, a novel approach named NARG (Association Rule using Antecedent Support) is proposed for rule generation; it uses a memory-resident data structure named FCET (Frequent Closed Enumeration Tree) to find frequent/closed itemsets. In addition, the computational speed of NARG is enhanced by giving importance to the rules that have lower antecedent support. A comparative performance evaluation of NARG against a fast association rule mining algorithm for rule generation has been carried out on synthetic datasets and real-life datasets (taken from the UCI Machine Learning Repository). Performance analysis shows that NARG is computationally faster than the existing algorithms for rule generation.
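The rule generation step can be sketched in Python from the standard definition of confidence, confidence(A → B) = support(A ∪ B) / support(A). This is the textbook procedure, not NARG or FCET; the function name `generate_rules` and the input layout are assumptions, and the input is assumed to contain the support of every subset of each itemset.

```python
from itertools import combinations


def generate_rules(supports, min_confidence):
    """Return (antecedent, consequent, confidence) for every confident rule.

    `supports` maps frozenset(itemset) -> support count and must include
    every non-empty subset of each itemset it contains.
    """
    rules = []
    for itemset, support in supports.items():
        if len(itemset) < 2:
            continue
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                antecedent = frozenset(antecedent)
                confidence = support / supports[antecedent]
                if confidence >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, confidence))
    return rules
```

For a fixed itemset, the antecedent with the lowest support yields the highest confidence, which is one reason an approach might examine low-support antecedents first.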

Keywords: knowledge discovery, association rule mining, antecedent support, rule generation

Procedia PDF Downloads 324
965 Frequent-Pattern Tree Algorithm Application to S&P and Equity Indexes

Authors: E. Younsi, H. Andriamboavonjy, A. David, S. Dokou, B. Lemrabet

Abstract:

Software and time optimization are very important factors in financial markets, which are competitive fields, and the emergence of new computer tools further heightens the challenge. In this context, any improvement of technical indicators that generate a buy or sell signal is a major issue, and many tools have been created to make them more effective. This concern for efficiency has led the present paper to seek the best (and most innovative) way of obtaining the largest improvement in these indicators. The approach consists in attaching a signature to frequent market configurations by applying a frequent pattern extraction method, which is here the most appropriate way to optimize investment strategies. The goal of the proposed trading algorithm is to find the most accurate signatures using a back-testing procedure applied to technical indicators in order to improve their performance. The problem is then to determine the signatures which, combined with an indicator, outperform this indicator alone. To do this, the FP-Tree algorithm has been preferred, as it appears to be the most efficient algorithm to perform this task.
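The FP-tree that the abstract relies on can be built with a short Python sketch. It covers tree construction only (no header table and no mining step), and the names `FPNode` and `build_fp_tree` are assumptions; it says nothing about how the paper encodes market configurations as items.

```python
from collections import Counter


class FPNode:
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode


def build_fp_tree(transactions, min_count):
    """Build an FP-tree and return its root node."""
    # First pass: count items (once per transaction) and keep the frequent ones.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_count}

    root = FPNode(None)
    # Second pass: insert each transaction along a shared prefix path.
    for t in transactions:
        # Order by descending frequency (ties broken by item name) so that
        # common items sit near the root and paths overlap as much as possible.
        ordered = sorted(
            (item for item in set(t) if item in frequent),
            key=lambda item: (-frequent[item], item),
        )
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root
```

The tree needs only two passes over the data and compresses transactions that share frequent prefixes, which is the usual argument for its efficiency compared with level-wise candidate generation.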

Keywords: quantitative analysis, back-testing, computational models, apriori algorithm, pattern recognition, data mining, FP-tree

Procedia PDF Downloads 361
964 Correction of Frequent English Writing Errors by Using Coded Indirect Corrective Feedback and Error Treatment: The Case of Reading and Writing English for Academic Purposes II

Authors: Chaiwat Tantarangsee

Abstract:

The purposes of this study are 1) to study the frequent English writing errors of students registered in the course Reading and Writing English for Academic Purposes II, and 2) to find out the results of writing error correction using coded indirect corrective feedback and writing error treatments. The sample includes 28 second-year English Major students, Faculty of Education, Suan Sunandha Rajabhat University. The tool for the experimental study is the lesson plan of the course Reading and Writing English for Academic Purposes II, and the tool for data collection is a set of 4 writing tests of short texts. The research findings disclose that the frequent English writing errors found in this course comprise 7 types of grammatical errors, namely sentence fragments, subject-verb agreement, wrong form of verb tense, singular or plural noun endings, run-on sentences, wrong form of verb pattern, and lack of parallel structure. Moreover, writing error correction using coded indirect corrective feedback and error treatment led to an overall reduction of the frequent English writing errors and an increase in students’ achievement in the writing of short texts, significant at the .05 level.

Keywords: coded indirect corrective feedback, error correction, error treatment, English writing

Procedia PDF Downloads 305
963 Analysis of Travel Behavior Patterns of Frequent Passengers after the Section Shutdown of Urban Rail Transit - Taking the Huaqiao Section of Shanghai Metro Line 11 Shutdown During the COVID-19 Epidemic as an Example

Authors: Hongyun Li, Zhibin Jiang

Abstract:

The travel of passengers in the urban rail transit network is influenced by changes in network structure and operational status, and the response of individual travel preferences to these changes also varies. Firstly, the influence of the suspension of urban rail transit line sections on passenger travel along the line is analyzed. Secondly, passenger travel trajectories containing multi-dimensional semantics are described based on network UD data. Next, passenger panel data based on spatio-temporal sequences are constructed to achieve frequent passenger clustering. Then, the Graph Convolutional Network (GCN) is used to model and identify the changes in travel modes of different types of frequent passengers. Finally, taking Shanghai Metro Line 11 as an example, the travel behavior patterns of frequent passengers after the Huaqiao section shutdown during the COVID-19 epidemic are analyzed. The results showed that after the section shutdown, most passengers transferred to the nearest station, Anting, for boarding, while some passengers transferred to other stations for boarding or cancelled their travel altogether. Among the passengers who transferred to Anting station for boarding, most maintained their original regular travel mode, a small number waited for a few days before transferring to Anting station, and only a few stopped traveling from Anting station or transferred to other stations after a few days of boarding at Anting station. The results can provide a basis for understanding urban rail transit passenger travel patterns and improving the accuracy of passenger flow prediction in abnormal operation scenarios.

Keywords: urban rail transit, section shutdown, frequent passenger, travel behavior pattern

Procedia PDF Downloads 84
962 Your Second Step to Understanding Research Ethics: Psycho-Methodological Approach

Authors: Sadeq Al Yaari, Ayman Al Yaari, Adham Al Yaari, Montaha Al Yaari, Aayah Al Yaari, Sajedah Al Yaari

Abstract:

Objective: The study is a summary of a book on research ethics in the scientific field. It aims at investigating the ethics that researchers should follow before, during, and after doing research. Method: It is an analytic research design wherein the researchers attempted to cover the phenomenon at hand from all specialists’ viewpoints by giving their answers to the most frequently asked questions. Results: Questions on the research draft can only be answered when doing the research. This determines understanding of the usage of research, questions on on-line research, specializations, and research-related concepts. Questions on the university’s library determine understanding of where the library sections are located, the periodicals, forums, and all about journals, theses, and dissertations, along with references.

Keywords: research ethics, most frequent questions, scientific answers, journals, library

Procedia PDF Downloads 57
961 Recognizing Customer Preferences Using Review Documents: A Hybrid Text and Data Mining Approach

Authors: Oshin Anand, Atanu Rakshit

Abstract:

The vast growth in e-commerce ventures makes this area a prominent research stream. Besides several quantified parameters, the textual content of reviews is a storehouse of information that can educate companies and help them earn profit. This study is an attempt in this direction. The article attempts to categorize data based on a computed metric that quantifies the influencing capacity of reviews, yielding two categories: high and low influential reviews. Further, each of these documents is studied to derive several product feature categories. Each of these categories, along with the computed metric, is converted to linguistic identifiers and used in an association mining model. The article makes a novel attempt to combine feature extraction with a quantified metric to categorize review text and finally provide frequent patterns that depict customer preferences. Frequent mentions in highly influential reviews depict customer likes or preferred features in the product, whereas prominent patterns in low influential reviews highlight what is not important to customers. This is achieved using a hybrid approach of text mining for feature and term extraction, sentiment analysis, a multicriteria decision-making technique, and an association mining model.

Keywords: association mining, customer preference, frequent pattern, online reviews, text mining

Procedia PDF Downloads 388
960 Phonological Variation in the Speech of Grade 1 Teachers in Select Public Elementary Schools in the Philippines

Authors: M. Leonora D. Guerrero

Abstract:

The study attempted to uncover the most and least frequent phonological variations evident in the speech patterns of grade 1 teachers in select public elementary schools in the Philippines. It also determined the lectal description of the participants based on Tayao’s consonant charts for American and Philippine English. A descriptive method was used. A total of 24 grade 1 teachers participated in the study. The instrument used was a word list. Each column in the word list is represented by words with the target consonant phonemes: labiodental fricatives /f/ and /v/ and lingua-alveolar fricative /z/. These phonemes were in the initial, medial, and final positions, respectively. Findings of the study revealed that the most frequent variation happened when the participants read words with /z/ in the final position while the least frequent variation happened when the participants read words with /z/ in the initial position. The study likewise showed that the grade 1 teachers exhibited the segmental features of both the mesolect and basilect. Based on these results, it is suggested that teachers of English in the Philippines should aspire to manifest the features of the mesolect, if not the acrolect, since academicians are expected not to display the phonological features of the basilect, a variety used only by the 'uneducated.' This is especially so with grade 1 teachers, who are often mimicked by their students, who classify their speech as the 'standard.'

Keywords: consonant phonemes, lectal description, Philippine English, phonological variation

Procedia PDF Downloads 213
959 Implementation of Complete Management Practices in Managing the Cocoa Pod Borer

Authors: B. Saripah, A. Alias

Abstract:

Cocoa, Theobroma cacao (Linnaeus) (Malvales: Sterculiaceae), is subject to infestation by a variety of insect pests, and Conopomorpha cramerella Snellen (Lepidoptera: Gracillariidae), the cocoa pod borer (CPB), is the most serious pest of cocoa in Malaysia. The pest is indigenous to Southeast Asia. Several control measures have been implemented, and chemicals have been a major, if not the sole, approach in the management of CPB. Despite extensive use of insecticides, CPB continues to cause an unacceptable level of damage; thus, a combination of several control approaches should be sought. The study was conducted for 12 months in three blocks: Block 18C received complete management practices, which include insecticide application, pruning, fertilization and frequent harvesting; Block 17C was treated with frequent harvesting at intervals of 7-8 days; and Block 19C served as the control block. The results showed that the mean number of CPB eggs was higher in Block 17C than in Block 18C on all sampling occasions. Block 18C showed the lowest mean number of CPB eggs in both sampling plots, outside and core plots, and it was significantly different (p ≤ 0.05) from the other blocks. The mean number of CPB eggs fluctuated throughout the sampling occasions; the lowest mean number of eggs was recorded in January (17C) and November (18C), while the highest was recorded in April (17C) and December 2012 (18C). Frequent spraying with insecticides at the adjacent block (18C) helped reduce CPB eggs in the control block (Block 19C), although no spraying was implemented in Block 19C. In summary, the combination of complete management practices at Block 18C seems to have some effect on the CPB population at Blocks 17C and 19C because all blocks are adjacent to each other.

Keywords: cocoa, theobroma cacao, cocoa pod borer, conopomorpha cramerella

Procedia PDF Downloads 445
958 A Study of a Diachronic Relationship between Two Weak Inflection Classes in Norwegian, with Emphasis on Unexpected Productivity

Authors: Emilija Tribocka

Abstract:

This contribution presents parts of an ongoing study of a diachronic relationship between two weak verb classes in Norwegian, the a-class (cf. the paradigm of ‘throw’: kasta – kastar – kasta – kasta) and the e-class (cf. the paradigm of ‘buy’: kjøpa – kjøper – kjøpte – kjøpt). The study investigates inflection class shifts between the two classes with Old Norse, the ancestor of Modern Norwegian, as a starting point. Examination of inflection in 38 verbs in four chosen dialect areas (106 places of attestations) demonstrates that the shifts from the a-class to the e-class are widespread to varying degrees in three out of four investigated areas and are more common than the shifts in the opposite direction. The diachronic productivity of the e-class is unexpected for several reasons. There is general agreement that type frequency is an important factor influencing productivity. The a-class (53% of all weak verbs) was more type frequent in Old Norse than the e-class (42% of all weak verbs). Thus, given the type frequency, the expansion of the e-class is unexpected. Furthermore, in the ‘core’ areas of expanded e-class inflection, the shifts disregard phonological principles creating forms with uncomfortable consonant clusters, e.g., fiskte instead of fiska, the preterit of fiska ‘fish’. Later on, these forms may be contracted, i.e., fiskte > fiste. In this contribution, two factors influencing the shifts are presented: phonological form and token frequency. Verbs with the stem ending in a consonant cluster, particularly when the cluster ends in -t, hardly ever shift to the e-class. As a matter of fact, verbs with this structure belonging to the e-class in Old Norse shift to the a-class in Modern Norwegian, e.g., ON e-class verb skipta ‘change’ shifts to the a-class. This shift occurs as a result of the lack of morpho-phonological transparency between the stem and the preterit suffix of the e-class, -te. 
As there is a phonological fusion between the stem ending in -t and the suffix beginning in -t, the transparent a-class inflection is chosen. Token frequency plays an important role in the shifts, too, in some dialects. In one of the investigated areas, the most token frequent verbs of the ON e-class remain in the e-class (e.g., høyra ‘hear’, leva ‘live’, kjøpa ‘buy’), while less frequent verbs may shift to the a-class. Furthermore, the results indicate that the shift from the a-class to the e-class occurs in some of the most token frequent verbs of the ON a-class in this area, e.g., lika ‘like’, lova ‘promise’, svara ‘answer’. The latter is unexpected as frequent items tend to remain stable. This study presents a case of unexpected productivity, demonstrating that minor patterns can grow and outdo major patterns. Thus, type frequency is not the only factor that determines productivity. The study addresses the role of phonological form and token frequency in the spread of inflection patterns.

Keywords: inflection class, productivity, token frequency, phonological form

Procedia PDF Downloads 62
957 Design of a Pneumonia Ontology for Diagnosis Decision Support System

Authors: Sabrina Azzi, Michal Iglewski, Véronique Nabelsi

Abstract:

Diagnostic error is frequent and is one of the most important safety problems today. One of the main objectives of our work is to propose an ontological representation that takes into account the diagnostic criteria in order to improve diagnosis. We chose pneumonia since it is one of the frequent diseases affected by diagnostic errors and has harmful effects on patients. To achieve our aim, we use a semi-automated method to integrate diverse knowledge sources that include publicly available pneumonia guidelines from international repositories, biomedical ontologies and electronic health records. We follow the principles of the Open Biomedical Ontologies (OBO) Foundry. The resulting ontology covers symptoms and signs, all the types of pneumonia, antecedents, pathogens, and diagnostic testing. The first evaluation results show that most of the terms are covered by the ontology. This work is still in progress and represents a first and major step toward the development of a diagnosis decision support system for pneumonia.

Keywords: Clinical decision support system, Diagnostic errors, Ontology, Pneumonia

Procedia PDF Downloads 188
956 Effects of Transit Fare Discount Programs on Passenger Volumes and Transferring Behaviors

Authors: Guan-Ying Chen, Han-Tsung Liou, Shou-Ren Hu

Abstract:

To address traffic congestion problems and encourage the use of public transportation systems in the Taipei metropolitan area, the Taipei City Government and the New Taipei City Government implemented a monthly ticket policy on April 16, 2018. This policy offers unlimited rides on the Taipei MRT, Taipei City Bus, New Taipei City Bus, Danhai Light Rail, and Public Bike (YouBike) on a monthly basis. Additionally, both city governments replaced the smart card discount policy with a new frequent flyer discount program (referred to as the loyal customer program) on February 1, 2020, introducing a differential pricing policy. Specifically, the more frequently the Taipei MRT system is used, the greater the discounts users receive. To analyze the impact of the Taipei public transport monthly ticket policy and the frequent user discount program on the passenger volume of the Taipei MRT system and the transferring behaviors of MRT users, this study conducts a trip-chain analysis using transaction data from Taipei MRT smart cards between September 2017 and December 2020. To achieve these objectives, the study employs four indicators: 1) number of passengers, 2) average number of rides, 3) average trip distance, and 4) instances of multiple consecutive rides. The study applies the t-test and Mann-Kendall trend test to investigate whether the proposed indicators have changed over time due to the implementation of the discount policy. Furthermore, the study examines the travel behaviors of passengers who use monthly tickets. The empirical results of the study indicate that the implementation of the Taipei public transport monthly ticket policy has led to an increase in the average number of passengers and a reduction in the average trip distance. Moreover, there has been a significant increase in instances of multiple consecutive rides, attributable to the unlimited rides offered by the monthly tickets. 
The impact of the frequent user discount program on changes in MRT passengers is not as pronounced as that of the Taipei public transportation monthly ticket policy. This is partly due to the fact that the frequent user discount program is only applicable to the Taipei MRT system, and the passenger volume was greatly affected by the COVID-19 pandemic. The findings of this research can serve as a reference for Taipei MRT Corporation in formulating its fare strategy and can also provide guidance for the Taipei and New Taipei City Governments in evaluating differential pricing policies for public transportation systems.
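
The Mann-Kendall trend test named in this abstract can be sketched as follows. This is a minimal version that assumes no tied values (the tie correction to the variance is omitted) and uses the normal approximation with the usual continuity correction. The ridership figures and the function name are hypothetical, not data from the study.

```python
import math


def mann_kendall(series):
    """Mann-Kendall trend test without tie correction.

    Returns (S, Z, p), where S is the sum of signs over all ordered pairs,
    Z is the continuity-corrected normal score, and p is the two-sided p-value.
    """
    n = len(series)
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = series[j] - series[i]
            s += (diff > 0) - (diff < 0)  # sign of the later-minus-earlier difference
    var_s = n * (n - 1) * (2 * n + 5) / 18.0  # variance of S when there are no ties
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value under the normal approximation
    return s, z, p


# Hypothetical monthly passenger counts (in thousands) for one indicator.
monthly_passengers = [610, 615, 612, 630, 641, 650, 648, 672, 690, 701]
s, z, p = mann_kendall(monthly_passengers)
print(f"S={s}, Z={z:.2f}, p={p:.4f}")
```

A positive S with a small p-value would suggest an increasing trend in the indicator; series with repeated values need the tie-corrected variance instead.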

Keywords: frequent user discount program, mass rapid transit, monthly ticket, smart card

Procedia PDF Downloads 83
955 A Local Tensor Clustering Algorithm to Annotate Uncharacterized Genes with Many Biological Networks

Authors: Paul Shize Li, Frank Alber

Abstract:

A fundamental task of clinical genomics is to unravel the functions of genes and their associations with disorders. Although experimental biology has made efforts to discover and elucidate the molecular mechanisms of individual genes in the past decades, still about 40% of human genes have unknown functions, not to mention the diseases they may be related to. For those biologists who are interested in a particular gene with unknown functions, a powerful computational method tailored for inferring the functions and disease relevance of uncharacterized genes is strongly needed. Studies have shown that genes strongly linked to each other in multiple biological networks are more likely to have similar functions. This indicates that the densely connected subgraphs in multiple biological networks are useful in the functional and phenotypic annotation of uncharacterized genes. Therefore, in this work, we have developed an integrative network approach to identify the frequent local clusters, which are defined as those densely connected subgraphs that frequently occur in multiple biological networks and consist of the query gene that has few or no disease or function annotations. This is a local clustering algorithm that models multiple biological networks sharing the same gene set as a three-dimensional matrix, the so-called tensor, and employs the tensor-based optimization method to efficiently find the frequent local clusters. Specifically, massive public gene expression data sets that comprehensively cover dynamic, physiological, and environmental conditions are used to generate hundreds of gene co-expression networks. By integrating these gene co-expression networks, for a given uncharacterized gene that is of biologist’s interest, the proposed method can be applied to identify the frequent local clusters that consist of this uncharacterized gene. Finally, those frequent local clusters are used for function and disease annotation of this uncharacterized gene. 
This local tensor clustering algorithm outperformed the competing tensor-based algorithm in both module discovery and running time. We also demonstrated the use of the proposed method on real data comprising hundreds of gene co-expression networks and showed that it can comprehensively characterize the query gene. Therefore, this study provides a new tool for annotating uncharacterized genes and has great potential to assist clinical genomic diagnostics.
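
The notion of a frequent local cluster, a dense subgraph around the query gene that recurs across many networks, can be made concrete with a small scoring sketch. It only checks a given candidate gene set against the definition and is not the tensor-based optimization the abstract describes. The gene labels, the edge-set representation, and both function names are assumptions made for illustration.

```python
from itertools import combinations


def induced_density(edges, genes):
    """Fraction of possible gene pairs within `genes` that are edges."""
    pairs = list(combinations(sorted(genes), 2))
    if not pairs:
        return 0.0
    present = sum(1 for a, b in pairs if (a, b) in edges or (b, a) in edges)
    return present / len(pairs)


def cluster_frequency(networks, genes, min_density):
    """Number of networks in which `genes` induces a sufficiently dense subgraph."""
    return sum(
        1 for edges in networks if induced_density(edges, genes) >= min_density
    )


# Hypothetical co-expression networks over the same gene set, each given as a
# set of undirected edges; "Q" stands for the uncharacterized query gene.
networks = [
    {("Q", "A"), ("Q", "B"), ("A", "B")},
    {("Q", "A"), ("A", "B")},
    {("Q", "A"), ("Q", "B"), ("A", "B"), ("B", "C")},
]
freq = cluster_frequency(networks, {"Q", "A", "B"}, min_density=1.0)
print(freq)
```

A candidate set would count as a frequent local cluster when this frequency exceeds a chosen threshold; finding good candidates efficiently is the part the tensor formulation addresses.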

Keywords: local tensor clustering, query gene, gene co-expression network, gene annotation

Procedia PDF Downloads 168
954 Wind Comfort and Safety of People in the Vicinity of Tall Buildings

Authors: Mohan Kotamrazu

Abstract:

Tall buildings block and divert strong upper-level winds to the ground. These high-velocity winds often create adverse wind effects at ground level, which can be uncomfortable and even compromise the safety of pedestrians and people who frequent the spaces in the vicinity of tall buildings. Discomfort can be experienced around the entrances and corners of tall buildings. Activities such as strolling or sitting in a park, or waiting for a bus near a tall building, can become highly unpleasant. For the elderly, unpleasant conditions can also become dangerous, leading to accidents and injuries. Today there is growing concern among architects, planners and urban designers about the wind environment in the vicinity of tall buildings. Regulating authorities in cities such as Wellington, Auckland, Boston and San Francisco insist on wind tunnel testing of tall buildings prior to granting permission for their construction. The present paper examines the different ways that tall buildings can induce strong winds at pedestrian level and their impact on people who frequent the spaces around tall buildings.

Keywords: tall buildings, wind effects, wind comfort, wind safety

Procedia PDF Downloads 373
953 A Comparative Analysis of Green Buildings Rating Systems

Authors: Shadi Motamedighazvini, Roohollah Taherkhani, Mahdi Mahdikhani, Najme Hashempour

Abstract:

Nowadays, green building rating systems are a necessity for managing environmental considerations to achieve green buildings. The aim of this paper is to provide a detailed understanding of what has been the focus of green building policymakers around the world. The study is conducted in a way that can provide a context for researchers who intend to establish new rating systems or upgrade existing ones. In this paper, fifteen rating systems, including four well-known worldwide systems plus eleven local rating systems selected based on the answers to the questionnaires, were examined. Their similarities and differences in mandatory and prerequisite clauses, highest and lowest scores for each criterion, the most frequent criteria, and the most frequent sub-criteria are determined. The research findings indicated that although the criteria of energy, water, indoor quality (except Homestar), site and materials (except GRIHA) were common core criteria for all rating systems, their sub-criteria were different. This research, as a roadmap, addresses the lack of a comprehensive reference that encompasses the key criteria of different rating systems. It shows that the local systems need to be revised to be more comprehensive and adaptable to their own country’s conditions, such as climate.

Keywords: environmental assessment, green buildings, green building criteria, green building rating systems, sustainability, rating tools

Procedia PDF Downloads 242
952 Text Mining Past Medical History in Electrophysiological Studies

Authors: Roni Ramon-Gonen, Amir Dori, Shahar Shelly

Abstract:

Background and objectives: Healthcare professionals produce abundant textual information in their daily clinical practice. The extraction of insights from all the gathered information, mainly unstructured and lacking in normalization, is one of the major challenges in computational medicine. In this respect, text mining assembles different techniques to derive valuable insights from unstructured textual data, which has made it especially relevant in medicine. A neurological patient’s history allows the clinician to define the patient’s symptoms and, along with the results of the nerve conduction study (NCS) and electromyography (EMG) test, assists in formulating a differential diagnosis. Past medical history (PMH) helps to direct the latter. In this study, we aimed to identify relevant PMH, understand which PMHs are common among patients in the referral cohort and documented by the medical staff, and examine the differences by sex and age in a large cohort based on textual format notes. Methods: We retrospectively identified all patients with abnormal NCS between May 2016 and February 2022. Age, gender, and all NCS report attributes were recorded, including the summary text. All patients’ histories were extracted from the text report by a query. Basic text cleansing and data preparation were performed, as well as lemmatization. Very popular words (like ‘left’ and ‘right’) were deleted. Several words were replaced with their abbreviations. A bag-of-words approach was used to perform the analyses. Different visualizations that are common in text analysis were created to easily grasp the results. Results: We identified 5282 unique patients. Three thousand and five (57%) patients had documented PMH, of whom 60.4% (n=1817) were male. The total median age was 62 years (range 0.12 – 97.2 years), and the majority of patients (83%) presented after the age of forty years. The top two documented medical histories were diabetes mellitus (DM) and surgery.
DM was observed in 16.3% of the patients, and surgery in 15.4%. Other frequent patient histories (among the top 20) were fracture, cancer (ca), motor vehicle accident (MVA), leg, lumbar, discopathy, back and carpal tunnel release (CTR). When separating the data by sex, we can see that DM and MVA are more frequent among males, while cancer and CTR are less frequent. On the other hand, the top medical history in females was surgery, followed by DM. Other frequent histories among females are breast cancer, fractures, and CTR. In the younger population (ages 18 to 26), the frequent PMHs were surgery, fractures, trauma, and MVA. Discussion: By applying text mining approaches to unstructured data, we were able to better understand which medical histories are more relevant in these circumstances and, in addition, gain insights regarding sex and age differences. These insights might help to collect epidemiological and demographic data as well as raise new hypotheses. One limitation of this work is that each clinician might use different words or abbreviations to describe the same condition; therefore, using a coding system can be beneficial.
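
The preprocessing and bag-of-words steps described in the Methods can be sketched roughly as below. The stop-word list, the abbreviation map, and the sample notes are illustrative assumptions rather than the study's actual resources, and lemmatization is left out to keep the example dependency-free.

```python
from collections import Counter
import re

# Illustrative lists only; the study's actual word lists are not given.
STOP_WORDS = {"left", "right", "of", "the", "and", "with", "in"}
ABBREVIATIONS = {"diabetes mellitus": "dm", "motor vehicle accident": "mva"}


def bag_of_words(notes):
    """Count, for each term, the number of notes that mention it."""
    doc_freq = Counter()
    for note in notes:
        text = note.lower()
        for phrase, abbr in ABBREVIATIONS.items():
            text = text.replace(phrase, abbr)  # normalize phrases to abbreviations
        tokens = set(re.findall(r"[a-z]+", text)) - STOP_WORDS
        doc_freq.update(tokens)  # each note contributes at most once per term
    return doc_freq


# Hypothetical past-medical-history notes.
notes = [
    "Diabetes mellitus, surgery of the left knee",
    "Motor vehicle accident with right leg fracture",
    "DM and lumbar surgery",
]
freq = bag_of_words(notes)
print(freq.most_common(3))
```

Counting each term once per note gives the share of patients with a documented history, which is how figures such as "DM in 16.3% of the patients" would be read off.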

Keywords: abnormal studies, healthcare analytics, medical history, nerve conduction studies, text mining, textual analysis

Procedia PDF Downloads 96
951 Predicting Medical Check-Up Patient Re-Coming Using Sequential Pattern Mining and Association Rules

Authors: Rizka Aisha Rahmi Hariadi, Chao Ou-Yang, Han-Cheng Wang, Rajesri Govindaraju

Abstract:

With the increasing popularity of medical check-ups, a huge amount of medical check-up data is stored in databases without being put to use. These data can be very useful for future strategic planning if mined correctly. On the other hand, unpredictable patient arrivals and limited available facilities prevent hospitals from offering the best possible medical check-up service. To solve that problem, this study used the medical check-up data to predict patient re-coming. Sequential pattern mining (SPM) and association rules were chosen because these methods are suitable for predicting patient re-coming using sequential data. First, based on patient personal information, the data were grouped into … groups, then discriminant analysis was performed to check the significance of the grouping. Second, for each group, frequent patterns were generated using the SPM method. Third, based on the frequent patterns of each group, pairs of variables were extracted using association rules to obtain a general pattern of re-coming patients. Last, discussion and conclusions were provided to give some implications of the results.
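
The second and third steps above (frequent sequential patterns, then rules over pairs) can be sketched in a very reduced form. The example restricts patterns to ordered pairs of events and assumes each patient's history is a list of check-up package labels in visit order. The labels, the data, and both function names are hypothetical.

```python
from collections import Counter


def frequent_ordered_pairs(sequences, min_support):
    """Ordered pairs (a, b), with a occurring before b, and their support."""
    counts = Counter()
    for seq in sequences:
        seen_pairs = set()
        for i, a in enumerate(seq):
            for b in seq[i + 1:]:
                seen_pairs.add((a, b))
        counts.update(seen_pairs)  # each patient counts once per pattern
    n = len(sequences)
    return {pair: c / n for pair, c in counts.items() if c / n >= min_support}


def rule_confidence(sequences, antecedent, consequent):
    """Confidence of the rule 'antecedent is later followed by consequent'."""
    with_a = [s for s in sequences if antecedent in s]
    if not with_a:
        return 0.0
    followed = sum(
        1 for s in with_a if consequent in s[s.index(antecedent) + 1:]
    )
    return followed / len(with_a)


# Hypothetical visit histories for one patient group, in chronological order.
visits = [
    ["basic", "cardiac", "basic"],
    ["basic", "cardiac"],
    ["basic", "basic"],
    ["cardiac", "basic"],
]
patterns = frequent_ordered_pairs(visits, min_support=0.5)
conf = rule_confidence(visits, "basic", "cardiac")
print(patterns, conf)
```

A full SPM algorithm such as PrefixSpan would extend this to longer patterns; the pair-only version is used here just to show how support and confidence are read from sequential data.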

Keywords: patient re-coming, medical check-up, health examination, data mining, sequential pattern mining, association rules, discriminant analysis

Procedia PDF Downloads 640