Search results for: Semantic Web Usage Mining

1109 Social Software Approach to E-Learning 3.0

Authors: Anna Nedyalkova, Krassimir Nedyalkov, Teodora Bakardjieva

Abstract:

In the present paper, we'll explore how social media tools provide an opportunity for new developments in e-Learning in the context of managing personal knowledge. We discuss how social media tools can help knowledge workers and students gather, organize and manage their personal information as part of the e-learning process. At the centre of this social-software-driven approach to e-learning environments are the challenges of personalization and collaboration. We'll share concepts of how organizations are using social media for e-Learning and argue that integrating these tools into traditional e-Learning is probably not a choice but an inevitability. A survey of students' use of web technologies and social networking tools is presented. A newly developed framework for semantic blogging, capable of organizing results relevant to user requirements, is implemented at Varna Free University (VFU) to provide more effective navigation and search.

Keywords: Semantic blogging, social media tools, e-Learning, web 2.0, web 3.0.

Downloads: 1825

1108 Exploring the Challenges to Usage of Building and Construction Cost Indices in Ghana

Authors: J. J. Gyimah, E. Kissi, S. Osei-Tutu, C. D. Adobor, T. Adjei-Kumi, E. Osei-Tutu

Abstract:

Price fluctuation contract is imperative and of paramount essence in the construction industry as it provides adequate relief and cushioning for changes in the prices of input resources during construction. As a result, several methods have been devised to better help in arriving at fair recompense in the event of price changes. However, stakeholders often appear not to be satisfied with the existing methods of fluctuation evaluation, ostensibly because of the challenges associated with them. The aim of this study was to identify the challenges to usage of building construction cost indices in Ghana. Data were gathered from contractors and quantity surveying firms. The study utilized survey questionnaire approach to elicit responses from the contractors and the consultants. Data gathered were analyzed scientifically, using the Relative Importance Index (RII) to rank the problems associated with the existing methods. The findings revealed the following among others: late release of data; inadequate recovery of costs; and work items of interest not included in the published indices as the main challenges of the existing methods. Findings provided useful lessons for policy makers and practitioners in decision making towards the usage and improvement of available indices.
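The abstract above names the Relative Importance Index (RII) but does not state the formula. A minimal sketch, assuming the commonly used form RII = ΣW / (A × N), where W are the respondents' ratings, A is the highest possible rating and N is the number of respondents; the ratings below are invented purely for illustration and are not the study's data:

```python
def relative_importance_index(ratings, max_rating=5):
    """Return RII = sum(W) / (A * N) for one questionnaire item.

    ratings    -- Likert-scale scores given by the respondents (W)
    max_rating -- highest score on the scale (A); 5 is an assumption
    """
    if not ratings:
        raise ValueError("at least one rating is required")
    return sum(ratings) / (max_rating * len(ratings))


# Hypothetical responses for three of the challenges named in the abstract.
responses = {
    "late release of data": [5, 4, 5, 4],
    "inadequate recovery of costs": [4, 4, 3, 5],
    "work items not in published indices": [3, 4, 3, 3],
}

# Rank challenges from most to least important by RII.
ranked = sorted(
    responses,
    key=lambda item: relative_importance_index(responses[item]),
    reverse=True,
)

for item in ranked:
    print(f"{item}: {relative_importance_index(responses[item]):.2f}")
```

An RII close to 1 means respondents rated the item near the top of the scale; ranking items by RII gives the ordering of challenges.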

Keywords: Building construction cost indices, challenges, usage, Ghana.

Downloads: 636

1107 Females’ Usage Patterns of Information and Communication Technologies (ICTs) in the Vhembe District, South Africa

Authors: F. O. Maphiri-Makananise

Abstract:

This paper explores and provides substantiated evidence on the usage patterns of Information and Communication Technologies (ICTs) by female users at Vhembe District in Limpopo Province, South Africa. The study presents a comprehensive picture of the usage of ICTs from female users’ perspective. The significance of this study stems from the need to assess the role, relevance and usage patterns of ICTs such as smartphones, computers, laptops, iPods, the internet and social networking sites among females following the developments of new media technologies in society. The objective of the study is to investigate the usability and accessibility of ICTs to empower female users in South Africa. The study used quantitative and qualitative research methods to determine the major ideas, perceptions and usage patterns of ICTs by users. Data collection involved a structured self-administered questionnaire completed by two groups of respondents. The first group (n=50) were female students at the University of Venda, who provided their ideas and perceptions about the usefulness and usage patterns of ICTs such as smartphones, the Internet and computers at the university level; the second group (n=50) were learners from Makhado Comprehensive School, who provided their perceptions and ideas about the use of ICTs at the high school level. The researcher also noted that the findings of the study were useful as a guideline and model for ICT intervention that could work as an empowerment to women in South Africa. It was observed that the central purpose of ICTs among female users was to search for information regarding assignment writing, conducting research, dating, exchanging ideas and networking with friends and relatives. This was demonstrated by a high number of females who used ICTs for e-learning (62%) and social purposes (85%).
Therefore, the study revealed that most females used ICTs for social purposes and accessing the internet rather than for entertainment, a gesture that provides an opportune space to empower rural women in South Africa.

Keywords: Female users, Information and Communication Technologies, Internet, Usage patterns.

Downloads: 1738

1106 Applying Fuzzy FP-Growth to Mine Fuzzy Association Rules

Authors: Chien-Hua Wang, Wei-Hsuan Lee, Chin-Tzong Pang

Abstract:

In data mining, association rules are used to find associations between the different items of a transaction database. As data are collected and stored, rules of value can be found through association rules, which can be applied to help managers execute marketing strategies and establish sound market frameworks. This paper aims to use Fuzzy Frequent Pattern growth (FFP-growth) to derive fuzzy association rules. First, we apply fuzzy partition methods and decide a membership function of quantitative value for each transaction item. Next, we implement FFP-growth to deal with the process of data mining. In addition, in order to understand the impact of the Apriori algorithm and the FFP-growth algorithm on the execution time and the number of generated association rules, experiments are performed using different sizes of databases and thresholds. Lastly, the experimental results show that the FFP-growth algorithm is more efficient than other existing methods.
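The fuzzy partition step described above can be pictured with a small sketch. The abstract does not say which membership functions or regions the authors use; the triangular shape, the three regions ("low", "middle", "high") and the 0 to 10 quantity range below are all assumptions chosen for illustration:

```python
def triangular(x, a, b, c):
    """Triangular membership: 0 at a, rising to 1 at peak b, back to 0 at c."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


# Hypothetical fuzzy partition of a purchased quantity in the range 0..10.
# The outer triangles extend past the range so the end points get degree 1.
REGIONS = {
    "low": (-5, 0, 5),
    "middle": (0, 5, 10),
    "high": (5, 10, 15),
}


def fuzzify(quantity):
    """Map one quantitative item value to its degree in each fuzzy region."""
    return {name: triangular(quantity, *abc) for name, abc in REGIONS.items()}


print(fuzzify(3))
```

With a partition like this, a transaction item with quantity 3 contributes partly to "low" and partly to "middle", and a fuzzy support count can then sum these degrees across transactions instead of counting whole occurrences.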

Keywords: Data mining, association rule, fuzzy frequent pattern growth.

Downloads: 1806

1105 Enhanced Conference Organization Based On Correlation of Web Information and Ontology Based Expertise Search

Authors: Hassan Noureddine, Maria Sokhn, Iman Jarkass, Elena Mugellini, Omar Abou Khaled

Abstract:

Given the importance of conferences and their constructive role in scholarly discussion, strong organization is needed so that these discussions can open new horizons. The vast amount of information scattered across the web makes it difficult to find experts who can play a prominent role in organizing conferences. In this paper we propose a new approach for extracting researchers' information from various Web resources and correlating it in order to confirm its correctness. To validate this approach, we propose a service that is useful for setting up a conference. Its main objective is to find appropriate experts, as well as social events for a conference. For this application we use Semantic Web technologies such as RDF and ontologies to represent the confirmed information, which is linked to another ontology (a skills ontology) used to present and compute expertise.

Keywords: Expert finding, Information extraction, Ontologies, Semantic web, Social events.

Downloads: 1637

1104 Applications of Genetic Programming in Data Mining

Authors: Saleh Mesbah Elkaffas, Ahmed A. Toony

Abstract:

This paper details the application of a genetic programming framework for induction of useful classification rules from a database of income statements, balance sheets, and cash flow statements for North American public companies. Potentially interesting classification rules are discovered. Anomalies in the discovery process merit further investigation of the application of genetic programming to the dataset for the problem domain.

Keywords: Genetic programming, data mining, classification rule.

Downloads: 1553

1103 The Design and Development of Foot Massage Plate from Coconut Shell

Authors: Chananchida Yuktirat, Nichanant Sermsri

Abstract:

The objectives of this research were to design and develop a foot massage plate from coconut shell. The research investigated users' satisfaction with the developed foot massage plate on 4 aspects: usage, practicality in use, safety, and materials & production process. The sample group included 64 people using the service at Wat Paitan Health Center, Bangkok. The participants tried the massage plate and evaluated it according to the 4 aspects. The data were analyzed to find mean, percentage, and standard deviation. The results showed that the overall satisfaction was at a good level (mean = 3.80). In detail, the subjects reported their highest satisfaction with practical usage (mean = 4.16), followed by safety (mean = 3.82), then materials and production process (mean = 3.78). The aspect with the least satisfaction was function and usage (mean = 3.45), at a moderate level.

Keywords: Coconut Shell, Design, Foot Massage, Foot Massage Plate.

Downloads: 1903

1102 Novelty as a Measure of Interestingness in Knowledge Discovery

Authors: Vasudha Bhatnagar, Ahmed Sultan Al-Hegami, Naveen Kumar

Abstract:

Rule Discovery is an important technique for mining knowledge from large databases. Use of objective measures for discovering interesting rules leads to another data mining problem, although of reduced complexity. Data mining researchers have studied subjective measures of interestingness to reduce the volume of discovered rules to ultimately improve the overall efficiency of KDD process. In this paper we study novelty of the discovered rules as a subjective measure of interestingness. We propose a hybrid approach based on both objective and subjective measures to quantify novelty of the discovered rules in terms of their deviations from the known rules (knowledge). We analyze the types of deviation that can arise between two rules and categorize the discovered rules according to the user specified threshold. We implement the proposed framework and experiment with some public datasets. The experimental results are promising.

Keywords: Knowledge Discovery in Databases (KDD), Interestingness, Subjective Measures, Novelty Index.

Downloads: 1814

1101 Barriers to the Uptake of Technology in the Quantity Surveying Industry

Authors: Mnisi Blessing, Christopher Amoah

Abstract:

The usage of modern technology is widespread in industrialised nations. The issue still pertains to developing countries since they struggle to use technology in the building sector. The study aims to identify the barriers to technology usage in quantity surveying firms. Quantity Surveyors were interviewed via Microsoft Teams due to the dispersed nature of the participants. However, where an interview was not possible, the interview guide was emailed to the participants to fill in. In all, 12 participants were interviewed out of the 25 participants contacted. The data received were analysed using the content analysis process. The study's findings demonstrate that quantity surveyors have access to a wide range of technology that significantly enhances their project activities. However, quantity surveying companies are hesitant to use technology for several reasons, including the cost and maintenance associated with it. Other obstacles include a lack of knowledge, poor market acceptance, legal obstacles, and budgetary constraints. Despite the advantages associated with modern technology applications, quantity surveying firms are not using them, which may ultimately affect their work output. Therefore, firms need to re-examine the obstacles inhibiting their adoption of technology in the work process to enhance their production. The study reveals the main hindrances to technology usage, which may help firms institute measures to address them.

Keywords: Technology usage barriers, technology implementation, technology acceptance, quantity surveying.

Downloads: 277

1100 Hybrid Reliability-Similarity-Based Approach for Supervised Machine Learning

Authors: Walid Cherif

Abstract:

Data mining has, over recent years, seen big advances because of the spread of the internet, which generates a tremendous volume of data every day, and also because of the immense advances in technologies that facilitate the analysis of these data. In particular, classification techniques are a subdomain of data mining that determines to which group each data instance belongs within a given dataset. They are used to classify data into different classes according to desired criteria. Generally, a classification technique is either statistical or based on machine learning. Each of these types has its own limits. Data are becoming increasingly heterogeneous; consequently, current classification techniques are encountering many difficulties. This paper defines new measure functions to quantify the resemblance between instances and then combines them in a new approach which differs from existing algorithms in its reliability computations. The proposed approach outperformed most common classification techniques, with an f-measure exceeding 97% on the IRIS dataset.

Keywords: Data mining, knowledge discovery, machine learning, similarity measurement, supervised classification.

Downloads: 1534

1099 A Survey on Usage and Diffusion of Project Risk Management Techniques and Software Tools in the Construction Industry

Authors: Muhammad Jamaluddin Thaheem, Alberto De Marco

Abstract:

The area of Project Risk Management (PRM) has been extensively researched, and the utilization of various tools and techniques for managing risk in several industries has been sufficiently reported. Formal and systematic PRM practices have been made available for the construction industry. Based on such body of knowledge, this paper tries to find out the global picture of PRM practices and approaches with the help of a survey to look into the usage of PRM techniques and diffusion of software tools, their level of maturity, and their usefulness in the construction sector. Results show that, despite existing techniques and tools, their usage is limited: software tools are used only by a minority of respondents and their cost is one of the largest hurdles in adoption. Finally, the paper provides some important guidelines for future research regarding quantitative risk analysis techniques and suggestions for PRM software tools development and improvement.

Keywords: Construction industry, Project risk management, Software tools, Survey study.

Downloads: 2983

1098 UB-Tree Indexing for Semantic Query Optimization of Range Queries

Authors: S. Housseno, A. Simonet, M. Simonet

Abstract:

Semantic query optimization consists in restricting the search space in order to reduce the set of objects of interest for a query. This paper presents an indexing method based on UB-trees and a static analysis of the constraints associated to the views of the database and to any constraint expressed on attributes. The result of the static analysis is a partitioning of the object space into disjoint blocks. Through Space Filling Curve (SFC) techniques, each fragment (block) of the partition is assigned a unique identifier, enabling the efficient indexing of fragments by UB-trees. The search space corresponding to a range query is restricted to a subset of the blocks of the partition. This approach has been developed in the context of a KB-DBMS but it can be applied to any relational system.
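The step that assigns each block an identifier through a space filling curve can be sketched as follows. The abstract does not say which curve is used; this example assumes the Z-order (Morton) curve that UB-trees are usually built on, a two-dimensional space, and a hypothetical function name `z_address`:

```python
def z_address(x, y, bits=16):
    """Interleave the bits of two non-negative coordinates into one integer.

    Bit i of x goes to position 2*i and bit i of y to position 2*i + 1,
    which is the Z-order (Morton) address of the cell (x, y).
    """
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


# Cells that are close in 2-D space tend to receive nearby identifiers,
# so a one-dimensional index (such as a B-tree) can store them in order.
for cell in [(0, 0), (1, 0), (0, 1), (1, 1), (3, 5)]:
    print(cell, z_address(*cell))
```

Because every cell maps to exactly one integer, a range query reduces to visiting the intervals of addresses whose cells intersect the query box.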

Keywords: Index, Range query, UB-tree, Space Filling Curve, Query optimization, Views, Database, Integrity Constraint, Classification.

Downloads: 1506

1097 Characteristics of Football Spectators Using Second Screen

Authors: Florian Pfeffel, Christoph A. Kexel, Peter Kexel, Maria Ratz

Abstract:

The parallel usage of different media channels has increased recently owing to technological advances. Second Screen describes the use of a second device by television viewers to consume further content which is related to the program they are watching. This study analysed the characteristics of football spectators regarding their media consumption in relation to Second Screen usage while watching a football match on TV. The existing literature on Second Screen usage is still very limited, especially in the context of particular broadcasting settings such as sport or even more specific such as football matches. Therefore, the primary research objective was to reveal first insights into the user behaviour of football spectators regarding Second Screen services. The survey, which was conducted among German football supporters in 2015, revealed some characteristics such as the identification and involvement into the sports which are related to an increased use of Second Screen services. One important finding for football supporters was that at the time of a match they have a lower parallel media usage compared to other TV broadcastings. Nevertheless, if supporters used a second device while watching a match on TV, then they were using specific Second Screen services. This means they searched for more content related information. The findings on the habits and characteristics of people who are using Second Screen services are relevant for future developments in that area as well as for marketing decisions.

Keywords: Media consumption, second screen, sport marketing, user behaviour.

Downloads: 1800

1096 A Cumulative Learning Approach to Data Mining Employing Censored Production Rules (CPRs)

Authors: Rekha Kandwal, Kamal K. Bharadwaj

Abstract:

Knowledge is indispensable, but voluminous knowledge becomes a bottleneck for efficient processing. A great challenge for data mining is the generation of a large number of potential rules as a result of the mining process. In fact, the result size is sometimes comparable to the original data. Traditional data mining pruning measures such as support do not sufficiently reduce the huge rule space. Moreover, many practical applications are characterized by continual change of data and knowledge, thereby making knowledge voluminous with each change. The predominant representation of discovered knowledge is the standard Production Rule (PR) of the form If P Then D. Michalski & Winston proposed Censored Production Rules (CPRs), an extension of production rules that exhibits variable precision and supports an efficient mechanism for handling exceptions. A CPR is an augmented production rule of the form: If P Then D Unless C, where C (Censor) is an exception to the rule. Such rules are employed in situations in which the conditional statement 'If P Then D' holds frequently and the assertion C holds rarely. By using a rule of this type we are free to ignore the exception condition when the resources needed to establish its presence are tight or there is simply no information available as to whether it holds or not. Thus the 'If P Then D' part of the CPR expresses important information, while the Unless C part acts only as a switch that changes the polarity of D to ~D. In this paper a scheme based on a Dempster-Shafer Theory (DST) interpretation of a CPR is suggested for discovering CPRs from the discovered flat PRs. The discovery of CPRs from flat rules would result in a considerable reduction of the already discovered rules. The proposed scheme incrementally incorporates new knowledge and also reduces the size of the knowledge base considerably with each episode. Examples are given to demonstrate the behaviour of the proposed scheme.
The suggested cumulative learning scheme would be useful in mining data streams.
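The rule form "If P Then D Unless C" can be illustrated with a short sketch. This shows only how a single CPR might be evaluated, not the paper's DST-based discovery scheme; the class name `CensoredRule`, the `check_censor` flag and the bird example are all hypothetical:

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class CensoredRule:
    """If P Then D Unless C, with predicates evaluated over a dict of facts."""
    premise: Callable[[dict], bool]            # P
    decision: str                              # D
    censor: Callable[[dict], Optional[bool]]   # C; None means "unknown"

    def apply(self, facts: dict, check_censor: bool = True) -> Optional[str]:
        if not self.premise(facts):
            return None                        # the rule does not fire
        # The censor may be skipped when resources are tight; it flips the
        # decision only when it is known to hold.
        if check_censor and self.censor(facts) is True:
            return "not " + self.decision      # polarity switched: D becomes ~D
        return self.decision


# Hypothetical rule: If bird Then flies Unless penguin.
rule = CensoredRule(
    premise=lambda f: f.get("bird", False),
    decision="flies",
    censor=lambda f: f.get("penguin"),         # returns None if not recorded
)

print(rule.apply({"bird": True}))                        # censor unknown
print(rule.apply({"bird": True, "penguin": True}))       # censor holds
print(rule.apply({"bird": True, "penguin": True}, check_censor=False))
```

Skipping the censor check trades certainty for speed, which mirrors the variable-precision behaviour the abstract attributes to CPRs.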

Keywords: Censored production rules, cumulative learning, data mining, machine learning.

Downloads: 1490

1095 Regression Analysis of Travel Indicators and Public Transport Usage in Urban Areas

Authors: M. Moeinaddini, Z. Asadi-Shekari, M. Zaly Shah, A. Hamzah

Abstract:

Currently, planners try to have more green travel options to decrease economic, social and environmental problems. Therefore, this study tries to find significant urban travel factors to be used to increase the usage of alternative urban travel modes. This paper attempts to identify the relationship between prominent urban mobility indicators and daily trips by public transport in 30 cities from various parts of the world. Different travel modes, infrastructures and cost indicators were evaluated in this research as mobility indicators. The results of multi-linear regression analysis indicate that there is a significant relationship between mobility indicators and the daily usage of public transport.

Keywords: Green travel modes, urban travel indicators, daily trips by public transport, multi-linear regression analysis.

Downloads: 2561

1094 Classifier Based Text Mining for Neural Network

Authors: M. Govindarajan, R. M. Chandrasekaran

Abstract:

Applying knowledge discovery techniques to unstructured text is termed knowledge discovery in text (KDT), text data mining, or text mining. In neural networks that address classification problems, the training set, the testing set, and the learning rate are key considerations: these are the collections of input/output patterns used to train the network and to assess its performance, and the rate at which adjustments are made. This paper describes a proposed back propagation neural net classifier that performs cross validation for the original Neural Network in order to improve classification accuracy and training time. The feasibility and the benefits of the proposed approach are demonstrated by means of five data sets: contact-lenses, cpu, weather symbolic, Weather, and labor-nega-data. It is shown that, compared to the existing neural network, training is more than 10 times faster when the dataset is larger than CPU or the network has many hidden units, while accuracy ('percent correct') was the same for all datasets except contact-lenses, which is the only one with missing attributes. For contact-lenses the accuracy with the Proposed Neural Network was on average around 0.3% less than with the original Neural Network. This algorithm is independent of specific data sets, so many ideas and solutions can be transferred to other classifier paradigms.

Keywords: Back propagation, classification accuracy, text mining, time complexity.

Downloads: 4223

1093 Text Mining Technique for Data Mining Application

Authors: M. Govindarajan

Abstract:

Applying knowledge discovery techniques to unstructured text is termed knowledge discovery in text (KDT), text data mining, or text mining. The decision tree approach is most useful in classification problems. With this technique, a tree is constructed to model the classification process. There are two basic steps in the technique: building the tree and applying the tree to the database. This paper describes a proposed C5.0 classifier that performs rulesets, cross validation and boosting for the original C5.0 in order to reduce the error ratio. The feasibility and the benefits of the proposed approach are demonstrated by means of a medical data set, hypothyroid. It is shown that the performance of a classifier on the training cases from which it was constructed gives a poor estimate of its accuracy; whether by sampling or by using a separate test file, the classifier should be evaluated on cases that were not used to build it, and the estimate is dependable only when the numbers of cases used to build and evaluate the classifier are both large. If the cases in hypothyroid.data and hypothyroid.test were to be shuffled and divided into a new 2772-case training set and a 1000-case test set, C5.0 might construct a different classifier with a lower or higher error rate on the test cases. An important feature of See5 is its ability to generate classifiers called rulesets. The ruleset has an error rate of 0.5% on the test cases. The standard errors of the means provide an estimate of the variability of results. One way to get a more reliable estimate of predictive accuracy is f-fold cross-validation. The error rate of a classifier produced from all the cases is estimated as the ratio of the total number of errors on the hold-out cases to the total number of cases. The Boost option with x trials instructs See5 to construct up to x classifiers in this manner. Trials over numerous datasets, large and small, show that on average 10-classifier boosting reduces the error rate for test cases by about 25%.
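The cross-validation estimate described above (total errors on the hold-out cases divided by the total number of cases) can be sketched in a few lines. This is not See5 or C5.0: a trivial majority-class learner stands in for the real classifier, and the function names and toy data are hypothetical:

```python
from collections import Counter


def cross_validation_error(cases, labels, folds, train):
    """Estimate the error rate by f-fold cross-validation.

    Each case is held out exactly once; the estimate is the total number
    of hold-out errors divided by the total number of cases.
    """
    n = len(cases)
    errors = 0
    for k in range(folds):
        hold_out = [i for i in range(n) if i % folds == k]
        rest = [i for i in range(n) if i % folds != k]
        model = train([cases[i] for i in rest], [labels[i] for i in rest])
        errors += sum(model(cases[i]) != labels[i] for i in hold_out)
    return errors / n


def train_majority(cases, labels):
    """Stand-in learner: always predicts the most common training label."""
    majority = Counter(labels).most_common(1)[0][0]
    return lambda case: majority


# Toy data: ten cases, eight labelled "a" and two labelled "b".
toy_cases = list(range(10))
toy_labels = ["a"] * 8 + ["b"] * 2

print(cross_validation_error(toy_cases, toy_labels, folds=5, train=train_majority))
```

Any learner that returns a callable model can be passed as `train`, so the same loop applies to a decision tree or ruleset builder.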

Keywords: C5.0, Error Ratio, text mining, training data, test data.

Downloads: 2495

1092 Analysis of a Population of Diabetic Patients Databases with Classifiers

Authors: Murat Koklu, Yavuz Unal

Abstract:

Data mining can be described as a technique for extracting information from data. It is the process of obtaining hidden information and then turning it into qualified knowledge by statistical and artificial intelligence techniques. One of its application areas is medicine, where it is used to build decision support systems for diagnosis by deriving meaningful information from given medical data. This study presents a decision support system for the diagnosis of illness that makes use of data mining and three different artificial intelligence classifier algorithms, namely Multilayer Perceptron, Naive Bayes Classifier and J48. The Pima Indian dataset of the UCI Machine Learning Repository was used. This dataset includes urinary and blood test results of 768 patients. These test results consist of 8 different features. The classification results obtained were compared with previous studies, and suggestions for future studies are presented.

Keywords: Artificial Intelligence, Classifiers, Data Mining, Diabetic Patients.

Downloads: 5436

1091 Multi-Dimensional Concerns Mining for Web Applications via Concept-Analysis

Authors: Carlo Bellettini, Alessandro Marchetto, Andrea Trentini

Abstract:

Web applications have become very complex and crucial, especially when combined with areas such as CRM (Customer Relationship Management) and BPR (Business Process Reengineering). As a result, the scientific community has focused attention on Web application design, development, analysis, and testing by studying and proposing methodologies and tools. This paper proposes an approach to automatic multi-dimensional concern mining for Web applications, based on concept analysis, impact analysis, and token-based concern identification. This approach lets the user analyse and traverse Web software relevant to a particular concern (concept, goal, purpose, etc.) via multi-dimensional separation of concerns, in order to document, understand and test Web applications. This technique was developed in the context of the WAAT (Web Applications Analysis and Testing) project. A semi-automatic tool to support this technique is currently under development.

Keywords: Concepts Analysis, Concerns Mining, Multi-Dimensional Separation of Concerns, Impact Analysis.

Downloads: 1479

1090 Investigation of Cascade Loop Heat Pipes

Authors: Nandy Putra, Atrialdipa Duanovsah, Kristofer Haliansyah

Abstract:

The aim of this research is to design a loop heat pipe (LHP) with low thermal resistance and low condenser temperature. A self-designed cascade LHP was tested using biomaterial, sintered copper powder, and aluminum screen mesh as the wick. Using pure water as the working fluid for the first level of the LHP and 96% alcohol as the working fluid for the second level, the experiments were run with 10 W, 20 W, and 30 W heat input. Experimental results show that using biomaterial as the wick reduced the evaporator temperature more than sintered copper powder and screen mesh, by up to 22.63% and 37.41% respectively. The lowest thermal resistance, 2.06 °C/W, occurred when biomaterial was used as the wick of the heat pipe. A cascade system can be applied to an LHP to reduce the condenser temperature, and it reduced thermal resistance by up to 17.6%.

Keywords: Biomaterial, cascade loop heat pipe, screen mesh, sintered Cu.

Downloads: 916

1089 Ensemble Approach for Predicting Student's Academic Performance

Authors: L. A. Muhammad, M. S. Argungu

Abstract:

Educational data mining (EDM) has received substantial attention. Various data mining techniques have been proposed to uncover hidden knowledge in educational data. The results obtained assist academic institutions in further enhancing their learning processes and their methods of passing knowledge to students. Consequently, the performance of students improves and the educational products are enhanced. This study adopted a student performance prediction model based on data mining techniques with Students' Essential Features (SEF). SEF are linked to the learner's interactivity with the e-learning management system. The performance of the student predictive model is assessed by a set of classifiers, viz. Bayes Network, Logistic Regression, and Reduced Error Pruning Tree (REP Tree). Ensemble methods of Bagging, Boosting, and Random Forest (RF) are then applied to improve the performance of these single classifiers. The results show a strong relationship between learners' behaviors and their academic attainment. The REP Tree and its ensemble record the highest accuracy of 83.33% using SEF. In terms of the Receiver Operating Characteristic (ROC) curve, the boosting method of REP Tree records 0.903, which is the best. This result further demonstrates the dependability of the proposed model.

Keywords: Ensemble, bagging, Random Forest, boosting, data mining, classifiers, machine learning.

Downloads: 786

1088 Arabic Light Stemmer for Better Search Accuracy

Authors: Sahar Khedr, Dina Sayed, Ayman Hanafy

Abstract:

Arabic is one of the most ancient and critical languages in the world. It has more than 250 million native speakers, and more than twenty countries have Arabic as one of their official languages. In the past decade, we have witnessed a rapid evolution in smart devices, social networks and the technology sector, which has led to the need to provide tools and libraries that properly handle the Arabic language in different domains. Stemming is one of the most crucial linguistic fundamentals. It is used in many applications, especially in the information extraction and text mining fields. The motivation behind this work is to enhance the Arabic light stemmer to serve the data mining industry and make it available to the open source community. The presented implementation enhances the Arabic light stemmer by utilizing and improving an algorithm that provides an extension for a new set of rules and patterns, accompanied by an adjusted procedure. This study demonstrates a significant enhancement in search accuracy, with an average 10% improvement in comparison with previous works.

Keywords: Arabic data mining, Arabic Information extraction, Arabic Light stemmer, Arabic stemmer.

Downloads 1514
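
Light stemming, as referred to in the abstract above, generally means stripping a small set of frequent prefixes and suffixes without reducing a word to its root. The sketch below shows only that general idea. The affix lists, the minimum stem length, and the function name `light_stem` are assumptions chosen for this example; they are not the rules or patterns of the presented stemmer.

```python
# Illustrative affix lists (assumed for this example, longest prefixes first).
PREFIXES = ["وال", "بال", "كال", "فال", "ال", "لل", "و"]
SUFFIXES = ["ها", "ان", "ات", "ون", "ين", "يه", "ية", "ه", "ة", "ي"]


def light_stem(word, min_len=3):
    """Strip at most one prefix and one suffix, keeping at least min_len letters."""
    for prefix in PREFIXES:
        if word.startswith(prefix) and len(word) - len(prefix) >= min_len:
            word = word[len(prefix):]
            break
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_len:
            word = word[:-len(suffix)]
            break
    return word


for token in ["الكتاب", "والمعلمون", "مكتبة"]:
    print(token, light_stem(token))
```

The minimum-length guard is one simple way to avoid over-stemming short words; a production stemmer would need a richer rule set.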

1087 A Text Clustering System based on k-means Type Subspace Clustering and Ontology

Authors: Liping Jing, Michael K. Ng, Xinhua Yang, Joshua Zhexue Huang

Abstract:

This paper presents a text clustering system based on a k-means type subspace clustering algorithm for clustering large, high-dimensional and sparse text data. In this algorithm, a new step is added to the k-means clustering process to automatically calculate the weights of keywords in each cluster, so that the important words of a cluster can be identified by their weight values. To aid understanding and interpretation of the clustering results, a few keywords that best represent the semantic topic are extracted from each cluster. Two methods are used to extract the representative words. The candidate words are first selected according to their weights as calculated by our new algorithm. Then, the candidates are fed to WordNet to identify the set of nouns and to consolidate synonyms and hyponyms. Experimental results show that the clustering algorithm is superior to other subspace clustering algorithms, such as PROCLUS and HARP, and to k-means type algorithms, e.g., Bisecting-KMeans. Furthermore, the word extraction method is effective in selecting the words that represent the topics of the clusters.

Keywords: Subspace Clustering, Text Mining, Feature Weighting, Cluster Interpretation, Ontology

Downloads 2467

1086 Mining Network Data for Intrusion Detection through Naïve Bayesian with Clustering

Authors: Dewan Md. Farid, Nouria Harbi, Suman Ahmmed, Md. Zahidur Rahman, Chowdhury Mofizur Rahman

Abstract:

Network security attacks are violations of information security policy that have received much attention from the computational intelligence community in recent decades. Data mining has become a very useful technique for detecting network intrusions by extracting useful knowledge from large volumes of network data or logs. The naïve Bayesian classifier is one of the most popular data mining algorithms for classification, and it provides an optimal way to predict the class of an unknown example. Tests have shown that a single set of probabilities derived from the data is not enough to achieve a good classification rate. In this paper, we propose a new learning algorithm for mining network logs to detect network intrusions with a naïve Bayesian classifier. It first clusters the network logs into several groups based on the similarity of the logs, and then calculates the prior and conditional probabilities for each group of logs. To classify a new log, the algorithm checks which cluster the log belongs to and then uses that cluster's probability set to classify the new log. We tested the performance of our proposed algorithm on the KDD99 benchmark network intrusion detection dataset, and the experimental results showed that it improves detection rates and reduces false positives for different types of network intrusions.

Keywords: Clustering, detection rate, false positive, naïve Bayesian classifier, network intrusion detection.

Downloads 5546
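
The idea described in the abstract above, keeping one set of prior and conditional probabilities per cluster and classifying a new log with its own cluster's set, can be sketched as follows. This is a sketch under stated assumptions, not the authors' algorithm: the similarity-based clustering step is replaced by a caller-supplied `cluster_of` function, Laplace smoothing is added for unseen values, and the class name, field names and toy logs are all hypothetical.

```python
import math
from collections import Counter, defaultdict


class ClusteredNaiveBayes:
    """Naive Bayes with a separate probability set per cluster of logs."""

    def __init__(self, cluster_of):
        self.cluster_of = cluster_of              # assumed stand-in for the clustering step
        self.class_counts = defaultdict(Counter)  # cluster -> label -> count
        self.value_counts = defaultdict(Counter)  # (cluster, label) -> (feature, value) -> count
        self.values_seen = defaultdict(set)       # feature -> distinct values

    def fit(self, logs, labels):
        for log, label in zip(logs, labels):
            cluster = self.cluster_of(log)
            self.class_counts[cluster][label] += 1
            for feature, value in log.items():
                self.value_counts[(cluster, label)][(feature, value)] += 1
                self.values_seen[feature].add(value)
        return self

    def predict(self, log):
        """Return the most probable label, or None for an unseen cluster."""
        cluster = self.cluster_of(log)
        counts = self.class_counts.get(cluster, Counter())
        total = sum(counts.values())
        best_label, best_score = None, -math.inf
        for label, n in counts.items():
            score = math.log(n / total)  # prior within this cluster
            seen = self.value_counts.get((cluster, label), Counter())
            for feature, value in log.items():
                k = len(self.values_seen.get(feature, ()))
                # Laplace-smoothed conditional probability.
                score += math.log((seen[(feature, value)] + 1) / (n + k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical toy logs; "protocol" doubles as the cluster key here.
logs = [
    {"protocol": "tcp", "flag": "S0"},
    {"protocol": "tcp", "flag": "S0"},
    {"protocol": "tcp", "flag": "SF"},
    {"protocol": "udp", "flag": "SF"},
    {"protocol": "udp", "flag": "SF"},
]
labels = ["attack", "attack", "normal", "normal", "normal"]
model = ClusteredNaiveBayes(lambda log: log["protocol"]).fit(logs, labels)
print(model.predict({"protocol": "tcp", "flag": "S0"}))
print(model.predict({"protocol": "tcp", "flag": "SF"}))
```

Working in log space avoids numeric underflow when many features are multiplied together.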

1085 Investigating Crime Hotspot Places and their Implication to Urban Environmental Design: A Geographic Visualization and Data Mining Approach

Authors: Donna R. Tabangin, Jacqueline C. Flores, Nelson F. Emperador

Abstract:

Information is power. Geographical information is an emerging science that is advancing the development of knowledge to further help in understanding the relationship of "place" with other disciplines such as crime. The researchers used crime data for the years 2004 to 2007 from the Baguio City Police Office to determine the incidence and actual locations of crime hotspots. A combined qualitative and quantitative research methodology was employed through extensive fieldwork and observation, geographic visualization with Geographic Information Systems (GIS) and Global Positioning Systems (GPS), and data mining. The paper discusses emerging geographic visualization and data mining tools and methodologies that can be used to generate baseline data for environmental initiatives such as urban renewal and rejuvenation. The study demonstrated that crime hotspots can be computed and that they occur in select places in the Central Business District (CBD) of Baguio City. It was observed that some characteristics of the hotspot places' physical design and milieu may play an important role in creating opportunities for crime. A list of these environmental attributes was generated. This derived information may be used to guide the design or redesign of the urban environment of the City in order to reduce crime and at the same time improve it physically.

Keywords: Crime mapping, data mining, environmental design, geographic visualization, GIS.

Downloads 2636

1084 Knowledge-Driven Decision Support System Based on Knowledge Warehouse and Data Mining by Improving Apriori Algorithm with Fuzzy Logic

Authors: Pejman Hosseinioun, Hasan Shakeri, Ghasem Ghorbanirostam

Abstract:

In recent years, research on knowledge sources, decision support systems, data mining and the process of knowledge discovery in databases has grown in importance, and each of these aspects is considered to affect the others. In this article, we merge information sources and knowledge sources to propose a knowledge-based system for management, based on storing and restoring knowledge, to manage information and improve decision making and resources. We use data mining and the Apriori algorithm in the knowledge discovery process. One problem with the Apriori algorithm is that the user must specify the minimum support threshold. Imagine that a user wants to apply the Apriori algorithm to a database with millions of transactions. The user cannot have the necessary knowledge of all the transactions in that database and therefore cannot specify a suitable threshold. Our purpose in this article is to improve the Apriori algorithm. To achieve this goal, we use fuzzy logic to place the data in different clusters before applying the Apriori algorithm to the data in the database, and we also try to suggest the most suitable threshold to the user automatically.

Keywords: Decision support system, data mining, knowledge discovery, data discovery, fuzzy logic.

Downloads 2142
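
The threshold problem raised in the abstract above is easier to see with a small sketch of the standard Apriori procedure, in which the user-supplied minimum support decides which itemsets survive. This is the textbook algorithm only; it does not include the fuzzy clustering or automatic threshold suggestion the article proposes, and the function name and toy transactions are invented for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets with support >= min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    current = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while current:
        kept = {}
        for itemset in current:
            s = support(itemset)
            if s >= min_support:
                kept[itemset] = s
        frequent.update(kept)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in kept for b in kept if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        current = {c for c in candidates
                   if all(frozenset(sub) in kept for sub in combinations(c, k - 1))}
    return frequent


# Hypothetical toy transactions.
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
loose = apriori(baskets, min_support=0.5)
strict = apriori(baskets, min_support=0.75)
print(len(loose), len(strict))
```

On this toy data the lower threshold keeps five itemsets and the higher one keeps two, which shows how strongly the result depends on a value the user has to guess.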

1083 Discovering Complex Regularities by Adaptive Self Organizing Classification

Authors: A. Faro, D. Giordano, F. Maiorana

Abstract:

Data mining uses a variety of techniques, each of which is useful for some particular task. It is important to have a deep understanding of each technique and to be able to perform sophisticated analysis. In this article we describe a tool built to simulate a variation of the Kohonen network to perform unsupervised clustering and to support the entire data mining process up to results visualization. A graphical representation helps the user to find a strategy to optimize classification by adding, moving or deleting a neuron in order to change the number of classes. The tool is also able to automatically suggest a strategy for optimizing the number of classes. The tool is used to classify macroeconomic data that report the most developed countries' imports and exports. It is possible to classify the countries based on their economic behaviour and to use an ad hoc tool to characterize the commercial behaviour of a country in a selected class by analysing the positive and negative features that contribute to class formation.

Keywords: Unsupervised classification, Kohonen networks, macroeconomics, Visual data mining, cluster interpretation.

Downloads 1574

1082 Hierarchical Clustering Algorithms in Data Mining

Authors: Z. Abdullah, A. R. Hamdan

Abstract:

Clustering is the process of grouping objects and data into clusters so that data objects in the same cluster are similar to each other. Clustering is one of the areas of data mining, and its algorithms can be classified into partitioning, hierarchical, density-based and grid-based methods. In this paper we survey and review four major hierarchical clustering algorithms: CURE, ROCK, CHAMELEON and BIRCH. The resulting state of the art of these algorithms will help in eliminating current problems as well as in deriving more robust and scalable clustering algorithms.

Keywords: Clustering, method, algorithm, hierarchical, survey.

Downloads 3389
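
For readers unfamiliar with the hierarchical family surveyed above, the basic bottom-up (agglomerative) scheme can be sketched in a few lines. This is plain single-linkage clustering on one-dimensional points, shown only as the baseline idea; it is not CURE, ROCK, CHAMELEON or BIRCH, and the function name and sample points are assumptions made for the example.

```python
def single_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain."""
    clusters = [[p] for p in points]

    def distance(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(abs(x - y) for x in a for y in b)

    while len(clusters) > n_clusters:
        pairs = [(i, j) for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        i, j = min(pairs, key=lambda ij: distance(clusters[ij[0]], clusters[ij[1]]))
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical one-dimensional data.
groups = single_linkage([1.0, 1.2, 5.0, 5.1, 9.0], n_clusters=3)
print(groups)
```

The pairwise search makes this naive version expensive on large inputs, which is the kind of scalability limit that motivates more elaborate hierarchical methods.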

1081 A Heuristics Approach for Fast Detecting Suspicious Money Laundering Cases in an Investment Bank

Authors: Nhien-An Le-Khac, Sammer Markos, M-Tahar Kechadi

Abstract:

Today, money laundering (ML) poses a serious threat not only to financial institutions but also to the nation. This criminal activity is becoming more and more sophisticated and seems to have moved from the cliché of drug trafficking to financing terrorism, not forgetting personal gain. Most international financial institutions have been implementing anti-money laundering (AML) solutions to fight investment fraud. However, traditional investigative techniques consume numerous man-hours. Recently, data mining approaches have been developed and are considered well suited to detecting ML activities. Within the scope of a collaboration project to develop a new solution for the AML units of an international investment bank, we proposed a data mining-based solution for AML. In this paper, we present a heuristics approach to improve the performance of this solution. We also show some preliminary results obtained with this method when analysing transaction datasets.

Keywords: data mining, anti money laundering, clustering, heuristics.

Downloads 3597

1080 In Cognitive Radio the Analysis of Bit-Error-Rate (BER) by using PSO Algorithm

Authors: Shrikrishan Yadav, Akhilesh Saini, Krishna Chandra Roy

Abstract:

The electromagnetic spectrum is a natural resource, and hence well-organized usage of this limited natural resource is necessary for better communication. The present static frequency allocation schemes cannot accommodate the demands of the rapidly increasing number of higher data rate services. Therefore, dynamic usage of the spectrum must be distinguished from static usage to increase the availability of the frequency spectrum. Cognitive radio is not a single piece of apparatus but a technology that can incorporate components spread across a network. It offers great promise for improving system efficiency and spectrum utilization, enabling more effective applications, and reducing interference and complexity of usage for users. Cognitive radio is aware of its environment, internal state, and location, and autonomously adjusts its operations to achieve designed objectives. It first senses its spectral environment over a wide frequency band, and then adapts its parameters to maximize spectrum efficiency with high performance. This paper focuses only on the analysis of Bit-Error-Rate in cognitive radio using the Particle Swarm Optimization algorithm. The analysis is carried out both theoretically and practically, and is interpreted in terms of advantages and drawbacks and of how BER affects the efficiency and performance of the communication system.

Keywords: BER, Cognitive Radio, Environmental Parameters, PSO, Radio spectrum, Transmission Parameters

Downloads 2163