Search results for: syntactic tree.
178 Customer Churn Prediction Using Four Machine Learning Algorithms Integrating Feature Selection and Normalization in the Telecom Sector
Authors: Alanoud Moraya Aldalan, Abdulaziz Almaleh
Abstract:
A crucial part of maintaining a customer-oriented business in the telecommunications industry is understanding the reasons and factors that lead to customer churn. Competition between telecom companies has increased greatly in recent years, making it more important to understand customers' needs in this strong market, especially the needs of customers who are considering switching service providers. Churn prediction has therefore become a mandatory requirement for retaining customers in the telecommunications industry, and machine learning can be used to accomplish it; churn prediction is now an important machine learning classification topic in the sector. Understanding the factors behind customer churn and how customers behave is essential to building an effective churn prediction model. This paper aims to predict churn and identify the factors behind customers' churn based on their past service usage history. To this end, the study makes use of feature selection, normalization, and feature engineering, and then compares the performance of four machine learning algorithms on the Orange dataset: Logistic Regression, Random Forest, Decision Tree, and Gradient Boosting. Performance was evaluated using the F1-score and ROC-AUC, and the results were compared against existing models. The results showed that Gradient Boosting with the feature selection technique performed best in this study, achieving a 99% F1-score and 99% AUC, while all other experiments achieved good results as well.
Keywords: Machine Learning, Gradient Boosting, Logistic Regression, Churn, Random Forest, Decision Tree, ROC, AUC, F1-score.
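As an illustration of the four-model comparison described above, the following is a minimal scikit-learn sketch on a synthetic imbalanced dataset standing in for the Orange churn data; the normalization step, the SelectKBest feature selection, k=10, and all hyperparameters are illustrative assumptions rather than the authors' exact pipeline.

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# Stand-in for the churn data: 20 numeric usage features, imbalanced binary churn label.
X, y = make_classification(n_samples=3000, n_features=20, n_informative=8,
                           weights=[0.85, 0.15], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "Gradient Boosting": GradientBoostingClassifier(random_state=0),
}
for name, clf in models.items():
    # Normalization + univariate feature selection + classifier, as in the abstract.
    pipe = make_pipeline(StandardScaler(), SelectKBest(f_classif, k=10), clf)
    pipe.fit(X_tr, y_tr)
    proba = pipe.predict_proba(X_te)[:, 1]
    print(f"{name}: F1={f1_score(y_te, pipe.predict(X_te)):.3f} "
          f"AUC={roc_auc_score(y_te, proba):.3f}")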
177 Development of an Ensemble Classification Model Based on Hybrid Filter-Wrapper Feature Selection for Email Phishing Detection
Authors: R. B. Ibrahim, M. S. Argungu, I. M. Mungadi
Abstract:
The Internet has become an indispensable part of human life since its inception, providing diverse opportunities to make life easier through various channels, among them email, internet banking, video conferencing, and the like. Email is one of the easiest means of communication and is widely accepted among individuals and organizations globally, but over the decades the security integrity of this platform has been challenged by malicious activities such as phishing. Email phishing is designed by phishers to fool the recipient into handing over sensitive personal information such as passwords, credit card numbers, account credentials, social security numbers, etc. This activity has caused a great deal of financial damage to email users globally, resulting in bankruptcy, the sudden death of victims, and other health-related illnesses. Although many methods have been proposed to detect email phishing, in this research the results of multiple machine-learning methods for predicting email phishing are compared, with the use of filter-wrapper feature selection. It is worth noting that all three models performed well, but one outperformed the others. The dataset used for these models was obtained from the Kaggle online data repository, and three classifiers, Decision Tree, Naïve Bayes, and Logistic Regression, were each used as base learners in a bagging ensemble. Results from the study show that the Decision Tree (CART) bagging ensemble recorded the highest accuracy of 98.13% using PEF (Phishing Essential Features). This result further demonstrates the dependability of the proposed model.
Keywords: Ensemble, hybrid, filter-wrapper, phishing.
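A minimal sketch of the bagging-ensemble comparison described above, assuming scikit-learn and a synthetic feature matrix in place of the Kaggle phishing dataset and the PEF feature set; the number of estimators and the cross-validation setup are illustrative assumptions.

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# Placeholder for phishing-email feature vectors (e.g. URL counts, HTML flags).
X, y = make_classification(n_samples=2000, n_features=15, n_informative=10, random_state=1)

for name, base in [("CART", DecisionTreeClassifier(random_state=1)),
                   ("Naive Bayes", GaussianNB()),
                   ("Logistic Regression", LogisticRegression(max_iter=1000))]:
    # Each classifier is wrapped in its own bagging ensemble, as in the abstract.
    bag = BaggingClassifier(base, n_estimators=50, random_state=1)
    acc = cross_val_score(bag, X, y, cv=5, scoring="accuracy").mean()
    print(f"Bagging({name}): accuracy={acc:.4f}")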
176 Case Study Analysis of 2017 European Railway Traffic Management Incident: The Application of System for Investigation of Railway Interfaces Methodology
Authors: Sanjeev Kumar Appicharla
Abstract:
This paper presents the results of the modelling and analysis of a European Railway Traffic Management (ERTMS) safety-critical incident on the Cambrian Railway in the UK, in order to raise awareness of biases in the systems engineering process, using the RAIB 17/2019 report as a primary input. The RAIB, the UK independent accident investigator, published Report RAIB 17/2019 giving the details of its investigation of the focal event in the form of the immediate cause, causal factors and underlying factors, together with recommendations to prevent a repeat of the safety-critical incident on the Cambrian Line. The System for Investigation of Railway Interfaces (SIRI) is the methodology used to model and analyse the safety-critical incident. The SIRI methodology uses the Swiss Cheese Model to model the incident and to identify latent failure conditions (potentially less-than-adequate conditions) by means of the Management Oversight and Risk Tree technique. The benefits of the SIRI methodology are threefold. First, it incorporates the "heuristics and biases" approach in the Management Oversight and Risk Tree technique to identify systematic errors: civil engineering and programme management railway professionals are aware of the role "optimism bias" plays in programme cost overruns and of the bow-tie (fault and event tree) model-based safety risk modelling technique, but the role of systematic errors due to heuristics and biases is not yet appreciated, and this approach overcomes the omission of human and organisational factors from accident analysis. Second, the scope of the investigation includes all levels of the socio-technical system, including government, regulatory and railway safety bodies, duty holders, signalling firms, transport planners, and front-line staff, so that lessons are learned at the decision-making and implementation levels as well. Third, the author's past accident case studies are supplemented with evidence drawn from practitioners' and academic researchers' publications, in order to discuss the role of systems thinking in improving decision-making and risk management processes and practices in the IEC 15288 Systems Engineering standard and in industrial contexts such as the GB railways and Artificial Intelligence (AI).
Keywords: Accident analysis, AI algorithm internal audit, bounded rationality, Byzantine failures, heuristics and biases approach.
175 Word Stemming Algorithms and Retrieval Effectiveness in Malay and Arabic Documents Retrieval Systems
Authors: Tengku Mohd T. Sembok
Abstract:
Document retrieval in Information Retrieval Systems (IRS) is generally about understanding the information in the documents concerned. The better the system is able to understand the contents of the documents, the more effective the retrieval outcomes will be. But understanding the contents is a very complex task. Conventional IRS apply algorithms that can only approximate the meaning of document contents through a keyword approach using the vector space model. Keywords may be unstemmed or stemmed. When keywords are stemmed and conflated in the retrieval process, we move a step forward in applying semantic technology in IRS. Word stemming is a process in morphological analysis under natural language processing, performed before syntactic and semantic analysis. We have developed stemming algorithms for Malay and Arabic and incorporated stemming in our experimental systems in order to measure retrieval effectiveness. The results show that retrieval effectiveness increased when stemming was used in the systems.
Keywords: Information Retrieval, Natural Language Processing, Artificial Intelligence.
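A toy sketch of why stemming helps vector-space retrieval, as discussed above: conflating stemmed keywords lets a query match morphological variants of document terms. The tiny English suffix-stripping "stemmer" below is purely illustrative and is not the Malay or Arabic algorithm developed by the author.

import math
from collections import Counter

def toy_stem(word):
    # Illustrative suffix stripping only; real Malay/Arabic stemmers are far richer.
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[:-len(suffix)]
    return word

def vectorize(text, stem=False):
    tokens = text.lower().split()
    if stem:
        tokens = [toy_stem(t) for t in tokens]
    return Counter(tokens)

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

doc = "retrieving documents requires stemming of words"
query = "document retrieval word stems"
print("unstemmed similarity:", round(cosine(vectorize(query), vectorize(doc)), 3))
print("stemmed similarity:  ", round(cosine(vectorize(query, True), vectorize(doc, True)), 3))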
174 Minimizing Mutant Sets by Equivalence and Subsumption
Authors: Samia Alblwi, Amani Ayad
Abstract:
Mutation testing is the art of generating syntactic variations of a base program and checking whether a candidate test suite can identify all the mutants that are not semantically equivalent to the base; this technique can be used to assess the quality of a test suite. One of the main obstacles to the widespread use of mutation testing is cost, as even small programs (a few dozen lines of code) can give rise to a large number of mutants (up to hundreds); this has created an incentive to reduce the number of mutants while preserving their collective effectiveness. Two criteria have been used to reduce the size of mutant sets: equivalence, which aims to partition the set of mutants into equivalence classes modulo semantic equivalence and to select one representative per class; and subsumption, which aims to define a partial ordering among mutants that ranks them by effectiveness and to select maximal elements in this ordering. In this paper, we analyze these two policies using analytical and empirical criteria.
Keywords: Mutation testing, mutant sets, mutant equivalence, mutant subsumption, mutant set minimization.
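A small sketch of the two reduction policies discussed above, using kill matrices (which tests kill which mutants) as the usual empirical approximation of semantic behaviour; the mutants and kill sets below are invented.

# Toy kill matrix: mutant -> set of tests that kill it (empirical approximation).
kills = {
    "m1": frozenset({"t1", "t2"}),
    "m2": frozenset({"t1", "t2"}),   # same kill set as m1 -> grouped as (empirically) equivalent
    "m3": frozenset({"t1"}),         # harder to kill: every test killing m3 also kills m1, m2
    "m4": frozenset(),               # killed by no test -> candidate equivalent mutant
}

def equivalence_classes(kills):
    # Group mutants with identical kill sets, keeping one representative per class.
    classes = {}
    for m, ks in kills.items():
        classes.setdefault(ks, []).append(m)
    return list(classes.values())

def subsumes(a, b, kills):
    # a subsumes b if a is killable and every test that kills a also kills b.
    return bool(kills[a]) and kills[a] <= kills[b] and a != b

def subsuming_representatives(kills):
    live = [m for m in kills if kills[m]]   # drop mutants that look equivalent to the base
    keep = []
    for m in live:
        strictly_subsumed = any(subsumes(o, m, kills) and not subsumes(m, o, kills) for o in live)
        if not strictly_subsumed:
            keep.append(m)                  # maximal elements of the subsumption ordering
    return keep

print("equivalence classes:", equivalence_classes(kills))
print("subsuming mutants:  ", subsuming_representatives(kills))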
173 Some New Bounds for a Real Power of the Normalized Laplacian Eigenvalues
Authors: Ayşe Dilek Maden
Abstract:
For a given simple connected graph, we present some new bounds, via a new approach, for a special topological index given by the sum of a real power of the non-zero normalized Laplacian eigenvalues. This approach has the advantage not only of deriving old and new bounds on this topic, but also of giving an idea of how some previous results in similar areas can be developed.
Keywords: Degree Kirchhoff index, normalized Laplacian eigenvalue, spanning tree.
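For reference, a sketch of the standard definitions behind the index being bounded (not the paper's new bounds): the normalized Laplacian of a simple connected graph G on n vertices with m edges, the sum of a real power α of its non-zero eigenvalues, and the α = −1 special case related to the degree Kirchhoff index mentioned in the keywords. The notation s_α(G) is an assumption made here for presentation.

\[
\mathcal{L} = D^{-1/2}(D - A)D^{-1/2}, \qquad 0 = \mu_n < \mu_{n-1} \le \dots \le \mu_1 \le 2,
\]
\[
s_\alpha(G) = \sum_{i=1}^{n-1} \mu_i^{\alpha} \quad (\alpha \in \mathbb{R},\ \alpha \ne 0),
\qquad Kf^{*}(G) = 2m\, s_{-1}(G).
\]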
172 Integrating Context Priors into a Decision Tree Classification Scheme
Authors: Kasim Terzic, Bernd Neumann
Abstract:
Scene interpretation systems need to match (often ambiguous) low-level input data to concepts from a high-level ontology. In many domains, these decisions are uncertain and benefit greatly from proper context. This paper demonstrates the use of decision trees for estimating class probabilities for regions described by feature vectors, and shows how context can be introduced in order to improve the matching performance.
Keywords: Classification, Decision Trees, Interpretation, Vision.
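A sketch of one common way to inject a context prior into a decision tree's class-probability estimates: rescale the tree's posterior by the ratio of the context prior to the training prior and renormalize. This Bayes-style correction only illustrates the general idea and is not necessarily the scheme used in the paper; the dataset and the context prior below are invented.

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=1000, n_features=8, n_classes=3,
                           n_informative=5, random_state=0)
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X, y)

train_prior = np.bincount(y) / len(y)        # class frequencies seen during training
context_prior = np.array([0.7, 0.2, 0.1])    # assumed priors supplied by scene context

p = tree.predict_proba(X[:5])                # per-region class probabilities from the tree
# Reweight by how much the context shifts each class relative to training, then renormalize.
p_ctx = p * (context_prior / train_prior)
p_ctx /= p_ctx.sum(axis=1, keepdims=True)
print(np.round(p_ctx, 3))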
171 PIELG: A Protein Interaction Extraction System Using a Link Grammar Parser from Biomedical Abstracts
Authors: Rania A. Abul Seoud, Nahed H. Solouma, Abou-Baker M. Youssef, Yasser M. Kadah
Abstract:
Due to the ever-growing number of publications about protein-protein interactions, information extraction from text is increasingly recognized as one of the crucial technologies in bioinformatics. This paper presents a Protein Interaction Extraction system using a Link Grammar Parser from biomedical abstracts (PIELG). PIELG uses the linkages given by the Link Grammar Parser to drive a case-based analysis of the contents of various syntactic roles, as well as their linguistically significant and meaningful combinations. The system uses phrasal-prepositional verb patterns to overcome problems with preposition combinations. The recall and precision are 74.4% and 62.65%, respectively. Experimental comparisons with two other state-of-the-art extraction systems indicate that PIELG achieves better performance. For further evaluation, the system is augmented with a graphical package (Cytoscape) for extracting protein interaction information from sequence databases. The results show that the performance is remarkably promising.
Keywords: Link Grammar Parser, Interaction extraction, protein-protein interaction, Natural language processing.
170 A New Distribution Network Reconfiguration Approach using a Tree Model
Authors: E. Dolatdar, S. Soleymani, B. Mozafari
Abstract:
Power loss reduction is one of the main targets in the power industry, and so in this paper the problem of finding the optimal configuration of a radial distribution system for loss reduction is considered. Optimal reconfiguration involves the selection of the best set of branches to be opened, one from each loop, for reducing resistive line losses and relieving overloads on feeders by shifting load to adjacent feeders. However, since there are many candidate switching combinations in the system, feeder reconfiguration is a complicated problem. In this paper a new approach is proposed, based on a simple optimum loss calculation that determines optimal trees of the given network. From graph theory, a distribution network can be represented by a graph consisting of a set of nodes and branches; the problem can then be viewed as one of determining an optimal tree of the graph which simultaneously ensures the radial structure of each candidate topology. In this method a refined genetic algorithm is also set up, and some improvements to the algorithm are made in the chromosome coding. An implementation of the algorithm presented in [7], with modifications to the load flow program, is also applied and compared with the proposed method. In [7], an algorithm is proposed in which the choice of the switches to be opened is based on simple heuristic rules; it reduces the number of load flow runs, reduces the switching combinations to a smaller number, and gives the optimum solution. To demonstrate the validity of these methods, computer simulations with the PSAT and MATLAB programs are carried out on the 33-bus test system. The results show that the performance of the proposed method is better than that of the method in [7] and of other methods.
Keywords: Distribution System, Reconfiguration, Loss Reduction, Graph Theory, Optimization, Genetic Algorithm.
169 Prime Cordial Labeling on Graphs
Authors: S. Babitha, J. Baskar Babujee
Abstract:
A prime cordial labeling of a graph G with vertex set V is a bijection f from V to {1, 2, ..., |V|} such that, when each edge uv is assigned the label 1 if gcd(f(u), f(v)) = 1 and 0 if gcd(f(u), f(v)) > 1, the number of edges labeled with 0 and the number of edges labeled with 1 differ by at most 1. In this paper we exhibit some characterization results and new constructions of prime cordial graphs.
Keywords: Prime cordial, tree, Euler, bijective, function.
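A small checker for the definition quoted above, written as a Python sketch; the path graph P4 and its candidate labeling are a toy example.

from math import gcd

def is_prime_cordial(vertices, edges, f):
    """Check whether the bijection f: V -> {1..|V|} is a prime cordial labeling."""
    assert sorted(f[v] for v in vertices) == list(range(1, len(vertices) + 1))
    labels = [1 if gcd(f[u], f[v]) == 1 else 0 for u, v in edges]
    return abs(labels.count(0) - labels.count(1)) <= 1

# Toy example: the path P4 with a candidate labeling.
V = ["a", "b", "c", "d"]
E = [("a", "b"), ("b", "c"), ("c", "d")]
f = {"a": 1, "b": 2, "c": 4, "d": 3}
print(is_prime_cordial(V, E, f))   # edge labels 1, 0, 1 -> difference is 1 -> True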
168 Comparing and Combining the Axial with the Network Maps for Analyzing Urban Street Pattern
Authors: Nophaket Napong
Abstract:
Rooted in the study of the social functioning of space in architecture, Space Syntax (SS) and the more recent Network Pattern (NP) research demonstrate the 'spatial structures' of the city, i.e. the hierarchical patterns of streets, junctions and alley ends. Applying SS and NP models, planners can conceptualize the real city's patterns. Although both models yield the optimal paths of the city, their underlying displays of the city's spatial configuration differ: the Axial Map analyzes the topological, non-distance-based connectivity structure, whereas the Central-Node Map and the Shortcut-Path Map analyze the metrical, distance-based structures. This research contrasts and combines them to understand various forms of the city's structures. It concludes that, while they reveal different spatial structures, the Space Syntax and Network Pattern urban models support each other. Combined, they simulate the global access structure and the locally compact structures, namely the central nodes and the shortcuts of the city.
Keywords: Street pattern, space syntax, syntactic and metrical models, network pattern models.
167 Load Forecasting in Microgrid Systems with R and Cortana Intelligence Suite
Authors: F. Lazzeri, I. Reiter
Abstract:
Energy production optimization has traditionally been very important for utilities in order to improve resource consumption. However, load forecasting is a challenging task, as there are a large number of relevant variables that must be considered, and several strategies have been used to deal with this complex problem. This is especially true in microgrids, where many elements have to adjust their performance depending on future generation and consumption conditions. The goal of this paper is to present a solution for short-term load forecasting in microgrids, based on three machine learning experiments developed in R and web services built and deployed with different components of Cortana Intelligence Suite: Azure Machine Learning, a fully managed cloud service that enables users to easily build, deploy, and share predictive analytics solutions; SQL Database, a Microsoft database service for app developers; and Power BI, a suite of business analytics tools to analyze data and share insights. Our results show that Boosted Decision Tree and Fast Forest Quantile regression methods can be very useful for predicting hourly short-term consumption in microgrids; moreover, we found that for these types of forecasting models, weather data (temperature, wind, humidity and dew point) can play a crucial role in improving the accuracy of the forecasting solution. Data cleaning and feature engineering methods performed in R and different types of machine learning algorithms (Boosted Decision Tree, Fast Forest Quantile and ARIMA) are presented, and results and performance metrics discussed.
Keywords: Time-series, features engineering methods for forecasting, energy demand forecasting, Azure machine learning.
166 Calculation of a Sustainable Quota Harvesting of Long-Tailed Macaque (Macaca fascicularis Raffles) in Their Natural Habitats
Authors: Y. Santosa, D. A. Rahman, C. Wulan, A. H. Mustari
Abstract:
The global demand for long-tailed macaques for medical experimentation has continued to increase. Fulfillment of Indonesian export demand has been mostly from natural habitats, based on a harvesting quota. This quota has been determined according to the total catch for a given year, and not based on consideration of any demographic parameters or physical environmental factors relating to the animal, hence threatening the sustainability of the various populations. It is therefore necessary to formulate a method for calculating a sustainable harvesting quota based on population parameters in natural habitats. Considering the possibility of variations in habitat characteristics and population parameters, a time series observation of demographic and physical/biotic parameters, in various habitats, was performed on 13 groups of long-tailed macaques distributed throughout the West Java, Lampung and Yogyakarta areas of Indonesia. These provinces were selected to compare the influence of human/tourism activities. The population parameters collected included life expectancy by age class, numbers of individuals by sex and age class, and the ratio of infants to reproductive females. The estimation of population growth was based on a population dynamics growth model, the Leslie matrix. The harvesting quota was calculated as the difference between the actual population size and the MVP (minimum viable population) for each sex and age class. Observation indicated that there were variations in group size (24–106 individuals), sex ratio (1:1 to 1:1.3), life expectancy value (0.30 to 0.93), and ratio of infants to reproductive females (0.23 to 1.56). Results of subsequent calculations showed that sustainable harvesting quotas for each studied group of long-tailed macaques ranged from 29 to 110 individuals. An estimation model of the MVP for each age class was formulated as Log Y = 0.315 + 0.884 Log Ni (number of individuals in the ith age class). This study also found that life expectancy for the juvenile age class was affected by the humidity under tree stands and by dietary plant density at the sapling, pole and tree stages (equation: Y = 2.296 – 1.535 RH + 0.002 Kpcg – 0.002 Ktg – 0.001 Kphn, R2 = 89.6%, significance 0.001). By contrast, for the sub-adult/adult age class, life expectancy was significantly affected by slope (equation: Y = 0.377 = 0.012 Kml, R2 = 50.4%, significance 0.007). The infant-to-reproductive-female ratio was affected by the humidity under tree stands and by dietary plant density at the sapling and pole stages (equation: Y = –1.432 + 2.172 RH – 0.004 Kpcg + 0.003 Ktg, R2 = 82.0%, significance 0.001). This research confirmed the importance of population parameters in determining the minimum viable population, and that MVP varied according to habitat characteristics (especially food availability). It would therefore be difficult to formulate a general mathematical equation model for determining a harvesting quota for the species as a whole.
Keywords: Harvesting, long-tailed macaque, population, quota.
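A minimal sketch of the Leslie-matrix projection underlying the population growth estimation described above, assuming NumPy; the age classes, fecundity and survival values, and initial counts are invented for illustration and are not the field estimates reported in the study.

import numpy as np

# Age classes: infant, juvenile, sub-adult, adult (illustrative values only).
fecundity = [0.0, 0.0, 0.3, 0.6]      # female offspring per female per time step
survival = [0.6, 0.7, 0.8]            # probability of surviving to the next class

L = np.zeros((4, 4))
L[0, :] = fecundity                    # top row: reproduction into the infant class
L[1, 0], L[2, 1], L[3, 2] = survival   # sub-diagonal: survival transitions
L[3, 3] = 0.85                         # adults may remain in the adult class

n0 = np.array([30.0, 25.0, 20.0, 25.0])     # current counts per age class
n5 = np.linalg.matrix_power(L, 5) @ n0      # projected age structure after 5 steps
growth_rate = max(abs(np.linalg.eigvals(L)))  # dominant eigenvalue ~ long-run growth

print(np.round(n5, 1), round(float(growth_rate), 3))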
165 Ontology Population via NLP Techniques in Risk Management
Authors: Jawad Makki, Anne-Marie Alquier, Violaine Prince
Abstract:
In this paper we propose an NLP-based method for Ontology Population from texts and apply it to semi-automatically instantiate a Generic Knowledge Base (Generic Domain Ontology) in the risk management domain. The approach is semi-automatic and uses domain expert intervention for validation. The proposed approach relies on a set of Instance Recognition Rules based on syntactic structures, and on the predicative power of verbs in the instantiation process. It is not domain dependent, since it relies heavily on linguistic knowledge. A description of an experiment performed on a part of the ontology of the PRIMA project (supported by the European Community) is given. A first validation of the method is done by populating this ontology with Chemical Fact Sheets from the Environmental Protection Agency. The results of this experiment complete the paper and support the hypothesis that relying on the predicative power of verbs in the instantiation process improves the performance.
Keywords: Information Extraction, Instance Recognition Rules, Ontology Population, Risk Management, Semantic analysis.
164 Rolling Element Bearing Diagnosis by Improved Envelope Spectrum: Optimal Frequency Band Selection
Authors: Juan David Arango, Alejandro Restrepo-Martinez
Abstract:
Rolling Element Bearing (REB) vibration diagnosis is of special interest because of the variety of REBs and the widespread need for these elements in industrial applications. The presence of a localized fault in a REB gives rise to a vibrational response characterized by the modulation of a carrier signal. The frequency content of the carrier signal (spectral frequency, f) is mainly related to the resonance frequencies of the REB. This carrier signal is modulated by another signal governed by the periodicity of the fault impact (cyclic frequency, α). In this sense, the REB fault vibration response gives rise to a second-order cyclostationary signal. Second-order cyclostationary signals can be represented in a bi-spectral map, where the Spectral Coherence (SCoh) is plotted against f and α. The Improved Envelope Spectrum (IES) is a useful approach for REB fault diagnosis; it is obtained by integrating the SCoh over a predefined bandwidth on the f axis. Approaches to selecting the f-bandwidth have recently been proposed through the definition of a metric which evaluates the magnitude of the IES at the fault characteristic frequencies. This metric is represented in a 1/3-binary tree as a function of the frequency bandwidth and centre, and based on this binary tree the optimal frequency band is selected. However, advantages have been observed when the metric is changed, which tends to dictate a different optimal f-bandwidth and so improve the IES representation. This paper evaluates the behaviour of the IES obtained from a different metric optimization. This metric is based on the sample correlation coefficient, detecting high peaks at the selected frequencies while penalizing high peaks in the neighbourhood of the selected frequencies. Prior results indicate an improvement in the signal-to-noise ratio (SNR) in around 86% of the samples analysed, which belong to the IMS database.
Keywords: Sample Correlation IESFOgram, cyclostationary analysis, improved envelope spectrum, IES, rolling element bearing diagnosis, spectral coherence.
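A sketch of the band-integration step behind the IES described above: given a spectral-coherence map SCoh(α, f), averaging |SCoh| over a chosen spectral band yields one envelope-spectrum value per cyclic frequency. The SCoh map below is synthetic, and estimating it from a raw vibration signal (and the 1/3-binary-tree band search) is not shown.

import numpy as np

def improved_envelope_spectrum(scoh, f_axis, f_low, f_high):
    """Average |SCoh(alpha, f)| over the band [f_low, f_high] of the spectral axis."""
    band = (f_axis >= f_low) & (f_axis <= f_high)
    return np.abs(scoh[:, band]).mean(axis=1)   # one value per cyclic frequency alpha

# Synthetic SCoh map: cyclic frequencies 0-500 Hz, spectral frequencies 0-10 kHz.
alpha = np.linspace(0, 500, 501)
f = np.linspace(0, 10_000, 1024)
rng = np.random.default_rng(0)
scoh = 0.05 * rng.random((alpha.size, f.size))

i = np.argmin(np.abs(alpha - 107))               # assumed fault cyclic frequency ~107 Hz
scoh[i, (f > 3000) & (f < 4500)] += 0.8          # energy concentrated in a resonance band

ies = improved_envelope_spectrum(scoh, f, 3000, 4500)
print("peak cyclic frequency:", alpha[np.argmax(ies)])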
163 Distributed Data-Mining by Probability-Based Patterns
Authors: M. Kargar, F. Gharbalchi
Abstract:
In this paper a new method is suggested for distributed data mining using probability patterns. These patterns use decision trees and decision graphs, and are intended to be valid, novel, useful, and understandable. Considering a set of functions, the system reaches a good pattern or better objectives. Using the suggested method, we are able to extract useful information from massive and multi-relational databases.
Keywords: Data-mining, Decision tree, Decision graph, Pattern, Relationship.
162 Project Selection by Using Fuzzy AHP and TOPSIS Technique
Authors: S. Mahmoodzadeh, J. Shahrabi, M. Pariazar, M. S. Zaeri
Abstract:
In this article, by using the fuzzy AHP and TOPSIS techniques, we propose a new method for the project selection problem. After reviewing four common methods of comparing investment alternatives (net present value, rate of return, benefit–cost analysis and payback period), we use them as criteria in the AHP tree. In this methodology, by utilizing the Analytical Hierarchy Process improved with fuzzy set theory, we first calculate the weight of each criterion. Then, by implementing the TOPSIS algorithm, the projects are assessed. The obtained results are tested in a numerical example.
Keywords: Fuzzy AHP, Project Selection, TOPSIS Technique.
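A sketch of the TOPSIS assessment step described above, with the criterion weights assumed to come from the fuzzy AHP stage; the decision matrix, weights and benefit/cost designations are invented for illustration.

import numpy as np

def topsis(matrix, weights, benefit):
    """Rank alternatives (rows) against criteria (columns) by closeness to the ideal solution."""
    norm = matrix / np.sqrt((matrix ** 2).sum(axis=0))   # vector normalization per criterion
    v = norm * weights                                   # weighted normalized matrix
    ideal = np.where(benefit, v.max(axis=0), v.min(axis=0))
    anti = np.where(benefit, v.min(axis=0), v.max(axis=0))
    d_plus = np.sqrt(((v - ideal) ** 2).sum(axis=1))
    d_minus = np.sqrt(((v - anti) ** 2).sum(axis=1))
    return d_minus / (d_plus + d_minus)                  # closeness coefficient per project

# Criteria: NPV, rate of return (benefit), payback period (cost), benefit-cost ratio (benefit).
projects = np.array([[120., 0.15, 4.0, 1.3],
                     [ 95., 0.22, 3.0, 1.1],
                     [150., 0.12, 6.0, 1.6]])
weights = np.array([0.35, 0.25, 0.15, 0.25])   # assumed output of the fuzzy AHP stage
benefit = np.array([True, True, False, True])
scores = topsis(projects, weights, benefit)
print("ranking (best first):", np.argsort(-scores))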
161 Interoperability in Component Based Software Development
Authors: M. Madiajagan, B. Vijayakumar
Abstract:
Interoperability is the ability of information systems to operate in conjunction with each other, encompassing communication protocols, hardware, software, application, and data compatibility layers. There has been considerable work in industry on the development of component interoperability models, such as CORBA, (D)COM and JavaBeans. These models are intended to reduce the complexity of software development and to facilitate reuse of off-the-shelf components. The focus of these models is syntactic interface specification, component packaging, inter-component communications, and bindings to a runtime environment. What these models lack is a consideration of architectural concerns – specifying systems of communicating components, explicitly representing loci of component interaction, and exploiting architectural styles that provide well-understood global design solutions. The development of complex business applications is now focused on the assembly of components available on a local area network or on the net. These components must be located and identified in terms of available services and communication protocols before any request. The first part of the article introduces the basic concepts of components and middleware, while the following sections describe the different up-to-date models of communication and interaction, and the last section shows how the different models can communicate among themselves.
Keywords: Interoperability, component packaging, communication technology, heterogeneous platform, component interface, middleware.
160 The Diameter of an Interval Graph is Twice of its Radius
Authors: Tarasankar Pramanik, Sukumar Mondal, Madhumangal Pal
Abstract:
In an interval graph G = (V,E) the distance between two vertices u, v is defined as the smallest number of edges in a path joining u and v. The eccentricity of a vertex v is the maximum among the distances from v to all other vertices of V. The diameter (δ) and radius (ρ) of the graph G are respectively the maximum and minimum among all the eccentricities of G. The center of the graph G is the set C(G) of vertices with eccentricity ρ. In this context our aim is to establish the relation ρ = δ/2 for an interval graph and to determine its center.
Keywords: Interval graph, interval tree, radius, center.
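A BFS-based sketch of the quantities defined above (eccentricity, diameter δ, radius ρ, center) for a small interval graph built from a list of intervals; it merely checks the stated relation on a toy instance and is not a proof.

from collections import deque

def interval_graph(intervals):
    # Vertices are interval indices; an edge joins two intervals that overlap.
    n = len(intervals)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            (a, b), (c, d) = intervals[i], intervals[j]
            if a <= d and c <= b:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def eccentricities(adj):
    ecc = {}
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:                      # breadth-first search from s
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    q.append(w)
        ecc[s] = max(dist.values())
    return ecc

intervals = [(0, 2), (1, 4), (3, 6), (5, 8), (7, 9)]
ecc = eccentricities(interval_graph(intervals))
diameter, radius = max(ecc.values()), min(ecc.values())
center = [v for v, e in ecc.items() if e == radius]
print(diameter, radius, center)       # here: 4 2 [2], i.e. rho = delta / 2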
159 Mining Sequential Patterns Using I-PrefixSpan
Authors: Dhany Saputra, Dayang R. A. Rambli, Oi Mean Foong
Abstract:
In this paper, we propose an improvement of the pattern-growth-based PrefixSpan algorithm, called I-PrefixSpan. The general idea of I-PrefixSpan is to use a sufficient data structure for the Seq-Tree framework and a separator database to reduce execution time and memory usage. Thus, with I-PrefixSpan there is no in-memory database stored after the index set is constructed. The experimental results show that, using Java 2, this method improves the speed of PrefixSpan by up to almost two orders of magnitude and the memory usage by more than one order of magnitude.
Keywords: ArrayList, ArrayIntList, minimum support, sequence database, sequential patterns.
158 Using Interval Trees for Approximate Indexing of Instances
Authors: Khalil el Hindi
Abstract:
This paper presents a simple and effective method for approximate indexing of instances for instance-based learning. The method uses an interval tree to determine a good starting search point for the nearest neighbor. The search stops when an early stopping criterion is met. The method proved to be very effective, especially when only the first nearest neighbor is required.
Keywords: Instance based learning, interval trees, the knn algorithm, machine learning.
157 A Knowledge Engineering Workshop: Application for Choise Car
Authors: Touahria Mohamed, Khababa Abdallah, Frécon Louis
Abstract:
This paper proposes a declarative language for knowledge representation (Ibn Rochd) and its exploitation environment (DeGSE). The DeGSE system was designed and developed to facilitate writing Ibn Rochd applications. The system was tested on several knowledge bases of ascending complexity, culminating in a system for recognizing a plant or a tree, and in advisors for purchasing a car, for pedagogical and academic guidance, and for bank savings and credit. Finally, the limits of the language and research perspectives are stated.
Keywords: Knowledge representation, declarative language, Ibn Rochd, DeGSE, facets, cognitive approach.
156 Tibyan Automated Arabic Correction Using Machine-Learning in Detecting Syntactical Mistakes
Authors: Ashwag O. Maghraby, Nida N. Khan, Hosnia A. Ahmed, Ghufran N. Brohi, Hind F. Assouli, Jawaher S. Melibari
Abstract:
The Arabic language is one of the most important languages; learning it matters to many people around the world because of its religious and economic importance, and the real challenge lies in practicing it without grammatical or syntactic mistakes. This research focuses on detecting and correcting syntactic mistakes in Arabic according to their position in the sentence, concentrating on two of the main syntactic rules in Arabic: the dual and the plural. It analyzes each sentence in the text using the Stanford CoreNLP morphological analyzer and a machine-learning approach in order to detect syntactic mistakes and then correct them. A prototype of the proposed system was implemented and evaluated. It uses the support vector machine (SVM) algorithm to detect Arabic grammatical errors and corrects them using a rule-based approach. The prototype system has a fair accuracy of 81%. In general, it offers a set of useful grammatical suggestions for mistakes the user may make while writing, due to lack of familiarity with grammar or the speed of writing, such as alerting the user when a plural term is used to refer to one person.
Keywords: Arabic Language acquisition and learning, natural language processing, morphological analyzer, part-of-speech.
155 Clustering Categorical Data Using Hierarchies (CLUCDUH)
Authors: Gökhan Silahtaroğlu
Abstract:
Clustering large populations is an important problem when the data contain noise and clusters of different shapes. A good clustering algorithm or approach should be efficient enough to detect clusters sensitively. Besides space complexity, time complexity also gains importance as the size grows. Using hierarchies, we developed a new algorithm that splits attributes according to the values they take, choosing the dimension for splitting so as to divide the database into roughly equal parts as far as possible. At each node we calculate certain descriptive statistical features of the data that reside there, and by pruning we generate the natural clusters with a complexity of O(n).
Keywords: Clustering, tree, split, pruning, entropy, gini.
154 Extraction of Semantic Digital Signatures from MRI Photos for Image-Identification Purposes
Authors: Marios Poulos, George Bokos
Abstract:
This paper makes an attempt to solve the problem of searching for and retrieving similar MRI photos via Internet services using morphological features sourced from the original image, and is intended as an additional tool for search and retrieval methods. Until now, the main searching mechanism has been syntactic, based on keywords. The technique proposed here aims to serve the new requirements of libraries, one of which is the development of computational tools for the control and preservation of the intellectual property of digital objects, and especially of digital images. For this purpose, this paper proposes the use of a serial number extracted using a previously tested semantic-properties method. This method, centred on the multi-layers of a set of arithmetic points, assures two properties: the uniqueness of the final extracted number and the semantic dependence of this number on the image used as the method's input. The major advantage of this method is that it can control the authentication of a published image, or its partial modification, to a reliable degree. It also improves on the known hash functions that digital signature schemes use, producing alphanumeric strings for authentication checking and for measuring the degree of similarity between an unknown image and an original image.
Keywords: Computational Geometry, MRI photos, Image processing, pattern Recognition.
153 Tool for Fast Detection of Java Code Snippets
Authors: Tomáš Bublík, Miroslav Virius
Abstract:
This paper presents general results on the Java source code snippet detection problem. We propose a tool that uses graph and subgraph isomorphism detection. A number of solutions for these tasks have been proposed in the literature; however, although these solutions are really fast, they compare only constant, static trees. Our solution allows an input sample to be entered dynamically with the Scripthon language while preserving an acceptable speed. We used several optimizations to achieve a very low number of comparisons during the matching algorithm.
Keywords: AST, Java, tree matching, Scripthon, source code recognition
152 An Energy Efficient Algorithm for Distributed Mutual Exclusion in Mobile Ad-hoc Networks
Authors: Sayani Sil, Sukanta Das
Abstract:
This paper reports a distributed mutual exclusion algorithm for mobile ad-hoc networks. The network is clustered hierarchically. The proposed algorithm treats the clustered network as a logical tree and develops a token-passing scheme to achieve mutual exclusion. The performance analysis and simulation results show that its message requirement is optimal, and thus the algorithm is energy efficient.
Keywords: Critical section, Distributed mutual exclusion, Mobile Ad-hoc network, Token-based algorithms.
151 N-Sun Decomposition of Complete, Complete Bipartite and Some Harary Graphs
Authors: R. Anitha, R. S. Lekshmi
Abstract:
Graph decompositions are vital in the study of combinatorial design theory. A decomposition of a graph G is a partition of its edge set. An n-sun graph is a cycle Cn with an edge terminating in a vertex of degree one attached to each vertex. In this paper, we define the n-sun decomposition of some even-order graphs with a perfect matching. We prove that the complete graph K2n, the complete bipartite graph K2n,2n and the Harary graph H4,2n have n-sun decompositions. A labeling scheme is used to construct the n-suns.
Keywords: Decomposition, Hamilton cycle, n-sun graph, perfect matching, spanning tree.
150 Detecting Email Forgery using Random Forests and Naïve Bayes Classifiers
Authors: Emad E Abdallah, A.F. Otoom, ArwaSaqer, Ola Abu-Aisheh, Diana Omari, Ghadeer Salem
Abstract:
As email communications have no consistent authentication procedure to ensure authenticity, we present an investigative analysis approach for detecting forged emails based on Random Forests and Naïve Bayes classifiers. Instead of investigating the email headers, we use the body content to extract a unique writing style for each of the possible suspects. Our approach consists of four main steps: (1) the cybercrime investigator extracts different effective features, including structural, lexical, linguistic, and syntactic evidence, from previous emails of all the possible suspects; (2) the extracted feature vectors are normalized to increase the accuracy rate; (3) the normalized features are then used to train the learning engine; (4) upon receiving the anonymous email M, we apply the feature extraction process to produce a feature vector. Finally, using the machine learning classifiers, the email is assigned to the suspect whose writing style most closely matches M. Experimental results on real data sets show the improved performance of the proposed method and its ability to identify the authors with a very limited number of features.
Keywords: Digital investigation, cybercrimes, email forensics, anonymous emails, writing style, authorship analysis.
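A sketch of the classification step described above, assuming scikit-learn and using character n-gram frequencies as a crude stand-in for the structural, lexical, linguistic and syntactic features the paper extracts; the suspects and emails below are toy strings.

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy corpus of previous emails, labelled by suspect (stand-in for the real evidence set).
emails = ["Hey, send me the report asap, thanks!!",
          "Kindly find attached the requested report. Regards,",
          "yo, report's attached. catch u later",
          "Please review the attached document at your convenience. Best regards,"]
authors = ["A", "B", "A", "B"]

anonymous = "kindly review the attached file. regards,"

for clf in (RandomForestClassifier(n_estimators=100, random_state=0), MultinomialNB()):
    # Character n-grams capture punctuation and spelling habits, a rough proxy for writing style.
    model = make_pipeline(TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)), clf)
    model.fit(emails, authors)
    print(type(clf).__name__, "->", model.predict([anonymous])[0])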
149 Approximately Similarity Measurement of Web Sites Using Genetic Algorithms and Binary Trees
Authors: Doru Anastasiu Popescu, Dan Rădulescu
Abstract:
In this paper, we determine the similarity of two HTML web applications. We use a genetic algorithm to determine the most significant web pages of each application (we do not use every web page of a site). Using these significant web pages, we find the similarity value between the two applications. The algorithm is efficient because it uses a reduced number of web pages for the comparisons, but it returns an approximate value of the similarity. Binary trees are used to keep the tags from the significant pages. The algorithm was implemented in the Java language.
Keywords: Tag, HTML, web page, genetic algorithm, similarity value, binary tree.