Search results for: incremental and interactive mining.
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 950

Search results for: incremental and interactive mining.

170 Tool for Analysing the Sensitivity and Tolerance of Mechatronic Systems in Matlab GUI

Authors: Bohuslava Juhasova, Martin Juhas, Renata Masarova, Zuzana Sutova

Abstract:

The article deals with the tool in Matlab GUI form that is designed to analyse a mechatronic system sensitivity and tolerance. In the analysed mechatronic system, a torque is transferred from the drive to the load through a coupling containing flexible elements. Different methods of control system design are used. The classic form of the feedback control is proposed using Naslin method, modulus optimum criterion and inverse dynamics method. The cascade form of the control is proposed based on combination of modulus optimum criterion and symmetric optimum criterion. The sensitivity is analysed on the basis of absolute and relative sensitivity of system function to the change of chosen parameter value of the mechatronic system, as well as the control subsystem. The tolerance is analysed in the form of determining the range of allowed relative changes of selected system parameters in the field of system stability. The tool allows to analyse an influence of torsion stiffness, torsion damping, inertia moments of the motor and the load and controller(s) parameters. The sensitivity and tolerance are monitored in terms of the impact of parameter change on the response in the form of system step response and system frequency-response logarithmic characteristics. The Symbolic Math Toolbox for expression of the final shape of analysed system functions was used. The sensitivity and tolerance are graphically represented as 2D graph of sensitivity or tolerance of the system function and 3D/2D static/interactive graph of step/frequency response.

Keywords: Mechatronic systems, Matlab GUI, sensitivity, tolerance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2027
169 Belt Conveyor Dynamics in Transient Operation for Speed Control

Authors: D. He, Y. Pang, G. Lodewijks

Abstract:

Belt conveyors play an important role in continuous dry bulk material transport, especially at the mining industry. Speed control is expected to reduce the energy consumption of belt conveyors. Transient operation is the operation of increasing or decreasing conveyor speed for speed control. According to literature review, current research rarely takes the conveyor dynamics in transient operation into account. However, in belt conveyor speed control, the conveyor dynamic behaviors are significantly important since the poor dynamics might result in risks. In this paper, the potential risks in transient operation will be analyzed. An existing finite element model will be applied to build a conveyor model, and simulations will be carried out to analyze the conveyor dynamics. In order to realize the soft speed regulation, Harrison’s sinusoid acceleration profile will be applied, and Lodewijks estimator will be built to approximate the required acceleration time. A long inclined belt conveyor will be studied with two major simulations. The conveyor dynamics will be given.

Keywords: Belt conveyor, speed control, transient operation, dynamics

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2290
168 Yield Prediction Using Support Vectors Based Under-Sampling in Semiconductor Process

Authors: Sae-Rom Pak, Seung Hwan Park, Jeong Ho Cho, Daewoong An, Cheong-Sool Park, Jun Seok Kim, Jun-Geol Baek

Abstract:

It is important to predict yield in semiconductor test process in order to increase yield. In this study, yield prediction means finding out defective die, wafer or lot effectively. Semiconductor test process consists of some test steps and each test includes various test items. In other world, test data has a big and complicated characteristic. It also is disproportionably distributed as the number of data belonging to FAIL class is extremely low. For yield prediction, general data mining techniques have a limitation without any data preprocessing due to eigen properties of test data. Therefore, this study proposes an under-sampling method using support vector machine (SVM) to eliminate an imbalanced characteristic. For evaluating a performance, randomly under-sampling method is compared with the proposed method using actual semiconductor test data. As a result, sampling method using SVM is effective in generating robust model for yield prediction.

Keywords: Yield Prediction, Semiconductor Test Process, Support Vector Machine, Under Sampling

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2363
167 A Comparison and Analysis of Name Matching Algorithms

Authors: Chakkrit Snae

Abstract:

Names are important in many societies, even in technologically oriented ones which use e.g. ID systems to identify individual people. Names such as surnames are the most important as they are used in many processes, such as identifying of people and genealogical research. On the other hand variation of names can be a major problem for the identification and search for people, e.g. web search or security reasons. Name matching presumes a-priori that the recorded name written in one alphabet reflects the phonetic identity of two samples or some transcription error in copying a previously recorded name. We add to this the lode that the two names imply the same person. This paper describes name variations and some basic description of various name matching algorithms developed to overcome name variation and to find reasonable variants of names which can be used to further increasing mismatches for record linkage and name search. The implementation contains algorithms for computing a range of fuzzy matching based on different types of algorithms, e.g. composite and hybrid methods and allowing us to test and measure algorithms for accuracy. NYSIIS, LIG2 and Phonex have been shown to perform well and provided sufficient flexibility to be included in the linkage/matching process for optimising name searching.

Keywords: Data mining, name matching algorithm, nominaldata, searching system.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 11043
166 Unsupervised Text Mining Approach to Early Warning System

Authors: Ichihan Tai, Bill Olson, Paul Blessner

Abstract:

Traditional early warning systems that alarm against crisis are generally based on structured or numerical data; therefore, a system that can make predictions based on unstructured textual data, an uncorrelated data source, is a great complement to the traditional early warning systems. The Chicago Board Options Exchange (CBOE) Volatility Index (VIX), commonly referred to as the fear index, measures the cost of insurance against market crash, and spikes in the event of crisis. In this study, news data is consumed for prediction of whether there will be a market-wide crisis by predicting the movement of the fear index, and the historical references to similar events are presented in an unsupervised manner. Topic modeling-based prediction and representation are made based on daily news data between 1990 and 2015 from The Wall Street Journal against VIX index data from CBOE.

Keywords: Early Warning System, Knowledge Management, Topic Modeling, Market Prediction.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1890
165 FCNN-MR: A Parallel Instance Selection Method Based on Fast Condensed Nearest Neighbor Rule

Authors: Lu Si, Jie Yu, Shasha Li, Jun Ma, Lei Luo, Qingbo Wu, Yongqi Ma, Zhengji Liu

Abstract:

Instance selection (IS) technique is used to reduce the data size to improve the performance of data mining methods. Recently, to process very large data set, several proposed methods divide the training set into some disjoint subsets and apply IS algorithms independently to each subset. In this paper, we analyze the limitation of these methods and give our viewpoint about how to divide and conquer in IS procedure. Then, based on fast condensed nearest neighbor (FCNN) rule, we propose a large data sets instance selection method with MapReduce framework. Besides ensuring the prediction accuracy and reduction rate, it has two desirable properties: First, it reduces the work load in the aggregation node; Second and most important, it produces the same result with the sequential version, which other parallel methods cannot achieve. We evaluate the performance of FCNN-MR on one small data set and two large data sets. The experimental results show that it is effective and practical.

Keywords: Instance selection, data reduction, MapReduce, kNN.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 981
164 Effects of Alternative Opportunities and Compensation on Turnover Intention of Singapore PMET

Authors: Han Guan Chew, Keith Yong Ngee Ng, Shan-Wei Fan

Abstract:

In Singapore, talent retention is one of the most persistent and real issue companies have to grapple with due to the tight labour market. Being resource-scarce, Singapore depends solely on its talented pool of high quality human resource to sustain its competitive advantage in the global economy. But the complex and multifaceted nature of turnover phenomenon makes the prescription of effective talent retention strategies in such a competitive labour market very challenging, especially when it comes to monetary incentives, companies struggle to answer the question of “How much is enough?” By examining the interactive effects of perceived alternative employment opportunities, annual salary and satisfaction with compensation on the turnover intention of 102 Singapore Professionals, Managers, Executives and Technicians (PMET) through correlation analyses and multiple regressions, important insights into the psyche of the Singapore talent pool can be drawn. It is found that annual salary influence turnover intention indirectly through mediation and moderation effects on PMET’s satisfaction on compensation. PMET are also found to be heavily swayed by better external opportunities. This implies that talent retention strategies should not adopt a purely monetary based blanket approach but rather a comprehensive and holistic one that considers the dynamics of prevailing market conditions.

Keywords: Employee Turnover, High Performers, Knowledge Workers, Perceived Alternative Employment Opportunities Salary, Satisfaction on Compensation, Singapore PMET, Talent Retention.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3497
163 A Pilot Study of Robot Reminiscence in Dementia Care

Authors: Ryuji Yamazaki, Masahiro Kochi, Weiran Zhu, Hiroko Kase

Abstract:

In care for older adults, behavioral and psychological symptoms of dementia (BPSD) like agitation and aggression are distressing for patients and their caretakers, often resulting in premature institutionalization with increased costs of care. To improve mood and mitigate symptoms, as a non-pharmaceutical approach, emotion-oriented therapy like reminiscence work is adopted in face-to-face communication. Telecommunication support is expected to be provided by robotic media as a bridge for digital divide for those with dementia and facilitate social interaction both verbally and nonverbally. The purpose of this case study is to explore the conditions in which robotic media can effectively attract attention from older adults with dementia and promote their well-being. As a pilot study, we introduced the pillow-phone Hugvie®, a huggable humanly shaped communication medium to five residents with dementia at a care facility, to investigate how the following conditions work for the elderly when they use the medium; 1) no sound, 2) radio, non-interactive, 3) daily conversation, and 4) reminiscence work. As a result, under condition 4, reminiscence work, the five participants kept concentration in interacting with the medium for a longer duration than other conditions. In condition 4, they also showed larger amount of utterances than under other conditions. These results indicate that providing topics related to personal histories through robotic media could affect communication positively and should, therefore, be further investigated. In addition, the issue of ethical implications by using persuasive technology that affects emotions and behaviors of older adults is also discussed.

Keywords: BPSD, reminiscence, tactile telecommunication, utterances.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1108
162 Mobile Augmented Reality for Collaboration in Operation

Authors: Chong-Yang Qiao

Abstract:

Mobile augmented reality (MAR) tracking targets from the surroundings and aids operators for interactive data and procedures visualization, potential equipment and system understandably. Operators remotely communicate and coordinate with each other for the continuous tasks, information and data exchange between control room and work-site. In the routine work, distributed control system (DCS) monitoring and work-site manipulation require operators interact in real-time manners. The critical question is the improvement of user experience in cooperative works through applying Augmented Reality in the traditional industrial field. The purpose of this exploratory study is to find the cognitive model for the multiple task performance by MAR. In particular, the focus will be on the comparison between different tasks and environment factors which influence information processing. Three experiments use interface and interaction design, the content of start-up, maintenance and stop embedded in the mobile application. With the evaluation criteria of time demands and human errors, and analysis of the mental process and the behavior action during the multiple tasks, heuristic evaluation was used to find the operators performance with different situation factors, and record the information processing in recognition, interpretation, judgment and reasoning. The research will find the functional properties of MAR and constrain the development of the cognitive model. Conclusions can be drawn that suggest MAR is easy to use and useful for operators in the remote collaborative works.

Keywords: Mobile augmented reality, remote collaboration, user experience, cognitive model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1314
161 Unsupervised Clustering Methods for Identifying Rare Events in Anomaly Detection

Authors: Witcha Chimphlee, Abdul Hanan Abdullah, Mohd Noor Md Sap, Siriporn Chimphlee, Surat Srinoy

Abstract:

It is important problems to increase the detection rates and reduce false positive rates in Intrusion Detection System (IDS). Although preventative techniques such as access control and authentication attempt to prevent intruders, these can fail, and as a second line of defence, intrusion detection has been introduced. Rare events are events that occur very infrequently, detection of rare events is a common problem in many domains. In this paper we propose an intrusion detection method that combines Rough set and Fuzzy Clustering. Rough set has to decrease the amount of data and get rid of redundancy. Fuzzy c-means clustering allow objects to belong to several clusters simultaneously, with different degrees of membership. Our approach allows us to recognize not only known attacks but also to detect suspicious activity that may be the result of a new, unknown attack. The experimental results on Knowledge Discovery and Data Mining-(KDDCup 1999) Dataset show that the method is efficient and practical for intrusion detection systems.

Keywords: Network and security, intrusion detection, fuzzy cmeans, rough set.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2823
160 Semi-Automatic Method to Assist Expert for Association Rules Validation

Authors: Amdouni Hamida, Gammoudi Mohamed Mohsen

Abstract:

In order to help the expert to validate association rules extracted from data, some quality measures are proposed in the literature. We distinguish two categories: objective and subjective measures. The first one depends on a fixed threshold and on data quality from which the rules are extracted. The second one consists on providing to the expert some tools in the objective to explore and visualize rules during the evaluation step. However, the number of extracted rules to validate remains high. Thus, the manually mining rules task is very hard. To solve this problem, we propose, in this paper, a semi-automatic method to assist the expert during the association rule's validation. Our method uses rule-based classification as follow: (i) We transform association rules into classification rules (classifiers), (ii) We use the generated classifiers for data classification. (iii) We visualize association rules with their quality classification to give an idea to the expert and to assist him during validation process.

Keywords: Association rules, Rule-based classification, Classification quality, Validation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1766
159 The Effect of Fine Aggregate Properties on the Fatigue Behavior of the Conventional and Polymer Modified Bituminous Mixtures Using Two Types of Sand as Fine Aggregate

Authors: S. G. Yasreen, N. B. Madzlan, K. Ibrahim

Abstract:

Fatigue cracking continues to be the main challenges in improving the performance of bituminous mixture pavements. The purpose of this paper is to look at some aspects of the effects of fine aggregate properties on the fatigue behaviour of hot mixture asphalt. Two types of sand (quarry and mining sand) with two conventional bitumen (PEN 50/60 & PEN 80/100) and four polymers modified bitumen PMB (PM1_82, PM1_76, PM2_82 and PM2_76) were used. Physical, chemical and mechanical tests were performed on the sands to determine their effect when incorporated with a bituminous mixture. According to the beam fatigue results, quarry sand that has more angularity, rougher, higher shear strength and a higher percentage of Aluminium oxide presented higher resistance to fatigue. Also a PMB mixture gives better fatigue results than conventional mixtures, this is due to the PMB having better viscosity property than that of the conventional bitumen.

Keywords: Beam fatigue test, chemical property, mechanical property, physical property

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2774
158 Experimental and Finite Element Study of Bending Fatigue Failure: A Case Study on Main Shaft of a Gyrator Crusher

Authors: Rahim Sotoudeh Bahreini, Alireza Foroughi Nematollahi, Akbar Jafari

Abstract:

This study investigates the mechanism of a Gyratory crusher-located in Golgohar mining and industrial Co. specifically with a focus on stresses distribution and fatigue failure of its main shaft. At first step, the cross section of the fractured shaft is studied, and the crack growth is analyzed. Then, the rotational motion of the shaft and the oil temperature of oil circuit of equipment are monitored. Condition monitoring is used to help finding a better modification. Based on the results of this study, the main causes of shaft failure are identified, and corrective solution is offered to increase crusher performance, especially its main shaft life. To predict the efficiency of the proposed modification, finite element simulation is performed, and its results are compared with the similar modified cases. The comparison and interpretation of simulation results confirm the efficiency of proposed corrective method.

Keywords: Fatigue failure, finite element method, gyratory crusher, condition monitoring.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1600
157 Human Trafficking: The Kosovar Perspective of Fighting the Phenomena through Police and Civil Society Cooperation

Authors: Samedin Mehmeti

Abstract:

The rationale behind this study is considering combating and preventing the phenomenon of trafficking in human beings from a multidisciplinary perspective that involves many layers of the society. Trafficking in human beings is an abhorrent phenomenon highly affecting negatively the victims and their families in both human and material aspect, sometimes causing irreversible damages. The longer term effects of this phenomenon, in countries with a weak economic development and extremely young and dynamic population, such as Kosovo, without proper measures to prevented and control can cause tremendous damages in the society. Given the fact that a complete eradication of this phenomenon is almost impossible, efforts should be concentrated at least on the prevention and controlling aspects. Treating trafficking in human beings based on traditional police tactics, methods and proceedings cannot bring satisfactory results. There is no doubt that a multi-disciplinary approach is an irreplaceable requirement, in other words, a combination of authentic and functional proactive and reactive methods, techniques and tactics. Obviously, police must exercise its role in preventing and combating trafficking in human beings, a role sanctioned by the law, however, police role and contribution cannot by any means considered complete if all segments of the society are not included in these efforts. Naturally, civil society should have an important share in these collaborative and interactive efforts especially in preventive activities such as: awareness on trafficking risks and damages, proactive engagement in drafting appropriate legislation and strategies, law enforcement monitoring and direct or indirect involvement in protective and supporting activities which benefit the victims of trafficking etc.

Keywords: Civil society, cooperation, police, trafficking in human beings.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1609
156 Feature Selection with Kohonen Self Organizing Classification Algorithm

Authors: Francesco Maiorana

Abstract:

In this paper a one-dimension Self Organizing Map algorithm (SOM) to perform feature selection is presented. The algorithm is based on a first classification of the input dataset on a similarity space. From this classification for each class a set of positive and negative features is computed. This set of features is selected as result of the procedure. The procedure is evaluated on an in-house dataset from a Knowledge Discovery from Text (KDT) application and on a set of publicly available datasets used in international feature selection competitions. These datasets come from KDT applications, drug discovery as well as other applications. The knowledge of the correct classification available for the training and validation datasets is used to optimize the parameters for positive and negative feature extractions. The process becomes feasible for large and sparse datasets, as the ones obtained in KDT applications, by using both compression techniques to store the similarity matrix and speed up techniques of the Kohonen algorithm that take advantage of the sparsity of the input matrix. These improvements make it feasible, by using the grid, the application of the methodology to massive datasets.

Keywords: Clustering algorithm, Data mining, Feature selection, Grid, Kohonen Self Organizing Map.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3025
155 Automatic Extraction of Features and Opinion-Oriented Sentences from Customer Reviews

Authors: Khairullah Khan, Baharum B. Baharudin, Aurangzeb Khan, Fazal_e_Malik

Abstract:

Opinion extraction about products from customer reviews is becoming an interesting area of research. Customer reviews about products are nowadays available from blogs and review sites. Also tools are being developed for extraction of opinion from these reviews to help the user as well merchants to track the most suitable choice of product. Therefore efficient method and techniques are needed to extract opinions from review and blogs. As reviews of products mostly contains discussion about the features, functions and services, therefore, efficient techniques are required to extract user comments about the desired features, functions and services. In this paper we have proposed a novel idea to find features of product from user review in an efficient way. Our focus in this paper is to get the features and opinion-oriented words about products from text through auxiliary verbs (AV) {is, was, are, were, has, have, had}. From the results of our experiments we found that 82% of features and 85% of opinion-oriented sentences include AVs. Thus these AVs are good indicators of features and opinion orientation in customer reviews.

Keywords: Classification, Customer Reviews, Helping Verbs, Opinion Mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2060
154 Applying Kinect on the Development of a Customized 3D Mannequin

Authors: Shih-Wen Hsiao, Rong-Qi Chen

Abstract:

In the field of fashion design, 3D Mannequin is a kind of assisting tool which could rapidly realize the design concepts. While the concept of 3D Mannequin is applied to the computer added fashion design, it will connect with the development and the application of design platform and system. Thus, the situation mentioned above revealed a truth that it is very critical to develop a module of 3D Mannequin which would correspond with the necessity of fashion design. This research proposes a concrete plan that developing and constructing a system of 3D Mannequin with Kinect. In the content, ergonomic measurements of objective human features could be attained real-time through the implement with depth camera of Kinect, and then the mesh morphing can be implemented through transformed the locations of the control-points on the model by inputting those ergonomic data to get an exclusive 3D mannequin model. In the proposed methodology, after the scanned points from the Kinect are revised for accuracy and smoothening, a complete human feature would be reconstructed by the ICP algorithm with the method of image processing. Also, the objective human feature could be recognized to analyze and get real measurements. Furthermore, the data of ergonomic measurements could be applied to shape morphing for the division of 3D Mannequin reconstructed by feature curves. Due to a standardized and customer-oriented 3D Mannequin would be generated by the implement of subdivision, the research could be applied to the fashion design or the presentation and display of 3D virtual clothes. In order to examine the practicality of research structure, a system of 3D Mannequin would be constructed with JAVA program in this study. Through the revision of experiments the practicability-contained research result would come out.

Keywords: 3D Mannequin, kinect scanner, interactive closest point, shape morphing, subdivision.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2035
153 Topic Modeling Using Latent Dirichlet Allocation and Latent Semantic Indexing on South African Telco Twitter Data

Authors: Phumelele P. Kubheka, Pius A. Owolawi, Gbolahan Aiyetoro

Abstract:

Twitter is one of the most popular social media platforms where users share their opinions on different subjects. Twitter can be considered a great source for mining text due to the high volumes of data generated through the platform daily. Many industries such as telecommunication companies can leverage the availability of Twitter data to better understand their markets and make an appropriate business decision. This study performs topic modeling on Twitter data using Latent Dirichlet Allocation (LDA). The obtained results are benchmarked with another topic modeling technique, Latent Semantic Indexing (LSI). The study aims to retrieve topics on a Twitter dataset containing user tweets on South African Telcos. Results from this study show that LSI is much faster than LDA. However, LDA yields better results with higher topic coherence by 8% for the best-performing model in this experiment. A higher topic coherence score indicates better performance of the model.

Keywords: Big data, latent Dirichlet allocation, latent semantic indexing, Telco, topic modeling, Twitter.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 406
152 An Intelligent System for Phish Detection, using Dynamic Analysis and Template Matching

Authors: Chinmay Soman, Hrishikesh Pathak, Vishal Shah, Aniket Padhye, Amey Inamdar

Abstract:

Phishing, or stealing of sensitive information on the web, has dealt a major blow to Internet Security in recent times. Most of the existing anti-phishing solutions fail to handle the fuzziness involved in phish detection, thus leading to a large number of false positives. This fuzziness is attributed to the use of highly flexible and at the same time, highly ambiguous HTML language. We introduce a new perspective against phishing, that tries to systematically prove, whether a given page is phished or not, using the corresponding original page as the basis of the comparison. It analyzes the layout of the pages under consideration to determine the percentage distortion between them, indicative of any form of malicious alteration. The system design represents an intelligent system, employing dynamic assessment which accurately identifies brand new phishing attacks and will prove effective in reducing the number of false positives. This framework could potentially be used as a knowledge base, in educating the internet users against phishing.

Keywords: World Wide Web, Phishing, Internet security, data mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1800
151 A Distance Function for Data with Missing Values and Its Application

Authors: Loai AbdAllah, Ilan Shimshoni

Abstract:

Missing values in data are common in real world applications. Since the performance of many data mining algorithms depend critically on it being given a good metric over the input space, we decided in this paper to define a distance function for unlabeled datasets with missing values. We use the Bhattacharyya distance, which measures the similarity of two probability distributions, to define our new distance function. According to this distance, the distance between two points without missing attributes values is simply the Mahalanobis distance. When on the other hand there is a missing value of one of the coordinates, the distance is computed according to the distribution of the missing coordinate. Our distance is general and can be used as part of any algorithm that computes the distance between data points. Because its performance depends strongly on the chosen distance measure, we opted for the k nearest neighbor classifier to evaluate its ability to accurately reflect object similarity. We experimented on standard numerical datasets from the UCI repository from different fields. On these datasets we simulated missing values and compared the performance of the kNN classifier using our distance to other three basic methods. Our  experiments show that kNN using our distance function outperforms the kNN using other methods. Moreover, the runtime performance of our method is only slightly higher than the other methods.

Keywords: Missing values, Distance metric, Bhattacharyya distance.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2711
150 Application of Artificial Neural Network to Classification Surface Water Quality

Authors: S. Wechmongkhonkon, N.Poomtong, S. Areerachakul

Abstract:

Water quality is a subject of ongoing concern. Deterioration of water quality has initiated serious management efforts in many countries. This study endeavors to automatically classify water quality. The water quality classes are evaluated using 6 factor indices. These factors are pH value (pH), Dissolved Oxygen (DO), Biochemical Oxygen Demand (BOD), Nitrate Nitrogen (NO3N), Ammonia Nitrogen (NH3N) and Total Coliform (TColiform). The methodology involves applying data mining techniques using multilayer perceptron (MLP) neural network models. The data consisted of 11 sites of canals in Dusit district in Bangkok, Thailand. The data is obtained from the Department of Drainage and Sewerage Bangkok Metropolitan Administration during 2007-2011. The results of multilayer perceptron neural network exhibit a high accuracy multilayer perception rate at 96.52% in classifying the water quality of Dusit district canal in Bangkok Subsequently, this encouraging result could be applied with plan and management source of water quality.

Keywords: artificial neural network, classification, surface water quality

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3178
149 Sustainability Impact Assessment of Construction Ecology to Engineering Systems and Climate Change

Authors: Moustafa Osman Mohammed

Abstract:

Construction industry, as one of the main contributor in depletion of natural resources, influences climate change. This paper discusses incremental and evolutionary development of the proposed models for optimization of a life-cycle analysis to explicit strategy for evaluation systems. The main categories are virtually irresistible for introducing uncertainties, uptake composite structure model (CSM) as environmental management systems (EMSs) in a practice science of evaluation small and medium-sized enterprises (SMEs). The model simplified complex systems to reflect nature systems’ input, output and outcomes mode influence “framework measures” and give a maximum likelihood estimation of how elements are simulated over the composite structure. The traditional knowledge of modeling is based on physical dynamic and static patterns regarding parameters influence environment. It unified methods to demonstrate how construction systems ecology interrelated from management prospective in procedure reflects the effect of the effects of engineering systems to ecology as ultimately unified technologies in extensive range beyond constructions impact so as, - energy systems. Sustainability broadens socioeconomic parameters to practice science that meets recovery performance, engineering reflects the generic control of protective systems. When the environmental model employed properly, management decision process in governments or corporations could address policy for accomplishment strategic plans precisely. The management and engineering limitation focuses on autocatalytic control as a close cellular system to naturally balance anthropogenic insertions or aggregation structure systems to pound equilibrium as steady stable conditions. Thereby, construction systems ecology incorporates engineering and management scheme, as a midpoint stage between biotic and abiotic components to predict constructions impact. The later outcomes’ theory of environmental obligation suggests either a procedures of method or technique that is achieved in sustainability impact of construction system ecology (SICSE), as a relative mitigation measure of deviation control, ultimately.

Keywords: Sustainability, constructions ecology, composite structure model, design structure matrix, environmental impact assessment, life cycle analysis, climate change.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1390
148 Adaptive Network Intrusion Detection Learning: Attribute Selection and Classification

Authors: Dewan Md. Farid, Jerome Darmont, Nouria Harbi, Nguyen Huu Hoa, Mohammad Zahidur Rahman

Abstract:

In this paper, a new learning approach for network intrusion detection using naïve Bayesian classifier and ID3 algorithm is presented, which identifies effective attributes from the training dataset, calculates the conditional probabilities for the best attribute values, and then correctly classifies all the examples of training and testing dataset. Most of the current intrusion detection datasets are dynamic, complex and contain large number of attributes. Some of the attributes may be redundant or contribute little for detection making. It has been successfully tested that significant attribute selection is important to design a real world intrusion detection systems (IDS). The purpose of this study is to identify effective attributes from the training dataset to build a classifier for network intrusion detection using data mining algorithms. The experimental results on KDD99 benchmark intrusion detection dataset demonstrate that this new approach achieves high classification rates and reduce false positives using limited computational resources.

Keywords: Attributes selection, Conditional probabilities, information gain, network intrusion detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2663
147 Bayesian Networks for Earthquake Magnitude Classification in a Early Warning System

Authors: G. Zazzaro, F.M. Pisano, G. Romano

Abstract:

During last decades, worldwide researchers dedicated efforts to develop machine-based seismic Early Warning systems, aiming at reducing the huge human losses and economic damages. The elaboration time of seismic waveforms is to be reduced in order to increase the time interval available for the activation of safety measures. This paper suggests a Data Mining model able to correctly and quickly estimate dangerousness of the running seismic event. Several thousand seismic recordings of Japanese and Italian earthquakes were analyzed and a model was obtained by means of a Bayesian Network (BN), which was tested just over the first recordings of seismic events in order to reduce the decision time and the test results were very satisfactory. The model was integrated within an Early Warning System prototype able to collect and elaborate data from a seismic sensor network, estimate the dangerousness of the running earthquake and take the decision of activating the warning promptly.

Keywords: Bayesian Networks, Decision Support System, Magnitude Classification, Seismic Early Warning System

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3568
146 Classification of Political Affiliations by Reduced Number of Features

Authors: Vesile Evrim, Aliyu Awwal

Abstract:

By the evolvement in technology, the way of expressing opinions switched direction to the digital world. The domain of politics, as one of the hottest topics of opinion mining research, merged together with the behavior analysis for affiliation determination in texts, which constitutes the subject of this paper. This study aims to classify the text in news/blogs either as Republican or Democrat with the minimum number of features. As an initial set, 68 features which 64 were constituted by Linguistic Inquiry and Word Count (LIWC) features were tested against 14 benchmark classification algorithms. In the later experiments, the dimensions of the feature vector reduced based on the 7 feature selection algorithms. The results show that the “Decision Tree”, “Rule Induction” and “M5 Rule” classifiers when used with “SVM” and “IGR” feature selection algorithms performed the best up to 82.5% accuracy on a given dataset. Further tests on a single feature and the linguistic based feature sets showed the similar results. The feature “Function”, as an aggregate feature of the linguistic category, was found as the most differentiating feature among the 68 features with the accuracy of 81% in classifying articles either as Republican or Democrat.

Keywords: Politics, machine learning, feature selection, LIWC.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2333
145 Lexical Database for Multiple Languages: Multilingual Word Semantic Network

Authors: K. K. Yong, R. Mahmud, C. S. Woo

Abstract:

Data mining and knowledge engineering have become a tough task due to the availability of large amount of data in the web nowadays. Validity and reliability of data also become a main debate in knowledge acquisition. Besides, acquiring knowledge from different languages has become another concern. There are many language translators and corpora developed but the function of these translators and corpora are usually limited to certain languages and domains. Furthermore, search results from engines with traditional 'keyword' approach are no longer satisfying. More intelligent knowledge engineering agents are needed. To address to these problems, a system known as Multilingual Word Semantic Network is proposed. This system adapted semantic network to organize words according to concepts and relations. The system also uses open source as the development philosophy to enable the native language speakers and experts to contribute their knowledge to the system. The contributed words are then defined and linked using lexical and semantic relations. Thus, related words and derivatives can be identified and linked. From the outcome of the system implementation, it contributes to the development of semantic web and knowledge engineering.

Keywords: Multilingual, semantic network, intelligent knowledge engineering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1928
144 Attacks Classification in Adaptive Intrusion Detection using Decision Tree

Authors: Dewan Md. Farid, Nouria Harbi, Emna Bahri, Mohammad Zahidur Rahman, Chowdhury Mofizur Rahman

Abstract:

Recently, information security has become a key issue in information technology as the number of computer security breaches are exposed to an increasing number of security threats. A variety of intrusion detection systems (IDS) have been employed for protecting computers and networks from malicious network-based or host-based attacks by using traditional statistical methods to new data mining approaches in last decades. However, today's commercially available intrusion detection systems are signature-based that are not capable of detecting unknown attacks. In this paper, we present a new learning algorithm for anomaly based network intrusion detection system using decision tree algorithm that distinguishes attacks from normal behaviors and identifies different types of intrusions. Experimental results on the KDD99 benchmark network intrusion detection dataset demonstrate that the proposed learning algorithm achieved 98% detection rate (DR) in comparison with other existing methods.

Keywords: Detection rate, decision tree, intrusion detectionsystem, network security.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3579
143 Methodology for Quantifying the Meaning of Information in Biological Systems

Authors: Richard L. Summers

Abstract:

The advanced computational analysis of biological systems is becoming increasingly dependent upon an understanding of the information-theoretic structure of the materials, energy and interactive processes that comprise those systems. The stability and survival of these living systems is fundamentally contingent upon their ability to acquire and process the meaning of information concerning the physical state of its biological continuum (biocontinuum). The drive for adaptive system reconciliation of a divergence from steady state within this biocontinuum can be described by an information metric-based formulation of the process for actionable knowledge acquisition that incorporates the axiomatic inference of Kullback-Leibler information minimization driven by survival replicator dynamics. If the mathematical expression of this process is the Lagrangian integrand for any change within the biocontinuum then it can also be considered as an action functional for the living system. In the direct method of Lyapunov, such a summarizing mathematical formulation of global system behavior based on the driving forces of energy currents and constraints within the system can serve as a platform for the analysis of stability. As the system evolves in time in response to biocontinuum perturbations, the summarizing function then conveys information about its overall stability. This stability information portends survival and therefore has absolute existential meaning for the living system. The first derivative of the Lyapunov energy information function will have a negative trajectory toward a system steady state if the driving force is dissipating. By contrast, system instability leading to system dissolution will have a positive trajectory. The direction and magnitude of the vector for the trajectory then serves as a quantifiable signature of the meaning associated with the living system’s stability information, homeostasis and survival potential.

Keywords: Semiotic meaning, Shannon information, Lyapunov, living systems.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 460
142 Decision Trees for Predicting Risk of Mortality using Routinely Collected Data

Authors: Tessy Badriyah, Jim S. Briggs, Dave R. Prytherch

Abstract:

It is well known that Logistic Regression is the gold standard method for predicting clinical outcome, especially predicting risk of mortality. In this paper, the Decision Tree method has been proposed to solve specific problems that commonly use Logistic Regression as a solution. The Biochemistry and Haematology Outcome Model (BHOM) dataset obtained from Portsmouth NHS Hospital from 1 January to 31 December 2001 was divided into four subsets. One subset of training data was used to generate a model, and the model obtained was then applied to three testing datasets. The performance of each model from both methods was then compared using calibration (the χ2 test or chi-test) and discrimination (area under ROC curve or c-index). The experiment presented that both methods have reasonable results in the case of the c-index. However, in some cases the calibration value (χ2) obtained quite a high result. After conducting experiments and investigating the advantages and disadvantages of each method, we can conclude that Decision Trees can be seen as a worthy alternative to Logistic Regression in the area of Data Mining.

Keywords: Decision Trees, Logistic Regression, clinical outcome, risk of mortality.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2489
141 The Development of the Multi-Agent Classification System (MACS) in Compliance with FIPA Specifications

Authors: Mohamed R. Mhereeg

Abstract:

The paper investigates the feasibility of constructing a software multi-agent based monitoring and classification system and utilizing it to provide an automated and accurate classification of end users developing applications in the spreadsheet domain. The agents function autonomously to provide continuous and periodic monitoring of excels spreadsheet workbooks. Resulting in, the development of the MultiAgent classification System (MACS) that is in compliance with the specifications of the Foundation for Intelligent Physical Agents (FIPA). However, different technologies have been brought together to build MACS. The strength of the system is the integration of the agent technology with the FIPA specifications together with other technologies that are Windows Communication Foundation (WCF) services, Service Oriented Architecture (SOA), and Oracle Data Mining (ODM). The Microsoft's .NET widows service based agents were utilized to develop the monitoring agents of MACS, the .NET WCF services together with SOA approach allowed the distribution and communication between agents over the WWW that is in order to satisfy the monitoring and classification of the multiple developer aspect. ODM was used to automate the classification phase of MACS.

Keywords: Autonomous, Classification, MACS, Multi-Agent, SOA, WCF.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1555