Search results for: neighbor effect

4672 Feature Extraction Technique for Prediction the Antigenic Variants of the Influenza Virus

Authors: Majid Forghani, Michael Khachay

Abstract:

In genetics, the impact of neighboring amino acids on a target site is referred as the nearest-neighbor effect or simply neighbor effect. In this paper, a new method called wavelet particle decomposition representing the one-dimensional neighbor effect using wavelet packet decomposition is proposed. The main idea lies in known dependence of wavelet packet sub-bands on location and order of neighboring samples. The method decomposes the value of a signal sample into small values called particles that represent a part of the neighbor effect information. The results have shown that the information obtained from the particle decomposition can be used to create better model variables or features. As an example, the approach has been applied to improve the correlation of test and reference sequence distance with titer in the hemagglutination inhibition assay.

Keywords: Antigenic variants, neighbor effect, wavelet packet, wavelet particle decomposition.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 760

4671 Determination of Neighbor Node in Consideration of the Imaging Range of Cameras in Automatic Human Tracking System

Authors: Kozo Tanigawa, Tappei Yotsumoto, Kenichi Takahashi, Takao Kawamura, Kazunori Sugahara

Abstract:

A automatic human tracking system using mobile agent technology is realized because a mobile agent moves in accordance with a migration of a target person. In this paper, we propose a method for determining the neighbor node in consideration of the imaging range of cameras.

Keywords: Human tracking, Mobile agent, Pan/Tilt/Zoom, Neighbor relation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1658

4670 Multilevel Classifiers in Recognition of Handwritten Kannada Numerals

Authors: Dinesh Acharya U., N. V. Subba Reddy, Krishnamoorthi Makkithaya

Abstract:

The recognition of handwritten numeral is an important area of research for its applications in post office, banks and other organizations. This paper presents automatic recognition of handwritten Kannada numerals based on structural features. Five different types of features, namely, profile based 10-segment string, water reservoir; vertical and horizontal strokes, end points and average boundary length from the minimal bounding box are used in the recognition of numeral. The effect of each feature and their combination in the numeral classification is analyzed using nearest neighbor classifiers. It is common to combine multiple categories of features into a single feature vector for the classification. Instead, separate classifiers can be used to classify based on each visual feature individually and the final classification can be obtained based on the combination of separate base classification results. One popular approach is to combine the classifier results into a feature vector and leaving the decision to next level classifier. This method is extended to extract a better information, possibility distribution, from the base classifiers in resolving the conflicts among the classification results. Here, we use fuzzy k Nearest Neighbor (fuzzy k-NN) as base classifier for individual feature sets, the results of which together forms the feature vector for the final k Nearest Neighbor (k-NN) classifier. Testing is done, using different features, individually and in combination, on a database containing 1600 samples of different numerals and the results are compared with the results of different existing methods.

Keywords: Fuzzy k Nearest Neighbor, Multiple Classifiers, Numeral Recognition, Structural features.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1732

4669 A Finite Point Method Based on Directional Derivatives for Diffusion Equation

Authors: Guixia Lv, Longjun Shen

Abstract:

This paper presents a finite point method based on directional derivatives for diffusion equation on 2D scattered points. To discretize the diffusion operator at a given point, a six-point stencil is derived by employing explicit numerical formulae of directional derivatives, namely, for the point under consideration, only five neighbor points are involved, the number of which is the smallest for discretizing diffusion operator with first-order accuracy. A method for selecting neighbor point set is proposed, which satisfies the solvability condition of numerical derivatives. Some numerical examples are performed to show the good performance of the proposed method.

Keywords: Finite point method, directional derivatives, diffusionequation, method for selecting neighbor point set.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1329

4668 Fast and Accuracy Control Chart Pattern Recognition using a New cluster-k-Nearest Neighbor

Authors: Samir Brahim Belhaouari

Abstract:

By taking advantage of both k-NN which is highly accurate and K-means cluster which is able to reduce the time of classification, we can introduce Cluster-k-Nearest Neighbor as "variable k"-NN dealing with the centroid or mean point of all subclasses generated by clustering algorithm. In general the algorithm of K-means cluster is not stable, in term of accuracy, for that reason we develop another algorithm for clustering our space which gives a higher accuracy than K-means cluster, less subclass number, stability and bounded time of classification with respect to the variable data size. We find between 96% and 99.7 % of accuracy in the lassification of 6 different types of Time series by using K-means cluster algorithm and we find 99.7% by using the new clustering algorithm.

Keywords: Pattern recognition, Time series, k-Nearest Neighbor, k-means cluster, Gaussian Mixture Model, Classification

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1947

4667 Identity Verification Using k-NN Classifiers and Autistic Genetic Data

Authors: Fuad M. Alkoot

Abstract:

DNA data have been used in forensics for decades. However, current research looks at using the DNA as a biometric identity verification modality. The goal is to improve the speed of identification. We aim at using gene data that was initially used for autism detection to find if and how accurate is this data for identification applications. Mainly our goal is to find if our data preprocessing technique yields data useful as a biometric identification tool. We experiment with using the nearest neighbor classifier to identify subjects. Results show that optimal classification rate is achieved when the test set is corrupted by normally distributed noise with zero mean and standard deviation of 1. The classification rate is close to optimal at higher noise standard deviation reaching 3. This shows that the data can be used for identity verification with high accuracy using a simple classifier such as the k-nearest neighbor (k-NN).

Keywords: Biometrics, identity verification, genetic data, k-nearest neighbor.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1096

4666 Position Based Routing Protocol with More Reliability in Mobile Ad Hoc Network

Authors: Mahboobeh Abdoos, Karim Faez, Masoud Sabaei

Abstract:

Position based routing protocols are the kinds of routing protocols, which they use of nodes location information, instead of links information to routing. In position based routing protocols, it supposed that the packet source node has position information of itself and it's neighbors and packet destination node. Greedy is a very important position based routing protocol. In one of it's kinds, named MFR (Most Forward Within Radius), source node or packet forwarder node, sends packet to one of it's neighbors with most forward progress towards destination node (closest neighbor to destination). Using distance deciding metric in Greedy to forward packet to a neighbor node, is not suitable for all conditions. If closest neighbor to destination node, has high speed, in comparison with source node or intermediate packet forwarder node speed or has very low remained battery power, then packet loss probability is increased. Proposed strategy uses combination of metrics distancevelocity similarity-power, to deciding about giving the packet to which neighbor. Simulation results show that the proposed strategy has lower lost packets average than Greedy, so it has more reliability.

Keywords: Mobile Ad Hoc Network, Position Based, Reliability, Routing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1745

4665 Neighbors of Indefinite Binary Quadratic Forms

Authors: Ahmet Tekcan

Abstract:

In this paper, we derive some algebraic identities on right and left neighbors R(F) and L(F) of an indefinite binary quadratic form F = F(x, y) = ax2 + bxy + cy2 of discriminant Δ = b2 -4ac. We prove that the proper cycle of F can be given by using its consecutive left neighbors. Also we construct a connection between right and left neighbors of F.

Keywords: Quadratic form, indefinite form, cycle, proper cycle, right neighbor, left neighbor.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1383

4664 Using Interval Trees for Approximate Indexing of Instances

Authors: Khalil el Hindi

Abstract:

This paper presents a simple and effective method for approximate indexing of instances for instance based learning. The method uses an interval tree to determine a good starting search point for the nearest neighbor. The search stops when an early stopping criterion is met. The method proved to be very effective especially when only the first nearest neighbor is required.

Keywords: Instance based learning, interval trees, the knn algorithm, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1495

4663 A Robust Method for Finding Nearest-Neighbor using Hexagon Cells

Authors: Ahmad Attiq Al-Ogaibi, Ahmad Sharieh, Moh’d Belal Al-Zoubi, R. Bremananth

Abstract:

In pattern clustering, nearest neighborhood point computation is a challenging issue for many applications in the area of research such as Remote Sensing, Computer Vision, Pattern Recognition and Statistical Imaging. Nearest neighborhood computation is an essential computation for providing sufficient classification among the volume of pixels (voxels) in order to localize the active-region-of-interests (AROI). Furthermore, it is needed to compute spatial metric relationships of diverse area of imaging based on the applications of pattern recognition. In this paper, we propose a new methodology for finding the nearest neighbor point, depending on making a virtually grid of a hexagon cells, then locate every point beneath them. An algorithm is suggested for minimizing the computation and increasing the turnaround time of the process. The nearest neighbor query points Φ are fetched by seeking fashion of hexagon holistic. Seeking will be repeated until an AROI Φ is to be expected. If any point Υ is located then searching starts in the nearest hexagons in a circular way. The First hexagon is considered be level 0 (L0) and the surrounded hexagons is level 1 (L1). If Υ is located in L1, then search starts in the next level (L2) to ensure that Υ is the nearest neighbor for Φ. Based on the result and experimental results, we found that the proposed method has an advantage over the traditional methods in terms of minimizing the time complexity required for searching the neighbors, in turn, efficiency of classification will be improved sufficiently.

Keywords: Hexagon cells, k-nearest neighbors, Nearest Neighbor, Pattern recognition, Query pattern, Virtually grid

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2774

4662 Economics of Conflict: Core Economic Dimensions of the Georgian-South Ossetian Context

Authors: V. Charaia

Abstract:

This article presents SWOT analysis for Georgian - South Ossetian conflict. The research analyzes socio-economic aspects and considers future prospects for all sides including neighbor countries and regions. Also it includes the possibilities of positive intervention of neighbor countries to solve the conflict or to mitigate its negative results. The main question of the article is: What will it take to award Georgians and South Ossetians with a peace dividend?

Keywords: Conflict economics, Georgian economy, international organizations, peace building, S. Ossetian economy.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1423

4661 Performance Comparison of Different Regression Methods for a Polymerization Process with Adaptive Sampling

Authors: Florin Leon, Silvia Curteanu

Abstract:

Developing complete mechanistic models for polymerization reactors is not easy, because complex reactions occur simultaneously; there is a large number of kinetic parameters involved and sometimes the chemical and physical phenomena for mixtures involving polymers are poorly understood. To overcome these difficulties, empirical models based on sampled data can be used instead, namely regression methods typical of machine learning field. They have the ability to learn the trends of a process without any knowledge about its particular physical and chemical laws. Therefore, they are useful for modeling complex processes, such as the free radical polymerization of methyl methacrylate achieved in a batch bulk process. The goal is to generate accurate predictions of monomer conversion, numerical average molecular weight and gravimetrical average molecular weight. This process is associated with non-linear gel and glass effects. For this purpose, an adaptive sampling technique is presented, which can select more samples around the regions where the values have a higher variation. Several machine learning methods are used for the modeling and their performance is compared: support vector machines, k-nearest neighbor, k-nearest neighbor and random forest, as well as an original algorithm, large margin nearest neighbor regression. The suggested method provides very good results compared to the other well-known regression algorithms.

Keywords: Adaptive sampling, batch bulk methyl methacrylate polymerization, large margin nearest neighbor regression, machine learning.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1378

4660 Improving Cryptographically Generated Address Algorithm in IPv6 Secure Neighbor Discovery Protocol through Trust Management

Authors: M. Moslehpour, S. Khorsandi

Abstract:

As transition to widespread use of IPv6 addresses has gained momentum, it has been shown to be vulnerable to certain security attacks such as those targeting Neighbor Discovery Protocol (NDP) which provides the address resolution functionality in IPv6. To protect this protocol, Secure Neighbor Discovery (SEND) is introduced. This protocol uses Cryptographically Generated Address (CGA) and asymmetric cryptography as a defense against threats on integrity and identity of NDP. Although SEND protects NDP against attacks, it is computationally intensive due to Hash2 condition in CGA. To improve the CGA computation speed, we parallelized CGA generation process and used the available resources in a trusted network. Furthermore, we focused on the influence of the existence of malicious nodes on the overall load of un-malicious ones in the network. According to the evaluation results, malicious nodes have adverse impacts on the average CGA generation time and on the average number of tries. We utilized a Trust Management that is capable of detecting and isolating the malicious node to remove possible incentives for malicious behavior. We have demonstrated the effectiveness of the Trust Management System in detecting the malicious nodes and hence improving the overall system performance.

Keywords: NDP, SEND, CGA, modifier, malicious node.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1181

4659 An Improved k Nearest Neighbor Classifier Using Interestingness Measures for Medical Image Mining

Authors: J. Alamelu Mangai, Satej Wagle, V. Santhosh Kumar

Abstract:

The exponential increase in the volume of medical image database has imposed new challenges to clinical routine in maintaining patient history, diagnosis, treatment and monitoring. With the advent of data mining and machine learning techniques it is possible to automate and/or assist physicians in clinical diagnosis. In this research a medical image classification framework using data mining techniques is proposed. It involves feature extraction, feature selection, feature discretization and classification. In the classification phase, the performance of the traditional kNN k nearest neighbor classifier is improved using a feature weighting scheme and a distance weighted voting instead of simple majority voting. Feature weights are calculated using the interestingness measures used in association rule mining. Experiments on the retinal fundus images show that the proposed framework improves the classification accuracy of traditional kNN from 78.57 % to 92.85 %.

Keywords: Medical Image Mining, Data Mining, Feature Weighting, Association Rule Mining, k nearest neighbor classifier.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3289

4658 Classification Influence Index and its Application for k-Nearest Neighbor Classifier

Authors: Sejong Oh

Abstract:

Classification is an important topic in machine learning and bioinformatics. Many datasets have been introduced for classification tasks. A dataset contains multiple features, and the quality of features influences the classification accuracy of the dataset. The power of classification for each feature differs. In this study, we suggest the Classification Influence Index (CII) as an indicator of classification power for each feature. CII enables evaluation of the features in a dataset and improved classification accuracy by transformation of the dataset. By conducting experiments using CII and the k-nearest neighbor classifier to analyze real datasets, we confirmed that the proposed index provided meaningful improvement of the classification accuracy.

Keywords: accuracy, classification, dataset, data preprocessing

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1472

4657 Wormhole Attack Detection in Wireless Sensor Networks

Authors: Zaw Tun, Aung Htein Maw

Abstract:

The nature of wireless ad hoc and sensor networks make them very attractive to attackers. One of the most popular and serious attacks in wireless ad hoc networks is wormhole attack and most proposed protocols to defend against this attack used positioning devices, synchronized clocks, or directional antennas. This paper analyzes the nature of wormhole attack and existing methods of defending mechanism and then proposes round trip time (RTT) and neighbor numbers based wormhole detection mechanism. The consideration of proposed mechanism is the RTT between two successive nodes and those nodes- neighbor number which is needed to compare those values of other successive nodes. The identification of wormhole attacks is based on the two faces. The first consideration is that the transmission time between two wormhole attack affected nodes is considerable higher than that between two normal neighbor nodes. The second detection mechanism is based on the fact that by introducing new links into the network, the adversary increases the number of neighbors of the nodes within its radius. This system does not require any specific hardware, has good performance and little overhead and also does not consume extra energy. The proposed system is designed in ad hoc on-demand distance vector (AODV) routing protocol and analysis and simulations of the proposed system are performed in network simulator (ns-2).

Keywords: AODV, Wormhole attacks, Wireless ad hoc andsensor networks

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3449

4656 A Distributed Cryptographically Generated Address Computing Algorithm for Secure Neighbor Discovery Protocol in IPv6

Authors: M. Moslehpour, S. Khorsandi

Abstract:

Due to shortage in IPv4 addresses, transition to IPv6 has gained significant momentum in recent years. Like Address Resolution Protocol (ARP) in IPv4, Neighbor Discovery Protocol (NDP) provides some functions like address resolution in IPv6. Besides functionality of NDP, it is vulnerable to some attacks. To mitigate these attacks, Internet Protocol Security (IPsec) was introduced, but it was not efficient due to its limitation. Therefore, SEND protocol is proposed to automatic protection of auto-configuration process. It is secure neighbor discovery and address resolution process. To defend against threats on NDP’s integrity and identity, Cryptographically Generated Address (CGA) and asymmetric cryptography are used by SEND. Besides advantages of SEND, its disadvantages like the computation process of CGA algorithm and sequentially of CGA generation algorithm are considerable. In this paper, we parallel this process between network resources in order to improve it. In addition, we compare the CGA generation time in self-computing and distributed-computing process. We focus on the impact of the malicious nodes on the CGA generation time in the network. According to the result, although malicious nodes participate in the generation process, CGA generation time is less than when it is computed in a one-way. By Trust Management System, detecting and insulating malicious nodes is easier.

Keywords: NDP, IPsec, SEND, CGA, Modifier, Malicious node, Self-Computing, Distributed-Computing.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1353

4655 Investigating Quality Metrics for Multimedia Traffic in OLSR Routing Protocol

Authors: B. Prabhakara Rao, M. V. H. Bhaskara Murthy

Abstract:

An Ad hoc wireless network comprises of mobile terminals linked and communicating with each other sans the aid of traditional infrastructure. Optimized Link State Protocol (OLSR) is a proactive routing protocol, in which routes are discovered/updated continuously so that they are available when needed. Hello messages generated by a node seeks information about its neighbor and if the latter fails to respond to a specified number of hello messages regulated by neighborhood hold time, the node is forced to assume that the neighbor is not in range. This paper proposes to evaluate OLSR routing protocol in a random mobility network having various neighborhood hold time intervals. The throughput and delivery ratio are also evaluated to learn about its efficiency for multimedia loads.

Keywords: Ad hoc Network, Optimized Link State Routing, Multimedia traffic

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1928

4654 Implementation of Heuristics for Solving Travelling Salesman Problem Using Nearest Neighbour and Minimum Spanning Tree Algorithms

Authors: Fatma A. Karkory, Ali A. Abudalmola

Abstract:

The travelling salesman problem (TSP) is a combinatorial optimization problem in which the goal is to find the shortest path between different cities that the salesman takes. In other words, the problem deals with finding a route covering all cities so that total distance and execution time is minimized. This paper adopts the nearest neighbor and minimum spanning tree algorithm to solve the well-known travelling salesman problem. The algorithms were implemented using java programming language. The approach is tested on three graphs that making a TSP tour instance of 5-city, 10 –city, and 229–city. The computation results validate the performance of the proposed algorithm.

Keywords: Heuristics, minimum spanning tree algorithm, Nearest Neighbor, Travelling Salesman Problem (TSP).

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 7791

4653 An Enhanced Key Management Scheme Based on Key Infection in Wireless Sensor Networks

Authors: Han Park, JooSeok Song

Abstract:

We propose an enhanced key management scheme based on Key Infection, which is lightweight scheme for tiny sensors. The basic scheme, Key Infection, is perfectly secure against node capture and eavesdropping if initial communications after node deployment is secure. If, however, an attacker can eavesdrop on the initial communications, they can take the session key. We use common neighbors for each node to generate the session key. Each node has own secret key and shares it with its neighbor nodes. Then each node can establish the session key using common neighbors- secret keys and a random number. Our scheme needs only a few communications even if it uses neighbor nodes- information. Without losing the lightness of basic scheme, it improves the resistance against eavesdropping on the initial communications more than 30%.

Keywords: Wireless Sensor Networks, Key Management

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1533

4652 A Comparison between Heuristic and Meta-Heuristic Methods for Solving the Multiple Traveling Salesman Problem

Authors: San Nah Sze, Wei King Tiong

Abstract:

The multiple traveling salesman problem (mTSP) can be used to model many practical problems. The mTSP is more complicated than the traveling salesman problem (TSP) because it requires determining which cities to assign to each salesman, as well as the optimal ordering of the cities within each salesman's tour. Previous studies proposed that Genetic Algorithm (GA), Integer Programming (IP) and several neural network (NN) approaches could be used to solve mTSP. This paper compared the results for mTSP, solved with Genetic Algorithm (GA) and Nearest Neighbor Algorithm (NNA). The number of cities is clustered into a few groups using k-means clustering technique. The number of groups depends on the number of salesman. Then, each group is solved with NNA and GA as an independent TSP. It is found that k-means clustering and NNA are superior to GA in terms of performance (evaluated by fitness function) and computing time.

Keywords: Multiple Traveling Salesman Problem, GeneticAlgorithm, Nearest Neighbor Algorithm, k-Means Clustering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3201

4651 Security in Resource Constraints Network Light Weight Encryption for Z-MAC

Authors: Mona Almansoori, Ahmed Mustafa, Ahmad Elshamy

Abstract:

Wireless sensor network was formed by a combination of nodes, systematically it transmitting the data to their base stations, this transmission data can be easily compromised if the limited processing power and the data consistency from these nodes are kept in mind; there is always a discussion to address the secure data transfer or transmission in actual time. This will present a mechanism to securely transmit the data over a chain of sensor nodes without compromising the throughput of the network by utilizing available battery resources available in the sensor node. Our methodology takes many different advantages of Z-MAC protocol for its efficiency, and it provides a unique key by sharing the mechanism using neighbor node MAC address. We present a light weighted data integrity layer which is embedded in the Z-MAC protocol to prove that our protocol performs well than Z-MAC when we introduce the different attack scenarios.

Keywords: Hybrid MAC protocol, data integrity, lightweight encryption, Neighbor based key sharing, Sensor node data processing, Z-MAC.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 533

4650 Artificial Intelligence-Based Detection of Individuals Suffering from Vestibular Disorder

Authors: D. Hişam, S. İkizoğlu

Abstract:

Identifying the problem behind balance disorder is one of the most interesting topics in medical literature. This study has considerably enhanced the development of artificial intelligence (AI) algorithms applying multiple machine learning (ML) models to sensory data on gait collected from humans to classify between normal people and those suffering from Vestibular System (VS) problems. Although AI is widely utilized as a diagnostic tool in medicine, AI models have not been used to perform feature extraction and identify VS disorders through training on raw data. In this study, three ML models, the Random Forest Classifier (RF), Extreme Gradient Boosting (XGB), and K-Nearest Neighbor (KNN), have been trained to detect VS disorder, and the performance comparison of the algorithms has been made using accuracy, recall, precision, and f1-score. With an accuracy of 95.28 %, Random Forest (RF) Classifier was the most accurate model.

Keywords: Vestibular disorder, machine learning, random forest classifier, k-nearest neighbor, extreme gradient boosting.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 129

4649 FCNN-MR: A Parallel Instance Selection Method Based on Fast Condensed Nearest Neighbor Rule

Authors: Lu Si, Jie Yu, Shasha Li, Jun Ma, Lei Luo, Qingbo Wu, Yongqi Ma, Zhengji Liu

Abstract:

Instance selection (IS) technique is used to reduce the data size to improve the performance of data mining methods. Recently, to process very large data set, several proposed methods divide the training set into some disjoint subsets and apply IS algorithms independently to each subset. In this paper, we analyze the limitation of these methods and give our viewpoint about how to divide and conquer in IS procedure. Then, based on fast condensed nearest neighbor (FCNN) rule, we propose a large data sets instance selection method with MapReduce framework. Besides ensuring the prediction accuracy and reduction rate, it has two desirable properties: First, it reduces the work load in the aggregation node; Second and most important, it produces the same result with the sequential version, which other parallel methods cannot achieve. We evaluate the performance of FCNN-MR on one small data set and two large data sets. The experimental results show that it is effective and practical.

Keywords: Instance selection, data reduction, MapReduce, kNN.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 995

4648 Image Spam Detection Using Color Features and K-Nearest Neighbor Classification

Authors: T. Kumaresan, S. Sanjushree, C. Palanisamy

Abstract:

Image spam is a kind of email spam where the spam text is embedded with an image. It is a new spamming technique being used by spammers to send their messages to bulk of internet users. Spam email has become a big problem in the lives of internet users, causing time consumption and economic losses. The main objective of this paper is to detect the image spam by using histogram properties of an image. Though there are many techniques to automatically detect and avoid this problem, spammers employing new tricks to bypass those techniques, as a result those techniques are inefficient to detect the spam mails. In this paper we have proposed a new method to detect the image spam. Here the image features are extracted by using RGB histogram, HSV histogram and combination of both RGB and HSV histogram. Based on the optimized image feature set classification is done by using k- Nearest Neighbor(k-NN) algorithm. Experimental result shows that our method has achieved better accuracy. From the result it is known that combination of RGB and HSV histogram with k-NN algorithm gives the best accuracy in spam detection.

Keywords: File Type, HSV Histogram, k-NN, RGB Histogram, Spam Detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2117

4647 Spatial Data Mining by Decision Trees

Authors: S. Oujdi, H. Belbachir

Abstract:

Existing methods of data mining cannot be applied on spatial data because they require spatial specificity consideration, as spatial relationships. This paper focuses on the classification with decision trees, which are one of the data mining techniques. We propose an extension of the C4.5 algorithm for spatial data, based on two different approaches Join materialization and Querying on the fly the different tables. Similar works have been done on these two main approaches, the first - Join materialization - favors the processing time in spite of memory space, whereas the second - Querying on the fly different tables- promotes memory space despite of the processing time. The modified C4.5 algorithm requires three entries tables: a target table, a neighbor table, and a spatial index join that contains the possible spatial relationship among the objects in the target table and those in the neighbor table. Thus, the proposed algorithms are applied to a spatial data pattern in the accidentology domain. A comparative study of our approach with other works of classification by spatial decision trees will be detailed.

Keywords: C4.5 Algorithm, Decision trees, S-CART, Spatial data mining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2961

4646 Methods of Geodesic Distance in Two-Dimensional Face Recognition

Authors: Rachid Ahdid, Said Safi, Bouzid Manaut

Abstract:

In this paper, we present a comparative study of three methods of 2D face recognition system such as: Iso-Geodesic Curves (IGC), Geodesic Distance (GD) and Geodesic-Intensity Histogram (GIH). These approaches are based on computing of geodesic distance between points of facial surface and between facial curves. In this study we represented the image at gray level as a 2D surface in a 3D space, with the third coordinate proportional to the intensity values of pixels. In the classifying step, we use: Neural Networks (NN), K-Nearest Neighbor (KNN) and Support Vector Machines (SVM). The images used in our experiments are from two wellknown databases of face images ORL and YaleB. ORL data base was used to evaluate the performance of methods under conditions where the pose and sample size are varied, and the database YaleB was used to examine the performance of the systems when the facial expressions and lighting are varied.

Keywords: 2D face recognition, Geodesic distance, Iso-Geodesic Curves, Geodesic-Intensity Histogram, facial surface, Neural Networks, K-Nearest Neighbor, Support Vector Machines.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1800

4645 Study of Compaction in Hot-Mix Asphalt Using Computer Simulations

Authors: Kasthurirangan Gopalakrishnan, Naga Shashidhar, Xiaoxiong Zhong

Abstract:

During the process of compaction in Hot-Mix Asphalt (HMA) mixtures, the distance between aggregate particles decreases as they come together and eliminate air-voids. By measuring the inter-particle distances in a cut-section of a HMA sample the degree of compaction can be estimated. For this, a calibration curve is generated by computer simulation technique when the gradation and asphalt content of the HMA mixture are known. A two-dimensional cross section of HMA specimen was simulated using the mixture design information (gradation, asphalt content and air-void content). Nearest neighbor distance methods such as Delaunay triangulation were used to study the changes in inter-particle distance and area distribution during the process of compaction in HMA. Such computer simulations would enable making several hundreds of repetitions in a short period of time without the necessity to compact and analyze laboratory specimens in order to obtain good statistics on the parameters defined. The distributions for the statistical parameters based on computer simulations showed similar trends as those of laboratory specimens.

Keywords: Computer simulations, Hot-Mix Asphalt (HMA), inter-particle distance, image analysis, nearest neighbor

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1873

4644 Analysis of Genetic Variations in Camel Breeds (Camelus dromedarius)

Authors: Yasser M. Saad, Amr A. El Hanafy, Saleh A. Alkarim, Hussein A. Almehdar, Elrashdy M. Redwan

Abstract:

Camels are substantial providers of transport, milk, sport, meat, shelter, security and capital in many countries, particularly in Saudi Arabia. Inter simple sequence repeat technique was used to detect the genetic variations among some camel breeds (Majaheim, Safra, Wadah, and Hamara). Actual number of alleles, effective number of alleles, gene diversity, Shannon’s information index and polymorphic bands were calculated for each evaluated camel breed. Neighbor-joining tree that re-constructed for evaluated these camel breeds showed that, Hamara breed is distantly related from the other evaluated camels. In addition, the polymorphic sites, haplotypes and nucleotide diversity were identified for some camelidae cox1 gene sequences (obtained from NCBI). The distance value between C. bactrianus and C. dromedarius (0.072) was relatively low. Analysis of genetic diversity is an important way for conserving Camelus dromedarius genetic resources.

Keywords: Camel, genetics, ISSR, cox1, neighbor-joining.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1273

4643 Reducing SAGE Data Using Genetic Algorithms

Authors: Cheng-Hong Yang, Tsung-Mu Shih, Li-Yeh Chuang

Abstract:

Serial Analysis of Gene Expression is a powerful quantification technique for generating cell or tissue gene expression data. The profile of the gene expression of cell or tissue in several different states is difficult for biologists to analyze because of the large number of genes typically involved. However, feature selection in machine learning can successfully reduce this problem. The method allows reducing the features (genes) in specific SAGE data, and determines only relevant genes. In this study, we used a genetic algorithm to implement feature selection, and evaluate the classification accuracy of the selected features with the K-nearest neighbor method. In order to validate the proposed method, we used two SAGE data sets for testing. The results of this study conclusively prove that the number of features of the original SAGE data set can be significantly reduced and higher classification accuracy can be achieved.

Keywords: Serial Analysis of Gene Expression, Feature selection, Genetic Algorithm, K-nearest neighbor method.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1588