Search results for: approximate nearest neighbor search
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 2567

Search results for: approximate nearest neighbor search

2507 Predicting Groundwater Areas Using Data Mining Techniques: Groundwater in Jordan as Case Study

Authors: Faisal Aburub, Wael Hadi

Abstract:

Data mining is the process of extracting useful or hidden information from a large database. Extracted information can be used to discover relationships among features, where data objects are grouped according to logical relationships; or to predict unseen objects to one of the predefined groups. In this paper, we aim to investigate four well-known data mining algorithms in order to predict groundwater areas in Jordan. These algorithms are Support Vector Machines (SVMs), Naïve Bayes (NB), K-Nearest Neighbor (kNN) and Classification Based on Association Rule (CBA). The experimental results indicate that the SVMs algorithm outperformed other algorithms in terms of classification accuracy, precision and F1 evaluation measures using the datasets of groundwater areas that were collected from Jordanian Ministry of Water and Irrigation.

Keywords: classification, data mining, evaluation measures, groundwater

Procedia PDF Downloads 279
2506 An Analysis of Classification of Imbalanced Datasets by Using Synthetic Minority Over-Sampling Technique

Authors: Ghada A. Alfattni

Abstract:

Analysing unbalanced datasets is one of the challenges that practitioners in machine learning field face. However, many researches have been carried out to determine the effectiveness of the use of the synthetic minority over-sampling technique (SMOTE) to address this issue. The aim of this study was therefore to compare the effectiveness of the SMOTE over different models on unbalanced datasets. Three classification models (Logistic Regression, Support Vector Machine and Nearest Neighbour) were tested with multiple datasets, then the same datasets were oversampled by using SMOTE and applied again to the three models to compare the differences in the performances. Results of experiments show that the highest number of nearest neighbours gives lower values of error rates. 

Keywords: imbalanced datasets, SMOTE, machine learning, logistic regression, support vector machine, nearest neighbour

Procedia PDF Downloads 350
2505 The Application of Pareto Local Search to the Single-Objective Quadratic Assignment Problem

Authors: Abdullah Alsheddy

Abstract:

This paper presents the employment of Pareto optimality as a strategy to help (single-objective) local search escaping local optima. Instead of local search, Pareto local search is applied to solve the quadratic assignment problem which is multi-objectivized by adding a helper objective. The additional objective is defined as a function of the primary one with augmented penalties that are dynamically updated.

Keywords: Pareto optimization, multi-objectivization, quadratic assignment problem, local search

Procedia PDF Downloads 466
2504 Performance Evaluation of Contemporary Classifiers for Automatic Detection of Epileptic EEG

Authors: K. E. Ch. Vidyasagar, M. Moghavvemi, T. S. S. T. Prabhat

Abstract:

Epilepsy is a global problem, and with seizures eluding even the smartest of diagnoses a requirement for automatic detection of the same using electroencephalogram (EEG) would have a huge impact in diagnosis of the disorder. Among a multitude of methods for automatic epilepsy detection, one should find the best method out, based on accuracy, for classification. This paper reasons out, and rationalizes, the best methods for classification. Accuracy is based on the classifier, and thus this paper discusses classifiers like quadratic discriminant analysis (QDA), classification and regression tree (CART), support vector machine (SVM), naive Bayes classifier (NBC), linear discriminant analysis (LDA), K-nearest neighbor (KNN) and artificial neural networks (ANN). Results show that ANN is the most accurate of all the above stated classifiers with 97.7% accuracy, 97.25% specificity and 98.28% sensitivity in its merit. This is followed closely by SVM with 1% variation in result. These results would certainly help researchers choose the best classifier for detection of epilepsy.

Keywords: classification, seizure, KNN, SVM, LDA, ANN, epilepsy

Procedia PDF Downloads 520
2503 Machine Learning for Disease Prediction Using Symptoms and X-Ray Images

Authors: Ravija Gunawardana, Banuka Athuraliya

Abstract:

Machine learning has emerged as a powerful tool for disease diagnosis and prediction. The use of machine learning algorithms has the potential to improve the accuracy of disease prediction, thereby enabling medical professionals to provide more effective and personalized treatments. This study focuses on developing a machine-learning model for disease prediction using symptoms and X-ray images. The importance of this study lies in its potential to assist medical professionals in accurately diagnosing diseases, thereby improving patient outcomes. Respiratory diseases are a significant cause of morbidity and mortality worldwide, and chest X-rays are commonly used in the diagnosis of these diseases. However, accurately interpreting X-ray images requires significant expertise and can be time-consuming, making it difficult to diagnose respiratory diseases in a timely manner. By incorporating machine learning algorithms, we can significantly enhance disease prediction accuracy, ultimately leading to better patient care. The study utilized the Mask R-CNN algorithm, which is a state-of-the-art method for object detection and segmentation in images, to process chest X-ray images. The model was trained and tested on a large dataset of patient information, which included both symptom data and X-ray images. The performance of the model was evaluated using a range of metrics, including accuracy, precision, recall, and F1-score. The results showed that the model achieved an accuracy rate of over 90%, indicating that it was able to accurately detect and segment regions of interest in the X-ray images. In addition to X-ray images, the study also incorporated symptoms as input data for disease prediction. The study used three different classifiers, namely Random Forest, K-Nearest Neighbor and Support Vector Machine, to predict diseases based on symptoms. These classifiers were trained and tested using the same dataset of patient information as the X-ray model. The results showed promising accuracy rates for predicting diseases using symptoms, with the ensemble learning techniques significantly improving the accuracy of disease prediction. The study's findings indicate that the use of machine learning algorithms can significantly enhance disease prediction accuracy, ultimately leading to better patient care. The model developed in this study has the potential to assist medical professionals in diagnosing respiratory diseases more accurately and efficiently. However, it is important to note that the accuracy of the model can be affected by several factors, including the quality of the X-ray images, the size of the dataset used for training, and the complexity of the disease being diagnosed. In conclusion, the study demonstrated the potential of machine learning algorithms for disease prediction using symptoms and X-ray images. The use of these algorithms can improve the accuracy of disease diagnosis, ultimately leading to better patient care. Further research is needed to validate the model's accuracy and effectiveness in a clinical setting and to expand its application to other diseases.

Keywords: K-nearest neighbor, mask R-CNN, random forest, support vector machine

Procedia PDF Downloads 154
2502 Using Scale Invariant Feature Transform Features to Recognize Characters in Natural Scene Images

Authors: Belaynesh Chekol, Numan Çelebi

Abstract:

The main purpose of this work is to recognize individual characters extracted from natural scene images using scale invariant feature transform (SIFT) features as an input to K-nearest neighbor (KNN); a classification learner algorithm. For this task, 1,068 and 78 images of English alphabet characters taken from Chars74k data set is used to train and test the classifier respectively. For each character image, We have generated describing features by using SIFT algorithm. This set of features is fed to the learner so that it can recognize and label new images of English characters. Two types of KNN (fine KNN and weighted KNN) were trained and the resulted classification accuracy is 56.9% and 56.5% respectively. The training time taken was the same for both fine and weighted KNN.

Keywords: character recognition, KNN, natural scene image, SIFT

Procedia PDF Downloads 281
2501 Interactive, Topic-Oriented Search Support by a Centroid-Based Text Categorisation

Authors: Mario Kubek, Herwig Unger

Abstract:

Centroid terms are single words that semantically and topically characterise text documents and so may serve as their very compact representation in automatic text processing. In the present paper, centroids are used to measure the relevance of text documents with respect to a given search query. Thus, a new graphbased paradigm for searching texts in large corpora is proposed and evaluated against keyword-based methods. The first, promising experimental results demonstrate the usefulness of the centroid-based search procedure. It is shown that especially the routing of search queries in interactive and decentralised search systems can be greatly improved by applying this approach. A detailed discussion on further fields of its application completes this contribution.

Keywords: search algorithm, centroid, query, keyword, co-occurrence, categorisation

Procedia PDF Downloads 282
2500 On the Interactive Search with Web Documents

Authors: Mario Kubek, Herwig Unger

Abstract:

Due to the large amount of information in the World Wide Web (WWW, web) and the lengthy and usually linearly ordered result lists of web search engines that do not indicate semantic relationships between their entries, the search for topically similar and related documents can become a tedious task. Especially, the process of formulating queries with proper terms representing specific information needs requires much effort from the user. This problem gets even bigger when the user's knowledge on a subject and its technical terms is not sufficient enough to do so. This article presents the new and interactive search application DocAnalyser that addresses this problem by enabling users to find similar and related web documents based on automatic query formulation and state-of-the-art search word extraction. Additionally, this tool can be used to track topics across semantically connected web documents

Keywords: DocAnalyser, interactive web search, search word extraction, query formulation, source topic detection, topic tracking

Procedia PDF Downloads 393
2499 Hybrid Approximate Structural-Semantic Frequent Subgraph Mining

Authors: Montaceur Zaghdoud, Mohamed Moussaoui, Jalel Akaichi

Abstract:

Frequent subgraph mining refers usually to graph matching and it is widely used in when analyzing big data with large graphs. A lot of research works dealt with structural exact or inexact graph matching but a little attention is paid to semantic matching when graph vertices and/or edges are attributed and typed. Therefore, it seems very interesting to integrate background knowledge into the analysis and that extracted frequent subgraphs should become more pruned by applying a new semantic filter instead of using only structural similarity in graph matching process. Consequently, this paper focuses on developing a new hybrid approximate structuralsemantic graph matching to discover a set of frequent subgraphs. It uses simultaneously an approximate structural similarity function based on graph edit distance function and a possibilistic vertices similarity function based on affinity function. Both structural and semantic filters contribute together to prune extracted frequent set. Indeed, new hybrid structural-semantic frequent subgraph mining approach searches will be suitable to be applied to several application such as community detection in social networks.

Keywords: approximate graph matching, hybrid frequent subgraph mining, graph mining, possibility theory

Procedia PDF Downloads 402
2498 Interactive Image Search for Mobile Devices

Authors: Komal V. Aher, Sanjay B. Waykar

Abstract:

Nowadays every individual having mobile device with them. In both computer vision and information retrieval Image search is currently hot topic with many applications. The proposed intelligent image search system is fully utilizing multimodal and multi-touch functionalities of smart phones which allows search with Image, Voice, and Text on mobile phones. The system will be more useful for users who already have pictures in their minds but have no proper descriptions or names to address them. The paper gives system with ability to form composite visual query to express user’s intention more clearly which helps to give more precise or appropriate results to user. The proposed algorithm will considerably get better in different aspects. System also uses Context based Image retrieval scheme to give significant outcomes. So system is able to achieve gain in terms of search performance, accuracy and user satisfaction.

Keywords: color space, histogram, mobile device, mobile visual search, multimodal search

Procedia PDF Downloads 367
2497 Development of the Academic Model to Predict Student Success at VUT-FSASEC Using Decision Trees

Authors: Langa Hendrick Musawenkosi, Twala Bhekisipho

Abstract:

The success or failure of students is a concern for every academic institution, college, university, governments and students themselves. Several approaches have been researched to address this concern. In this paper, a view is held that when a student enters a university or college or an academic institution, he or she enters an academic environment. The academic environment is unique concept used to develop the solution for making predictions effectively. This paper presents a model to determine the propensity of a student to succeed or fail in the French South African Schneider Electric Education Center (FSASEC) at the Vaal University of Technology (VUT). The Decision Tree algorithm is used to implement the model at FSASEC.

Keywords: FSASEC, academic environment model, decision trees, k-nearest neighbor, machine learning, popularity index, support vector machine

Procedia PDF Downloads 200
2496 Electroencephalogram Based Alzheimer Disease Classification using Machine and Deep Learning Methods

Authors: Carlos Roncero-Parra, Alfonso Parreño-Torres, Jorge Mateo Sotos, Alejandro L. Borja

Abstract:

In this research, different methods based on machine/deep learning algorithms are presented for the classification and diagnosis of patients with mental disorders such as alzheimer. For this purpose, the signals obtained from 32 unipolar electrodes identified by non-invasive EEG were examined, and their basic properties were obtained. More specifically, different well-known machine learning based classifiers have been used, i.e., support vector machine (SVM), Bayesian linear discriminant analysis (BLDA), decision tree (DT), Gaussian Naïve Bayes (GNB), K-nearest neighbor (KNN) and Convolutional Neural Network (CNN). A total of 668 patients from five different hospitals have been studied in the period from 2011 to 2021. The best accuracy is obtained was around 93 % in both ADM and ADA classifications. It can be concluded that such a classification will enable the training of algorithms that can be used to identify and classify different mental disorders with high accuracy.

Keywords: alzheimer, machine learning, deep learning, EEG

Procedia PDF Downloads 126
2495 An Early Detection Type 2 Diabetes Using K - Nearest Neighbor Algorithm

Authors: Ng Liang Shen, Ngahzaifa Abdul Ghani

Abstract:

This research aimed at developing an early warning system for pre-diabetic and diabetics by analyzing simple and easily determinable signs and symptoms of diabetes among the people living in Malaysia using Particle Swarm Optimized Artificial. With the skyrocketing prevalence of Type 2 diabetes in Malaysia, the system can be used to encourage affected people to seek further medical attention to prevent the onset of diabetes or start managing it early enough to avoid the associated complications. The study sought to find out the best predictive variables of Type 2 Diabetes Mellitus, developed a system to diagnose diabetes from the variables using Artificial Neural Networks and tested the system on accuracy to find out the patent generated from diabetes diagnosis result in machine learning algorithms even at primary or advanced stages.

Keywords: diabetes diagnosis, Artificial Neural Networks, artificial intelligence, soft computing, medical diagnosis

Procedia PDF Downloads 336
2494 Estimate Robert Gordon University's Scope Three Emissions by Nearest Neighbor Analysis

Authors: Nayak Amar, Turner Naomi, Gobina Edward

Abstract:

The Scottish Higher Education Institutes must report their scope 1 & 2 emissions, whereas reporting scope 3 is optional. Scope 3 is indirect emissions which embodies a significant component of total carbon footprint and therefore it is important to record, measure and report it accurately. Robert Gordon University (RGU) reported only business travel, grid transmission and distribution, water supply and transport, and recycling scope 3 emissions. This study estimates the RGUs total scope 3 emissions by comparing it with a similar HEI in scale. The scope 3 emission reporting of sixteen Scottish HEI was studied. Glasgow Caledonian University was identified as the nearest neighbour by comparing its students' full time equivalent, staff full time equivalent, research-teaching split, budget, and foundation year. Apart from the peer, data was also collected from the Higher Education Statistics Agency database. RGU reported emissions from business travel, grid transmission and distribution, water supply, and transport and recycling. This study estimated RGUs scope 3 emissions from procurement, student-staff commute, and international student trip. The result showed that RGU covered only 11% of the scope 3 emissions. The major contributor to scope 3 emissions were procurement (48%), student commute (21%), international student trip (16%), and staff commute (4%). The estimated scope 3 emission was more than 14 times the reported emissions. This study has shown the relative importance of each scope 3 emissions source, which gives a guideline for the HEIs, on where to focus their attention to capture maximum scope 3 emissions. Moreover, it has demonstrated that it is possible to estimate the scope 3 emissions with limited data.

Keywords: HEI, university, emission calculations, scope 3 emissions, emissions reporting

Procedia PDF Downloads 100
2493 Investigating the Energy Gap and Wavelength of (AlₓGa₁₋ₓAs)ₘ/(GaAs)ₙ Superlattices in Terms of Material Thickness and Al Mole Fraction Using Empirical Tight-Binding Method

Authors: Matineh Sadat Hosseini Gheidari, Vahid Reza Yazdanpanah

Abstract:

In this paper, we used the empirical tight-binding method (ETBM) with sp3s* approximation and considering the first nearest neighbor with spin-orbit interactions in order to model superlattice structure (SLS) of (AlₓGa₁₋ₓAs)ₘ/(GaAs)ₙ grown on GaAs (100) substrate at 300K. In the next step, we investigated the behavior of the energy gap and wavelength of this superlattice in terms of different thicknesses of core materials and Al mole fractions. As a result of this survey, we found out that as the Al composition increases, the energy gap of this superlattice has an upward trend and ranges from 1.42-1.63 eV. Also, according to the wavelength range that we gained from this superlattice in different Al mole fractions and various thicknesses, we can find a suitable semiconductor for a special light-emitting diode (LED) application.

Keywords: energy gap, empirical tight-binding method, light-emitting diode, superlattice, wavelength

Procedia PDF Downloads 205
2492 Information Extraction Based on Search Engine Results

Authors: Mohammed R. Elkobaisi, Abdelsalam Maatuk

Abstract:

The search engines are the large scale information retrieval tools from the Web that are currently freely available to all. This paper explains how to convert the raw resulted number of search engines into useful information. This represents a new method for data gathering comparing with traditional methods. When a query is submitted for a multiple numbers of keywords, this take a long time and effort, hence we develop a user interface program to automatic search by taking multi-keywords at the same time and leave this program to collect wanted data automatically. The collected raw data is processed using mathematical and statistical theories to eliminate unwanted data and converting it to usable data.

Keywords: search engines, information extraction, agent system

Procedia PDF Downloads 430
2491 A Comparative Study between Different Techniques of Off-Page and On-Page Search Engine Optimization

Authors: Ahmed Ishtiaq, Maeeda Khalid, Umair Sajjad

Abstract:

In the fast-moving world, information is the key to success. If information is easily available, then it makes work easy. The Internet is the biggest collection and source of information nowadays, and with every single day, the data on internet increases, and it becomes difficult to find required data. Everyone wants to make his/her website at the top of search results. This can be possible when you have applied some techniques of SEO inside your application or outside your application, which are two types of SEO, onsite and offsite SEO. SEO is an abbreviation of Search Engine Optimization, and it is a set of techniques, methods to increase users of a website on World Wide Web or to rank up your website in search engine indexing. In this paper, we have compared different techniques of Onpage and Offpage SEO, and we have suggested many things that should be changed inside webpage, outside web page and mentioned some most powerful and search engine considerable elements and techniques in both types of SEO in order to gain high ranking on Search Engine.

Keywords: auto-suggestion, search engine optimization, SEO, query, web mining, web crawler

Procedia PDF Downloads 149
2490 Metaheuristic to Align Multiple Sequences

Authors: Lamiche Chaabane

Abstract:

In this study, a new method for solving sequence alignment problem is proposed, which is named ITS (Improved Tabu Search). This algorithm is based on the classical Tabu Search (TS). ITS is implemented in order to obtain results of multiple sequence alignment. Several ideas concerning neighbourhood generation, move selection mechanisms and intensification/diversification strategies for our proposed ITS is investigated. ITS have generated high-quality results in terms of measure of scores in comparison with the classical TS and simple iterative search algorithm.

Keywords: multiple sequence alignment, tabu search, improved tabu search, neighbourhood generation, selection mechanisms

Procedia PDF Downloads 305
2489 Prediction of Dubai Financial Market Stocks Movement Using K-Nearest Neighbor and Support Vector Regression

Authors: Abdulla D. Alblooshi

Abstract:

The stock market is a representation of human behavior and psychology, such as fear, greed, and discipline. Those are manifested in the form of price movements during the trading sessions. Therefore, predicting the stock movement and prices is a challenging effort. However, those trading sessions produce a large amount of data that can be utilized to train an AI agent for the purpose of predicting the stock movement. Predicting the stock market price action will be advantageous. In this paper, the stock movement data of three DFM listed stocks are studied using historical price movements and technical indicators value and used to train an agent using KNN and SVM methods to predict the future price movement. MATLAB Toolbox and a simple script is written to process and classify the information and output the prediction. It will also compare the different learning methods and parameters s using metrics like RMSE, MAE, and R².

Keywords: KNN, ANN, style, SVM, stocks, technical indicators, RSI, MACD, moving averages, RMSE, MAE

Procedia PDF Downloads 169
2488 Solving Process Planning and Scheduling with Number of Operation Plus Processing Time Due-Date Assignment Concurrently Using a Genetic Search

Authors: Halil Ibrahim Demir, Alper Goksu, Onur Canpolat, Caner Erden, Melek Nur

Abstract:

Traditionally process planning, scheduling and due date assignment are performed sequentially and separately. High interrelation between these functions makes integration very useful. Although there are numerous works on integrated process planning and scheduling and many works on scheduling with due date assignment, there are only a few works on the integration of these three functions. Here we tested the different integration levels of these three functions and found a fully integrated version as the best. We applied genetic search and random search and genetic search was found better compared to the random search. We penalized all earliness, tardiness and due date related costs. Since all these three terms are all undesired, it is better to penalize all of them.

Keywords: process planning, scheduling, due-date assignment, genetic algorithm, random search

Procedia PDF Downloads 375
2487 Human Gait Recognition Using Moment with Fuzzy

Authors: Jyoti Bharti, Navneet Manjhi, M. K.Gupta, Bimi Jain

Abstract:

A reliable gait features are required to extract the gait sequences from an images. In this paper suggested a simple method for gait identification which is based on moments. Moment values are extracted on different number of frames of gray scale and silhouette images of CASIA database. These moment values are considered as feature values. Fuzzy logic and nearest neighbour classifier are used for classification. Both achieved higher recognition.

Keywords: gait, fuzzy logic, nearest neighbour, recognition rate, moments

Procedia PDF Downloads 757
2486 Improving Cryptographically Generated Address Algorithm in IPv6 Secure Neighbor Discovery Protocol through Trust Management

Authors: M. Moslehpour, S. Khorsandi

Abstract:

As transition to widespread use of IPv6 addresses has gained momentum, it has been shown to be vulnerable to certain security attacks such as those targeting Neighbor Discovery Protocol (NDP) which provides the address resolution functionality in IPv6. To protect this protocol, Secure Neighbor Discovery (SEND) is introduced. This protocol uses Cryptographically Generated Address (CGA) and asymmetric cryptography as a defense against threats on integrity and identity of NDP. Although SEND protects NDP against attacks, it is computationally intensive due to Hash2 condition in CGA. To improve the CGA computation speed, we parallelized CGA generation process and used the available resources in a trusted network. Furthermore, we focused on the influence of the existence of malicious nodes on the overall load of un-malicious ones in the network. According to the evaluation results, malicious nodes have adverse impacts on the average CGA generation time and on the average number of tries. We utilized a Trust Management that is capable of detecting and isolating the malicious node to remove possible incentives for malicious behavior. We have demonstrated the effectiveness of the Trust Management System in detecting the malicious nodes and hence improving the overall system performance.

Keywords: CGA, ICMPv6, IPv6, malicious node, modifier, NDP, overall load, SEND, trust management

Procedia PDF Downloads 184
2485 LiDAR Based Real Time Multiple Vehicle Detection and Tracking

Authors: Zhongzhen Luo, Saeid Habibi, Martin v. Mohrenschildt

Abstract:

Self-driving vehicle require a high level of situational awareness in order to maneuver safely when driving in real world condition. This paper presents a LiDAR based real time perception system that is able to process sensor raw data for multiple target detection and tracking in dynamic environment. The proposed algorithm is nonparametric and deterministic that is no assumptions and priori knowledge are needed from the input data and no initializations are required. Additionally, the proposed method is working on the three-dimensional data directly generated by LiDAR while not scarifying the rich information contained in the domain of 3D. Moreover, a fast and efficient for real time clustering algorithm is applied based on a radially bounded nearest neighbor (RBNN). Hungarian algorithm procedure and adaptive Kalman filtering are used for data association and tracking algorithm. The proposed algorithm is able to run in real time with average run time of 70ms per frame.

Keywords: lidar, segmentation, clustering, tracking

Procedia PDF Downloads 423
2484 An Approximation of Daily Rainfall by Using a Pixel Value Data Approach

Authors: Sarisa Pinkham, Kanyarat Bussaban

Abstract:

The research aims to approximate the amount of daily rainfall by using a pixel value data approach. The daily rainfall maps from the Thailand Meteorological Department in period of time from January to December 2013 were the data used in this study. The results showed that this approach can approximate the amount of daily rainfall with RMSE=3.343.

Keywords: daily rainfall, image processing, approximation, pixel value data

Procedia PDF Downloads 387
2483 Design and Implementation of an Effective Machine Learning Approach to Crime Prediction and Prevention

Authors: Ashish Kumar, Kaptan Singh, Amit Saxena

Abstract:

Today, it is believed that crimes have the greatest impact on a person's ability to progress financially and personally. Identifying places where individuals shouldn't go is crucial for preventing crimes and is one of the key considerations. As society and technologies have advanced significantly, so have crimes and the harm they wreak. When there is a concentration of people in one place and changes happen quickly, it is even harder to prevent. Because of this, many crime prevention strategies have been embraced as a component of the development of smart cities in numerous cities. However, crimes can occur anywhere; all that is required is to identify the pattern of their occurrences, which will help to lower the crime rate. In this paper, an analysis related to crime has been done; information related to crimes is collected from all over India that can be accessed from anywhere. The purpose of this paper is to investigate the relationship between several factors and India's crime rate. The review has covered information related to every state of India and their associated regions of the period going in between 2001- 2014. However various classes of violations have a marginally unique scope over the years.

Keywords: K-nearest neighbor, random forest, decision tree, pre-processing

Procedia PDF Downloads 90
2482 Cotton Crops Vegetative Indices Based Assessment Using Multispectral Images

Authors: Muhammad Shahzad Shifa, Amna Shifa, Muhammad Omar, Aamir Shahzad, Rahmat Ali Khan

Abstract:

Many applications of remote sensing to vegetation and crop response depend on spectral properties of individual leaves and plants. Vegetation indices are usually determined to estimate crop biophysical parameters like crop canopies and crop leaf area indices with the help of remote sensing. Cotton crops assessment is performed with the help of vegetative indices. Remotely sensed images from an optical multispectral radiometer MSR5 are used in this study. The interpretation is based on the fact that different materials reflect and absorb light differently at different wavelengths. Non-normalized and normalized forms of these datasets are analyzed using two complementary data mining algorithms; K-means and K-nearest neighbor (KNN). Our analysis shows that the use of normalized reflectance data and vegetative indices are suitable for an automated assessment and decision making.

Keywords: cotton, condition assessment, KNN algorithm, clustering, MSR5, vegetation indices

Procedia PDF Downloads 333
2481 A Study of Permission-Based Malware Detection Using Machine Learning

Authors: Ratun Rahman, Rafid Islam, Akin Ahmed, Kamrul Hasan, Hasan Mahmud

Abstract:

Malware is becoming more prevalent, and several threat categories have risen dramatically in recent years. This paper provides a bird's-eye view of the world of malware analysis. The efficiency of five different machine learning methods (Naive Bayes, K-Nearest Neighbor, Decision Tree, Random Forest, and TensorFlow Decision Forest) combined with features picked from the retrieval of Android permissions to categorize applications as harmful or benign is investigated in this study. The test set consists of 1,168 samples (among these android applications, 602 are malware and 566 are benign applications), each consisting of 948 features (permissions). Using the permission-based dataset, the machine learning algorithms then produce accuracy rates above 80%, except the Naive Bayes Algorithm with 65% accuracy. Of the considered algorithms TensorFlow Decision Forest performed the best with an accuracy of 90%.

Keywords: android malware detection, machine learning, malware, malware analysis

Procedia PDF Downloads 167
2480 Microstructure Evolution and Pre-transformation Microstructure Reconstruction in Ti-6Al-4V Alloy

Authors: Shreyash Hadke, Manendra Singh Parihar, Rajesh Khatirkar

Abstract:

In the present investigation, the variation in the microstructure with the changes in the heat treatment conditions i.e. temperature and time was observed. Ti-6Al-4V alloy was subject to solution annealing treatments in β (1066C) and α+β phase (930C and 850C) followed by quenching, air cooling and furnace cooling to room temperature respectively. The effect of solution annealing and cooling on the microstructure was studied by using optical microscopy (OM), scanning electron microscopy (SEM), electron backscattered diffraction (EBSD) and x-ray diffraction (XRD). The chemical composition of the β phase for different conditions was determined with the help of energy dispersive spectrometer (EDS) attached to SEM. Furnace cooling resulted in the development of coarser structure (α+β), while air cooling resulted in much finer structure with widmanstatten morphology of α at the grain boundaries. Quenching from solution annealing temperature formed α’ martensite, their proportion being dependent on the temperature in β phase field. It is well known that the transformation of β to α follows Burger orientation relationship (OR). In order to reconstruct the microstructure of parent β phase, a MATLAB code was written using neighbor-to-neighbor, triplet method and Tari’s method. The code was tested on the annealed samples (1066C solution annealing temperature followed by furnace cooling to room temperature). The parent phase data thus generated was then plotted using the TSL-OIM software. The reconstruction results of the above methods were compared and analyzed. The Tari’s approach (clustering approach) gave better results compared to neighbor-to-neighbor and triplet method but the time taken by the triplet method was least compared to the other two methods.

Keywords: Ti-6Al-4V alloy, microstructure, electron backscattered diffraction, parent phase reconstruction

Procedia PDF Downloads 446
2479 Emotional Analysis for Text Search Queries on Internet

Authors: Gemma García López

Abstract:

The goal of this study is to analyze if search queries carried out in search engines such as Google, can offer emotional information about the user that performs them. Knowing the emotional state in which the Internet user is located can be a key to achieve the maximum personalization of content and the detection of worrying behaviors. For this, two studies were carried out using tools with advanced natural language processing techniques. The first study determines if a query can be classified as positive, negative or neutral, while the second study extracts emotional content from words and applies the categorical and dimensional models for the representation of emotions. In addition, we use search queries in Spanish and English to establish similarities and differences between two languages. The results revealed that text search queries performed by users on the Internet can be classified emotionally. This allows us to better understand the emotional state of the user at the time of the search, which could involve adapting the technology and personalizing the responses to different emotional states.

Keywords: emotion classification, text search queries, emotional analysis, sentiment analysis in text, natural language processing

Procedia PDF Downloads 141
2478 An Analysis of Sequential Pattern Mining on Databases Using Approximate Sequential Patterns

Authors: J. Suneetha, Vijayalaxmi

Abstract:

Sequential Pattern Mining involves applying data mining methods to large data repositories to extract usage patterns. Sequential pattern mining methodologies used to analyze the data and identify patterns. The patterns have been used to implement efficient systems can recommend on previously observed patterns, in making predictions, improve usability of systems, detecting events, and in general help in making strategic product decisions. In this paper, identified performance of approximate sequential pattern mining defines as identifying patterns approximately shared with many sequences. Approximate sequential patterns can effectively summarize and represent the databases by identifying the underlying trends in the data. Conducting an extensive and systematic performance over synthetic and real data. The results demonstrate that ApproxMAP effective and scalable in mining large sequences databases with long patterns.

Keywords: multiple data, performance analysis, sequential pattern, sequence database scalability

Procedia PDF Downloads 340