Search results for: k nearest neighbor
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 356

266 Trends and Inequalities in Distance to and Use of Nearest Natural Space in the Context of the 20-Minute Neighbourhood: A 4-Wave National Repeat Cross-sectional Study, 2013 to 2019

Authors: Jonathan R. Olsen, Natalie Nicholls, Jenna Panter, Hannah Burnett, Michael Tornow, Richard Mitchell

Abstract:

The 20-minute neighbourhood is a policy priority for governments worldwide, and a key feature of this policy is providing access to natural space within 800 metres of home. The study aims were to (1) examine the association between distance to nearest natural space and frequent use over time and (2) examine whether frequent use and changes in use were patterned by income and housing tenure over time. Biennial Scottish Household Survey data were obtained for 2013 to 2019 (n = 42,128, aged 16+). Adults were asked the walking distance to their nearest natural space, the frequency of visits to this space and their housing tenure, as well as age, sex and income. We examined the association between distance from home of nearest natural space, housing tenure, and the likelihood of frequent natural space use (visited once a week or more). Two-way interaction terms were further applied to explore variation in the association between tenure and frequent natural space use over time. We found that 87% of respondents lived within a 10-minute walk of a natural space, meeting the policy specification for a 20-minute neighbourhood. Greater proximity to natural space was associated with increased use; individuals living a 6- to 10-minute walk and over a 10-minute walk away were respectively 53% and 78% less likely to report frequent natural space use than those living within a 5-minute walk. Housing tenure was an important predictor of frequent natural space use; private renters and homeowners were more likely to report frequent natural space use than social renters. Our findings provide evidence that proximity to natural space is a strong predictor of frequent use. Our study also provides important evidence that time-based access measures alone do not account for deep-rooted socioeconomic variation in the use of natural space. Policy makers should ensure a nuanced lens is applied to operationalising and monitoring the 20-minute neighbourhood to safeguard against exacerbating existing inequalities.

Keywords: natural space, housing, inequalities, 20-minute neighbourhood, urban design

Procedia PDF Downloads 83
265 Teaching Tools for Web Processing Services

Authors: Rashid Javed, Hardy Lehmkuehler, Franz Josef-Behr

Abstract:

Web Processing Services (WPS) are of growing interest in geoinformation research. However, teaching about them is difficult because of the generally complex circumstances of their use, which limit the possibilities for hands-on exercises on Web Processing Services. To support understanding, a Training Tools Collection was initiated at the University of Applied Sciences Stuttgart (HFT). It is limited to the scope of geostatistical interpolation of sample point data, where different algorithms such as IDW and Nearest Neighbor can be used. The Tools Collection aims to support understanding of the scope, definition and deployment of Web Processing Services. For example, it is necessary to characterize the input of the interpolation by the data set, the parameters for the algorithm and the interpolation results (here a grid of interpolated values is assumed). This paper reports on first experiences using a pilot installation, which was intended to find suitable software interfaces for later full implementations and to draw conclusions on potential user interface characteristics. Experience was gained with the Deegree software, one of several service suites (collections). Being strictly programmed in Java, Deegree offers several OGC-compliant service implementations that also promise to be of benefit for the project. The mentioned parameters for a WPS were formalized following the paradigm that any meaningful component will be defined in terms of suitable standards. For example, the data output can be defined as a GML file. However, the choice of meaningful information pieces and user interactions is not free but partially determined by the selected WPS processing suite.

Keywords: deegree, interpolation, IDW, web processing service (WPS)

Procedia PDF Downloads 330
264 Intelligent Recognition of Diabetes Disease via FCM Based Attribute Weighting

Authors: Kemal Polat

Abstract:

In this paper, an attribute weighting method called fuzzy C-means clustering based attribute weighting (FCMAW) has been used for classification of a diabetes disease dataset. The aims of this study are to reduce the variance within the attributes of the diabetes dataset and to improve the classification accuracy of the classifier algorithms by transforming non-linearly separable datasets into linearly separable datasets. The Pima Indians Diabetes dataset has two classes: normal subjects (500 instances) and diabetes subjects (268 instances). Fuzzy C-means clustering is an improved version of the K-means clustering method and is one of the most used clustering methods in data mining and machine learning applications. In this study, as the first stage, the fuzzy C-means clustering process has been used to find the centers of the attributes in the Pima Indians diabetes dataset, and the dataset has then been weighted according to the ratios of the means of the attributes to their centers. Secondly, after the weighting process, classifier algorithms including support vector machine (SVM) and k-nearest neighbor (k-NN) classifiers have been used for classifying the weighted Pima Indians diabetes dataset. Experimental results show that the proposed attribute weighting method (FCMAW) has obtained very promising results in the classification of the Pima Indians diabetes dataset.

Keywords: fuzzy C-means clustering, fuzzy C-means clustering based attribute weighting, Pima Indians diabetes, SVM

Procedia PDF Downloads 382
263 Development of an EEG-Based Real-Time Emotion Recognition System on Edge AI

Authors: James Rigor Camacho, Wansu Lim

Abstract:

Over the last few years, the development of new wearable and processing technologies has accelerated in order to harness physiological data such as electroencephalograms (EEGs) for EEG-based applications. EEG has been demonstrated to be a source of emotion recognition signals with the highest classification accuracy among physiological signals. However, when emotion recognition systems are used for real-time classification, the training unit is frequently left to run offline or in the cloud rather than working locally on the edge. That strategy has hampered research, and the full potential of using an edge AI device has yet to be realized. Edge AI devices are high-performance computers that can process complex algorithms. They are capable of collecting, processing, and storing data on their own. They can also analyze and apply complicated algorithms such as localization, detection, and recognition in real-time applications, making them powerful embedded devices. The NVIDIA Jetson series, specifically the Jetson Nano device, was used in the implementation. The cEEGrid, which is integrated with the open-source brain-computer interface platform (OpenBCI), is used to collect EEG signals. An EEG-based real-time emotion recognition system on Edge AI is proposed in this paper. To perform graphical spectrogram categorization of EEG signals and to predict emotional states based on input data properties, machine learning-based classifiers were used. Until the emotional state was identified, the EEG signals were analyzed using the K-Nearest Neighbor (KNN) technique, which is a supervised learning method. In EEG signal processing, after each EEG signal has been received in real time and translated from the time domain to the frequency domain, the Fast Fourier Transform (FFT) technique is used to observe the frequency bands in each EEG signal. To appropriately show the variance of each EEG frequency band, power density, standard deviation, and mean are calculated and employed.
The next stage is to identify the features that have been chosen to predict emotion in EEG data using the K-Nearest Neighbors (KNN) technique. Arousal and valence datasets are used to train the parameters defined by the KNN technique. Because classification and recognition of specific classes, as well as emotion prediction, are conducted both online and locally on the edge, the KNN technique increased the performance of the emotion recognition system on the NVIDIA Jetson Nano. Finally, this implementation aims to bridge the research gap on cost-effective and efficient real-time emotion recognition using a resource-constrained hardware device such as the NVIDIA Jetson Nano. On the cutting edge of AI, EEG-based emotion identification can be employed in applications that can rapidly expand the research and implementation industry's use.

Keywords: edge AI device, EEG, emotion recognition system, supervised learning algorithm, sensors

Procedia PDF Downloads 77
262 Comparative Evaluation of Accuracy of Selected Machine Learning Classification Techniques for Diagnosis of Cancer: A Data Mining Approach

Authors: Rajvir Kaur, Jeewani Anupama Ginige

Abstract:

With recent trends in Big Data and advancements in Information and Communication Technologies, the healthcare industry is at the stage of its transition from clinician oriented to technology oriented. Many people around the world die of cancer because the diagnosis of disease was not done at an early stage. Nowadays, the computational methods in the form of Machine Learning (ML) are used to develop automated decision support systems that can diagnose cancer with high confidence in a timely manner. This paper aims to carry out the comparative evaluation of a selected set of ML classifiers on two existing datasets: breast cancer and cervical cancer. The ML classifiers compared in this study are Decision Tree (DT), Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression, Ensemble (Bagged Tree) and Artificial Neural Networks (ANN). The evaluation is carried out based on standard evaluation metrics Precision (P), Recall (R), F1-score and Accuracy. The experimental results based on the evaluation metrics show that ANN showed the highest-level accuracy (99.4%) when tested with breast cancer dataset. On the other hand, when these ML classifiers are tested with the cervical cancer dataset, Ensemble (Bagged Tree) technique gave better accuracy (93.1%) in comparison to other classifiers.
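The four evaluation metrics named in the abstract above can be illustrated with a minimal sketch. This is not the authors' evaluation code; the function name `classification_metrics`, the binary label encoding, and the toy labels are assumptions made only for illustration.

```python
def classification_metrics(y_true, y_pred, positive=1):
    """Return precision, recall, F1-score and accuracy for a binary task.

    `positive` is the label treated as the positive class (an assumption;
    the abstract does not say how classes were encoded).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}


# Hypothetical labels, not data from the study.
metrics = classification_metrics([1, 1, 1, 0, 0, 0, 0, 1],
                                 [1, 1, 0, 0, 0, 1, 0, 1])
print(metrics)
```

Accuracy alone can look favourable on imbalanced medical datasets, which is one reason to report precision and recall alongside it.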

Keywords: artificial neural networks, breast cancer, classifiers, cervical cancer, f-score, machine learning, precision, recall

Procedia PDF Downloads 248
261 Spatial Data Mining by Decision Trees

Authors: Sihem Oujdi, Hafida Belbachir

Abstract:

Existing data mining methods cannot be applied directly to spatial data because such data require consideration of spatial specificities, such as spatial relationships. This paper focuses on classification with decision trees, which are one of the data mining techniques. We propose an extension of the C4.5 algorithm for spatial data, based on two different approaches: join materialization and querying the different tables on the fly. Similar work has been done on these two main approaches: the first, join materialization, favors processing time at the expense of memory space, whereas the second, querying the different tables on the fly, favors memory space at the expense of processing time. The modified C4.5 algorithm requires three input tables: a target table, a neighbor table, and a spatial index join that contains the possible spatial relationships among the objects in the target table and those in the neighbor table. The proposed algorithms are applied to a spatial data pattern in the accidentology domain. A comparative study of our approach with other works on classification by spatial decision trees will be detailed.

Keywords: C4.5 algorithm, decision trees, S-CART, spatial data mining

Procedia PDF Downloads 589
260 Comparative Study Using WEKA for Red Blood Cells Classification

Authors: Jameela Ali, Hamid A. Jalab, Loay E. George, Abdul Rahim Ahmad, Azizah Suliman, Karim Al-Jashamy

Abstract:

Red blood cells (RBCs) are the most common type of blood cell and are the most intensively studied in cell biology. A lack of RBCs is a condition in which the hemoglobin level is lower than normal and is referred to as “anemia”. Abnormalities in RBCs affect the exchange of oxygen. This paper presents a comparative study of various techniques for classifying RBCs as normal or abnormal (anemic) using WEKA. WEKA is an open-source tool consisting of different machine learning algorithms for data mining applications. The algorithms tested are the Radial Basis Function neural network, Support Vector Machine, and K-Nearest Neighbors algorithm. Two sets of combined features were utilized for the classification of blood cell images. The first set, consisting exclusively of geometrical features, was used to identify whether the tested blood cell has a spherical or non-spherical shape, while the second set, consisting mainly of textural features, was used to recognize the types of the spherical cells. We provide an evaluation based on applying these classification methods to our RBC image dataset, which was obtained from Serdang Hospital, Malaysia, and measuring the accuracy of the test results. The best achieved classification rates are 97%, 98%, and 79% for Support Vector Machine, Radial Basis Function neural network, and K-Nearest Neighbors algorithm, respectively.

Keywords: K-nearest neighbors algorithm, radial basis function neural network, red blood cells, support vector machine

Procedia PDF Downloads 373
259 Analysis of Genetic Variations in Camel Breeds (Camelus dromedarius)

Authors: Yasser M. Saad, Amr A. El Hanafy, Saleh A. Alkarim, Hussein A. Almehdar, Elrashdy M. Redwan

Abstract:

Camels are substantial providers of transport, milk, sport, meat, shelter, security and capital in many countries, particularly in Saudi Arabia. The inter simple sequence repeat (ISSR) technique was used to detect the genetic variations among some camel breeds (Majaheim, Safra, Wadah, and Hamara). The actual number of alleles, effective number of alleles, gene diversity, Shannon’s information index and polymorphic bands were calculated for each evaluated camel breed. The neighbor-joining tree reconstructed for the evaluated camel breeds showed that the Hamara breed is distantly related to the other evaluated camels. In addition, the polymorphic sites, haplotypes and nucleotide diversity were identified for some Camelidae cox1 gene sequences (obtained from NCBI). The distance value between C. bactrianus and C. dromedarius (0.072) was relatively low. Analysis of genetic diversity is important for conserving Camelus dromedarius genetic resources.

Keywords: camel, genetics, ISSR, neighbor-joining

Procedia PDF Downloads 440
258 Comparing Machine Learning Estimation of Fuel Consumption of Heavy-Duty Vehicles

Authors: Victor Bodell, Lukas Ekstrom, Somayeh Aghanavesi

Abstract:

Fuel consumption (FC) is one of the key factors in determining expenses of operating a heavy-duty vehicle. A customer may therefore request an estimate of the FC of a desired vehicle. The modular design of heavy-duty vehicles allows their construction by specifying the building blocks, such as gear box, engine and chassis type. If the combination of building blocks is unprecedented, it is unfeasible to measure the FC, since this would first require the construction of the vehicle. This paper proposes a machine learning approach to predict FC. This study uses around 40,000 vehicles specific and operational environmental conditions information, such as road slopes and driver profiles. All vehicles have diesel engines and a mileage of more than 20,000 km. The data is used to investigate the accuracy of machine learning algorithms Linear regression (LR), K-nearest neighbor (KNN) and Artificial neural networks (ANN) in predicting fuel consumption for heavy-duty vehicles. Performance of the algorithms is evaluated by reporting the prediction error on both simulated data and operational measurements. The performance of the algorithms is compared using nested cross-validation and statistical hypothesis testing. The statistical evaluation procedure finds that ANNs have the lowest prediction error compared to LR and KNN in estimating fuel consumption on both simulated and operational data. The models have a mean relative prediction error of 0.3% on simulated data, and 4.2% on operational data.

Keywords: artificial neural networks, fuel consumption, friedman test, machine learning, statistical hypothesis testing

Procedia PDF Downloads 145
257 Catalytic Thermodynamics of Nanocluster Adsorbates from Informational Statistical Mechanics

Authors: Forrest Kaatz, Adhemar Bultheel

Abstract:

We use an informational statistical mechanics approach to study the catalytic thermodynamics of platinum and palladium cuboctahedral nanoclusters. Nanoclusters and their adatoms are viewed as chemical graphs with a nearest neighbor adjacency matrix. We use the Morse potential to determine bond energies between cluster atoms in a coordination type calculation. We use adsorbate energies calculated from density functional theory (DFT) to study the adatom effects on the thermodynamic quantities, which are derived from a Hamiltonian. Oxygen radical and molecular adsorbates are studied on platinum clusters and hydrogen on palladium clusters. We calculate the entropy, free energy, and total energy as the coverage of adsorbates increases from bridge and hollow sites on the surface. Thermodynamic behavior versus adatom coverage is related to the structural distribution of adatoms on the nanocluster surfaces. The thermodynamic functions are characterized using a simple adsorption model, with linear trends as the coverage of adatoms increases. The data exhibits size effects for the measured thermodynamic properties with cluster diameters between 2 and 5 nm. Entropy and enthalpy calculations of Pt-O2 compare well with previous theoretical data for Pt(111)-O2, and our Pd-H results show similar trends as experimental measurements for Pd-H2 nanoclusters. Our methods are general and may be applied to wide variety of nanocluster adsorbate systems.

Keywords: catalytic thermodynamics, palladium nanocluster adsorbates, platinum nanocluster adsorbates, statistical mechanics

Procedia PDF Downloads 125
256 A Comparative Study for Various Techniques Using WEKA for Red Blood Cells Classification

Authors: Jameela Ali, Hamid A. Jalab, Loay E. George, Abdul Rahim Ahmad, Azizah Suliman, Karim Al-Jashamy

Abstract:

Red blood cells (RBCs) are the most common type of blood cell and are the most intensively studied in cell biology. A lack of RBCs is a condition in which the hemoglobin level is lower than normal and is referred to as “anemia”. Abnormalities in RBCs affect the exchange of oxygen. This paper presents a comparative study of various techniques for classifying red blood cells as normal or abnormal (anemic) using WEKA. WEKA is an open-source tool consisting of different machine learning algorithms for data mining applications. The algorithms tested are the Radial Basis Function neural network, Support Vector Machine, and K-Nearest Neighbors algorithm. Two sets of combined features were utilized for the classification of blood cell images. The first set, consisting exclusively of geometrical features, was used to identify whether the tested blood cell has a spherical or non-spherical shape, while the second set, consisting mainly of textural features, was used to recognize the types of the spherical cells. We provide an evaluation based on applying these classification methods to our RBC image dataset, which was obtained from Serdang Hospital, Malaysia, and measuring the accuracy of the test results. The best achieved classification rates are 97%, 98%, and 79% for Support Vector Machine, Radial Basis Function neural network, and K-Nearest Neighbors algorithm, respectively.

Keywords: red blood cells, classification, radial basis function neural networks, support vector machine, k-nearest neighbors algorithm

Procedia PDF Downloads 438
255 A Gene Selection Algorithm for Microarray Cancer Classification Using an Improved Particle Swarm Optimization

Authors: Arfan Ali Nagra, Tariq Shahzad, Meshal Alharbi, Khalid Masood Khan, Muhammad Mugees Asif, Taher M. Ghazal, Khmaies Ouahada

Abstract:

Gene selection is an essential step for the classification of microarray cancer data. Gene expression cancer data (DNA microarray) facilitates computing the robust and concurrent expression of various genes. Particle swarm optimization (PSO) requires simple operators and fewer parameters for tuning the model in gene selection. The selection of a prognostic gene with small redundancy is a great challenge for the researcher, as there are a few complications in PSO-based selection methods. In this research, a new variant of PSO (self-inertia weight adaptive PSO) is proposed. In the proposed algorithm, SIW-APSO-ELM is explored to achieve gene selection prediction accuracies. This new algorithm balances the exploration capabilities of the improved inertia weight adaptive particle swarm optimization and the exploitation. The self-inertia weight adaptive particle swarm optimization (SIW-APSO) is used to search for the solution. The SIW-APSO is updated with an evolutionary process in such a way that each particle iteratively improves its velocity and position. The extreme learning machine (ELM) has been designed for the selection procedure. The proposed method has been used to identify a number of genes in the cancer dataset. The classification stage uses ELM, K-centroid nearest neighbor (KCNN), and support vector machine (SVM) to attain higher prediction accuracy than state-of-the-art methods on microarray cancer datasets, which shows the effectiveness of the proposed method.

Keywords: microarray cancer, improved PSO, ELM, SVM, evolutionary algorithms

Procedia PDF Downloads 52
254 Improving the Global Competitiveness of SMEs by Logistics Transportation Management: Case Study Chicken Meat Supply Chain

Authors: P. Vanichkobchinda

Abstract:

Among logistics transportation techniques, Open Vehicle Routing (OVR) is an approach toward transportation cost reduction, especially for long-distance pickup and delivery nodes. The outstanding characteristic of OVR is that the route's starting node and ending node are not necessarily the same, as they are in typical vehicle routing problems. This advantage enables the routing to flow continuously, and the vehicle does not always return to its home base. This research aims to develop a heuristic for the open vehicle routing problem with pickup and delivery under time window and loading capacity constraints to minimize the total distance. The proposed heuristic is based on the insertion method, a simple method suitable for rapid calculation that allows new transportation requirements to be inserted along the original paths. Cost comparisons between the proposed heuristic, the method the companies currently use, and the nearest neighbor method show that the insertion heuristic gave superior solutions in all types of test problems. The research indicates that the new transport calculation and open vehicle routing with the "Insertion Heuristic" yield a better outcome, with cost savings of 34.3 percent on average. In conclusion, the proposed heuristic can effectively and efficiently solve the open vehicle routing problem.
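The nearest neighbor method that the abstract uses as a comparison baseline can be sketched as follows for an open route, meaning the vehicle does not return to its start. This is only an illustration of the baseline idea, not the paper's insertion heuristic: it ignores pickup and delivery pairing, time windows, and capacity, and the function name, coordinates, and Euclidean distance are all assumptions.

```python
import math


def nearest_neighbor_open_route(start, stops):
    """Greedy open route: from `start`, always drive to the closest
    unvisited stop, and end at the last stop (no return to `start`).

    Points are (x, y) tuples; straight-line distance is assumed.
    """
    route = [start]
    remaining = list(stops)
    total_distance = 0.0
    current = start
    while remaining:
        # Pick the unvisited stop closest to the current position.
        nearest = min(remaining, key=lambda p: math.dist(current, p))
        total_distance += math.dist(current, nearest)
        route.append(nearest)
        remaining.remove(nearest)
        current = nearest
    return route, total_distance


# Hypothetical depot and customer locations.
route, distance = nearest_neighbor_open_route((0, 0), [(5, 0), (1, 0), (2, 0)])
print(route, distance)
```

Because the greedy choice never revisits earlier decisions, it can produce long final legs, which is one reason constructive alternatives such as insertion are often compared against it.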

Keywords: business competitiveness, cost reduction, SMEs, logistics transportation, VRP

Procedia PDF Downloads 659
253 Heart Ailment Prediction Using Machine Learning Methods

Authors: Abhigyan Hedau, Priya Shelke, Riddhi Mirajkar, Shreyash Chaple, Mrunali Gadekar, Himanshu Akula

Abstract:

The heart is the coordinating centre of the major endocrine glandular structure of the body, which produces hormones that profoundly affect the operations of the body, and diagnosing cardiovascular disease is a difficult but critical task. By extracting knowledge and information about the disease from patient data, data mining is a more practical technique to help doctors detect disorders. We use a variety of machine learning methods here, including logistic regression, support vector classifiers (SVC), K-nearest neighbours (KNN) classifiers, decision tree classifiers, random forest classifiers and gradient boosting classifiers. These algorithms are applied to patient data containing 13 different factors to build a system that predicts heart disease in less time with more accuracy.

Keywords: logistic regression, support vector classifier, k-nearest neighbour, decision tree, random forest and gradient boosting

Procedia PDF Downloads 19
252 Omni-Modeler: Dynamic Learning for Pedestrian Redetection

Authors: Michael Karnes, Alper Yilmaz

Abstract:

This paper presents the application of the omni-modeler towards pedestrian redetection. The pedestrian redetection task creates several challenges when applying deep neural networks (DNN) due to the variety of pedestrian appearance with camera position, the variety of environmental conditions, and the specificity required to recognize one pedestrian from another. DNNs require significant training sets and are not easily adapted for changes in class appearances or changes in the set of classes held in its knowledge domain. Pedestrian redetection requires an algorithm that can actively manage its knowledge domain as individuals move in and out of the scene, as well as learn individual appearances from a few frames of a video. The Omni-Modeler is a dynamically learning few-shot visual recognition algorithm developed for tasks with limited training data availability. The Omni-Modeler adapts the knowledge domain of pre-trained deep neural networks to novel concepts with a calculated localized language encoder. The Omni-Modeler knowledge domain is generated by creating a dynamic dictionary of concept definitions, which are directly updatable as new information becomes available. Query images are identified through nearest neighbor comparison to the learned object definitions. The study presented in this paper evaluates its performance in re-identifying individuals as they move through a scene in both single-camera and multi-camera tracking applications. The results demonstrate that the Omni-Modeler shows potential for across-camera view pedestrian redetection and is highly effective for single-camera redetection with a 93% accuracy across 30 individuals using 64 example images for each individual.

Keywords: dynamic learning, few-shot learning, pedestrian redetection, visual recognition

Procedia PDF Downloads 43
251 Characterization of Coastal Solid Waste: Basis for the Development of Waste Collector

Authors: Arnold I. Malag

Abstract:

The study aims to establish data on the characteristics of coastal solid waste on the main island of Masbate as a model for technology interventions. The research used Google Maps to measure the coastal length and the Fishbowl Method for area identification. The solid wastes gathered were classified as residual, non-biodegradable, recyclable, and special wastes, based on the waste analysis and characterization manual of the Philippine Environmental Governance Project. The wastes were evaluated by weight (kg), dimension (cm), and characteristic (floating or non-floating). By dimension, the biodegradable, recyclable, residual and special wastes averaged 40.95 cm, 16.25 cm, 31.37 cm, and 0.725 cm, respectively. The waste in the coastal areas is dominated by biodegradable waste, followed by residual, then recyclable and special wastes, at 0.566 kg/m, 0.533 kg/m, 0.114 kg/m and 0.0007 kg/m, respectively. Of the solid wastes collected, 97.15% is characterized as “floating”, the sources being the nearest rivers and waterways and/or the nearest populated areas adjacent to the island. This accumulation of solid wastes can be minimized and controlled by utilizing floating equipment.

Keywords: solid waste, coastal waste, waste characterization, waste collector

Procedia PDF Downloads 54
250 Molecular Survey and Genetic Diversity of Bartonella henselae Strains Infecting Stray Cats from Algeria

Authors: Naouelle Azzag, Nadia Haddad, Benoit Durand, Elisabeth Petit, Ali Ammouche, Bruno Chomel, Henri J. Boulouis

Abstract:

Bartonella henselae is a small, Gram-negative, arthropod-borne bacterium that has been shown to cause multiple clinical manifestations in humans, including cat scratch disease, bacillary angiomatosis, endocarditis, and bacteremia. In this research, we report the results of a cross-sectional study of Bartonella henselae bacteremia in stray cats from Algiers. Whole blood of 227 stray cats from Algiers was tested for the presence of Bartonella species by culture, and the genetic diversity of B. henselae strains was evaluated by multi-locus variable number of tandem repeats assay (MLVA). Bacteremia prevalence was 17%, and only B. henselae was identified. Type I was the predominant type (64%). MLVA typing of 259 strains from 30 bacteremic cats revealed 52 different profiles, of which 51 were specific to Algerian cats (identified for the first time). Twenty of the 30 cats (67%) harbored 2 to 7 MLVA profiles simultaneously. The similarity of MLVA profiles obtained from the same cat, neighbor-joining clustering and structure-neighbor clustering showed that such diversity likely results from two different mechanisms occurring either independently or simultaneously: independent infections and genetic drift from a primary strain.

Keywords: Bartonella, cat, MLVA, genetic

Procedia PDF Downloads 120
249 Heart Rate Variability Analysis for Early Stage Prediction of Sudden Cardiac Death

Authors: Reeta Devi, Hitender Kumar Tyagi, Dinesh Kumar

Abstract:

At present, cardiovascular problems are a growing challenge for researchers and physiologists. As heart disease is not specific to geography, gender or socioeconomic status, detecting cardiac irregularities at an early stage, followed by quick and correct treatment, is very important. The electrocardiogram is the finest tool for continuous monitoring of heart activity. Heart rate variability (HRV) is used to measure naturally occurring oscillations between consecutive cardiac cycles. Analysis of this variability is carried out using time-domain, frequency-domain and non-linear parameters. This paper presents HRV analysis of an online dataset for normal sinus rhythm (taken as healthy subjects) and sudden cardiac death (SCD subjects) using all three methods, computing values for parameters such as the standard deviation of normal-to-normal intervals (SDNN), the square root of the mean of the squared differences between adjacent RR intervals (RMSSD), and the mean of R-to-R intervals (mean RR) in the time domain; very-low-frequency (VLF), low-frequency (LF), high-frequency (HF) and the ratio of low to high frequency (LF/HF ratio) in the frequency domain; and the Poincaré plot for non-linear analysis. To differentiate the HRV of healthy subjects from that of subjects who died of SCD, a k-nearest neighbor (k-NN) classifier has been used because of its high accuracy. Results show highly reduced values for all stated parameters for SCD subjects as compared to healthy ones. As the dataset used for SCD patients is a recording of their ECG signal one hour prior to their death, it is verified with an accuracy of 95% that the proposed algorithm can identify the mortality risk of a patient one hour before death. The identification of a patient’s mortality risk at such an early stage may prevent sudden death if timely and correct treatment is given by the doctor.
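The three time-domain parameters named in the abstract above (mean RR, SDNN, RMSSD) have standard definitions that can be sketched in a few lines. This is an illustrative sketch, not the authors' code: the function name and the sample RR values are hypothetical, and the use of the sample standard deviation (dividing by n - 1) for SDNN is an assumption, since the abstract does not state which form was used.

```python
import math


def time_domain_hrv(rr_ms):
    """Compute mean RR, SDNN and RMSSD from RR intervals in milliseconds."""
    n = len(rr_ms)
    if n < 2:
        raise ValueError("at least two RR intervals are required")

    # Mean RR: average interval between successive R peaks.
    mean_rr = sum(rr_ms) / n

    # SDNN: standard deviation of the intervals (sample form, n - 1; assumed).
    sdnn = math.sqrt(sum((x - mean_rr) ** 2 for x in rr_ms) / (n - 1))

    # RMSSD: root mean square of successive differences between intervals.
    diffs = [b - a for a, b in zip(rr_ms, rr_ms[1:])]
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))

    return {"mean_rr": mean_rr, "sdnn": sdnn, "rmssd": rmssd}


# Hypothetical RR intervals, not data from the study.
print(time_domain_hrv([800, 810, 790, 800]))
```

Values like these could then serve as the feature vector passed to a k-NN classifier, which is the role the abstract describes for them.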

Keywords: early stage prediction, heart rate variability, linear and non-linear analysis, sudden cardiac death

Procedia PDF Downloads 317
248 A Location-Based Search Approach According to Users’ Application Scenario

Authors: Shih-Ting Yang, Chih-Yun Lin, Ming-Yu Li, Jhong-Ting Syue, Wei-Ming Huang

Abstract:

The global positioning system (GPS) has become increasingly precise in recent years, and location-based services (LBS) have developed rapidly. Take the example of finding a parking lot (such as with parking apps). A location-based service can offer immediate information about a nearby parking lot, including information about the remaining parking spaces. However, it cannot provide the expected search results according to the users’ requirements and situations. For that reason, this paper develops a “Location-based Search Approach according to Users’ Application Scenario”, built on location-based search and demand determination, to help users obtain information consistent with their requirements. The approach consists of one mechanism and three kernel modules. First, in the Information Pre-processing Mechanism (IPM), this paper uses the cosine theorem to categorize the locations of users. Then, in the Information Category Evaluation Module (ICEM), kNN (k-Nearest Neighbor) is employed to classify the browsing records of users. After that, in the Information Volume Level Determination Module (IVLDM), this paper compares the number of users’ clicks on the information at different locations with the average number of users’ clicks on the information at a specific location, so as to evaluate the urgency of demand; the two-dimensional space is then used to estimate the application situations of users. In the last step, in the Location-based Search Module (LBSM), this paper compares all search results with the average number of characters of the search results, categorizes the search results with the Manhattan distance, and selects the results according to the application scenario of users. Additionally, this paper develops a Web-based system according to the methodology to demonstrate its practical application. The application scenario-based estimate and the location-based search are used to evaluate the type and abundance of the information expected by the public at a specific location, so that information demanders can obtain information consistent with their application situations at that location.

Keywords: data mining, knowledge management, location-based service, user application scenario

Procedia PDF Downloads 86
247 Maturity Classification of Oil Palm Fresh Fruit Bunches Using Thermal Imaging Technique

Authors: Shahrzad Zolfagharnassab, Abdul Rashid Mohamed Shariff, Reza Ehsani, Hawa Ze Jaffar, Ishak Aris

Abstract:

Ripeness estimation of oil palm fresh fruit is an important process that affects the profitability and salability of oil palm fruits. The maturity or ripeness of the oil palm fruits influences the quality of oil palm. The conventional procedure is physical grading of Fresh Fruit Bunch (FFB) maturity by counting the number of loose fruits per bunch. This physical classification of oil palm FFB is costly and time consuming, and the results may be subject to human error. Hence, many researchers have tried to develop methods for ascertaining the maturity of oil palm fruits and thereby, indirectly, the oil content of individual palm fruits without the need for exhaustive oil extraction and analysis. This research investigates the potential of infrared images (thermal images) as a predictor for classifying oil palm FFB ripeness. A total of 270 oil palm fresh fruit bunches of the most common oil palm cultivar, Nigrescens, were collected in three maturity categories: under ripe, ripe and over ripe. Each sample was scanned by the thermal imaging cameras FLIR E60 and FLIR T440. The average temperature of each bunch was calculated using image processing in the FLIR Tools and FLIR ThermaCAM Researcher Pro 2.10 software environments. The results show that temperature decreased from immature to over mature oil palm FFBs. An overall analysis-of-variance (ANOVA) test showed that this predictor gave a significant difference between the under ripe, ripe and over ripe maturity categories. This shows that temperature can be a good indicator for classifying oil palm FFB. Classification analysis was performed using the temperature of the FFB as the predictor through Linear Discriminant Analysis (LDA), Mahalanobis Discriminant Analysis (MDA), Artificial Neural Network (ANN) and K-Nearest Neighbor (KNN) methods. The highest overall classification accuracy was 88.2%, obtained using the Artificial Neural Network. This research shows that thermal imaging and the neural network method can be used as predictors for oil palm maturity classification.

Keywords: artificial neural network, maturity classification, oil palm FFB, thermal imaging

Procedia PDF Downloads 323
246 Improving Cell Type Identification of Single Cell Data by Iterative Graph-Based Noise Filtering

Authors: Annika Stechemesser, Rachel Pounds, Emma Lucas, Chris Dawson, Julia Lipecki, Pavle Vrljicak, Jan Brosens, Sean Kehoe, Jason Yap, Lawrence Young, Sascha Ott

Abstract:

Advances in technology now make it possible to retrieve the genetic information of thousands of single cancerous cells. One of the key challenges in single cell analysis of cancerous tissue is to determine the number of different cell types and their characteristic genes within the sample to better understand the tumors and their reaction to different treatments. For this analysis to be possible, it is crucial to filter out background noise, as it can severely blur the downstream analysis and give misleading results. In-depth analysis of the state-of-the-art filtering methods for single cell data showed that, in some cases, they do not separate noisy and normal cells sufficiently. We introduce an algorithm that filters and clusters single cell data simultaneously without relying on certain genes or thresholds chosen by eye. It detects communities in a Shared Nearest Neighbor similarity network, which captures the similarities and dissimilarities of the cells, by optimizing the modularity, and then identifies and removes vertices with a weak clustering belonging. This strategy is based on the fact that noisy data instances are very likely to be similar to true cell types but do not match any of these well. Once the clustering is complete, we apply a set of evaluation metrics on the cluster level and accept or reject clusters based on the outcome. The performance of our algorithm was tested on three datasets and led to convincing results. We were able to replicate the results on a Peripheral Blood Mononuclear Cells dataset. Furthermore, we applied the algorithm to two samples of ovarian cancer from the same patient before and after chemotherapy. Comparing the standard approach to our algorithm, we found a hidden cell type in the ovarian post-chemotherapy data with interesting marker genes that are potentially relevant for medical research.
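As an illustration of the Shared Nearest Neighbor idea mentioned in the abstract, the sketch below builds one common form of SNN similarity: the number of k-nearest neighbors that two points have in common. The abstract does not state which SNN variant, distance, or k the authors use, so the Euclidean distance, the function names, and the toy data are all assumptions.

```python
import math

def knn_sets(points, k):
    """For each point, return the set of indices of its k nearest neighbors."""
    sets = []
    for i, p in enumerate(points):
        others = [j for j in range(len(points)) if j != i]
        others.sort(key=lambda j: math.dist(p, points[j]))
        sets.append(set(others[:k]))
    return sets

def snn_similarity(points, k):
    """Matrix whose (i, j) entry counts the neighbors shared by i and j."""
    nbrs = knn_sets(points, k)
    n = len(points)
    return [[len(nbrs[i] & nbrs[j]) if i != j else 0 for j in range(n)]
            for i in range(n)]

# Toy data: two well-separated groups of three points each (hypothetical).
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
sim = snn_similarity(points, k=2)
print(sim[0][1], sim[0][3])
```

Points in the same group share neighbors and receive a positive similarity, while points in different groups share none; a community detection step would then operate on this similarity network.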

Keywords: cancer research, graph theory, machine learning, single cell analysis

Procedia PDF Downloads 78
245 Principal Component Analysis Combined Machine Learning Techniques on Pharmaceutical Samples by Laser Induced Breakdown Spectroscopy

Authors: Kemal Efe Eseller, Göktuğ Yazici

Abstract:

Laser-induced breakdown spectroscopy (LIBS) is a rapid optical atomic emission spectroscopy technique used for material identification and analysis, with the advantages of in-situ analysis, elimination of intensive sample preparation, and micro-destructive properties for the material to be tested. LIBS delivers short laser pulses onto the material in order to create plasma by excitation of the material to a certain threshold. The plasma characteristics, which consist of wavelength value and intensity amplitude, depend on the material and the experiment’s environment. In the present work, medicine samples’ spectrum profiles were obtained via LIBS. The medicine sample datasets include two different concentrations for each of two paracetamol-based medicines, namely Aferin and Parafon. The spectrum data of the samples were preprocessed by filling outliers based on quartiles, smoothing spectra to eliminate noise, and normalizing both the wavelength and intensity axes. Statistical information was obtained, and principal component analysis (PCA) was applied to both the preprocessed and raw datasets. The machine learning models were set up based on two different train-test splits, which were 70% training – 30% test and 80% training – 20% test. Cross-validation was used to protect the models against overfitting, since the sample size is small. The machine learning results for the preprocessed and raw datasets were compared for both splits. This is the first time that all supervised machine learning classification algorithms, consisting of Decision Trees, Discriminant, naïve Bayes, Support Vector Machines (SVM), k-NN (k-Nearest Neighbor), Ensemble Learning and Neural Network algorithms, were applied to LIBS data of paracetamol-based pharmaceutical samples and their different concentrations, on both preprocessed and raw datasets, in order to observe the effect of preprocessing.

Keywords: machine learning, laser-induced breakdown spectroscopy, medicines, principal component analysis, preprocessing

Procedia PDF Downloads 65
244 The Optimum Mel-Frequency Cepstral Coefficients (MFCCs) Contribution to Iranian Traditional Music Genre Classification by Instrumental Features

Authors: M. Abbasi Layegh, S. Haghipour, K. Athari, R. Khosravi, M. Tafkikialamdari

Abstract:

An approach is proposed to find the optimum mel-frequency cepstral coefficients (MFCCs) for the Radif of Mirzâ Ábdollâh, which is the principal emblem and the heart of Persian music, performed by the most famous Iranian masters on two Iranian stringed instruments, ‘Tar’ and ‘Setar’. While investigating the variance of MFCC for each record in the music database of 1500 gushe of the repertoire belonging to 12 modal systems (dastgâh and âvâz), we have applied the Fuzzy C-Mean clustering algorithm on each of the 12 coefficients and on different combinations of those coefficients. We have repeated the same experiment while increasing the number of coefficients, but the clustering accuracy remained the same. Therefore, we can conclude that the first 7 MFCCs (V-7MFCC) are enough for classification of the Radif of Mirzâ Ábdollâh. Classical machine learning algorithms such as MLP neural networks, K-Nearest Neighbors (KNN), Gaussian Mixture Model (GMM), Hidden Markov Model (HMM) and Support Vector Machine (SVM) have been employed. Finally, SVM shows the best performance in this study.

Keywords: radif of Mirzâ Ábdollâh, Gushe, mel frequency cepstral coefficients, fuzzy c-mean clustering algorithm, k-nearest neighbors (KNN), gaussian mixture model (GMM), hidden markov model (HMM), support vector machine (SVM)

Procedia PDF Downloads 414
243 The Link between Corporate Governance and EU Competition Law Enforcement: A Conditional Logistic Regression Analysis of the Role of Diversity, Independence and Corporate Social Responsibility

Authors: Jeroen De Ceuster

Abstract:

This study is the first empirical analysis of the link between corporate governance and European Union competition law. Although competition law enforcement is often studied through the lens of competition law, we offer an alternative perspective by looking at a number of corporate governance factors at the level of the board of directors. We find that undertakings where the Chief Executive Officer is also chairman of the board are twice as likely to violate European Union competition law. No significant relationship was found between European Union competition law infringements and gender diversity of the board, the size of the board, the percentage of directors appointed after the Chief Executive Officer, the percentage of independent directors, or the presence of a corporate social responsibility (CSR) committee. This contribution is based on a 1-1 matched peer study. Our sample includes all ultimate parent companies with a board that were sanctioned by the European Commission for either anticompetitive agreements or abuse of dominance in the period from 2004 to 2018. Each of these companies was matched to a company that has its headquarters in the same country, belongs to the same industry group, is active in the European Economic Area, and is the nearest neighbor to the infringing company in terms of revenue. Our final sample includes 121 pairs. As is common with matched peer studies, we use conditional logistic regression (CLR) to analyze the differences within these pairs. The only statistically significant independent variable after controlling for size and performance is CEO/Chair duality. The results indicate that companies whose Chief Executive Officer also functions as chairman of the board are twice as likely to infringe European Union competition law. This is in line with the monitoring theory of the board of directors, which states that its primary function is to monitor top management. Since competition law infringements are mostly organized by management and hidden from board directors, the results suggest that a Chief Executive Officer who is also chairman is more likely to be either complicit in the infringement or less critical towards his day-to-day colleagues, and thus impedes proper detection of competition law infringements by the board.

Keywords: corporate governance, competition law, board of directors, board independence, gender diversity, corporate social responsibility

Procedia PDF Downloads 102
242 Fraud Detection in Credit Cards with Machine Learning

Authors: Anjali Chouksey, Riya Nimje, Jahanvi Saraf

Abstract:

Online transactions have increased dramatically in this new ‘social-distancing’ era. With online transactions, fraud in online payments has also increased significantly. Fraud is a significant problem in various industries such as insurance and banking. These frauds include leaking sensitive information related to credit cards, which can be easily misused. With the government also pushing online transactions, e-commerce is booming. But due to increasing fraud in online payments, e-commerce companies are suffering a great loss of trust from their customers. These companies are finding credit card fraud to be a big problem. People have started using online payment options and thus are becoming easy targets of credit card fraud. In this research paper, we discuss machine learning algorithms. We have used a decision tree, XGBOOST, k-nearest neighbour, logistic regression, random forest, and SVM on a dataset of transactions made online using credit cards. We test all these algorithms for detecting fraud cases using the confusion matrix, the F1 score, and the accuracy score of each model to identify which algorithm can be used in detecting fraud.

Keywords: machine learning, fraud detection, artificial intelligence, decision tree, k nearest neighbour, random forest, XGBOOST, logistic regression, support vector machine

Procedia PDF Downloads 117
241 Optimal Management of Forest Stands under Wind Risk in Czech Republic

Authors: Zohreh Mohammadi, Jan Kaspar, Peter Lohmander, Robert Marusak, Harald Vacik, Ljusk Ola Eriksson

Abstract:

Storms are important damaging agents in European forest ecosystems. In recent decades, significant economic losses in European forestry have occurred due to storms. This study investigates the problem of optimal harvest planning when forest stands risk being felled by storms. One of the most applicable mathematical methods used to optimize forest management is stochastic dynamic programming (SDP). This method belongs to the adaptive optimization class. Sequential decisions, such as harvest decisions, can be optimized based on sequential information about events that cannot be perfectly predicted, such as future storms and the future states of wind protection from other forest stands. In this paper, stochastic dynamic programming is used to maximize the expected present value of the profits from an area consisting of several forest stands. The region of analysis is the Czech Republic. The harvest decisions in a particular time period should be taken simultaneously for all neighbor stands. The reason is that different stands protect each other from possible winds. The optimal harvest age of a particular stand is a function of wind speed and different wind protection effects. The optimal harvest age often decreases with wind speed, but it cannot be determined for one stand at a time. When we consider a particular stand, this stand also protects other stands. Furthermore, the particular stand is protected by neighbor stands. In some forest stands, it may even be rational to increase the harvest age under the influence of stronger winds, in order to protect more valuable stands in the neighborhood. It is important to integrate wind risk in forestry decision-making.

Keywords: Czech republic, forest stands, stochastic dynamic programming, wind risk

Procedia PDF Downloads 112
240 A Machine Learning Approach for Earthquake Prediction in Various Zones Based on Solar Activity

Authors: Viacheslav Shkuratskyy, Aminu Bello Usman, Michael O’Dea, Saifur Rahman Sabuj

Abstract:

This paper examines relationships between solar activity and earthquakes; it applied machine learning techniques: K-nearest neighbour, support vector regression, random forest regression, and long short-term memory network. Data from the SILSO World Data Center, the NOAA National Center, the GOES satellite, NASA OMNIWeb, and the United States Geological Survey were used for the experiment. The 23rd and 24th solar cycles, daily sunspot number, solar wind velocity, proton density, and proton temperature were all included in the dataset. The study also examined sunspots, solar wind, and solar flares, which all reflect solar activity and earthquake frequency distribution by magnitude and depth. The findings showed that the long short-term memory network model predicts earthquakes more correctly than the other models applied in the study, and solar activity is more likely to affect earthquakes of lower magnitude and shallow depth than earthquakes of magnitude 5.5 or larger with intermediate depth and deep depth.

Keywords: k-nearest neighbour, support vector regression, random forest regression, long short-term memory network, earthquakes, solar activity, sunspot number, solar wind, solar flares

Procedia PDF Downloads 36
239 Accumulation of Trace Metals in Leaf Vegetables Cultivated in High Traffic Areas in Ghent, Belgium

Authors: Veronique Troch, Wouter Van der Borght, Véronique De Bleeker, Bram Marynissen, Nathan Van der Eecken, Gijs Du Laing

Abstract:

Among the challenges associated with increased urban food production are health risks from food contamination, due to the higher pollution loads in urban areas compared to rural sites. Therefore, the risk posed by industrial or traffic pollution of locally grown food was defined as one of five high-priority issues of urban agriculture requiring further investigation. The impact of air pollution on urban horticulture is the subject of this study. More particularly, this study focuses on the atmospheric deposition of trace metals on leaf vegetables cultivated in the city of Ghent, Belgium. Ghent is a particularly interesting study site as it actively promotes urban agriculture. Plants accumulate heavy metals by absorption from contaminated soils and through deposition on parts exposed to polluted air. Accumulation of trace metals in vegetation grown near roads has been shown to be significantly higher than in vegetation grown in rural areas due to traffic-related contaminants in the air. Studies of vegetables demonstrated that the uptake and accumulation of trace metals differed among crop types, species, and plant parts. Studies on vegetables and fruit trees in Berlin, Germany, revealed significant differences in trace metal concentrations depending on local traffic, crop species, planting style and parameters related to barriers between the sampling site and neighboring roads. This study aims to supplement this scarce research on heavy metal accumulation in urban horticulture. Samples of leaf vegetables were collected from different sites, including allotment gardens, in Ghent. Trace metal contents on these leaf vegetables were analyzed by ICP-MS (inductively coupled plasma mass spectrometry). In addition, precipitation at each sampling site was collected by NILU-type bulk collectors and similarly analyzed for trace metals. At one sampling site, different parameters which might influence trace metal content in leaf vegetables were analyzed in detail. These parameters are the distance of the planting site to the nearest road, barriers between the planting site and the nearest road, and the type of leaf vegetable. For comparison, a rural site, located farther from city traffic and industrial pollution, was included in this study. Preliminary results show that there is a high correlation between trace metal content in the atmospheric deposition and trace metal content in leaf vegetables. Moreover, significantly higher Pb, Cu and Fe concentrations were found on spinach collected from Ghent, compared to spinach collected from a rural site. The distance of the planting site to the nearest road significantly affected the accumulation of Pb, Cu, Mo and Fe on spinach. Concentrations of those elements on spinach increased with decreasing distance between the planting site and the nearest road. Preliminary results did not show a significant effect of barriers between the planting site and the nearest road on accumulation of trace metals on leaf vegetables. The overall goal of this study is to complete and refine existing guidelines for urban gardening to exclude potential health risks from food contamination. Accordingly, this information can help city governments and civil society in the professionalization and sustainable development of urban agriculture.

Keywords: atmospheric deposition, leaf vegetables, trace metals, traffic pollution, urban agriculture

Procedia PDF Downloads 209
238 A Biophysical Model of CRISPR/Cas9 on- and off-Target Binding for Rational Design of Guide RNAs

Authors: Iman Farasat, Howard M. Salis

Abstract:

The CRISPR/Cas9 system has revolutionized genome engineering by enabling site-directed and high-throughput genome editing, genome insertion, and gene knockdowns in several species, including bacteria, yeast, flies, worms, and human cell lines. This technology has the potential to enable human gene therapy to treat genetic diseases and cancer at the molecular level; however, the current CRISPR/Cas9 system suffers from seemingly sporadic off-target genome mutagenesis that prevents its use in gene therapy. A comprehensive mechanistic model that explains how CRISPR/Cas9 functions would enable the rational design of the guide-RNAs responsible for target site selection while minimizing unexpected genome mutagenesis. Here, we present the first quantitative model of the CRISPR/Cas9 genome mutagenesis system that predicts how guide-RNA sequences (crRNAs) control target site selection and cleavage activity. We used statistical thermodynamics and the law of mass action to develop a five-step biophysical model of Cas9 cleavage, and examined it in vivo and in vitro. To predict a crRNA's binding specificities and cleavage rates, we then compiled a nearest neighbor (NN) energy model that accounts for all possible base pairings and mismatches between the crRNA and the possible genomic DNA sites. These calculations correctly predicted crRNA specificity across 5518 sites. Our analysis reveals that Cas9 activity and specificity are anti-correlated, and that the trade-off between them is the determining factor in performing an RNA-mediated cleavage with minimal off-targets. To find an optimal solution, we first created a scheme of safe-design criteria for Cas9 target selection by systematic analysis of available high-throughput measurements. We then used our biophysical model to determine the optimal Cas9 expression levels and timing that maximize on-target cleavage and minimize off-target activity. We successfully applied this approach in bacterial and mammalian cell lines to reduce off-target activity to near background mutagenesis level while maintaining a high on-target cleavage rate.

Keywords: biophysical model, CRISPR, Cas9, genome editing

Procedia PDF Downloads 377
237 Application of Rapidly Exploring Random Tree Star-Smart and G2 Quintic Pythagorean Hodograph Curves to the UAV Path Planning Problem

Authors: Luiz G. Véras, Felipe L. Medeiros, Lamartine F. Guimarães

Abstract:

This work approaches the automatic planning of paths for Unmanned Aerial Vehicles (UAVs) through the application of the Rapidly Exploring Random Tree Star-Smart (RRT*-Smart) algorithm. RRT*-Smart samples positions of a navigation environment through a tree-type graph. The algorithm consists of randomly expanding a tree from an initial position (root node) until one of its branches reaches the final position of the path to be planned. The algorithm ensures the planning of the shortest path as the number of iterations tends to infinity. When a new node is inserted into the tree, each neighbor node of the new node is connected to it if and only if the length of the path between the root node and that neighbor node, with this new connection, is less than the current length of the path between those two nodes. RRT*-Smart uses an intelligent sampling strategy to plan shorter routes in a smaller number of iterations. This strategy is based on the creation of samples/nodes near the convex vertices of the navigation environment obstacles. The planned paths are smoothed through the application of quintic Pythagorean hodograph curves. The smoothing process converts a route into a dynamically viable one based on the kinematic constraints of the vehicle. This smoothing method models the hodograph components of a curve with polynomials that obey the Pythagorean Theorem. Its advantage is that the obtained structure allows the curve length to be computed exactly, without the need for quadrature techniques to evaluate the integrals.
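The reconnection rule described in the abstract (a neighbor is attached to the new node only if that shortens its path from the root) can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes straight-line Euclidean edge costs, omits collision checking and the propagation of cost changes to descendants, and all names and the toy tree are hypothetical.

```python
import math

def rewire(parent, cost, positions, new, neighbors):
    """Reconnect each neighbor through `new` if that shortens its root path.

    parent:    dict mapping node -> parent node (root maps to None)
    cost:      dict mapping node -> path length from the root
    positions: dict mapping node -> (x, y) coordinates
    """
    for nb in neighbors:
        candidate = cost[new] + math.dist(positions[new], positions[nb])
        if candidate < cost[nb]:  # strictly shorter path via the new node
            parent[nb] = new
            cost[nb] = candidate

# Toy tree (hypothetical): root 0, chain 0 -> 1 -> 2, and a new node 3.
positions = {0: (0, 0), 1: (0, 2), 2: (2, 2), 3: (1, 1)}
parent = {0: None, 1: 0, 2: 1, 3: 0}
cost = {0: 0.0, 1: 2.0, 2: 4.0, 3: math.sqrt(2)}

rewire(parent, cost, positions, new=3, neighbors=[1, 2])
print(parent[1], parent[2], cost[2])
```

In this toy tree, node 2 is reconnected through node 3 because the diagonal route is shorter than the chain through node 1, while node 1 keeps its direct link to the root.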

Keywords: path planning, path smoothing, Pythagorean hodograph curve, RRT*-Smart

Procedia PDF Downloads 144