Search results for: K- Nearest neighborhood classifier
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 516

6 Perceptual Framework for a Modern Left-Turn Collision Warning System

Authors: E. Dabbour, S. M. Easa

Abstract:

Most of the collision warning systems currently available in the automotive market are mainly designed to warn against imminent rear-end and lane-changing collisions. No collision warning system is commercially available to warn against imminent turning collisions at intersections, especially for left-turn collisions when a driver attempts to make a left turn at either a signalized or non-signalized intersection, conflicting with the path of other approaching vehicles traveling on the opposite-direction traffic stream. One of the major factors that lead to left-turn collisions is the human error and misjudgment of the driver of the turning vehicle when perceiving the speed and acceleration of other vehicles traveling on the opposite-direction traffic stream; therefore, using a properly designed collision warning system will likely reduce, or even eliminate, this type of collision by reducing human error. This paper introduces a perceptual framework for a proposed collision warning system that can detect imminent left-turn collisions at intersections. The system utilizes a commercially available detection sensor (either a radar sensor or a laser detector) to detect approaching vehicles traveling on the opposite-direction traffic stream and calculate their speeds and acceleration rates to estimate the time-to-collision and compare that time to the time required for the turning vehicle to clear the intersection. When calculating the time required for the turning vehicle to clear the intersection, consideration is given to the perception-reaction time of the driver of the turning vehicle, which is the time required by the driver to perceive the message given by the warning system and react to it by engaging the throttle. A regression model was developed to estimate perception-reaction time based on age and gender of the driver of the host vehicle.
The desired acceleration rate selected by the driver of the turning vehicle, when making the left-turn movement, is another human factor that is considered by the system. Another regression model was developed to estimate the acceleration rate selected by the driver of the turning vehicle based on the driver's age and gender as well as on the location and speed of the nearest approaching vehicle, along with the maximum acceleration rate provided by the mechanical characteristics of the turning vehicle. By comparing time-to-collision with the time required for the turning vehicle to clear the intersection, the system displays a message to the driver of the turning vehicle when departure is safe. An application example is provided to illustrate the logic algorithm of the proposed system.
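
The comparison described in this abstract can be illustrated with a minimal sketch. It assumes constant acceleration for the approaching vehicle and a turning vehicle that starts from rest; the function names, the safety margin, and the sample values are hypothetical, and the paper's regression models for perception-reaction time and selected acceleration are not reproduced here.

```python
import math

def time_to_collision(distance_m, speed_mps, accel_mps2=0.0):
    """Time for an approaching vehicle to cover distance_m (constant acceleration assumed)."""
    if distance_m <= 0:
        return 0.0
    if abs(accel_mps2) < 1e-9:
        return distance_m / speed_mps if speed_mps > 0 else math.inf
    # Solve 0.5*a*t^2 + v*t - d = 0 for the smallest positive t.
    disc = speed_mps ** 2 + 2.0 * accel_mps2 * distance_m
    if disc < 0:
        return math.inf  # the vehicle stops before reaching the conflict point
    t = (-speed_mps + math.sqrt(disc)) / accel_mps2
    return t if t > 0 else math.inf

def clearance_time(path_length_m, accel_mps2, reaction_time_s):
    """Perception-reaction time plus the time to travel the turning path from rest."""
    return reaction_time_s + math.sqrt(2.0 * path_length_m / accel_mps2)

def departure_is_safe(ttc_s, clearance_s, margin_s=1.0):
    """Hypothetical decision rule: depart only if the gap exceeds the clearance time plus a margin."""
    return ttc_s >= clearance_s + margin_s

# Hypothetical inputs: vehicle 100 m away at 20 m/s; 18 m turning path at 1 m/s^2; 1.5 s reaction time.
ttc = time_to_collision(100.0, 20.0)
needed = clearance_time(18.0, 1.0, 1.5)
print(departure_is_safe(ttc, needed))
```

In a real system the reaction time and acceleration arguments would come from driver-specific estimates rather than fixed constants.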

Keywords: Collision warning systems, intelligent transportation systems, vehicle safety.

Downloads: 2033
5 Index t-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings

Authors: G. Candel, D. Naccache

Abstract:

t-SNE is an embedding method that the data science community has widely used. It helps with two main tasks: displaying results by coloring items according to their class or feature value, and forensics, by giving a first overview of the dataset distribution. Two interesting characteristics of t-SNE are the structure preservation property and the answer to the crowding problem, where all neighbors in high-dimensional space cannot be represented correctly in low-dimensional space. t-SNE preserves the local neighborhood, and similar items are nicely spaced by adjusting to the local density. These two characteristics produce a meaningful representation, where the cluster area is proportional to its size in number, and relationships between clusters are materialized by closeness on the embedding. This algorithm is non-parametric. The transformation from a high-dimensional to a low-dimensional space is described but not learned. Two initializations of the algorithm would lead to two different embeddings. In a forensic approach, analysts would like to compare two or more datasets using their embeddings. A naive approach would be to embed all datasets together. However, this process is costly as the complexity of t-SNE is quadratic, and would be infeasible for too many datasets. Another approach would be to learn a parametric model over an embedding built with a subset of data. While this approach is highly scalable, points could be mapped at the same exact position, making them indistinguishable. This type of model would be unable to adapt to new outliers or concept drift. This paper presents a methodology to reuse an embedding to create a new one, where cluster positions are preserved. The optimization process minimizes two costs, one relative to the embedding shape and the second relative to the match with the support embedding. The embedding with the support process can be repeated more than once, with the newly obtained embedding.
The successive embeddings can be used to study the impact of one variable over the dataset distribution or monitor changes over time. This method has the same complexity as t-SNE per embedding, and memory requirements are only doubled. For a dataset of n elements sorted and split into k subsets, the total embedding complexity would be reduced from O(n^2) to O(n^2/k), and the memory requirement from n^2 to 2(n/k)^2, which enables computation on recent laptops. The method showed promising results on a real-world dataset, allowing the birth, evolution, and death of clusters to be observed. The proposed approach facilitates identifying significant trends and changes, which empowers the monitoring of high-dimensional datasets' dynamics.
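
The complexity figures quoted in this abstract can be checked with a short arithmetic sketch. It assumes that cost and memory are both proportional to the number of pairwise terms and that n is divisible by k; the function name and the sample sizes are hypothetical.

```python
def pairwise_cost(n, k=1):
    """Return (total pairwise operations, peak pairwise memory) for n items in k subsets."""
    subset = n // k                  # items per subset, assuming n is divisible by k
    total_ops = k * subset ** 2      # k embeddings of (n/k)^2 pairs each, i.e. n^2 / k
    # A single full embedding stores n^2 pairs; with a support embedding the
    # abstract states the per-embedding memory is doubled: 2 * (n/k)^2.
    peak_memory = subset ** 2 if k == 1 else 2 * subset ** 2
    return total_ops, peak_memory

full_ops, full_mem = pairwise_cost(100_000)         # one embedding over everything
split_ops, split_mem = pairwise_cost(100_000, 10)   # ten successive embeddings
print(full_ops // split_ops, full_mem // split_mem)
```

With these illustrative sizes the operation count drops by a factor of k, while peak memory drops by a factor of k^2/2.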

Keywords: Concept drift, data visualization, dimension reduction, embedding, monitoring, reusability, t-SNE, unsupervised learning.

Downloads: 469
4 Model-Driven and Data-Driven Approaches for Crop Yield Prediction: Analysis and Comparison

Authors: Xiangtuo Chen, Paul-Henry Cournéde

Abstract:

Crop yield prediction is a paramount issue in agriculture. The main idea of this paper is to find an efficient way to predict the yield of corn based on meteorological records. The prediction models used in this paper can be classified into model-driven approaches and data-driven approaches, according to the different modeling methodologies. The model-driven approaches are based on crop mechanistic modeling. They describe crop growth in interaction with their environment as dynamical systems. But the calibration process of the dynamic system is difficult, because it turns out to be a multidimensional non-convex optimization problem. An original contribution of this paper is to propose a statistical methodology, Multi-Scenarios Parameters Estimation (MSPE), for the parametrization of potentially complex mechanistic models from a new type of dataset (climatic data, final yield in many situations). It is tested with CORNFLO, a crop model for maize growth. On the other hand, the data-driven approach for yield prediction is free of the complex biophysical process. But it has some strict requirements about the dataset. A second contribution of the paper is the comparison of these model-driven methods with classical data-driven methods. For this purpose, we consider two classes of regression methods: methods derived from linear regression (Ridge and Lasso Regression, Principal Components Regression or Partial Least Squares Regression) and machine learning methods (Random Forest, k-Nearest Neighbor, Artificial Neural Network and SVM regression). The dataset consists of 720 records of corn yield at county scale provided by the United States Department of Agriculture (USDA) and the associated climatic data. A 5-fold cross-validation process and two accuracy metrics, root mean square error of prediction (RMSEP) and mean absolute error of prediction (MAEP), were used to evaluate the crop prediction capacity.
The results show that among the data-driven approaches, Random Forest is the most robust and generally achieves the best prediction error (MAEP 4.27%). It also outperforms our model-driven approach (MAEP 6.11%). However, the method for calibrating the mechanistic model from easily accessible datasets offers several side perspectives. The mechanistic model can potentially help to underline the stresses suffered by the crop or to identify the biological parameters of interest for breeding purposes. For this reason, an interesting perspective is to combine these two types of approaches.
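
The evaluation protocol named in this abstract (5-fold cross-validation scored with RMSEP and MAEP) can be sketched as follows. This is an illustrative stand-in, not the authors' pipeline: it uses a plain k-nearest-neighbor regressor in pure Python, the function names and synthetic data are hypothetical, and the errors are reported in absolute units because the normalization behind the abstract's percentage figures is not stated here.

```python
import math
import random

def knn_predict(train_x, train_y, query, k=3):
    """Average the targets of the k training points closest to the query."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / len(nearest)

def cross_validate(xs, ys, n_folds=5, k=3, seed=0):
    """Return (RMSEP, MAEP) over all held-out predictions of an n_folds cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]  # disjoint folds covering every record
    errors = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        tx = [xs[i] for i in train]
        ty = [ys[i] for i in train]
        for i in held_out:
            errors.append(knn_predict(tx, ty, xs[i], k) - ys[i])
    rmsep = math.sqrt(sum(e * e for e in errors) / len(errors))
    maep = sum(abs(e) for e in errors) / len(errors)
    return rmsep, maep

# Synthetic one-feature data, purely for illustration.
features = [(float(i),) for i in range(20)]
targets = [2.0 * i for i in range(20)]
print(cross_validate(features, targets))
```

RMSEP penalizes large errors more heavily than MAEP, so reporting both gives a rough sense of how uneven the prediction errors are.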

Keywords: Crop yield prediction, crop model, sensitivity analysis, parameter estimation, particle swarm optimization, random forest.

Downloads: 1159
3 Towards End-To-End Disease Prediction from Raw Metagenomic Data

Authors: Maxence Queyrel, Edi Prifti, Alexandre Templier, Jean-Daniel Zucker

Abstract:

Analysis of the human microbiome using metagenomic sequencing data has demonstrated high ability in discriminating various human diseases. Raw metagenomic sequencing data require multiple complex and computationally heavy bioinformatics steps prior to data analysis. Such data contain millions of short sequences read from the fragmented DNA sequences and stored as fastq files. Conventional processing pipelines consist of multiple steps, including quality control, filtering, and alignment of sequences against genomic catalogs (genes, species, taxonomic levels, functional pathways, etc.). These pipelines are complex to use, time-consuming, and rely on a large number of parameters that often introduce variability and impact the estimation of the microbiome elements. Training Deep Neural Networks directly from raw sequencing data is a promising approach to bypass some of the challenges associated with mainstream bioinformatics pipelines. Most of these methods use the concept of word and sentence embeddings that create a meaningful and numerical representation of DNA sequences, while extracting features and reducing the dimensionality of the data. In this paper we present an end-to-end approach that classifies patients into disease groups directly from raw metagenomic reads: metagenome2vec. This approach is composed of four steps: (i) generating a vocabulary of k-mers and learning their numerical embeddings; (ii) learning DNA sequence (read) embeddings; (iii) identifying the genome from which the sequence is most likely to come; and (iv) training a multiple instance learning classifier which predicts the phenotype based on the vector representation of the raw data. An attention mechanism is applied in the network so that the model can be interpreted, assigning a weight to the influence of each genome on the prediction.
Using two public real-life datasets as well as a simulated one, we demonstrated that this original approach reaches high performance, comparable with the state-of-the-art methods applied directly on processed data through mainstream bioinformatics workflows. These results are encouraging for this proof-of-concept work. We believe that with further dedication, the DNN models have the potential to surpass mainstream bioinformatics workflows in disease classification tasks.
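
The first half of step (i) in this abstract, building a k-mer vocabulary from reads, can be sketched in a few lines. The function names, the `min_count` threshold, and the toy reads are hypothetical, and learning the embeddings themselves (the second half of step (i) and steps (ii) to (iv)) is outside the scope of this sketch.

```python
from collections import Counter

def kmers(read, k):
    """Split a read into its overlapping substrings of length k."""
    return [read[i:i + k] for i in range(len(read) - k + 1)]

def build_vocabulary(reads, k, min_count=1):
    """Map every k-mer seen at least min_count times to an integer id."""
    counts = Counter(km for read in reads for km in kmers(read, k))
    kept = sorted(km for km, count in counts.items() if count >= min_count)
    return {km: idx for idx, km in enumerate(kept)}

def encode(read, vocab, k):
    """Turn a read into a sequence of k-mer ids, skipping k-mers outside the vocabulary."""
    return [vocab[km] for km in kmers(read, k) if km in vocab]

# Toy reads, purely for illustration; real input would be parsed from fastq files.
reads = ["ACGTA", "CGTAC"]
vocab = build_vocabulary(reads, k=3)
print(encode("ACGTA", vocab, k=3))
```

The integer sequences produced by `encode` are the kind of input a word-embedding model would consume, with k-mers playing the role of words and reads the role of sentences.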

Keywords: Metagenomics, phenotype prediction, deep learning, embeddings, multiple instance learning.

Downloads: 879
2 Modeling Engagement with Multimodal Multisensor Data: The Continuous Performance Test as an Objective Tool to Track Flow

Authors: Mohammad H. Taheri, David J. Brown, Nasser Sherkat

Abstract:

Engagement is one of the most important factors in determining successful outcomes and deep learning in students. Existing approaches to detect student engagement involve periodic human observations that are subject to inter-rater reliability. Our solution uses real-time multimodal multisensor data labeled by objective performance outcomes to infer the engagement of students. The study involves four students with a combined diagnosis of cerebral palsy and a learning disability who took part in a 3-month trial over 59 sessions. Multimodal multisensor data were collected while they participated in a continuous performance test. Eye gaze, electroencephalogram, body pose, and interaction data were used to create a model of student engagement through objective labeling from the continuous performance test outcomes. In order to achieve this, a type of continuous performance test is introduced, the Seek-X type. Nine features were extracted, including high-level handpicked compound features. Using leave-one-out cross-validation, a series of different machine learning approaches were evaluated. Overall, the random forest classification approach achieved the best classification results. Using random forest, 93.3% classification accuracy for engagement and 42.9% accuracy for disengagement were achieved. We compared these results to outcomes from different models: AdaBoost, decision tree, k-Nearest Neighbor, naïve Bayes, neural network, and support vector machine. We showed that using a multisensor approach achieved higher accuracy than using features from any reduced set of sensors. We found that using high-level handpicked features can improve the classification accuracy in every sensor mode. Our approach is robust to both sensor fallout and occlusions. The single most important sensor feature to the classification of engagement and distraction was shown to be eye gaze.
It has been shown that we can accurately predict the level of engagement of students with learning disabilities in a real-time approach that is not subject to inter-rater reliability or human observation and is not reliant on a single mode of sensor input. This will help teachers design interventions for a heterogeneous group of students, where teachers cannot possibly attend to each of their individual needs. Our approach can be used to identify those with the greatest learning challenges so that all students are supported to reach their full potential.
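
The evaluation scheme mentioned in this abstract, leave-one-out cross-validation with accuracy reported separately per class, can be sketched as follows. A 1-nearest-neighbor classifier stands in for the models compared in the study, the function names and toy data are hypothetical, and the abstract does not say whether the held-out unit was a sample, a session, or a student; this sketch holds out one sample at a time.

```python
import math

def nearest_label(train_x, train_y, query):
    """Label of the single training point closest to the query (1-nearest neighbor)."""
    best = min(range(len(train_x)), key=lambda i: math.dist(train_x[i], query))
    return train_y[best]

def leave_one_out(xs, ys):
    """Predict each sample from a model built on all the other samples."""
    predictions = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        predictions.append(nearest_label(train_x, train_y, xs[i]))
    return predictions

def per_class_accuracy(true_labels, predicted_labels):
    """Fraction of samples of each true class that were predicted correctly."""
    result = {}
    for label in set(true_labels):
        hits = [p == t for t, p in zip(true_labels, predicted_labels) if t == label]
        result[label] = sum(hits) / len(hits)
    return result

# Toy one-feature data, purely for illustration.
features = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (2.5,)]
labels = ["engaged", "engaged", "engaged", "disengaged", "disengaged", "disengaged"]
print(per_class_accuracy(labels, leave_one_out(features, labels)))
```

Reporting accuracy per class, as the abstract does, exposes imbalances that a single overall accuracy figure would hide.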

Keywords: Affective computing in education, affect detection, continuous performance test, engagement, flow, HCI, interaction, learning disabilities, machine learning, multimodal, multisensor, physiological sensors, Signal Detection Theory, student engagement.

Downloads: 1234
1 Socio-Economic Insight of the Secondary Housing Market in Colombo Suburbs: Seller’s Point of Views

Authors: R. G. Ariyawansa, M. A. N. R. M. Perera

Abstract:

“House” is a powerful symbol of the socio-economic background of individuals and families. In fact, housing provides for all types of needs and wants, from basic needs to self-actualization needs. This phenomenon can be understood only by analyzing the hidden motives of buyers and sellers in the housing market. Hence, the aim of this study is to examine the socio-economic insight of the secondary housing market in Colombo suburbs. This broader aim was achieved by analyzing the general pattern of the secondary housing market, identifying socio-economic motives of sellers in the secondary housing market, and reviewing sellers’ experience of buyer behavior. A purposive sample of 50 sellers from popular residential areas in Colombo such as Maharagama, Kottawa, Piliyandala, Punnipitiya, and Nugegoda was used to collect primary data instead of relevant secondary data from published and unpublished reports. The sample was limited to selling prices ranging from Rs15 million to Rs25 million, which places the houses in the middle and upper-middle income range in this context. Participatory observation and semi-structured interviews were adopted as key data collection tools. Data were descriptively analyzed. This study found that the market is mainly handled by informal agents who are unqualified and unorganized. People such as taxi/three-wheel drivers, boutique vendors, security personnel, etc. are engaged in housing brokerage as a part-time career. A few full-time and formally organized agents were found, but they were also not professionally qualified. As far as housing quality is concerned, it was observed that 90% of the houses were poorly maintained and illegally modified. They are situated in poorly maintained neighborhoods as well. Among the observed houses, 2% were moderately maintained and 8% were well maintained and modified.
Major socio-economic motives of sellers were “migrating to foreign countries for education and employment” (80% and 10% respectively), “family problems” (4%), and “social status” (3%). Other motives were “health” and “environmental/neighborhood problems” (3%). This study further noted that the secondary middle income housing market in the area is directly related to migrants motivated by education in foreign countries, mainly Australia, the UK, and the USA. As per the literature, families motivated by education tend to migrate to Colombo suburbs from remote areas of the country. They seek temporary accommodation in lower middle income housing. However, the secondary middle income housing market relates to migration from Colombo to major global cities. Therefore, the final transaction price in this market may depend on migration-related dates such as university deadlines, visas, and other agreements. Hence, it creates a buyers’ market, lowering the selling price. It was also revealed that buyers tend to place more trust in this market, as far as the construction quality of houses is concerned, than in brand-new houses built for sale.

Keywords: Informal housing market, hidden motives of buyers and sellers, secondary housing market, socio-economic insight.

Downloads: 673