Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 402

Search results for: supervised leaning

342 A Supervised Learning Data Mining Approach for Object Recognition and Classification in High Resolution Satellite Data

Authors: Mais Nijim, Rama Devi Chennuboyina, Waseem Al Aqqad

Abstract:

Advances in spatial and spectral resolution of satellite images have led to tremendous growth in large image databases. The data we acquire through satellites, radars and sensors consists of important geographical information that can be used for remote sensing applications such as region planning, disaster management. Spatial data classification and object recognition are important tasks for many applications. However, classifying objects and identifying them manually from images is a difficult task. Object recognition is often considered as a classification problem, this task can be performed using machine-learning techniques. Despite of many machine-learning algorithms, the classification is done using supervised classifiers such as Support Vector Machines (SVM) as the area of interest is known. We proposed a classification method, which considers neighboring pixels in a region for feature extraction and it evaluates classifications precisely according to neighboring classes for semantic interpretation of region of interest (ROI). A dataset has been created for training and testing purpose; we generated the attributes by considering pixel intensity values and mean values of reflectance. We demonstrated the benefits of using knowledge discovery and data-mining techniques, which can be on image data for accurate information extraction and classification from high spatial resolution remote sensing imagery.

Keywords: remote sensing, object recognition, classification, data mining, waterbody identification, feature extraction

Procedia PDF Downloads 325

341 A Study for Area-level Mosquito Abundance Prediction by Using Supervised Machine Learning Point-level Predictor

Authors: Theoktisti Makridou, Konstantinos Tsaprailis, George Arvanitakis, Charalampos Kontoes

Abstract:

In the literature, the data-driven approaches for mosquito abundance prediction relaying on supervised machine learning models that get trained with historical in-situ measurements. The counterpart of this approach is once the model gets trained on pointlevel (specific x,y coordinates) measurements, the predictions of the model refer again to point-level. These point-level predictions reduce the applicability of those solutions once a lot of early warning and mitigation actions applications need predictions for an area level, such as a municipality, village, etc... In this study, we apply a data-driven predictive model, which relies on public-open satellite Earth Observation and geospatial data and gets trained with historical point-level in-Situ measurements of mosquito abundance. Then we propose a methodology to extract information from a point-level predictive model to a broader area-level prediction. Our methodology relies on the randomly spatial sampling of the area of interest (similar to the Poisson hardcore process), obtaining the EO and geomorphological information for each sample, doing the point-wise prediction for each sample, and aggregating the predictions to represent the average mosquito abundance of the area. We quantify the performance of the transformation from the pointlevel to the area-level predictions, and we analyze it in order to understand which parameters have a positive or negative impact on it. The goal of this study is to propose a methodology that predicts the mosquito abundance of a given area by relying on point-level prediction and to provide qualitative insights regarding the expected performance of the area-level prediction. We applied our methodology to historical data (of Culex pipiens) of two areas of interest (Veneto region of Italy and Central Macedonia of Greece). In both cases, the results were consistent. The mean mosquito abundance of a given area can be estimated with similar accuracy to the point-level predictor, sometimes even better. The density of the samples that we use to represent one area has a positive effect on the performance in contrast to the actual number of sampling points which is not informative at all regarding the performance without the size of the area. Additionally, we saw that the distance between the sampling points and the real in-situ measurements that were used for training did not strongly affect the performance.

Keywords: mosquito abundance, supervised machine learning, culex pipiens, spatial sampling, west nile virus, earth observation data

Procedia PDF Downloads 134

340 High-Fidelity Materials Screening with a Multi-Fidelity Graph Neural Network and Semi-Supervised Learning

Authors: Akeel A. Shah, Tong Zhang

Abstract:

Computational approaches to learning the properties of materials are commonplace, motivated by the need to screen or design materials for a given application, e.g., semiconductors and energy storage. Experimental approaches can be both time consuming and costly. Unfortunately, computational approaches such as ab-initio electronic structure calculations and classical or ab-initio molecular dynamics are themselves can be too slow for the rapid evaluation of materials, often involving thousands to hundreds of thousands of candidates. Machine learning assisted approaches have been developed to overcome the time limitations of purely physics-based approaches. These approaches, on the other hand, require large volumes of data for training (hundreds of thousands on many standard data sets such as QM7b). This means that they are limited by how quickly such a large data set of physics-based simulations can be established. At high fidelity, such as configuration interaction, composite methods such as G4, and coupled cluster theory, gathering such a large data set can become infeasible, which can compromise the accuracy of the predictions - many applications require high accuracy, for example band structures and energy levels in semiconductor materials and the energetics of charge transfer in energy storage materials. In order to circumvent this problem, multi-fidelity approaches can be adopted, for example the Δ-ML method, which learns a high-fidelity output from a low-fidelity result such as Hartree-Fock or density functional theory (DFT). The general strategy is to learn a map between the low and high fidelity outputs, so that the high-fidelity output is obtained a simple sum of the physics-based low-fidelity and correction, Although this requires a low-fidelity calculation, it typically requires far fewer high-fidelity results to learn the correction map, and furthermore, the low-fidelity result, such as Hartree-Fock or semi-empirical ZINDO, is typically quick to obtain, For high-fidelity outputs the result can be an order of magnitude or more in speed up. In this work, a new multi-fidelity approach is developed, based on a graph convolutional network (GCN) combined with semi-supervised learning. The GCN allows for the material or molecule to be represented as a graph, which is known to improve accuracy, for example SchNet and MEGNET. The graph incorporates information regarding the numbers of, types and properties of atoms; the types of bonds; and bond angles. They key to the accuracy in multi-fidelity methods, however, is the incorporation of low-fidelity output to learn the high-fidelity equivalent, in this case by learning their difference. Semi-supervised learning is employed to allow for different numbers of low and high-fidelity training points, by using an additional GCN-based low-fidelity map to predict high fidelity outputs. It is shown on 4 different data sets that a significant (at least one order of magnitude) increase in accuracy is obtained, using one to two orders of magnitude fewer low and high fidelity training points. One of the data sets is developed in this work, pertaining to 1000 simulations of quinone molecules (up to 24 atoms) at 5 different levels of fidelity, furnishing the energy, dipole moment and HOMO/LUMO.

Keywords: .materials screening, computational materials, machine learning, multi-fidelity, graph convolutional network, semi-supervised learning

Procedia PDF Downloads 12

339 Unlocking Green Hydrogen Potential: A Machine Learning-Based Assessment

Authors: Said Alshukri, Mazhar Hussain Malik

Abstract:

Green hydrogen is hydrogen produced using renewable energy sources. In the last few years, Oman aimed to reduce its dependency on fossil fuels. Recently, the hydrogen economy has become a global trend, and many countries have started to investigate the feasibility of implementing this sector. Oman created an alliance to establish the policy and rules for this sector. With motivation coming from both global and local interest in green hydrogen, this paper investigates the potential of producing hydrogen from wind and solar energies in three different locations in Oman, namely Duqm, Salalah, and Sohar. By using machine learning-based software “WEKA” and local metrological data, the project was designed to figure out which location has the highest wind and solar energy potential. First, various supervised models were tested to obtain their prediction accuracy, and it was found that the Random Forest (RF) model has the best prediction performance. The RF model was applied to 2021 metrological data for each location, and the results indicated that Duqm has the highest wind and solar energy potential. The system of one wind turbine in Duqm can produce 8335 MWh/year, which could be utilized in the water electrolysis process to produce 88847 kg of hydrogen mass, while a solar system consisting of 2820 solar cells is estimated to produce 1666.223 MWh/ year which is capable of producing 177591 kg of hydrogen mass.

Keywords: green hydrogen, machine learning, wind and solar energies, WEKA, supervised models, random forest

Procedia PDF Downloads 62

338 Deep Reinforcement Learning Approach for Trading Automation in The Stock Market

Authors: Taylan Kabbani, Ekrem Duman

Abstract:

The design of adaptive systems that take advantage of financial markets while reducing the risk can bring more stagnant wealth into the global market. However, most efforts made to generate successful deals in trading financial assets rely on Supervised Learning (SL), which suffered from various limitations. Deep Reinforcement Learning (DRL) offers to solve these drawbacks of SL approaches by combining the financial assets price "prediction" step and the "allocation" step of the portfolio in one unified process to produce fully autonomous systems capable of interacting with its environment to make optimal decisions through trial and error. In this paper, a continuous action space approach is adopted to give the trading agent the ability to gradually adjust the portfolio's positions with each time step (dynamically re-allocate investments), resulting in better agent-environment interaction and faster convergence of the learning process. In addition, the approach supports the managing of a portfolio with several assets instead of a single one. This work represents a novel DRL model to generate profitable trades in the stock market, effectively overcoming the limitations of supervised learning approaches. We formulate the trading problem, or what is referred to as The Agent Environment as Partially observed Markov Decision Process (POMDP) model, considering the constraints imposed by the stock market, such as liquidity and transaction costs. More specifically, we design an environment that simulates the real-world trading process by augmenting the state representation with ten different technical indicators and sentiment analysis of news articles for each stock. We then solve the formulated POMDP problem using the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, which can learn policies in high-dimensional and continuous action spaces like those typically found in the stock market environment. From the point of view of stock market forecasting and the intelligent decision-making mechanism, this paper demonstrates the superiority of deep reinforcement learning in financial markets over other types of machine learning such as supervised learning and proves its credibility and advantages of strategic decision-making.

Keywords: the stock market, deep reinforcement learning, MDP, twin delayed deep deterministic policy gradient, sentiment analysis, technical indicators, autonomous agent

Procedia PDF Downloads 169

337 Unsupervised Echocardiogram View Detection via Autoencoder-Based Representation Learning

Authors: Andrea Trevino Gavito, Diego Klabjan, Sanjiv Shah

Abstract:

Echocardiograms serve as pivotal resources for clinicians in diagnosing cardiac conditions, offering non-invasive insights into a heart’s structure and function. When echocardiographic studies are conducted, no standardized labeling of the acquired views is performed. Employing machine learning algorithms for automated echocardiogram view detection has emerged as a promising solution to enhance efficiency in echocardiogram use for diagnosis. However, existing approaches predominantly rely on supervised learning, necessitating labor-intensive expert labeling. In this paper, we introduce a fully unsupervised echocardiographic view detection framework that leverages convolutional autoencoders to obtain lower dimensional representations and the K-means algorithm for clustering them into view-related groups. Our approach focuses on discriminative patches from echocardiographic frames. Additionally, we propose a trainable inverse average layer to optimize the decoding of average operations. By integrating both public and proprietary datasets, we obtain a marked improvement in model performance when compared to utilizing a proprietary dataset alone. Our experiments show boosts of 15.5% in accuracy and 9.0% in the F-1 score for frame-based clustering, 25.9% in accuracy, and 19.8% in the F-1 score for view-based clustering. Our research highlights the potential of unsupervised learning methodologies and the utilization of open-sourced data in addressing the complexities of echocardiogram interpretation, paving the way for more accurate and efficient cardiac diagnoses.

Keywords: artificial intelligence, machine learning, unsupervised learning, self-supervised representation learning, echocardiography, echocardiographic view detection

Procedia PDF Downloads 7

336 A Review of Deep Learning Methods in Computer-Aided Detection and Diagnosis Systems based on Whole Mammogram and Ultrasound Scan Classification

Authors: Ian Omung'a

Abstract:

Breast cancer remains to be one of the deadliest cancers for women worldwide, with the risk of developing tumors being as high as 50 percent in Sub-Saharan African countries like Kenya. With as many as 42 percent of these cases set to be diagnosed late when cancer has metastasized and or the prognosis has become terminal, Full Field Digital [FFD] Mammography remains an effective screening technique that leads to early detection where in most cases, successful interventions can be made to control or eliminate the tumors altogether. FFD Mammograms have been proven to multiply more effective when used together with Computer-Aided Detection and Diagnosis [CADe] systems, relying on algorithmic implementations of Deep Learning techniques in Computer Vision to carry out deep pattern recognition that is comparable to the level of a human radiologist and decipher whether specific areas of interest in the mammogram scan image portray abnormalities if any and whether these abnormalities are indicative of a benign or malignant tumor. Within this paper, we review emergent Deep Learning techniques that will prove relevant to the development of State-of-The-Art FFD Mammogram CADe systems. These techniques will span self-supervised learning for context-encoded occlusion, self-supervised learning for pre-processing and labeling automation, as well as the creation of a standardized large-scale mammography dataset as a benchmark for CADe systems' evaluation. Finally, comparisons are drawn between existing practices that pre-date these techniques and how the development of CADe systems that incorporate them will be different.

Keywords: breast cancer diagnosis, computer aided detection and diagnosis, deep learning, whole mammogram classfication, ultrasound classification, computer vision

Procedia PDF Downloads 81

335 Early Gastric Cancer Prediction from Diet and Epidemiological Data Using Machine Learning in Mizoram Population

Authors: Brindha Senthil Kumar, Payel Chakraborty, Senthil Kumar Nachimuthu, Arindam Maitra, Prem Nath

Abstract:

Gastric cancer is predominantly caused by demographic and diet factors as compared to other cancer types. The aim of the study is to predict Early Gastric Cancer (ECG) from diet and lifestyle factors using supervised machine learning algorithms. For this study, 160 healthy individual and 80 cases were selected who had been followed for 3 years (2016-2019), at Civil Hospital, Aizawl, Mizoram. A dataset containing 11 features that are core risk factors for the gastric cancer were extracted. Supervised machine algorithms: Logistic Regression, Naive Bayes, Support Vector Machine (SVM), Multilayer perceptron, and Random Forest were used to analyze the dataset using Python Jupyter Notebook Version 3. The obtained classified results had been evaluated using metrics parameters: minimum_false_positives, brier_score, accuracy, precision, recall, F1_score, and Receiver Operating Characteristics (ROC) curve. Data analysis results showed Naive Bayes - 88, 0.11; Random Forest - 83, 0.16; SVM - 77, 0.22; Logistic Regression - 75, 0.25 and Multilayer perceptron - 72, 0.27 with respect to accuracy and brier_score in percent. Naive Bayes algorithm out performs with very low false positive rates as well as brier_score and good accuracy. Naive Bayes algorithm classification results in predicting ECG showed very satisfactory results using only diet cum lifestyle factors which will be very helpful for the physicians to educate the patients and public, thereby mortality of gastric cancer can be reduced/avoided with this knowledge mining work.

Keywords: Early Gastric cancer, Machine Learning, Diet, Lifestyle Characteristics

Procedia PDF Downloads 152

334 An Integrated Label Propagation Network for Structural Condition Assessment

Authors: Qingsong Xiong, Cheng Yuan, Qingzhao Kong, Haibei Xiong

Abstract:

Deep-learning-driven approaches based on vibration responses have attracted larger attention in rapid structural condition assessment while obtaining sufficient measured training data with corresponding labels is relevantly costly and even inaccessible in practical engineering. This study proposes an integrated label propagation network for structural condition assessment, which is able to diffuse the labels from continuously-generating measurements by intact structure to those of missing labels of damage scenarios. The integrated network is embedded with damage-sensitive features extraction by deep autoencoder and pseudo-labels propagation by optimized fuzzy clustering, the architecture and mechanism which are elaborated. With a sophisticated network design and specified strategies for improving performance, the present network achieves to extends the superiority of self-supervised representation learning, unsupervised fuzzy clustering and supervised classification algorithms into an integration aiming at assessing damage conditions. Both numerical simulations and full-scale laboratory shaking table tests of a two-story building structure were conducted to validate its capability of detecting post-earthquake damage. The identifying accuracy of a present network was 0.95 in numerical validations and an average 0.86 in laboratory case studies, respectively. It should be noted that the whole training procedure of all involved models in the network stringently doesn’t rely upon any labeled data of damage scenarios but only several samples of intact structure, which indicates a significant superiority in model adaptability and feasible applicability in practice.

Keywords: autoencoder, condition assessment, fuzzy clustering, label propagation

Procedia PDF Downloads 87

333 The Rise of Blue Water Navy and its Implication for the Region

Authors: Riddhi Chopra

Abstract:

Alfred Thayer Mahan described the sea as a ‘great common,’ which would serve as a medium for communication, trade, and transport. The seas of Asia are witnessing an intriguing historical anomaly – rise of an indigenous maritime power against the backdrop of US domination over the region. As China transforms from an inward leaning economy to an outward-leaning economy, it has become increasingly dependent on the global sea; as a result, we witness an evolution in its maritime strategy from near seas defense to far seas deployment strategies. It is not only patrolling the international waters but has also built a network of civilian and military infrastructure across the disputed oceanic expanse. The paper analyses the reorientation of China from a naval power to a blue water navy in an era of extensive globalisation. The actions of the Chinese have created a zone of high alert amongst its neighbors such as Japan, Philippines, Vietnam and North Korea. These nations are trying to align themselves so as to counter China’s growing brinkmanship, but China has been pursuing claims through a carefully calibrated strategy in the region shunning any coercive measures taken by other forces. If China continues to expand its maritime boundaries, its neighbors – all smaller and weaker Asian nations would be limited to a narrow band of the sea along its coastlines. Hence it is essential for the US to intervene and support its allies to offset Chinese supremacy. The paper intends to provide a profound analysis over the disputes in South China Sea and East China Sea focusing on Philippines and Japan respectively. Moreover, the paper attempts to give an account of US involvement in the region and its alignment with its South Asian allies. The geographic dynamics is said the breed a national coalition dominating the strategic ambitions of China as well as the weak littoral states. China has conducted behind the scenes diplomacy trying to persuade its neighbors to support its position on the territorial disputes. These efforts have been successful in creating fault lines in ASEAN thereby undermining regional integrity to reach a consensus on the issue. Chinese diplomatic efforts have also forced the US to revisit its foreign policy and engage with players like Cambodia and Laos. The current scenario in the SCS points to a strong Chinese hold trying to outspace all others with no regards to International law. Chinese activities are in contrast with US principles like Freedom of Navigation thereby signaling US to take bold actions to prevent Chinese hegemony in the region. The paper ultimately seeks to explore the changing power dynamics among various claimants where a rival superpower like US can pursue the traditional policy of alliance formation play a decisive role in changing the status quo in the arena, consequently determining the future trajectory.

Keywords: China, East China Sea, South China Sea, USA

Procedia PDF Downloads 229

332 Normalizing Flow to Augmented Posterior: Conditional Density Estimation with Interpretable Dimension Reduction for High Dimensional Data

Authors: Cheng Zeng, George Michailidis, Hitoshi Iyatomi, Leo L. Duan

Abstract:

The conditional density characterizes the distribution of a response variable y given other predictor x and plays a key role in many statistical tasks, including classification and outlier detection. Although there has been abundant work on the problem of Conditional Density Estimation (CDE) for a low-dimensional response in the presence of a high-dimensional predictor, little work has been done for a high-dimensional response such as images. The promising performance of normalizing flow (NF) neural networks in unconditional density estimation acts as a motivating starting point. In this work, the authors extend NF neural networks when external x is present. Specifically, they use the NF to parameterize a one-to-one transform between a high-dimensional y and a latent z that comprises two components [zₚ, zₙ]. The zₚ component is a low-dimensional subvector obtained from the posterior distribution of an elementary predictive model for x, such as logistic/linear regression. The zₙ component is a high-dimensional independent Gaussian vector, which explains the variations in y not or less related to x. Unlike existing CDE methods, the proposed approach coined Augmented Posterior CDE (AP-CDE) only requires a simple modification of the common normalizing flow framework while significantly improving the interpretation of the latent component since zₚ represents a supervised dimension reduction. In image analytics applications, AP-CDE shows good separation of 𝑥-related variations due to factors such as lighting condition and subject id from the other random variations. Further, the experiments show that an unconditional NF neural network based on an unsupervised model of z, such as a Gaussian mixture, fails to generate interpretable results.

Keywords: conditional density estimation, image generation, normalizing flow, supervised dimension reduction

Procedia PDF Downloads 81

331 The Relationship between Organization Culture and Organization Learning in Three Different Types of Companies

Authors: Mahmoud Timar, Javad Joukar Borazjani

Abstract:

A dynamic organization helps the management to overcome both internal and external uncertainties and complexities of the organization with more confidence and efficiency. Regarding this issue, in this paper, the influence of organizational culture factors over organizational learning components, which both of them are considered as important characteristics of a dynamic organization, has been studied in three subsidiary companies (production, consultation and service) of National Iranian Oil Company, and moreover we also tried to identify the most dominant culture in these three subsidiaries. Analysis of 840 received questionnaires by SPSS shows that there is a significant relationship between the components of organizational culture and organizational learning; however the rate of relationship between these two factors was different among the examined companies. By the use of Regression, it has been clarified that in the servicing company the highest relationship is between mission and learning environment, while in production division, there is a significant relationship between adaptability and learning needs satisfaction and however in consulting company the highest relationship is between involvement and applying learning in workplace.

Keywords: denison model, culture, leaning, organizational culture, organizational learning

Procedia PDF Downloads 361

330 Machine Learning Techniques for COVID-19 Detection: A Comparative Analysis

Authors: Abeer A. Aljohani

Abstract:

COVID-19 virus spread has been one of the extreme pandemics across the globe. It is also referred to as coronavirus, which is a contagious disease that continuously mutates into numerous variants. Currently, the B.1.1.529 variant labeled as omicron is detected in South Africa. The huge spread of COVID-19 disease has affected several lives and has surged exceptional pressure on the healthcare systems worldwide. Also, everyday life and the global economy have been at stake. This research aims to predict COVID-19 disease in its initial stage to reduce the death count. Machine learning (ML) is nowadays used in almost every area. Numerous COVID-19 cases have produced a huge burden on the hospitals as well as health workers. To reduce this burden, this paper predicts COVID-19 disease is based on the symptoms and medical history of the patient. This research presents a unique architecture for COVID-19 detection using ML techniques integrated with feature dimensionality reduction. This paper uses a standard UCI dataset for predicting COVID-19 disease. This dataset comprises symptoms of 5434 patients. This paper also compares several supervised ML techniques to the presented architecture. The architecture has also utilized 10-fold cross validation process for generalization and the principal component analysis (PCA) technique for feature reduction. Standard parameters are used to evaluate the proposed architecture including F1-Score, precision, accuracy, recall, receiver operating characteristic (ROC), and area under curve (AUC). The results depict that decision tree, random forest, and neural networks outperform all other state-of-the-art ML techniques. This achieved result can help effectively in identifying COVID-19 infection cases.

Keywords: supervised machine learning, COVID-19 prediction, healthcare analytics, random forest, neural network

Procedia PDF Downloads 82

329 Predicting Loss of Containment in Surface Pipeline using Computational Fluid Dynamics and Supervised Machine Learning Model to Improve Process Safety in Oil and Gas Operations

Authors: Muhammmad Riandhy Anindika Yudhy, Harry Patria, Ramadhani Santoso

Abstract:

Loss of containment is the primary hazard that process safety management is concerned within the oil and gas industry. Escalation to more serious consequences all begins with the loss of containment, starting with oil and gas release from leakage or spillage from primary containment resulting in pool fire, jet fire and even explosion when reacted with various ignition sources in the operations. Therefore, the heart of process safety management is avoiding loss of containment and mitigating its impact through the implementation of safeguards. The most effective safeguard for the case is an early detection system to alert Operations to take action prior to a potential case of loss of containment. The detection system value increases when applied to a long surface pipeline that is naturally difficult to monitor at all times and is exposed to multiple causes of loss of containment, from natural corrosion to illegal tapping. Based on prior researches and studies, detecting loss of containment accurately in the surface pipeline is difficult. The trade-off between cost-effectiveness and high accuracy has been the main issue when selecting the traditional detection method. The current best-performing method, Real-Time Transient Model (RTTM), requires analysis of closely positioned pressure, flow and temperature (PVT) points in the pipeline to be accurate. Having multiple adjacent PVT sensors along the pipeline is expensive, hence generally not a viable alternative from an economic standpoint.A conceptual approach to combine mathematical modeling using computational fluid dynamics and a supervised machine learning model has shown promising results to predict leakage in the pipeline. Mathematical modeling is used to generate simulation data where this data is used to train the leak detection and localization models. Mathematical models and simulation software have also been shown to provide comparable results with experimental data with very high levels of accuracy. While the supervised machine learning model requires a large training dataset for the development of accurate models, mathematical modeling has been shown to be able to generate the required datasets to justify the application of data analytics for the development of model-based leak detection systems for petroleum pipelines. This paper presents a review of key leak detection strategies for oil and gas pipelines, with a specific focus on crude oil applications, and presents the opportunities for the use of data analytics tools and mathematical modeling for the development of robust real-time leak detection and localization system for surface pipelines. A case study is also presented.

Keywords: pipeline, leakage, detection, AI

Procedia PDF Downloads 174

328 DNA Methylation Score Development for In utero Exposure to Paternal Smoking Using a Supervised Machine Learning Approach

Authors: Cristy Stagnar, Nina Hubig, Diana Ivankovic

Abstract:

The epigenome is a compelling candidate for mediating long-term responses to environmental effects modifying disease risk. The main goal of this research is to develop a machine learning-based DNA methylation score, which will be valuable in delineating the unique contribution of paternal epigenetic modifications to the germline impacting childhood health outcomes. It will also be a useful tool in validating self-reports of nonsmoking and in adjusting epigenome-wide DNA methylation association studies for this early-life exposure. Using secondary data from two population-based methylation profiling studies, our DNA methylation score is based on CpG DNA methylation measurements from cord blood gathered from children whose fathers smoked pre- and peri-conceptually. Each child’s mother and father fell into one of three class labels in the accompanying questionnaires -never smoker, former smoker, or current smoker. By applying different machine learning algorithms to the accessible resource for integrated epigenomic studies (ARIES) sub-study of the Avon longitudinal study of parents and children (ALSPAC) data set, which we used for training and testing of our model, the best-performing algorithm for classifying the father smoker and mother never smoker was selected based on Cohen’s κ. Error in the model was identified and optimized. The final DNA methylation score was further tested and validated in an independent data set. This resulted in a linear combination of methylation values of selected probes via a logistic link function that accurately classified each group and contributed the most towards classification. The result is a unique, robust DNA methylation score which combines information on DNA methylation and early life exposure of offspring to paternal smoking during pregnancy and which may be used to examine the paternal contribution to offspring health outcomes.

Keywords: epigenome, health outcomes, paternal preconception environmental exposures, supervised machine learning

Procedia PDF Downloads 177

327 Assessment of Rangeland Condition in a Dryland System Using UAV-Based Multispectral Imagery

Authors: Vistorina Amputu, Katja Tielboerger, Nichola Knox

Abstract:

Primary productivity in dry savannahs is constraint by moisture availability and under increasing anthropogenic pressure. Thus, considering climate change and the unprecedented pace and scale of rangeland deterioration, methods for assessing the status of such rangelands should be easy to apply, yield reliable and repeatable results that can be applied over large spatial scales. Global and local scale monitoring of rangelands through satellite data and labor-intensive field measurements respectively, are limited in accurately assessing the spatiotemporal heterogeneity of vegetation dynamics to provide crucial information that detects degradation in its early stages. Fortunately, newly emerging techniques such as unmanned aerial vehicles (UAVs), associated miniaturized sensors and improving digital photogrammetric software provide an opportunity to transcend these limitations. Yet, they have not been extensively calibrated in natural systems to encompass their complexities if they are to be integrated for long-term monitoring. Limited research using drone technology has been conducted in arid savannas, for example to assess the health status of this dynamic two-layer vegetation ecosystem. In our study, we fill this gap by testing the relationship between UAV-estimated cover of rangeland functional attributes and field data collected in discrete sample plots in a Namibian dryland savannah along a degradation gradient. The first results are based on a supervised classification performed on the ultra-high resolution multispectral imagery to distinguish between rangeland functional attributes (bare, non-woody, and woody), with a relatively good match to the field observations. Integrating UAV-based observations to improve rangeland monitoring could greatly assist in climate-adapted rangeland management.

Keywords: arid savannah, degradation gradient, field observations, narrow-band sensor, supervised classification

Procedia PDF Downloads 117

326 An Approximation Technique to Automate Tron

Authors: P. Jayashree, S. Rajkumar

Abstract:

With the trend of virtual and augmented reality environments booming to provide a life like experience, gaming is a major tool in supporting such learning environments. In this work, a variant of Voronoi heuristics, employing supervised learning for the TRON game is proposed. The paper discusses the features that would be really useful when a machine learning bot is to be used as an opponent against a human player. Various game scenarios, nature of the bot and the experimental results are provided for the proposed variant to prove that the approach is better than those that are currently followed.

Keywords: artificial Intelligence, automation, machine learning, TRON game, Voronoi heuristics

Procedia PDF Downloads 451

325 Impact of Tablet Based Learning on Continuous Assessment (ESPRIT Smart School Framework)

Authors: Mehdi Attia, Sana Ben Fadhel, Lamjed Bettaieb

Abstract:

Mobile technology has become a part of our daily lives and assist learners (despite their level and age) in their leaning process using various apparatus and mobile devices (laptop, tablets, etc.). This paper presents a new learning framework based on tablets. This solution has been developed and tested in ESPRIT “Ecole Supérieure Privée d’Igénieurie et de Technologies”, a Tunisian school of engineering. This application is named ESSF: Esprit Smart School Framework. In this work, the main features of the proposed solution are listed, particularly its impact on the learners’ evaluation process. Learner’s assessment has always been a critical component of the learning process as it measures students’ knowledge. However, traditional evaluation methods in which the learner is evaluated once or twice each year cannot reflect his real level. This is why a continuous assessment (CA) process becomes necessary. In this context we have proved that ESSF offers many important features that enhance and facilitate the implementation of the CA process.

Keywords: continuous assessment, mobile learning, tablet based learning, smart school, ESSF

Procedia PDF Downloads 317

324 A Dataset of Program Educational Objectives Mapped to ABET Outcomes: Data Cleansing, Exploratory Data Analysis and Modeling

Authors: Addin Osman, Anwar Ali Yahya, Mohammed Basit Kamal

Abstract:

Datasets or collections are becoming important assets by themselves and now they can be accepted as a primary intellectual output of a research. The quality and usage of the datasets depend mainly on the context under which they have been collected, processed, analyzed, validated, and interpreted. This paper aims to present a collection of program educational objectives mapped to student’s outcomes collected from self-study reports prepared by 32 engineering programs accredited by ABET. The manual mapping (classification) of this data is a notoriously tedious, time consuming process. In addition, it requires experts in the area, which are mostly not available. It has been shown the operational settings under which the collection has been produced. The collection has been cleansed, preprocessed, some features have been selected and preliminary exploratory data analysis has been performed so as to illustrate the properties and usefulness of the collection. At the end, the collection has been benchmarked using nine of the most widely used supervised multiclass classification techniques (Binary Relevance, Label Powerset, Classifier Chains, Pruned Sets, Random k-label sets, Ensemble of Classifier Chains, Ensemble of Pruned Sets, Multi-Label k-Nearest Neighbors and Back-Propagation Multi-Label Learning). The techniques have been compared to each other using five well-known measurements (Accuracy, Hamming Loss, Micro-F, Macro-F, and Macro-F). The Ensemble of Classifier Chains and Ensemble of Pruned Sets have achieved encouraging performance compared to other experimented multi-label classification methods. The Classifier Chains method has shown the worst performance. To recap, the benchmark has achieved promising results by utilizing preliminary exploratory data analysis performed on the collection, proposing new trends for research and providing a baseline for future studies.

Keywords: ABET, accreditation, benchmark collection, machine learning, program educational objectives, student outcomes, supervised multi-class classification, text mining

Procedia PDF Downloads 156

323 Flood Hazard Assessment and Land Cover Dynamics of the Orai Khola Watershed, Bardiya, Nepal

Authors: Loonibha Manandhar, Rajendra Bhandari, Kumud Raj Kafle

Abstract:

Nepal’s Terai region is a part of the Ganges river basin which is one of the most disaster-prone areas of the world, with recurrent monsoon flooding causing millions in damage and the death and displacement of hundreds of people and households every year. The vulnerability of human settlements to natural disasters such as floods is increasing, and mapping changes in land use practices and hydro-geological parameters is essential in developing resilient communities and strong disaster management policies. The objective of this study was to develop a flood hazard zonation map of Orai Khola watershed and map the decadal land use/land cover dynamics of the watershed. The watershed area was delineated using SRTM DEM, and LANDSAT images were classified into five land use classes (forest, grassland, sediment and bare land, settlement area and cropland, and water body) using pixel-based semi-automated supervised maximum likelihood classification. Decadal changes in each class were then quantified using spatial modelling. Flood hazard mapping was performed by assigning weights to factors slope, rainfall distribution, distance from the river and land use/land cover on the basis of their estimated influence in causing flood hazard and performing weighed overlay analysis to identify areas that are highly vulnerable. The forest and grassland coverage increased by 11.53 km² (3.8%) and 1.43 km² (0.47%) from 1996 to 2016. The sediment and bare land areas decreased by 12.45 km² (4.12%) from 1996 to 2016 whereas settlement and cropland areas showed a consistent increase to 14.22 km² (4.7%). Waterbody coverage also increased to 0.3 km² (0.09%) from 1996-2016. 1.27% (3.65 km²) of total watershed area was categorized into very low hazard zone, 20.94% (60.31 km²) area into low hazard zone, 37.59% (108.3 km²) area into moderate hazard zone, 29.25% (84.27 km²) area into high hazard zone and 31 villages which comprised 10.95% (31.55 km²) were categorized into high hazard zone area.

Keywords: flood hazard, land use/land cover, Orai river, supervised maximum likelihood classification, weighed overlay analysis

Procedia PDF Downloads 332

322 Deleterious SNP’s Detection Using Machine Learning

Authors: Hamza Zidoum

Abstract:

This paper investigates the impact of human genetic variation on the function of human proteins using machine-learning algorithms. Single-Nucleotide Polymorphism represents the most common form of human genome variation. We focus on the single amino-acid polymorphism located in the coding region as they can affect the protein function leading to pathologic phenotypic change. We use several supervised Machine Learning methods to identify structural properties correlated with increased risk of the missense mutation being damaging. SVM associated with Principal Component Analysis give the best performance.

Keywords: single-nucleotide polymorphism, machine learning, feature selection, SVM

Procedia PDF Downloads 365

321 Developing Students’ Intercultural Understanding and Awareness through Adapting an Intercultural Pedagogy in Foreign Language Teaching

Authors: Guerriche Amina

Abstract:

The recent trends in foreign language teaching -influenced widely by the process of globalization, interculturalism, and global flows and migration- are leaning towards adopting an intercultural perspective to help in developing students who are global citizens able to effectively function across diverse boundaries (cultural, social, geographical). Researchers call for intercultural learning and teaching perspective that would foster and increase intercultural awareness and understanding (e.g., Guilherme, 2002; Byram et al., 2002). The present research aims at unfolding whether including the cultural dimension in foreign language instruction can help in developing students’ intercultural understanding and awareness. In doing so, a cultural pedagogical experiment was designed and conducted for the period of one year at the level of the university. Data were collected qualitatively and analyzed thematically. Results help in drawing important implications for educational institutions, foreign language teachers, and syllabus designers about the importance and effectiveness of perceiving foreign language instruction as a social activity that can nurture interculturally competent individuals who adequately respond to the demands of today’s intercultural and globalized societies.

Keywords: foreign language teaching, intercultural awareness, language and culture, intercultural understanding

Procedia PDF Downloads 117

320 The Concept of Neurostatistics as a Neuroscience

Authors: Igwenagu Chinelo Mercy

Abstract:

This study is on the concept of Neurostatistics in relation to neuroscience. Neuroscience also known as neurobiology is the scientific study of the nervous system. In the study of neuroscience, it has been noted that brain function and its relations to the process of acquiring knowledge and behaviour can be better explained by the use of various interrelated methods. The scope of neuroscience has broadened over time to include different approaches used to study the nervous system at different scales. On the other hand, Neurostatistics based on this study is viewed as a statistical concept that uses similar techniques of neuron mechanisms to solve some problems especially in the field of life science. This study is imperative in this era of Artificial intelligence/Machine leaning in the sense that clear understanding of the technique and its proper application could assist in solving some medical disorder that are mainly associated with the nervous system. This will also help in layman’s understanding of the technique of the nervous system in order to overcome some of the health challenges associated with it. For this concept to be well understood, an illustrative example using a brain associated disorder was used for demonstration. Structural equation modelling was adopted in the analysis. The results clearly show the link between the techniques of statistical model and nervous system. Hence, based on this study, the appropriateness of Neurostatistics application in relation to neuroscience could be based on the understanding of the behavioural pattern of both concepts.

Keywords: brain, neurons, neuroscience, neurostatistics, structural equation modeling

Procedia PDF Downloads 61

319 Identification and Classification of Fiber-Fortified Semolina by Near-Infrared Spectroscopy (NIR)

Authors: Amanda T. Badaró, Douglas F. Barbin, Sofia T. Garcia, Maria Teresa P. S. Clerici, Amanda R. Ferreira

Abstract:

Food fortification is the intentional addition of a nutrient in a food matrix and has been widely used to overcome the lack of nutrients in the diet or increasing the nutritional value of food. Fortified food must meet the demand of the population, taking into account their habits and risks that these foods may cause. Wheat and its by-products, such as semolina, has been strongly indicated to be used as a food vehicle since it is widely consumed and used in the production of other foods. These products have been strategically used to add some nutrients, such as fibers. Methods of analysis and quantification of these kinds of components are destructive and require lengthy sample preparation and analysis. Therefore, the industry has searched for faster and less invasive methods, such as Near-Infrared Spectroscopy (NIR). NIR is a rapid and cost-effective method, however, it is based on indirect measurements, yielding high amount of data. Therefore, NIR spectroscopy requires calibration with mathematical and statistical tools (Chemometrics) to extract analytical information from the corresponding spectra, as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). PCA is well suited for NIR, once it can handle many spectra at a time and be used for non-supervised classification. Advantages of the PCA, which is also a data reduction technique, is that it reduces the data spectra to a smaller number of latent variables for further interpretation. On the other hand, LDA is a supervised method that searches the Canonical Variables (CV) with the maximum separation among different categories. In LDA, the first CV is the direction of maximum ratio between inter and intra-class variances. The present work used a portable infrared spectrometer (NIR) for identification and classification of pure and fiber-fortified semolina samples. The fiber was added to semolina in two different concentrations, and after the spectra acquisition, the data was used for PCA and LDA to identify and discriminate the samples. The results showed that NIR spectroscopy associate to PCA was very effective in identifying pure and fiber-fortified semolina. Additionally, the classification range of the samples using LDA was between 78.3% and 95% for calibration and 75% and 95% for cross-validation. Thus, after the multivariate analysis such as PCA and LDA, it was possible to verify that NIR associated to chemometric methods is able to identify and classify the different samples in a fast and non-destructive way.

Keywords: Chemometrics, fiber, linear discriminant analysis, near-infrared spectroscopy, principal component analysis, semolina

Procedia PDF Downloads 198

318 Classroom Incivility Behaviours among Medical Students: A Comparative Study in Pakistan

Authors: Manal Rauf

Abstract:

Trained medical practitioners are produced from medical colleges serving in public and private sectors. Prime responsibility of teaching faculty is to inculcate required work ethic among the students by serving as role models for them. It is an observed fact that classroom incivility behaviours are providing a friction in achieving these targets. Present study aimed at identification of classroom incivility behaviours observed by teachers and students of public and private medical colleges as per Glasser’s Choice Theory, making a comparison and investigating the strategies being adopted by teachers of both sectors to control undesired class room behaviours. Findings revealed that a significant difference occurs between teacher and student incivility behaviours. Public sector teacher focussed on survival as a strong factor behind in civil behaviours whereas private sector teachers considered power as the precedent for incivility. Teachers of both sectors are required to use verbal as well as non-verbal immediacy to reach a healthy leaning environment.

Keywords: classroom incivility behaviour, glasser choice theory, Mehrabian immediacy theory

Procedia PDF Downloads 222

317 Child Protection Decision Making in England and Finland: A Comparative Analysis

Authors: Rachel Falconer

Abstract:

Background: The United Nations Convention on the Rights of the Child sets out the duties placed on signatory nations to take measures to protect children from all forms of violence, abuse, neglect and maltreatment. The systems for ensuring this protection vary globally, shaped by national welfare policies. In England and Finland, past research has highlighted differences in how child protection issues are framed and how state agencies respond. However, less is known about how such differences impact processes of social work judgment and decision making in practice. Method: Data was collected as part of a wider PhD project in three stages. First, social workers in sites across England and Finland were asked to complete a short questionnaire. Participants were then asked to comment on two constructed case vignettes, and were interviewed about their experiences of child protection decision making at the point of referral. Interviews were analyzed using NVivo to draw out key themes. Findings: There were similarities in how the English and Finnish social workers responded to the case vignettes; for example, participants in both countries expressed concerns about similar risk factors and all felt further assessment was needed. Differences were observed, in particular, in regard to the sources of support and guidance participants referred to, with the English social workers appearing to rely more upon managerial input for their decisions than the Finnish social workers. These findings suggest evidence for two distinct decision making approaches: ‘supervised’ and ‘supported’ judgement. Implications for practice: The findings have relevance to the conference theme of research and evaluation of social work practice, and support the findings of previous studies that have emphasized the significance of organizational factors in child protection decision making. The comparative methodology has also helped to demonstrate how organizational factors can influence practice in different child protection system ‘orientations’. The presentation will discuss the potential practice implications of ‘supervised’, manager-led approaches to decision making as contrasted with ‘supported’, team-led approaches, inviting discussion about the relevance of these findings for social work in other countries.

Keywords: child protection, comparative research, decision making, social work, vignettes

Procedia PDF Downloads 247

316 Extracting Attributes for Twitter Hashtag Communities

Authors: Ashwaq Alsulami, Jianhua Shao

Abstract:

Various organisations often need to understand discussions on social media, such as what trending topics are and characteristics of the people engaged in the discussion. A number of approaches have been proposed to extract attributes that would characterise a discussion group. However, these approaches are largely based on supervised learning, and as such they require a large amount of labelled data. We propose an approach in this paper that does not require labelled data, but rely on lexical sources to detect meaningful attributes for online discussion groups. Our findings show an acceptable level of accuracy in detecting attributes for Twitter discussion groups.

Keywords: attributed community, attribute detection, community, social network

Procedia PDF Downloads 146

315 Landsat Data from Pre Crop Season to Estimate the Area to Be Planted with Summer Crops

Authors: Valdir Moura, Raniele dos Anjos de Souza, Fernando Gomes de Souza, Jose Vagner da Silva, Jerry Adriani Johann

Abstract:

The estimate of the Area of Land to be planted with annual crops and its stratification by the municipality are important variables in crop forecast. Nowadays in Brazil, these information’s are obtained by the Brazilian Institute of Geography and Statistics (IBGE) and published under the report Assessment of the Agricultural Production. Due to the high cloud cover in the main crop growing season (October to March) it is difficult to acquire good orbital images. Thus, one alternative is to work with remote sensing data from dates before the crop growing season. This work presents the use of multitemporal Landsat data gathered on July and September (before the summer growing season) in order to estimate the area of land to be planted with summer crops in an area of São Paulo State, Brazil. Geographic Information Systems (GIS) and digital image processing techniques were applied for the treatment of the available data. Supervised and non-supervised classifications were used for data in digital number and reflectance formats and the multitemporal Normalized Difference Vegetation Index (NDVI) images. The objective was to discriminate the tracts with higher probability to become planted with summer crops. Classification accuracies were evaluated using a sampling system developed basically for this study region. The estimated areas were corrected using the error matrix derived from these evaluations. The classification techniques presented an excellent level according to the kappa index. The proportion of crops stratified by municipalities was derived by a field work during the crop growing season. These proportion coefficients were applied onto the area of land to be planted with summer crops (derived from Landsat data). Thus, it was possible to derive the area of each summer crop by the municipality. The discrepancies between official statistics and our results were attributed to the sampling and the stratification procedures. Nevertheless, this methodology can be improved in order to provide good crop area estimates using remote sensing data, despite the cloud cover during the growing season.

Keywords: area intended for summer culture, estimated area planted, agriculture, Landsat, planting schedule

Procedia PDF Downloads 136

314 Data Mining in Medicine Domain Using Decision Trees and Vector Support Machine

Authors: Djamila Benhaddouche, Abdelkader Benyettou

Abstract:

In this paper, we used data mining to extract biomedical knowledge. In general, complex biomedical data collected in studies of populations are treated by statistical methods, although they are robust, they are not sufficient in themselves to harness the potential wealth of data. For that you used in step two learning algorithms: the Decision Trees and Support Vector Machine (SVM). These supervised classification methods are used to make the diagnosis of thyroid disease. In this context, we propose to promote the study and use of symbolic data mining techniques.

Keywords: biomedical data, learning, classifier, algorithms decision tree, knowledge extraction

Procedia PDF Downloads 541

313 How to Teach Italian Intransitive Verbs: Focusing on Unaccusatives and Unergatives

Authors: Joung Hyoun Lee

Abstract:

Intransitive verbs consist of two subclasses called unergatives and unaccusatives. However, traditionally Italian intransitive verbs have been taught regardless their semantic distinctions and any mention of grammatical terms such as unaccusatives and unergatives even though there is a huge gap between them. This paper aims to explore the teaching of Italian intransitive verbs categorizing them into unaccusatives and unergatives, which is compared with researches on the teaching of English unaccusative and unergative verbs. For this purpose, first, the study analyses various aspects of English vs. Italian unergatives and unaccusatives, and their properties of the constructions. Next, this study highlights the research trend on Korean students' learning errors, which is leaning toward causal analyses of the over passivization of English unaccusative verbs. In order to investigate these issues, 53 students of the Busan University of Foreign Studies, who are studying Italian language as a second language, were surveyed through a grammaticality judgment test divided into 9 sections. As expected, the findings confirmed that the test results of Italian unaccusatives and unergatives showed similar and different aspects comparing to those of English. Moreover, there was a highly affirmative demand for a more careful way of teaching which should be considered both syntactically and semantically according to the grammatical items. The research provides a framework of a more effective and systematic teaching method of Italian intransitive verbs for further research.

Keywords: unaccusative verbs, unergative verbs, agent, patient, theme, overpassivization

Procedia PDF Downloads 248