Search results for: KTH dataset
149 Exploring the Synergistic Effects of Aerobic Exercise and Cinnamon Extract on Metabolic Markers in Insulin-Resistant Rats through Advanced Machine Learning and Deep Learning Techniques
Authors: Masoomeh Alsadat Mirshafaei
Abstract:
The present study aims to explore the effect of an 8-week aerobic training regimen combined with cinnamon extract on serum irisin and leptin levels in insulin-resistant rats. Additionally, this research leverages various machine learning (ML) and deep learning (DL) algorithms to model the complex interdependencies between exercise, nutrition, and metabolic markers, offering a groundbreaking approach to obesity and diabetes research. Forty-eight Wistar rats were selected and randomly divided into four groups: control, training, cinnamon, and training cinnamon. The training protocol was conducted over 8 weeks, with sessions 5 days a week at 75-80% VO2 max. The cinnamon and training-cinnamon groups were injected with 200 ml/kg/day of cinnamon extract. Data analysis included serum data, dietary intake, exercise intensity, and metabolic response variables, with blood samples collected 72 hours after the final training session. The dataset was analyzed using one-way ANOVA (P<0.05) and fed into various ML and DL models, including Support Vector Machines (SVM), Random Forest (RF), and Convolutional Neural Networks (CNN). Traditional statistical methods indicated that aerobic training, with and without cinnamon extract, significantly increased serum irisin and decreased leptin levels. Among the algorithms, the CNN model provided superior performance in identifying specific interactions between cinnamon extract concentration and exercise intensity, optimizing the increase in irisin and the decrease in leptin. The CNN model achieved an accuracy of 92%, outperforming the SVM (85%) and RF (88%) models in predicting the optimal conditions for metabolic marker improvements. The study demonstrated that advanced ML and DL techniques could uncover nuanced relationships and potential cellular responses to exercise and dietary supplements, which is not evident through traditional methods. These findings advocate for the integration of advanced analytical techniques in nutritional science and exercise physiology, paving the way for personalized health interventions in managing obesity and diabetes.Keywords: aerobic training, cinnamon extract, insulin resistance, irisin, leptin, convolutional neural networks, exercise physiology, support vector machines, random forest
Procedia PDF Downloads 38148 Construction of Graph Signal Modulations via Graph Fourier Transform and Its Applications
Authors: Xianwei Zheng, Yuan Yan Tang
Abstract:
Classical window Fourier transform has been widely used in signal processing, image processing, machine learning and pattern recognition. The related Gabor transform is powerful enough to capture the texture information of any given dataset. Recently, in the emerging field of graph signal processing, researchers devoting themselves to develop a graph signal processing theory to handle the so-called graph signals. Among the new developing theory, windowed graph Fourier transform has been constructed to establish a time-frequency analysis framework of graph signals. The windowed graph Fourier transform is defined by using the translation and modulation operators of graph signals, following the similar calculations in classical windowed Fourier transform. Specifically, the translation and modulation operators of graph signals are defined by using the Laplacian eigenvectors as follows. For a given graph signal, its translation is defined by a similar manner as its definition in classical signal processing. Specifically, the translation operator can be defined by using the Fourier atoms; the graph signal translation is defined similarly by using the Laplacian eigenvectors. The modulation of the graph can also be established by using the Laplacian eigenvectors. The windowed graph Fourier transform based on these two operators has been applied to obtain time-frequency representations of graph signals. Fundamentally, the modulation operator is defined similarly to the classical modulation by multiplying a graph signal with the entries in each Fourier atom. However, a single Laplacian eigenvector entry cannot play a similar role as the Fourier atom. This definition ignored the relationship between the translation and modulation operators. In this paper, a new definition of the modulation operator is proposed and thus another time-frequency framework for graph signal is constructed. Specifically, the relationship between the translation and modulation operations can be established by the Fourier transform. Specifically, for any signal, the Fourier transform of its translation is the modulation of its Fourier transform. Thus, the modulation of any signal can be defined as the inverse Fourier transform of the translation of its Fourier transform. Therefore, similarly, the graph modulation of any graph signal can be defined as the inverse graph Fourier transform of the translation of its graph Fourier. The novel definition of the graph modulation operator established a relationship of the translation and modulation operations. The new modulation operation and the original translation operation are applied to construct a new framework of graph signal time-frequency analysis. Furthermore, a windowed graph Fourier frame theory is developed. Necessary and sufficient conditions for constructing windowed graph Fourier frames, tight frames and dual frames are presented in this paper. The novel graph signal time-frequency analysis framework is applied to signals defined on well-known graphs, e.g. Minnesota road graph and random graphs. Experimental results show that the novel framework captures new features of graph signals.Keywords: graph signals, windowed graph Fourier transform, windowed graph Fourier frames, vertex frequency analysis
Procedia PDF Downloads 342147 Assessing the Influence of Station Density on Geostatistical Prediction of Groundwater Levels in a Semi-arid Watershed of Karnataka
Authors: Sakshi Dhumale, Madhushree C., Amba Shetty
Abstract:
The effect of station density on the geostatistical prediction of groundwater levels is of critical importance to ensure accurate and reliable predictions. Monitoring station density directly impacts the accuracy and reliability of geostatistical predictions by influencing the model's ability to capture localized variations and small-scale features in groundwater levels. This is particularly crucial in regions with complex hydrogeological conditions and significant spatial heterogeneity. Insufficient station density can result in larger prediction uncertainties, as the model may struggle to adequately represent the spatial variability and correlation patterns of the data. On the other hand, an optimal distribution of monitoring stations enables effective coverage of the study area and captures the spatial variability of groundwater levels more comprehensively. In this study, we investigate the effect of station density on the predictive performance of groundwater levels using the geostatistical technique of Ordinary Kriging. The research utilizes groundwater level data collected from 121 observation wells within the semi-arid Berambadi watershed, gathered over a six-year period (2010-2015) from the Indian Institute of Science (IISc), Bengaluru. The dataset is partitioned into seven subsets representing varying sampling densities, ranging from 15% (12 wells) to 100% (121 wells) of the total well network. The results obtained from different monitoring networks are compared against the existing groundwater monitoring network established by the Central Ground Water Board (CGWB). The findings of this study demonstrate that higher station densities significantly enhance the accuracy of geostatistical predictions for groundwater levels. The increased number of monitoring stations enables improved interpolation accuracy and captures finer-scale variations in groundwater levels. These results shed light on the relationship between station density and the geostatistical prediction of groundwater levels, emphasizing the importance of appropriate station densities to ensure accurate and reliable predictions. The insights gained from this study have practical implications for designing and optimizing monitoring networks, facilitating effective groundwater level assessments, and enabling sustainable management of groundwater resources.Keywords: station density, geostatistical prediction, groundwater levels, monitoring networks, interpolation accuracy, spatial variability
Procedia PDF Downloads 60146 Optimization for Autonomous Robotic Construction by Visual Guidance through Machine Learning
Authors: Yangzhi Li
Abstract:
Network transfer of information and performance customization is now a viable method of digital industrial production in the era of Industry 4.0. Robot platforms and network platforms have grown more important in digital design and construction. The pressing need for novel building techniques is driven by the growing labor scarcity problem and increased awareness of construction safety. Robotic approaches in construction research are regarded as an extension of operational and production tools. Several technological theories related to robot autonomous recognition, which include high-performance computing, physical system modeling, extensive sensor coordination, and dataset deep learning, have not been explored using intelligent construction. Relevant transdisciplinary theory and practice research still has specific gaps. Optimizing high-performance computing and autonomous recognition visual guidance technologies improves the robot's grasp of the scene and capacity for autonomous operation. Intelligent vision guidance technology for industrial robots has a serious issue with camera calibration, and the use of intelligent visual guiding and identification technologies for industrial robots in industrial production has strict accuracy requirements. It can be considered that visual recognition systems have challenges with precision issues. In such a situation, it will directly impact the effectiveness and standard of industrial production, necessitating a strengthening of the visual guiding study on positioning precision in recognition technology. To best facilitate the handling of complicated components, an approach for the visual recognition of parts utilizing machine learning algorithms is proposed. This study will identify the position of target components by detecting the information at the boundary and corner of a dense point cloud and determining the aspect ratio in accordance with the guidelines for the modularization of building components. To collect and use components, operational processing systems assign them to the same coordinate system based on their locations and postures. The RGB image's inclination detection and the depth image's verification will be used to determine the component's present posture. Finally, a virtual environment model for the robot's obstacle-avoidance route will be constructed using the point cloud information.Keywords: robotic construction, robotic assembly, visual guidance, machine learning
Procedia PDF Downloads 86145 Design and Implementation of Generative Models for Odor Classification Using Electronic Nose
Authors: Kumar Shashvat, Amol P. Bhondekar
Abstract:
In the midst of the five senses, odor is the most reminiscent and least understood. Odor testing has been mysterious and odor data fabled to most practitioners. The delinquent of recognition and classification of odor is important to achieve. The facility to smell and predict whether the artifact is of further use or it has become undesirable for consumption; the imitation of this problem hooked on a model is of consideration. The general industrial standard for this classification is color based anyhow; odor can be improved classifier than color based classification and if incorporated in machine will be awfully constructive. For cataloging of odor for peas, trees and cashews various discriminative approaches have been used Discriminative approaches offer good prognostic performance and have been widely used in many applications but are incapable to make effectual use of the unlabeled information. In such scenarios, generative approaches have better applicability, as they are able to knob glitches, such as in set-ups where variability in the series of possible input vectors is enormous. Generative models are integrated in machine learning for either modeling data directly or as a transitional step to form an indeterminate probability density function. The algorithms or models Linear Discriminant Analysis and Naive Bayes Classifier have been used for classification of the odor of cashews. Linear Discriminant Analysis is a method used in data classification, pattern recognition, and machine learning to discover a linear combination of features that typifies or divides two or more classes of objects or procedures. The Naive Bayes algorithm is a classification approach base on Bayes rule and a set of qualified independence theory. Naive Bayes classifiers are highly scalable, requiring a number of restraints linear in the number of variables (features/predictors) in a learning predicament. The main recompenses of using the generative models are generally a Generative Models make stronger assumptions about the data, specifically, about the distribution of predictors given the response variables. The Electronic instrument which is used for artificial odor sensing and classification is an electronic nose. This device is designed to imitate the anthropological sense of odor by providing an analysis of individual chemicals or chemical mixtures. The experimental results have been evaluated in the form of the performance measures i.e. are accuracy, precision and recall. The investigational results have proven that the overall performance of the Linear Discriminant Analysis was better in assessment to the Naive Bayes Classifier on cashew dataset.Keywords: odor classification, generative models, naive bayes, linear discriminant analysis
Procedia PDF Downloads 387144 Using Arellano-Bover/Blundell-Bond Estimator in Dynamic Panel Data Analysis – Case of Finnish Housing Price Dynamics
Authors: Janne Engblom, Elias Oikarinen
Abstract:
A panel dataset is one that follows a given sample of individuals over time, and thus provides multiple observations on each individual in the sample. Panel data models include a variety of fixed and random effects models which form a wide range of linear models. A special case of panel data models are dynamic in nature. A complication regarding a dynamic panel data model that includes the lagged dependent variable is endogeneity bias of estimates. Several approaches have been developed to account for this problem. In this paper, the panel models were estimated using the Arellano-Bover/Blundell-Bond Generalized method of moments (GMM) estimator which is an extension of the Arellano-Bond model where past values and different transformations of past values of the potentially problematic independent variable are used as instruments together with other instrumental variables. The Arellano–Bover/Blundell–Bond estimator augments Arellano–Bond by making an additional assumption that first differences of instrument variables are uncorrelated with the fixed effects. This allows the introduction of more instruments and can dramatically improve efficiency. It builds a system of two equations—the original equation and the transformed one—and is also known as system GMM. In this study, Finnish housing price dynamics were examined empirically by using the Arellano–Bover/Blundell–Bond estimation technique together with ordinary OLS. The aim of the analysis was to provide a comparison between conventional fixed-effects panel data models and dynamic panel data models. The Arellano–Bover/Blundell–Bond estimator is suitable for this analysis for a number of reasons: It is a general estimator designed for situations with 1) a linear functional relationship; 2) one left-hand-side variable that is dynamic, depending on its own past realizations; 3) independent variables that are not strictly exogenous, meaning they are correlated with past and possibly current realizations of the error; 4) fixed individual effects; and 5) heteroskedasticity and autocorrelation within individuals but not across them. Based on data of 14 Finnish cities over 1988-2012 differences of short-run housing price dynamics estimates were considerable when different models and instrumenting were used. Especially, the use of different instrumental variables caused variation of model estimates together with their statistical significance. This was particularly clear when comparing estimates of OLS with different dynamic panel data models. Estimates provided by dynamic panel data models were more in line with theory of housing price dynamics.Keywords: dynamic model, fixed effects, panel data, price dynamics
Procedia PDF Downloads 1508143 Assessing Online Learning Paths in an Learning Management Systems Using a Data Mining and Machine Learning Approach
Authors: Alvaro Figueira, Bruno Cabral
Abstract:
Nowadays, students are used to be assessed through an online platform. Educators have stepped up from a period in which they endured the transition from paper to digital. The use of a diversified set of question types that range from quizzes to open questions is currently common in most university courses. In many courses, today, the evaluation methodology also fosters the students’ online participation in forums, the download, and upload of modified files, or even the participation in group activities. At the same time, new pedagogy theories that promote the active participation of students in the learning process, and the systematic use of problem-based learning, are being adopted using an eLearning system for that purpose. However, although there can be a lot of feedback from these activities to student’s, usually it is restricted to the assessments of online well-defined tasks. In this article, we propose an automatic system that informs students of abnormal deviations of a 'correct' learning path in the course. Our approach is based on the fact that by obtaining this information earlier in the semester, may provide students and educators an opportunity to resolve an eventual problem regarding the student’s current online actions towards the course. Our goal is to prevent situations that have a significant probability to lead to a poor grade and, eventually, to failing. In the major learning management systems (LMS) currently available, the interaction between the students and the system itself is registered in log files in the form of registers that mark beginning of actions performed by the user. Our proposed system uses that logged information to derive new one: the time each student spends on each activity, the time and order of the resources used by the student and, finally, the online resource usage pattern. Then, using the grades assigned to the students in previous years, we built a learning dataset that is used to feed a machine learning meta classifier. The produced classification model is then used to predict the grades a learning path is heading to, in the current year. Not only this approach serves the teacher, but also the student to receive automatic feedback on her current situation, having past years as a perspective. Our system can be applied to online courses that integrate the use of an online platform that stores user actions in a log file, and that has access to other student’s evaluations. The system is based on a data mining process on the log files and on a self-feedback machine learning algorithm that works paired with the Moodle LMS.Keywords: data mining, e-learning, grade prediction, machine learning, student learning path
Procedia PDF Downloads 122142 Integrating Data Mining with Case-Based Reasoning for Diagnosing Sorghum Anthracnose
Authors: Mariamawit T. Belete
Abstract:
Cereal production and marketing are the means of livelihood for millions of households in Ethiopia. However, cereal production is constrained by technical and socio-economic factors. Among the technical factors, cereal crop diseases are the major contributing factors to the low yield. The aim of this research is to develop an integration of data mining and knowledge based system for sorghum anthracnose disease diagnosis that assists agriculture experts and development agents to make timely decisions. Anthracnose diagnosing systems gather information from Melkassa agricultural research center and attempt to score anthracnose severity scale. Empirical research is designed for data exploration, modeling, and confirmatory procedures for testing hypothesis and prediction to draw a sound conclusion. WEKA (Waikato Environment for Knowledge Analysis) was employed for the modeling. Knowledge based system has come across a variety of approaches based on the knowledge representation method; case-based reasoning (CBR) is one of the popular approaches used in knowledge-based system. CBR is a problem solving strategy that uses previous cases to solve new problems. The system utilizes hidden knowledge extracted by employing clustering algorithms, specifically K-means clustering from sampled anthracnose dataset. Clustered cases with centroid value are mapped to jCOLIBRI, and then the integrator application is created using NetBeans with JDK 8.0.2. The important part of a case based reasoning model includes case retrieval; the similarity measuring stage, reuse; which allows domain expert to transfer retrieval case solution to suit for the current case, revise; to test the solution, and retain to store the confirmed solution to the case base for future use. Evaluation of the system was done for both system performance and user acceptance. For testing the prototype, seven test cases were used. Experimental result shows that the system achieves an average precision and recall values of 70% and 83%, respectively. User acceptance testing also performed by involving five domain experts, and an average of 83% acceptance is achieved. Although the result of this study is promising, however, further study should be done an investigation on hybrid approach such as rule based reasoning, and pictorial retrieval process are recommended.Keywords: sorghum anthracnose, data mining, case based reasoning, integration
Procedia PDF Downloads 82141 Dividend Policy in Family Controlling Firms from a Governance Perspective: Empirical Evidence in Thailand
Authors: Tanapond S.
Abstract:
Typically, most of the controlling firms are relate to family firms which are widespread and important for economic growth particularly in Asian Pacific region. The unique characteristics of the controlling families tend to play an important role in determining the corporate policies such as dividend policy. Given the complexity of the family business phenomenon, the empirical evidence has been unclear on how the families behind business groups influence dividend policy in Asian markets with the prevalent existence of cross-shareholdings and pyramidal structure. Dividend policy as one of an important determinant of firm value could also be implemented in order to examine the effect of the controlling families behind business groups on strategic decisions-making in terms of a governance perspective and agency problems. The purpose of this paper is to investigate the impact of ownership structure and concentration which are influential internal corporate governance mechanisms in family firms on dividend decision-making. Using panel data and constructing a unique dataset of family ownership and control through hand-collecting information from the nonfinancial companies listed in Stock Exchange of Thailand (SET) between 2000 and 2015, the study finds that family firms with large stakes distribute higher dividends than family firms with small stakes. Family ownership can mitigate the agency problems and the expropriation of minority investors in family firms. To provide insight into the distinguish between ownership rights and control rights, this study examines specific firm characteristics including the degrees of concentration of controlling shareholders by classifying family ownership in different categories. The results show that controlling families with large deviation between voting rights and cash flow rights have more power and affect lower dividend payment. These situations become worse when second blockholders are families. To the best knowledge of the researcher, this study is the first to examine the association between family firms’ characteristics and dividend policy from the corporate governance perspectives in Thailand with weak investor protection environment and high ownership concentration. This research also underscores the importance of family control especially in a context in which family business groups and pyramidal structure are prevalent. As a result, academics and policy makers can develop markets and corporate policies to eliminate agency problem.Keywords: agency theory, dividend policy, family control, Thailand
Procedia PDF Downloads 290140 Profiling Risky Code Using Machine Learning
Authors: Zunaira Zaman, David Bohannon
Abstract:
This study explores the application of machine learning (ML) for detecting security vulnerabilities in source code. The research aims to assist organizations with large application portfolios and limited security testing capabilities in prioritizing security activities. ML-based approaches offer benefits such as increased confidence scores, false positives and negatives tuning, and automated feedback. The initial approach using natural language processing techniques to extract features achieved 86% accuracy during the training phase but suffered from overfitting and performed poorly on unseen datasets during testing. To address these issues, the study proposes using the abstract syntax tree (AST) for Java and C++ codebases to capture code semantics and structure and generate path-context representations for each function. The Code2Vec model architecture is used to learn distributed representations of source code snippets for training a machine-learning classifier for vulnerability prediction. The study evaluates the performance of the proposed methodology using two datasets and compares the results with existing approaches. The Devign dataset yielded 60% accuracy in predicting vulnerable code snippets and helped resist overfitting, while the Juliet Test Suite predicted specific vulnerabilities such as OS-Command Injection, Cryptographic, and Cross-Site Scripting vulnerabilities. The Code2Vec model achieved 75% accuracy and a 98% recall rate in predicting OS-Command Injection vulnerabilities. The study concludes that even partial AST representations of source code can be useful for vulnerability prediction. The approach has the potential for automated intelligent analysis of source code, including vulnerability prediction on unseen source code. State-of-the-art models using natural language processing techniques and CNN models with ensemble modelling techniques did not generalize well on unseen data and faced overfitting issues. However, predicting vulnerabilities in source code using machine learning poses challenges such as high dimensionality and complexity of source code, imbalanced datasets, and identifying specific types of vulnerabilities. Future work will address these challenges and expand the scope of the research.Keywords: code embeddings, neural networks, natural language processing, OS command injection, software security, code properties
Procedia PDF Downloads 107139 Surface Elevation Dynamics Assessment Using Digital Elevation Models, Light Detection and Ranging, GPS and Geospatial Information Science Analysis: Ecosystem Modelling Approach
Authors: Ali K. M. Al-Nasrawi, Uday A. Al-Hamdany, Sarah M. Hamylton, Brian G. Jones, Yasir M. Alyazichi
Abstract:
Surface elevation dynamics have always responded to disturbance regimes. Creating Digital Elevation Models (DEMs) to detect surface dynamics has led to the development of several methods, devices and data clouds. DEMs can provide accurate and quick results with cost efficiency, in comparison to the inherited geomatics survey techniques. Nowadays, remote sensing datasets have become a primary source to create DEMs, including LiDAR point clouds with GIS analytic tools. However, these data need to be tested for error detection and correction. This paper evaluates various DEMs from different data sources over time for Apple Orchard Island, a coastal site in southeastern Australia, in order to detect surface dynamics. Subsequently, 30 chosen locations were examined in the field to test the error of the DEMs surface detection using high resolution global positioning systems (GPSs). Results show significant surface elevation changes on Apple Orchard Island. Accretion occurred on most of the island while surface elevation loss due to erosion is limited to the northern and southern parts. Concurrently, the projected differential correction and validation method aimed to identify errors in the dataset. The resultant DEMs demonstrated a small error ratio (≤ 3%) from the gathered datasets when compared with the fieldwork survey using RTK-GPS. As modern modelling approaches need to become more effective and accurate, applying several tools to create different DEMs on a multi-temporal scale would allow easy predictions in time-cost-frames with more comprehensive coverage and greater accuracy. With a DEM technique for the eco-geomorphic context, such insights about the ecosystem dynamic detection, at such a coastal intertidal system, would be valuable to assess the accuracy of the predicted eco-geomorphic risk for the conservation management sustainability. Demonstrating this framework to evaluate the historical and current anthropogenic and environmental stressors on coastal surface elevation dynamism could be profitably applied worldwide.Keywords: DEMs, eco-geomorphic-dynamic processes, geospatial Information Science, remote sensing, surface elevation changes,
Procedia PDF Downloads 267138 In silico Designing of Imidazo [4,5-b] Pyridine as a Probable Lead for Potent Decaprenyl Phosphoryl-β-D-Ribose 2′-Epimerase (DprE1) Inhibitors as Antitubercular Agents
Authors: Jineetkumar Gawad, Chandrakant Bonde
Abstract:
Tuberculosis (TB) is a major worldwide concern whose control has been exacerbated by HIV, the rise of multidrug-resistance (MDR-TB) and extensively drug resistance (XDR-TB) strains of Mycobacterium tuberculosis. The interest for newer and faster acting antitubercular drugs are more remarkable than any time. To search potent compounds is need and challenge for researchers. Here, we tried to design lead for inhibition of Decaprenyl phosphoryl-β-D-ribose 2′-epimerase (DprE1) enzyme. Arabinose is an essential constituent of mycobacterial cell wall. DprE1 is a flavoenzyme that converts decaprenylphosphoryl-D-ribose into decaprenylphosphoryl-2-keto-ribose, which is intermediate in biosynthetic pathway of arabinose. Latter, DprE2 converts keto-ribose into decaprenylphosphoryl-D-arabinose. We had a selection of 23 compounds from azaindole series for computational study, and they were drawn using marvisketch. Ligands were prepared using Maestro molecular modeling interface, Schrodinger, v10.5. Common pharmacophore hypotheses were developed by applying dataset thresholds to yield active and inactive set of compounds. There were 326 hypotheses were developed. On the basis of survival score, ADRRR (Survival Score: 5.453) was selected. Selected pharmacophore hypotheses were subjected to virtual screening results into 1000 hits. Hits were prepared and docked with protein 4KW5 (oxydoreductase inhibitor) was downloaded in .pdb format from RCSB Protein Data Bank. Protein was prepared using protein preparation wizard. Protein was preprocessed, the workspace was analyzed using force field OPLS 2005. Glide grid was generated by picking single atom in molecule. Prepared ligands were docked with prepared protein 4KW5 using Glide docking. After docking, on the basis of glide score top-five compounds were selected, (5223, 5812, 0661, 0662, and 2945) and the glide docking score (-8.928, -8.534, -8.412, -8.411, -8.351) respectively. There were interactions of ligand and protein, specifically HIS 132, LYS 418, TRY 230, ASN 385. Pi-pi stacking was observed in few compounds with basic Imidazo [4,5-b] pyridine ring. We had basic azaindole ring in parent compounds, but after glide docking, we received compounds with Imidazo [4,5-b] pyridine as a basic ring. That might be the new lead in the process of drug discovery.Keywords: DprE1 inhibitors, in silico drug designing, imidazo [4, 5-b] pyridine, lead, tuberculosis
Procedia PDF Downloads 154137 Tumor Size and Lymph Node Metastasis Detection in Colon Cancer Patients Using MR Images
Authors: Mohammadreza Hedyehzadeh, Mahdi Yousefi
Abstract:
Colon cancer is one of the most common cancer, which predicted to increase its prevalence due to the bad eating habits of peoples. Nowadays, due to the busyness of people, the use of fast foods is increasing, and therefore, diagnosis of this disease and its treatment are of particular importance. To determine the best treatment approach for each specific colon cancer patients, the oncologist should be known the stage of the tumor. The most common method to determine the tumor stage is TNM staging system. In this system, M indicates the presence of metastasis, N indicates the extent of spread to the lymph nodes, and T indicates the size of the tumor. It is clear that in order to determine all three of these parameters, an imaging method must be used, and the gold standard imaging protocols for this purpose are CT and PET/CT. In CT imaging, due to the use of X-rays, the risk of cancer and the absorbed dose of the patient is high, while in the PET/CT method, there is a lack of access to the device due to its high cost. Therefore, in this study, we aimed to estimate the tumor size and the extent of its spread to the lymph nodes using MR images. More than 1300 MR images collected from the TCIA portal, and in the first step (pre-processing), histogram equalization to improve image qualities and resizing to get the same image size was done. Two expert radiologists, which work more than 21 years on colon cancer cases, segmented the images and extracted the tumor region from the images. The next step is feature extraction from segmented images and then classify the data into three classes: T0N0، T3N1 و T3N2. In this article, the VGG-16 convolutional neural network has been used to perform both of the above-mentioned tasks, i.e., feature extraction and classification. This network has 13 convolution layers for feature extraction and three fully connected layers with the softmax activation function for classification. In order to validate the proposed method, the 10-fold cross validation method used in such a way that the data was randomly divided into three parts: training (70% of data), validation (10% of data) and the rest for testing. It is repeated 10 times, each time, the accuracy, sensitivity and specificity of the model are calculated and the average of ten repetitions is reported as the result. The accuracy, specificity and sensitivity of the proposed method for testing dataset was 89/09%, 95/8% and 96/4%. Compared to previous studies, using a safe imaging technique (MRI) and non-use of predefined hand-crafted imaging features to determine the stage of colon cancer patients are some of the study advantages.Keywords: colon cancer, VGG-16, magnetic resonance imaging, tumor size, lymph node metastasis
Procedia PDF Downloads 59136 Optimizing Production Yield Through Process Parameter Tuning Using Deep Learning Models: A Case Study in Precision Manufacturing
Authors: Tolulope Aremu
Abstract:
This paper is based on the idea of using deep learning methodology for optimizing production yield by tuning a few key process parameters in a manufacturing environment. The study was explicitly on how to maximize production yield and minimize operational costs by utilizing advanced neural network models, specifically Long Short-Term Memory and Convolutional Neural Networks. These models were implemented using Python-based frameworks—TensorFlow and Keras. The targets of the research are the precision molding processes in which temperature ranges between 150°C and 220°C, the pressure ranges between 5 and 15 bar, and the material flow rate ranges between 10 and 50 kg/h, which are critical parameters that have a great effect on yield. A dataset of 1 million production cycles has been considered for five continuous years, where detailed logs are present showing the exact setting of parameters and yield output. The LSTM model would model time-dependent trends in production data, while CNN analyzed the spatial correlations between parameters. Models are designed in a supervised learning manner. For the model's loss, an MSE loss function is used, optimized through the Adam optimizer. After running a total of 100 training epochs, 95% accuracy was achieved by the models recommending optimal parameter configurations. Results indicated that with the use of RSM and DOE traditional methods, there was an increase in production yield of 12%. Besides, the error margin was reduced by 8%, hence consistent quality products from the deep learning models. The monetary value was annually around $2.5 million, the cost saved from material waste, energy consumption, and equipment wear resulting from the implementation of optimized process parameters. This system was deployed in an industrial production environment with the help of a hybrid cloud system: Microsoft Azure, for data storage, and the training and deployment of their models were performed on Google Cloud AI. The functionality of real-time monitoring of the process and automatic tuning of parameters depends on cloud infrastructure. To put it into perspective, deep learning models, especially those employing LSTM and CNN, optimize the production yield by fine-tuning process parameters. Future research will consider reinforcement learning with a view to achieving further enhancement of system autonomy and scalability across various manufacturing sectors.Keywords: production yield optimization, deep learning, tuning of process parameters, LSTM, CNN, precision manufacturing, TensorFlow, Keras, cloud infrastructure, cost saving
Procedia PDF Downloads 32135 Comparison of Parametric and Bayesian Survival Regression Models in Simulated and HIV Patient Antiretroviral Therapy Data: Case Study of Alamata Hospital, North Ethiopia
Authors: Zeytu G. Asfaw, Serkalem K. Abrha, Demisew G. Degefu
Abstract:
Background: HIV/AIDS remains a major public health problem in Ethiopia and heavily affecting people of productive and reproductive age. We aimed to compare the performance of Parametric Survival Analysis and Bayesian Survival Analysis using simulations and in a real dataset application focused on determining predictors of HIV patient survival. Methods: A Parametric Survival Models - Exponential, Weibull, Log-normal, Log-logistic, Gompertz and Generalized gamma distributions were considered. Simulation study was carried out with two different algorithms that were informative and noninformative priors. A retrospective cohort study was implemented for HIV infected patients under Highly Active Antiretroviral Therapy in Alamata General Hospital, North Ethiopia. Results: A total of 320 HIV patients were included in the study where 52.19% females and 47.81% males. According to Kaplan-Meier survival estimates for the two sex groups, females has shown better survival time in comparison with their male counterparts. The median survival time of HIV patients was 79 months. During the follow-up period 89 (27.81%) deaths and 231 (72.19%) censored individuals registered. The average baseline cluster of differentiation 4 (CD4) cells count for HIV/AIDS patients were 126.01 but after a three-year antiretroviral therapy follow-up the average cluster of differentiation 4 (CD4) cells counts were 305.74, which was quite encouraging. Age, functional status, tuberculosis screen, past opportunistic infection, baseline cluster of differentiation 4 (CD4) cells, World Health Organization clinical stage, sex, marital status, employment status, occupation type, baseline weight were found statistically significant factors for longer survival of HIV patients. The standard error of all covariate in Bayesian log-normal survival model is less than the classical one. Hence, Bayesian survival analysis showed better performance than classical parametric survival analysis, when subjective data analysis was performed by considering expert opinions and historical knowledge about the parameters. Conclusions: Thus, HIV/AIDS patient mortality rate could be reduced through timely antiretroviral therapy with special care on the potential factors. Moreover, Bayesian log-normal survival model was preferable than the classical log-normal survival model for determining predictors of HIV patients survival.Keywords: antiretroviral therapy (ART), Bayesian analysis, HIV, log-normal, parametric survival models
Procedia PDF Downloads 196134 Computational Characterization of Electronic Charge Transfer in Interfacial Phospholipid-Water Layers
Authors: Samira Baghbanbari, A. B. P. Lever, Payam S. Shabestari, Donald Weaver
Abstract:
Existing signal transmission models, although undoubtedly useful, have proven insufficient to explain the full complexity of information transfer within the central nervous system. The development of transformative models will necessitate a more comprehensive understanding of neuronal lipid membrane electrophysiology. Pursuant to this goal, the role of highly organized interfacial phospholipid-water layers emerges as a promising case study. A series of phospholipids in neural-glial gap junction interfaces as well as cholesterol molecules have been computationally modelled using high-performance density functional theory (DFT) calculations. Subsequent 'charge decomposition analysis' calculations have revealed a net transfer of charge from phospholipid orbitals through the organized interfacial water layer before ultimately finding its way to cholesterol acceptor molecules. The specific pathway of charge transfer from phospholipid via water layers towards cholesterol has been mapped in detail. Cholesterol is an essential membrane component that is overrepresented in neuronal membranes as compared to other mammalian cells; given this relative abundance, its apparent role as an electronic acceptor may prove to be a relevant factor in further signal transmission studies of the central nervous system. The timescales over which this electronic charge transfer occurs have also been evaluated by utilizing a system design that systematically increases the number of water molecules separating lipids and cholesterol. Memory loss through hydrogen-bonded networks in water can occur at femtosecond timescales, whereas existing action potential-based models are limited to micro or nanosecond scales. As such, the development of future models that attempt to explain faster timescale signal transmission in the central nervous system may benefit from our work, which provides additional information regarding fast timescale energy transfer mechanisms occurring through interfacial water. The study possesses a dataset that includes six distinct phospholipids and a collection of cholesterol. Ten optimized geometric characteristics (features) were employed to conduct binary classification through an artificial neural network (ANN), differentiating cholesterol from the various phospholipids. This stems from our understanding that all lipids within the first group function as electronic charge donors, while cholesterol serves as an electronic charge acceptor.Keywords: charge transfer, signal transmission, phospholipids, water layers, ANN
Procedia PDF Downloads 73133 Influence of Atmospheric Circulation Patterns on Dust Pollution Transport during the Harmattan Period over West Africa
Authors: Ayodeji Oluleye
Abstract:
This study used Total Ozone Mapping Spectrometer (TOMS) Aerosol Index (AI) and reanalysis dataset of thirty years (1983-2012) to investigate the influence of the atmospheric circulation on dust transport during the Harmattan period over WestAfrica using TOMS data. The Harmattan dust mobilization and atmospheric circulation pattern were evaluated using a kernel density estimate which shows the areas where most points are concentrated between the variables. The evolution of the Inter-Tropical Discontinuity (ITD), Sea surface Temperature (SST) over the Gulf of Guinea, and the North Atlantic Oscillation (NAO) index during the Harmattan period (November-March) was also analyzed and graphs of the average ITD positions, SST and the NAO were observed on daily basis. The Pearson moment correlation analysis was also employed to assess the effect of atmospheric circulation on Harmattan dust transport. The results show that the departure (increased) of TOMS AI values from the long-term mean (1.64) occurred from around 21st of December, which signifies the rich dust days during winter period. Strong TOMS AI signal were observed from January to March with the maximum occurring in the latter months (February and March). The inter-annual variability of TOMSAI revealed that the rich dust years were found between 1984-1985, 1987-1988, 1997-1998, 1999-2000, and 2002-2004. Significantly, poor dust year was found between 2005 and 2006 in all the periods. The study has found strong north-easterly (NE) trade winds were over most of the Sahelianregion of West Africa during the winter months with the maximum wind speed reaching 8.61m/s inJanuary.The strength of NE winds determines the extent of dust transport to the coast of Gulf of Guinea during winter. This study has confirmed that the presence of the Harmattan is strongly dependent on theSST over Atlantic Ocean and ITD position. The locus of the average SST and ITD positions over West Africa could be described by polynomial functions. The study concludes that the evolution of near surface wind field at 925 hpa, and the variations of SST and ITD positions are the major large scale atmospheric circulation systems driving the emission, distribution, and transport of Harmattan dust aerosols over West Africa. However, the influence of NAO was shown to have fewer significance effects on the Harmattan dust transport over the region.Keywords: atmospheric circulation, dust aerosols, Harmattan, West Africa
Procedia PDF Downloads 310132 Different Data-Driven Bivariate Statistical Approaches to Landslide Susceptibility Mapping (Uzundere, Erzurum, Turkey)
Authors: Azimollah Aleshzadeh, Enver Vural Yavuz
Abstract:
The main goal of this study is to produce landslide susceptibility maps using different data-driven bivariate statistical approaches; namely, entropy weight method (EWM), evidence belief function (EBF), and information content model (ICM), at Uzundere county, Erzurum province, in the north-eastern part of Turkey. Past landslide occurrences were identified and mapped from an interpretation of high-resolution satellite images, and earlier reports as well as by carrying out field surveys. In total, 42 landslide incidence polygons were mapped using ArcGIS 10.4.1 software and randomly split into a construction dataset 70 % (30 landslide incidences) for building the EWM, EBF, and ICM models and the remaining 30 % (12 landslides incidences) were used for verification purposes. Twelve layers of landslide-predisposing parameters were prepared, including total surface radiation, maximum relief, soil groups, standard curvature, distance to stream/river sites, distance to the road network, surface roughness, land use pattern, engineering geological rock group, topographical elevation, the orientation of slope, and terrain slope gradient. The relationships between the landslide-predisposing parameters and the landslide inventory map were determined using different statistical models (EWM, EBF, and ICM). The model results were validated with landslide incidences, which were not used during the model construction. In addition, receiver operating characteristic curves were applied, and the area under the curve (AUC) was determined for the different susceptibility maps using the success (construction data) and prediction (verification data) rate curves. The results revealed that the AUC for success rates are 0.7055, 0.7221, and 0.7368, while the prediction rates are 0.6811, 0.6997, and 0.7105 for EWM, EBF, and ICM models, respectively. Consequently, landslide susceptibility maps were classified into five susceptibility classes, including very low, low, moderate, high, and very high. Additionally, the portion of construction and verification landslides incidences in high and very high landslide susceptibility classes in each map was determined. The results showed that the EWM, EBF, and ICM models produced satisfactory accuracy. The obtained landslide susceptibility maps may be useful for future natural hazard mitigation studies and planning purposes for environmental protection.Keywords: entropy weight method, evidence belief function, information content model, landslide susceptibility mapping
Procedia PDF Downloads 132131 Spectrogram Pre-Processing to Improve Isotopic Identification to Discriminate Gamma and Neutrons Sources
Authors: Mustafa Alhamdi
Abstract:
Industrial application to classify gamma rays and neutron events is investigated in this study using deep machine learning. The identification using a convolutional neural network and recursive neural network showed a significant improvement in predication accuracy in a variety of applications. The ability to identify the isotope type and activity from spectral information depends on feature extraction methods, followed by classification. The features extracted from the spectrum profiles try to find patterns and relationships to present the actual spectrum energy in low dimensional space. Increasing the level of separation between classes in feature space improves the possibility to enhance classification accuracy. The nonlinear nature to extract features by neural network contains a variety of transformation and mathematical optimization, while principal component analysis depends on linear transformations to extract features and subsequently improve the classification accuracy. In this paper, the isotope spectrum information has been preprocessed by finding the frequencies components relative to time and using them as a training dataset. Fourier transform implementation to extract frequencies component has been optimized by a suitable windowing function. Training and validation samples of different isotope profiles interacted with CdTe crystal have been simulated using Geant4. The readout electronic noise has been simulated by optimizing the mean and variance of normal distribution. Ensemble learning by combing voting of many models managed to improve the classification accuracy of neural networks. The ability to discriminate gamma and neutron events in a single predication approach using deep machine learning has shown high accuracy using deep learning. The paper findings show the ability to improve the classification accuracy by applying the spectrogram preprocessing stage to the gamma and neutron spectrums of different isotopes. Tuning deep machine learning models by hyperparameter optimization of neural network models enhanced the separation in the latent space and provided the ability to extend the number of detected isotopes in the training database. Ensemble learning contributed significantly to improve the final prediction.Keywords: machine learning, nuclear physics, Monte Carlo simulation, noise estimation, feature extraction, classification
Procedia PDF Downloads 150130 Systematic Evaluation of Convolutional Neural Network on Land Cover Classification from Remotely Sensed Images
Authors: Eiman Kattan, Hong Wei
Abstract:
In using Convolutional Neural Network (CNN) for classification, there is a set of hyperparameters available for the configuration purpose. This study aims to evaluate the impact of a range of parameters in CNN architecture i.e. AlexNet on land cover classification based on four remotely sensed datasets. The evaluation tests the influence of a set of hyperparameters on the classification performance. The parameters concerned are epoch values, batch size, and convolutional filter size against input image size. Thus, a set of experiments were conducted to specify the effectiveness of the selected parameters using two implementing approaches, named pertained and fine-tuned. We first explore the number of epochs under several selected batch size values (32, 64, 128 and 200). The impact of kernel size of convolutional filters (1, 3, 5, 7, 10, 15, 20, 25 and 30) was evaluated against the image size under testing (64, 96, 128, 180 and 224), which gave us insight of the relationship between the size of convolutional filters and image size. To generalise the validation, four remote sensing datasets, AID, RSD, UCMerced and RSCCN, which have different land covers and are publicly available, were used in the experiments. These datasets have a wide diversity of input data, such as number of classes, amount of labelled data, and texture patterns. A specifically designed interactive deep learning GPU training platform for image classification (Nvidia Digit) was employed in the experiments. It has shown efficiency in both training and testing. The results have shown that increasing the number of epochs leads to a higher accuracy rate, as expected. However, the convergence state is highly related to datasets. For the batch size evaluation, it has shown that a larger batch size slightly decreases the classification accuracy compared to a small batch size. For example, selecting the value 32 as the batch size on the RSCCN dataset achieves the accuracy rate of 90.34 % at the 11th epoch while decreasing the epoch value to one makes the accuracy rate drop to 74%. On the other extreme, setting an increased value of batch size to 200 decreases the accuracy rate at the 11th epoch is 86.5%, and 63% when using one epoch only. On the other hand, selecting the kernel size is loosely related to data set. From a practical point of view, the filter size 20 produces 70.4286%. The last performed image size experiment shows a dependency in the accuracy improvement. However, an expensive performance gain had been noticed. The represented conclusion opens the opportunities toward a better classification performance in various applications such as planetary remote sensing.Keywords: CNNs, hyperparamters, remote sensing, land cover, land use
Procedia PDF Downloads 169129 Cartographic Depiction and Visualization of Wetlands Changes in the North-Western States of India
Authors: Bansal Ashwani
Abstract:
Cartographic depiction and visualization of wetland changes is an important tool to map spatial-temporal information about the wetland dynamics effectively and to comprehend the response of these water bodies in maintaining the groundwater and surrounding ecosystem. This is true for the states of North Western India, i.e., J&K, Himachal, Punjab, and Haryana that are bestowed upon with several natural wetlands in the flood plains or on the courses of its rivers. Thus, the present study documents, analyses and reconstructs the lost wetlands, which existed in the flood plains of the major river basins of these states, i.e., Chenab, Jhelum, Satluj, Beas, Ravi, and Ghagar, in the beginning of the 20th century. To achieve the objective, the study has used multi-temporal datasets since the 1960s using high to medium resolution satellite datasets, e.g., Corona (1960s/70s), Landsat (1990s-2017) and Sentinel (2017). The Sentinel (2017) satellite image has been used for making the wetland inventory owing to its comparatively higher spatial resolution with multi-spectral bands. In addition, historical records, repeated photographs, historical maps, field observations including geomorphological evidence were also used. The water index techniques, i.e., band rationing, normalized difference water index (NDWI), modified NDWI (MNDWI) have been compared and used to map the wetlands. The wetland types found in the north-western states have been categorized under 19 classes suggested by Space Application Centre, India. These enable the researcher to provide with the wetlands inventory and a series of cartographic representation that includes overlaying multiple temporal wetlands extent vectors. A preliminary result shows the general state of wetland shrinkage since the 1960s with varying area shrinkage rate from one wetland to another. In addition, it is observed that majority of wetlands have not been documented so far and even do not have names. Moreover, the purpose is to emphasize their elimination in addition to establishing a baseline dataset that can be a tool for wetland planning and management. Finally, the applicability of cartographic depiction and visualization, historical map sources, repeated photographs and remote sensing data for reconstruction of long term wetlands fluctuations, especially in the northern part of India, will be addressed.Keywords: cartographic depiction and visualization, wetland changes, NDWI/MDWI, geomorphological evidence and remote sensing
Procedia PDF Downloads 264128 Escalation of Commitment and Turnover in Top Management Teams
Authors: Dmitriy V. Chulkov
Abstract:
Escalation of commitment is defined as continuation of a project after receiving negative information about it. While literature in management and psychology identified various factors contributing to escalation behavior, this phenomenon has received little analysis in economics, potentially due to the apparent irrationality of escalation. In this study, we present an economic model of escalation with asymmetric information in a principal-agent setup where the agents are responsible for a project selection decision and discover the outcome of the project before the principal. Our theoretical model complements the existing literature on several accounts. First, we link the incentive to escalate commitment to a project with the turnover decision by the manager. When a manager learns the outcome of the project and stops it that reveals that a mistake was made. There is an incentive to continue failing projects and avoid admitting the mistake. This incentive is enhanced when the agent may voluntarily resign from the firm before the outcome of the failing project is revealed, and thus not bear the full extent of reputation damage due to project failure. As long as some successful managers leave the firm for extraneous reasons, outside firms find it difficult to link failing projects with certainty to managers that left a firm. Second, we demonstrate that non-CEO managers have reputation concerns separate from those of the CEO, and thus may escalate commitment to projects they oversee, when such escalation can attenuate damage to reputation from impending project failure. Such incentive for escalation will be present for non-CEO managers if the CEO delegates responsibility for a project to a non-CEO executive. If reputation matters for promotion to the CEO, the incentive for a rising executive to escalate in order to protect reputation is distinct from that of a CEO. Third, our theoretical model is supported by empirical analysis of changes in the firm’s operations measured by the presence of discontinued operations at the time of turnover among the top four members of the top management team. Discontinued operations are indicative of termination of failing projects at a firm. The empirical results demonstrate that in a large dataset of over three thousand publicly traded U.S. firms for a period from 1993 to 2014 turnover by top executives significantly increases the likelihood that the firm discontinues operations. Furthermore, the type of turnover matters as this effect is strongest when at least one non-CEO member of the top management team leaves the firm and when the CEO departure is due to a voluntary resignation and not to a retirement or illness. Empirical results are consistent with the predictions of the theoretical model and suggest that escalation of commitment is primarily observed in decisions by non-CEO members of the top management team.Keywords: discontinued operations, escalation of commitment, executive turnover, top management teams
Procedia PDF Downloads 365127 Application of Improved Semantic Communication Technology in Remote Sensing Data Transmission
Authors: Tingwei Shu, Dong Zhou, Chengjun Guo
Abstract:
Semantic communication is an emerging form of communication that realize intelligent communication by extracting semantic information of data at the source and transmitting it, and recovering the data at the receiving end. It can effectively solve the problem of data transmission under the situation of large data volume, low SNR and restricted bandwidth. With the development of Deep Learning, semantic communication further matures and is gradually applied in the fields of the Internet of Things, Uumanned Air Vehicle cluster communication, remote sensing scenarios, etc. We propose an improved semantic communication system for the situation where the data volume is huge and the spectrum resources are limited during the transmission of remote sensing images. At the transmitting, we need to extract the semantic information of remote sensing images, but there are some problems. The traditional semantic communication system based on Convolutional Neural Network cannot take into account the global semantic information and local semantic information of the image, which results in less-than-ideal image recovery at the receiving end. Therefore, we adopt the improved vision-Transformer-based structure as the semantic encoder instead of the mainstream one using CNN to extract the image semantic features. In this paper, we first perform pre-processing operations on remote sensing images to improve the resolution of the images in order to obtain images with more semantic information. We use wavelet transform to decompose the image into high-frequency and low-frequency components, perform bilinear interpolation on the high-frequency components and bicubic interpolation on the low-frequency components, and finally perform wavelet inverse transform to obtain the preprocessed image. We adopt the improved Vision-Transformer structure as the semantic coder to extract and transmit the semantic information of remote sensing images. The Vision-Transformer structure can better train the huge data volume and extract better image semantic features, and adopt the multi-layer self-attention mechanism to better capture the correlation between semantic features and reduce redundant features. Secondly, to improve the coding efficiency, we reduce the quadratic complexity of the self-attentive mechanism itself to linear so as to improve the image data processing speed of the model. We conducted experimental simulations on the RSOD dataset and compared the designed system with a semantic communication system based on CNN and image coding methods such as BGP and JPEG to verify that the method can effectively alleviate the problem of excessive data volume and improve the performance of image data communication.Keywords: semantic communication, transformer, wavelet transform, data processing
Procedia PDF Downloads 78126 Applying GIS Geographic Weighted Regression Analysis to Assess Local Factors Impeding Smallholder Farmers from Participating in Agribusiness Markets: A Case Study of Vihiga County, Western Kenya
Authors: Mwehe Mathenge, Ben G. J. S. Sonneveld, Jacqueline E. W. Broerse
Abstract:
Smallholder farmers are important drivers of agriculture productivity, food security, and poverty reduction in Sub-Saharan Africa. However, they are faced with myriad challenges in their efforts at participating in agribusiness markets. How the geographic explicit factors existing at the local level interact to impede smallholder farmers' decision to participates (or not) in agribusiness markets is not well understood. Deconstructing the spatial complexity of the local environment could provide a deeper insight into how geographically explicit determinants promote or impede resource-poor smallholder farmers from participating in agribusiness. This paper’s objective was to identify, map, and analyze local spatial autocorrelation in factors that impede poor smallholders from participating in agribusiness markets. Data were collected using geocoded researcher-administered survey questionnaires from 392 households in Western Kenya. Three spatial statistics methods in geographic information system (GIS) were used to analyze data -Global Moran’s I, Cluster and Outliers Analysis (Anselin Local Moran’s I), and geographically weighted regression. The results of Global Moran’s I reveal the presence of spatial patterns in the dataset that was not caused by spatial randomness of data. Subsequently, Anselin Local Moran’s I result identified spatially and statistically significant local spatial clustering (hot spots and cold spots) in factors hindering smallholder participation. Finally, the geographically weighted regression results unearthed those specific geographic explicit factors impeding market participation in the study area. The results confirm that geographically explicit factors are indispensable in influencing the smallholder farming decisions, and policymakers should take cognizance of them. Additionally, this research demonstrated how geospatial explicit analysis conducted at the local level, using geographically disaggregated data, could help in identifying households and localities where the most impoverished and resource-poor smallholder households reside. In designing spatially targeted interventions, policymakers could benefit from geospatial analysis methods in understanding complex geographic factors and processes that interact to influence smallholder farmers' decision-making processes and choices.Keywords: agribusiness markets, GIS, smallholder farmers, spatial statistics, disaggregated spatial data
Procedia PDF Downloads 139125 Real Estate Trend Prediction with Artificial Intelligence Techniques
Authors: Sophia Liang Zhou
Abstract:
For investors, businesses, consumers, and governments, an accurate assessment of future housing prices is crucial to critical decisions in resource allocation, policy formation, and investment strategies. Previous studies are contradictory about macroeconomic determinants of housing price and largely focused on one or two areas using point prediction. This study aims to develop data-driven models to accurately predict future housing market trends in different markets. This work studied five different metropolitan areas representing different market trends and compared three-time lagging situations: no lag, 6-month lag, and 12-month lag. Linear regression (LR), random forest (RF), and artificial neural network (ANN) were employed to model the real estate price using datasets with S&P/Case-Shiller home price index and 12 demographic and macroeconomic features, such as gross domestic product (GDP), resident population, personal income, etc. in five metropolitan areas: Boston, Dallas, New York, Chicago, and San Francisco. The data from March 2005 to December 2018 were collected from the Federal Reserve Bank, FBI, and Freddie Mac. In the original data, some factors are monthly, some quarterly, and some yearly. Thus, two methods to compensate missing values, backfill or interpolation, were compared. The models were evaluated by accuracy, mean absolute error, and root mean square error. The LR and ANN models outperformed the RF model due to RF’s inherent limitations. Both ANN and LR methods generated predictive models with high accuracy ( > 95%). It was found that personal income, GDP, population, and measures of debt consistently appeared as the most important factors. It also showed that technique to compensate missing values in the dataset and implementation of time lag can have a significant influence on the model performance and require further investigation. The best performing models varied for each area, but the backfilled 12-month lag LR models and the interpolated no lag ANN models showed the best stable performance overall, with accuracies > 95% for each city. This study reveals the influence of input variables in different markets. It also provides evidence to support future studies to identify the optimal time lag and data imputing methods for establishing accurate predictive models.Keywords: linear regression, random forest, artificial neural network, real estate price prediction
Procedia PDF Downloads 103124 Frequent Pattern Mining for Digenic Human Traits
Authors: Atsuko Okazaki, Jurg Ott
Abstract:
Some genetic diseases (‘digenic traits’) are due to the interaction between two DNA variants. For example, certain forms of Retinitis Pigmentosa (a genetic form of blindness) occur in the presence of two mutant variants, one in the ROM1 gene and one in the RDS gene, while the occurrence of only one of these mutant variants leads to a completely normal phenotype. Detecting such digenic traits by genetic methods is difficult. A common approach to finding disease-causing variants is to compare 100,000s of variants between individuals with a trait (cases) and those without the trait (controls). Such genome-wide association studies (GWASs) have been very successful but hinge on genetic effects of single variants, that is, there should be a difference in allele or genotype frequencies between cases and controls at a disease-causing variant. Frequent pattern mining (FPM) methods offer an avenue at detecting digenic traits even in the absence of single-variant effects. The idea is to enumerate pairs of genotypes (genotype patterns) with each of the two genotypes originating from different variants that may be located at very different genomic positions. What is needed is for genotype patterns to be significantly more common in cases than in controls. Let Y = 2 refer to cases and Y = 1 to controls, with X denoting a specific genotype pattern. We are seeking association rules, ‘X → Y’, with high confidence, P(Y = 2|X), significantly higher than the proportion of cases, P(Y = 2) in the study. Clearly, generally available FPM methods are very suitable for detecting disease-associated genotype patterns. We use fpgrowth as the basic FPM algorithm and built a framework around it to enumerate high-frequency digenic genotype patterns and to evaluate their statistical significance by permutation analysis. Application to a published dataset on opioid dependence furnished results that could not be found with classical GWAS methodology. There were 143 cases and 153 healthy controls, each genotyped for 82 variants in eight genes of the opioid system. The aim was to find out whether any of these variants were disease-associated. The single-variant analysis did not lead to significant results. Application of our FPM implementation resulted in one significant (p < 0.01) genotype pattern with both genotypes in the pattern being heterozygous and originating from two variants on different chromosomes. This pattern occurred in 14 cases and none of the controls. Thus, the pattern seems quite specific to this form of substance abuse and is also rather predictive of disease. An algorithm called Multifactor Dimension Reduction (MDR) was developed some 20 years ago and has been in use in human genetics ever since. This and our algorithms share some similar properties, but they are also very different in other respects. The main difference seems to be that our algorithm focuses on patterns of genotypes while the main object of inference in MDR is the 3 × 3 table of genotypes at two variants.Keywords: digenic traits, DNA variants, epistasis, statistical genetics
Procedia PDF Downloads 122123 Investigating the Impacts on Cyclist Casualty Severity at Roundabouts: A UK Case Study
Authors: Nurten Akgun, Dilum Dissanayake, Neil Thorpe, Margaret C. Bell
Abstract:
Cycling has gained a great attention with comparable speeds, low cost, health benefits and reducing the impact on the environment. The main challenge associated with cycling is the provision of safety for the people choosing to cycle as their main means of transport. From the road safety point of view, cyclists are considered as vulnerable road users because they are at higher risk of serious casualty in the urban network but more specifically at roundabouts. This research addresses the development of an enhanced mathematical model by including a broad spectrum of casualty related variables. These variables were geometric design measures (approach number of lanes and entry path radius), speed limit, meteorological condition variables (light, weather, road surface) and socio-demographic characteristics (age and gender), as well as contributory factors. Contributory factors included driver’s behavior related variables such as failed to look properly, sudden braking, a vehicle passing too close to a cyclist, junction overshot, failed to judge other person’s path, restart moving off at the junction, poor turn or manoeuvre and disobeyed give-way. Tyne and Wear in the UK were selected as a case study area. The cyclist casualty data was obtained from UK STATS19 National dataset. The reference categories for the regression model were set to slight and serious cyclist casualties. Therefore, binary logistic regression was applied. Binary logistic regression analysis showed that approach number of lanes was statistically significant at the 95% level of confidence. A higher number of approach lanes increased the probability of severity of cyclist casualty occurrence. In addition, sudden braking statistically significantly increased the cyclist casualty severity at the 95% level of confidence. The result concluded that cyclist casualty severity was highly related to approach a number of lanes and sudden braking. Further research should be carried out an in-depth analysis to explore connectivity of sudden braking and approach number of lanes in order to investigate the driver’s behavior at approach locations. The output of this research will inform investment in measure to improve the safety of cyclists at roundabouts.Keywords: binary logistic regression, casualty severity, cyclist safety, roundabout
Procedia PDF Downloads 178122 The Trade Flow of Small Association Agreements When Rules of Origin Are Relaxed
Authors: Esmat Kamel
Abstract:
This paper aims to shed light on the extent to which the Agadir Association agreement has fostered inter regional trade between the E.U_26 and the Agadir_4 countries; once that we control for the evolution of Agadir agreement’s exports to the rest of the world. The next valid question will be regarding any remarkable variation in the spatial/sectoral structure of exports, and to what extent has it been induced by the Agadir agreement itself and precisely after the adoption of rules of origin and the PANEURO diagonal cumulative scheme? The paper’s empirical dataset covering a timeframe from [2000 -2009] was designed to account for sector specific export and intermediate flows and the bilateral structured gravity model was custom tailored to capture sector and regime specific rules of origin and the Poisson Pseudo Maximum Likelihood Estimator was used to calculate the gravity equation. The methodological approach of this work is considered to be a threefold one which starts first by conducting a ‘Hierarchal Cluster Analysis’ to classify final export flows showing a certain degree of linkage between each other. The analysis resulted in three main sectoral clusters of exports between Agadir_4 and E.U_26: cluster 1 for Petrochemical related sectors, cluster 2 durable goods and finally cluster 3 for heavy duty machinery and spare parts sectors. Second step continues by taking export flows resulting from the 3 clusters to be subject to treatment with diagonal Rules of origin through ‘The Double Differences Approach’, versus an equally comparable untreated control group. Third step is to verify results through a robustness check applied by ‘Propensity Score Matching’ to validate that the same sectoral final export and intermediate flows increased when rules of origin were relaxed. Through all the previous analysis, a remarkable and partial significance of the interaction term combining both treatment effects and time for the coefficients of 13 out of the 17 covered sectors turned out to be partially significant and it further asserted that treatment with diagonal rules of origin contributed in increasing Agadir’s_4 final and intermediate exports to the E.U._26 on average by 335% and in changing Agadir_4 exports structure and composition to the E.U._26 countries.Keywords: agadir association agreement, structured gravity model, hierarchal cluster analysis, double differences estimation, propensity score matching, diagonal and relaxed rules of origin
Procedia PDF Downloads 319121 Development of a Bi-National Thyroid Cancer Clinical Quality Registry
Authors: Liane J. Ioannou, Jonathan Serpell, Joanne Dean, Cino Bendinelli, Jenny Gough, Dean Lisewski, Julie Miller, Win Meyer-Rochow, Stan Sidhu, Duncan Topliss, David Walters, John Zalcberg, Susannah Ahern
Abstract:
Background: The occurrence of thyroid cancer is increasing throughout the developed world, including Australia and New Zealand, and since the 1990s has become the fastest increasing malignancy. Following the success of a number of institutional databases that monitor outcomes after thyroid surgery, the Australian and New Zealand Endocrine Surgeons (ANZES) agreed to auspice the development of a bi-national thyroid cancer registry. Objectives: To establish a bi-national population-based clinical quality registry with the aim of monitoring and improving the quality of care provided to patients diagnosed with thyroid cancer in Australia and New Zealand. Patients and Methods: The Australian and New Zealand Thyroid Cancer Registry (ANZTCR) captures clinical data for all patients, over the age of 18 years, diagnosed with thyroid cancer, confirmed by histopathology report, that have been diagnosed, assessed or treated at a contributing hospital. Data is collected by endocrine surgeons using a web-based interface, REDCap, primarily via direct data entry. Results: A multi-disciplinary Steering Committee was formed, and with operational support from Monash University the ANZTCR was established in early 2017. The pilot phase of the registry is currently operating in Victoria, New South Wales, Queensland, Western Australia and South Australia, with over 30 sites expected to come on board across Australia and New Zealand in 2018. A modified-Delphi process was undertaken to determine the key quality indicators to be reported by the registry, and a minimum dataset was developed comprising information regarding thyroid cancer diagnosis, pathology, surgery, and 30-day follow up. Conclusion: There are very few established thyroid cancer registries internationally, yet clinical quality registries have shown valuable outcomes and patient benefits in other cancers. The establishment of the ANZTCR provides the opportunity for Australia and New Zealand to further understand the current practice in the treatment of thyroid cancer and reasons for variation in outcomes. The engagement of endocrine surgeons in supporting this initiative is crucial. While the pilot registry has a focus on early clinical outcomes, it is anticipated that future collection of longer-term outcome data particularly for patients with the poor prognostic disease will add significant further value to the registry.Keywords: thyroid cancer, clinical registry, population health, quality improvement
Procedia PDF Downloads 193120 Knowledge Graph Development to Connect Earth Metadata and Standard English Queries
Authors: Gabriel Montague, Max Vilgalys, Catherine H. Crawford, Jorge Ortiz, Dava Newman
Abstract:
There has never been so much publicly accessible atmospheric and environmental data. The possibilities of these data are exciting, but the sheer volume of available datasets represents a new challenge for researchers. The task of identifying and working with a new dataset has become more difficult with the amount and variety of available data. Datasets are often documented in ways that differ substantially from the common English used to describe the same topics. This presents a barrier not only for new scientists, but for researchers looking to find comparisons across multiple datasets or specialists from other disciplines hoping to collaborate. This paper proposes a method for addressing this obstacle: creating a knowledge graph to bridge the gap between everyday English language and the technical language surrounding these datasets. Knowledge graph generation is already a well-established field, although there are some unique challenges posed by working with Earth data. One is the sheer size of the databases – it would be infeasible to replicate or analyze all the data stored by an organization like The National Aeronautics and Space Administration (NASA) or the European Space Agency. Instead, this approach identifies topics from metadata available for datasets in NASA’s Earthdata database, which can then be used to directly request and access the raw data from NASA. By starting with a single metadata standard, this paper establishes an approach that can be generalized to different databases, but leaves the challenge of metadata harmonization for future work. Topics generated from the metadata are then linked to topics from a collection of English queries through a variety of standard and custom natural language processing (NLP) methods. The results from this method are then compared to a baseline of elastic search applied to the metadata. This comparison shows the benefits of the proposed knowledge graph system over existing methods, particularly in interpreting natural language queries and interpreting topics in metadata. For the research community, this work introduces an application of NLP to the ecological and environmental sciences, expanding the possibilities of how machine learning can be applied in this discipline. But perhaps more importantly, it establishes the foundation for a platform that can enable common English to access knowledge that previously required considerable effort and experience. By making this public data accessible to the full public, this work has the potential to transform environmental understanding, engagement, and action.Keywords: earth metadata, knowledge graphs, natural language processing, question-answer systems
Procedia PDF Downloads 149