Search results for: multiwavelenght dataset
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1094

Search results for: multiwavelenght dataset

134 Design and Implementation of Generative Models for Odor Classification Using Electronic Nose

Authors: Kumar Shashvat, Amol P. Bhondekar

Abstract:

In the midst of the five senses, odor is the most reminiscent and least understood. Odor testing has been mysterious and odor data fabled to most practitioners. The delinquent of recognition and classification of odor is important to achieve. The facility to smell and predict whether the artifact is of further use or it has become undesirable for consumption; the imitation of this problem hooked on a model is of consideration. The general industrial standard for this classification is color based anyhow; odor can be improved classifier than color based classification and if incorporated in machine will be awfully constructive. For cataloging of odor for peas, trees and cashews various discriminative approaches have been used Discriminative approaches offer good prognostic performance and have been widely used in many applications but are incapable to make effectual use of the unlabeled information. In such scenarios, generative approaches have better applicability, as they are able to knob glitches, such as in set-ups where variability in the series of possible input vectors is enormous. Generative models are integrated in machine learning for either modeling data directly or as a transitional step to form an indeterminate probability density function. The algorithms or models Linear Discriminant Analysis and Naive Bayes Classifier have been used for classification of the odor of cashews. Linear Discriminant Analysis is a method used in data classification, pattern recognition, and machine learning to discover a linear combination of features that typifies or divides two or more classes of objects or procedures. The Naive Bayes algorithm is a classification approach base on Bayes rule and a set of qualified independence theory. Naive Bayes classifiers are highly scalable, requiring a number of restraints linear in the number of variables (features/predictors) in a learning predicament. The main recompenses of using the generative models are generally a Generative Models make stronger assumptions about the data, specifically, about the distribution of predictors given the response variables. The Electronic instrument which is used for artificial odor sensing and classification is an electronic nose. This device is designed to imitate the anthropological sense of odor by providing an analysis of individual chemicals or chemical mixtures. The experimental results have been evaluated in the form of the performance measures i.e. are accuracy, precision and recall. The investigational results have proven that the overall performance of the Linear Discriminant Analysis was better in assessment to the Naive Bayes Classifier on cashew dataset.

Keywords: odor classification, generative models, naive bayes, linear discriminant analysis

Procedia PDF Downloads 357
133 Using Arellano-Bover/Blundell-Bond Estimator in Dynamic Panel Data Analysis – Case of Finnish Housing Price Dynamics

Authors: Janne Engblom, Elias Oikarinen

Abstract:

A panel dataset is one that follows a given sample of individuals over time, and thus provides multiple observations on each individual in the sample. Panel data models include a variety of fixed and random effects models which form a wide range of linear models. A special case of panel data models are dynamic in nature. A complication regarding a dynamic panel data model that includes the lagged dependent variable is endogeneity bias of estimates. Several approaches have been developed to account for this problem. In this paper, the panel models were estimated using the Arellano-Bover/Blundell-Bond Generalized method of moments (GMM) estimator which is an extension of the Arellano-Bond model where past values and different transformations of past values of the potentially problematic independent variable are used as instruments together with other instrumental variables. The Arellano–Bover/Blundell–Bond estimator augments Arellano–Bond by making an additional assumption that first differences of instrument variables are uncorrelated with the fixed effects. This allows the introduction of more instruments and can dramatically improve efficiency. It builds a system of two equations—the original equation and the transformed one—and is also known as system GMM. In this study, Finnish housing price dynamics were examined empirically by using the Arellano–Bover/Blundell–Bond estimation technique together with ordinary OLS. The aim of the analysis was to provide a comparison between conventional fixed-effects panel data models and dynamic panel data models. The Arellano–Bover/Blundell–Bond estimator is suitable for this analysis for a number of reasons: It is a general estimator designed for situations with 1) a linear functional relationship; 2) one left-hand-side variable that is dynamic, depending on its own past realizations; 3) independent variables that are not strictly exogenous, meaning they are correlated with past and possibly current realizations of the error; 4) fixed individual effects; and 5) heteroskedasticity and autocorrelation within individuals but not across them. Based on data of 14 Finnish cities over 1988-2012 differences of short-run housing price dynamics estimates were considerable when different models and instrumenting were used. Especially, the use of different instrumental variables caused variation of model estimates together with their statistical significance. This was particularly clear when comparing estimates of OLS with different dynamic panel data models. Estimates provided by dynamic panel data models were more in line with theory of housing price dynamics.

Keywords: dynamic model, fixed effects, panel data, price dynamics

Procedia PDF Downloads 1443
132 Assessing Online Learning Paths in an Learning Management Systems Using a Data Mining and Machine Learning Approach

Authors: Alvaro Figueira, Bruno Cabral

Abstract:

Nowadays, students are used to be assessed through an online platform. Educators have stepped up from a period in which they endured the transition from paper to digital. The use of a diversified set of question types that range from quizzes to open questions is currently common in most university courses. In many courses, today, the evaluation methodology also fosters the students’ online participation in forums, the download, and upload of modified files, or even the participation in group activities. At the same time, new pedagogy theories that promote the active participation of students in the learning process, and the systematic use of problem-based learning, are being adopted using an eLearning system for that purpose. However, although there can be a lot of feedback from these activities to student’s, usually it is restricted to the assessments of online well-defined tasks. In this article, we propose an automatic system that informs students of abnormal deviations of a 'correct' learning path in the course. Our approach is based on the fact that by obtaining this information earlier in the semester, may provide students and educators an opportunity to resolve an eventual problem regarding the student’s current online actions towards the course. Our goal is to prevent situations that have a significant probability to lead to a poor grade and, eventually, to failing. In the major learning management systems (LMS) currently available, the interaction between the students and the system itself is registered in log files in the form of registers that mark beginning of actions performed by the user. Our proposed system uses that logged information to derive new one: the time each student spends on each activity, the time and order of the resources used by the student and, finally, the online resource usage pattern. Then, using the grades assigned to the students in previous years, we built a learning dataset that is used to feed a machine learning meta classifier. The produced classification model is then used to predict the grades a learning path is heading to, in the current year. Not only this approach serves the teacher, but also the student to receive automatic feedback on her current situation, having past years as a perspective. Our system can be applied to online courses that integrate the use of an online platform that stores user actions in a log file, and that has access to other student’s evaluations. The system is based on a data mining process on the log files and on a self-feedback machine learning algorithm that works paired with the Moodle LMS.

Keywords: data mining, e-learning, grade prediction, machine learning, student learning path

Procedia PDF Downloads 101
131 Integrating Data Mining with Case-Based Reasoning for Diagnosing Sorghum Anthracnose

Authors: Mariamawit T. Belete

Abstract:

Cereal production and marketing are the means of livelihood for millions of households in Ethiopia. However, cereal production is constrained by technical and socio-economic factors. Among the technical factors, cereal crop diseases are the major contributing factors to the low yield. The aim of this research is to develop an integration of data mining and knowledge based system for sorghum anthracnose disease diagnosis that assists agriculture experts and development agents to make timely decisions. Anthracnose diagnosing systems gather information from Melkassa agricultural research center and attempt to score anthracnose severity scale. Empirical research is designed for data exploration, modeling, and confirmatory procedures for testing hypothesis and prediction to draw a sound conclusion. WEKA (Waikato Environment for Knowledge Analysis) was employed for the modeling. Knowledge based system has come across a variety of approaches based on the knowledge representation method; case-based reasoning (CBR) is one of the popular approaches used in knowledge-based system. CBR is a problem solving strategy that uses previous cases to solve new problems. The system utilizes hidden knowledge extracted by employing clustering algorithms, specifically K-means clustering from sampled anthracnose dataset. Clustered cases with centroid value are mapped to jCOLIBRI, and then the integrator application is created using NetBeans with JDK 8.0.2. The important part of a case based reasoning model includes case retrieval; the similarity measuring stage, reuse; which allows domain expert to transfer retrieval case solution to suit for the current case, revise; to test the solution, and retain to store the confirmed solution to the case base for future use. Evaluation of the system was done for both system performance and user acceptance. For testing the prototype, seven test cases were used. Experimental result shows that the system achieves an average precision and recall values of 70% and 83%, respectively. User acceptance testing also performed by involving five domain experts, and an average of 83% acceptance is achieved. Although the result of this study is promising, however, further study should be done an investigation on hybrid approach such as rule based reasoning, and pictorial retrieval process are recommended.

Keywords: sorghum anthracnose, data mining, case based reasoning, integration

Procedia PDF Downloads 61
130 Dividend Policy in Family Controlling Firms from a Governance Perspective: Empirical Evidence in Thailand

Authors: Tanapond S.

Abstract:

Typically, most of the controlling firms are relate to family firms which are widespread and important for economic growth particularly in Asian Pacific region. The unique characteristics of the controlling families tend to play an important role in determining the corporate policies such as dividend policy. Given the complexity of the family business phenomenon, the empirical evidence has been unclear on how the families behind business groups influence dividend policy in Asian markets with the prevalent existence of cross-shareholdings and pyramidal structure. Dividend policy as one of an important determinant of firm value could also be implemented in order to examine the effect of the controlling families behind business groups on strategic decisions-making in terms of a governance perspective and agency problems. The purpose of this paper is to investigate the impact of ownership structure and concentration which are influential internal corporate governance mechanisms in family firms on dividend decision-making. Using panel data and constructing a unique dataset of family ownership and control through hand-collecting information from the nonfinancial companies listed in Stock Exchange of Thailand (SET) between 2000 and 2015, the study finds that family firms with large stakes distribute higher dividends than family firms with small stakes. Family ownership can mitigate the agency problems and the expropriation of minority investors in family firms. To provide insight into the distinguish between ownership rights and control rights, this study examines specific firm characteristics including the degrees of concentration of controlling shareholders by classifying family ownership in different categories. The results show that controlling families with large deviation between voting rights and cash flow rights have more power and affect lower dividend payment. These situations become worse when second blockholders are families. To the best knowledge of the researcher, this study is the first to examine the association between family firms’ characteristics and dividend policy from the corporate governance perspectives in Thailand with weak investor protection environment and high ownership concentration. This research also underscores the importance of family control especially in a context in which family business groups and pyramidal structure are prevalent. As a result, academics and policy makers can develop markets and corporate policies to eliminate agency problem.

Keywords: agency theory, dividend policy, family control, Thailand

Procedia PDF Downloads 252
129 Profiling Risky Code Using Machine Learning

Authors: Zunaira Zaman, David Bohannon

Abstract:

This study explores the application of machine learning (ML) for detecting security vulnerabilities in source code. The research aims to assist organizations with large application portfolios and limited security testing capabilities in prioritizing security activities. ML-based approaches offer benefits such as increased confidence scores, false positives and negatives tuning, and automated feedback. The initial approach using natural language processing techniques to extract features achieved 86% accuracy during the training phase but suffered from overfitting and performed poorly on unseen datasets during testing. To address these issues, the study proposes using the abstract syntax tree (AST) for Java and C++ codebases to capture code semantics and structure and generate path-context representations for each function. The Code2Vec model architecture is used to learn distributed representations of source code snippets for training a machine-learning classifier for vulnerability prediction. The study evaluates the performance of the proposed methodology using two datasets and compares the results with existing approaches. The Devign dataset yielded 60% accuracy in predicting vulnerable code snippets and helped resist overfitting, while the Juliet Test Suite predicted specific vulnerabilities such as OS-Command Injection, Cryptographic, and Cross-Site Scripting vulnerabilities. The Code2Vec model achieved 75% accuracy and a 98% recall rate in predicting OS-Command Injection vulnerabilities. The study concludes that even partial AST representations of source code can be useful for vulnerability prediction. The approach has the potential for automated intelligent analysis of source code, including vulnerability prediction on unseen source code. State-of-the-art models using natural language processing techniques and CNN models with ensemble modelling techniques did not generalize well on unseen data and faced overfitting issues. However, predicting vulnerabilities in source code using machine learning poses challenges such as high dimensionality and complexity of source code, imbalanced datasets, and identifying specific types of vulnerabilities. Future work will address these challenges and expand the scope of the research.

Keywords: code embeddings, neural networks, natural language processing, OS command injection, software security, code properties

Procedia PDF Downloads 79
128 Surface Elevation Dynamics Assessment Using Digital Elevation Models, Light Detection and Ranging, GPS and Geospatial Information Science Analysis: Ecosystem Modelling Approach

Authors: Ali K. M. Al-Nasrawi, Uday A. Al-Hamdany, Sarah M. Hamylton, Brian G. Jones, Yasir M. Alyazichi

Abstract:

Surface elevation dynamics have always responded to disturbance regimes. Creating Digital Elevation Models (DEMs) to detect surface dynamics has led to the development of several methods, devices and data clouds. DEMs can provide accurate and quick results with cost efficiency, in comparison to the inherited geomatics survey techniques. Nowadays, remote sensing datasets have become a primary source to create DEMs, including LiDAR point clouds with GIS analytic tools. However, these data need to be tested for error detection and correction. This paper evaluates various DEMs from different data sources over time for Apple Orchard Island, a coastal site in southeastern Australia, in order to detect surface dynamics. Subsequently, 30 chosen locations were examined in the field to test the error of the DEMs surface detection using high resolution global positioning systems (GPSs). Results show significant surface elevation changes on Apple Orchard Island. Accretion occurred on most of the island while surface elevation loss due to erosion is limited to the northern and southern parts. Concurrently, the projected differential correction and validation method aimed to identify errors in the dataset. The resultant DEMs demonstrated a small error ratio (≤ 3%) from the gathered datasets when compared with the fieldwork survey using RTK-GPS. As modern modelling approaches need to become more effective and accurate, applying several tools to create different DEMs on a multi-temporal scale would allow easy predictions in time-cost-frames with more comprehensive coverage and greater accuracy. With a DEM technique for the eco-geomorphic context, such insights about the ecosystem dynamic detection, at such a coastal intertidal system, would be valuable to assess the accuracy of the predicted eco-geomorphic risk for the conservation management sustainability. Demonstrating this framework to evaluate the historical and current anthropogenic and environmental stressors on coastal surface elevation dynamism could be profitably applied worldwide.

Keywords: DEMs, eco-geomorphic-dynamic processes, geospatial Information Science, remote sensing, surface elevation changes,

Procedia PDF Downloads 247
127 In silico Designing of Imidazo [4,5-b] Pyridine as a Probable Lead for Potent Decaprenyl Phosphoryl-β-D-Ribose 2′-Epimerase (DprE1) Inhibitors as Antitubercular Agents

Authors: Jineetkumar Gawad, Chandrakant Bonde

Abstract:

Tuberculosis (TB) is a major worldwide concern whose control has been exacerbated by HIV, the rise of multidrug-resistance (MDR-TB) and extensively drug resistance (XDR-TB) strains of Mycobacterium tuberculosis. The interest for newer and faster acting antitubercular drugs are more remarkable than any time. To search potent compounds is need and challenge for researchers. Here, we tried to design lead for inhibition of Decaprenyl phosphoryl-β-D-ribose 2′-epimerase (DprE1) enzyme. Arabinose is an essential constituent of mycobacterial cell wall. DprE1 is a flavoenzyme that converts decaprenylphosphoryl-D-ribose into decaprenylphosphoryl-2-keto-ribose, which is intermediate in biosynthetic pathway of arabinose. Latter, DprE2 converts keto-ribose into decaprenylphosphoryl-D-arabinose. We had a selection of 23 compounds from azaindole series for computational study, and they were drawn using marvisketch. Ligands were prepared using Maestro molecular modeling interface, Schrodinger, v10.5. Common pharmacophore hypotheses were developed by applying dataset thresholds to yield active and inactive set of compounds. There were 326 hypotheses were developed. On the basis of survival score, ADRRR (Survival Score: 5.453) was selected. Selected pharmacophore hypotheses were subjected to virtual screening results into 1000 hits. Hits were prepared and docked with protein 4KW5 (oxydoreductase inhibitor) was downloaded in .pdb format from RCSB Protein Data Bank. Protein was prepared using protein preparation wizard. Protein was preprocessed, the workspace was analyzed using force field OPLS 2005. Glide grid was generated by picking single atom in molecule. Prepared ligands were docked with prepared protein 4KW5 using Glide docking. After docking, on the basis of glide score top-five compounds were selected, (5223, 5812, 0661, 0662, and 2945) and the glide docking score (-8.928, -8.534, -8.412, -8.411, -8.351) respectively. There were interactions of ligand and protein, specifically HIS 132, LYS 418, TRY 230, ASN 385. Pi-pi stacking was observed in few compounds with basic Imidazo [4,5-b] pyridine ring. We had basic azaindole ring in parent compounds, but after glide docking, we received compounds with Imidazo [4,5-b] pyridine as a basic ring. That might be the new lead in the process of drug discovery.

Keywords: DprE1 inhibitors, in silico drug designing, imidazo [4, 5-b] pyridine, lead, tuberculosis

Procedia PDF Downloads 129
126 Tumor Size and Lymph Node Metastasis Detection in Colon Cancer Patients Using MR Images

Authors: Mohammadreza Hedyehzadeh, Mahdi Yousefi

Abstract:

Colon cancer is one of the most common cancer, which predicted to increase its prevalence due to the bad eating habits of peoples. Nowadays, due to the busyness of people, the use of fast foods is increasing, and therefore, diagnosis of this disease and its treatment are of particular importance. To determine the best treatment approach for each specific colon cancer patients, the oncologist should be known the stage of the tumor. The most common method to determine the tumor stage is TNM staging system. In this system, M indicates the presence of metastasis, N indicates the extent of spread to the lymph nodes, and T indicates the size of the tumor. It is clear that in order to determine all three of these parameters, an imaging method must be used, and the gold standard imaging protocols for this purpose are CT and PET/CT. In CT imaging, due to the use of X-rays, the risk of cancer and the absorbed dose of the patient is high, while in the PET/CT method, there is a lack of access to the device due to its high cost. Therefore, in this study, we aimed to estimate the tumor size and the extent of its spread to the lymph nodes using MR images. More than 1300 MR images collected from the TCIA portal, and in the first step (pre-processing), histogram equalization to improve image qualities and resizing to get the same image size was done. Two expert radiologists, which work more than 21 years on colon cancer cases, segmented the images and extracted the tumor region from the images. The next step is feature extraction from segmented images and then classify the data into three classes: T0N0، T3N1 و T3N2. In this article, the VGG-16 convolutional neural network has been used to perform both of the above-mentioned tasks, i.e., feature extraction and classification. This network has 13 convolution layers for feature extraction and three fully connected layers with the softmax activation function for classification. In order to validate the proposed method, the 10-fold cross validation method used in such a way that the data was randomly divided into three parts: training (70% of data), validation (10% of data) and the rest for testing. It is repeated 10 times, each time, the accuracy, sensitivity and specificity of the model are calculated and the average of ten repetitions is reported as the result. The accuracy, specificity and sensitivity of the proposed method for testing dataset was 89/09%, 95/8% and 96/4%. Compared to previous studies, using a safe imaging technique (MRI) and non-use of predefined hand-crafted imaging features to determine the stage of colon cancer patients are some of the study advantages.

Keywords: colon cancer, VGG-16, magnetic resonance imaging, tumor size, lymph node metastasis

Procedia PDF Downloads 38
125 Comparison of Parametric and Bayesian Survival Regression Models in Simulated and HIV Patient Antiretroviral Therapy Data: Case Study of Alamata Hospital, North Ethiopia

Authors: Zeytu G. Asfaw, Serkalem K. Abrha, Demisew G. Degefu

Abstract:

Background: HIV/AIDS remains a major public health problem in Ethiopia and heavily affecting people of productive and reproductive age. We aimed to compare the performance of Parametric Survival Analysis and Bayesian Survival Analysis using simulations and in a real dataset application focused on determining predictors of HIV patient survival. Methods: A Parametric Survival Models - Exponential, Weibull, Log-normal, Log-logistic, Gompertz and Generalized gamma distributions were considered. Simulation study was carried out with two different algorithms that were informative and noninformative priors. A retrospective cohort study was implemented for HIV infected patients under Highly Active Antiretroviral Therapy in Alamata General Hospital, North Ethiopia. Results: A total of 320 HIV patients were included in the study where 52.19% females and 47.81% males. According to Kaplan-Meier survival estimates for the two sex groups, females has shown better survival time in comparison with their male counterparts. The median survival time of HIV patients was 79 months. During the follow-up period 89 (27.81%) deaths and 231 (72.19%) censored individuals registered. The average baseline cluster of differentiation 4 (CD4) cells count for HIV/AIDS patients were 126.01 but after a three-year antiretroviral therapy follow-up the average cluster of differentiation 4 (CD4) cells counts were 305.74, which was quite encouraging. Age, functional status, tuberculosis screen, past opportunistic infection, baseline cluster of differentiation 4 (CD4) cells, World Health Organization clinical stage, sex, marital status, employment status, occupation type, baseline weight were found statistically significant factors for longer survival of HIV patients. The standard error of all covariate in Bayesian log-normal survival model is less than the classical one. Hence, Bayesian survival analysis showed better performance than classical parametric survival analysis, when subjective data analysis was performed by considering expert opinions and historical knowledge about the parameters. Conclusions: Thus, HIV/AIDS patient mortality rate could be reduced through timely antiretroviral therapy with special care on the potential factors. Moreover, Bayesian log-normal survival model was preferable than the classical log-normal survival model for determining predictors of HIV patients survival.

Keywords: antiretroviral therapy (ART), Bayesian analysis, HIV, log-normal, parametric survival models

Procedia PDF Downloads 162
124 Computational Characterization of Electronic Charge Transfer in Interfacial Phospholipid-Water Layers

Authors: Samira Baghbanbari, A. B. P. Lever, Payam S. Shabestari, Donald Weaver

Abstract:

Existing signal transmission models, although undoubtedly useful, have proven insufficient to explain the full complexity of information transfer within the central nervous system. The development of transformative models will necessitate a more comprehensive understanding of neuronal lipid membrane electrophysiology. Pursuant to this goal, the role of highly organized interfacial phospholipid-water layers emerges as a promising case study. A series of phospholipids in neural-glial gap junction interfaces as well as cholesterol molecules have been computationally modelled using high-performance density functional theory (DFT) calculations. Subsequent 'charge decomposition analysis' calculations have revealed a net transfer of charge from phospholipid orbitals through the organized interfacial water layer before ultimately finding its way to cholesterol acceptor molecules. The specific pathway of charge transfer from phospholipid via water layers towards cholesterol has been mapped in detail. Cholesterol is an essential membrane component that is overrepresented in neuronal membranes as compared to other mammalian cells; given this relative abundance, its apparent role as an electronic acceptor may prove to be a relevant factor in further signal transmission studies of the central nervous system. The timescales over which this electronic charge transfer occurs have also been evaluated by utilizing a system design that systematically increases the number of water molecules separating lipids and cholesterol. Memory loss through hydrogen-bonded networks in water can occur at femtosecond timescales, whereas existing action potential-based models are limited to micro or nanosecond scales. As such, the development of future models that attempt to explain faster timescale signal transmission in the central nervous system may benefit from our work, which provides additional information regarding fast timescale energy transfer mechanisms occurring through interfacial water. The study possesses a dataset that includes six distinct phospholipids and a collection of cholesterol. Ten optimized geometric characteristics (features) were employed to conduct binary classification through an artificial neural network (ANN), differentiating cholesterol from the various phospholipids. This stems from our understanding that all lipids within the first group function as electronic charge donors, while cholesterol serves as an electronic charge acceptor.

Keywords: charge transfer, signal transmission, phospholipids, water layers, ANN

Procedia PDF Downloads 40
123 Influence of Atmospheric Circulation Patterns on Dust Pollution Transport during the Harmattan Period over West Africa

Authors: Ayodeji Oluleye

Abstract:

This study used Total Ozone Mapping Spectrometer (TOMS) Aerosol Index (AI) and reanalysis dataset of thirty years (1983-2012) to investigate the influence of the atmospheric circulation on dust transport during the Harmattan period over WestAfrica using TOMS data. The Harmattan dust mobilization and atmospheric circulation pattern were evaluated using a kernel density estimate which shows the areas where most points are concentrated between the variables. The evolution of the Inter-Tropical Discontinuity (ITD), Sea surface Temperature (SST) over the Gulf of Guinea, and the North Atlantic Oscillation (NAO) index during the Harmattan period (November-March) was also analyzed and graphs of the average ITD positions, SST and the NAO were observed on daily basis. The Pearson moment correlation analysis was also employed to assess the effect of atmospheric circulation on Harmattan dust transport. The results show that the departure (increased) of TOMS AI values from the long-term mean (1.64) occurred from around 21st of December, which signifies the rich dust days during winter period. Strong TOMS AI signal were observed from January to March with the maximum occurring in the latter months (February and March). The inter-annual variability of TOMSAI revealed that the rich dust years were found between 1984-1985, 1987-1988, 1997-1998, 1999-2000, and 2002-2004. Significantly, poor dust year was found between 2005 and 2006 in all the periods. The study has found strong north-easterly (NE) trade winds were over most of the Sahelianregion of West Africa during the winter months with the maximum wind speed reaching 8.61m/s inJanuary.The strength of NE winds determines the extent of dust transport to the coast of Gulf of Guinea during winter. This study has confirmed that the presence of the Harmattan is strongly dependent on theSST over Atlantic Ocean and ITD position. The locus of the average SST and ITD positions over West Africa could be described by polynomial functions. The study concludes that the evolution of near surface wind field at 925 hpa, and the variations of SST and ITD positions are the major large scale atmospheric circulation systems driving the emission, distribution, and transport of Harmattan dust aerosols over West Africa. However, the influence of NAO was shown to have fewer significance effects on the Harmattan dust transport over the region.

Keywords: atmospheric circulation, dust aerosols, Harmattan, West Africa

Procedia PDF Downloads 292
122 Different Data-Driven Bivariate Statistical Approaches to Landslide Susceptibility Mapping (Uzundere, Erzurum, Turkey)

Authors: Azimollah Aleshzadeh, Enver Vural Yavuz

Abstract:

The main goal of this study is to produce landslide susceptibility maps using different data-driven bivariate statistical approaches; namely, entropy weight method (EWM), evidence belief function (EBF), and information content model (ICM), at Uzundere county, Erzurum province, in the north-eastern part of Turkey. Past landslide occurrences were identified and mapped from an interpretation of high-resolution satellite images, and earlier reports as well as by carrying out field surveys. In total, 42 landslide incidence polygons were mapped using ArcGIS 10.4.1 software and randomly split into a construction dataset 70 % (30 landslide incidences) for building the EWM, EBF, and ICM models and the remaining 30 % (12 landslides incidences) were used for verification purposes. Twelve layers of landslide-predisposing parameters were prepared, including total surface radiation, maximum relief, soil groups, standard curvature, distance to stream/river sites, distance to the road network, surface roughness, land use pattern, engineering geological rock group, topographical elevation, the orientation of slope, and terrain slope gradient. The relationships between the landslide-predisposing parameters and the landslide inventory map were determined using different statistical models (EWM, EBF, and ICM). The model results were validated with landslide incidences, which were not used during the model construction. In addition, receiver operating characteristic curves were applied, and the area under the curve (AUC) was determined for the different susceptibility maps using the success (construction data) and prediction (verification data) rate curves. The results revealed that the AUC for success rates are 0.7055, 0.7221, and 0.7368, while the prediction rates are 0.6811, 0.6997, and 0.7105 for EWM, EBF, and ICM models, respectively. Consequently, landslide susceptibility maps were classified into five susceptibility classes, including very low, low, moderate, high, and very high. Additionally, the portion of construction and verification landslides incidences in high and very high landslide susceptibility classes in each map was determined. The results showed that the EWM, EBF, and ICM models produced satisfactory accuracy. The obtained landslide susceptibility maps may be useful for future natural hazard mitigation studies and planning purposes for environmental protection.

Keywords: entropy weight method, evidence belief function, information content model, landslide susceptibility mapping

Procedia PDF Downloads 106
121 Spectrogram Pre-Processing to Improve Isotopic Identification to Discriminate Gamma and Neutrons Sources

Authors: Mustafa Alhamdi

Abstract:

Industrial application to classify gamma rays and neutron events is investigated in this study using deep machine learning. The identification using a convolutional neural network and recursive neural network showed a significant improvement in predication accuracy in a variety of applications. The ability to identify the isotope type and activity from spectral information depends on feature extraction methods, followed by classification. The features extracted from the spectrum profiles try to find patterns and relationships to present the actual spectrum energy in low dimensional space. Increasing the level of separation between classes in feature space improves the possibility to enhance classification accuracy. The nonlinear nature to extract features by neural network contains a variety of transformation and mathematical optimization, while principal component analysis depends on linear transformations to extract features and subsequently improve the classification accuracy. In this paper, the isotope spectrum information has been preprocessed by finding the frequencies components relative to time and using them as a training dataset. Fourier transform implementation to extract frequencies component has been optimized by a suitable windowing function. Training and validation samples of different isotope profiles interacted with CdTe crystal have been simulated using Geant4. The readout electronic noise has been simulated by optimizing the mean and variance of normal distribution. Ensemble learning by combing voting of many models managed to improve the classification accuracy of neural networks. The ability to discriminate gamma and neutron events in a single predication approach using deep machine learning has shown high accuracy using deep learning. The paper findings show the ability to improve the classification accuracy by applying the spectrogram preprocessing stage to the gamma and neutron spectrums of different isotopes. Tuning deep machine learning models by hyperparameter optimization of neural network models enhanced the separation in the latent space and provided the ability to extend the number of detected isotopes in the training database. Ensemble learning contributed significantly to improve the final prediction.

Keywords: machine learning, nuclear physics, Monte Carlo simulation, noise estimation, feature extraction, classification

Procedia PDF Downloads 123
120 Systematic Evaluation of Convolutional Neural Network on Land Cover Classification from Remotely Sensed Images

Authors: Eiman Kattan, Hong Wei

Abstract:

In using Convolutional Neural Network (CNN) for classification, there is a set of hyperparameters available for the configuration purpose. This study aims to evaluate the impact of a range of parameters in CNN architecture i.e. AlexNet on land cover classification based on four remotely sensed datasets. The evaluation tests the influence of a set of hyperparameters on the classification performance. The parameters concerned are epoch values, batch size, and convolutional filter size against input image size. Thus, a set of experiments were conducted to specify the effectiveness of the selected parameters using two implementing approaches, named pertained and fine-tuned. We first explore the number of epochs under several selected batch size values (32, 64, 128 and 200). The impact of kernel size of convolutional filters (1, 3, 5, 7, 10, 15, 20, 25 and 30) was evaluated against the image size under testing (64, 96, 128, 180 and 224), which gave us insight of the relationship between the size of convolutional filters and image size. To generalise the validation, four remote sensing datasets, AID, RSD, UCMerced and RSCCN, which have different land covers and are publicly available, were used in the experiments. These datasets have a wide diversity of input data, such as number of classes, amount of labelled data, and texture patterns. A specifically designed interactive deep learning GPU training platform for image classification (Nvidia Digit) was employed in the experiments. It has shown efficiency in both training and testing. The results have shown that increasing the number of epochs leads to a higher accuracy rate, as expected. However, the convergence state is highly related to datasets. For the batch size evaluation, it has shown that a larger batch size slightly decreases the classification accuracy compared to a small batch size. For example, selecting the value 32 as the batch size on the RSCCN dataset achieves the accuracy rate of 90.34 % at the 11th epoch while decreasing the epoch value to one makes the accuracy rate drop to 74%. On the other extreme, setting an increased value of batch size to 200 decreases the accuracy rate at the 11th epoch is 86.5%, and 63% when using one epoch only. On the other hand, selecting the kernel size is loosely related to data set. From a practical point of view, the filter size 20 produces 70.4286%. The last performed image size experiment shows a dependency in the accuracy improvement. However, an expensive performance gain had been noticed. The represented conclusion opens the opportunities toward a better classification performance in various applications such as planetary remote sensing.

Keywords: CNNs, hyperparamters, remote sensing, land cover, land use

Procedia PDF Downloads 146
119 Cartographic Depiction and Visualization of Wetlands Changes in the North-Western States of India

Authors: Bansal Ashwani

Abstract:

Cartographic depiction and visualization of wetland changes is an important tool to map spatial-temporal information about the wetland dynamics effectively and to comprehend the response of these water bodies in maintaining the groundwater and surrounding ecosystem. This is true for the states of North Western India, i.e., J&K, Himachal, Punjab, and Haryana that are bestowed upon with several natural wetlands in the flood plains or on the courses of its rivers. Thus, the present study documents, analyses and reconstructs the lost wetlands, which existed in the flood plains of the major river basins of these states, i.e., Chenab, Jhelum, Satluj, Beas, Ravi, and Ghagar, in the beginning of the 20th century. To achieve the objective, the study has used multi-temporal datasets since the 1960s using high to medium resolution satellite datasets, e.g., Corona (1960s/70s), Landsat (1990s-2017) and Sentinel (2017). The Sentinel (2017) satellite image has been used for making the wetland inventory owing to its comparatively higher spatial resolution with multi-spectral bands. In addition, historical records, repeated photographs, historical maps, field observations including geomorphological evidence were also used. The water index techniques, i.e., band rationing, normalized difference water index (NDWI), modified NDWI (MNDWI) have been compared and used to map the wetlands. The wetland types found in the north-western states have been categorized under 19 classes suggested by Space Application Centre, India. These enable the researcher to provide with the wetlands inventory and a series of cartographic representation that includes overlaying multiple temporal wetlands extent vectors. A preliminary result shows the general state of wetland shrinkage since the 1960s with varying area shrinkage rate from one wetland to another. In addition, it is observed that majority of wetlands have not been documented so far and even do not have names. Moreover, the purpose is to emphasize their elimination in addition to establishing a baseline dataset that can be a tool for wetland planning and management. Finally, the applicability of cartographic depiction and visualization, historical map sources, repeated photographs and remote sensing data for reconstruction of long term wetlands fluctuations, especially in the northern part of India, will be addressed.

Keywords: cartographic depiction and visualization, wetland changes, NDWI/MDWI, geomorphological evidence and remote sensing

Procedia PDF Downloads 232
118 Escalation of Commitment and Turnover in Top Management Teams

Authors: Dmitriy V. Chulkov

Abstract:

Escalation of commitment is defined as continuation of a project after receiving negative information about it. While literature in management and psychology identified various factors contributing to escalation behavior, this phenomenon has received little analysis in economics, potentially due to the apparent irrationality of escalation. In this study, we present an economic model of escalation with asymmetric information in a principal-agent setup where the agents are responsible for a project selection decision and discover the outcome of the project before the principal. Our theoretical model complements the existing literature on several accounts. First, we link the incentive to escalate commitment to a project with the turnover decision by the manager. When a manager learns the outcome of the project and stops it that reveals that a mistake was made. There is an incentive to continue failing projects and avoid admitting the mistake. This incentive is enhanced when the agent may voluntarily resign from the firm before the outcome of the failing project is revealed, and thus not bear the full extent of reputation damage due to project failure. As long as some successful managers leave the firm for extraneous reasons, outside firms find it difficult to link failing projects with certainty to managers that left a firm. Second, we demonstrate that non-CEO managers have reputation concerns separate from those of the CEO, and thus may escalate commitment to projects they oversee, when such escalation can attenuate damage to reputation from impending project failure. Such incentive for escalation will be present for non-CEO managers if the CEO delegates responsibility for a project to a non-CEO executive. If reputation matters for promotion to the CEO, the incentive for a rising executive to escalate in order to protect reputation is distinct from that of a CEO. Third, our theoretical model is supported by empirical analysis of changes in the firm’s operations measured by the presence of discontinued operations at the time of turnover among the top four members of the top management team. Discontinued operations are indicative of termination of failing projects at a firm. The empirical results demonstrate that in a large dataset of over three thousand publicly traded U.S. firms for a period from 1993 to 2014 turnover by top executives significantly increases the likelihood that the firm discontinues operations. Furthermore, the type of turnover matters as this effect is strongest when at least one non-CEO member of the top management team leaves the firm and when the CEO departure is due to a voluntary resignation and not to a retirement or illness. Empirical results are consistent with the predictions of the theoretical model and suggest that escalation of commitment is primarily observed in decisions by non-CEO members of the top management team.

Keywords: discontinued operations, escalation of commitment, executive turnover, top management teams

Procedia PDF Downloads 346
117 Application of Improved Semantic Communication Technology in Remote Sensing Data Transmission

Authors: Tingwei Shu, Dong Zhou, Chengjun Guo

Abstract:

Semantic communication is an emerging form of communication that realize intelligent communication by extracting semantic information of data at the source and transmitting it, and recovering the data at the receiving end. It can effectively solve the problem of data transmission under the situation of large data volume, low SNR and restricted bandwidth. With the development of Deep Learning, semantic communication further matures and is gradually applied in the fields of the Internet of Things, Uumanned Air Vehicle cluster communication, remote sensing scenarios, etc. We propose an improved semantic communication system for the situation where the data volume is huge and the spectrum resources are limited during the transmission of remote sensing images. At the transmitting, we need to extract the semantic information of remote sensing images, but there are some problems. The traditional semantic communication system based on Convolutional Neural Network cannot take into account the global semantic information and local semantic information of the image, which results in less-than-ideal image recovery at the receiving end. Therefore, we adopt the improved vision-Transformer-based structure as the semantic encoder instead of the mainstream one using CNN to extract the image semantic features. In this paper, we first perform pre-processing operations on remote sensing images to improve the resolution of the images in order to obtain images with more semantic information. We use wavelet transform to decompose the image into high-frequency and low-frequency components, perform bilinear interpolation on the high-frequency components and bicubic interpolation on the low-frequency components, and finally perform wavelet inverse transform to obtain the preprocessed image. We adopt the improved Vision-Transformer structure as the semantic coder to extract and transmit the semantic information of remote sensing images. The Vision-Transformer structure can better train the huge data volume and extract better image semantic features, and adopt the multi-layer self-attention mechanism to better capture the correlation between semantic features and reduce redundant features. Secondly, to improve the coding efficiency, we reduce the quadratic complexity of the self-attentive mechanism itself to linear so as to improve the image data processing speed of the model. We conducted experimental simulations on the RSOD dataset and compared the designed system with a semantic communication system based on CNN and image coding methods such as BGP and JPEG to verify that the method can effectively alleviate the problem of excessive data volume and improve the performance of image data communication.

Keywords: semantic communication, transformer, wavelet transform, data processing

Procedia PDF Downloads 53
116 Applying GIS Geographic Weighted Regression Analysis to Assess Local Factors Impeding Smallholder Farmers from Participating in Agribusiness Markets: A Case Study of Vihiga County, Western Kenya

Authors: Mwehe Mathenge, Ben G. J. S. Sonneveld, Jacqueline E. W. Broerse

Abstract:

Smallholder farmers are important drivers of agriculture productivity, food security, and poverty reduction in Sub-Saharan Africa. However, they are faced with myriad challenges in their efforts at participating in agribusiness markets. How the geographic explicit factors existing at the local level interact to impede smallholder farmers' decision to participates (or not) in agribusiness markets is not well understood. Deconstructing the spatial complexity of the local environment could provide a deeper insight into how geographically explicit determinants promote or impede resource-poor smallholder farmers from participating in agribusiness. This paper’s objective was to identify, map, and analyze local spatial autocorrelation in factors that impede poor smallholders from participating in agribusiness markets. Data were collected using geocoded researcher-administered survey questionnaires from 392 households in Western Kenya. Three spatial statistics methods in geographic information system (GIS) were used to analyze data -Global Moran’s I, Cluster and Outliers Analysis (Anselin Local Moran’s I), and geographically weighted regression. The results of Global Moran’s I reveal the presence of spatial patterns in the dataset that was not caused by spatial randomness of data. Subsequently, Anselin Local Moran’s I result identified spatially and statistically significant local spatial clustering (hot spots and cold spots) in factors hindering smallholder participation. Finally, the geographically weighted regression results unearthed those specific geographic explicit factors impeding market participation in the study area. The results confirm that geographically explicit factors are indispensable in influencing the smallholder farming decisions, and policymakers should take cognizance of them. Additionally, this research demonstrated how geospatial explicit analysis conducted at the local level, using geographically disaggregated data, could help in identifying households and localities where the most impoverished and resource-poor smallholder households reside. In designing spatially targeted interventions, policymakers could benefit from geospatial analysis methods in understanding complex geographic factors and processes that interact to influence smallholder farmers' decision-making processes and choices.

Keywords: agribusiness markets, GIS, smallholder farmers, spatial statistics, disaggregated spatial data

Procedia PDF Downloads 114
115 Real Estate Trend Prediction with Artificial Intelligence Techniques

Authors: Sophia Liang Zhou

Abstract:

For investors, businesses, consumers, and governments, an accurate assessment of future housing prices is crucial to critical decisions in resource allocation, policy formation, and investment strategies. Previous studies are contradictory about macroeconomic determinants of housing price and largely focused on one or two areas using point prediction. This study aims to develop data-driven models to accurately predict future housing market trends in different markets. This work studied five different metropolitan areas representing different market trends and compared three-time lagging situations: no lag, 6-month lag, and 12-month lag. Linear regression (LR), random forest (RF), and artificial neural network (ANN) were employed to model the real estate price using datasets with S&P/Case-Shiller home price index and 12 demographic and macroeconomic features, such as gross domestic product (GDP), resident population, personal income, etc. in five metropolitan areas: Boston, Dallas, New York, Chicago, and San Francisco. The data from March 2005 to December 2018 were collected from the Federal Reserve Bank, FBI, and Freddie Mac. In the original data, some factors are monthly, some quarterly, and some yearly. Thus, two methods to compensate missing values, backfill or interpolation, were compared. The models were evaluated by accuracy, mean absolute error, and root mean square error. The LR and ANN models outperformed the RF model due to RF’s inherent limitations. Both ANN and LR methods generated predictive models with high accuracy ( > 95%). It was found that personal income, GDP, population, and measures of debt consistently appeared as the most important factors. It also showed that technique to compensate missing values in the dataset and implementation of time lag can have a significant influence on the model performance and require further investigation. The best performing models varied for each area, but the backfilled 12-month lag LR models and the interpolated no lag ANN models showed the best stable performance overall, with accuracies > 95% for each city. This study reveals the influence of input variables in different markets. It also provides evidence to support future studies to identify the optimal time lag and data imputing methods for establishing accurate predictive models.

Keywords: linear regression, random forest, artificial neural network, real estate price prediction

Procedia PDF Downloads 80
114 Frequent Pattern Mining for Digenic Human Traits

Authors: Atsuko Okazaki, Jurg Ott

Abstract:

Some genetic diseases (‘digenic traits’) are due to the interaction between two DNA variants. For example, certain forms of Retinitis Pigmentosa (a genetic form of blindness) occur in the presence of two mutant variants, one in the ROM1 gene and one in the RDS gene, while the occurrence of only one of these mutant variants leads to a completely normal phenotype. Detecting such digenic traits by genetic methods is difficult. A common approach to finding disease-causing variants is to compare 100,000s of variants between individuals with a trait (cases) and those without the trait (controls). Such genome-wide association studies (GWASs) have been very successful but hinge on genetic effects of single variants, that is, there should be a difference in allele or genotype frequencies between cases and controls at a disease-causing variant. Frequent pattern mining (FPM) methods offer an avenue at detecting digenic traits even in the absence of single-variant effects. The idea is to enumerate pairs of genotypes (genotype patterns) with each of the two genotypes originating from different variants that may be located at very different genomic positions. What is needed is for genotype patterns to be significantly more common in cases than in controls. Let Y = 2 refer to cases and Y = 1 to controls, with X denoting a specific genotype pattern. We are seeking association rules, ‘X → Y’, with high confidence, P(Y = 2|X), significantly higher than the proportion of cases, P(Y = 2) in the study. Clearly, generally available FPM methods are very suitable for detecting disease-associated genotype patterns. We use fpgrowth as the basic FPM algorithm and built a framework around it to enumerate high-frequency digenic genotype patterns and to evaluate their statistical significance by permutation analysis. Application to a published dataset on opioid dependence furnished results that could not be found with classical GWAS methodology. There were 143 cases and 153 healthy controls, each genotyped for 82 variants in eight genes of the opioid system. The aim was to find out whether any of these variants were disease-associated. The single-variant analysis did not lead to significant results. Application of our FPM implementation resulted in one significant (p < 0.01) genotype pattern with both genotypes in the pattern being heterozygous and originating from two variants on different chromosomes. This pattern occurred in 14 cases and none of the controls. Thus, the pattern seems quite specific to this form of substance abuse and is also rather predictive of disease. An algorithm called Multifactor Dimension Reduction (MDR) was developed some 20 years ago and has been in use in human genetics ever since. This and our algorithms share some similar properties, but they are also very different in other respects. The main difference seems to be that our algorithm focuses on patterns of genotypes while the main object of inference in MDR is the 3 × 3 table of genotypes at two variants.

Keywords: digenic traits, DNA variants, epistasis, statistical genetics

Procedia PDF Downloads 102
113 Investigating the Impacts on Cyclist Casualty Severity at Roundabouts: A UK Case Study

Authors: Nurten Akgun, Dilum Dissanayake, Neil Thorpe, Margaret C. Bell

Abstract:

Cycling has gained a great attention with comparable speeds, low cost, health benefits and reducing the impact on the environment. The main challenge associated with cycling is the provision of safety for the people choosing to cycle as their main means of transport. From the road safety point of view, cyclists are considered as vulnerable road users because they are at higher risk of serious casualty in the urban network but more specifically at roundabouts. This research addresses the development of an enhanced mathematical model by including a broad spectrum of casualty related variables. These variables were geometric design measures (approach number of lanes and entry path radius), speed limit, meteorological condition variables (light, weather, road surface) and socio-demographic characteristics (age and gender), as well as contributory factors. Contributory factors included driver’s behavior related variables such as failed to look properly, sudden braking, a vehicle passing too close to a cyclist, junction overshot, failed to judge other person’s path, restart moving off at the junction, poor turn or manoeuvre and disobeyed give-way. Tyne and Wear in the UK were selected as a case study area. The cyclist casualty data was obtained from UK STATS19 National dataset. The reference categories for the regression model were set to slight and serious cyclist casualties. Therefore, binary logistic regression was applied. Binary logistic regression analysis showed that approach number of lanes was statistically significant at the 95% level of confidence. A higher number of approach lanes increased the probability of severity of cyclist casualty occurrence. In addition, sudden braking statistically significantly increased the cyclist casualty severity at the 95% level of confidence. The result concluded that cyclist casualty severity was highly related to approach a number of lanes and sudden braking. Further research should be carried out an in-depth analysis to explore connectivity of sudden braking and approach number of lanes in order to investigate the driver’s behavior at approach locations. The output of this research will inform investment in measure to improve the safety of cyclists at roundabouts.

Keywords: binary logistic regression, casualty severity, cyclist safety, roundabout

Procedia PDF Downloads 160
112 The Trade Flow of Small Association Agreements When Rules of Origin Are Relaxed

Authors: Esmat Kamel

Abstract:

This paper aims to shed light on the extent to which the Agadir Association agreement has fostered inter regional trade between the E.U_26 and the Agadir_4 countries; once that we control for the evolution of Agadir agreement’s exports to the rest of the world. The next valid question will be regarding any remarkable variation in the spatial/sectoral structure of exports, and to what extent has it been induced by the Agadir agreement itself and precisely after the adoption of rules of origin and the PANEURO diagonal cumulative scheme? The paper’s empirical dataset covering a timeframe from [2000 -2009] was designed to account for sector specific export and intermediate flows and the bilateral structured gravity model was custom tailored to capture sector and regime specific rules of origin and the Poisson Pseudo Maximum Likelihood Estimator was used to calculate the gravity equation. The methodological approach of this work is considered to be a threefold one which starts first by conducting a ‘Hierarchal Cluster Analysis’ to classify final export flows showing a certain degree of linkage between each other. The analysis resulted in three main sectoral clusters of exports between Agadir_4 and E.U_26: cluster 1 for Petrochemical related sectors, cluster 2 durable goods and finally cluster 3 for heavy duty machinery and spare parts sectors. Second step continues by taking export flows resulting from the 3 clusters to be subject to treatment with diagonal Rules of origin through ‘The Double Differences Approach’, versus an equally comparable untreated control group. Third step is to verify results through a robustness check applied by ‘Propensity Score Matching’ to validate that the same sectoral final export and intermediate flows increased when rules of origin were relaxed. Through all the previous analysis, a remarkable and partial significance of the interaction term combining both treatment effects and time for the coefficients of 13 out of the 17 covered sectors turned out to be partially significant and it further asserted that treatment with diagonal rules of origin contributed in increasing Agadir’s_4 final and intermediate exports to the E.U._26 on average by 335% and in changing Agadir_4 exports structure and composition to the E.U._26 countries.

Keywords: agadir association agreement, structured gravity model, hierarchal cluster analysis, double differences estimation, propensity score matching, diagonal and relaxed rules of origin

Procedia PDF Downloads 299
111 Development of a Bi-National Thyroid Cancer Clinical Quality Registry

Authors: Liane J. Ioannou, Jonathan Serpell, Joanne Dean, Cino Bendinelli, Jenny Gough, Dean Lisewski, Julie Miller, Win Meyer-Rochow, Stan Sidhu, Duncan Topliss, David Walters, John Zalcberg, Susannah Ahern

Abstract:

Background: The occurrence of thyroid cancer is increasing throughout the developed world, including Australia and New Zealand, and since the 1990s has become the fastest increasing malignancy. Following the success of a number of institutional databases that monitor outcomes after thyroid surgery, the Australian and New Zealand Endocrine Surgeons (ANZES) agreed to auspice the development of a bi-national thyroid cancer registry. Objectives: To establish a bi-national population-based clinical quality registry with the aim of monitoring and improving the quality of care provided to patients diagnosed with thyroid cancer in Australia and New Zealand. Patients and Methods: The Australian and New Zealand Thyroid Cancer Registry (ANZTCR) captures clinical data for all patients, over the age of 18 years, diagnosed with thyroid cancer, confirmed by histopathology report, that have been diagnosed, assessed or treated at a contributing hospital. Data is collected by endocrine surgeons using a web-based interface, REDCap, primarily via direct data entry. Results: A multi-disciplinary Steering Committee was formed, and with operational support from Monash University the ANZTCR was established in early 2017. The pilot phase of the registry is currently operating in Victoria, New South Wales, Queensland, Western Australia and South Australia, with over 30 sites expected to come on board across Australia and New Zealand in 2018. A modified-Delphi process was undertaken to determine the key quality indicators to be reported by the registry, and a minimum dataset was developed comprising information regarding thyroid cancer diagnosis, pathology, surgery, and 30-day follow up. Conclusion: There are very few established thyroid cancer registries internationally, yet clinical quality registries have shown valuable outcomes and patient benefits in other cancers. The establishment of the ANZTCR provides the opportunity for Australia and New Zealand to further understand the current practice in the treatment of thyroid cancer and reasons for variation in outcomes. The engagement of endocrine surgeons in supporting this initiative is crucial. While the pilot registry has a focus on early clinical outcomes, it is anticipated that future collection of longer-term outcome data particularly for patients with the poor prognostic disease will add significant further value to the registry.

Keywords: thyroid cancer, clinical registry, population health, quality improvement

Procedia PDF Downloads 170
110 Knowledge Graph Development to Connect Earth Metadata and Standard English Queries

Authors: Gabriel Montague, Max Vilgalys, Catherine H. Crawford, Jorge Ortiz, Dava Newman

Abstract:

There has never been so much publicly accessible atmospheric and environmental data. The possibilities of these data are exciting, but the sheer volume of available datasets represents a new challenge for researchers. The task of identifying and working with a new dataset has become more difficult with the amount and variety of available data. Datasets are often documented in ways that differ substantially from the common English used to describe the same topics. This presents a barrier not only for new scientists, but for researchers looking to find comparisons across multiple datasets or specialists from other disciplines hoping to collaborate. This paper proposes a method for addressing this obstacle: creating a knowledge graph to bridge the gap between everyday English language and the technical language surrounding these datasets. Knowledge graph generation is already a well-established field, although there are some unique challenges posed by working with Earth data. One is the sheer size of the databases – it would be infeasible to replicate or analyze all the data stored by an organization like The National Aeronautics and Space Administration (NASA) or the European Space Agency. Instead, this approach identifies topics from metadata available for datasets in NASA’s Earthdata database, which can then be used to directly request and access the raw data from NASA. By starting with a single metadata standard, this paper establishes an approach that can be generalized to different databases, but leaves the challenge of metadata harmonization for future work. Topics generated from the metadata are then linked to topics from a collection of English queries through a variety of standard and custom natural language processing (NLP) methods. The results from this method are then compared to a baseline of elastic search applied to the metadata. This comparison shows the benefits of the proposed knowledge graph system over existing methods, particularly in interpreting natural language queries and interpreting topics in metadata. For the research community, this work introduces an application of NLP to the ecological and environmental sciences, expanding the possibilities of how machine learning can be applied in this discipline. But perhaps more importantly, it establishes the foundation for a platform that can enable common English to access knowledge that previously required considerable effort and experience. By making this public data accessible to the full public, this work has the potential to transform environmental understanding, engagement, and action.

Keywords: earth metadata, knowledge graphs, natural language processing, question-answer systems

Procedia PDF Downloads 123
109 Multicollinearity and MRA in Sustainability: Application of the Raise Regression

Authors: Claudia García-García, Catalina B. García-García, Román Salmerón-Gómez

Abstract:

Much economic-environmental research includes the analysis of possible interactions by using Moderated Regression Analysis (MRA), which is a specific application of multiple linear regression analysis. This methodology allows analyzing how the effect of one of the independent variables is moderated by a second independent variable by adding a cross-product term between them as an additional explanatory variable. Due to the very specification of the methodology, the moderated factor is often highly correlated with the constitutive terms. Thus, great multicollinearity problems arise. The appearance of strong multicollinearity in a model has important consequences. Inflated variances of the estimators may appear, there is a tendency to consider non-significant regressors that they probably are together with a very high coefficient of determination, incorrect signs of our coefficients may appear and also the high sensibility of the results to small changes in the dataset. Finally, the high relationship among explanatory variables implies difficulties in fixing the individual effects of each one on the model under study. These consequences shifted to the moderated analysis may imply that it is not worth including an interaction term that may be distorting the model. Thus, it is important to manage the problem with some methodology that allows for obtaining reliable results. After a review of those works that applied the MRA among the ten top journals of the field, it is clear that multicollinearity is mostly disregarded. Less than 15% of the reviewed works take into account potential multicollinearity problems. To overcome the issue, this work studies the possible application of recent methodologies to MRA. Particularly, the raised regression is analyzed. This methodology mitigates collinearity from a geometrical point of view: the collinearity problem arises because the variables under study are very close geometrically, so by separating both variables, the problem can be mitigated. Raise regression maintains the available information and modifies the problematic variables instead of deleting variables, for example. Furthermore, the global characteristics of the initial model are also maintained (sum of squared residuals, estimated variance, coefficient of determination, global significance test and prediction). The proposal is implemented to data from countries of the European Union during the last year available regarding greenhouse gas emissions, per capita GDP and a dummy variable that represents the topography of the country. The use of a dummy variable as the moderator is a special variant of MRA, sometimes called “subgroup regression analysis.” The main conclusion of this work is that applying new techniques to the field can improve in a substantial way the results of the analysis. Particularly, the use of raised regression mitigates great multicollinearity problems, so the researcher is able to rely on the interaction term when interpreting the results of a particular study.

Keywords: multicollinearity, MRA, interaction, raise

Procedia PDF Downloads 75
108 R Statistical Software Applied in Reliability Analysis: Case Study of Diesel Generator Fans

Authors: Jelena Vucicevic

Abstract:

Reliability analysis represents a very important task in different areas of work. In any industry, this is crucial for maintenance, efficiency, safety and monetary costs. There are ways to calculate reliability, unreliability, failure density and failure rate. This paper will try to introduce another way of calculating reliability by using R statistical software. R is a free software environment for statistical computing and graphics. It compiles and runs on a wide variety of UNIX platforms, Windows and MacOS. The R programming environment is a widely used open source system for statistical analysis and statistical programming. It includes thousands of functions for the implementation of both standard and new statistical methods. R does not limit user only to operation related only to these functions. This program has many benefits over other similar programs: it is free and, as an open source, constantly updated; it has built-in help system; the R language is easy to extend with user-written functions. The significance of the work is calculation of time to failure or reliability in a new way, using statistic. Another advantage of this calculation is that there is no need for technical details and it can be implemented in any part for which we need to know time to fail in order to have appropriate maintenance, but also to maximize usage and minimize costs. In this case, calculations have been made on diesel generator fans but the same principle can be applied to any other part. The data for this paper came from a field engineering study of the time to failure of diesel generator fans. The ultimate goal was to decide whether or not to replace the working fans with a higher quality fan to prevent future failures. Seventy generators were studied. For each one, the number of hours of running time from its first being put into service until fan failure or until the end of the study (whichever came first) was recorded. Dataset consists of two variables: hours and status. Hours show the time of each fan working and status shows the event: 1- failed, 0- censored data. Censored data represent cases when we cannot track the specific case, so it could fail or success. Gaining the result by using R was easy and quick. The program will take into consideration censored data and include this into the results. This is not so easy in hand calculation. For the purpose of the paper results from R program have been compared to hand calculations in two different cases: censored data taken as a failure and censored data taken as a success. In all three cases, results are significantly different. If user decides to use the R for further calculations, it will give more precise results with work on censored data than the hand calculation.

Keywords: censored data, R statistical software, reliability analysis, time to failure

Procedia PDF Downloads 379
107 Secondary Prisonization and Mental Health: A Comparative Study with Elderly Parents of Prisoners Incarcerated in Remote Jails

Authors: Luixa Reizabal, Inaki Garcia, Eneko Sansinenea, Ainize Sarrionandia, Karmele Lopez De Ipina, Elsa Fernandez

Abstract:

Although the effects of incarceration in prisons close to prisoners’ and their families’ residences have been studied, little is known about the effects of remote incarceration. The present study shows the impact of secondary prisonization on mental health of elderly parents of Basque prisoners who are incarcerated in prisons located far away from prisoners’ and their families’ residences. Secondary prisonization refers to the effects that imprisonment of a family member has on relatives. In the study, psychological effects are analyzed by means of comparative methodology. Specifically, levels of psychopathology (depression, anxiety, and stress) and positive mental health (psychological, social, and emotional well-being) are studied in a sample of parents over 65 years old of prisoners incarcerated in prisons located a long distance away (concretely, some of them in a distance of less than 400 km, while others farther than 400 km) from the Basque Country. The dataset consists of data collected through a questionnaire and from a spontaneous speech recording. The statistical and automatic analyses show that levels of psychopathology and positive mental health of elderly parents of prisoners incarcerated in remote jails are affected by the incarceration of their sons or daughters. Concretely, these parents show higher levels of depression, anxiety, and stress and lower levels of emotional (but not psychological or social) wellbeing than parents with no imprisoned daughters or sons. These findings suggest that parents with imprisoned sons or daughters suffer the impact of secondary prisonization on their mental health. When comparing parents with sons or daughters incarcerated within 400 kilometers from home and parents whose sons or daughters are incarcerated farther than 400 kilometers from home, the latter present higher levels of psychopathology, but also higher levels of positive mental health (although the difference between the two groups is not statistically significant). These findings might be explained by resilience. In fact, in traumatic situations, people can develop a force to cope with the situation, and even present a posttraumatic growth. Bearing in mind all these findings, it could be concluded that secondary prisonization implies for elderly parents with sons or daughters incarcerated in remote jails suffering and, in consequence, that changes in the penitentiary policy applied to Basque prisoners are required in order to finish this suffering.

Keywords: automatic spontaneous speech analysis, elderly parents, machine learning, positive mental health, psychopathology, remote incarceration, secondary prisonization

Procedia PDF Downloads 256
106 Predicting Resistance of Commonly Used Antimicrobials in Urinary Tract Infections: A Decision Tree Analysis

Authors: Meera Tandan, Mohan Timilsina, Martin Cormican, Akke Vellinga

Abstract:

Background: In general practice, many infections are treated empirically without microbiological confirmation. Understanding susceptibility of antimicrobials during empirical prescribing can be helpful to reduce inappropriate prescribing. This study aims to apply a prediction model using a decision tree approach to predict the antimicrobial resistance (AMR) of urinary tract infections (UTI) based on non-clinical features of patients over 65 years. Decision tree models are a novel idea to predict the outcome of AMR at an initial stage. Method: Data was extracted from the database of the microbiological laboratory of the University Hospitals Galway on all antimicrobial susceptibility testing (AST) of urine specimens from patients over the age of 65 from January 2011 to December 2014. The primary endpoint was resistance to common antimicrobials (Nitrofurantoin, trimethoprim, ciprofloxacin, co-amoxiclav and amoxicillin) used to treat UTI. A classification and regression tree (CART) model was generated with the outcome ‘resistant infection’. The importance of each predictor (the number of previous samples, age, gender, location (nursing home, hospital, community) and causative agent) on antimicrobial resistance was estimated. Sensitivity, specificity, negative predictive (NPV) and positive predictive (PPV) values were used to evaluate the performance of the model. Seventy-five percent (75%) of the data were used as a training set and validation of the model was performed with the remaining 25% of the dataset. Results: A total of 9805 UTI patients over 65 years had their urine sample submitted for AST at least once over the four years. E.coli, Klebsiella, Proteus species were the most commonly identified pathogens among the UTI patients without catheter whereas Sertia, Staphylococcus aureus; Enterobacter was common with the catheter. The validated CART model shows slight differences in the sensitivity, specificity, PPV and NPV in between the models with and without the causative organisms. The sensitivity, specificity, PPV and NPV for the model with non-clinical predictors was between 74% and 88% depending on the antimicrobial. Conclusion: The CART models developed using non-clinical predictors have good performance when predicting antimicrobial resistance. These models predict which antimicrobial may be the most appropriate based on non-clinical factors. Other CART models, prospective data collection and validation and an increasing number of non-clinical factors will improve model performance. The presented model provides an alternative approach to decision making on antimicrobial prescribing for UTIs in older patients.

Keywords: antimicrobial resistance, urinary tract infection, prediction, decision tree

Procedia PDF Downloads 230
105 Evaluation of Soil Erosion Risk and Prioritization for Implementation of Management Strategies in Morocco

Authors: Lahcen Daoudi, Fatima Zahra Omdi, Abldelali Gourfi

Abstract:

In Morocco, as in most Mediterranean countries, water scarcity is a common situation because of low and unevenly distributed rainfall. The expansions of irrigated lands, as well as the growth of urban and industrial areas and tourist resorts, contribute to an increase of water demand. Therefore in the 1960s Morocco embarked on an ambitious program to increase the number of dams to boost water retention capacity. However, the decrease in the capacity of these reservoirs caused by sedimentation is a major problem; it is estimated at 75 million m3/year. Dams and reservoirs became unusable for their intended purposes due to sedimentation in large rivers that result from soil erosion. Soil erosion presents an important driving force in the process affecting the landscape. It has become one of the most serious environmental problems that raised much interest throughout the world. Monitoring soil erosion risk is an important part of soil conservation practices. The estimation of soil loss risk is the first step for a successful control of water erosion. The aim of this study is to estimate the soil loss risk and its spatial distribution in the different fields of Morocco and to prioritize areas for soil conservation interventions. The approach followed is the Revised Universal Soil Loss Equation (RUSLE) using remote sensing and GIS, which is the most popular empirically based model used globally for erosion prediction and control. This model has been tested in many agricultural watersheds in the world, particularly for large-scale basins due to the simplicity of the model formulation and easy availability of the dataset. The spatial distribution of the annual soil loss was elaborated by the combination of several factors: rainfall erosivity, soil erodability, topography, and land cover. The average annual soil loss estimated in several basins watershed of Morocco varies from 0 to 50t/ha/year. Watersheds characterized by high-erosion-vulnerability are located in the North (Rif Mountains) and more particularly in the Central part of Morocco (High Atlas Mountains). This variation of vulnerability is highly correlated to slope variation which indicates that the topography factor is the main agent of soil erosion within these basin catchments. These results could be helpful for the planning of natural resources management and for implementing sustainable long-term management strategies which are necessary for soil conservation and for increasing over the projected economic life of the dam implemented.

Keywords: soil loss, RUSLE, GIS-remote sensing, watershed, Morocco

Procedia PDF Downloads 435