Search results for: custom dataset
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1306

Search results for: custom dataset

346 Automated Natural Hazard Zonation System with Internet-SMS Warning: Distributed GIS for Sustainable Societies Creating Schema and Interface for Mapping and Communication

Authors: Devanjan Bhattacharya, Jitka Komarkova

Abstract:

The research describes the implementation of a novel and stand-alone system for dynamic hazard warning. The system uses all existing infrastructure already in place like mobile networks, a laptop/PC and the small installation software. The geospatial dataset are the maps of a region which are again frugal. Hence there is no need to invest and it reaches everyone with a mobile. A novel architecture of hazard assessment and warning introduced where major technologies in ICT interfaced to give a unique WebGIS based dynamic real time geohazard warning communication system. A never before architecture introduced for integrating WebGIS with telecommunication technology. Existing technologies interfaced in a novel architectural design to address a neglected domain in a way never done before–through dynamically updatable WebGIS based warning communication. The work publishes new architecture and novelty in addressing hazard warning techniques in sustainable way and user friendly manner. Coupling of hazard zonation and hazard warning procedures into a single system has been shown. Generalized architecture for deciphering a range of geo-hazards has been developed. Hence the developmental work presented here can be summarized as the development of internet-SMS based automated geo-hazard warning communication system; integrating a warning communication system with a hazard evaluation system; interfacing different open-source technologies towards design and development of a warning system; modularization of different technologies towards development of a warning communication system; automated data creation, transformation and dissemination over different interfaces. The architecture of the developed warning system has been functionally automated as well as generalized enough that can be used for any hazard and setup requirement has been kept to a minimum.

Keywords: geospatial, web-based GIS, geohazard, warning system

Procedia PDF Downloads 380
345 Effects of Different Meteorological Variables on Reference Evapotranspiration Modeling: Application of Principal Component Analysis

Authors: Akinola Ikudayisi, Josiah Adeyemo

Abstract:

The correct estimation of reference evapotranspiration (ETₒ) is required for effective irrigation water resources planning and management. However, there are some variables that must be considered while estimating and modeling ETₒ. This study therefore determines the multivariate analysis of correlated variables involved in the estimation and modeling of ETₒ at Vaalharts irrigation scheme (VIS) in South Africa using Principal Component Analysis (PCA) technique. Weather and meteorological data between 1994 and 2014 were obtained both from South African Weather Service (SAWS) and Agricultural Research Council (ARC) in South Africa for this study. Average monthly data of minimum and maximum temperature (°C), rainfall (mm), relative humidity (%), and wind speed (m/s) were the inputs to the PCA-based model, while ETₒ is the output. PCA technique was adopted to extract the most important information from the dataset and also to analyze the relationship between the five variables and ETₒ. This is to determine the most significant variables affecting ETₒ estimation at VIS. From the model performances, two principal components with a variance of 82.7% were retained after the eigenvector extraction. The results of the two principal components were compared and the model output shows that minimum temperature, maximum temperature and windspeed are the most important variables in ETₒ estimation and modeling at VIS. In order words, ETₒ increases with temperature and windspeed. Other variables such as rainfall and relative humidity are less important and cannot be used to provide enough information about ETₒ estimation at VIS. The outcome of this study has helped to reduce input variable dimensionality from five to the three most significant variables in ETₒ modelling at VIS, South Africa.

Keywords: irrigation, principal component analysis, reference evapotranspiration, Vaalharts

Procedia PDF Downloads 227
344 The Determinants of Corporate Hedging Strategy

Authors: Ademola Ajibade

Abstract:

Previous studies have explored several rationales for hedging strategies, but the evidence provided by these studies remains ambiguous. Using a hand-collected dataset of 2460 observations of non-financial firms in eight African countries covering 2013-2022, this paper investigates the determinants and extent of corporate hedge use. In particular, this paper focuses on the link between country-specific conditions and the corporate hedging behaviour of firms. To our knowledge, this represents the first African studies investigating the association between country-specific factors and corporate hedging policy. The evidence based on both univariate and multivariate reveal that country-level corruption and government quality are important indicators of the decisions and extent of hedge use among African firms. However, the connection between country-specific factors as a rationale for corporate hedge use is stronger for firms located in highly corrupt countries. This suggest that firms located in corrupt countries are more motivated to hedge due to the large exposure they face. In addition, we test the risk management theories and observe that CEOs educational qualification and experience shape corporate hedge behaviour. We implement a lagged variables in a panel data setting to address endogeneity concern and implement an interaction term between governance indices and firm-specific variables to test for robustness. Generally, our findings reveal that institutional factors shape risk management decisions and have a predictive power in explaining corporate hedging strategy.

Keywords: corporate hedging, governance quality, corruption, derivatives

Procedia PDF Downloads 63
343 Identification of Spam Keywords Using Hierarchical Category in C2C E-Commerce

Authors: Shao Bo Cheng, Yong-Jin Han, Se Young Park, Seong-Bae Park

Abstract:

Consumer-to-Consumer (C2C) E-commerce has been growing at a very high speed in recent years. Since identical or nearly-same kinds of products compete one another by relying on keyword search in C2C E-commerce, some sellers describe their products with spam keywords that are popular but are not related to their products. Though such products get more chances to be retrieved and selected by consumers than those without spam keywords, the spam keywords mislead the consumers and waste their time. This problem has been reported in many commercial services like e-bay and taobao, but there have been little research to solve this problem. As a solution to this problem, this paper proposes a method to classify whether keywords of a product are spam or not. The proposed method assumes that a keyword for a given product is more reliable if the keyword is observed commonly in specifications of products which are the same or the same kind as the given product. This is because that a hierarchical category of a product in general determined precisely by a seller of the product and so is the specification of the product. Since higher layers of the hierarchical category represent more general kinds of products, a reliable degree is differently determined according to the layers. Hence, reliable degrees from different layers of a hierarchical category become features for keywords and they are used together with features only from specifications for classification of the keywords. Support Vector Machines are adopted as a basic classifier using the features, since it is powerful, and widely used in many classification tasks. In the experiments, the proposed method is evaluated with a golden standard dataset from Yi-han-wang, a Chinese C2C e-commerce, and is compared with a baseline method that does not consider the hierarchical category. The experimental results show that the proposed method outperforms the baseline in F1-measure, which proves that spam keywords are effectively identified by a hierarchical category in C2C e-commerce.

Keywords: spam keyword, e-commerce, keyword features, spam filtering

Procedia PDF Downloads 272
342 Machine Learning Facing Behavioral Noise Problem in an Imbalanced Data Using One Side Behavioral Noise Reduction: Application to a Fraud Detection

Authors: Salma El Hajjami, Jamal Malki, Alain Bouju, Mohammed Berrada

Abstract:

With the expansion of machine learning and data mining in the context of Big Data analytics, the common problem that affects data is class imbalance. It refers to an imbalanced distribution of instances belonging to each class. This problem is present in many real world applications such as fraud detection, network intrusion detection, medical diagnostics, etc. In these cases, data instances labeled negatively are significantly more numerous than the instances labeled positively. When this difference is too large, the learning system may face difficulty when tackling this problem, since it is initially designed to work in relatively balanced class distribution scenarios. Another important problem, which usually accompanies these imbalanced data, is the overlapping instances between the two classes. It is commonly referred to as noise or overlapping data. In this article, we propose an approach called: One Side Behavioral Noise Reduction (OSBNR). This approach presents a way to deal with the problem of class imbalance in the presence of a high noise level. OSBNR is based on two steps. Firstly, a cluster analysis is applied to groups similar instances from the minority class into several behavior clusters. Secondly, we select and eliminate the instances of the majority class, considered as behavioral noise, which overlap with behavior clusters of the minority class. The results of experiments carried out on a representative public dataset confirm that the proposed approach is efficient for the treatment of class imbalances in the presence of noise.

Keywords: machine learning, imbalanced data, data mining, big data

Procedia PDF Downloads 108
341 Towards Law Data Labelling Using Topic Modelling

Authors: Daniel Pinheiro Da Silva Junior, Aline Paes, Daniel De Oliveira, Christiano Lacerda Ghuerren, Marcio Duran

Abstract:

The Courts of Accounts are institutions responsible for overseeing and point out irregularities of Public Administration expenses. They have a high demand for processes to be analyzed, whose decisions must be grounded on severity laws. Despite the existing large amount of processes, there are several cases reporting similar subjects. Thus, previous decisions on already analyzed processes can be a precedent for current processes that refer to similar topics. Identifying similar topics is an open, yet essential task for identifying similarities between several processes. Since the actual amount of topics is considerably large, it is tedious and error-prone to identify topics using a pure manual approach. This paper presents a tool based on Machine Learning and Natural Language Processing to assists in building a labeled dataset. The tool relies on Topic Modelling with Latent Dirichlet Allocation to find the topics underlying a document followed by Jensen Shannon distance metric to generate a probability of similarity between documents pairs. Furthermore, in a case study with a corpus of decisions of the Rio de Janeiro State Court of Accounts, it was noted that data pre-processing plays an essential role in modeling relevant topics. Also, the combination of topic modeling and a calculated distance metric over document represented among generated topics has been proved useful in helping to construct a labeled base of similar and non-similar document pairs.

Keywords: courts of accounts, data labelling, document similarity, topic modeling

Procedia PDF Downloads 152
340 Time Series Analysis the Case of China and USA Trade Examining during Covid-19 Trade Enormity of Abnormal Pricing with the Exchange rate

Authors: Md. Mahadi Hasan Sany, Mumenunnessa Keya, Sharun Khushbu, Sheikh Abujar

Abstract:

Since the beginning of China's economic reform, trade between the U.S. and China has grown rapidly, and has increased since China's accession to the World Trade Organization in 2001. The US imports more than it exports from China, reducing the trade war between China and the U.S. for the 2019 trade deficit, but in 2020, the opposite happens. In international and U.S. trade, Washington launched a full-scale trade war against China in March 2016, which occurred a catastrophic epidemic. The main goal of our study is to measure and predict trade relations between China and the U.S., before and after the arrival of the COVID epidemic. The ML model uses different data as input but has no time dimension that is present in the time series models and is only able to predict the future from previously observed data. The LSTM (a well-known Recurrent Neural Network) model is applied as the best time series model for trading forecasting. We have been able to create a sustainable forecasting system in trade between China and the US by closely monitoring a dataset published by the State Website NZ Tatauranga Aotearoa from January 1, 2015, to April 30, 2021. Throughout the survey, we provided a 180-day forecast that outlined what would happen to trade between China and the US during COVID-19. In addition, we have illustrated that the LSTM model provides outstanding outcome in time series data analysis rather than RFR and SVR (e.g., both ML models). The study looks at how the current Covid outbreak affects China-US trade. As a comparative study, RMSE transmission rate is calculated for LSTM, RFR and SVR. From our time series analysis, it can be said that the LSTM model has given very favorable thoughts in terms of China-US trade on the future export situation.

Keywords: RFR, China-U.S. trade war, SVR, LSTM, deep learning, Covid-19, export value, forecasting, time series analysis

Procedia PDF Downloads 174
339 Feature Selection of Personal Authentication Based on EEG Signal for K-Means Cluster Analysis Using Silhouettes Score

Authors: Jianfeng Hu

Abstract:

Personal authentication based on electroencephalography (EEG) signals is one of the important field for the biometric technology. More and more researchers have used EEG signals as data source for biometric. However, there are some disadvantages for biometrics based on EEG signals. The proposed method employs entropy measures for feature extraction from EEG signals. Four type of entropies measures, sample entropy (SE), fuzzy entropy (FE), approximate entropy (AE) and spectral entropy (PE), were deployed as feature set. In a silhouettes calculation, the distance from each data point in a cluster to all another point within the same cluster and to all other data points in the closest cluster are determined. Thus silhouettes provide a measure of how well a data point was classified when it was assigned to a cluster and the separation between them. This feature renders silhouettes potentially well suited for assessing cluster quality in personal authentication methods. In this study, “silhouettes scores” was used for assessing the cluster quality of k-means clustering algorithm is well suited for comparing the performance of each EEG dataset. The main goals of this study are: (1) to represent each target as a tuple of multiple feature sets, (2) to assign a suitable measure to each feature set, (3) to combine different feature sets, (4) to determine the optimal feature weighting. Using precision/recall evaluations, the effectiveness of feature weighting in clustering was analyzed. EEG data from 22 subjects were collected. Results showed that: (1) It is possible to use fewer electrodes (3-4) for personal authentication. (2) There was the difference between each electrode for personal authentication (p<0.01). (3) There is no significant difference for authentication performance among feature sets (except feature PE). Conclusion: The combination of k-means clustering algorithm and silhouette approach proved to be an accurate method for personal authentication based on EEG signals.

Keywords: personal authentication, K-mean clustering, electroencephalogram, EEG, silhouettes

Procedia PDF Downloads 260
338 Copper Price Prediction Model for Various Economic Situations

Authors: Haidy S. Ghali, Engy Serag, A. Samer Ezeldin

Abstract:

Copper is an essential raw material used in the construction industry. During the year 2021 and the first half of 2022, the global market suffered from a significant fluctuation in copper raw material prices due to the aftermath of both the COVID-19 pandemic and the Russia-Ukraine war, which exposed its consumers to an unexpected financial risk. Thereto, this paper aims to develop two ANN-LSTM price prediction models, using Python, that can forecast the average monthly copper prices traded in the London Metal Exchange; the first model is a multivariate model that forecasts the copper price of the next 1-month and the second is a univariate model that predicts the copper prices of the upcoming three months. Historical data of average monthly London Metal Exchange copper prices are collected from January 2009 till July 2022, and potential external factors are identified and employed in the multivariate model. These factors lie under three main categories: energy prices and economic indicators of the three major exporting countries of copper, depending on the data availability. Before developing the LSTM models, the collected external parameters are analyzed with respect to the copper prices using correlation and multicollinearity tests in R software; then, the parameters are further screened to select the parameters that influence the copper prices. Then, the two LSTM models are developed, and the dataset is divided into training, validation, and testing sets. The results show that the performance of the 3-Month prediction model is better than the 1-Month prediction model, but still, both models can act as predicting tools for diverse economic situations.

Keywords: copper prices, prediction model, neural network, time series forecasting

Procedia PDF Downloads 86
337 Environmental Controls on the Distribution of Intertidal Foraminifers in Sabkha Al-Kharrar, Saudi Arabia: Implications for Sea-Level Changes

Authors: Talha A. Al-Dubai, Rashad A. Bantan, Ramadan H. Abu-Zied, Brian G. Jones, Aaid G. Al-Zubieri

Abstract:

Contemporary foraminiferal samples sediments were collected from the intertidal sabkha of Al-Kharrar Lagoon, Saudi Arabia, to study the vertical distribution of Foraminifera and, based on a modern training set, their potential to develop a predictor of former sea-level changes in the area. Based on hierarchical cluster analysis, the intertidal sabkha is divided into three vertical zones (A, B & C) represented by three foraminiferal assemblages, where agglutinated species occupied Zone A and calcareous species occupied the other two zones. In Zone A (high intertidal), Agglutinella compressa, Clavulina angularis and C. multicamerata are dominant species with a minor presence of Peneroplis planatus, Coscinospira hemprichii, Sorites orbiculus, Quinqueloculina lamarckiana, Q. seminula, Ammonia convexa and A. tepida. In contrast, in Zone B (middle intertidal) the most abundant species are P. planatus, C. hemprichii, S. orbiculus, Q. lamarckiana, Q. seminula and Q. laevigata, while Zone C (low intertidal) is characterised by C. hemprichii, Q. costata, S. orbiculus, P. planatus, A. convexa, A. tepida, Spiroloculina communis and S. costigera. A transfer function for sea-level reconstruction was developed using a modern dataset of 75 contemporary sediment samples and 99 species collected from several transects across the sabkha. The model provided an error of 0.12m, suggesting that intertidal foraminifers are able to predict the past sea-level changes with high precision in Al-Kharrar Lagoon, and thus the future prediction of those changes in the area.

Keywords: Lagoonal foraminifers, intertidal sabkha, vertical zonation, transfer function, sea level

Procedia PDF Downloads 149
336 Improving Our Understanding of the in vivo Modelling of Psychotic Disorders

Authors: Zsanett Bahor, Cristina Nunes-Fonseca, Gillian L. Currie, Emily S. Sena, Lindsay D.G. Thomson, Malcolm R. Macleod

Abstract:

Psychosis is ranked as the third most disabling medical condition in the world by the World Health Organization. Despite a substantial amount of research in recent years, available treatments are not universally effective and have a wide range of adverse side effects. Since many clinical drug candidates are identified through in vivo modelling, a deeper understanding of these models, and their strengths and limitations, might help us understand reasons for difficulties in psychosis drug development. To provide an unbiased summary of the preclinical psychosis literature we performed a systematic electronic search of PubMed for publications modelling a psychotic disorder in vivo, identifying 14,721 relevant studies. Double screening of 11,000 publications from this dataset so far established 2403 animal studies of psychosis, with the most common model being schizophrenia (95%). 61% of these models are induced using pharmacological agents. For all the models only 56% of publications test a therapeutic treatment. We propose a systematic review of these studies to assess the prevalence of reporting of measures to reduce risk of bias, and a meta-analysis to assess the internal and external validity of these animal models. Our findings are likely to be relevant to future preclinical studies of psychosis as this generation of strong empirical evidence has the potential to identify weaknesses, areas for improvement and make suggestions on refinement of experimental design. Such a detailed understanding of the data which inform what we think we know will help improve the current attrition rate between bench and bedside in psychosis research.

Keywords: animal models, psychosis, systematic review, schizophrenia

Procedia PDF Downloads 266
335 The Impact of Food Inflation on Poverty: An Analysis of the Different Households in the Philippines

Authors: Kara Gianina D. Rosas, Jade Emily L. Tong

Abstract:

This study assesses the vulnerability of households to food price shocks. Using the Philippines as a case study, the researchers aim to understand how such shocks can cause food insecurity in different types of households. This paper measures the impact of actual food price changes during the food crisis of 2006-2009 on poverty in relation to their spatial location. Households are classified as rural or urban and agricultural or non-agricultural. By treating food prices and consumption patterns as heterogeneous, this study differs from conventional poverty analysis as actual prices are used. Merging the Family, Income and Expenditure Survey (FIES) with the Consumer Price Index dataset (CPI), the researchers were able to determine the effects on poverty measures, specifically, headcount index, poverty gap, and poverty severity. The study finds that, without other interventions, food inflation would lead to a significant increase in the number of households that fall below the poverty threshold, except for households whose income is derived from agricultural activities. It also finds that much of the inflation during these years was fueled by the rise in staple food prices. Essentially, this paper aims to broaden the economic perspective of policymakers with regard to the heterogeneity of impacts of inflation through analyzing the deeper microeconomic levels of different subgroups. In hopes of finding a solution to lessen the inequality gap of poverty between the rural and urban poor, this paper aims to aid policymakers in creating projects targeted towards food insecurity.

Keywords: poverty, food inflation, agricultural households, non-agricultural households, net consumption ratio, urban poor, rural poor, head count index, poverty gap, poverty severity

Procedia PDF Downloads 215
334 Biophysical Analysis of the Interaction of Polymeric Nanoparticles with Biomimetic Models of the Lung Surfactant

Authors: Weiam Daear, Patrick Lai, Elmar Prenner

Abstract:

The human body offers many avenues that could be used for drug delivery. The pulmonary route, which is delivered through the lungs, presents many advantages that have sparked interested in the field. These advantages include; 1) direct access to the lungs and the large surface area it provides, and 2) close proximity to the blood circulation. The air-blood barrier of the alveoli is about 500 nm thick. The air-blood barrier consist of a monolayer of lipids and few proteins called the lung surfactant and cells. This monolayer consists of ~90% lipids and ~10% proteins that are produced by the alveolar epithelial cells. The two major lipid classes constitutes of various saturation and chain length of phosphatidylcholine (PC) and phosphatidylglycerol (PG) representing 80% of total lipid component. The major role of the lung surfactant monolayer is to reduce surface tension experienced during breathing cycles in order to prevent lung collapse. In terms of the pulmonary drug delivery route, drugs pass through various parts of the respiratory system before reaching the alveoli. It is at this location that the lung surfactant functions as the air-blood barrier for drugs. As the field of nanomedicine advances, the use of nanoparticles (NPs) as drug delivery vehicles is becoming very important. This is due to the advantages NPs provide with their large surface area and potential specific targeting. Therefore, studying the interaction of NPs with lung surfactant and whether they affect its stability becomes very essential. The aim of this research is to develop a biomimetic model of the human lung surfactant followed by a biophysical analysis of the interaction of polymeric NPs. This biomimetic model will function as a fast initial mode of testing for whether NPs affect the stability of the human lung surfactant. The model developed thus far is an 8-component lipid system that contains major PC and PG lipids. Recently, a custom made 16:0/16:1 PC and PG lipids were added to the model system. In the human lung surfactant, these lipids constitute 16% of the total lipid component. According to the author’s knowledge, there is not much monolayer data on the biophysical analysis of the 16:0/16:1 lipids, therefore more analysis will be discussed here. Biophysical techniques such as the Langmuir Trough is used for stability measurements which monitors changes to a monolayer's surface pressure upon NP interaction. Furthermore, Brewster Angle Microscopy (BAM) employed to visualize changes to the lateral domain organization. Results show preferential interactions of NPs with different lipid groups that is also dependent on the monolayer fluidity. Furthermore, results show that the film stability upon compression is unaffected, but there are significant changes in the lateral domain organization of the lung surfactant upon NP addition. This research is significant in the field of pulmonary drug delivery. It is shown that NPs within a certain size range are safe for the pulmonary route, but little is known about the mode of interaction of those polymeric NPs. Moreover, this work will provide additional information about the nanotoxicology of NPs tested.

Keywords: Brewster angle microscopy, lipids, lung surfactant, nanoparticles

Procedia PDF Downloads 152
333 A Generalized Framework for Adaptive Machine Learning Deployments in Algorithmic Trading

Authors: Robert Caulk

Abstract:

A generalized framework for adaptive machine learning deployments in algorithmic trading is introduced, tested, and released as open-source code. The presented software aims to test the hypothesis that recent data contains enough information to form a probabilistically favorable short-term price prediction. Further, the framework contains various adaptive machine learning techniques that are geared toward generating profit during strong trends and minimizing losses during trend changes. Results demonstrate that this adaptive machine learning approach is capable of capturing trends and generating profit. The presentation also discusses the importance of defining the parameter space associated with the dynamic training data-set and using the parameter space to identify and remove outliers from prediction data points. Meanwhile, the generalized architecture enables common users to exploit the powerful machinery while focusing on high-level feature engineering and model testing. The presentation also highlights common strengths and weaknesses associated with the presented technique and presents a broad range of well-tested starting points for feature set construction, target setting, and statistical methods for enforcing risk management and maintaining probabilistically favorable entry and exit points. The presentation also describes the end-to-end data processing tools associated with FreqAI, including automatic data fetching, data aggregation, feature engineering, safe and robust data pre-processing, outlier detection, custom machine learning and statistical tools, data post-processing, and adaptive training backtest emulation, and deployment of adaptive training in live environments. Finally, the generalized user interface is also discussed in the presentation. Feature engineering is simplified so that users can seed their feature sets with common indicator libraries (e.g. TA-lib, pandas-ta). The user also feeds data expansion parameters to fill out a large feature set for the model, which can contain as many as 10,000+ features. The presentation describes the various object-oriented programming techniques employed to make FreqAI agnostic to third-party libraries and external data sources. In other words, the back-end is constructed in such a way that users can leverage a broad range of common regression libraries (Catboost, LightGBM, Sklearn, etc) as well as common Neural Network libraries (TensorFlow, PyTorch) without worrying about the logistical complexities associated with data handling and API interactions. The presentation finishes by drawing conclusions about the most important parameters associated with a live deployment of the adaptive learning framework and provides the road map for future development in FreqAI.

Keywords: machine learning, market trend detection, open-source, adaptive learning, parameter space exploration

Procedia PDF Downloads 69
332 The Role of Urban Development Patterns for Mitigating Extreme Urban Heat: The Case Study of Doha, Qatar

Authors: Yasuyo Makido, Vivek Shandas, David J. Sailor, M. Salim Ferwati

Abstract:

Mitigating extreme urban heat is challenging in a desert climate such as Doha, Qatar, since outdoor daytime temperature area often too high for the human body to tolerate. Recent studies demonstrate that cities in arid and semiarid areas can exhibit ‘urban cool islands’ - urban areas that are cooler than the surrounding desert. However, the variation of temperatures as a result of the time of day and factors leading to temperature change remain at the question. To address these questions, we examined the spatial and temporal variation of air temperature in Doha, Qatar by conducting multiple vehicle-base local temperature observations. We also employed three statistical approaches to model surface temperatures using relevant predictors: (1) Ordinary Least Squares, (2) Regression Tree Analysis and (3) Random Forest for three time periods. Although the most important determinant factors varied by day and time, distance to the coast was the significant determinant at midday. A 70%/30% holdout method was used to create a testing dataset to validate the results through Pearson’s correlation coefficient. The Pearson’s analysis suggests that the Random Forest model more accurately predicts the surface temperatures than the other methods. We conclude with recommendations about the types of development patterns that show the greatest potential for reducing extreme heat in air climates.

Keywords: desert cities, tree-structure regression model, urban cool Island, vehicle temperature traverse

Procedia PDF Downloads 371
331 An End-to-end Piping and Instrumentation Diagram Information Recognition System

Authors: Taekyong Lee, Joon-Young Kim, Jae-Min Cha

Abstract:

Piping and instrumentation diagram (P&ID) is an essential design drawing describing the interconnection of process equipment and the instrumentation installed to control the process. P&IDs are modified and managed throughout a whole life cycle of a process plant. For the ease of data transfer, P&IDs are generally handed over from a design company to an engineering company as portable document format (PDF) which is hard to be modified. Therefore, engineering companies have to deploy a great deal of time and human resources only for manually converting P&ID images into a computer aided design (CAD) file format. To reduce the inefficiency of the P&ID conversion, various symbols and texts in P&ID images should be automatically recognized. However, recognizing information in P&ID images is not an easy task. A P&ID image usually contains hundreds of symbol and text objects. Most objects are pretty small compared to the size of a whole image and are densely packed together. Traditional recognition methods based on geometrical features are not capable enough to recognize every elements of a P&ID image. To overcome these difficulties, state-of-the-art deep learning models, RetinaNet and connectionist text proposal network (CTPN) were used to build a system for recognizing symbols and texts in a P&ID image. Using the RetinaNet and the CTPN model carefully modified and tuned for P&ID image dataset, the developed system recognizes texts, equipment symbols, piping symbols and instrumentation symbols from an input P&ID image and save the recognition results as the pre-defined extensible markup language format. In the test using a commercial P&ID image, the P&ID information recognition system correctly recognized 97% of the symbols and 81.4% of the texts.

Keywords: object recognition system, P&ID, symbol recognition, text recognition

Procedia PDF Downloads 128
330 A Survey of Skin Cancer Detection and Classification from Skin Lesion Images Using Deep Learning

Authors: Joseph George, Anne Kotteswara Roa

Abstract:

Skin disease is one of the most common and popular kinds of health issues faced by people nowadays. Skin cancer (SC) is one among them, and its detection relies on the skin biopsy outputs and the expertise of the doctors, but it consumes more time and some inaccurate results. At the early stage, skin cancer detection is a challenging task, and it easily spreads to the whole body and leads to an increase in the mortality rate. Skin cancer is curable when it is detected at an early stage. In order to classify correct and accurate skin cancer, the critical task is skin cancer identification and classification, and it is more based on the cancer disease features such as shape, size, color, symmetry and etc. More similar characteristics are present in many skin diseases; hence it makes it a challenging issue to select important features from a skin cancer dataset images. Hence, the skin cancer diagnostic accuracy is improved by requiring an automated skin cancer detection and classification framework; thereby, the human expert’s scarcity is handled. Recently, the deep learning techniques like Convolutional neural network (CNN), Deep belief neural network (DBN), Artificial neural network (ANN), Recurrent neural network (RNN), and Long and short term memory (LSTM) have been widely used for the identification and classification of skin cancers. This survey reviews different DL techniques for skin cancer identification and classification. The performance metrics such as precision, recall, accuracy, sensitivity, specificity, and F-measures are used to evaluate the effectiveness of SC identification using DL techniques. By using these DL techniques, the classification accuracy increases along with the mitigation of computational complexities and time consumption.

Keywords: skin cancer, deep learning, performance measures, accuracy, datasets

Procedia PDF Downloads 103
329 Physics-Informed Convolutional Neural Networks for Reservoir Simulation

Authors: Jiangxia Han, Liang Xue, Keda Chen

Abstract:

Despite the significant progress over the last decades in reservoir simulation using numerical discretization, meshing is complex. Moreover, the high degree of freedom of the space-time flow field makes the solution process very time-consuming. Therefore, we present Physics-Informed Convolutional Neural Networks(PICNN) as a hybrid scientific theory and data method for reservoir modeling. Besides labeled data, the model is driven by the scientific theories of the underlying problem, such as governing equations, boundary conditions, and initial conditions. PICNN integrates governing equations and boundary conditions into the network architecture in the form of a customized convolution kernel. The loss function is composed of data matching, initial conditions, and other measurable prior knowledge. By customizing the convolution kernel and minimizing the loss function, the neural network parameters not only fit the data but also honor the governing equation. The PICNN provides a methodology to model and history-match flow and transport problems in porous media. Numerical results demonstrate that the proposed PICNN can provide an accurate physical solution from a limited dataset. We show how this method can be applied in the context of a forward simulation for continuous problems. Furthermore, several complex scenarios are tested, including the existence of data noise, different work schedules, and different good patterns.

Keywords: convolutional neural networks, deep learning, flow and transport in porous media, physics-informed neural networks, reservoir simulation

Procedia PDF Downloads 108
328 Mapping of Geological Structures Using Aerial Photography

Authors: Ankit Sharma, Mudit Sachan, Anurag Prakash

Abstract:

Rapid growth in data acquisition technologies through drones, have led to advances and interests in collecting high-resolution images of geological fields. Being advantageous in capturing high volume of data in short flights, a number of challenges have to overcome for efficient analysis of this data, especially while data acquisition, image interpretation and processing. We introduce a method that allows effective mapping of geological fields using photogrammetric data of surfaces, drainage area, water bodies etc, which will be captured by airborne vehicles like UAVs, we are not taking satellite images because of problems in adequate resolution, time when it is captured may be 1 yr back, availability problem, difficult to capture exact image, then night vision etc. This method includes advanced automated image interpretation technology and human data interaction to model structures and. First Geological structures will be detected from the primary photographic dataset and the equivalent three dimensional structures would then be identified by digital elevation model. We can calculate dip and its direction by using the above information. The structural map will be generated by adopting a specified methodology starting from choosing the appropriate camera, camera’s mounting system, UAVs design ( based on the area and application), Challenge in air borne systems like Errors in image orientation, payload problem, mosaicing and geo referencing and registering of different images to applying DEM. The paper shows the potential of using our method for accurate and efficient modeling of geological structures, capture particularly from remote, of inaccessible and hazardous sites.

Keywords: digital elevation model, mapping, photogrammetric data analysis, geological structures

Procedia PDF Downloads 663
327 Analysing Trends in Rice Cropping Intensity and Seasonality across the Philippines Using 14 Years of Moderate Resolution Remote Sensing Imagery

Authors: Bhogendra Mishra, Andy Nelson, Mirco Boschetti, Lorenzo Busetto, Alice Laborte

Abstract:

Rice is grown on over 100 million hectares in almost every country of Asia. It is the most important staple crop for food security and has high economic and cultural importance in Asian societies. The combination of genetic diversity and management options, coupled with the large geographic extent means that there is a large variation in seasonality (when it is grown) and cropping intensity (how often it is grown per year on the same plot of land), even over relatively small distances. Seasonality and intensity can and do change over time depending on climatic, environmental and economic factors. Detecting where and when these changes happen can provide information to better understand trends in regional and even global rice production. Remote sensing offers a unique opportunity to estimate these trends. We apply the recently published PhenoRice algorithm to 14 years of moderate resolution remote sensing (MODIS) data (utilizing 250m resolution 16 day composites from Terra and Aqua) to estimate seasonality and cropping intensity per year and changes over time. We compare the results to the surveyed data collected by International Rice Research Institute (IRRI). The study results in a unique and validated dataset on the extent and change of extent, the seasonality and change in seasonality and the cropping intensity and change in cropping intensity between 2003 and 2016 for the Philippines. Observed trends and their implications for food security and trade policies are also discussed.

Keywords: rice, cropping intensity, moderate resolution remote sensing (MODIS), phenology, seasonality

Procedia PDF Downloads 274
326 Lung HRCT Pattern Classification for Cystic Fibrosis Using a Convolutional Neural Network

Authors: Parisa Mansour

Abstract:

Cystic fibrosis (CF) is one of the most common autosomal recessive diseases among whites. It mostly affects the lungs, causing infections and inflammation that account for 90% of deaths in CF patients. Because of this high variability in clinical presentation and organ involvement, investigating treatment responses and evaluating lung changes over time is critical to preventing CF progression. High-resolution computed tomography (HRCT) greatly facilitates the assessment of lung disease progression in CF patients. Recently, artificial intelligence was used to analyze chest CT scans of CF patients. In this paper, we propose a convolutional neural network (CNN) approach to classify CF lung patterns in HRCT images. The proposed network consists of two convolutional layers with 3 × 3 kernels and maximally connected in each layer, followed by two dense layers with 1024 and 10 neurons, respectively. The softmax layer prepares a predicted output probability distribution between classes. This layer has three exits corresponding to the categories of normal (healthy), bronchitis and inflammation. To train and evaluate the network, we constructed a patch-based dataset extracted from more than 1100 lung HRCT slices obtained from 45 CF patients. Comparative evaluation showed the effectiveness of the proposed CNN compared to its close peers. Classification accuracy, average sensitivity and specificity of 93.64%, 93.47% and 96.61% were achieved, indicating the potential of CNNs in analyzing lung CF patterns and monitoring lung health. In addition, the visual features extracted by our proposed method can be useful for automatic measurement and finally evaluation of the severity of CF patterns in lung HRCT images.

Keywords: HRCT, CF, cystic fibrosis, chest CT, artificial intelligence

Procedia PDF Downloads 40
325 Development of Energy Benchmarks Using Mandatory Energy and Emissions Reporting Data: Ontario Post-Secondary Residences

Authors: C. Xavier Mendieta, J. J McArthur

Abstract:

Governments are playing an increasingly active role in reducing carbon emissions, and a key strategy has been the introduction of mandatory energy disclosure policies. These policies have resulted in a significant amount of publicly available data, providing researchers with a unique opportunity to develop location-specific energy and carbon emission benchmarks from this data set, which can then be used to develop building archetypes and used to inform urban energy models. This study presents the development of such a benchmark using the public reporting data. The data from Ontario’s Ministry of Energy for Post-Secondary Educational Institutions are being used to develop a series of building archetype dynamic building loads and energy benchmarks to fill a gap in the currently available building database. This paper presents the development of a benchmark for college and university residences within ASHRAE climate zone 6 areas in Ontario using the mandatory disclosure energy and greenhouse gas emissions data. The methodology presented includes data cleaning, statistical analysis, and benchmark development, and lessons learned from this investigation are presented and discussed to inform the development of future energy benchmarks from this larger data set. The key findings from this initial benchmarking study are: (1) the importance of careful data screening and outlier identification to develop a valid dataset; (2) the key features used to develop a model of the data are building age, size, and occupancy schedules and these can be used to estimate energy consumption; and (3) policy changes affecting the primary energy generation significantly affected greenhouse gas emissions, and consideration of these factors was critical to evaluate the validity of the reported data.

Keywords: building archetypes, data analysis, energy benchmarks, GHG emissions

Procedia PDF Downloads 281
324 Educational Experience, Record Keeping, Genetic Selection and Herd Management Effects on Monthly Milk Yield and Revenues of Dairy Farms in Southern Vietnam

Authors: Ngoc-Hieu Vu

Abstract:

A study was conducted to estimate the record keeping, genetic selection, educational experience, and farm management effect on monthly milk yield per farm, average milk yield per cow, monthly milk revenue per farm, and monthly milk revenue per cow of dairy farms in the Southern region of Vietnam. The dataset contained 5448 monthly record collected from January 2013 to May 2015. Results showed that longer experience increased (P < 0.001) monthly milk yields and revenues. Better educated farmers produced more monthly milk per farm and monthly milk per cow and revenues (P < 0.001) than lower educated farmers. Farm that kept records on individual animals had higher (P < 0.001) for monthly milk yields and revenues than farms that did not. Farms that used hired people produced the highest (p < 0.05) monthly milk yield per farm, milk yield per cow and revenues, followed by farms that used both hire and family members, and lowest values were for farms that used family members only. Farms that used crosses Holstein in herd were higher performance (p < 0.001) for all traits than farms that used purebred Holstein and other breeds. Farms that used genetic information and phenotypes when selecting sires were higher (p < 0.05) for all traits than farms that used only phenotypes and personal option. Farms that received help from Vet, organization staff, or government officials had higher monthly milk yield and revenues than those that decided by owner. These findings suggest that dairy farmers should be training in systematic, must be considered and continuous support to improve farm milk production and revenues, to increase the likelihood of adoption on a sustainable way.

Keywords: dairy farming, education, milk yield, Southern Vietnam

Procedia PDF Downloads 302
323 Customer Churn Prediction by Using Four Machine Learning Algorithms Integrating Features Selection and Normalization in the Telecom Sector

Authors: Alanoud Moraya Aldalan, Abdulaziz Almaleh

Abstract:

A crucial component of maintaining a customer-oriented business as in the telecom industry is understanding the reasons and factors that lead to customer churn. Competition between telecom companies has greatly increased in recent years. It has become more important to understand customers’ needs in this strong market of telecom industries, especially for those who are looking to turn over their service providers. So, predictive churn is now a mandatory requirement for retaining those customers. Machine learning can be utilized to accomplish this. Churn Prediction has become a very important topic in terms of machine learning classification in the telecommunications industry. Understanding the factors of customer churn and how they behave is very important to building an effective churn prediction model. This paper aims to predict churn and identify factors of customers’ churn based on their past service usage history. Aiming at this objective, the study makes use of feature selection, normalization, and feature engineering. Then, this study compared the performance of four different machine learning algorithms on the Orange dataset: Logistic Regression, Random Forest, Decision Tree, and Gradient Boosting. Evaluation of the performance was conducted by using the F1 score and ROC-AUC. Comparing the results of this study with existing models has proven to produce better results. The results showed the Gradients Boosting with feature selection technique outperformed in this study by achieving a 99% F1-score and 99% AUC, and all other experiments achieved good results as well.

Keywords: machine learning, gradient boosting, logistic regression, churn, random forest, decision tree, ROC, AUC, F1-score

Procedia PDF Downloads 112
322 An Efficient Machine Learning Model to Detect Metastatic Cancer in Pathology Scans Using Principal Component Analysis Algorithm, Genetic Algorithm, and Classification Algorithms

Authors: Bliss Singhal

Abstract:

Machine learning (ML) is a branch of Artificial Intelligence (AI) where computers analyze data and find patterns in the data. The study focuses on the detection of metastatic cancer using ML. Metastatic cancer is the stage where cancer has spread to other parts of the body and is the cause of approximately 90% of cancer-related deaths. Normally, pathologists spend hours each day to manually classifying whether tumors are benign or malignant. This tedious task contributes to mislabeling metastasis being over 60% of the time and emphasizes the importance of being aware of human error and other inefficiencies. ML is a good candidate to improve the correct identification of metastatic cancer, saving thousands of lives and can also improve the speed and efficiency of the process, thereby taking fewer resources and time. So far, the deep learning methodology of AI has been used in research to detect cancer. This study is a novel approach to determining the potential of using preprocessing algorithms combined with classification algorithms in detecting metastatic cancer. The study used two preprocessing algorithms: principal component analysis (PCA) and the genetic algorithm, to reduce the dimensionality of the dataset and then used three classification algorithms: logistic regression, decision tree classifier, and k-nearest neighbors to detect metastatic cancer in the pathology scans. The highest accuracy of 71.14% was produced by the ML pipeline comprising of PCA, the genetic algorithm, and the k-nearest neighbor algorithm, suggesting that preprocessing and classification algorithms have great potential for detecting metastatic cancer.

Keywords: breast cancer, principal component analysis, genetic algorithm, k-nearest neighbors, decision tree classifier, logistic regression

Procedia PDF Downloads 62
321 Electro-Hydrodynamic Effects Due to Plasma Bullet Propagation

Authors: Panagiotis Svarnas, Polykarpos Papadopoulos

Abstract:

Atmospheric-pressure cold plasmas continue to gain increasing interest for various applications due to their unique properties, like cost-efficient production, high chemical reactivity, low gas temperature, adaptability, etc. Numerous designs have been proposed for these plasmas production in terms of electrode configuration, driving voltage waveform and working gas(es). However, in order to exploit most of the advantages of these systems, the majority of the designs are based on dielectric-barrier discharges (DBDs) either in filamentary or glow regimes. A special category of the DBD-based atmospheric-pressure cold plasmas refers to the so-called plasma jets, where a carrier noble gas is guided by the dielectric barrier (usually a hollow cylinder) and left to flow up to the atmospheric air where a complicated hydrodynamic interplay takes place. Although it is now well established that these plasmas are generated due to ionizing waves reminding in many ways streamer propagation, they exhibit discrete characteristics which are better mirrored on the terms 'guided streamers' or 'plasma bullets'. These 'bullets' travel with supersonic velocities both inside the dielectric barrier and the channel formed by the noble gas during its penetration into the air. The present work is devoted to the interpretation of the electro-hydrodynamic effects that take place downstream of the dielectric barrier opening, i.e., in the noble gas-air mixing area where plasma bullet propagate under the influence of local electric fields in regions of variable noble gas concentration. Herein, we focus on the role of the local space charge and the residual ionic charge left behind after the bullet propagation in the gas flow field modification. The study communicates both experimental and numerical results, coupled in a comprehensive manner. The plasma bullets are here produced by a custom device having a quartz tube as a dielectric barrier and two external ring-type electrodes driven by sinusoidal high voltage at 10 kHz. Helium gas is fed to the tube and schlieren photography is employed for mapping the flow field downstream of the tube orifice. Mixture mass conservation equation, momentum conservation equation, energy conservation equation in terms of temperature and helium transfer equation are simultaneously solved, leading to the physical mechanisms that govern the experimental results. Namely, we deal with electro-hydrodynamic effects mainly due to momentum transfer from atomic ions to neutrals. The atomic ions are left behind as residual charge after the bullet propagation and gain energy from the locally created electric field. The electro-hydrodynamic force is eventually evaluated.

Keywords: atmospheric-pressure plasmas, dielectric-barrier discharges, schlieren photography, electro-hydrodynamic force

Procedia PDF Downloads 123
320 Evaluating the Validity of CFD Model of Dispersion in a Complex Urban Geometry Using Two Sets of Experimental Measurements

Authors: Mohammad R. Kavian Nezhad, Carlos F. Lange, Brian A. Fleck

Abstract:

This research presents the validation study of a computational fluid dynamics (CFD) model developed to simulate the scalar dispersion emitted from rooftop sources around the buildings at the University of Alberta North Campus. The ANSYS CFX code was used to perform the numerical simulation of the wind regime and pollutant dispersion by solving the 3D steady Reynolds-averaged Navier-Stokes (RANS) equations on a building-scale high-resolution grid. The validation study was performed in two steps. First, the CFD model performance in 24 cases (eight wind directions and three wind speeds) was evaluated by comparing the predicted flow fields with the available data from the previous measurement campaign designed at the North Campus, using the standard deviation method (SDM), while the estimated results of the numerical model showed maximum average percent errors of approximately 53% and 37% for wind incidents from the North and Northwest, respectively. Good agreement with the measurements was observed for the other six directions, with an average error of less than 30%. In the second step, the reliability of the implemented turbulence model, numerical algorithm, modeling techniques, and the grid generation scheme was further evaluated using the Mock Urban Setting Test (MUST) dispersion dataset. Different statistical measures, including the fractional bias (FB), the geometric mean bias (MG), and the normalized mean square error (NMSE), were used to assess the accuracy of the predicted dispersion field. Our CFD results are in very good agreement with the field measurements.

Keywords: CFD, plume dispersion, complex urban geometry, validation study, wind flow

Procedia PDF Downloads 113
319 Supervised Machine Learning Approach for Studying the Effect of Different Joint Sets on Stability of Mine Pit Slopes Under the Presence of Different External Factors

Authors: Sudhir Kumar Singh, Debashish Chakravarty

Abstract:

Slope stability analysis is an important aspect in the field of geotechnical engineering. It is also important from safety, and economic point of view as any slope failure leads to loss of valuable lives and damage to property worth millions. This paper aims at mitigating the risk of slope failure by studying the effect of different joint sets on the stability of mine pit slopes under the influence of various external factors, namely degree of saturation, rainfall intensity, and seismic coefficients. Supervised machine learning approach has been utilized for making accurate and reliable predictions regarding the stability of slopes based on the value of Factor of Safety. Numerous cases have been studied for analyzing the stability of slopes using the popular Finite Element Method, and the data thus obtained has been used as training data for the supervised machine learning models. The input data has been trained on different supervised machine learning models, namely Random Forest, Decision Tree, Support vector Machine, and XGBoost. Distinct test data that is not present in training data has been used for measuring the performance and accuracy of different models. Although all models have performed well on the test dataset but Random Forest stands out from others due to its high accuracy of greater than 95%, thus helping us by providing a valuable tool at our disposition which is neither computationally expensive nor time consuming and in good accordance with the numerical analysis result.

Keywords: finite element method, geotechnical engineering, machine learning, slope stability

Procedia PDF Downloads 83
318 Non-Uniform Filter Banks-based Minimum Distance to Riemannian Mean Classifition in Motor Imagery Brain-Computer Interface

Authors: Ping Tan, Xiaomeng Su, Yi Shen

Abstract:

The motion intention in the motor imagery braincomputer interface is identified by classifying the event-related desynchronization (ERD) and event-related synchronization ERS characteristics of sensorimotor rhythm (SMR) in EEG signals. When the subject imagines different limbs or different parts moving, the rhythm components and bandwidth will change, which varies from person to person. How to find the effective sensorimotor frequency band of subjects is directly related to the classification accuracy of brain-computer interface. To solve this problem, this paper proposes a Minimum Distance to Riemannian Mean Classification method based on Non-Uniform Filter Banks. During the training phase, the EEG signals are decomposed into multiple different bandwidt signals by using multiple band-pass filters firstly; Then the spatial covariance characteristics of each frequency band signal are computered to be as the feature vectors. these feature vectors will be classified by the MDRM (Minimum Distance to Riemannian Mean) method, and cross validation is employed to obtain the effective sensorimotor frequency bands. During the test phase, the test signals are filtered by the bandpass filter of the effective sensorimotor frequency bands, and the extracted spatial covariance feature vectors will be classified by using the MDRM. Experiments on the BCI competition IV 2a dataset show that the proposed method is superior to other classification methods.

Keywords: non-uniform filter banks, motor imagery, brain-computer interface, minimum distance to Riemannian mean

Procedia PDF Downloads 91
317 Reducing CO2 Emission Using EDA and Weighted Sum Model in Smart Parking System

Authors: Rahman Ali, Muhammad Sajjad, Farkhund Iqbal, Muhammad Sadiq Hassan Zada, Mohammed Hussain

Abstract:

Emission of Carbon Dioxide (CO2) has adversely affected the environment. One of the major sources of CO2 emission is transportation. In the last few decades, the increase in mobility of people using vehicles has enormously increased the emission of CO2 in the environment. To reduce CO2 emission, sustainable transportation system is required in which smart parking is one of the important measures that need to be established. To contribute to the issue of reducing the amount of CO2 emission, this research proposes a smart parking system. A cloud-based solution is provided to the drivers which automatically searches and recommends the most preferred parking slots. To determine preferences of the parking areas, this methodology exploits a number of unique parking features which ultimately results in the selection of a parking that leads to minimum level of CO2 emission from the current position of the vehicle. To realize the methodology, a scenario-based implementation is considered. During the implementation, a mobile application with GPS signals, vehicles with a number of vehicle features and a list of parking areas with parking features are used by sorting, multi-level filtering, exploratory data analysis (EDA, Analytical Hierarchy Process (AHP)) and weighted sum model (WSM) to rank the parking areas and recommend the drivers with top-k most preferred parking areas. In the EDA process, “2020testcar-2020-03-03”, a freely available dataset is used to estimate CO2 emission of a particular vehicle. To evaluate the system, results of the proposed system are compared with the conventional approach, which reveal that the proposed methodology supersedes the conventional one in reducing the emission of CO2 into the atmosphere.

Keywords: car parking, Co2, Co2 reduction, IoT, merge sort, number plate recognition, smart car parking

Procedia PDF Downloads 126