Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 1148

Search results for: Imbalanced dataset

308 Physics-Informed Convolutional Neural Networks for Reservoir Simulation

Authors: Jiangxia Han, Liang Xue, Keda Chen

Abstract:

Despite the significant progress over the last decades in reservoir simulation using numerical discretization, meshing is complex. Moreover, the high degree of freedom of the space-time flow field makes the solution process very time-consuming. Therefore, we present Physics-Informed Convolutional Neural Networks(PICNN) as a hybrid scientific theory and data method for reservoir modeling. Besides labeled data, the model is driven by the scientific theories of the underlying problem, such as governing equations, boundary conditions, and initial conditions. PICNN integrates governing equations and boundary conditions into the network architecture in the form of a customized convolution kernel. The loss function is composed of data matching, initial conditions, and other measurable prior knowledge. By customizing the convolution kernel and minimizing the loss function, the neural network parameters not only fit the data but also honor the governing equation. The PICNN provides a methodology to model and history-match flow and transport problems in porous media. Numerical results demonstrate that the proposed PICNN can provide an accurate physical solution from a limited dataset. We show how this method can be applied in the context of a forward simulation for continuous problems. Furthermore, several complex scenarios are tested, including the existence of data noise, different work schedules, and different good patterns.

Keywords: convolutional neural networks, deep learning, flow and transport in porous media, physics-informed neural networks, reservoir simulation

Procedia PDF Downloads 118

307 Mapping of Geological Structures Using Aerial Photography

Authors: Ankit Sharma, Mudit Sachan, Anurag Prakash

Abstract:

Rapid growth in data acquisition technologies through drones, have led to advances and interests in collecting high-resolution images of geological fields. Being advantageous in capturing high volume of data in short flights, a number of challenges have to overcome for efficient analysis of this data, especially while data acquisition, image interpretation and processing. We introduce a method that allows effective mapping of geological fields using photogrammetric data of surfaces, drainage area, water bodies etc, which will be captured by airborne vehicles like UAVs, we are not taking satellite images because of problems in adequate resolution, time when it is captured may be 1 yr back, availability problem, difficult to capture exact image, then night vision etc. This method includes advanced automated image interpretation technology and human data interaction to model structures and. First Geological structures will be detected from the primary photographic dataset and the equivalent three dimensional structures would then be identified by digital elevation model. We can calculate dip and its direction by using the above information. The structural map will be generated by adopting a specified methodology starting from choosing the appropriate camera, camera’s mounting system, UAVs design ( based on the area and application), Challenge in air borne systems like Errors in image orientation, payload problem, mosaicing and geo referencing and registering of different images to applying DEM. The paper shows the potential of using our method for accurate and efficient modeling of geological structures, capture particularly from remote, of inaccessible and hazardous sites.

Keywords: digital elevation model, mapping, photogrammetric data analysis, geological structures

Procedia PDF Downloads 670

306 Analysing Trends in Rice Cropping Intensity and Seasonality across the Philippines Using 14 Years of Moderate Resolution Remote Sensing Imagery

Authors: Bhogendra Mishra, Andy Nelson, Mirco Boschetti, Lorenzo Busetto, Alice Laborte

Abstract:

Rice is grown on over 100 million hectares in almost every country of Asia. It is the most important staple crop for food security and has high economic and cultural importance in Asian societies. The combination of genetic diversity and management options, coupled with the large geographic extent means that there is a large variation in seasonality (when it is grown) and cropping intensity (how often it is grown per year on the same plot of land), even over relatively small distances. Seasonality and intensity can and do change over time depending on climatic, environmental and economic factors. Detecting where and when these changes happen can provide information to better understand trends in regional and even global rice production. Remote sensing offers a unique opportunity to estimate these trends. We apply the recently published PhenoRice algorithm to 14 years of moderate resolution remote sensing (MODIS) data (utilizing 250m resolution 16 day composites from Terra and Aqua) to estimate seasonality and cropping intensity per year and changes over time. We compare the results to the surveyed data collected by International Rice Research Institute (IRRI). The study results in a unique and validated dataset on the extent and change of extent, the seasonality and change in seasonality and the cropping intensity and change in cropping intensity between 2003 and 2016 for the Philippines. Observed trends and their implications for food security and trade policies are also discussed.

Keywords: rice, cropping intensity, moderate resolution remote sensing (MODIS), phenology, seasonality

Procedia PDF Downloads 283

305 Lung HRCT Pattern Classification for Cystic Fibrosis Using a Convolutional Neural Network

Authors: Parisa Mansour

Abstract:

Cystic fibrosis (CF) is one of the most common autosomal recessive diseases among whites. It mostly affects the lungs, causing infections and inflammation that account for 90% of deaths in CF patients. Because of this high variability in clinical presentation and organ involvement, investigating treatment responses and evaluating lung changes over time is critical to preventing CF progression. High-resolution computed tomography (HRCT) greatly facilitates the assessment of lung disease progression in CF patients. Recently, artificial intelligence was used to analyze chest CT scans of CF patients. In this paper, we propose a convolutional neural network (CNN) approach to classify CF lung patterns in HRCT images. The proposed network consists of two convolutional layers with 3 × 3 kernels and maximally connected in each layer, followed by two dense layers with 1024 and 10 neurons, respectively. The softmax layer prepares a predicted output probability distribution between classes. This layer has three exits corresponding to the categories of normal (healthy), bronchitis and inflammation. To train and evaluate the network, we constructed a patch-based dataset extracted from more than 1100 lung HRCT slices obtained from 45 CF patients. Comparative evaluation showed the effectiveness of the proposed CNN compared to its close peers. Classification accuracy, average sensitivity and specificity of 93.64%, 93.47% and 96.61% were achieved, indicating the potential of CNNs in analyzing lung CF patterns and monitoring lung health. In addition, the visual features extracted by our proposed method can be useful for automatic measurement and finally evaluation of the severity of CF patterns in lung HRCT images.

Keywords: HRCT, CF, cystic fibrosis, chest CT, artificial intelligence

Procedia PDF Downloads 49

304 Development of Energy Benchmarks Using Mandatory Energy and Emissions Reporting Data: Ontario Post-Secondary Residences

Authors: C. Xavier Mendieta, J. J McArthur

Abstract:

Governments are playing an increasingly active role in reducing carbon emissions, and a key strategy has been the introduction of mandatory energy disclosure policies. These policies have resulted in a significant amount of publicly available data, providing researchers with a unique opportunity to develop location-specific energy and carbon emission benchmarks from this data set, which can then be used to develop building archetypes and used to inform urban energy models. This study presents the development of such a benchmark using the public reporting data. The data from Ontario’s Ministry of Energy for Post-Secondary Educational Institutions are being used to develop a series of building archetype dynamic building loads and energy benchmarks to fill a gap in the currently available building database. This paper presents the development of a benchmark for college and university residences within ASHRAE climate zone 6 areas in Ontario using the mandatory disclosure energy and greenhouse gas emissions data. The methodology presented includes data cleaning, statistical analysis, and benchmark development, and lessons learned from this investigation are presented and discussed to inform the development of future energy benchmarks from this larger data set. The key findings from this initial benchmarking study are: (1) the importance of careful data screening and outlier identification to develop a valid dataset; (2) the key features used to develop a model of the data are building age, size, and occupancy schedules and these can be used to estimate energy consumption; and (3) policy changes affecting the primary energy generation significantly affected greenhouse gas emissions, and consideration of these factors was critical to evaluate the validity of the reported data.

Keywords: building archetypes, data analysis, energy benchmarks, GHG emissions

Procedia PDF Downloads 286

303 Educational Experience, Record Keeping, Genetic Selection and Herd Management Effects on Monthly Milk Yield and Revenues of Dairy Farms in Southern Vietnam

Authors: Ngoc-Hieu Vu

Abstract:

A study was conducted to estimate the record keeping, genetic selection, educational experience, and farm management effect on monthly milk yield per farm, average milk yield per cow, monthly milk revenue per farm, and monthly milk revenue per cow of dairy farms in the Southern region of Vietnam. The dataset contained 5448 monthly record collected from January 2013 to May 2015. Results showed that longer experience increased (P < 0.001) monthly milk yields and revenues. Better educated farmers produced more monthly milk per farm and monthly milk per cow and revenues (P < 0.001) than lower educated farmers. Farm that kept records on individual animals had higher (P < 0.001) for monthly milk yields and revenues than farms that did not. Farms that used hired people produced the highest (p < 0.05) monthly milk yield per farm, milk yield per cow and revenues, followed by farms that used both hire and family members, and lowest values were for farms that used family members only. Farms that used crosses Holstein in herd were higher performance (p < 0.001) for all traits than farms that used purebred Holstein and other breeds. Farms that used genetic information and phenotypes when selecting sires were higher (p < 0.05) for all traits than farms that used only phenotypes and personal option. Farms that received help from Vet, organization staff, or government officials had higher monthly milk yield and revenues than those that decided by owner. These findings suggest that dairy farmers should be training in systematic, must be considered and continuous support to improve farm milk production and revenues, to increase the likelihood of adoption on a sustainable way.

Keywords: dairy farming, education, milk yield, Southern Vietnam

Procedia PDF Downloads 308

302 Customer Churn Prediction by Using Four Machine Learning Algorithms Integrating Features Selection and Normalization in the Telecom Sector

Authors: Alanoud Moraya Aldalan, Abdulaziz Almaleh

Abstract:

A crucial component of maintaining a customer-oriented business as in the telecom industry is understanding the reasons and factors that lead to customer churn. Competition between telecom companies has greatly increased in recent years. It has become more important to understand customers’ needs in this strong market of telecom industries, especially for those who are looking to turn over their service providers. So, predictive churn is now a mandatory requirement for retaining those customers. Machine learning can be utilized to accomplish this. Churn Prediction has become a very important topic in terms of machine learning classification in the telecommunications industry. Understanding the factors of customer churn and how they behave is very important to building an effective churn prediction model. This paper aims to predict churn and identify factors of customers’ churn based on their past service usage history. Aiming at this objective, the study makes use of feature selection, normalization, and feature engineering. Then, this study compared the performance of four different machine learning algorithms on the Orange dataset: Logistic Regression, Random Forest, Decision Tree, and Gradient Boosting. Evaluation of the performance was conducted by using the F1 score and ROC-AUC. Comparing the results of this study with existing models has proven to produce better results. The results showed the Gradients Boosting with feature selection technique outperformed in this study by achieving a 99% F1-score and 99% AUC, and all other experiments achieved good results as well.

Keywords: machine learning, gradient boosting, logistic regression, churn, random forest, decision tree, ROC, AUC, F1-score

Procedia PDF Downloads 121

301 An Efficient Machine Learning Model to Detect Metastatic Cancer in Pathology Scans Using Principal Component Analysis Algorithm, Genetic Algorithm, and Classification Algorithms

Authors: Bliss Singhal

Abstract:

Machine learning (ML) is a branch of Artificial Intelligence (AI) where computers analyze data and find patterns in the data. The study focuses on the detection of metastatic cancer using ML. Metastatic cancer is the stage where cancer has spread to other parts of the body and is the cause of approximately 90% of cancer-related deaths. Normally, pathologists spend hours each day to manually classifying whether tumors are benign or malignant. This tedious task contributes to mislabeling metastasis being over 60% of the time and emphasizes the importance of being aware of human error and other inefficiencies. ML is a good candidate to improve the correct identification of metastatic cancer, saving thousands of lives and can also improve the speed and efficiency of the process, thereby taking fewer resources and time. So far, the deep learning methodology of AI has been used in research to detect cancer. This study is a novel approach to determining the potential of using preprocessing algorithms combined with classification algorithms in detecting metastatic cancer. The study used two preprocessing algorithms: principal component analysis (PCA) and the genetic algorithm, to reduce the dimensionality of the dataset and then used three classification algorithms: logistic regression, decision tree classifier, and k-nearest neighbors to detect metastatic cancer in the pathology scans. The highest accuracy of 71.14% was produced by the ML pipeline comprising of PCA, the genetic algorithm, and the k-nearest neighbor algorithm, suggesting that preprocessing and classification algorithms have great potential for detecting metastatic cancer.

Keywords: breast cancer, principal component analysis, genetic algorithm, k-nearest neighbors, decision tree classifier, logistic regression

Procedia PDF Downloads 65

300 Evaluating the Validity of CFD Model of Dispersion in a Complex Urban Geometry Using Two Sets of Experimental Measurements

Authors: Mohammad R. Kavian Nezhad, Carlos F. Lange, Brian A. Fleck

Abstract:

This research presents the validation study of a computational fluid dynamics (CFD) model developed to simulate the scalar dispersion emitted from rooftop sources around the buildings at the University of Alberta North Campus. The ANSYS CFX code was used to perform the numerical simulation of the wind regime and pollutant dispersion by solving the 3D steady Reynolds-averaged Navier-Stokes (RANS) equations on a building-scale high-resolution grid. The validation study was performed in two steps. First, the CFD model performance in 24 cases (eight wind directions and three wind speeds) was evaluated by comparing the predicted flow fields with the available data from the previous measurement campaign designed at the North Campus, using the standard deviation method (SDM), while the estimated results of the numerical model showed maximum average percent errors of approximately 53% and 37% for wind incidents from the North and Northwest, respectively. Good agreement with the measurements was observed for the other six directions, with an average error of less than 30%. In the second step, the reliability of the implemented turbulence model, numerical algorithm, modeling techniques, and the grid generation scheme was further evaluated using the Mock Urban Setting Test (MUST) dispersion dataset. Different statistical measures, including the fractional bias (FB), the geometric mean bias (MG), and the normalized mean square error (NMSE), were used to assess the accuracy of the predicted dispersion field. Our CFD results are in very good agreement with the field measurements.

Keywords: CFD, plume dispersion, complex urban geometry, validation study, wind flow

Procedia PDF Downloads 120

299 Supervised Machine Learning Approach for Studying the Effect of Different Joint Sets on Stability of Mine Pit Slopes Under the Presence of Different External Factors

Authors: Sudhir Kumar Singh, Debashish Chakravarty

Abstract:

Slope stability analysis is an important aspect in the field of geotechnical engineering. It is also important from safety, and economic point of view as any slope failure leads to loss of valuable lives and damage to property worth millions. This paper aims at mitigating the risk of slope failure by studying the effect of different joint sets on the stability of mine pit slopes under the influence of various external factors, namely degree of saturation, rainfall intensity, and seismic coefficients. Supervised machine learning approach has been utilized for making accurate and reliable predictions regarding the stability of slopes based on the value of Factor of Safety. Numerous cases have been studied for analyzing the stability of slopes using the popular Finite Element Method, and the data thus obtained has been used as training data for the supervised machine learning models. The input data has been trained on different supervised machine learning models, namely Random Forest, Decision Tree, Support vector Machine, and XGBoost. Distinct test data that is not present in training data has been used for measuring the performance and accuracy of different models. Although all models have performed well on the test dataset but Random Forest stands out from others due to its high accuracy of greater than 95%, thus helping us by providing a valuable tool at our disposition which is neither computationally expensive nor time consuming and in good accordance with the numerical analysis result.

Keywords: finite element method, geotechnical engineering, machine learning, slope stability

Procedia PDF Downloads 88

298 Non-Uniform Filter Banks-based Minimum Distance to Riemannian Mean Classifition in Motor Imagery Brain-Computer Interface

Authors: Ping Tan, Xiaomeng Su, Yi Shen

Abstract:

The motion intention in the motor imagery braincomputer interface is identified by classifying the event-related desynchronization (ERD) and event-related synchronization ERS characteristics of sensorimotor rhythm (SMR) in EEG signals. When the subject imagines different limbs or different parts moving, the rhythm components and bandwidth will change, which varies from person to person. How to find the effective sensorimotor frequency band of subjects is directly related to the classification accuracy of brain-computer interface. To solve this problem, this paper proposes a Minimum Distance to Riemannian Mean Classification method based on Non-Uniform Filter Banks. During the training phase, the EEG signals are decomposed into multiple different bandwidt signals by using multiple band-pass filters firstly; Then the spatial covariance characteristics of each frequency band signal are computered to be as the feature vectors. these feature vectors will be classified by the MDRM (Minimum Distance to Riemannian Mean) method, and cross validation is employed to obtain the effective sensorimotor frequency bands. During the test phase, the test signals are filtered by the bandpass filter of the effective sensorimotor frequency bands, and the extracted spatial covariance feature vectors will be classified by using the MDRM. Experiments on the BCI competition IV 2a dataset show that the proposed method is superior to other classification methods.

Keywords: non-uniform filter banks, motor imagery, brain-computer interface, minimum distance to Riemannian mean

Procedia PDF Downloads 97

297 Reducing CO2 Emission Using EDA and Weighted Sum Model in Smart Parking System

Authors: Rahman Ali, Muhammad Sajjad, Farkhund Iqbal, Muhammad Sadiq Hassan Zada, Mohammed Hussain

Abstract:

Emission of Carbon Dioxide (CO₂) has adversely affected the environment. One of the major sources of CO₂emission is transportation. In the last few decades, the increase in mobility of people using vehicles has enormously increased the emission of CO₂ in the environment. To reduce CO₂ emission, sustainable transportation system is required in which smart parking is one of the important measures that need to be established. To contribute to the issue of reducing the amount of CO₂emission, this research proposes a smart parking system. A cloud-based solution is provided to the drivers which automatically searches and recommends the most preferred parking slots. To determine preferences of the parking areas, this methodology exploits a number of unique parking features which ultimately results in the selection of a parking that leads to minimum level of CO₂emission from the current position of the vehicle. To realize the methodology, a scenario-based implementation is considered. During the implementation, a mobile application with GPS signals, vehicles with a number of vehicle features and a list of parking areas with parking features are used by sorting, multi-level filtering, exploratory data analysis (EDA, Analytical Hierarchy Process (AHP)) and weighted sum model (WSM) to rank the parking areas and recommend the drivers with top-k most preferred parking areas. In the EDA process, “2020testcar-2020-03-03”, a freely available dataset is used to estimate CO₂emission of a particular vehicle. To evaluate the system, results of the proposed system are compared with the conventional approach, which reveal that the proposed methodology supersedes the conventional one in reducing the emission of CO₂into the atmosphere.

Keywords: car parking, Co2, Co2 reduction, IoT, merge sort, number plate recognition, smart car parking

Procedia PDF Downloads 130

296 Acceleration-Based Motion Model for Visual Simultaneous Localization and Mapping

Authors: Daohong Yang, Xiang Zhang, Lei Li, Wanting Zhou

Abstract:

Visual Simultaneous Localization and Mapping (VSLAM) is a technology that obtains information in the environment for self-positioning and mapping. It is widely used in computer vision, robotics and other fields. Many visual SLAM systems, such as OBSLAM3, employ a constant-speed motion model that provides the initial pose of the current frame to improve the speed and accuracy of feature matching. However, in actual situations, the constant velocity motion model is often difficult to be satisfied, which may lead to a large deviation between the obtained initial pose and the real value, and may lead to errors in nonlinear optimization results. Therefore, this paper proposed a motion model based on acceleration, which can be applied on most SLAM systems. In order to better describe the acceleration of the camera pose, we decoupled the pose transformation matrix, and calculated the rotation matrix and the translation vector respectively, where the rotation matrix is represented by rotation vector. We assume that, in a short period of time, the changes of rotating angular velocity and translation vector remain the same. Based on this assumption, the initial pose of the current frame is estimated. In addition, the error of constant velocity model was analyzed theoretically. Finally, we applied our proposed approach to the ORBSLAM3 system and evaluated two sets of sequences on the TUM dataset. The results showed that our proposed method had a more accurate initial pose estimation and the accuracy of ORBSLAM3 system is improved by 6.61% and 6.46% respectively on the two test sequences.

Keywords: error estimation, constant acceleration motion model, pose estimation, visual SLAM

Procedia PDF Downloads 75

295 Epilepsy Seizure Prediction by Effective Connectivity Estimation Using Granger Causality and Directed Transfer Function Analysis of Multi-Channel Electroencephalogram

Authors: Mona Hejazi, Ali Motie Nasrabadi

Abstract:

Epilepsy is a persistent neurological disorder that affects more than 50 million people worldwide. Hence, there is a necessity to introduce an efficient prediction model for making a correct diagnosis of the epileptic seizure and accurate prediction of its type. In this study we consider how the Effective Connectivity (EC) patterns obtained from intracranial Electroencephalographic (EEG) recordings reveal information about the dynamics of the epileptic brain and can be used to predict imminent seizures, as this will enable the patients (and caregivers) to take appropriate precautions. We use this definition because we believe that effective connectivity near seizures begin to change, so we can predict seizures according to this feature. Results are reported on the standard Freiburg EEG dataset which contains data from 21 patients suffering from medically intractable focal epilepsy. Six channels of EEG from each patients are considered and effective connectivity using Directed Transfer Function (DTF) and Granger Causality (GC) methods is estimated. We concentrate on effective connectivity standard deviation over time and feature changes in five brain frequency sub-bands (Alpha, Beta, Theta, Delta, and Gamma) are compared. The performance obtained for the proposed scheme in predicting seizures is: average prediction time is 50 minutes before seizure onset, the maximum sensitivity is approximate ~80% and the false positive rate is 0.33 FP/h. DTF method is more acceptable to predict epileptic seizures and generally we can observe that the greater results are in gamma and beta sub-bands. The research of this paper is significantly helpful for clinical applications, especially for the exploitation of online portable devices.

Keywords: effective connectivity, Granger causality, directed transfer function, epilepsy seizure prediction, EEG

Procedia PDF Downloads 445

294 Molecular Identification and Evolutionary Status of Lucilia bufonivora: An Obligate Parasite of Amphibians in Europe

Authors: Gerardo Arias, Richard Wall, Jamie Stevens

Abstract:

Lucilia bufonivora Moniez, is an obligate parasite of toads and frogs widely distributed in Europe. Its sister taxon Lucilia silvarum Meigen behaves mainly as a carrion breeder in Europe, however it has been reported as a facultative parasite of amphibians. These two closely related species are morphologically almost identical, which has led to misidentification, and in fact, it has been suggested that the amphibian myiasis cases by L. silvarum reported in Europe should be attributed to L. bufonivora. Both species remain poorly studied and their taxonomic relationships are still unclear. The identification of the larval specimens involved in amphibian myiasis with molecular tools and phylogenetic analysis of these two closely related species may resolve this problem. In this work seventeen unidentified larval specimens extracted from toad myiasis cases of the UK, the Netherlands and Switzerland were obtained, their COX1 (mtDNA) and EF1-α (Nuclear DNA) gene regions were amplified and then sequenced. The 17 larval samples were identified with both molecular markers as L. bufonivora. Phylogenetic analysis was carried out with 10 other blowfly species, including L. silvarum samples from the UK and USA. Bayesian Inference trees of COX1 and a combined-gene dataset suggested that L. silvarum and L. bufonivora are separate sister species. However, the nuclear gene EF1-α does not appear to resolve their relationships, suggesting that the rates of evolution of the mtDNA are much faster than those of the nuclear DNA. This work provides the molecular evidence for successful identification of L. bufonivora and a molecular analysis of the populations of this obligate parasite from different locations across Europe. The relationships with L. silvarum are discussed.

Keywords: calliphoridae, molecular evolution, myiasis, obligate parasitism

Procedia PDF Downloads 215

293 Investigation and Analysis of Residential Building Energy End-Use Profile in Hot and Humid Area with Reference to Zhuhai City in China

Authors: Qingqing Feng, S. Thomas Ng, Frank Xu

Abstract:

Energy consumption in domestic sector has been increasing rapidly in China all along these years. Confronted with environmental challenges, the international society has made a concerted effort by setting the Paris Agreement, the Sustainable Development Goals, and the New Urban Agenda. Thus it’s very important for China to put forward reasonable countermeasures to boost building energy conservation which necessitates looking into the actuality of residential energy end-use profile and its influence factors. In this study, questionnaire surveys have been conducted in Zhuhai city in China, a typical city in hot summer warm winter climate zone. The data solicited mainly include the occupancy schedule, building’s information, residents’ information, household energy uses, the type, quantity and use patterns of appliances and occupants’ satisfaction. Over 200 valid samples have been collected through face-to-face interviews. Descriptive analysis, clustering analysis, correlation analysis and sensitivity analysis were then conducted on the dataset to understand the energy end-use profile. The findings identify: 1) several typical clusters of occupancy patterns and appliances utilization patterns; 2) the top three sensitive factors influencing energy consumption; 3) the correlations between satisfaction and energy consumption. For China with many different climates zones, it’s difficult to find a silver bullet on energy conservation. The aim of this paper is to provide a theoretical basis for multi-stakeholders including policy makers, residents, and academic communities to formulate reasonable energy saving blueprints for hot and humid urban residential buildings in China.

Keywords: residential building, energy end-use profile, questionnaire survey, sustainability

Procedia PDF Downloads 113

292 Current Methods for Drug Property Prediction in the Real World

Authors: Jacob Green, Cecilia Cabrera, Maximilian Jakobs, Andrea Dimitracopoulos, Mark van der Wilk, Ryan Greenhalgh

Abstract:

Predicting drug properties is key in drug discovery to enable de-risking of assets before expensive clinical trials and to find highly active compounds faster. Interest from the machine learning community has led to the release of a variety of benchmark datasets and proposed methods. However, it remains unclear for practitioners which method or approach is most suitable, as different papers benchmark on different datasets and methods, leading to varying conclusions that are not easily compared. Our large-scale empirical study links together numerous earlier works on different datasets and methods, thus offering a comprehensive overview of the existing property classes, datasets, and their interactions with different methods. We emphasise the importance of uncertainty quantification and the time and, therefore, cost of applying these methods in the drug development decision-making cycle. To the best of the author's knowledge, it has been observed that the optimal approach varies depending on the dataset and that engineered features with classical machine learning methods often outperform deep learning. Specifically, QSAR datasets are typically best analysed with classical methods such as Gaussian Processes, while ADMET datasets are sometimes better described by Trees or deep learning methods such as Graph Neural Networks or language models. Our work highlights that practitioners do not yet have a straightforward, black-box procedure to rely on and sets a precedent for creating practitioner-relevant benchmarks. Deep learning approaches must be proven on these benchmarks to become the practical method of choice in drug property prediction.

Keywords: activity (QSAR), ADMET, classical methods, drug property prediction, empirical study, machine learning

Procedia PDF Downloads 56

291 A Review of Deep Learning Methods in Computer-Aided Detection and Diagnosis Systems based on Whole Mammogram and Ultrasound Scan Classification

Authors: Ian Omung'a

Abstract:

Breast cancer remains to be one of the deadliest cancers for women worldwide, with the risk of developing tumors being as high as 50 percent in Sub-Saharan African countries like Kenya. With as many as 42 percent of these cases set to be diagnosed late when cancer has metastasized and or the prognosis has become terminal, Full Field Digital [FFD] Mammography remains an effective screening technique that leads to early detection where in most cases, successful interventions can be made to control or eliminate the tumors altogether. FFD Mammograms have been proven to multiply more effective when used together with Computer-Aided Detection and Diagnosis [CADe] systems, relying on algorithmic implementations of Deep Learning techniques in Computer Vision to carry out deep pattern recognition that is comparable to the level of a human radiologist and decipher whether specific areas of interest in the mammogram scan image portray abnormalities if any and whether these abnormalities are indicative of a benign or malignant tumor. Within this paper, we review emergent Deep Learning techniques that will prove relevant to the development of State-of-The-Art FFD Mammogram CADe systems. These techniques will span self-supervised learning for context-encoded occlusion, self-supervised learning for pre-processing and labeling automation, as well as the creation of a standardized large-scale mammography dataset as a benchmark for CADe systems' evaluation. Finally, comparisons are drawn between existing practices that pre-date these techniques and how the development of CADe systems that incorporate them will be different.

Keywords: breast cancer diagnosis, computer aided detection and diagnosis, deep learning, whole mammogram classfication, ultrasound classification, computer vision

Procedia PDF Downloads 78

290 Development of Medical Intelligent Process Model Using Ontology Based Technique

Authors: Emmanuel Chibuogu Asogwa, Tochukwu Sunday Belonwu

Abstract:

An urgent demand for creative solutions has been created by the rapid expansion of medical knowledge, the complexity of patient care, and the requirement for more precise decision-making. As a solution to this problem, the creation of a Medical Intelligent Process Model (MIPM) utilizing ontology-based appears as a promising way to overcome this obstacle and unleash the full potential of healthcare systems. The development of a Medical Intelligent Process Model (MIPM) using ontology-based techniques is motivated by a lack of quick access to relevant medical information and advanced tools for treatment planning and clinical decision-making, which ontology-based techniques can provide. The aim of this work is to develop a structured and knowledge-driven framework that leverages ontology, a formal representation of domain knowledge, to enhance various aspects of healthcare. Object-Oriented Analysis and Design Methodology (OOADM) were adopted in the design of the system as we desired to build a usable and evolvable application. For effective implementation of this work, we used the following materials/methods/tools: the medical dataset for the test of our model in this work was obtained from Kaggle. The ontology-based technique was used with Confusion Matrix, MySQL, Python, Hypertext Markup Language (HTML), Hypertext Preprocessor (PHP), Cascaded Style Sheet (CSS), JavaScript, Dreamweaver, and Fireworks. According to test results on the new system using Confusion Matrix, both the accuracy and overall effectiveness of the medical intelligent process significantly improved by 20% compared to the previous system. Therefore, using the model is recommended for healthcare professionals.

Keywords: ontology-based, model, database, OOADM, healthcare

Procedia PDF Downloads 61

289 Integrating Molecular Approaches to Understand Diatom Assemblages in Marine Environment

Authors: Shruti Malviya, Chris Bowler

Abstract:

Environmental processes acting at multiple spatial scales control marine diatom community structure. However, the contribution of local factors (e.g., temperature, salinity, etc.) in these highly complex systems is poorly understood. We, therefore, investigated the diatom community organization as a function of environmental predictors and determined the relative contribution of various environmental factors on the structure of marine diatoms assemblages in the world’s ocean. The dataset for this study was derived from the Tara Oceans expedition, constituting 46 sampling stations from diverse oceanic provinces. The V9 hypervariable region of 18s rDNA was organized into assemblages based on their distributional co-occurrence. Using Ward’s hierarchical clustering, nine clusters were defined. The number of ribotypes and reads varied within each cluster-three clusters (II, VIII and IX) contained only a few reads whereas two of them (I and IV) were highly abundant. Of the nine clusters, seven can be divided into two categories defined by a positive correlation with phosphate and nitrate and a negative correlation with longitude and, the other by a negative correlation with salinity, temperature, latitude and positive correlation with Lyapunov exponent. All the clusters were found to be remarkably dominant in South Pacific Ocean and can be placed into three classes, namely Southern Ocean-South Pacific Ocean clusters (I, II, V, VIII, IX), South Pacific Ocean clusters (IV and VII), and cosmopolitan clusters (III and VI). Our findings showed that co-occurring ribotypes can be significantly associated into recognizable clusters which exhibit a distinct response to environmental variables. This study, thus, demonstrated distinct behavior of each recognized assemblage displaying a taxonomic and environmental signature.

Keywords: assemblage, diatoms, hierarchical clustering, Tara Oceans

Procedia PDF Downloads 185

288 Analyzing the Climate Change Impact and Farmer's Adaptability Strategies in Khyber Pakhtunkhwa, Pakistan

Authors: Khuram Nawaz Sadozai, Sonia

Abstract:

The agriculture sector is deemed more vulnerable to climate change as its variation can directly affect the crop’s productivity, but farmers’ adaptation strategies play a vital role in climate change-agriculture relationship. Therefore, this research has been undertaken to assess the Climate Change impact on wheat productivity and farmers’ adaptability strategies in Khyber Pakhtunkhwa province, Pakistan. The panel dataset was analyzed to gauge the impact of changing climate variables (i.e., temperature, rainfall, and humidity) on wheat productivity from 1985 to 2015. Amid the study period, the fixed effect estimates confirmed an inverse relationship of temperature and rainfall on the wheat yield. The impact of temperature is observed to be detrimental as compared to the rainfall, causing 0.07 units reduction in the production of wheat with 1C upsurge in temperature. On the flip side, humidity revealed a positive association with the wheat productivity by confirming that high humidity could be beneficial to the production of the crop over time. Thus, this study ensures significant nexus between agricultural production and climatic parameters. However, the farming community in the underlying study area has limited knowledge about the adaptation strategies to lessen the detrimental impact of changing climate on crop yield. It is recommended that farmers should be well equipped with training and advanced agricultural management practices under the realm of climate change. Moreover, innovative technologies pertinent to the agriculture system should be encouraged to handle the challenges arising due to variations in climate factors.

Keywords: climate change, fixed effect model, panel data, wheat productivity

Procedia PDF Downloads 103

287 2D Convolutional Networks for Automatic Segmentation of Knee Cartilage in 3D MRI

Authors: Ananya Ananya, Karthik Rao

Abstract:

Accurate segmentation of knee cartilage in 3-D magnetic resonance (MR) images for quantitative assessment of volume is crucial for studying and diagnosing osteoarthritis (OA) of the knee, one of the major causes of disability in elderly people. Radiologists generally perform this task in slice-by-slice manner taking 15-20 minutes per 3D image, and lead to high inter and intra observer variability. Hence automatic methods for knee cartilage segmentation are desirable and are an active field of research. This paper presents design and experimental evaluation of 2D convolutional neural networks based fully automated methods for knee cartilage segmentation in 3D MRI. The architectures are validated based on 40 test images and 60 training images from SKI10 dataset. The proposed methods segment 2D slices one by one, which are then combined to give segmentation for whole 3D images. Proposed methods are modified versions of U-net and dilated convolutions, consisting of a single step that segments the given image to 5 labels: background, femoral cartilage, tibia cartilage, femoral bone and tibia bone; cartilages being the primary components of interest. U-net consists of a contracting path and an expanding path, to capture context and localization respectively. Dilated convolutions lead to an exponential expansion of receptive field with only a linear increase in a number of parameters. A combination of modified U-net and dilated convolutions has also been explored. These architectures segment one 3D image in 8 – 10 seconds giving average volumetric Dice Score Coefficients (DSC) of 0.950 - 0.962 for femoral cartilage and 0.951 - 0.966 for tibia cartilage, reference being the manual segmentation.

Keywords: convolutional neural networks, dilated convolutions, 3 dimensional, fully automated, knee cartilage, MRI, segmentation, U-net

Procedia PDF Downloads 242

286 EcoMush: Mapping Sustainable Mushroom Production in Bangladesh

Authors: A. A. Sadia, A. Emdad, E. Hossain

Abstract:

The increasing importance of mushrooms as a source of nutrition, health benefits, and even potential cancer treatment has raised awareness of the impact of climate-sensitive variables on their cultivation. Factors like temperature, relative humidity, air quality, and substrate composition play pivotal roles in shaping mushroom growth, especially in Bangladesh. Oyster mushrooms, a commonly cultivated variety in this region, are particularly vulnerable to climate fluctuations. This research explores the climatic dynamics affecting oyster mushroom cultivation and, presents an approach to address these challenges and provides tangible solutions to fortify the agro-economy, ensure food security, and promote the sustainability of this crucial food source. Using climate and production data, this study evaluates the performance of three clustering algorithms -KMeans, OPTICS, and BIRCH- based on various quality metrics. While each algorithm demonstrates specific strengths, the findings provide insights into their effectiveness for this specific dataset. The results yield essential information, pinpointing the optimal temperature range of 13°C-22°C, the unfavorable temperature threshold of 28°C and above, and the ideal relative humidity range of 75-85% with the suitable production regions in three different seasons: Kharif-1, 2, and Robi. Additionally, a user-friendly web application is developed to support mushroom farmers in making well-informed decisions about their cultivation practices. This platform offers valuable insights into the most advantageous periods for oyster mushroom farming, with the overarching goal of enhancing the efficiency and profitability of mushroom farming.

Keywords: climate variability, mushroom cultivation, clustering techniques, food security, sustainability, web-application

Procedia PDF Downloads 45

285 Grassland Phenology in Different Eco-Geographic Regions over the Tibetan Plateau

Authors: Jiahua Zhang, Qing Chang, Fengmei Yao

Abstract:

Studying on the response of vegetation phenology to climate change at different temporal and spatial scales is important for understanding and predicting future terrestrial ecosystem dynamics andthe adaptation of ecosystems to global change. In this study, the Moderate Resolution Imaging Spectroradiometer (MODIS) Normalized Difference Vegetation Index (NDVI) dataset and climate data were used to analyze the dynamics of grassland phenology as well as their correlation with climatic factors in different eco-geographic regions and elevation units across the Tibetan Plateau. The results showed that during 2003–2012, the start of the grassland greening season (SOS) appeared later while the end of the growing season (EOS) appeared earlier following the plateau’s precipitation and heat gradients from southeast to northwest. The multi-year mean value of SOS showed differences between various eco-geographic regions and was significantly impacted by average elevation and regional average precipitation during spring. Regional mean differences for EOS were mainly regulated by mean temperature during autumn. Changes in trends of SOS in the central and eastern eco-geographic regions were coupled to the mean temperature during spring, advancing by about 7d/°C. However, in the two southwestern eco-geographic regions, SOS was delayed significantly due to the impact of spring precipitation. The results also showed that the SOS occurred later with increasing elevation, as expected, with a delay rate of 0.66 d/100m. For 2003–2012, SOS showed an advancing trend in low-elevation areas, but a delayed trend in high-elevation areas, while EOS was delayed in low-elevation areas, but advanced in high-elevation areas. Grassland SOS and EOS changes may be influenced by a variety of other environmental factors in each eco-geographic region.

Keywords: grassland, phenology, MODIS, eco-geographic regions, elevation, climatic factors, Tibetan Plateau

Procedia PDF Downloads 308

284 Remote Sensing through Deep Neural Networks for Satellite Image Classification

Authors: Teja Sai Puligadda

Abstract:

Satellite images in detail can serve an important role in the geographic study. Quantitative and qualitative information provided by the satellite and remote sensing images minimizes the complexity of work and time. Data/images are captured at regular intervals by satellite remote sensing systems, and the amount of data collected is often enormous, and it expands rapidly as technology develops. Interpreting remote sensing images, geographic data mining, and researching distinct vegetation types such as agricultural and forests are all part of satellite image categorization. One of the biggest challenge data scientists faces while classifying satellite images is finding the best suitable classification algorithms based on the available that could able to classify images with utmost accuracy. In order to categorize satellite images, which is difficult due to the sheer volume of data, many academics are turning to deep learning machine algorithms. As, the CNN algorithm gives high accuracy in image recognition problems and automatically detects the important features without any human supervision and the ANN algorithm stores information on the entire network (Abhishek Gupta., 2020), these two deep learning algorithms have been used for satellite image classification. This project focuses on remote sensing through Deep Neural Networks i.e., ANN and CNN with Deep Sat (SAT-4) Airborne dataset for classifying images. Thus, in this project of classifying satellite images, the algorithms ANN and CNN are implemented, evaluated & compared and the performance is analyzed through evaluation metrics such as Accuracy and Loss. Additionally, the Neural Network algorithm which gives the lowest bias and lowest variance in solving multi-class satellite image classification is analyzed.

Keywords: artificial neural network, convolutional neural network, remote sensing, accuracy, loss

Procedia PDF Downloads 138

283 Applications of Multivariate Statistical Methods on Geochemical Data to Evaluate the Hydrocarbons Source Rocks and Oils from Ghadames Basin, NW Libya

Authors: Mohamed Hrouda

Abstract:

The Principal Component Analysis (PCA) was performed on a dataset comprising 41 biomarker concentrations from twenty-three core source rocks samples and seven oil samples from different location, with the objective of establishing the major sources of variance within the steranes, tricyclic terpanes, hopanes, and triaromatic steroid. This type of analysis can be used as an aid when deciding which molecular biomarker maturity, source facies or depositional environment parameters should be plotted, because the principal component loadings plots tend to extract the biomarker variables related to maturity, source facies or depositional environment controls. Facies characterization of the source rock samples separate the Silurian and Devonian source rock samples into three groups. Maturity evaluation of source rock samples based on biomarker and aromatic hydrocarbon distributions indicates that not all the samples are strongly affected by maturity, the Upper Devonian samples from wells located in the northern part of the basin are immature, whereas the other samples which have been selected from the Lower Silurian are mature and have reached the main stage of the oil window, the Lower Silurian source rock strata revealed a trend of increasing maturity towards the south and southwestern part of Ghadames Basin. Most of the facies-based parameters employed in this project using biomarker distributions clearly separate the oil samples into three groups. Group I contain oil samples from wells within Al-Wafa oil field Located in the south western part of the basin, Group II contains oil samples collected from Al-Hamada oil field complex in the south and the third group contains oil samples collected from oil fields located in the north

Keywords: Ghadamis basin, geochemistry, silurian, devonian

Procedia PDF Downloads 47

282 Unpacking the Summarising Event in Trauma Emergencies: The Case of Pre-briefings

Authors: Professor Jo Angouri, Polina Mesinioti, Chris Turner

Abstract:

In order for a group of ad-hoc professional to perform as a team, a shared understanding of the problem at hand and an agreed action plan are necessary components. This is particularly significant in complex, time sensitive professional settings such as in trauma emergencies. In this context, team briefings prior to the patient arrival (pre-briefings) constitute a critical event for the performance of the team; they provide the necessary space for co-constructing a shared understanding of the situation through summarising information available to the team: yet the act of summarising is widely assumed in medical practice but not systematically researched. In the vast teamwork literature, terms such as ‘shared mental model’, ‘mental space’ and ‘cognate labelling’ are used extensively, and loosely, to denote the outcome of the summarising process, but how exactly this is done interactionally remains under researched. This paper reports on the forms and functions of pre-briefings in a major trauma centre in the UK. Taking an interactional approach, we draw on 30 simulated and real-life trauma emergencies (15 from each dataset) and zoom in on the use of pre-briefings, which we consider focal points in the management of trauma emergencies. We show how ad hoc teams negotiate sharedness of future orientation through summarising, synthesising information, and establishing common understanding of the situation. We illustrate the role, characteristics, and structure of pre-briefing sequences that have been evaluated as ‘efficient’ in our data and the impact (in)effective pre-briefings have on teamwork. Our work shows that the key roles in the event own the act of summarising and we problematise the implications for leadership in trauma emergencies. We close the paper with a model for pre-briefing and provide recommendations for clinical practice, arguing that effective pre-briefing practice is teachable.

Keywords: summarising, medical emergencies, interaction analysis, shared/mental models

Procedia PDF Downloads 76

281 3D Numerical Study of Tsunami Loading and Inundation in a Model Urban Area

Authors: A. Bahmanpour, I. Eames, C. Klettner, A. Dimakopoulos

Abstract:

We develop a new set of diagnostic tools to analyze inundation into a model district using three-dimensional CFD simulations, with a view to generating a database against which to test simpler models. A three-dimensional model of Oregon city with different-sized groups of building next to the coastline is used to run calculations of the movement of a long period wave on the shore. The initial and boundary conditions of the off-shore water are set using a nonlinear inverse method based on Eulerian spatial information matching experimental Eulerian time series measurements of water height. The water movement is followed in time, and this enables the pressure distribution on every surface of each building to be followed in a temporal manner. The three-dimensional numerical data set is validated against published experimental work. In the first instance, we use the dataset as a basis to understand the success of reduced models - including 2D shallow water model and reduced 1D models - to predict water heights, flow velocity and forces. This is because models based on the shallow water equations are known to underestimate drag forces after the initial surge of water. The second component is to identify critical flow features, such as hydraulic jumps and choked states, which are flow regions where dissipation occurs and drag forces are large. Finally, we describe how future tsunami inundation models should be modified to account for the complex effects of buildings through drag and blocking.Financial support from UCL and HR Wallingford is greatly appreciated. The authors would like to thank Professor Daniel Cox and Dr. Hyoungsu Park for providing the data on the Seaside Oregon experiment.

Keywords: computational fluid dynamics, extreme events, loading, tsunami

Procedia PDF Downloads 98

280 Recession Rate of Gangotri and Its Tributary Glacier, Garhwal Himalaya, India through Kinematic GPS Survey and Satellite Data

Authors: Harish Bisht, Bahadur Singh Kotlia, Kireet Kumar

Abstract:

In order to reconstruct past retreating rates, total area loss, volume change and shift in snout position were measured through multi-temporal satellite data from 1989 to 2016 and kinematic GPS survey from 2015 to 2016. The results obtained from satellite data indicate that in the last 27 years, Chaturangi glacier snout has retreated 1172.57 ± 38.3 m (average 45.07 ± 4.31 m/year) with a total area and volume loss of 0.626 ± 0.001 sq. Km and 0.139 Km³, respectively. The field measurements through differential global positioning system survey revealed that the annual retreating rate was 22.84 ± 0.05 m/year. The large variations in results derived from both the methods are probably because of higher difference in their accuracy. Snout monitoring of the Gangotri glacier during the ablation season (May to September) in the years 2005 and 2015 reveals that the retreating rate has been comparatively more declined than that shown by the earlier studies. The GPS dataset shows that the average recession rate is 10.26 ± 0.05 m/year. In order to determine the possible causes of decreased retreating rate, a relationship between debris thickness and melt rate was also established by using ablation stakes. The present study concludes that remote sensing method is suitable for large area and long term study, while kinematic GPS is more appropriate for the annual monitoring of retreating rate of glacier snout. The present study also emphasizes on mapping of all the tributary glaciers in order to assess the overall changes in the main glacier system and its health.

Keywords: Chaturangi glacier, Gangotri glacier, glacier snout, kinematic global positioning system, retreat rate

Procedia PDF Downloads 124

279 C-eXpress: A Web-Based Analysis Platform for Comparative Functional Genomics and Proteomics in Human Cancer Cell Line, NCI-60 as an Example

Authors: Chi-Ching Lee, Po-Jung Huang, Kuo-Yang Huang, Petrus Tang

Abstract:

Background: Recent advances in high-throughput research technologies such as new-generation sequencing and multi-dimensional liquid chromatography makes it possible to dissect the complete transcriptome and proteome in a single run for the first time. However, it is almost impossible for many laboratories to handle and analysis these “BIG” data without the support from a bioinformatics team. We aimed to provide a web-based analysis platform for users with only limited knowledge on bio-computing to study the functional genomics and proteomics. Method: We use NCI-60 as an example dataset to demonstrate the power of the web-based analysis platform and data delivering system: C-eXpress takes a simple text file that contain the standard NCBI gene or protein ID and expression levels (rpkm or fold) as input file to generate a distribution map of gene/protein expression levels in a heatmap diagram organized by color gradients. The diagram is hyper-linked to a dynamic html table that allows the users to filter the datasets based on various gene features. A dynamic summary chart is generated automatically after each filtering process. Results: We implemented an integrated database that contain pre-defined annotations such as gene/protein properties (ID, name, length, MW, pI); pathways based on KEGG and GO biological process; subcellular localization based on GO cellular component; functional classification based on GO molecular function, kinase, peptidase and transporter. Multiple ways of sorting of column and rows is also provided for comparative analysis and visualization of multiple samples.

Keywords: cancer, visualization, database, functional annotation

Procedia PDF Downloads 601