Search results for: imbalanced datasets
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 746

206 An Enhanced Approach in Validating Analytical Methods Using Tolerance-Based Design of Experiments (DoE)

Authors: Gule Teri

Abstract:

The effective validation of analytical methods forms a crucial component of pharmaceutical manufacturing. However, traditional validation techniques can occasionally fail to fully account for inherent variations within datasets, which may result in inconsistent outcomes. This deficiency in validation accuracy is particularly noticeable when quantifying low concentrations of active pharmaceutical ingredients (APIs), excipients, or impurities, introducing a risk to the reliability of the results and, subsequently, the safety and effectiveness of the pharmaceutical products. In response to this challenge, we introduce an enhanced, tolerance-based Design of Experiments (DoE) approach for the validation of analytical methods. This approach distinctly measures variability with reference to tolerance or design margins, enhancing the precision and trustworthiness of the results. This method provides a systematic, statistically grounded validation technique that improves the truthfulness of results. It offers an essential tool for industry professionals aiming to guarantee the accuracy of their measurements, particularly for low-concentration components. By incorporating this innovative method, pharmaceutical manufacturers can substantially advance their validation processes, subsequently improving the overall quality and safety of their products. This paper delves deeper into the development, application, and advantages of this tolerance-based DoE approach and demonstrates its effectiveness using High-Performance Liquid Chromatography (HPLC) data for verification. This paper also discusses the potential implications and future applications of this method in enhancing pharmaceutical manufacturing practices and outcomes.
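For illustration only, a minimal sketch of the kind of tolerance-based check implied here, assuming normally distributed HPLC recovery data, Howe's approximation for the two-sided tolerance factor, and made-up recovery values and design margins (none of these numbers come from the paper), could look like this:

```python
import numpy as np
from scipy import stats

def tolerance_k(n, coverage=0.99, confidence=0.95):
    """Two-sided normal tolerance factor (Howe's approximation)."""
    z = stats.norm.ppf((1 + coverage) / 2)          # coverage quantile
    chi2 = stats.chi2.ppf(1 - confidence, n - 1)    # lower chi-square quantile
    return np.sqrt((n - 1) * (1 + 1 / n) * z**2 / chi2)

# Illustrative HPLC recovery results (%) for a low-concentration impurity
recoveries = np.array([98.7, 99.4, 100.2, 99.1, 98.9, 100.5,
                       99.8, 99.0, 100.1, 99.6, 98.8, 99.9])

n, mean, sd = len(recoveries), recoveries.mean(), recoveries.std(ddof=1)
k = tolerance_k(n)
lower, upper = mean - k * sd, mean + k * sd

# Compare the tolerance interval with the design (specification) margins
spec_low, spec_high = 97.0, 103.0   # assumed design margins
print(f"99%/95% tolerance interval: [{lower:.2f}, {upper:.2f}] %")
print("Method acceptable:", spec_low <= lower and upper <= spec_high)
```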

Keywords: tolerance-based design, design of experiments, analytical method validation, quality control, biopharmaceutical manufacturing

Procedia PDF Downloads 44
205 Learning Dynamic Representations of Nodes in Temporally Variant Graphs

Authors: Sandra Mitrovic, Gaurav Singh

Abstract:

In many industries, including telecommunications, churn prediction has been a topic of active research. A lot of attention has been drawn to devising the most informative features, and this area of research has gained even more focus with the spread of (social) network analytics. Call detail records (CDRs) have been used to construct customer networks and extract potentially useful features. However, to the best of our knowledge, no studies including network features have yet proposed a generic way of representing network information. Instead, ad-hoc and dataset-dependent solutions have been suggested. In this work, we build upon a recently presented method (node2vec) to obtain representations for nodes in the observed network. The proposed approach is generic and applicable to any network and domain. Unlike node2vec, which assumes a static network, we consider a dynamic and time-evolving network. To account for this, we propose an approach that constructs the feature representation of each node by generating its node2vec representations at different timestamps, concatenating them, and finally compressing them using an auto-encoder-like method in order to retain feature vectors that are of manageable length yet still informative. We test the proposed method on the churn prediction task in the telco domain. To predict churners at timestamp ts+1, we construct training and testing datasets consisting of feature vectors from time intervals [t1, ts-1] and [t2, ts] respectively, and use traditional supervised classification models like SVM and Logistic Regression. Observed results show the effectiveness of the proposed approach as compared to ad-hoc feature-selection-based approaches and static node2vec.
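A rough sketch of the pipeline described above (per-timestamp node2vec embeddings, concatenation, compression, then a supervised classifier) is given below; it uses the third-party node2vec package, substitutes PCA for the auto-encoder-like compression step, and runs on a toy dynamic graph with placeholder churn labels rather than CDR data:

```python
import numpy as np
import networkx as nx
from node2vec import Node2Vec                    # pip install node2vec
from sklearn.decomposition import PCA            # stand-in for the auto-encoder
from sklearn.linear_model import LogisticRegression

def embed_snapshot(graph, dims=32):
    """node2vec embedding of a single graph snapshot, keyed by node id."""
    n2v = Node2Vec(graph, dimensions=dims, walk_length=20, num_walks=50, workers=2)
    model = n2v.fit(window=5, min_count=1)
    return {node: model.wv[str(node)] for node in graph.nodes()}

# Toy dynamic network: one random-graph snapshot per timestamp
nodes = range(50)
snapshots = [nx.gnp_random_graph(50, 0.1, seed=s) for s in range(4)]
per_t = [embed_snapshot(g) for g in snapshots]

# Concatenate each node's embeddings over time, then compress
X = np.array([np.concatenate([per_t[t][n] for t in range(len(snapshots))])
              for n in nodes])
X_compressed = PCA(n_components=16).fit_transform(X)

# Placeholder churn labels; in the paper these come from CDR data
y = np.random.RandomState(0).randint(0, 2, size=len(nodes))
clf = LogisticRegression(max_iter=1000).fit(X_compressed, y)
print("Training accuracy on toy data:", clf.score(X_compressed, y))
```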

Keywords: churn prediction, dynamic networks, node2vec, auto-encoders

Procedia PDF Downloads 292
204 hsa-miR-1204 and hsa-miR-639 Prominent Role in Tamoxifen's Molecular Mechanisms on the EMT Phenomenon in Breast Cancer Patients

Authors: Mahsa Taghavi

Abstract:

Tamoxifen is a regularly prescribed medication in the treatment of breast cancer. This study examined the effect of tamoxifen on the EMT pathways of breast cancer patients, to see whether it had any effect on the cancer cells' resistance to tamoxifen and to look for specific miRNAs associated with EMT. We used continuous and integrated bioinformatics analysis to choose the optimal GEO datasets. Once we had sorted the gene expression profiles, we looked at the signaling mechanisms, the ontology of genes, and the protein interactions of each gene. We then used the GEPIA database to confirm the candidate genes, after which we investigated critical miRNAs related to the candidate genes. The two gene expression profiles were categorized into two distinct groups. The first group was examined using the expression profile of genes that were down-regulated in the EMT pathway; the second group represented the polar opposite of the first. A total of 253 genes from the first group and 302 genes from the second group were found to be common. Several genes in the first group were linked to cell death, focal adhesion, and cellular aging, while genes in the second group were linked to distinct cell cycle stages. Finally, proteins such as MYLK, SOCS3, and STAT5B from the first group and BIRC5, PLK1, and RAPGAP1 from the second group were selected as potential candidates linked to tamoxifen's influence on the EMT pathway. hsa-miR-1204 and hsa-miR-639 have a very close relationship with the candidate genes according to the node degree and betweenness indices. With this, the action of tamoxifen on the EMT pathway was better understood. Learning more about how tamoxifen's target genes and proteins work will allow the drug to be understood better.

Keywords: tamoxifen, breast cancer, bioinformatics analysis, EMT, miRNAs

Procedia PDF Downloads 103
203 Impact of Social Transfers on Energy Poverty in Turkey

Authors: Julide Yildirim, Nadir Ocal

Abstract:

Even though there are many studies investigating the extent and determinants of poverty, there is a paucity of research investigating the issue of energy poverty in Turkey. The aim of this paper is threefold: first, to investigate the extent of energy poverty in Turkey by using Household Budget Survey datasets for the 2005 - 2016 period; second, to examine the risk factors for energy poverty; and finally, to assess the impact of social assistance program participation on energy poverty. Existing literature employs alternative methods to measure energy poverty. In this study, energy poverty is measured by employing the expenditure approach, where people are considered energy poor if they disburse more than 10 per cent of their income to meet their energy requirements. Empirical results indicate that the energy poverty rate is around 20 per cent during the time period under consideration. Since Household Budget Survey panel data are not available for the 2005 - 2016 period, a pseudo panel has been constructed. The panel logistic regression method is utilized to determine the risk factors for energy poverty. The empirical results demonstrate that there is a statistically significant impact of work status and education level on the likelihood of energy poverty. In the final part of the paper, the impact of social transfers on energy poverty has been examined by utilizing a panel biprobit model, where social transfer participation and energy poverty incidences are jointly modeled. The empirical findings indicate that social transfer program participation reduces energy poverty. The negative association between energy poverty and social transfer program participation is more pronounced in urban areas compared with rural areas.
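A minimal sketch of the expenditure-based measure described here (the 10 per cent threshold) together with a simple logistic regression on candidate risk factors is shown below; the column names and household records are illustrative, not the actual Household Budget Survey variables:

```python
import pandas as pd
import statsmodels.api as sm

# Illustrative household records; column names are assumptions, not HBS fields
df = pd.DataFrame({
    "income":       [1200, 800, 2500, 600, 1500, 950],
    "energy_spend": [100,  95,  120,  80,  110,  105],
    "urban":        [1, 0, 1, 1, 0, 0],
    "educ_years":   [11, 5, 15, 10, 6, 8],
})

# Expenditure approach: energy poor if energy spending exceeds 10% of income
df["energy_poor"] = (df["energy_spend"] / df["income"] > 0.10).astype(int)
print("Energy poverty rate:", df["energy_poor"].mean())

# Simple (pooled) logistic regression on candidate risk factors
X = sm.add_constant(df[["urban", "educ_years"]])
logit = sm.Logit(df["energy_poor"], X).fit(disp=0)
print(logit.params)
```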

Keywords: energy poverty, social transfers, panel data models, Turkey

Procedia PDF Downloads 113
202 Modeling of Sediment Yield and Streamflow of Watershed Basin in the Philippines Using the Soil Water Assessment Tool Model for Watershed Sustainability

Authors: Warda L. Panondi, Norihiro Izumi

Abstract:

Sedimentation is a significant threat to the sustainability of reservoirs and their watersheds. In the Philippines, the Pulangi watershed experienced a high sediment loss mainly due to land conversions and plantations that showed critical erosion rates beyond the tolerable limit of 10 ton/ha/yr in all of its sub-basins. Given this situation, the prediction of runoff volume and sediment yield is essential for realistically assessing the country's soil conservation techniques. In this research, the Pulangi watershed was modeled using the soil water assessment tool (SWAT) to predict the watershed basin's annual runoff and sediment yield. For the calibration and validation of the model, SWAT-CUP was utilized. The model was calibrated with monthly discharge data for 1990-1993 and validated for 1994-1997. Simultaneously, the sediment yield was calibrated in 2014 and validated in 2015 because of limited observed datasets. Uncertainty analysis and calculation of efficiency indexes were accomplished through the SUFI-2 algorithm. According to the coefficient of determination (R2), Nash-Sutcliffe efficiency (NSE), Kling-Gupta efficiency (KGE), and PBIAS, the calculation of streamflow indicates a good performance for both calibration and validation periods, while the sediment yield resulted in a satisfactory performance for both calibration and validation. Therefore, this study was able to identify the most critical sub-basins and their severe need for soil conservation. Furthermore, this study will provide baseline information to prevent floods and landslides and serve as a useful reference for land-use policies and watershed management and sustainability in the Pulangi watershed.
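The efficiency indices named here have standard definitions; a short sketch computing them for a pair of observed and simulated discharge series (toy numbers, not the Pulangi data) is:

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency."""
    return 1 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(obs, sim):
    """Kling-Gupta efficiency (2009 formulation)."""
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def pbias(obs, sim):
    """Percent bias; positive values indicate underestimation (SWAT convention)."""
    return 100 * np.sum(obs - sim) / np.sum(obs)

def r_squared(obs, sim):
    return np.corrcoef(obs, sim)[0, 1] ** 2

# Toy monthly discharge series (m^3/s)
obs = np.array([12.0, 18.5, 30.2, 25.1, 14.3, 9.8, 7.5, 11.2])
sim = np.array([11.1, 20.0, 27.9, 26.4, 15.0, 8.9, 8.2, 10.5])

for name, fn in [("NSE", nse), ("KGE", kge), ("PBIAS", pbias), ("R2", r_squared)]:
    print(f"{name}: {fn(obs, sim):.3f}")
```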

Keywords: Pulangi watershed, sediment yield, streamflow, SWAT model

Procedia PDF Downloads 175
201 Component Level Flood Vulnerability Framework for the United Kingdom

Authors: Mohammad Shoraka, Francesco Preti, Karen Angeles, Raulina Wojtkiewicz, Karthik Ramanathan

Abstract:

Catastrophe modeling has evolved significantly over the last four decades. Verisk introduced its pioneering comprehensive inland flood model tailored for the U.K. in 2008. Over the course of the last 15 years, Verisk has built a suite of physically driven flood models for several countries and regions across the globe. This paper aims to spotlight a selection of these advancements tailored to the development of vulnerability estimation, which forms an integral part of a forthcoming update to Verisk’s U.K. inland flood model. Vulnerability functions are critical to the evaluation and robust modeling of flood-induced damage to buildings and contents. The subsequent damage assessments then allow for direct quantification of losses for entire building portfolios. Notably, today’s flood loss models more often prioritize enhanced development of hazard characterization, while vulnerability functions often lack sufficient granularity for a robust assessment. This study proposes a novel, engineering-driven, physically based component-level flood vulnerability framework for the U.K. Various aspects of the framework, including component classification and comprehensive cost analysis, tailored to capture the distinct building characteristics unique to the U.K., will be discussed. This analysis will elucidate how the cost distribution across individual components contributes to translating component-level damage functions into building-level damage functions. Furthermore, a succinct overview of essential datasets employed to gauge regional building vulnerability will be highlighted.

Keywords: catastrophe modeling, inland flood, vulnerability, cost analysis

Procedia PDF Downloads 38
200 STTS-EAD: Improving Spatio-Temporal Learning Based Time Series Prediction via Embedded Anomaly Detection

Authors: Tianhao Zhang, Cen Chen, Dawei Cheng, Yuqi Liang, Yuanyuan Liang

Abstract:

Dealing with anomalies is a crucial preprocessing step for multivariate time series prediction. However, existing methods that separate anomaly preprocessing and model training into two stages have certain limitations. Specifically, these methods fail to leverage auxiliary information necessary to distinguish latent anomalies related to spatiotemporal factors during the preprocessing stage. Instead, they solely rely on data distribution for detection which may lead to incorrect processing of many samples that are beneficial for training. To address this, we propose STTS-EAD, an end-to-end method that seamlessly integrates anomaly detection into the training process of multivariate time series forecasting and aims to improve Spatio-Temporal learning based Time Series prediction via Embedded Anomaly Detection. Our proposed STTS-EAD leverages spatio-temporal information for forecasting and anomaly detection, with the two parts alternately executed and optimized for each other. To the best of our knowledge, STTS-EAD is the first to integrate anomaly detection and forecasting tasks in the training phase for improving the accuracy of multivariate time series forecasting. Extensive experiments on a public stock dataset and two real-world sales datasets from a renowned coffee chain enterprise show that our proposed method can effectively process detected anomalies in the training stage to improve forecasting performance in the inference stage and significantly outperform baselines.

Keywords: multivariate time series, anomaly detection, time series forecasting, spatiotemporal feature learning

Procedia PDF Downloads 13
199 Trading off Accuracy for Speed in Powerdrill

Authors: Filip Buruiana, Alexander Hall, Reimar Hofmann, Thomas Hofmann, Silviu Ganceanu, Alexandru Tudorica

Abstract:

In-memory column-stores make interactive analysis feasible for many big data scenarios. PowerDrill is a system used internally at Google for exploration in logs data. Even though it is a highly parallelized column-store and uses in-memory caching, interactive response times cannot be achieved for all datasets (note that it is common to analyze data with 50 billion records in PowerDrill). In this paper, we investigate two orthogonal approaches to optimize performance at the expense of an acceptable loss of accuracy. Both approaches can be implemented as outer wrappers around existing database engines, and so they should be easily applicable to other systems. For the first optimization, we show that memory is the limiting factor in executing queries at speed and therefore explore possibilities to improve memory efficiency. We adapt some of the theory behind data sketches to reduce the size of particularly expensive fields in our largest tables by a factor of 4.5 when compared to a standard compression algorithm. This saves 37% of the overall memory in PowerDrill and introduces a 0.4% relative error in the 90th percentile for results of queries with the expensive fields. We additionally evaluate the effects of using sampling on accuracy and propose a simple heuristic for annotating individual result-values as accurate (or not). Based on measurements of user behavior in our real production system, we show that these estimates are essential for interpreting intermediate results before final results are available. For a large set of queries, this effectively brings down the 95th latency percentile from 30 to 4 seconds.
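The paper's own sketching and sampling machinery is not reproduced here, but the general idea of trading accuracy for speed via sampling, and of annotating results with an error estimate, can be illustrated with a simple Bernoulli-sampled group count (purely illustrative, not PowerDrill's algorithm):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy "log" table: one group label per record
n_records = 5_000_000
groups = rng.integers(0, 20, size=n_records)

# Exact per-group counts (what a full scan would return)
exact = np.bincount(groups, minlength=20)

# Approximate counts from a 1% Bernoulli sample, scaled back up
p = 0.01
sample_mask = rng.random(n_records) < p
sampled = np.bincount(groups[sample_mask], minlength=20)
estimate = sampled / p

# Simple per-group standard error for the scaled binomial estimate,
# used to annotate whether a value should be treated as "accurate"
std_err = np.sqrt(sampled * (1 - p)) / p
relative_err = np.divide(std_err, estimate, out=np.zeros_like(std_err),
                         where=estimate > 0)
accurate = relative_err < 0.05   # flag results within ~5% relative error

for g in range(5):
    print(f"group {g}: exact={exact[g]}, approx={estimate[g]:.0f}, "
          f"flagged accurate={accurate[g]}")
```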

Keywords: big data, in-memory column-store, high-performance SQL queries, approximate SQL queries

Procedia PDF Downloads 232
198 Spatial and Geostatistical Analysis of Surficial Soils of the Contiguous United States

Authors: Rachel Hetherington, Chad Deering, Ann Maclean, Snehamoy Chatterjee

Abstract:

The U.S. Geological Survey conducted a soil survey and subsequent mineralogical and geochemical analyses of over 4800 samples taken across the contiguous United States between the years 2007 and 2013. At each location, samples were taken from the top 5 cm, the A-horizon, and the C-horizon. Many studies have looked at the correlation between the mineralogical and geochemical content of soils and influencing factors such as parent lithology, climate, soil type, and age, but it seems little has been done in relation to quantifying and assessing the correlation between elements in the soil on a national scale. GIS was used for the mapping and multivariate interpolation of over 40 major and trace elements for surficial soils (0-5 cm depth). Qualitative analysis of the spatial distribution across the U.S. shows distinct patterns amongst elements both within the same periodic groups and within different periodic groups, and therefore with different behavioural characteristics. Results show the emergence of 4 main patterns of high concentration areas: vertically along the west coast, a C-shape formed through the states around Utah and northern Arizona, a V-shape through the Midwest and connecting to the Appalachians, and along the Appalachians. The Band Collection Statistics tool in GIS was used to quantitatively analyse the geochemical raster datasets and calculate a correlation matrix. Patterns emerged, which were not identified in qualitative analysis, many of which are also amongst elements with very different characteristics. Preliminary results show 41 element pairings with a strong positive correlation ( ≥ 0.75). Both qualitative and quantitative analyses on this scale could increase knowledge on the relationships between element distribution and behaviour in surficial soils of the U.S.
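Outside of a GIS environment, the same correlation screening can be reproduced in a few lines of pandas; the element names and concentrations below are placeholders, not the USGS survey data:

```python
import numpy as np
import pandas as pd

# Placeholder element concentrations at sampled sites (ppm)
rng = np.random.default_rng(0)
n_sites = 500
base = rng.normal(size=n_sites)
soil = pd.DataFrame({
    "Fe": base * 2 + rng.normal(scale=0.5, size=n_sites),
    "Al": base * 1.5 + rng.normal(scale=0.5, size=n_sites),
    "Ca": rng.normal(size=n_sites),
    "Mg": rng.normal(size=n_sites) * 0.8 + base,
})

# Correlation matrix (analogous to the Band Collection Statistics output)
corr = soil.corr()

# Extract element pairings with a strong positive correlation (>= 0.75)
strong_pairs = [
    (a, b, corr.loc[a, b])
    for i, a in enumerate(corr.columns)
    for b in corr.columns[i + 1:]
    if corr.loc[a, b] >= 0.75
]
print(corr.round(2))
print("Strongly correlated pairs:", strong_pairs)
```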

Keywords: correlation matrix, geochemical analyses, spatial distribution of elements, surficial soils

Procedia PDF Downloads 105
197 A Survey of Skin Cancer Detection and Classification from Skin Lesion Images Using Deep Learning

Authors: Joseph George, Anne Kotteswara Roa

Abstract:

Skin disease is one of the most common kinds of health issues faced by people nowadays. Skin cancer (SC) is one of them, and its detection relies on skin biopsy outputs and the expertise of doctors, which is time-consuming and can produce inaccurate results. At an early stage, skin cancer detection is a challenging task, yet the disease easily spreads to the whole body and leads to an increase in the mortality rate; skin cancer is curable when it is detected at an early stage. In order to classify skin cancer correctly and accurately, the critical task is skin cancer identification and classification, which is largely based on disease features such as shape, size, color, and symmetry. Many skin diseases share similar characteristics, which makes it challenging to select important features from skin cancer dataset images. Hence, skin cancer diagnostic accuracy can be improved by an automated skin cancer detection and classification framework, which also addresses the scarcity of human experts. Recently, deep learning techniques such as the Convolutional neural network (CNN), Deep belief network (DBN), Artificial neural network (ANN), Recurrent neural network (RNN), and Long short-term memory (LSTM) have been widely used for the identification and classification of skin cancers. This survey reviews different DL techniques for skin cancer identification and classification. Performance metrics such as precision, recall, accuracy, sensitivity, specificity, and F-measure are used to evaluate the effectiveness of SC identification using DL techniques. By using these DL techniques, classification accuracy increases along with the mitigation of computational complexity and time consumption.
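For reference, the performance metrics listed here follow directly from a binary confusion matrix; a short sketch (with made-up predictions, not results from any surveyed model) is:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up labels: 1 = malignant lesion, 0 = benign
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 0, 0, 0, 1, 1, 1, 0, 0, 0])

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy    = (tp + tn) / (tp + tn + fp + fn)
precision   = tp / (tp + fp)
recall      = tp / (tp + fn)          # also called sensitivity
sensitivity = recall
specificity = tn / (tn + fp)
f_measure   = 2 * precision * recall / (precision + recall)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"sensitivity={sensitivity:.2f} specificity={specificity:.2f} "
      f"F-measure={f_measure:.2f}")
```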

Keywords: skin cancer, deep learning, performance measures, accuracy, datasets

Procedia PDF Downloads 98
196 Integrating Knowledge Distillation of Multiple Strategies

Authors: Min Jindong, Wang Mingxia

Abstract:

With the widespread use of artificial intelligence in everyday life, computer vision, especially deep convolutional neural network models, has developed rapidly. With the increase in the complexity of real visual target detection tasks and the improvement of recognition accuracy, target detection network models have also become very large. A huge deep neural network model is not conducive to deployment on edge devices with limited resources, and the timeliness of network model inference is poor. In this paper, knowledge distillation is used to compress the huge and complex deep neural network model, and the knowledge contained in the complex network model is comprehensively transferred to another lightweight network model. Different from traditional knowledge distillation methods, we propose a novel knowledge distillation method that incorporates multi-faceted features, called M-KD. In this paper, when training and optimizing the deep neural network model for target detection, the soft target outputs of the teacher network, the relationships between the layers of the teacher network, and the feature attention maps of the teacher network's hidden layers are all transferred to the student network as knowledge. At the same time, we also introduce an intermediate transition layer, that is, an intermediate guidance layer, between the teacher network and the student network to make up for the huge difference between them. Finally, this paper adds an exploration module to the traditional knowledge distillation teacher-student network model, so that the student network model not only inherits the knowledge of the teacher network but also explores some new knowledge and characteristics. Comprehensive experiments in this paper using different distillation parameter configurations across multiple datasets and convolutional neural network models demonstrate that our proposed new network model achieves substantial improvements in speed and accuracy performance.
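The abstract describes transferring soft targets, inter-layer relations, and attention maps; a minimal PyTorch sketch of just the soft-target part of such a loss (temperature-scaled KL term plus the usual hard-label term, with illustrative weights; the relation and attention terms of M-KD are omitted) could look like this:

```python
import torch
import torch.nn.functional as F

def soft_target_kd_loss(student_logits, teacher_logits, labels,
                        temperature=4.0, alpha=0.7):
    """Weighted sum of a distillation term and the ordinary cross-entropy."""
    # Soften both distributions with the temperature, then match them with KL
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=1),
        F.softmax(teacher_logits / temperature, dim=1),
        reduction="batchmean",
    ) * temperature ** 2          # standard gradient-scale correction
    ce = F.cross_entropy(student_logits, labels)
    return alpha * kd + (1 - alpha) * ce

# Toy batch: 8 samples, 5 classes
student_logits = torch.randn(8, 5, requires_grad=True)
teacher_logits = torch.randn(8, 5)
labels = torch.randint(0, 5, (8,))

loss = soft_target_kd_loss(student_logits, teacher_logits, labels)
loss.backward()
print("KD loss:", loss.item())
```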

Keywords: object detection, knowledge distillation, convolutional network, model compression

Procedia PDF Downloads 250
195 Estimation of Atmospheric Parameters for Weather Study and Forecast over Equatorial Regions Using Ground-Based Global Positioning System

Authors: Asmamaw Yehun, Tsegaye Kassa, Addisu Hunegnaw, Martin Vermeer

Abstract:

There are various models to estimate the neutral atmospheric parameter values, such as in-situ measurements and reanalysis datasets from numerical models. Accurately estimated values of the atmospheric parameters are useful for weather forecasting, climate modeling, and monitoring of climate change. Recently, Global Navigation Satellite System (GNSS) measurements have been applied for atmospheric sounding due to their robust data quality and wide horizontal and vertical coverage. The Global Positioning System (GPS) solutions that include tropospheric parameters constitute a reliable set of data to be assimilated into climate models. The objective of this paper is to estimate the neutral atmospheric parameters, such as Wet Zenith Delay (WZD), Precipitable Water Vapour (PWV) and Total Zenith Delay (TZD), using observational data from 2012 to 2015 at six selected GPS stations in the equatorial region, more precisely, the Ethiopian GPS stations. Based on the historical GPS-derived estimates of PWV, we forecasted PWV from 2015 to 2030. During data processing and analysis, we applied the GAMIT-GLOBK software packages to estimate the atmospheric parameters. In the results, we found that the minimum annual averaged PWV is 9.72 mm at IISC and the maximum is 50.37 mm at BJCO, while the minimum annual averaged WZD is 6 cm at IISC and the maximum is 31 cm at BDMT. In the long series of observations (from 2012 to 2015), we also found trends and cyclic patterns in WZD, PWV, and TZD for all stations.
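For context, the conversion from wet zenith delay to precipitable water vapour is commonly done with the Bevis formulation; a hedged sketch with assumed refractivity constants and a surface-temperature-based mean temperature is shown below (the input values are placeholders, not the Ethiopian station results):

```python
import numpy as np

# Refractivity constants (Bevis et al., 1994), expressed in SI-consistent units
K2_PRIME = 0.221     # K / Pa
K3 = 3.739e3         # K^2 / Pa
RV = 461.5           # J / (kg K), specific gas constant of water vapour
RHO_W = 1000.0       # kg / m^3, density of liquid water

def pwv_from_zwd(zwd_m, surface_temp_k):
    """Convert zenith wet delay (m) to precipitable water vapour (m)."""
    tm = 70.2 + 0.72 * surface_temp_k          # Bevis mean-temperature relation
    pi_factor = 1e6 / (RHO_W * RV * (K2_PRIME + K3 / tm))
    return pi_factor * zwd_m

# Placeholder example: 15 cm of wet delay at a 295 K surface temperature
zwd = 0.15
pwv = pwv_from_zwd(zwd, 295.0)
print(f"Conversion factor ~{pwv / zwd:.3f}; PWV = {pwv * 1000:.1f} mm")
```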

Keywords: atmosphere, GNSS, neutral atmosphere, precipitable water vapour

Procedia PDF Downloads 32
194 The Targeting Logic of Terrorist Groups in the Sahel

Authors: Mathieu Bere

Abstract:

Al-Qaeda and Islamic State-affiliated groups such as Ja’amat Nusra al Islam Wal Muslimim (JNIM) and the Islamic State-Greater Sahara Faction, which is now part of the Boko Haram splinter group, Islamic State in West Africa, were responsible, between 2018 and 2020, for at least 1,333 violent incidents against both military and civilian targets, including the assassination and kidnapping for ransom of Western citizens in Mali, Burkina Faso and Niger, the Central Sahel. Protecting civilians from the terrorist violence that is now spreading from the Sahel to the coastal countries of West Africa has been very challenging, mainly because of the many unknowns that surround the perpetrators. To contribute to a better protection of civilians in the region, this paper aims to shed light on the motivations and targeting logic of jihadist perpetrators of terrorist violence against civilians in the central Sahel region. To that end, it draws on relevant secondary data retrieved from datasets, the media, and the existing literature, but also on primary data collected through interviews and surveys in Burkina Faso. An analysis of the data with the support of qualitative and statistical analysis software shows that military and rational strategic motives, more than purely ideological or religious motives, have been the main drivers of terrorist violence that strategically targeted government symbols and representatives as well as local leaders in the central Sahel. Behind this targeting logic, the jihadist grand strategy emerges: wiping out the Western-inspired legal, education and governance system in order to replace it with an Islamic, sharia-based political, legal, and educational system.

Keywords: terrorism, jihadism, Sahel, targeting logic

Procedia PDF Downloads 61
193 Bias-Corrected Estimation Methods for Receiver Operating Characteristic Surface

Authors: Khanh To Duc, Monica Chiogna, Gianfranco Adimari

Abstract:

With three diagnostic categories, assessment of the performance of diagnostic tests is achieved by the analysis of the receiver operating characteristic (ROC) surface, which generalizes the ROC curve for binary diagnostic outcomes. The volume under the ROC surface (VUS) is a summary index usually employed for measuring the overall diagnostic accuracy. When the true disease status can be exactly assessed by means of a gold standard (GS) test, unbiased nonparametric estimators of the ROC surface and VUS are easily obtained. In practice, unfortunately, disease status verification via the GS test could be unavailable for all study subjects, due to the expensiveness or invasiveness of the GS test. Thus, often only a subset of patients undergoes disease verification. Statistical evaluations of diagnostic accuracy based only on data from subjects with verified disease status are typically biased. This bias is known as verification bias. Here, we consider the problem of correcting for verification bias when continuous diagnostic tests for three-class disease status are considered. We assume that selection for disease verification does not depend on disease status, given test results and other observed covariates, i.e., we assume that the true disease status, when missing, is missing at random. Under this assumption, we discuss several solutions for ROC surface analysis based on imputation and re-weighting methods. In particular, verification bias-corrected estimators of the ROC surface and of VUS are proposed, namely, full imputation, mean score imputation, inverse probability weighting and semiparametric efficient estimators. Consistency and asymptotic normality of the proposed estimators are established, and their finite sample behavior is investigated by means of Monte Carlo simulation studies. Two illustrations using real datasets are also given.
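As an illustration of the inverse-probability-weighting idea for the VUS (a brute-force sketch, not the authors' full set of estimators), each verified subject can be weighted by the inverse of its estimated verification probability; the data and verification model below are simulated:

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)

# Simulate a three-class disease status, a continuous test, and verification
n = 300
disease = rng.integers(1, 4, size=n)                    # true class 1, 2, 3
test = disease + rng.normal(scale=1.0, size=n)          # ordered test values
pi = 1 / (1 + np.exp(-(test - 2)))                      # P(verified | test), MAR
verified = rng.random(n) < pi

# Keep only verified subjects; weight each by 1 / estimated pi
t, d, w = test[verified], disease[verified], 1 / pi[verified]
idx1, idx2, idx3 = (np.where(d == c)[0] for c in (1, 2, 3))

num = den = 0.0
for i, j, k in product(idx1, idx2, idx3):
    wt = w[i] * w[j] * w[k]
    den += wt
    num += wt * float(t[i] < t[j] < t[k])

print("IPW-corrected VUS estimate:", num / den)
print("Naive (verified-only, unweighted) VUS:",
      np.mean([float(t[i] < t[j] < t[k])
               for i, j, k in product(idx1, idx2, idx3)]))
```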

Keywords: imputation, missing at random, inverse probability weighting, ROC surface analysis

Procedia PDF Downloads 388
192 Self-Supervised Attributed Graph Clustering with Dual Contrastive Loss Constraints

Authors: Lijuan Zhou, Mengqi Wu, Changyong Niu

Abstract:

Attributed graph clustering can utilize the graph topology and node attributes to uncover hidden community structures and patterns in complex networks, aiding in the understanding and analysis of complex systems. Utilizing contrastive learning for attributed graph clustering can effectively exploit meaningful implicit relationships between data. However, existing attributed graph clustering methods based on contrastive learning suffer from the following drawbacks: 1) Complex data augmentation increases computational cost, and inappropriate data augmentation may lead to semantic drift. 2) The selection of positive and negative samples neglects the intrinsic cluster structure learned from graph topology and node attributes. Therefore, this paper proposes a method called self-supervised Attributed Graph Clustering with Dual Contrastive Loss constraints (AGC-DCL). Firstly, Siamese Multilayer Perceptron (MLP) encoders are employed to generate two views separately to avoid complex data augmentation. Secondly, the neighborhood contrastive loss is introduced to constrain node representation using local topological structure while effectively embedding attribute information through attribute reconstruction. Additionally, clustering-oriented contrastive loss is applied to fully utilize clustering information in global semantics for discriminative node representations, regarding the cluster centers from two views as negative samples to fully leverage effective clustering information from different views. Comparative clustering results with existing attributed graph clustering algorithms on six datasets demonstrate the superiority of the proposed method.

Keywords: attributed graph clustering, contrastive learning, clustering-oriented, self-supervised learning

Procedia PDF Downloads 16
191 Implications of Agricultural Subsidies Since Green Revolution: A Case Study of Indian Punjab

Authors: Kriti Jain, Sucha Singh Gill

Abstract:

Subsidies have been a major part of agricultural policies around the world, and more extensively since the green revolution in developing countries, for the sake of attaining higher agricultural productivity and achieving food security. But entrenched subsidies lead to distorted incentives and promote inefficiencies in the agricultural sector, threatening the viability of these very subsidies and the sustainability of the agricultural production systems, posing a threat to the livelihood of farmers and laborers dependent on it. This paper analyzes the economic and ecological sustainability implications of prolonged input and output subsidies in agriculture by studying the case of Indian Punjab, an agriculturally developed state responsible for ensuring food security in the country when it was facing a major food crisis. The paper focuses specifically on the environmentally unsustainable cropping pattern changes that resulted from the Minimum Support Price (MSP) and assured procurement, and on the resource use efficiency and cost implications of the power subsidy for irrigation in Punjab. The study is based on an analysis of both secondary and primary data sources. Using secondary data, a time series analysis was done to capture the changes in Punjab’s cropping pattern, water table depth, fertilizer consumption, and electrification of agriculture. This has been done to examine the role of price and output support, adopted to encourage the adoption of green revolution technology, in changing the cropping structure of the state, resulting in increased input use intensities (especially groundwater and fertilizers), which harms the ecological balance and decreases factor productivity. Evaluation of the electrification of Punjab agriculture helped assess the trend in the electricity productivity of agriculture and how free power imposed further pressure on the existing agricultural ecosystem. Using data collected from a primary survey of 320 farmers in Punjab, the extent of wasteful application of groundwater irrigation, the water productivity of output, electricity usage, and the cost of the irrigation-driven electricity subsidy to the exchequer were estimated for the dominant cropping pattern amongst farmers. The main findings of the study reveal how, because of a subsidy-driven agricultural framework, Punjab has lost area under agro-climatically suitable and staple crops and moved towards a paddy-wheat cropping system that is gnawing away at the state’s natural resources: the water table has been declining at a significant rate of 25 cm per year since 1975-76, and excessive and imbalanced fertilizer usage has led to declining soil fertility in the state. With electricity-driven tubewells as the major source of irrigation within a regime of free electricity and water-intensive crop cultivation, there is wasteful application of both irrigation water and electricity in the cultivation of paddy crops, burning an unproductive hole in the exchequer’s pocket. There is limited access to both agricultural extension services and water-conserving technology, along with policy imbalance, keeping farmers in an intensive and unsustainable production system. Punjab agriculture is witnessing diminishing returns to factors and, under a business-as-usual scenario, will soon enter the phase of negative returns.

Keywords: cropping pattern, electrification, subsidy, sustainability

Procedia PDF Downloads 157
190 Reducing the Imbalance Penalty Through Artificial Intelligence Methods in Geothermal Production Forecasting: A Case Study for Turkey

Authors: Hayriye Anıl, Görkem Kar

Abstract:

In addition to being rich in renewable energy resources, Turkey is one of the countries that promise potential in geothermal energy production with its high installed power, cheapness, and sustainability. Increasing imbalance penalties become an economic burden for organizations since geothermal generation plants cannot maintain the balance of supply and demand due to the inadequacy of the production forecasts given in the day-ahead market. A better production forecast reduces the imbalance penalties of market participants and provides a better balance in the day-ahead market. In this study, using machine learning, deep learning, and time series methods, the total generation of the power plants belonging to Zorlu Natural Electricity Generation, which has a high installed capacity in terms of geothermal energy, was estimated for the first and second weeks of March; the imbalance penalties were then calculated with these estimates and compared with the real values. These modeling operations were carried out on two datasets: the basic dataset and a dataset created by extracting new features from it with the feature engineering method. According to the results, Support Vector Regression from among the traditional machine learning models outperformed the other models and exhibited the best performance. In addition, the estimation results on the feature-engineered dataset showed lower error rates than those on the basic dataset. It has been concluded that the imbalance penalty calculated from the estimates for the selected organization is lower than the actual imbalance penalty, yielding a more optimal and profitable imbalance account.
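A compact sketch of the modelling-plus-penalty idea (lagged features into a Support Vector Regression, then a simplified imbalance charge proportional to the absolute forecast error) is given below; the generation series and penalty pricing are placeholders, not the Zorlu data or the Turkish market rules:

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(7)

# Placeholder hourly geothermal generation series (MWh)
hours = 24 * 60
gen = 80 + 5 * np.sin(np.arange(hours) * 2 * np.pi / 24) + rng.normal(0, 2, hours)

# Lagged features: the previous 24 hours predict the next hour
lags = 24
X = np.array([gen[i - lags:i] for i in range(lags, hours)])
y = gen[lags:]

split = len(y) - 24 * 14            # hold out the last two weeks
model = make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=0.1))
model.fit(X[:split], y[:split])
pred = model.predict(X[split:])
actual = y[split:]

# Simplified imbalance penalty: a flat charge per MWh of absolute imbalance
penalty_price = 5.0                 # currency units per MWh, assumed
penalty = penalty_price * np.abs(pred - actual).sum()
print(f"Two-week imbalance penalty (simplified): {penalty:,.0f}")
```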

Keywords: machine learning, deep learning, time series models, feature engineering, geothermal energy production forecasting

Procedia PDF Downloads 78
189 C-eXpress: A Web-Based Analysis Platform for Comparative Functional Genomics and Proteomics in Human Cancer Cell Line, NCI-60 as an Example

Authors: Chi-Ching Lee, Po-Jung Huang, Kuo-Yang Huang, Petrus Tang

Abstract:

Background: Recent advances in high-throughput research technologies such as new-generation sequencing and multi-dimensional liquid chromatography make it possible to dissect the complete transcriptome and proteome in a single run for the first time. However, it is almost impossible for many laboratories to handle and analyze these “BIG” data without the support of a bioinformatics team. We aimed to provide a web-based analysis platform for users with only limited knowledge of bio-computing to study functional genomics and proteomics. Method: We use NCI-60 as an example dataset to demonstrate the power of the web-based analysis platform and data delivery system: C-eXpress takes a simple text file that contains standard NCBI gene or protein IDs and expression levels (RPKM or fold change) as an input file to generate a distribution map of gene/protein expression levels in a heatmap diagram organized by color gradients. The diagram is hyper-linked to a dynamic HTML table that allows the users to filter the datasets based on various gene features. A dynamic summary chart is generated automatically after each filtering process. Results: We implemented an integrated database that contains pre-defined annotations such as gene/protein properties (ID, name, length, MW, pI); pathways based on KEGG and GO biological process; subcellular localization based on GO cellular component; and functional classification based on GO molecular function, kinase, peptidase, and transporter. Multiple ways of sorting columns and rows are also provided for comparative analysis and visualization of multiple samples.

Keywords: cancer, visualization, database, functional annotation

Procedia PDF Downloads 588
188 Verification of Satellite and Observation Measurements to Build Solar Energy Projects in North Africa

Authors: Samy A. Khalil, U. Ali Rahoma

Abstract:

For measurements of solar radiation, satellite data have been routinely utilized to estimate solar energy. However, the temporal coverage of satellite data has some limits. The reanalysis, also known as "retrospective analysis", of the atmosphere's parameters is produced by fusing the output of NWP (Numerical Weather Prediction) models with observation data from a variety of sources, including ground, satellite, ship, and aircraft observations. The result is a comprehensive record of the parameters affecting weather and climate. The effectiveness of the reanalysis dataset (ERA-5) for North Africa was evaluated against high-quality surface measurements using statistical analysis. The distribution of global solar radiation (GSR) was estimated over five chosen areas in North Africa over ten years, from 2011 to 2020. To investigate seasonal changes in dataset performance, a seasonal statistical analysis was conducted, which showed a considerable difference in errors throughout the year. Altering the temporal resolution of the data used for comparison alters the performance of the dataset: the data's monthly mean values indicate better performance, but data accuracy is degraded. Solar resource assessment and power estimation are discussed using the ERA-5 solar radiation data. The average values of the mean bias error (MBE), root mean square error (RMSE), and mean absolute error (MAE) of the reanalysis solar radiation data vary from 0.079 to 0.222, 0.055 to 0.178, and 0.0145 to 0.198, respectively, over the study period. The correlation coefficient (R2) varies from 0.93 to 0.99 over the study period. The objective of this research is to provide a reliable representation of the world's solar radiation to aid in the use of solar energy in all sectors.
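The reported statistics have standard definitions; a short sketch for computing them between ground measurements and ERA-5 values (the arrays below are placeholders) is:

```python
import numpy as np

def validation_stats(ground, reanalysis):
    """MBE, RMSE, MAE and R^2 between measured and reanalysis series."""
    diff = reanalysis - ground
    mbe = diff.mean()
    rmse = np.sqrt((diff ** 2).mean())
    mae = np.abs(diff).mean()
    r2 = np.corrcoef(ground, reanalysis)[0, 1] ** 2
    return mbe, rmse, mae, r2

# Placeholder daily GSR values (kWh/m^2/day)
ground = np.array([5.1, 6.3, 7.0, 6.8, 5.9, 4.7, 5.5, 6.1])
era5   = np.array([5.3, 6.1, 7.2, 6.6, 6.0, 4.9, 5.4, 6.4])

mbe, rmse, mae, r2 = validation_stats(ground, era5)
print(f"MBE={mbe:.3f} RMSE={rmse:.3f} MAE={mae:.3f} R2={r2:.3f}")
```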

Keywords: solar energy, ERA-5 analysis data, global solar radiation, North Africa

Procedia PDF Downloads 74
187 Non-Invasive Data Extraction from Machine Display Units Using Video Analytics

Authors: Ravneet Kaur, Joydeep Acharya, Sudhanshu Gaur

Abstract:

Artificial Intelligence (AI) has the potential to transform manufacturing by improving shop floor processes such as production, maintenance and quality. However, industrial datasets are notoriously difficult to extract in a real-time, streaming fashion, thus negating potential AI benefits. A prime example is specialized industrial controllers that are operated by custom software, which complicates the process of connecting them to an Information Technology (IT) based data acquisition network. Security concerns may also limit direct physical access to these controllers for data acquisition. To connect the Operational Technology (OT) data stored in these controllers to an AI application in a secure, reliable and available way, we propose a novel Industrial IoT (IIoT) solution in this paper. In this solution, we demonstrate how video cameras can be installed on a factory shop floor to continuously obtain images of the controller HMIs. We propose image pre-processing to segment the HMI into regions of streaming data and regions of fixed meta-data. We then evaluate the performance of multiple Optical Character Recognition (OCR) technologies such as Tesseract and Google Vision to recognize the streaming data and test it for typical factory HMIs and realistic lighting conditions. Finally, we use the meta-data to match the OCR output with the temporal, domain-dependent context of the data to improve the accuracy of the output. Our IIoT solution enables reliable and efficient data extraction which will improve the performance of subsequent AI applications.
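A stripped-down sketch of the pipeline described above (crop a streaming-data region from an HMI frame, binarize it, read it with Tesseract, and sanity-check the result against known meta-data) is shown below; the frame path, ROI coordinates, value range, and Tesseract configuration are assumptions for illustration:

```python
import cv2
import pytesseract   # pip install pytesseract (requires a Tesseract install)

# Load one captured frame of the controller HMI (path is illustrative)
frame = cv2.imread("hmi_frame.png")

# Region of interest holding the streaming value; coordinates are assumed
x, y, w, h = 120, 80, 200, 40
roi = frame[y:y + h, x:x + w]

# Basic preprocessing: grayscale + Otsu binarization to help OCR
gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
_, binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# Single-line numeric read; whitelist digits and the decimal point
config = "--psm 7 -c tessedit_char_whitelist=0123456789."
raw_text = pytesseract.image_to_string(binary, config=config).strip()

# Contextual check against known meta-data (e.g., a plausible machine range)
value = float(raw_text) if raw_text else None
if value is not None and not (0.0 <= value <= 500.0):
    value = None    # reject reads outside the expected range
print("Extracted value:", value)
```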

Keywords: human machine interface, industrial internet of things, internet of things, optical character recognition, video analytics

Procedia PDF Downloads 85
186 The Application of Participatory Social Media in Collaborative Planning: A Systematic Review

Authors: Yujie Chen, Zhen Li

Abstract:

In the context of planning transformation, how to promote public participation in the formulation and implementation of collaborative planning has been a central issue of discussion. However, existing studies have often been case-specific or focused on a specific design field, leaving the role of participatory social media (PSM) in urban collaborative planning generally questioned. A systematic database search was conducted in December 2019. Articles and projects were eligible if they reported a quantitative empirical study applying participatory social media in the collaborative planning process (prospective, retrospective, experimental, or longitudinal research, or collective actions in planning practices). Twenty studies and seven projects were included in the review. Findings showed that social media are generally applied in the public spatial behavior, transportation behavior, and community planning fields, with new technologies and new datasets. PSM has provided a new platform for participatory design, decision analysis, and collaborative negotiation, and is most widely used in participatory design. The review also identified several existing forms of PSM. PSM mainly play three roles: a language of decision-making for communication, a study mode for spatial evaluation, and a decision agenda for interactive decision support. Three directions for optimizing PSM were recognized: improving the participatory scale, improving grass-roots organization, and the promotion of politics. However, participants can basically only provide information and comments through PSM in the future collaborative planning process; therefore, the issues of low data response rates, poor spatial data quality, and participation sustainability deserve more attention and solutions.

Keywords: participatory social media, collaborative planning, planning workshop, application mode

Procedia PDF Downloads 107
185 High Resolution Sandstone Connectivity Modelling: Implications for Outcrop Geology and Its Analog Studies

Authors: Numair Ahmed Siddiqui, Abdul Hadi bin Abd Rahman, Chow Weng Sum, Wan Ismail Wan Yousif, Asif Zameer, Joel Ben-Awal

Abstract:

Advances in data capturing from outcrop studies have made possible the acquisition of high-resolution digital data, offering improved and economical reservoir modelling methods. Terrestrial laser scanning utilizing LiDAR (light detection and ranging) provides a new method to build outcrop-based reservoir models, which provide a crucial piece of information for understanding heterogeneities in sandstone facies with high-resolution images and data sets. This study presents the detailed application of an outcrop-based sandstone facies connectivity model by acquiring information gathered from traditional fieldwork and processing detailed digital point-cloud data from LiDAR to develop an intermediate small-scale reservoir sandstone facies model of the Miocene Sandakan Formation, Sabah, East Malaysia. The software RiScan Pro (v1.8.0) was used in digital data collection and post-processing with an accuracy of 0.01 m and a point acquisition rate of up to 10,000 points per second. We provide an accurate and descriptive workflow to triangulate point-clouds of different sets of sandstone facies with well-marked top and bottom boundaries in conjunction with field sedimentology. This provides a highly accurate qualitative sandstone facies connectivity model, which is a challenge to obtain from subsurface datasets (i.e., seismic and well data). Finally, by applying this workflow, we can build an outcrop-based static connectivity model, which can serve as an analogue for subsurface reservoir studies.

Keywords: LiDAR, outcrop, high resolution, sandstone facies, connectivity model

Procedia PDF Downloads 179
184 Meanings and Concepts of Standardization in Systems Medicine

Authors: Imme Petersen, Wiebke Sick, Regine Kollek

Abstract:

In systems medicine, high-throughput technologies produce large amounts of data on different biological and pathological processes, including (disturbed) gene expressions, metabolic pathways and signaling. The large volume of data of different types, stored in separate databases and often located at different geographical sites have posed new challenges regarding data handling and processing. Tools based on bioinformatics have been developed to resolve the upcoming problems of systematizing, standardizing and integrating the various data. However, the heterogeneity of data gathered at different levels of biological complexity is still a major challenge in data analysis. To build multilayer disease modules, large and heterogeneous data of disease-related information (e.g., genotype, phenotype, environmental factors) are correlated. Therefore, a great deal of attention in systems medicine has been put on data standardization, primarily to retrieve and combine large, heterogeneous datasets into standardized and incorporated forms and structures. However, this data-centred concept of standardization in systems medicine is contrary to the debate in science and technology studies (STS) on standardization that rather emphasizes the dynamics, contexts and negotiations of standard operating procedures. Based on empirical work on research consortia that explore the molecular profile of diseases to establish systems medical approaches in the clinic in Germany, we trace how standardized data are processed and shaped by bioinformatics tools, how scientists using such data in research perceive such standard operating procedures and which consequences for knowledge production (e.g. modeling) arise from it. Hence, different concepts and meanings of standardization are explored to get a deeper insight into standard operating procedures not only in systems medicine, but also beyond.

Keywords: data, science and technology studies (STS), standardization, systems medicine

Procedia PDF Downloads 313
183 Pregnant Women in Substance Abuse: Transition of Characteristics and Mining of Association from Teds-a 2011 to 2018

Authors: Md Tareq Ferdous Khan, Shrabanti Mazumder, MB Rao

Abstract:

Background: Substance use during pregnancy is a longstanding public health problem that results in severe consequences for pregnant women and fetuses. Methods: Eight (2011-2018) datasets on pregnant women’s admissions are extracted from TEDS-A. Distributions of sociodemographic characteristics, substance abuse behaviors, and clinical characteristics are constructed and compared over the years for trends by the Cochran-Armitage test. Market basket analysis is used in mining the association among polysubstance abuse. Results: Over the years, pregnant women’s admissions as a percentage of total and female admissions remain stable, where total annual admissions range from 1.54 to about 2 million with the female share ranging from 33.30% to 35.61%. Pregnant women aged 21-29, with 12 or more years of education, of white race, unemployed, and holding independent living status are among the most vulnerable. Concerns prevail over the significant number of polysubstance users, young age at first use, frequency of daily users, and records of prior admissions (60%). Trends of abused primary substances show a significant rise in heroin (66%) and methamphetamine (46%) over the years, although the latest year shows a considerable downturn. On the other hand, significant decreasing patterns are evident for alcohol (43%), marijuana or hashish (24%), cocaine or crack (23%), other opiates or synthetics (36%), and benzodiazepines (29%). Basket analysis reveals some patterns of co-occurrence of substances consistent over the years. Conclusions: This comprehensive study can work as a reference to identify the most vulnerable groups based on their characteristics and to deal with the most hazardous substances from the evidence of their co-occurrence.
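As a pointer to how the association-mining step can be set up, a small sketch using the mlxtend implementation of Apriori on one-hot substance indicators (illustrative records, not TEDS-A data) follows:

```python
import pandas as pd
from mlxtend.frequent_patterns import apriori   # pip install mlxtend

# Illustrative admission records: one row per admission, substances as flags
records = pd.DataFrame({
    "heroin":          [1, 0, 1, 1, 0, 1, 0, 1],
    "methamphetamine": [0, 1, 1, 0, 1, 1, 0, 0],
    "marijuana":       [1, 1, 0, 1, 1, 0, 1, 1],
    "alcohol":         [0, 1, 0, 1, 1, 0, 1, 0],
}).astype(bool)

# Frequent substance combinations (supports are shares of all admissions)
itemsets = apriori(records, min_support=0.25, use_colnames=True)

# Keep only true co-occurrence patterns (two or more substances together)
itemsets["size"] = itemsets["itemsets"].apply(len)
co_occurrence = itemsets[itemsets["size"] >= 2].sort_values("support",
                                                            ascending=False)
print(co_occurrence)
```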

Keywords: basket analysis, pregnant women, substance abuse, trend analysis

Procedia PDF Downloads 170
182 A Comprehensive Study and Evaluation on Image Fashion Features Extraction

Authors: Yuanchao Sang, Zhihao Gong, Longsheng Chen, Long Chen

Abstract:

Clothing fashion represents a human’s aesthetic appreciation towards everyday outfits and appetite for fashion, and it reflects the development of status in society, humanity, and economics. However, modelling fashion by machine is extremely challenging because fashion is too abstract to be efficiently described by machines. Even human beings can hardly reach a consensus about fashion. In this paper, we are dedicated to answering a fundamental fashion-related problem: what image feature best describes clothing fashion? To address this issue, we have designed and evaluated various image features, ranging from traditional low-level hand-crafted features to mid-level style awareness features to various currently popular deep neural network-based features, which have shown state-of-the-art performance in various vision tasks. In summary, we tested the following 9 feature representations: color, texture, shape, style, convolutional neural networks (CNNs), CNNs with distance metric learning (CNNs&DML), AutoEncoder, CNNs with multiple layer combination (CNNs&MLC) and CNNs with dynamic feature clustering (CNNs&DFC). Finally, we validated the performance of these features on two publicly available datasets. Quantitative and qualitative experimental results on both intra-domain and inter-domain fashion clothing image retrieval showed that deep learning based feature representations far outperform traditional hand-crafted feature representations. Additionally, among all deep learning based methods, CNNs with explicit feature clustering performs best, which shows feature clustering is essential for discriminative fashion feature representation.

Keywords: convolutional neural network, feature representation, image processing, machine modelling

Procedia PDF Downloads 113
181 A Framework for Auditing Multilevel Models Using Explainability Methods

Authors: Debarati Bhaumik, Diptish Dey

Abstract:

Multilevel models, increasingly deployed in industries such as insurance, food production, and entertainment within functions such as marketing and supply chain management, need to be transparent and ethical. Applications usually result in binary classification within groups or hierarchies based on a set of input features. Using open-source datasets, we demonstrate that popular explainability methods, such as SHAP and LIME, consistently underperform in accuracy when interpreting these models. They fail to predict the order of feature importance, the magnitudes, and occasionally even the nature of the feature contribution (negative versus positive contribution to the outcome). Besides accuracy, the computational intractability of SHAP for binomial classification is a cause of concern. For transparent and ethical applications of these hierarchical statistical models, sound audit frameworks need to be developed. In this paper, we propose an audit framework for technical assessment of multilevel regression models focusing on three aspects: (i) model assumptions & statistical properties, (ii) model transparency using different explainability methods, and (iii) discrimination assessment. To this end, we undertake a quantitative approach and compare intrinsic model methods with SHAP and LIME. The framework comprises a shortlist of KPIs, such as PoCE (Percentage of Correct Explanations) and MDG (Mean Discriminatory Gap) per feature, for each of these three aspects. A traffic light risk assessment method is furthermore coupled to these KPIs. The audit framework will assist regulatory bodies in performing conformity assessments of AI systems using multilevel binomial classification models at businesses. It will also benefit businesses deploying multilevel models to be future-proof and aligned with the European Commission’s proposed Regulation on Artificial Intelligence.
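One simple way to operationalize this kind of explanation check (not the paper's PoCE or MDG definitions, which are specific to the proposed framework) is to compare a SHAP-derived feature ranking against the known coefficients of a simulated model:

```python
import numpy as np
import shap                                   # pip install shap
from sklearn.linear_model import LogisticRegression
from scipy.stats import spearmanr

rng = np.random.default_rng(3)

# Simulated data with known true coefficients, so the "correct" ranking is known
n, true_coefs = 2000, np.array([3.0, -2.0, 1.0, 0.5, 0.0])
X = rng.normal(size=(n, len(true_coefs)))
p = 1 / (1 + np.exp(-(X @ true_coefs)))
y = rng.random(n) < p

model = LogisticRegression(max_iter=1000).fit(X, y)

# SHAP values for the linear model, then a global importance per feature
explainer = shap.LinearExplainer(model, X)
shap_values = explainer.shap_values(X)
importance = np.abs(shap_values).mean(axis=0)

# Does the explanation recover the order implied by the true coefficients?
rho, _ = spearmanr(importance, np.abs(true_coefs))
print("Mean |SHAP| per feature:", importance.round(3))
print("Rank agreement with true |coefficients|:", round(rho, 3))
```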

Keywords: audit, multilevel model, model transparency, model explainability, discrimination, ethics

Procedia PDF Downloads 65
180 Urban Road Network Connectivity and Accessibility Analysis Using RS and GIS: A Case Study of Chandannagar City

Authors: Joy Ghosh, Debasmita Biswas

Abstract:

The road network of any area is the most important indicator of regional planning. For proper utilization of urban road networks, structural parameters such as connectivity and accessibility should be analyzed and evaluated. This paper aims to explain the application of GIS to urban road network connectivity and accessibility analysis with a case study of Chandannagar City. The road network connectivity is analyzed through various connectivity measures, such as the total number of nodes and links, the cyclomatic number, alpha index, beta index, gamma index, eta index, pi index, theta index, aggregated transport score, and road density, based on the existing road network in Chandannagar city in India. Accessibility is measured through the shortest path matrix, associated number, and Shimbel index. Various urban services, such as schools, banks, hospitals, petrol pumps, ATMs, police stations, theatres, parks, etc., are considered for the accessibility analysis for each ward. This paper also highlights the relationship between urban land use/land cover (LULC), the urban road network, and population density using various spatial and statistical measurements. The datasets were collected through a field survey of the 33 wards of the Chandannagar Municipal Corporation area, and the secondary data were collected through OpenStreetMap and a LANDSAT 8 OLI & TIRS satellite image from USGS. Chandannagar was once a French colony, and various kinds of planning were applied at that time, but the city now continues to grow haphazardly and faces several problems; the knowledge gained from this paper helps to create a more efficient and accessible road network. Therefore, it is suggested that some wards need to improve their connectivity and accessibility for the future growth and development of Chandannagar.
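The connectivity indices named above have standard graph-theoretic definitions; a small sketch computing several of them with networkx on a toy road graph (not the Chandannagar network) follows:

```python
import networkx as nx

# Toy road network: nodes are junctions, edges are road links with lengths (km)
G = nx.Graph()
G.add_weighted_edges_from([
    (1, 2, 1.2), (2, 3, 0.8), (3, 4, 1.5), (4, 1, 2.0),
    (2, 4, 1.1), (4, 5, 0.9), (5, 6, 1.3), (6, 3, 1.7),
], weight="length")

v = G.number_of_nodes()
e = G.number_of_edges()
p = nx.number_connected_components(G)
total_length = sum(d["length"] for _, _, d in G.edges(data=True))

cyclomatic = e - v + p                    # number of independent circuits
beta = e / v                              # links per node
gamma = e / (3 * (v - 2))                 # observed / maximum planar links
alpha = (e - v + p) / (2 * v - 5)         # observed / maximum circuits (planar)
eta = total_length / e                    # average link length

# Shimbel index (accessibility): sum of shortest-path distances from each node
shimbel = {n: sum(nx.shortest_path_length(G, n, weight="length").values())
           for n in G.nodes()}

print(f"v={v}, e={e}, cyclomatic={cyclomatic}, "
      f"alpha={alpha:.2f}, beta={beta:.2f}, gamma={gamma:.2f}, eta={eta:.2f}")
print("Most accessible node (lowest Shimbel index):", min(shimbel, key=shimbel.get))
```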

Keywords: accessibility, connectivity, transport, road network

Procedia PDF Downloads 34
179 Recurrent Neural Networks for Complex Survival Models

Authors: Pius Marthin, Nihal Ata Tutkun

Abstract:

Survival analysis has become one of the paramount procedures in the modeling of time-to-event data. When we encounter complex survival problems, the traditional approach remains limited in accounting for the complex correlational structure between the covariates and the outcome due to the strong assumptions that limit the inference and prediction ability of the resulting models. Several studies exist on the deep learning approach to survival modeling; however, its application to complex survival problems still needs improvement. In addition, the existing models do not fully address the complexity of the data structure and are subject to noise and redundant information. In this study, we design a deep learning technique (CmpXRnnSurv_AE) that overcomes the limitations imposed by traditional approaches and addresses the above issues to jointly predict the risk-specific probabilities and survival function for recurrent events with competing risks. We introduce the component termed Risks Information Weights (RIW) as an attention mechanism to compute the weighted cumulative incidence function (WCIF) and an external auto-encoder (ExternalAE) as a feature selector to extract complex characteristics among the set of covariates responsible for the cause-specific events. We train our model using synthetic and real data sets and employ the appropriate metrics for complex survival models for evaluation. As benchmarks, we selected both traditional and machine learning models, and our model demonstrates better performance across all datasets.

Keywords: cumulative incidence function (CIF), risk information weight (RIW), autoencoders (AE), survival analysis, recurrent events with competing risks, recurrent neural networks (RNN), long short-term memory (LSTM), self-attention, multilayers perceptrons (MLPs)

Procedia PDF Downloads 58
178 Geospatial Curve Fitting Methods for Disease Mapping of Tuberculosis in Eastern Cape Province, South Africa

Authors: Davies Obaromi, Qin Yongsong, James Ndege

Abstract:

To interpolate scattered or regularly distributed data, there are imprecise or exact methods. However, some of these methods can be used for interpolating data on a regular grid and others on an irregular grid. In spatial epidemiology, it is important to examine how disease prevalence rates are distributed in space and how they relate to each other within a defined distance and direction. In this study, for the geographic and graphic representation of disease prevalence, linear and biharmonic spline methods were implemented in MATLAB and used to identify, localize, and compare smoothing of the distribution patterns of tuberculosis (TB) in Eastern Cape Province. The aim of this study is to produce a smoother graphical disease map for TB prevalence patterns by 3-D curve fitting techniques, especially the biharmonic splines, which can suppress noise easily by seeking a least-squares fit rather than exact interpolation. The datasets are represented generally as 3D or XYZ triplets, where X and Y are the spatial coordinates and Z is the variable of interest, in this case TB counts in the province. This smoothing spline is a method of fitting a smooth curve to a set of noisy observations using a spline function, and it has become a conventional method for its high precision, simplicity, and flexibility. Surface and contour plots are produced for TB prevalence at the provincial level for 2012 - 2015. From the results, the general outlook of all the fittings showed a systematic pattern in the distribution of TB cases in the province, and this is consistent with some spatial statistical analyses carried out in the province. This method is rarely used in disease mapping applications, but it has the superior advantage of being assessable at subjective locations rather than only on a rectangular grid, as in most traditional GIS methods of geospatial analysis.
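The MATLAB biharmonic ('v4') interpolation used in studies like this has a close analogue in SciPy's radial basis function interpolator; a hedged sketch of fitting a smooth prevalence surface to scattered XYZ triplets (synthetic points, not the Eastern Cape data) is:

```python
import numpy as np
from scipy.interpolate import RBFInterpolator

rng = np.random.default_rng(5)

# Synthetic scattered observations: (x, y) site coordinates and TB counts z
xy = rng.uniform(0, 100, size=(60, 2))
z = 50 + 0.3 * xy[:, 0] - 0.2 * xy[:, 1] + rng.normal(0, 5, size=60)

# Thin-plate-spline RBF as an analogue of the biharmonic spline; the smoothing
# parameter trades exact interpolation against noise suppression
surface = RBFInterpolator(xy, z, kernel="thin_plate_spline", smoothing=10.0)

# Evaluate the fitted surface on a regular grid for contour/surface plotting
gx, gy = np.meshgrid(np.linspace(0, 100, 50), np.linspace(0, 100, 50))
grid_points = np.column_stack([gx.ravel(), gy.ravel()])
fitted = surface(grid_points).reshape(gx.shape)

print("Fitted surface range:", fitted.min().round(1), "to", fitted.max().round(1))
```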

Keywords: linear, biharmonic splines, tuberculosis, South Africa

Procedia PDF Downloads 217
177 Damage Identification in Reinforced Concrete Beams Using Modal Parameters and Their Formulation

Authors: Ali Al-Ghalib, Fouad Mohammad

Abstract:

The identification of damage in reinforced concrete structures subjected to incremental cracking using vibration data is recognized as a challenging topic in the published and heavily cited literature. Therefore, this paper attempts to shed light on the capability of dynamic methods when applied to reinforced concrete beams with various simulated defect scenarios. For this purpose, three different reinforced concrete beams were tested over the course of the study. The three beams were loaded statically to failure in incremental successive load cycles and later rehabilitated. After each static load stage, the beams were tested under free-free support conditions using experimental modal analysis. The beams were all of the same length and cross-section (2.0 x 0.14 x 0.09 m), but they differed in concrete compressive strength and in the type of damage presented. As damage identification parameters, the experimental modal parameters proved computationally expensive and time-consuming, and they require substantial inputs and considerable expertise. Nonetheless, they proved plausible for condition monitoring of the current case study as well as for tracking structural changes over the course of progressive loading. It was emphasized that satisfactory localization and quantification of structural changes (Level 2 and Level 3 of the damage identification problem) can only be achieved by considering the frequencies and mode shapes of a system in a proper analytical model. A convenient post-analysis process for the various datasets of vibration measurements for the three beams was conducted in order to extract, check, and correlate the basic modal parameters, namely natural frequency, modal damping, and mode shapes. The results of the extracted modal parameters and their combination are utilized and discussed in this research as quantification parameters.

Keywords: experimental modal analysis, damage identification, structural health monitoring, reinforced concrete beam

Procedia PDF Downloads 238