Search results for: datasets
241 INRAM-3DCNN: Multi-Scale Convolutional Neural Network Based on Residual and Attention Module Combined with Multilayer Perceptron for Hyperspectral Image Classification
Authors: Jianhong Xiang, Rui Sun, Linyu Wang
Abstract:
In recent years, owing to the continuous improvement of deep learning theory, Convolutional Neural Networks (CNNs) have demonstrated superior performance in research on Hyperspectral Image (HSI) classification. Since HSI contains rich spatial-spectral information, using only a single-dimensional or single-size convolutional kernel restricts the detailed feature information received by the CNN, which in turn limits HSI classification accuracy. In this paper, we design a multi-scale CNN with an MLP based on residual and attention modules (INRAM-3DCNN) for the HSI classification task. We propose to use multiple 3D convolutional kernels to extract the packet feature information and fully learn the spatial-spectral features of HSI, while designing residual 3D convolutional branches to avoid the decline in classification accuracy caused by network degradation. We also design a 2D Inception module with a joint channel attention mechanism to quickly extract key spatial feature information at different scales of the HSI and to reduce the complexity of the 3D model. Owing to the high parallel processing capability and nonlinear global action of the Multilayer Perceptron (MLP), we use it in combination with the preceding CNN structure for the final classification step. Experimental results on two HSI datasets show that the proposed INRAM-3DCNN method has superior classification performance and can perform the classification task excellently. Keywords: INRAM-3DCNN, residual, channel attention, hyperspectral image classification
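A minimal PyTorch sketch of the multi-scale 3D-convolution idea described in this abstract is given below; the kernel sizes, channel counts, and the 1x1 residual branch are illustrative assumptions rather than the exact INRAM-3DCNN configuration.

```python
import torch
import torch.nn as nn

class MultiScale3DBlock(nn.Module):
    """Parallel 3D convolutions at several kernel sizes plus a residual branch."""
    def __init__(self, in_ch=1, out_ch=8):
        super().__init__()
        self.branches = nn.ModuleList([
            nn.Conv3d(in_ch, out_ch, kernel_size=k, padding=k // 2)
            for k in (1, 3, 5)                       # multiple spectral-spatial scales
        ])
        self.residual = nn.Conv3d(in_ch, 3 * out_ch, kernel_size=1)  # residual 1x1 branch
        self.act = nn.ReLU()

    def forward(self, x):                            # x: (batch, 1, bands, H, W)
        multi = torch.cat([b(x) for b in self.branches], dim=1)
        return self.act(multi + self.residual(x))

# Toy HSI patch: 9x9 spatial window with 30 spectral bands
x = torch.randn(2, 1, 30, 9, 9)
print(MultiScale3DBlock()(x).shape)                  # torch.Size([2, 24, 30, 9, 9])
```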
Procedia PDF Downloads 80
240 Classifier for Liver Ultrasound Images
Authors: Soumya Sajjan
Abstract:
Liver cancer is one of the most common cancers worldwide in men and women and is one of the few cancers still on the rise. Liver disease is the 4th leading cause of death. According to new NHS (National Health Service) figures, deaths from liver diseases have reached record levels, rising by 25% in less than a decade; heavy drinking, obesity, and hepatitis are believed to be behind the rise. In this study, we focus on the development of a diagnostic classifier for ultrasound liver lesions. Ultrasound (US) sonography is an easy-to-use and widely popular imaging modality because of its ability to visualize many human soft tissues/organs without any harmful effect. This paper provides an overview of the underlying concepts, along with algorithms for processing liver ultrasound images. Naturally, ultrasound liver lesion images contain considerable speckle noise, so developing a classifier for them is a challenging task. We propose a fully automatic machine learning system for developing this classifier. First, we segment the liver image by calculating textural features from the co-occurrence matrix and the run-length method. For classification, a Support Vector Machine (SVM) is used, based on the risk bounds of statistical learning theory. The textural features from the different feature methods are given as input to the SVM individually. Performance analysis on training and test datasets is carried out separately using the SVM model. Whenever an ultrasonic liver lesion image is given to the SVM classifier system, the features are calculated and the image is classified as a normal or diseased liver lesion. We hope the result will help physicians identify liver cancer by a non-invasive method. Keywords: segmentation, Support Vector Machine, ultrasound liver lesion, co-occurrence matrix
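Below is a minimal sketch of the texture-feature-plus-SVM pipeline described above, using gray-level co-occurrence matrix (GLCM) properties from scikit-image and an SVM from scikit-learn. The synthetic patches, quantization level, and chosen GLCM properties are illustrative assumptions, and the run-length features mentioned in the abstract are omitted.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

def glcm_features(patch, levels=32):
    """Contrast/homogeneity/energy/correlation from a gray-level co-occurrence matrix."""
    q = (patch / patch.max() * (levels - 1)).astype(np.uint8)          # quantize gray levels
    glcm = graycomatrix(q, distances=[1], angles=[0, np.pi / 2],
                        levels=levels, symmetric=True, normed=True)
    return np.hstack([graycoprops(glcm, p).ravel()
                      for p in ("contrast", "homogeneity", "energy", "correlation")])

def synth_patch(diseased, rng):
    """Synthetic stand-in for a liver ROI; 'diseased' patches get a coarser texture."""
    if diseased:
        return np.kron(rng.integers(0, 256, (32, 32)), np.ones((2, 2))).astype(float)
    return rng.integers(0, 256, (64, 64)).astype(float)

rng = np.random.default_rng(0)
labels = rng.integers(0, 2, 200)                                       # 0 = normal, 1 = diseased
X = np.array([glcm_features(synth_patch(y, rng)) for y in labels])

X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```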
Procedia PDF Downloads 412
239 Image Ranking to Assist Object Labeling for Training Detection Models
Authors: Tonislav Ivanov, Oleksii Nedashkivskyi, Denis Babeshko, Vadim Pinskiy, Matthew Putman
Abstract:
Training a machine learning model for object detection that generalizes well is known to benefit from a training dataset with diverse examples. However, training datasets usually contain many repeats of common examples of a class and lack rarely seen examples. This is due to the process commonly used during human annotation where a person would proceed sequentially through a list of images labeling a sufficiently high total number of examples. Instead, the method presented involves an active process where, after the initial labeling of several images is completed, the next subset of images for labeling is selected by an algorithm. This process of algorithmic image selection and manual labeling continues in an iterative fashion. The algorithm used for the image selection is a deep learning algorithm, based on the U-shaped architecture, which quantifies the presence of unseen data in each image in order to find images that contain the most novel examples. Moreover, the location of the unseen data in each image is highlighted, aiding the labeler in spotting these examples. Experiments performed using semiconductor wafer data show that labeling a subset of the data, curated by this algorithm, resulted in a model with a better performance than a model produced from sequentially labeling the same amount of data. Also, similar performance is achieved compared to a model trained on exhaustive labeling of the whole dataset. Overall, the proposed approach results in a dataset that has a diverse set of examples per class as well as more balanced classes, which proves beneficial when training a deep learning model.Keywords: computer vision, deep learning, object detection, semiconductor
Procedia PDF Downloads 136
238 Simulation of Climatic Change Effects on the Potential Fishing Zones of Dorado Fish (Coryphaena hippurus L.) in the Colombian Pacific under Scenarios RCP Using CMIP5 Model
Authors: Adriana Martínez-Arias, John Josephraj Selvaraj, Luis Octavio González-Salcedo
Abstract:
In the Colombian Pacific, the Dorado fish (Coryphaena hippurus L.) fishery is of great commercial interest. However, its habitat and fisheries may be affected by climate change, especially by the current increase in sea surface temperature. Hence, it is of interest to study the dynamics of this species' fishing zones. In this study, we developed Artificial Neural Network (ANN) models to predict Catch per Unit Effort (CPUE) as an indicator of species abundance. The model was based on four oceanographic variables (chlorophyll a, sea surface temperature, sea level anomaly, and bathymetry) derived from satellite data. CPUE datasets for model training and cross-validation were obtained from logbooks of commercial fishing vessels. Sea surface temperature for the Colombian Pacific was projected under Representative Concentration Pathway (RCP) scenarios 4.5 and 8.5 using the Coupled Model Intercomparison Project Phase 5 (CMIP5), and CPUE maps were created. Our results indicated that an increase in sea surface temperature reduces the potential fishing zones of this species in the Colombian Pacific. We conclude that ANNs are a reliable tool for simulating the effects of climate change on potential fishing zones. This research opens a future agenda for other species that have been affected by climate change. Keywords: climatic change, artificial neural networks, dorado fish, CPUE
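A minimal scikit-learn sketch of the ANN regression step is shown below: a small multilayer perceptron maps the four satellite-derived predictors to CPUE. The synthetic data, network size, and scoring choice are illustrative assumptions, not the configuration used in the study.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

# Columns: chlorophyll-a, SST, sea level anomaly, bathymetry (synthetic stand-ins
# for the satellite-derived predictors matched to logbook records)
rng = np.random.default_rng(1)
X = rng.normal(size=(500, 4))
cpue = 2.0 + 0.8 * X[:, 1] - 0.5 * X[:, 3] + rng.normal(scale=0.3, size=500)   # toy CPUE

model = make_pipeline(StandardScaler(),
                      MLPRegressor(hidden_layer_sizes=(16, 8), max_iter=2000, random_state=0))
scores = cross_val_score(model, X, cpue, cv=5, scoring="r2")
print("cross-validated R^2:", scores.mean().round(3))
```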
Procedia PDF Downloads 244
237 A Fast Community Detection Algorithm
Authors: Chung-Yuan Huang, Yu-Hsiang Fu, Chuen-Tsai Sun
Abstract:
Community detection represents an important data-mining tool for analyzing and understanding real-world complex network structures and functions. We believe that at least four criteria determine the appropriateness of a community detection algorithm: (a) it produces useable normalized mutual information (NMI) and modularity results for social networks, (b) it overcomes resolution limitation problems associated with synthetic networks, (c) it produces good NMI results and performance efficiency for Lancichinetti-Fortunato-Radicchi (LFR) benchmark networks, and (d) it produces good modularity and performance efficiency for large-scale real-world complex networks. To our knowledge, no existing community detection algorithm meets all four criteria. In this paper, we describe a simple hierarchical arc-merging (HAM) algorithm that uses network topologies and rule-based arc-merging strategies to identify community structures that satisfy the criteria. We used five well-studied social network datasets and eight sets of LFR benchmark networks to validate the ground-truth community correctness of HAM, eight large-scale real-world complex networks to measure its performance efficiency, and two synthetic networks to determine its susceptibility to resolution limitation problems. Our results indicate that the proposed HAM algorithm is capable of providing satisfactory performance efficiency and that HAM-identified communities were close to ground-truth communities in social and LFR benchmark networks while overcoming resolution limitation problems.Keywords: complex network, social network, community detection, network hierarchy
Procedia PDF Downloads 228
236 Q-Map: Clinical Concept Mining from Clinical Documents
Authors: Sheikh Shams Azam, Manoj Raju, Venkatesh Pagidimarri, Vamsi Kasivajjala
Abstract:
Over the past decade, there has been a steep rise in data-driven analysis in major areas of medicine, such as clinical decision support systems, survival analysis, patient similarity analysis, and image analytics. Most of the data in the field are well-structured and available in numerical or categorical formats which can be used for experiments directly. At the opposite end of the spectrum, however, there exists a wide expanse of data that is intractable for direct analysis owing to its unstructured nature. It is found in the form of discharge summaries, clinical notes, and procedural notes, which are written in human narrative format and have neither a relational model nor any standard grammatical structure. An important step in utilizing these texts for such studies is to transform and process the data to retrieve structured information from the haystack of irrelevant data using information retrieval and data mining techniques. To address this problem, the authors present Q-Map in this paper, a simple yet robust system that can sift through massive datasets with unregulated formats to retrieve structured information aggressively and efficiently. It is backed by an effective mining technique based on a string matching algorithm indexed on curated knowledge sources, which is both fast and configurable. The authors also briefly examine its comparative performance with MetaMap, one of the most reputed tools for medical concept retrieval, and present the advantages the former displays over the latter. Keywords: information retrieval, unified medical language system, syntax based analysis, natural language processing, medical informatics
Procedia PDF Downloads 133
235 Infrastructure Change Monitoring Using Multitemporal Multispectral Satellite Images
Authors: U. Datta
Abstract:
The main objective of this study is to find a suitable approach to monitoring land infrastructure growth over a period of time using multispectral satellite images. Bi-temporal change detection methods are unable to indicate a continuous change occurring over a long period of time. To achieve this objective, the approach used here estimates a statistical model from a series of multispectral image data acquired over a long period of time, assuming there is no considerable change during that period, and then compares it with multispectral image data obtained at a later time. The change is estimated pixel-wise. A statistical composite hypothesis technique is used for pixel-based change detection in a defined region. The generalized likelihood ratio test (GLRT) is used to detect a changed pixel from the probabilistic model estimated for the corresponding pixel. The changed pixel is detected assuming that the images have been co-registered prior to estimation. To minimize error due to co-registration, the 8-neighborhood pixels around the pixel under test are also considered. Multispectral images from Sentinel-2 and Landsat-8 from 2015 to 2018 are used for this purpose. There are different challenges in this method. The first and foremost challenge is to obtain a sufficiently large number of datasets for multivariate distribution modelling, since a large number of images are always discarded due to cloud coverage. Due to imperfect modelling, there will also be a high probability of false alarms. The overall conclusion that can be drawn from this work is that the probabilistic method described in this paper has given some promising results, which need to be pursued further. Keywords: co-registration, GLRT, infrastructure growth, multispectral, multitemporal, pixel-based change detection
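The sketch below illustrates the per-pixel testing idea in a simplified Gaussian setting: a mean and covariance are fitted per pixel from the "no change" image series, and the new observation is scored with a Mahalanobis-type statistic (the GLRT for a known-covariance Gaussian mean shift reduces to this form). The band count, threshold, and synthetic data are illustrative assumptions, and the 8-neighborhood refinement is omitted.

```python
import numpy as np

def glrt_change_map(stack, new_img, eps=1e-6):
    """Per-pixel GLRT-style statistic: squared Mahalanobis distance of the new
    multispectral pixel vector to the Gaussian fitted on the historical series.
    stack: (T, H, W, B) co-registered images; new_img: (H, W, B)."""
    T, H, W, B = stack.shape
    mean = stack.mean(axis=0)                                   # (H, W, B)
    diff_hist = stack - mean
    cov = np.einsum('thwb,thwc->hwbc', diff_hist, diff_hist) / (T - 1)
    cov += eps * np.eye(B)                                      # regularize per-pixel covariance
    d = new_img - mean
    inv = np.linalg.inv(cov)                                    # batched inverse, (H, W, B, B)
    return np.einsum('hwb,hwbc,hwc->hw', d, inv, d)

rng = np.random.default_rng(0)
hist = rng.normal(size=(12, 64, 64, 4))        # 12 co-registered 4-band images, "no change"
new = rng.normal(size=(64, 64, 4))
new[20:30, 20:30] += 3.0                       # simulated infrastructure change
changed = glrt_change_map(hist, new) > 20.0    # threshold taken from a chi-square quantile in practice
print("changed pixels:", int(changed.sum()))
```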
Procedia PDF Downloads 135
234 Membrane-Localized Mutations as Predictors of Checkpoint Blockade Efficacy in Cancer
Authors: Zoe Goldberger, Priscilla S. Briquez, Jeffrey A. Hubbell
Abstract:
Tumor cells carry mutations resulting from genetic instability that the immune system can actively recognize. Immune checkpoint immunotherapy (ICI) is commonly used in the clinic to re-activate immune reactions against mutated proteins, called neoantigens, resulting in tumor remission in cancer patients. However, only around 20% of patients show a durable response to ICI. While tumor mutational burden (TMB) has been approved by the Food and Drug Administration (FDA) as a criterion for ICI therapy, the relevance of the subcellular localization of the mutated proteins within the tumor cell has not been investigated. Here, we hypothesized that the localization of mutations impacts immune responsiveness to ICI. We analyzed publicly available tumor mutation sequencing data of ICI-treated patients from 3 independent datasets. We extracted the subcellular localization from the UniProtKB/Swiss-Prot database and quantified the proportion of membrane, cytoplasmic, nuclear, or secreted mutations per patient. We analyzed this information in relation to response to ICI treatment and overall survival, showing with 1,722 ICI-treated patients that a high mutational burden localized at the membrane (mTMB) correlates with ICI responsiveness and improved overall survival in multiple cancer types. We anticipate that our results will improve the predictability of cancer patient response to ICI, with potential implications for clinical guidelines to tailor ICI treatment. This would not only increase survival for patients receiving ICI, but also improve patients' quality of life by reducing the number of patients enduring non-effective ICI treatments. Keywords: cancer, immunotherapy, membrane neoantigens, efficacy prediction, biomarkers
Procedia PDF Downloads 109
233 The Influence of Air Temperature Controls in Estimation of Air Temperature over Homogeneous Terrain
Authors: Fariza Yunus, Jasmee Jaafar, Zamalia Mahmud, Nurul Nisa’ Khairul Azmi, Nursalleh K. Chang
Abstract:
Variation of air temperature from one place to another is caused by air temperature controls. In general, the most important control of air temperature is elevation. Another significant independent variable in estimating air temperature is the location of meteorological stations. Distance to the coastline and land use type also contribute to significant variations in air temperature. On the other hand, over homogeneous terrain, direct interpolation of discrete points of air temperature works well for estimating air temperature values in un-sampled areas; in that process the estimation is based solely on discrete points of air temperature. However, this study shows that air temperature controls also play significant roles in estimating air temperature over the homogeneous terrain of Peninsular Malaysia. An Inverse Distance Weighting (IDW) interpolation technique was adopted to generate continuous air temperature data. This study compared two different datasets: observed mean monthly data of T, and the estimation error T–T', where T' is the value estimated from a multiple regression model. The multiple regression model considered eight independent variables (elevation, latitude, longitude, distance to coastline, and four land use types: water bodies, forest, agriculture, and built-up areas) to represent the role of air temperature controls. Cross-validation analysis was conducted to review the accuracy of the estimated values. Final results show that the estimated values of T–T' produced lower errors for mean monthly air temperature over homogeneous terrain in Peninsular Malaysia. Keywords: air temperature control, interpolation analysis, peninsular Malaysia, regression model, air temperature
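A minimal IDW sketch follows, interpolating station values of the regression residual T–T' onto grid points; the station coordinates, residual values, and power parameter are illustrative assumptions.

```python
import numpy as np

def idw(xy_obs, values, xy_grid, power=2.0, eps=1e-12):
    """Inverse Distance Weighting: estimate values at grid points from station observations."""
    d = np.linalg.norm(xy_grid[:, None, :] - xy_obs[None, :, :], axis=2)   # (grid, stations)
    w = 1.0 / np.maximum(d, eps) ** power
    return (w * values).sum(axis=1) / w.sum(axis=1)

# Stations: (longitude, latitude) with residuals T - T' from the regression model
stations = np.array([[101.7, 3.1], [100.3, 5.4], [103.8, 1.5], [102.2, 2.2]])
residuals = np.array([0.4, -0.2, 0.1, 0.3])          # illustrative values, degrees C
grid = np.array([[101.0, 3.0], [102.5, 4.0]])
print(idw(stations, residuals, grid))
```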
Procedia PDF Downloads 374
232 An Enhanced Approach in Validating Analytical Methods Using Tolerance-Based Design of Experiments (DoE)
Authors: Gule Teri
Abstract:
The effective validation of analytical methods forms a crucial component of pharmaceutical manufacturing. However, traditional validation techniques can occasionally fail to fully account for inherent variations within datasets, which may result in inconsistent outcomes. This deficiency in validation accuracy is particularly noticeable when quantifying low concentrations of active pharmaceutical ingredients (APIs), excipients, or impurities, introducing a risk to the reliability of the results and, subsequently, the safety and effectiveness of the pharmaceutical products. In response to this challenge, we introduce an enhanced, tolerance-based Design of Experiments (DoE) approach for the validation of analytical methods. This approach distinctly measures variability with reference to tolerance or design margins, enhancing the precision and trustworthiness of the results. This method provides a systematic, statistically grounded validation technique that improves the truthfulness of results. It offers an essential tool for industry professionals aiming to guarantee the accuracy of their measurements, particularly for low-concentration components. By incorporating this innovative method, pharmaceutical manufacturers can substantially advance their validation processes, subsequently improving the overall quality and safety of their products. This paper delves deeper into the development, application, and advantages of this tolerance-based DoE approach and demonstrates its effectiveness using High-Performance Liquid Chromatography (HPLC) data for verification. This paper also discusses the potential implications and future applications of this method in enhancing pharmaceutical manufacturing practices and outcomes.Keywords: tolerance-based design, design of experiments, analytical method validation, quality control, biopharmaceutical manufacturing
Procedia PDF Downloads 80
231 Learning Dynamic Representations of Nodes in Temporally Variant Graphs
Authors: Sandra Mitrovic, Gaurav Singh
Abstract:
In many industries, including telecommunications, churn prediction has been a topic of active research. A lot of attention has been devoted to devising the most informative features, and this area of research has gained even more focus with the spread of (social) network analytics. Call detail records (CDRs) have been used to construct customer networks and extract potentially useful features. However, to the best of our knowledge, no studies including network features have yet proposed a generic way of representing network information; instead, ad-hoc and dataset-dependent solutions have been suggested. In this work, we build upon a recently presented method (node2vec) to obtain representations for nodes in an observed network. The proposed approach is generic and applicable to any network and domain. Unlike node2vec, which assumes a static network, we consider a dynamic and time-evolving network. To account for this, we propose an approach that constructs the feature representation of each node by generating its node2vec representations at different timestamps, concatenating them, and finally compressing them using an auto-encoder-like method in order to retain reasonably long and informative feature vectors. We test the proposed method on a churn prediction task in the telco domain. To predict churners at timestamp ts+1, we construct training and testing datasets consisting of feature vectors from the time intervals [t1, ts-1] and [t2, ts], respectively, and use traditional supervised classification models such as SVM and Logistic Regression. The observed results show the effectiveness of the proposed approach as compared to ad-hoc feature-selection-based approaches and static node2vec. Keywords: churn prediction, dynamic networks, node2vec, auto-encoders
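A compact sketch of the per-timestamp embedding-and-concatenation idea is given below. It assumes the open-source node2vec package and substitutes PCA for the auto-encoder-like compression step purely for brevity; the graph sizes, dimensions, and random snapshots are illustrative.

```python
import numpy as np
import networkx as nx
from node2vec import Node2Vec            # pip install node2vec
from sklearn.decomposition import PCA

def snapshot_embeddings(graph, dims=32):
    """node2vec embedding of a single network snapshot (e.g. one period of the CDR graph)."""
    n2v = Node2Vec(graph, dimensions=dims, walk_length=20, num_walks=50, workers=1)
    model = n2v.fit(window=5, min_count=1)
    return {n: model.wv[str(n)] for n in graph.nodes()}

# Three illustrative CDR-like snapshots over the same customer set
snapshots = [nx.gnm_random_graph(100, 300, seed=s) for s in range(3)]
per_t = [snapshot_embeddings(g) for g in snapshots]

# Concatenate each node's embeddings across timestamps, then compress
nodes = sorted(snapshots[0].nodes())
concat = np.array([np.hstack([per_t[t][n] for t in range(3)]) for n in nodes])
compressed = PCA(n_components=16).fit_transform(concat)   # stand-in for the auto-encoder step
print(concat.shape, "->", compressed.shape)                # (100, 96) -> (100, 16)
```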
Procedia PDF Downloads 315
230 hsa-miR-1204 and hsa-miR-639 Prominent Role in Tamoxifen's Molecular Mechanisms on the EMT Phenomenon in Breast Cancer Patients
Authors: Mahsa Taghavi
Abstract:
In the treatment of breast cancer, tamoxifen is a regularly prescribed medication. The effect of tamoxifen on the EMT pathways of breast cancer patients was studied, to see whether it had any effect on the cancer cells' resistance to tamoxifen and to look for specific miRNAs associated with EMT. In this work, we used continuous and integrated bioinformatics analysis to choose the optimal GEO datasets. Once we had sorted the gene expression profiles, we looked at the signaling mechanisms, gene ontology, and protein interactions of each gene. In the end, we used the GEPIA database to confirm the candidate genes, and after that we investigated critical miRNAs related to the candidate genes. There were two gene expression profiles, which were categorized into two distinct groups. The first group was examined using the expression profile of genes that were downregulated in the EMT pathway; the second group represented the polar opposite of the first. A total of 253 genes from the first group and 302 genes from the second group were found to be common. Several genes in the first category were linked to cell death, focal adhesion, and cellular aging, while genes in the second group were linked to distinct cell cycle stages. Finally, proteins such as MYLK, SOCS3, and STAT5B from the first group and BIRC5, PLK1, and RAPGAP1 from the second group were selected as potential candidates linked to tamoxifen's influence on the EMT pathway. hsa-miR-1204 and hsa-miR-639 have a very close relationship with the candidate genes according to node degree and the betweenness index. With this, the action of tamoxifen on the EMT pathway was better understood; further study of how tamoxifen's target genes and proteins work is needed to better understand the drug. Keywords: tamoxifen, breast cancer, bioinformatics analysis, EMT, miRNAs
Procedia PDF Downloads 129
229 Impact of Social Transfers on Energy Poverty in Turkey
Authors: Julide Yildirim, Nadir Ocal
Abstract:
Even though there are many studies investigating the extent and determinants of poverty, there is a paucity of research investigating the issue of energy poverty in Turkey. The aim of this paper is threefold: first, to investigate the extent of energy poverty in Turkey by using Household Budget Survey datasets belonging to the 2005-2016 period; second, to examine the risk factors for energy poverty; and finally, to assess the impact of social assistance program participation on energy poverty. The existing literature employs alternative methods to measure energy poverty. In this study, energy poverty is measured by employing an expenditure approach, where people are considered energy poor if they spend more than 10 per cent of their income to meet their energy requirements. Empirical results indicate that the energy poverty rate is around 20 per cent during the time period under consideration. Since Household Budget Survey panel data are not available for the 2005-2016 period, a pseudo panel has been constructed. A panel logistic regression method is utilized to determine the risk factors for energy poverty. The empirical results demonstrate that work status and education level have a statistically significant impact on the likelihood of energy poverty. In the final part of the paper, the impact of social transfers on energy poverty is examined by utilizing a panel biprobit model, where social transfer participation and energy poverty incidence are jointly modeled. The empirical findings indicate that social transfer program participation reduces energy poverty. The negative association between energy poverty and social transfer program participation is more pronounced in urban areas compared with rural areas. Keywords: energy poverty, social transfers, panel data models, Turkey
Procedia PDF Downloads 143
228 Modeling of Sediment Yield and Streamflow of Watershed Basin in the Philippines Using the Soil Water Assessment Tool Model for Watershed Sustainability
Authors: Warda L. Panondi, Norihiro Izumi
Abstract:
Sedimentation is a significant threat to the sustainability of reservoirs and their watersheds. In the Philippines, the Pulangi watershed has experienced high sediment loss, mainly due to land conversions and plantations, with critical erosion rates beyond the tolerable limit of 10 ton/ha/yr in all of its sub-basins. Given this situation, prediction of runoff volume and sediment yield is essential for realistically examining the country's soil conservation techniques. In this research, the Pulangi watershed was modeled using the soil and water assessment tool (SWAT) to predict the watershed basin's annual runoff and sediment yield. For the calibration and validation of the model, SWAT-CUP was utilized. The model was calibrated with monthly discharge data for 1990-1993 and validated for 1994-1997. The sediment yield was calibrated in 2014 and validated in 2015 because of limited observed datasets. Uncertainty analysis and calculation of efficiency indexes were accomplished through the SUFI-2 algorithm. According to the coefficient of determination (R2), Nash-Sutcliffe efficiency (NSE), Kling-Gupta efficiency (KGE), and PBIAS, the streamflow simulation indicates good performance for both the calibration and validation periods, while the sediment yield shows satisfactory performance for both calibration and validation. Therefore, this study was able to identify the most critical sub-basin and severe needs for soil conservation. Furthermore, this study will provide baseline information to prevent floods and landslides and serve as a useful reference for land-use policies and watershed management and sustainability in the Pulangi watershed. Keywords: Pulangi watershed, sediment yield, streamflow, SWAT model
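The efficiency indexes named above can be computed directly; below is a small sketch with common formulations of NSE, KGE (2009), and PBIAS, applied to illustrative monthly discharge values (the sign convention for PBIAS follows the usual SWAT literature, where positive values indicate underestimation).

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe efficiency."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(obs, sim):
    """Kling-Gupta efficiency (2009 formulation)."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def pbias(obs, sim):
    """Percent bias."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 100 * np.sum(obs - sim) / np.sum(obs)

obs = np.array([12.1, 8.4, 15.0, 30.2, 22.5, 9.8])   # illustrative monthly discharge (m3/s)
sim = np.array([11.5, 9.0, 14.2, 27.9, 24.1, 10.4])
print(nse(obs, sim), kge(obs, sim), pbias(obs, sim))
```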
Procedia PDF Downloads 210
227 FDX1, a Cuproptosis-Related Gene, Identified as a Potential Target for Human Ovarian Aging
Authors: Li-Te Lin, Chia-Jung Li, Kuan-Hao Tsui
Abstract:
Cuproptosis, a newly identified cell death mechanism, has attracted attention for its association with various diseases. However, the genetic interplay between cuproptosis and ovarian aging remains largely unexplored. This study aims to address this gap by analyzing datasets related to ovarian aging and cuproptosis. Spatial transcriptome analyses were conducted in the ovaries of both young and aged female mice to elucidate the role of FDX1. Comprehensive bioinformatics analyses, facilitated by R software, identified FDX1 as a potential cuproptosis-related gene with implications for ovarian aging. Clinical infertility biopsies were examined to validate these findings, showing consistent results in elderly infertile patients. Furthermore, pharmacogenomic analyses of ovarian cell lines explored the intricate association between FDX1 expression levels and sensitivity to specific small molecule drugs. Spatial transcriptome analyses revealed a significant reduction in FDX1 expression in aging ovaries, supported by consistent findings in biopsies from elderly infertile patients. Pharmacogenomic investigations indicated that modulating FDX1 could influence drug responses in ovarian-related therapies. This study pioneers the identification of FDX1 as a cuproptosis-related gene linked to ovarian aging. These findings not only contribute to understanding the mechanisms of ovarian aging but also position FDX1 as a potential diagnostic biomarker and therapeutic target. Further research may establish FDX1's pivotal role in advancing precision medicine and therapies for ovarian-related conditions.Keywords: cuproptosis, FDX1, ovarian aging, biomarker
Procedia PDF Downloads 41
226 Trading off Accuracy for Speed in Powerdrill
Authors: Filip Buruiana, Alexander Hall, Reimar Hofmann, Thomas Hofmann, Silviu Ganceanu, Alexandru Tudorica
Abstract:
In-memory column-stores make interactive analysis feasible for many big data scenarios. PowerDrill is a system used internally at Google for exploration in logs data. Even though it is a highly parallelized column-store and uses in memory caching, interactive response times cannot be achieved for all datasets (note that it is common to analyze data with 50 billion records in PowerDrill). In this paper, we investigate two orthogonal approaches to optimize performance at the expense of an acceptable loss of accuracy. Both approaches can be implemented as outer wrappers around existing database engines and so they should be easily applicable to other systems. For the first optimization we show that memory is the limiting factor in executing queries at speed and therefore explore possibilities to improve memory efficiency. We adapt some of the theory behind data sketches to reduce the size of particularly expensive fields in our largest tables by a factor of 4.5 when compared to a standard compression algorithm. This saves 37% of the overall memory in PowerDrill and introduces a 0.4% relative error in the 90th percentile for results of queries with the expensive fields. We additionally evaluate the effects of using sampling on accuracy and propose a simple heuristic for annotating individual result-values as accurate (or not). Based on measurements of user behavior in our real production system, we show that these estimates are essential for interpreting intermediate results before final results are available. For a large set of queries this effectively brings down the 95th latency percentile from 30 to 4 seconds.Keywords: big data, in-memory column-store, high-performance SQL queries, approximate SQL queries
Procedia PDF Downloads 259
225 Spatial and Geostatistical Analysis of Surficial Soils of the Contiguous United States
Authors: Rachel Hetherington, Chad Deering, Ann Maclean, Snehamoy Chatterjee
Abstract:
The U.S. Geological Survey conducted a soil survey and subsequent mineralogical and geochemical analyses of over 4800 samples taken across the contiguous United States between the years 2007 and 2013. At each location, samples were taken from the top 5 cm, the A-horizon, and the C-horizon. Many studies have looked at the correlation between the mineralogical and geochemical content of soils and influencing factors such as parent lithology, climate, soil type, and age, but it seems little has been done in relation to quantifying and assessing the correlation between elements in the soil on a national scale. GIS was used for the mapping and multivariate interpolation of over 40 major and trace elements for surficial soils (0-5 cm depth). Qualitative analysis of the spatial distribution across the U.S. shows distinct patterns amongst elements both within the same periodic groups and within different periodic groups, and therefore with different behavioural characteristics. Results show the emergence of 4 main patterns of high concentration areas: vertically along the west coast, a C-shape formed through the states around Utah and northern Arizona, a V-shape through the Midwest and connecting to the Appalachians, and along the Appalachians. The Band Collection Statistics tool in GIS was used to quantitatively analyse the geochemical raster datasets and calculate a correlation matrix. Patterns emerged, which were not identified in qualitative analysis, many of which are also amongst elements with very different characteristics. Preliminary results show 41 element pairings with a strong positive correlation ( ≥ 0.75). Both qualitative and quantitative analyses on this scale could increase knowledge on the relationships between element distribution and behaviour in surficial soils of the U.S.Keywords: correlation matrix, geochemical analyses, spatial distribution of elements, surficial soils
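A small sketch of the band-statistics step is shown below: stacking one raster per element, computing the element-by-element correlation matrix, and listing pairs at or above the 0.75 threshold mentioned above. The element names and synthetic rasters are illustrative assumptions.

```python
import numpy as np

# Stack of co-registered geochemical rasters, one band per element (synthetic stand-in
# for the GIS Band Collection Statistics input); shape: (elements, rows, cols)
rng = np.random.default_rng(0)
elements = ["Fe", "Al", "Ca", "K", "As"]
stack = rng.normal(size=(len(elements), 200, 300))
stack[1] = 0.8 * stack[0] + 0.2 * rng.normal(size=(200, 300))   # inject one correlated pair

flat = stack.reshape(len(elements), -1)            # one row of pixel values per element
corr = np.corrcoef(flat)                           # element-by-element correlation matrix
strong = [(elements[i], elements[j], round(corr[i, j], 2))
          for i in range(len(elements)) for j in range(i + 1, len(elements))
          if corr[i, j] >= 0.75]                   # the "strong positive" threshold quoted above
print(strong)
```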
Procedia PDF Downloads 126
224 A Survey of Skin Cancer Detection and Classification from Skin Lesion Images Using Deep Learning
Authors: Joseph George, Anne Kotteswara Roa
Abstract:
Skin disease is one of the most common health issues faced by people nowadays. Skin cancer (SC) is one among them, and its detection relies on skin biopsy outputs and the expertise of doctors, but this consumes more time and can give inaccurate results. Detecting skin cancer at an early stage is a challenging task, yet if it goes undetected it easily spreads to the whole body and leads to an increase in the mortality rate. Skin cancer is curable when it is detected at an early stage. In order to classify skin cancer correctly and accurately, the critical task is skin cancer identification and classification, which is largely based on disease features such as shape, size, color, and symmetry. Many skin diseases share similar characteristics; hence it is challenging to select important features from skin cancer dataset images. An automated skin cancer detection and classification framework is therefore required to improve diagnostic accuracy and to cope with the scarcity of human experts. Recently, deep learning techniques like the Convolutional Neural Network (CNN), Deep Belief Neural Network (DBN), Artificial Neural Network (ANN), Recurrent Neural Network (RNN), and Long Short-Term Memory (LSTM) have been widely used for the identification and classification of skin cancers. This survey reviews different DL techniques for skin cancer identification and classification. Performance metrics such as precision, recall, accuracy, sensitivity, specificity, and F-measure are used to evaluate the effectiveness of SC identification using DL techniques. By using these DL techniques, the classification accuracy increases along with the mitigation of computational complexity and time consumption. Keywords: skin cancer, deep learning, performance measures, accuracy, datasets
Procedia PDF Downloads 129
223 Unsupervised Echocardiogram View Detection via Autoencoder-Based Representation Learning
Authors: Andrea Treviño Gavito, Diego Klabjan, Sanjiv J. Shah
Abstract:
Echocardiograms serve as pivotal resources for clinicians in diagnosing cardiac conditions, offering non-invasive insights into a heart’s structure and function. When echocardiographic studies are conducted, no standardized labeling of the acquired views is performed. Employing machine learning algorithms for automated echocardiogram view detection has emerged as a promising solution to enhance efficiency in echocardiogram use for diagnosis. However, existing approaches predominantly rely on supervised learning, necessitating labor-intensive expert labeling. In this paper, we introduce a fully unsupervised echocardiographic view detection framework that leverages convolutional autoencoders to obtain lower dimensional representations and the K-means algorithm for clustering them into view-related groups. Our approach focuses on discriminative patches from echocardiographic frames. Additionally, we propose a trainable inverse average layer to optimize decoding of average operations. By integrating both public and proprietary datasets, we obtain a marked improvement in model performance when compared to utilizing a proprietary dataset alone. Our experiments show boosts of 15.5% in accuracy and 9.0% in the F-1 score for frame-based clustering, and 25.9% in accuracy and 19.8% in the F-1 score for view-based clustering. Our research highlights the potential of unsupervised learning methodologies and the utilization of open-sourced data in addressing the complexities of echocardiogram interpretation, paving the way for more accurate and efficient cardiac diagnoses.Keywords: artificial intelligence, echocardiographic view detection, echocardiography, machine learning, self-supervised representation learning, unsupervised learning
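A compact sketch of the unsupervised pipeline described above (a convolutional autoencoder for low-dimensional codes, then K-means on the codes) is given below; the frame size, architecture, number of clusters, and random data are illustrative assumptions, and the trainable inverse average layer is not reproduced.

```python
import torch
import torch.nn as nn
from sklearn.cluster import KMeans

class ConvAE(nn.Module):
    """Small convolutional autoencoder; K-means on the bottleneck codes groups frames into views."""
    def __init__(self, latent=32):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Flatten(), nn.Linear(32 * 16 * 16, latent))
        self.dec = nn.Sequential(
            nn.Linear(latent, 32 * 16 * 16), nn.Unflatten(1, (32, 16, 16)),
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 1, 4, stride=2, padding=1))

    def forward(self, x):
        z = self.enc(x)
        return self.dec(z), z

frames = torch.rand(128, 1, 64, 64)     # stand-in for echo frames or discriminative patches
model = ConvAE()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(5):                      # a few reconstruction steps, for illustration only
    recon, _ = model(frames)
    loss = nn.functional.mse_loss(recon, frames)
    opt.zero_grad(); loss.backward(); opt.step()

with torch.no_grad():
    _, codes = model(frames)
views = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(codes.numpy())
print(views[:10])                       # cluster index = putative view group per frame
```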
Procedia PDF Downloads 36
222 Integrating Knowledge Distillation of Multiple Strategies
Authors: Min Jindong, Wang Mingxia
Abstract:
With the widespread use of artificial intelligence in everyday life, computer vision, and especially deep convolutional neural network models, has developed rapidly. With the increasing complexity of real visual target detection tasks and the improvement of recognition accuracy, target detection network models have also become very large. Huge deep neural network models are not conducive to deployment on edge devices with limited resources, and the timeliness of network model inference is poor. In this paper, knowledge distillation is used to compress the huge and complex deep neural network model, and the knowledge contained in the complex network model is comprehensively transferred to another, lightweight network model. Different from traditional knowledge distillation methods, we propose a novel knowledge distillation that incorporates multi-faceted features, called M-KD. When training and optimizing the deep neural network model for target detection, the knowledge of the soft target output of the teacher network, the relationships between the layers of the teacher network, and the feature attention maps of the hidden layers of the teacher network are all transferred to the student network. At the same time, we also introduce an intermediate transition layer, that is, an intermediate guidance layer, between the teacher network and the student network to make up for the huge difference between the two. Finally, this paper adds an exploration module to the traditional knowledge distillation teacher-student network model, so that the student network model not only inherits the knowledge of the teacher network but also explores some new knowledge and characteristics. Comprehensive experiments using different distillation parameter configurations across multiple datasets and convolutional neural network models demonstrate that our proposed network model achieves substantial improvements in speed and accuracy performance. Keywords: object detection, knowledge distillation, convolutional network, model compression
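For reference, a minimal PyTorch sketch of the classic soft-target distillation term (one of the knowledge sources listed above) is shown below; the temperature, weighting, and toy logits are illustrative assumptions, and the layer-relationship, attention-map, guidance-layer, and exploration components of M-KD are not reproduced.

```python
import torch
import torch.nn.functional as F

def soft_target_kd_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Soft-target distillation term plus the usual hard-label cross-entropy."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)     # T^2 keeps gradient scale comparable
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

teacher_logits = torch.randn(8, 20)                      # e.g. per-class scores from the large detector head
student_logits = torch.randn(8, 20, requires_grad=True)
labels = torch.randint(0, 20, (8,))
loss = soft_target_kd_loss(student_logits, teacher_logits, labels)
loss.backward()
print(float(loss))
```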
Procedia PDF Downloads 278
221 Estimation of Atmospheric Parameters for Weather Study and Forecast over Equatorial Regions Using Ground-Based Global Positioning System
Authors: Asmamaw Yehun, Tsegaye Kassa, Addisu Hunegnaw, Martin Vermeer
Abstract:
There are various models for estimating neutral atmospheric parameter values, such as in-situ measurements and reanalysis datasets from numerical models. Accurately estimated values of the atmospheric parameters are useful for weather forecasting, climate modeling, and monitoring of climate change. Recently, Global Navigation Satellite System (GNSS) measurements have been applied to atmospheric sounding due to their robust data quality and wide horizontal and vertical coverage. Global Positioning System (GPS) solutions that include tropospheric parameters constitute a reliable set of data to be assimilated into climate models. The objective of this paper is to estimate neutral atmospheric parameters such as the Wet Zenith Delay (WZD), Precipitable Water Vapour (PWV) and Total Zenith Delay (TZD) using six selected GPS stations in the equatorial regions, more precisely the Ethiopian GPS stations, with observational data from 2012 to 2015. Based on the historical GPS-derived estimates of PWV, we forecasted the PWV from 2015 to 2030. During data processing and analysis, we applied the GAMIT-GLOBK software packages to estimate the atmospheric parameters. As a result, we found that the minimum annual average PWV is 9.72 mm at IISC and the maximum is 50.37 mm at BJCO, while the minimum annual average WZD is 6 cm at IISC and the maximum is 31 cm at BDMT. In the long series of observations (from 2012 to 2015), we also found trends and cyclic patterns in WZD, PWV and TZD for all stations. Keywords: atmosphere, GNSS, neutral atmosphere, precipitable water vapour
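The WZD-to-PWV step can be sketched with the usual Bevis-style conversion factor, as below. The refractivity constants and the surface-temperature approximation of the weighted mean temperature are textbook values assumed here rather than the paper's exact choices, but they reproduce PWV values of roughly 9.7 mm and 50 mm for wet delays of 6 cm and 31 cm, consistent with the station values quoted above.

```python
import numpy as np

def pwv_from_zwd(zwd_mm, tm_kelvin):
    """Convert zenith wet delay (mm) to precipitable water vapour (mm) via the
    dimensionless conversion factor Pi (Bevis-style formulation)."""
    rho_w = 1000.0          # density of liquid water, kg m^-3
    r_v = 461.5             # specific gas constant of water vapour, J kg^-1 K^-1
    k2_prime = 22.1         # K hPa^-1
    k3 = 3.739e5            # K^2 hPa^-1
    pi = 1e8 / (rho_w * r_v * (k2_prime + k3 / tm_kelvin))
    return pi * zwd_mm

# Weighted mean temperature from the common surface-temperature approximation (Bevis et al.)
ts = 298.0                               # surface temperature, K (illustrative tropical value)
tm = 70.2 + 0.72 * ts
print(pwv_from_zwd(np.array([60.0, 310.0]), tm))   # ~6 cm and ~31 cm wet delays from the abstract
```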
Procedia PDF Downloads 61
220 The Targeting Logic of Terrorist Groups in the Sahel
Authors: Mathieu Bere
Abstract:
Al-Qaeda and Islamic State-affiliated groups such as Ja’amat Nusra al Islam Wal Muslimim (JNIM) and the Islamic State-Greater Sahara Faction, which is now part of the Boko Haram splinter group Islamic State in West Africa, were responsible, between 2018 and 2020, for at least 1,333 violent incidents against both military and civilian targets, including the assassination and kidnapping for ransom of Western citizens in Mali, Burkina Faso and Niger, the Central Sahel. Protecting civilians from the terrorist violence that is now spreading from the Sahel to the coastal countries of West Africa has been very challenging, mainly because of the many unknowns that surround the perpetrators. To contribute to better protection of civilians in the region, this paper aims to shed light on the motivations and targeting logic of jihadist perpetrators of terrorist violence against civilians in the central Sahel region. To that end, it draws on relevant secondary data retrieved from datasets, the media, and the existing literature, but also on primary data collected through interviews and surveys in Burkina Faso. An analysis of the data with the support of qualitative and statistical analysis software shows that military and rational strategic motives, more than purely ideological or religious motives, have been the main drivers of terrorist violence, which strategically targeted government symbols and representatives as well as local leaders in the central Sahel. Behind this targeting logic, the jihadist grand strategy emerges: wiping out the Western-inspired legal, education and governance system in order to replace it with an Islamic, sharia-based political, legal, and educational system. Keywords: terrorism, jihadism, Sahel, targeting logic
Procedia PDF Downloads 87
219 Bias-Corrected Estimation Methods for Receiver Operating Characteristic Surface
Authors: Khanh To Duc, Monica Chiogna, Gianfranco Adimari
Abstract:
With three diagnostic categories, assessment of the performance of diagnostic tests is achieved by the analysis of the receiver operating characteristic (ROC) surface, which generalizes the ROC curve for binary diagnostic outcomes. The volume under the ROC surface (VUS) is a summary index usually employed for measuring the overall diagnostic accuracy. When the true disease status can be exactly assessed by means of a gold standard (GS) test, unbiased nonparametric estimators of the ROC surface and VUS are easily obtained. In practice, unfortunately, disease status verification via the GS test could be unavailable for all study subjects, due to the expensiveness or invasiveness of the GS test. Thus, often only a subset of patients undergoes disease verification. Statistical evaluations of diagnostic accuracy based only on data from subjects with verified disease status are typically biased. This bias is known as verification bias. Here, we consider the problem of correcting for verification bias when continuous diagnostic tests for three-class disease status are considered. We assume that selection for disease verification does not depend on disease status, given test results and other observed covariates, i.e., we assume that the true disease status, when missing, is missing at random. Under this assumption, we discuss several solutions for ROC surface analysis based on imputation and re-weighting methods. In particular, verification bias-corrected estimators of the ROC surface and of VUS are proposed, namely, full imputation, mean score imputation, inverse probability weighting and semiparametric efficient estimators. Consistency and asymptotic normality of the proposed estimators are established, and their finite sample behavior is investigated by means of Monte Carlo simulation studies. Two illustrations using real datasets are also given.Keywords: imputation, missing at random, inverse probability weighting, ROC surface analysis
Procedia PDF Downloads 416
218 Self-Supervised Attributed Graph Clustering with Dual Contrastive Loss Constraints
Authors: Lijuan Zhou, Mengqi Wu, Changyong Niu
Abstract:
Attributed graph clustering can utilize the graph topology and node attributes to uncover hidden community structures and patterns in complex networks, aiding in the understanding and analysis of complex systems. Utilizing contrastive learning for attributed graph clustering can effectively exploit meaningful implicit relationships between data. However, existing attributed graph clustering methods based on contrastive learning suffer from the following drawbacks: 1) Complex data augmentation increases computational cost, and inappropriate data augmentation may lead to semantic drift. 2) The selection of positive and negative samples neglects the intrinsic cluster structure learned from graph topology and node attributes. Therefore, this paper proposes a method called self-supervised Attributed Graph Clustering with Dual Contrastive Loss constraints (AGC-DCL). Firstly, Siamese Multilayer Perceptron (MLP) encoders are employed to generate two views separately to avoid complex data augmentation. Secondly, the neighborhood contrastive loss is introduced to constrain node representation using local topological structure while effectively embedding attribute information through attribute reconstruction. Additionally, clustering-oriented contrastive loss is applied to fully utilize clustering information in global semantics for discriminative node representations, regarding the cluster centers from two views as negative samples to fully leverage effective clustering information from different views. Comparative clustering results with existing attributed graph clustering algorithms on six datasets demonstrate the superiority of the proposed method.Keywords: attributed graph clustering, contrastive learning, clustering-oriented, self-supervised learning
Procedia PDF Downloads 54
217 Reducing the Imbalance Penalty Through Artificial Intelligence Methods in Geothermal Production Forecasting: A Case Study for Turkey
Authors: Hayriye Anıl, Görkem Kar
Abstract:
In addition to being rich in renewable energy resources, Turkey is one of the countries with promising potential in geothermal energy production thanks to its high installed capacity, low cost, and sustainability. Increasing imbalance penalties become an economic burden for organizations, since geothermal generation plants cannot maintain the balance of supply and demand due to the inadequacy of the production forecasts submitted in the day-ahead market. A better production forecast reduces the imbalance penalties of market participants and provides a better balance in the day-ahead market. In this study, using machine learning, deep learning, and time series methods, the total generation of the power plants belonging to Zorlu Natural Electricity Generation, which has a high installed geothermal capacity, was estimated for the first one and two weeks of March; the imbalance penalties were then calculated from these estimates and compared with the real values. These modeling operations were carried out on two datasets: the basic dataset and the dataset created by extracting new features from it with the feature engineering method. According to the results, Support Vector Regression, among the traditional machine learning models, outperformed the other models and exhibited the best performance. In addition, the estimation results on the feature engineering dataset showed lower error rates than those on the basic dataset. It was concluded that the imbalance penalty estimated for the selected organization is lower than the actual imbalance penalty, yielding an optimal and profitable account. Keywords: machine learning, deep learning, time series models, feature engineering, geothermal energy production forecasting
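A minimal sketch of the forecasting-plus-penalty idea is shown below: an SVR model with simple lag features predicts hourly generation, and a simplified imbalance cost proportional to the absolute forecast error is computed for a hold-out week. The synthetic series, lag count, hyperparameters, and penalty price are illustrative assumptions rather than the study's actual market rules.

```python
import numpy as np
from sklearn.svm import SVR
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline

rng = np.random.default_rng(0)
hours = np.arange(24 * 60)                                   # two months of hourly history
gen = 100 + 5 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 1.5, hours.size)  # MWh, toy series

def make_xy(series, lags=24):
    """Use the previous 24 hourly values as features for the next hour."""
    X = np.column_stack([series[i:len(series) - lags + i] for i in range(lags)])
    return X, series[lags:]

X, y = make_xy(gen)
split = -24 * 7                                              # hold out the final week
model = make_pipeline(StandardScaler(), SVR(C=10.0, epsilon=0.1))
model.fit(X[:split], y[:split])
pred = model.predict(X[split:])

penalty_price = 3.0                                          # currency units per MWh, illustrative
penalty = penalty_price * np.abs(pred - y[split:]).sum()     # simplified imbalance penalty
print("weekly imbalance penalty estimate:", round(penalty, 1))
```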
Procedia PDF Downloads 110
216 Layer-Level Feature Aggregation Network for Effective Semantic Segmentation of Fine-Resolution Remote Sensing Images
Authors: Wambugu Naftaly, Ruisheng Wang, Zhijun Wang
Abstract:
Models based on convolutional neural networks (CNNs), in conjunction with Transformer, have excelled in semantic segmentation, a fundamental task for intelligent Earth observation using remote sensing (RS) imagery. Nonetheless, tokenization in the Transformer model undermines object structures and neglects inner-patch local information, whereas CNNs are unable to simulate global semantics due to limitations inherent in their convolutional local properties. The integration of the two methodologies facilitates effective global-local feature aggregation and interactions, potentially enhancing segmentation results. Inspired by the merits of CNNs and Transformers, we introduce a layer-level feature aggregation network (LLFA-Net) to address semantic segmentation of fine-resolution remote sensing (FRRS) images for land cover classification. The simple yet efficient system employs a transposed unit that hierarchically utilizes dense high-level semantics and sufficient spatial information from various encoder layers through a layer-level feature aggregation module (LLFAM) and models global contexts using structured Transformer blocks. Furthermore, the decoder aggregates resultant features to generate rich semantic representation. Extensive experiments on two public land cover datasets demonstrate that our proposed framework exhibits competitive performance relative to the most recent frameworks in semantic segmentation.Keywords: land cover mapping, semantic segmentation, remote sensing, vision transformer networks, deep learning
Procedia PDF Downloads 7
215 C-eXpress: A Web-Based Analysis Platform for Comparative Functional Genomics and Proteomics in Human Cancer Cell Line, NCI-60 as an Example
Authors: Chi-Ching Lee, Po-Jung Huang, Kuo-Yang Huang, Petrus Tang
Abstract:
Background: Recent advances in high-throughput research technologies such as next-generation sequencing and multi-dimensional liquid chromatography make it possible to dissect the complete transcriptome and proteome in a single run for the first time. However, it is almost impossible for many laboratories to handle and analyze these "big" data without the support of a bioinformatics team. We aimed to provide a web-based analysis platform for users with only limited knowledge of bio-computing to study functional genomics and proteomics. Method: We use NCI-60 as an example dataset to demonstrate the power of the web-based analysis platform and data delivery system: C-eXpress takes a simple text file that contains the standard NCBI gene or protein ID and expression levels (rpkm or fold) as an input file to generate a distribution map of gene/protein expression levels in a heatmap diagram organized by color gradients. The diagram is hyper-linked to a dynamic html table that allows the users to filter the datasets based on various gene features. A dynamic summary chart is generated automatically after each filtering process. Results: We implemented an integrated database that contains pre-defined annotations such as gene/protein properties (ID, name, length, MW, pI); pathways based on KEGG and GO biological process; subcellular localization based on GO cellular component; and functional classification based on GO molecular function, kinase, peptidase, and transporter. Multiple ways of sorting columns and rows are also provided for comparative analysis and visualization of multiple samples. Keywords: cancer, visualization, database, functional annotation
Procedia PDF Downloads 619
214 Verification of Satellite and Observation Measurements to Build Solar Energy Projects in North Africa
Authors: Samy A. Khalil, U. Ali Rahoma
Abstract:
In the absence of ground measurements of solar radiation, satellite data have been routinely utilized to estimate solar energy. However, the temporal coverage of satellite data has some limits. Reanalysis, also known as "retrospective analysis" of the atmosphere's parameters, is produced by fusing the output of NWP (Numerical Weather Prediction) models with observation data from a variety of sources, including ground, satellite, ship, and aircraft observations. The result is a comprehensive record of the parameters affecting weather and climate. The effectiveness of the reanalysis dataset (ERA-5) for North Africa was evaluated against high-quality surface measurements using statistical analysis. The distribution of global solar radiation (GSR) was estimated over five chosen areas in North Africa over the ten-year period from 2011 to 2020. To investigate seasonal changes in dataset performance, a seasonal statistical analysis was conducted, which showed a considerable difference in errors throughout the year. Altering the temporal resolution of the data used for comparison alters the performance of the dataset: monthly mean values indicate better performance, but data accuracy is degraded. Solar resource assessment and power estimation are discussed using the ERA-5 solar radiation data. The average values of the mean bias error (MBE), root mean square error (RMSE) and mean absolute error (MAE) of the reanalysis solar radiation data vary from 0.079 to 0.222, 0.055 to 0.178, and 0.0145 to 0.198, respectively, during the study period. The correlation coefficient (R2) varies from 93% to 99% during the study period. The objective of this research is to provide a reliable representation of the world's solar radiation to aid the use of solar energy in all sectors. Keywords: solar energy, ERA-5 analysis data, global solar radiation, North Africa
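The verification statistics used above can be computed with a few lines of code; the sketch below shows plain (non-normalized) MBE, RMSE, MAE, and R² with illustrative daily GSR values, while the figures in the abstract appear to be relative or normalized, so exact magnitudes will differ.

```python
import numpy as np

def verification_stats(ground, era5):
    """MBE, RMSE, MAE and R^2 between ground-measured and ERA-5 global solar radiation."""
    ground, era5 = np.asarray(ground, float), np.asarray(era5, float)
    mbe = np.mean(era5 - ground)
    rmse = np.sqrt(np.mean((era5 - ground) ** 2))
    mae = np.mean(np.abs(era5 - ground))
    r2 = np.corrcoef(ground, era5)[0, 1] ** 2
    return mbe, rmse, mae, r2

ground = np.array([5.8, 6.3, 7.1, 6.9, 5.2, 4.8])   # illustrative daily GSR, kWh/m^2
era5 = np.array([6.0, 6.1, 7.4, 6.6, 5.5, 5.0])
print(verification_stats(ground, era5))
```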
Procedia PDF Downloads 99
213 Prompt Design for Code Generation in Data Analysis Using Large Language Models
Authors: Lu Song Ma Li Zhi
Abstract:
With the rapid advancement of artificial intelligence technology, large language models (LLMs) have become a milestone in the field of natural language processing, demonstrating remarkable capabilities in semantic understanding, intelligent question answering, and text generation. These models are gradually penetrating various industries, particularly showcasing significant application potential in the data analysis domain. However, retraining or fine-tuning these models requires substantial computational resources and ample downstream task datasets, which poses a significant challenge for many enterprises and research institutions. Without modifying the internal parameters of the large models, prompt engineering techniques can rapidly adapt these models to new domains. This paper proposes a prompt design strategy aimed at leveraging the capabilities of large language models to automate the generation of data analysis code. By carefully designing prompts, data analysis requirements can be described in natural language, which the large language model can then understand and convert into executable data analysis code, thereby greatly enhancing the efficiency and convenience of data analysis. This strategy not only lowers the threshold for using large models but also significantly improves the accuracy and efficiency of data analysis. Our approach includes requirements for the precision of natural language descriptions, coverage of diverse data analysis needs, and mechanisms for immediate feedback and adjustment. Experimental results show that with this prompt design strategy, large language models perform exceptionally well in multiple data analysis tasks, generating high-quality code and significantly shortening the data analysis cycle. This method provides an efficient and convenient tool for the data analysis field and demonstrates the enormous potential of large language models in practical applications.Keywords: large language models, prompt design, data analysis, code generation
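A minimal sketch of the prompt-design idea described above follows: the natural-language analysis request is embedded in a template that also supplies the dataset schema, constrains the output to runnable code, and asks the model to flag unmet assumptions. The OpenAI client and model name are illustrative assumptions; any chat-style LLM endpoint could be substituted.

```python
from openai import OpenAI   # any chat-completion-style LLM client could be substituted

PROMPT_TEMPLATE = """You are a data analysis assistant.
Dataset description: {schema}
Analysis request (natural language): {request}
Return only runnable Python (pandas/matplotlib) code, no explanations.
If a column name in the request does not exist in the schema, say so instead of guessing."""

def generate_analysis_code(schema, request, model="gpt-4o-mini"):
    """Fill the prompt template and ask the model for executable analysis code."""
    client = OpenAI()                       # assumes OPENAI_API_KEY is set in the environment
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": PROMPT_TEMPLATE.format(schema=schema, request=request)}],
        temperature=0)
    return resp.choices[0].message.content

code = generate_analysis_code(
    schema="sales.csv: columns = [date, region, product, units, revenue]",
    request="Plot monthly total revenue per region for 2023 as a line chart.")
print(code)
```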
Procedia PDF Downloads 43
212 Non-Invasive Data Extraction from Machine Display Units Using Video Analytics
Authors: Ravneet Kaur, Joydeep Acharya, Sudhanshu Gaur
Abstract:
Artificial Intelligence (AI) has the potential to transform manufacturing by improving shop floor processes such as production, maintenance, and quality. However, industrial datasets are notoriously difficult to extract in a real-time, streaming fashion, thus negating potential AI benefits. A prominent example is specialized industrial controllers that are operated by custom software, which complicates the process of connecting them to an Information Technology (IT) based data acquisition network. Security concerns may also limit direct physical access to these controllers for data acquisition. To connect the Operational Technology (OT) data stored in these controllers to an AI application in a secure, reliable, and available way, we propose a novel Industrial IoT (IIoT) solution in this paper. In this solution, we demonstrate how video cameras can be installed on a factory shop floor to continuously obtain images of the controller HMIs. We propose image pre-processing to segment the HMI into regions of streaming data and regions of fixed meta-data. We then evaluate the performance of multiple Optical Character Recognition (OCR) technologies, such as Tesseract and Google Vision, in recognizing the streaming data and test them for typical factory HMIs and realistic lighting conditions. Finally, we use the meta-data to match the OCR output with the temporal, domain-dependent context of the data to improve the accuracy of the output. Our IIoT solution enables reliable and efficient data extraction, which will improve the performance of subsequent AI applications. Keywords: human machine interface, industrial internet of things, internet of things, optical character recognition, video analytics
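A minimal sketch of the region-based OCR step is shown below, using OpenCV for cropping and binarization and Tesseract (via pytesseract) for recognition; the region coordinates, video source, and character whitelist are illustrative assumptions, and the meta-data matching step is not reproduced.

```python
import cv2                      # pip install opencv-python
import pytesseract              # requires the Tesseract binary to be installed

# Regions of the HMI defined once per machine type: (name, x, y, width, height)
STREAMING_REGIONS = [("spindle_rpm", 410, 120, 160, 40),
                     ("feed_rate",   410, 180, 160, 40)]   # coordinates are illustrative

def read_hmi_values(frame_bgr):
    """Crop each streaming-data region, binarize it, and OCR the digits."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    values = {}
    for name, x, y, w, h in STREAMING_REGIONS:
        roi = gray[y:y + h, x:x + w]
        _, roi = cv2.threshold(roi, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        text = pytesseract.image_to_string(
            roi, config="--psm 7 -c tessedit_char_whitelist=0123456789.")  # one text line, digits only
        values[name] = text.strip()
    return values

cap = cv2.VideoCapture("hmi_camera.mp4")        # or an RTSP stream from the shop-floor camera
ok, frame = cap.read()
if ok:
    print(read_hmi_values(frame))
cap.release()
```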
Procedia PDF Downloads 109