Search results for: imbalanced dataset
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 1149

Search results for: imbalanced dataset

279 C-eXpress: A Web-Based Analysis Platform for Comparative Functional Genomics and Proteomics in Human Cancer Cell Line, NCI-60 as an Example

Authors: Chi-Ching Lee, Po-Jung Huang, Kuo-Yang Huang, Petrus Tang

Abstract:

Background: Recent advances in high-throughput research technologies such as new-generation sequencing and multi-dimensional liquid chromatography makes it possible to dissect the complete transcriptome and proteome in a single run for the first time. However, it is almost impossible for many laboratories to handle and analysis these “BIG” data without the support from a bioinformatics team. We aimed to provide a web-based analysis platform for users with only limited knowledge on bio-computing to study the functional genomics and proteomics. Method: We use NCI-60 as an example dataset to demonstrate the power of the web-based analysis platform and data delivering system: C-eXpress takes a simple text file that contain the standard NCBI gene or protein ID and expression levels (rpkm or fold) as input file to generate a distribution map of gene/protein expression levels in a heatmap diagram organized by color gradients. The diagram is hyper-linked to a dynamic html table that allows the users to filter the datasets based on various gene features. A dynamic summary chart is generated automatically after each filtering process. Results: We implemented an integrated database that contain pre-defined annotations such as gene/protein properties (ID, name, length, MW, pI); pathways based on KEGG and GO biological process; subcellular localization based on GO cellular component; functional classification based on GO molecular function, kinase, peptidase and transporter. Multiple ways of sorting of column and rows is also provided for comparative analysis and visualization of multiple samples.

Keywords: cancer, visualization, database, functional annotation

Procedia PDF Downloads 601
278 Faster Pedestrian Recognition Using Deformable Part Models

Authors: Alessandro Preziosi, Antonio Prioletti, Luca Castangia

Abstract:

Deformable part models achieve high precision in pedestrian recognition, but all publicly available implementations are too slow for real-time applications. We implemented a deformable part model algorithm fast enough for real-time use by exploiting information about the camera position and orientation. This implementation is both faster and more precise than alternative DPM implementations. These results are obtained by computing convolutions in the frequency domain and using lookup tables to speed up feature computation. This approach is almost an order of magnitude faster than the reference DPM implementation, with no loss in precision. Knowing the position of the camera with respect to horizon it is also possible prune many hypotheses based on their size and location. The range of acceptable sizes and positions is set by looking at the statistical distribution of bounding boxes in labelled images. With this approach it is not needed to compute the entire feature pyramid: for example higher resolution features are only needed near the horizon. This results in an increase in mean average precision of 5% and an increase in speed by a factor of two. Furthermore, to reduce misdetections involving small pedestrians near the horizon, input images are supersampled near the horizon. Supersampling the image at 1.5 times the original scale, results in an increase in precision of about 4%. The implementation was tested against the public KITTI dataset, obtaining an 8% improvement in mean average precision over the best performing DPM-based method. By allowing for a small loss in precision computational time can be easily brought down to our target of 100ms per image, reaching a solution that is faster and still more precise than all publicly available DPM implementations.

Keywords: autonomous vehicles, deformable part model, dpm, pedestrian detection, real time

Procedia PDF Downloads 260
277 Performance Comparison of Deep Convolutional Neural Networks for Binary Classification of Fine-Grained Leaf Images

Authors: Kamal KC, Zhendong Yin, Dasen Li, Zhilu Wu

Abstract:

Intra-plant disease classification based on leaf images is a challenging computer vision task due to similarities in texture, color, and shape of leaves with a slight variation of leaf spot; and external environmental changes such as lighting and background noises. Deep convolutional neural network (DCNN) has proven to be an effective tool for binary classification. In this paper, two methods for binary classification of diseased plant leaves using DCNN are presented; model created from scratch and transfer learning. Our main contribution is a thorough evaluation of 4 networks created from scratch and transfer learning of 5 pre-trained models. Training and testing of these models were performed on a plant leaf images dataset belonging to 16 distinct classes, containing a total of 22,265 images from 8 different plants, consisting of a pair of healthy and diseased leaves. We introduce a deep CNN model, Optimized MobileNet. This model with depthwise separable CNN as a building block attained an average test accuracy of 99.77%. We also present a fine-tuning method by introducing the concept of a convolutional block, which is a collection of different deep neural layers. Fine-tuned models proved to be efficient in terms of accuracy and computational cost. Fine-tuned MobileNet achieved an average test accuracy of 99.89% on 8 pairs of [healthy, diseased] leaf ImageSet.

Keywords: deep convolution neural network, depthwise separable convolution, fine-grained classification, MobileNet, plant disease, transfer learning

Procedia PDF Downloads 168
276 Medical Diagnosis of Retinal Diseases Using Artificial Intelligence Deep Learning Models

Authors: Ethan James

Abstract:

Over one billion people worldwide suffer from some level of vision loss or blindness as a result of progressive retinal diseases. Many patients, particularly in developing areas, are incorrectly diagnosed or undiagnosed whatsoever due to unconventional diagnostic tools and screening methods. Artificial intelligence (AI) based on deep learning (DL) convolutional neural networks (CNN) have recently gained a high interest in ophthalmology for its computer-imaging diagnosis, disease prognosis, and risk assessment. Optical coherence tomography (OCT) is a popular imaging technique used to capture high-resolution cross-sections of retinas. In ophthalmology, DL has been applied to fundus photographs, optical coherence tomography, and visual fields, achieving robust classification performance in the detection of various retinal diseases including macular degeneration, diabetic retinopathy, and retinitis pigmentosa. However, there is no complete diagnostic model to analyze these retinal images that provide a diagnostic accuracy above 90%. Thus, the purpose of this project was to develop an AI model that utilizes machine learning techniques to automatically diagnose specific retinal diseases from OCT scans. The algorithm consists of neural network architecture that was trained from a dataset of over 20,000 real-world OCT images to train the robust model to utilize residual neural networks with cyclic pooling. This DL model can ultimately aid ophthalmologists in diagnosing patients with these retinal diseases more quickly and more accurately, therefore facilitating earlier treatment, which results in improved post-treatment outcomes.

Keywords: artificial intelligence, deep learning, imaging, medical devices, ophthalmic devices, ophthalmology, retina

Procedia PDF Downloads 159
275 Graph Neural Network-Based Classification for Disease Prediction in Health Care Heterogeneous Data Structures of Electronic Health Record

Authors: Raghavi C. Janaswamy

Abstract:

In the healthcare sector, heterogenous data elements such as patients, diagnosis, symptoms, conditions, observation text from physician notes, and prescriptions form the essentials of the Electronic Health Record (EHR). The data in the form of clear text and images are stored or processed in a relational format in most systems. However, the intrinsic structure restrictions and complex joins of relational databases limit the widespread utility. In this regard, the design and development of realistic mapping and deep connections as real-time objects offer unparallel advantages. Herein, a graph neural network-based classification of EHR data has been developed. The patient conditions have been predicted as a node classification task using a graph-based open source EHR data, Synthea Database, stored in Tigergraph. The Synthea DB dataset is leveraged due to its closer representation of the real-time data and being voluminous. The graph model is built from the EHR heterogeneous data using python modules, namely, pyTigerGraph to get nodes and edges from the Tigergraph database, PyTorch to tensorize the nodes and edges, PyTorch-Geometric (PyG) to train the Graph Neural Network (GNN) and adopt the self-supervised learning techniques with the AutoEncoders to generate the node embeddings and eventually perform the node classifications using the node embeddings. The model predicts patient conditions ranging from common to rare situations. The outcome is deemed to open up opportunities for data querying toward better predictions and accuracy.

Keywords: electronic health record, graph neural network, heterogeneous data, prediction

Procedia PDF Downloads 72
274 Remaining Useful Life Estimation of Bearings Based on Nonlinear Dimensional Reduction Combined with Timing Signals

Authors: Zhongmin Wang, Wudong Fan, Hengshan Zhang, Yimin Zhou

Abstract:

In data-driven prognostic methods, the prediction accuracy of the estimation for remaining useful life of bearings mainly depends on the performance of health indicators, which are usually fused some statistical features extracted from vibrating signals. However, the existing health indicators have the following two drawbacks: (1) The differnet ranges of the statistical features have the different contributions to construct the health indicators, the expert knowledge is required to extract the features. (2) When convolutional neural networks are utilized to tackle time-frequency features of signals, the time-series of signals are not considered. To overcome these drawbacks, in this study, the method combining convolutional neural network with gated recurrent unit is proposed to extract the time-frequency image features. The extracted features are utilized to construct health indicator and predict remaining useful life of bearings. First, original signals are converted into time-frequency images by using continuous wavelet transform so as to form the original feature sets. Second, with convolutional and pooling layers of convolutional neural networks, the most sensitive features of time-frequency images are selected from the original feature sets. Finally, these selected features are fed into the gated recurrent unit to construct the health indicator. The results state that the proposed method shows the enhance performance than the related studies which have used the same bearing dataset provided by PRONOSTIA.

Keywords: continuous wavelet transform, convolution neural net-work, gated recurrent unit, health indicators, remaining useful life

Procedia PDF Downloads 113
273 Human-Centred Data Analysis Method for Future Design of Residential Spaces: Coliving Case Study

Authors: Alicia Regodon Puyalto, Alfonso Garcia-Santos

Abstract:

This article presents a method to analyze the use of indoor spaces based on data analytics obtained from inbuilt digital devices. The study uses the data generated by the in-place devices, such as smart locks, Wi-Fi routers, and electrical sensors, to gain additional insights on space occupancy, user behaviour, and comfort. Those devices, originally installed to facilitate remote operations, report data through the internet that the research uses to analyze information on human real-time use of spaces. Using an in-place Internet of Things (IoT) network enables a faster, more affordable, seamless, and scalable solution to analyze building interior spaces without incorporating external data collection systems such as sensors. The methodology is applied to a real case study of coliving, a residential building of 3000m², 7 floors, and 80 users in the centre of Madrid. The case study applies the method to classify IoT devices, assess, clean, and analyze collected data based on the analysis framework. The information is collected remotely, through the different platforms devices' platforms; the first step is to curate the data, understand what insights can be provided from each device according to the objectives of the study, this generates an analysis framework to be escalated for future building assessment even beyond the residential sector. The method will adjust the parameters to be analyzed tailored to the dataset available in the IoT of each building. The research demonstrates how human-centered data analytics can improve the future spatial design of indoor spaces.

Keywords: in-place devices, IoT, human-centred data-analytics, spatial design

Procedia PDF Downloads 178
272 Adjusting Electricity Demand Data to Account for the Impact of Loadshedding in Forecasting Models

Authors: Migael van Zyl, Stefanie Visser, Awelani Phaswana

Abstract:

The electricity landscape in South Africa is characterized by frequent occurrences of loadshedding, a measure implemented by Eskom to manage electricity generation shortages by curtailing demand. Loadshedding, classified into stages ranging from 1 to 8 based on severity, involves the systematic rotation of power cuts across municipalities according to predefined schedules. However, this practice introduces distortions in recorded electricity demand, posing challenges to accurate forecasting essential for budgeting, network planning, and generation scheduling. Addressing this challenge requires the development of a methodology to quantify the impact of loadshedding and integrate it back into metered electricity demand data. Fortunately, comprehensive records of loadshedding impacts are maintained in a database, enabling the alignment of Loadshedding effects with hourly demand data. This adjustment ensures that forecasts accurately reflect true demand patterns, independent of loadshedding's influence, thereby enhancing the reliability of electricity supply management in South Africa. This paper presents a methodology for determining the hourly impact of load scheduling and subsequently adjusting historical demand data to account for it. Furthermore, two forecasting models are developed: one utilizing the original dataset and the other using the adjusted data. A comparative analysis is conducted to evaluate forecast accuracy improvements resulting from the adjustment process. By implementing this methodology, stakeholders can make more informed decisions regarding electricity infrastructure investments, resource allocation, and operational planning, contributing to the overall stability and efficiency of South Africa's electricity supply system.

Keywords: electricity demand forecasting, load shedding, demand side management, data science

Procedia PDF Downloads 41
271 A Comparative Analysis of (De)legitimation Strategies in Selected African Inaugural Speeches

Authors: Lily Chimuanya, Ehioghae Esther

Abstract:

Language, a versatile and sophisticated tool, is fundamentally sacrosanct to mankind especially within the realm of politics. In this dynamic world, political leaders adroitly use language to engage in a strategic show aimed at manipulating or mechanising the opinion of discerning people. This nuanced synergy is marked by different rhetorical strategies, meticulously synced with contextual factors ranging from cultural, ideological, and political to achieve multifaceted persuasive objectives. This study investigates the (de)legitimation strategies inherent in African presidential inaugural speeches, as African leaders not only state their policy agenda through inaugural speeches but also subtly indulge in a dance of legitimation and delegitimation, performing a twofold objective of strengthening the credibility of their administration and, at times, undermining the performance of the past administration. Drawing insights from two different legitimation models and a dataset of 4 African presidential inaugural speeches obtained from authentic websites, the study describes the roles of authorisation, rationalisation, moral evaluation, altruism, and mythopoesis in unmasking the structure of political discourse. The analysis takes a mixed-method approach to unpack the (de)legitimation strategy embedded in the carefully chosen speeches. The focus extends beyond a superficial exploration and delves into the linguistic elements that form the basis of presidential discourse. In conclusion, this examination goes beyond the nuanced landscape of language as a potent tool in politics, with each strategy contributing to the overall rhetorical impact and shaping the narrative. From this perspective, the study argues that presidential inaugural speeches are not only linguistic exercises but also viable weapons that influence perceptions and legitimise authority.

Keywords: CDA, legitimation, inaugural speeches, delegitmation

Procedia PDF Downloads 42
270 Scalable and Accurate Detection of Pathogens from Whole-Genome Shotgun Sequencing

Authors: Janos Juhasz, Sandor Pongor, Balazs Ligeti

Abstract:

Next-generation sequencing, especially whole genome shotgun sequencing, is becoming a common approach to gain insight into the microbiomes in a culture-independent way, even in clinical practice. It does not only give us information about the species composition of an environmental sample but opens the possibility to detect antimicrobial resistance and novel, or currently unknown, pathogens. Accurately and reliably detecting the microbial strains is a challenging task. Here we present a sensitive approach for detecting pathogens in metagenomics samples with special regard to detecting novel variants of known pathogens. We have developed a pipeline that uses fast, short read aligner programs (i.e., Bowtie2/BWA) and comprehensive nucleotide databases. Taxonomic binning is based on the lowest common ancestor (LCA) principle; each read is assigned to a taxon, covering the most significantly hit taxa. This approach helps in balancing between sensitivity and running time. The program was tested both on experimental and synthetic data. The results implicate that our method performs as good as the state-of-the-art BLAST-based ones, furthermore, in some cases, it even proves to be better, while running two orders magnitude faster. It is sensitive and capable of identifying taxa being present only in small abundance. Moreover, it needs two orders of magnitude less reads to complete the identification than MetaPhLan2 does. We analyzed an experimental anthrax dataset (B. anthracis strain BA104). The majority of the reads (96.50%) was classified as Bacillus anthracis, a small portion, 1.2%, was classified as other species from the Bacillus genus. We demonstrate that the evaluation of high-throughput sequencing data is feasible in a reasonable time with good classification accuracy.

Keywords: metagenomics, taxonomy binning, pathogens, microbiome, B. anthracis

Procedia PDF Downloads 116
269 Generalized Additive Model for Estimating Propensity Score

Authors: Tahmidul Islam

Abstract:

Propensity Score Matching (PSM) technique has been widely used for estimating causal effect of treatment in observational studies. One major step of implementing PSM is estimating the propensity score (PS). Logistic regression model with additive linear terms of covariates is most used technique in many studies. Logistics regression model is also used with cubic splines for retaining flexibility in the model. However, choosing the functional form of the logistic regression model has been a question since the effectiveness of PSM depends on how accurately the PS been estimated. In many situations, the linearity assumption of linear logistic regression may not hold and non-linear relation between the logit and the covariates may be appropriate. One can estimate PS using machine learning techniques such as random forest, neural network etc for more accuracy in non-linear situation. In this study, an attempt has been made to compare the efficacy of Generalized Additive Model (GAM) in various linear and non-linear settings and compare its performance with usual logistic regression. GAM is a non-parametric technique where functional form of the covariates can be unspecified and a flexible regression model can be fitted. In this study various simple and complex models have been considered for treatment under several situations (small/large sample, low/high number of treatment units) and examined which method leads to more covariate balance in the matched dataset. It is found that logistic regression model is impressively robust against inclusion quadratic and interaction terms and reduces mean difference in treatment and control set equally efficiently as GAM does. GAM provided no significantly better covariate balance than logistic regression in both simple and complex models. The analysis also suggests that larger proportion of controls than treatment units leads to better balance for both of the methods.

Keywords: accuracy, covariate balances, generalized additive model, logistic regression, non-linearity, propensity score matching

Procedia PDF Downloads 350
268 In-Depth Analysis on Sequence Evolution and Molecular Interaction of Influenza Receptors (Hemagglutinin and Neuraminidase)

Authors: Dong Tran, Thanh Dac Van, Ly Le

Abstract:

Hemagglutinin (HA) and Neuraminidase (NA) play an important role in host immune evasion across influenza virus evolution process. The correlation between HA and NA evolution in respect to epitopic evolution and drug interaction has yet to be investigated. In this study, combining of sequence to structure evolution and statistical analysis on epitopic/binding site specificity, we identified potential therapeutic features of HA and NA that show specific antibody binding site of HA and specific binding distribution within NA active site of current inhibitors. Our approach introduces the use of sequence variation and molecular interaction to provide an effective strategy in establishing experimental based distributed representations of protein-protein/ligand complexes. The most important advantage of our method is that it does not require complete dataset of complexes but rather directly inferring feature interaction from sequence variation and molecular interaction. Using correlated sequence analysis, we additionally identified co-evolved mutations associated with maintaining HA/NA structural and functional variability toward immunity and therapeutic treatment. Our investigation on the HA binding specificity revealed unique conserved stalk domain interacts with unique loop domain of universal antibodies (CR9114, CT149, CR8043, CR8020, F16v3, CR6261, F10). On the other hand, NA inhibitors (Oseltamivir, Zaninamivir, Laninamivir) showed specific conserved residue contribution and similar to that of NA substrate (sialic acid) which can be exploited for drug design. Our study provides an important insight into rational design and identification of novel therapeutics targeting universally recognized feature of influenza HA/NA.

Keywords: influenza virus, hemagglutinin (HA), neuraminidase (NA), sequence evolution

Procedia PDF Downloads 140
267 Hands-off Parking: Deep Learning Gesture-based System for Individuals with Mobility Needs

Authors: Javier Romera, Alberto Justo, Ignacio Fidalgo, Joshue Perez, Javier Araluce

Abstract:

Nowadays, individuals with mobility needs face a significant challenge when docking vehicles. In many cases, after parking, they encounter insufficient space to exit, leading to two undesired outcomes: either avoiding parking in that spot or settling for improperly placed vehicles. To address this issue, the following paper presents a parking control system employing gestural teleoperation. The system comprises three main phases: capturing body markers, interpreting gestures, and transmitting orders to the vehicle. The initial phase is centered around the MediaPipe framework, a versatile tool optimized for real-time gesture recognition. MediaPipe excels at detecting and tracing body markers, with a special emphasis on hand gestures. Hands detection is done by generating 21 reference points for each hand. Subsequently, after data capture, the project employs the MultiPerceptron Layer (MPL) for indepth gesture classification. This tandem of MediaPipe's extraction prowess and MPL's analytical capability ensures that human gestures are translated into actionable commands with high precision. Furthermore, the system has been trained and validated within a built-in dataset. To prove the domain adaptation, a framework based on the Robot Operating System (ROS), as a communication backbone, alongside CARLA Simulator, is used. Following successful simulations, the system is transitioned to a real-world platform, marking a significant milestone in the project. This real vehicle implementation verifies the practicality and efficiency of the system beyond theoretical constructs.

Keywords: gesture detection, mediapipe, multiperceptron layer, robot operating system

Procedia PDF Downloads 77
266 Comparison of Deep Learning and Machine Learning Algorithms to Diagnose and Predict Breast Cancer

Authors: F. Ghazalnaz Sharifonnasabi, Iman Makhdoom

Abstract:

Breast cancer is a serious health concern that affects many people around the world. According to a study published in the Breast journal, the global burden of breast cancer is expected to increase significantly over the next few decades. The number of deaths from breast cancer has been increasing over the years, but the age-standardized mortality rate has decreased in some countries. It’s important to be aware of the risk factors for breast cancer and to get regular check- ups to catch it early if it does occur. Machin learning techniques have been used to aid in the early detection and diagnosis of breast cancer. These techniques, that have been shown to be effective in predicting and diagnosing the disease, have become a research hotspot. In this study, we consider two deep learning approaches including: Multi-Layer Perceptron (MLP), and Convolutional Neural Network (CNN). We also considered the five-machine learning algorithm titled: Decision Tree (C4.5), Naïve Bayesian (NB), Support Vector Machine (SVM), K-Nearest Neighbors (KNN) Algorithm and XGBoost (eXtreme Gradient Boosting) on the Breast Cancer Wisconsin Diagnostic dataset. We have carried out the process of evaluating and comparing classifiers involving selecting appropriate metrics to evaluate classifier performance and selecting an appropriate tool to quantify this performance. The main purpose of the study is predicting and diagnosis breast cancer, applying the mentioned algorithms and also discovering of the most effective with respect to confusion matrix, accuracy and precision. It is realized that CNN outperformed all other classifiers and achieved the highest accuracy (0.982456). The work is implemented in the Anaconda environment based on Python programing language.

Keywords: breast cancer, multi-layer perceptron, Naïve Bayesian, SVM, decision tree, convolutional neural network, XGBoost, KNN

Procedia PDF Downloads 55
265 Review of the Road Crash Data Availability in Iraq

Authors: Abeer K. Jameel, Harry Evdorides

Abstract:

Iraq is a middle income country where the road safety issue is considered one of the leading causes of deaths. To control the road risk issue, the Iraqi Ministry of Planning, General Statistical Organization started to organise a collection system of traffic accidents data with details related to their causes and severity. These data are published as an annual report. In this paper, a review of the available crash data in Iraq will be presented. The available data represent the rate of accidents in aggregated level and classified according to their types, road users’ details, and crash severity, type of vehicles, causes and number of causalities. The review is according to the types of models used in road safety studies and research, and according to the required road safety data in the road constructions tasks. The available data are also compared with the road safety dataset published in the United Kingdom as an example of developed country. It is concluded that the data in Iraq are suitable for descriptive and exploratory models, aggregated level comparison analysis, and evaluation and monitoring the progress of the overall traffic safety performance. However, important traffic safety studies require disaggregated level of data and details related to the factors of the likelihood of traffic crashes. Some studies require spatial geographic details such as the location of the accidents which is essential in ranking the roads according to their level of safety, and name the most dangerous roads in Iraq which requires tactic plan to control this issue. Global Road safety agencies interested in solve this problem in low and middle-income countries have designed road safety assessment methodologies which are basing on the road attributes data only. Therefore, in this research it is recommended to use one of these methodologies.

Keywords: road safety, Iraq, crash data, road risk assessment, The International Road Assessment Program (iRAP)

Procedia PDF Downloads 239
264 Analysis of Pangasinan State University: Bayambang Students’ Concerns Through Social Media Analytics and Latent Dirichlet Allocation Topic Modelling Approach

Authors: Matthew John F. Sino Cruz, Sarah Jane M. Ferrer, Janice C. Francisco

Abstract:

COVID-19 pandemic has affected more than 114 countries all over the world since it was considered a global health concern in 2020. Different sectors, including education, have shifted to remote/distant setups to follow the guidelines set to prevent the spread of the disease. One of the higher education institutes which shifted to remote setup is the Pangasinan State University (PSU). In order to continue providing quality instructions to the students, PSU designed Flexible Learning Model to still provide services to its stakeholders amidst the pandemic. The model covers the redesigning of delivering instructions in remote setup and the technology needed to support these adjustments. The primary goal of this study is to determine the insights of the PSU – Bayambang students towards the remote setup implemented during the pandemic and how they perceived the initiatives employed in relation to their experiences in flexible learning. In this study, the topic modelling approach was implemented using Latent Dirichlet Allocation. The dataset used in the study. The results show that the most common concern of the students includes time and resource management, poor internet connection issues, and difficulty coping with the flexible learning modality. Furthermore, the findings of the study can be used as one of the bases for the administration to review and improve the policies and initiatives implemented during the pandemic in relation to remote service delivery. In addition, further studies can be conducted to determine the overall sentiment of the other stakeholders in the policies implemented at the University.

Keywords: COVID-19, topic modelling, students’ sentiment, flexible learning, Latent Dirichlet allocation

Procedia PDF Downloads 105
263 Hsa-miR-192-5p, and Hsa-miR-129-5p Prominent Biomarkers in Regulation Glioblastoma Cancer Stem Cells Genes Microenvironment

Authors: Rasha Ahmadi

Abstract:

Glioblastoma is one of the most frequent brain malignancies, having a high mortality rate and limited survival in individuals with this malignancy. Despite different treatments and surgery, recurrence of glioblastoma cancer stem cells may arise as a subsequent tumor. For this reason, it is crucial to research the markers associated with glioblastoma stem cells and specifically their microenvironment. In this study, using bioinformatics analysis, we analyzed and nominated genes in the microenvironment pathways of glioblastoma stem cells. In this study, an appropriate database was selected for analysis by referring to the GEO database. This dataset comprised gene expression patterns in stem cells derived from glioblastoma patients. Gene clusters were divided as high and low expression. Enrichment databases such as Enrichr, STRING, and GEPIA were utilized to analyze the data appropriately. Finally, we extracted the potential genes 2700 high-expression and 1100 low-expression genes are implicated in the metabolic pathways of glioblastoma cancer progression. Cellular senescence, MAPK, TNF, hypoxia, zimosterol biosynthesis, and phosphatidylinositol metabolism pathways were substantially expressed and the metabolic pathways were downregulated. After assessing the association between protein networks, MSMP, SOX2, FGD4 ,and CNTNAP3 genes with high expression and DMKN and SBSN genes with low were selected. All of these genes were observed in the survival curve, with a survival of fewer than 10 percent over around 15 months. hsa-mir-192-5p, hsa-mir-129-5p, hsa-mir-215-5p, hsa-mir-335-5p, and hsa-mir-340-5p played key function in glioblastoma cancer stem cells microenviroments. We introduced critical genes through integrated and regular bioinformatics studies by assessing the amount of gene expression profile data that can play an important role in targeting genes involved in the energy and microenvironment of glioblastoma cancer stem cells. Have. This study indicated that hsa-mir-192-5p, and hsa-mir-129-5p are appropriate candidates for this.

Keywords: Glioblastoma, Cancer Stem Cells, Biomarker Discovery, Gene Expression Profiles, Bioinformatics Analysis, Tumor Microenvironment

Procedia PDF Downloads 119
262 Modeling Average Paths Traveled by Ferry Vessels Using AIS Data

Authors: Devin Simmons

Abstract:

At the USDOT’s Bureau of Transportation Statistics, a biannual census of ferry operators in the U.S. is conducted, with results such as route mileage used to determine federal funding levels for operators. AIS data allows for the possibility of using GIS software and geographical methods to confirm operator-reported mileage for individual ferry routes. As part of the USDOT’s work on the ferry census, an algorithm was developed that uses AIS data for ferry vessels in conjunction with known ferry terminal locations to model the average route travelled for use as both a cartographic product and confirmation of operator-reported mileage. AIS data from each vessel is first analyzed to determine individual journeys based on the vessel’s velocity, and changes in velocity over time. These trips are then converted to geographic linestring objects. Using the terminal locations, the algorithm then determines whether the trip represented a known ferry route. Given a large enough dataset, routes will be represented by multiple trip linestrings, which are then filtered by DBSCAN spatial clustering to remove outliers. Finally, these remaining trips are ready to be averaged into one route. The algorithm interpolates the point on each trip linestring that represents the start point. From these start points, a centroid is calculated, and the first point of the average route is determined. Each trip is interpolated again to find the point that represents one percent of the journey’s completion, and the centroid of those points is used as the next point in the average route, and so on until 100 points have been calculated. Routes created using this algorithm have shown demonstrable improvement over previous methods, which included the implementation of a LOESS model. Additionally, the algorithm greatly reduces the amount of manual digitizing needed to visualize ferry activity.

Keywords: ferry vessels, transportation, modeling, AIS data

Procedia PDF Downloads 154
261 Influence of Geologic and Geotechnical Dataset Resolution on Regional Liquefaction Assessment of the Lower Wairau Plains

Authors: Omer Altaf, Liam Wotherspoon, Rolando Orense

Abstract:

The Wairau Plains are located in the northeast of the South Island of New Zealand, with alluvial deposits of fine-grained silts and sands combined with low-lying topography suggesting the presence of liquefiable deposits over significant portions of the region. Liquefaction manifestations were observed in past earthquakes, including the 1848 Marlborough and 1855 Wairarapa earthquakes, and more recently during the 2013 Lake Grassmere and 2016 Kaikōura earthquakes. Therefore, a good understanding of the deposits that may be susceptible to liquefaction is important for land use planning in the region and to allow developers and asset owners to appropriately address their risk. For this purpose, multiple approaches have been employed to develop regional-scale maps showing the liquefaction vulnerability categories for the region. After applying semi-qualitative criteria linked to geologic age and deposit type, the higher resolution surface mapping of geomorphologic characteristics encompassing the Wairau River and the Opaoa River was used for screening. A detailed basin geologic model developed for groundwater modelling was analysed to provide a higher level of resolution than the surface-geology based classification. This is used to identify the thickness of near-surface gravel deposits, providing an improved understanding of the presence or lack of potentially non-liquefiable crust deposits. This paper describes the methodology adopted for this project and focuses on the influence of geomorphic characteristics and analysis of the detailed geologic basin model on the liquefaction classification of the Lower Wairau Plains.

Keywords: liquefaction, earthquake, cone penetration test, mapping, liquefaction-induced damage

Procedia PDF Downloads 163
260 A Radiomics Approach to Predict the Evolution of Prostate Imaging Reporting and Data System Score 3/5 Prostate Areas in Multiparametric Magnetic Resonance

Authors: Natascha C. D'Amico, Enzo Grossi, Giovanni Valbusa, Ala Malasevschi, Gianpiero Cardone, Sergio Papa

Abstract:

Purpose: To characterize, through a radiomic approach, the nature of areas classified PI-RADS (Prostate Imaging Reporting and Data System) 3/5, recognized in multiparametric prostate magnetic resonance with T2-weighted (T2w), diffusion and perfusion sequences with paramagnetic contrast. Methods and Materials: 24 cases undergoing multiparametric prostate MR and biopsy were admitted to this pilot study. Clinical outcome of the PI-RADS 3/5 was found through biopsy, finding 8 malignant tumours. The analysed images were acquired with a Philips achieva 1.5T machine with a CE- T2-weighted sequence in the axial plane. Semi-automatic tumour segmentation was carried out on MR images using 3DSlicer image analysis software. 45 shape-based, intensity-based and texture-based features were extracted and represented the input for preprocessing. An evolutionary algorithm (a TWIST system based on KNN algorithm) was used to subdivide the dataset into training and testing set and select features yielding the maximal amount of information. After this pre-processing 20 input variables were selected and different machine learning systems were used to develop a predictive model based on a training testing crossover procedure. Results: The best machine learning system (three-layers feed-forward neural network) obtained a global accuracy of 90% ( 80 % sensitivity and 100% specificity ) with a ROC of 0.82. Conclusion: Machine learning systems coupled with radiomics show a promising potential in distinguishing benign from malign tumours in PI-RADS 3/5 areas.

Keywords: machine learning, MR prostate, PI-Rads 3, radiomics

Procedia PDF Downloads 173
259 Sea Level Characteristics Referenced to Specific Geodetic Datum in Alexandria, Egypt

Authors: Ahmed M. Khedr, Saad M. Abdelrahman, Kareem M. Tonbol

Abstract:

Two geo-referenced sea level datasets (September 2008 – November 2010) and (April 2012 – January 2014) were recorded at Alexandria Western Harbour (AWH). Accurate re-definition of tidal datum, referred to the latest International Terrestrial Reference Frame (ITRF-2014), was discussed and updated to improve our understanding of the old predefined tidal datum at Alexandria. Tidal and non-tidal components of sea level were separated with the use of Delft-3D hydrodynamic model-tide suit (Delft-3D, 2015). Tidal characteristics at AWH were investigated and harmonic analysis showed the most significant 34 constituents with their amplitudes and phases. Tide was identified as semi-diurnal pattern as indicated by a “Form Factor” of 0.24 and 0.25, respectively. Principle tidal datums related to major tidal phenomena were recalculated referred to a meaningful geodetic height datum. The portion of residual energy (surge) out of the total sea level energy was computed for each dataset and found 77% and 72%, respectively. Power spectral density (PSD) showed accurate resolvability in high band (1–6) cycle/days for the nominated independent constituents, except some neighbouring constituents, which are too close in frequency. Wind and atmospheric pressure data, during the recorded sea level time, were analysed and cross-correlated with the surge signals. Moderate association between surge and wind and atmospheric pressure data were obtained. In addition, long-term sea level rise trend at AWH was computed and showed good agreement with earlier estimated rates.

Keywords: Alexandria, Delft-3D, Egypt, geodetic reference, harmonic analysis, sea level

Procedia PDF Downloads 150
258 A Multi-Objective Decision Making Model for Biodiversity Conservation and Planning: Exploring the Concept of Interdependency

Authors: M. Mohan, J. P. Roise, G. P. Catts

Abstract:

Despite living in an era where conservation zones are de-facto the central element in any sustainable wildlife management strategy, we still find ourselves grappling with several pareto-optimal situations regarding resource allocation and area distribution for the same. In this paper, a multi-objective decision making (MODM) model is presented to answer the question of whether or not we can establish mutual relationships between these contradicting objectives. For our study, we considered a Red-cockaded woodpecker (Picoides borealis) habitat conservation scenario in the coastal plain of North Carolina, USA. Red-cockaded woodpecker (RCW) is a non-migratory territorial bird that excavates cavities in living pine trees for roosting and nesting. The RCW groups nest in an aggregation of cavity trees called ‘cluster’ and for our model we use the number of clusters to be established as a measure of evaluating the size of conservation zone required. The case study is formulated as a linear programming problem and the objective function optimises the Red-cockaded woodpecker clusters, carbon retention rate, biofuel, public safety and Net Present Value (NPV) of the forest. We studied the variation of individual objectives with respect to the amount of area available and plotted a two dimensional dynamic graph after establishing interrelations between the objectives. We further explore the concept of interdependency by integrating the MODM model with GIS, and derive a raster file representing carbon distribution from the existing forest dataset. Model results demonstrate the applicability of interdependency from both linear and spatial perspectives, and suggest that this approach holds immense potential for enhancing environmental investment decision making in future.

Keywords: conservation, interdependency, multi-objective decision making, red-cockaded woodpecker

Procedia PDF Downloads 320
257 Internal Auditing and the Performance of State-Owned Enterprises in Emerging Markets

Authors: Jobo Dubihlela, Kofi Boamah

Abstract:

The inimitable role of the internal auditing, challenges and the predicament of state-owned enterprises in emerging markets are acknowledged. Study sought to address the inter-related questions, about how does IAF complement the performance and sustainability of SOEs? How can effective IA control systems be implemented to improve the performance results and culture of SOEs in Namibia? The weaknesses inherent in the SOE sector, unfortunately, impacts on the IAF ability to effectively support the SOEs. Despite these challenges, the study has unearthed IAF potential capabilities to contribute to SOE survival in Namibia by complementing the governance practices of the sector. Using a quantitative research approach, the dataset was collected and analysed from SOEs to confirm the role of the internal auditing function (IAF) as an indispensable concomitant of SOE performance. The study adopted a data approach supported by the literary evidence, which enabled generalisation and connectedness of the issues being addressed. The outcome of the data analysis contributed to achieving the results, which are discussed and eventually support the conclusions reached. Results show that the intractable task of internal auditing depends on the leadership of the board of directors of the SOEs. Study also revealed critical priorities needed to influence policymakers and oversight bodies to overcome the iniquities influencing SOE operations, understand and embrace IAF to salvage a sector that has a lot to offer and yet is severely mismanaged. Results support literature on IA’s contribution to SOE development from a developing country’s point of view and is the first of its kind in Namibia. Findings suggest ways to possibly enhance knowledge development of future researchers and ‘wet their appetite’ for further research in emerging markets and on a global scale.

Keywords: internal auditing activity, state-owned enterprises, emerging markets, auditing function

Procedia PDF Downloads 85
256 A Novel Heuristic for Analysis of Large Datasets by Selecting Wrapper-Based Features

Authors: Bushra Zafar, Usman Qamar

Abstract:

Large data sample size and dimensions render the effectiveness of conventional data mining methodologies. A data mining technique are important tools for collection of knowledgeable information from variety of databases and provides supervised learning in the form of classification to design models to describe vital data classes while structure of the classifier is based on class attribute. Classification efficiency and accuracy are often influenced to great extent by noisy and undesirable features in real application data sets. The inherent natures of data set greatly masks its quality analysis and leave us with quite few practical approaches to use. To our knowledge first time, we present a new approach for investigation of structure and quality of datasets by providing a targeted analysis of localization of noisy and irrelevant features of data sets. Machine learning is based primarily on feature selection as pre-processing step which offers us to select few features from number of features as a subset by reducing the space according to certain evaluation criterion. The primary objective of this study is to trim down the scope of the given data sample by searching a small set of important features which may results into good classification performance. For this purpose, a heuristic for wrapper-based feature selection using genetic algorithm and for discriminative feature selection an external classifier are used. Selection of feature based on its number of occurrence in the chosen chromosomes. Sample dataset has been used to demonstrate proposed idea effectively. A proposed method has improved average accuracy of different datasets is about 95%. Experimental results illustrate that proposed algorithm increases the accuracy of prediction of different diseases.

Keywords: data mining, generic algorithm, KNN algorithms, wrapper based feature selection

Procedia PDF Downloads 304
255 Multivariate Analysis on Water Quality Attributes Using Master-Slave Neural Network Model

Authors: A. Clementking, C. Jothi Venkateswaran

Abstract:

Mathematical and computational functionalities such as descriptive mining, optimization, and predictions are espoused to resolve natural resource planning. The water quality prediction and its attributes influence determinations are adopted optimization techniques. The water properties are tainted while merging water resource one with another. This work aimed to predict influencing water resource distribution connectivity in accordance to water quality and sediment using an innovative proposed master-slave neural network back-propagation model. The experiment results are arrived through collecting water quality attributes, computation of water quality index, design and development of neural network model to determine water quality and sediment, master–slave back propagation neural network back-propagation model to determine variations on water quality and sediment attributes between the water resources and the recommendation for connectivity. The homogeneous and parallel biochemical reactions are influences water quality and sediment while distributing water from one location to another. Therefore, an innovative master-slave neural network model [M (9:9:2)::S(9:9:2)] designed and developed to predict the attribute variations. The result of training dataset given as an input to master model and its maximum weights are assigned as an input to the slave model to predict the water quality. The developed master-slave model is predicted physicochemical attributes weight variations for 85 % to 90% of water quality as a target values.The sediment level variations also predicated from 0.01 to 0.05% of each water quality percentage. The model produced the significant variations on physiochemical attribute weights. According to the predicated experimental weight variation on training data set, effective recommendations are made to connect different resources.

Keywords: master-slave back propagation neural network model(MSBPNNM), water quality analysis, multivariate analysis, environmental mining

Procedia PDF Downloads 456
254 Feature Evaluation Based on Random Subspace and Multiple-K Ensemble

Authors: Jaehong Yu, Seoung Bum Kim

Abstract:

Clustering analysis can facilitate the extraction of intrinsic patterns in a dataset and reveal its natural groupings without requiring class information. For effective clustering analysis in high dimensional datasets, unsupervised dimensionality reduction is an important task. Unsupervised dimensionality reduction can generally be achieved by feature extraction or feature selection. In many situations, feature selection methods are more appropriate than feature extraction methods because of their clear interpretation with respect to the original features. The unsupervised feature selection can be categorized as feature subset selection and feature ranking method, and we focused on unsupervised feature ranking methods which evaluate the features based on their importance scores. Recently, several unsupervised feature ranking methods were developed based on ensemble approaches to achieve their higher accuracy and stability. However, most of the ensemble-based feature ranking methods require the true number of clusters. Furthermore, these algorithms evaluate the feature importance depending on the ensemble clustering solution, and they produce undesirable evaluation results if the clustering solutions are inaccurate. To address these limitations, we proposed an ensemble-based feature ranking method with random subspace and multiple-k ensemble (FRRM). The proposed FRRM algorithm evaluates the importance of each feature with the random subspace ensemble, and all evaluation results are combined with the ensemble importance scores. Moreover, FRRM does not require the determination of the true number of clusters in advance through the use of the multiple-k ensemble idea. Experiments on various benchmark datasets were conducted to examine the properties of the proposed FRRM algorithm and to compare its performance with that of existing feature ranking methods. The experimental results demonstrated that the proposed FRRM outperformed the competitors.

Keywords: clustering analysis, multiple-k ensemble, random subspace-based feature evaluation, unsupervised feature ranking

Procedia PDF Downloads 315
253 Hyper Parameter Optimization of Deep Convolutional Neural Networks for Pavement Distress Classification

Authors: Oumaima Khlifati, Khadija Baba

Abstract:

Pavement distress is the main factor responsible for the deterioration of road structure durability, damage vehicles, and driver comfort. Transportation agencies spend a high proportion of their funds on pavement monitoring and maintenance. The auscultation of pavement distress was based on the manual survey, which was extremely time consuming, labor intensive, and required domain expertise. Therefore, the automatic distress detection is needed to reduce the cost of manual inspection and avoid more serious damage by implementing the appropriate remediation actions at the right time. Inspired by recent deep learning applications, this paper proposes an algorithm for automatic road distress detection and classification using on the Deep Convolutional Neural Network (DCNN). In this study, the types of pavement distress are classified as transverse or longitudinal cracking, alligator, pothole, and intact pavement. The dataset used in this work is composed of public asphalt pavement images. In order to learn the structure of the different type of distress, the DCNN models are trained and tested as a multi-label classification task. In addition, to get the highest accuracy for our model, we adjust the structural optimization hyper parameters such as the number of convolutions and max pooling, filers, size of filters, loss functions, activation functions, and optimizer and fine-tuning hyper parameters that conclude batch size and learning rate. The optimization of the model is executed by checking all feasible combinations and selecting the best performing one. The model, after being optimized, performance metrics is calculated, which describe the training and validation accuracies, precision, recall, and F1 score.

Keywords: distress pavement, hyperparameters, automatic classification, deep learning

Procedia PDF Downloads 66
252 Alternating Expectation-Maximization Algorithm for a Bilinear Model in Isoform Quantification from RNA-Seq Data

Authors: Wenjiang Deng, Tian Mou, Yudi Pawitan, Trung Nghia Vu

Abstract:

Estimation of isoform-level gene expression from RNA-seq data depends on simplifying assumptions, such as uniform reads distribution, that are easily violated in real data. Such violations typically lead to biased estimates. Most existing methods provide a bias correction step(s), which is based on biological considerations, such as GC content–and applied in single samples separately. The main problem is that not all biases are known. For example, new technologies such as single-cell RNA-seq (scRNA-seq) may introduce new sources of bias not seen in bulk-cell data. This study introduces a method called XAEM based on a more flexible and robust statistical model. Existing methods are essentially based on a linear model Xβ, where the design matrix X is known and derived based on the simplifying assumptions. In contrast, XAEM considers Xβ as a bilinear model with both X and β unknown. Joint estimation of X and β is made possible by simultaneous analysis of multi-sample RNA-seq data. Compared to existing methods, XAEM automatically performs empirical correction of potentially unknown biases. XAEM implements an alternating expectation-maximization (AEM) algorithm, alternating between estimation of X and β. For speed XAEM utilizes quasi-mapping for read alignment, thus leading to a fast algorithm. Overall XAEM performs favorably compared to other recent advanced methods. For simulated datasets, XAEM obtains higher accuracy for multiple-isoform genes, particularly for paralogs. In a differential-expression analysis of a real scRNA-seq dataset, XAEM achieves substantially greater rediscovery rates in an independent validation set.

Keywords: alternating EM algorithm, bias correction, bilinear model, gene expression, RNA-seq

Procedia PDF Downloads 129
251 A Multi-Output Network with U-Net Enhanced Class Activation Map and Robust Classification Performance for Medical Imaging Analysis

Authors: Jaiden Xuan Schraut, Leon Liu, Yiqiao Yin

Abstract:

Computer vision in medical diagnosis has achieved a high level of success in diagnosing diseases with high accuracy. However, conventional classifiers that produce an image to-label result provides insufficient information for medical professionals to judge and raise concerns over the trust and reliability of a model with results that cannot be explained. In order to gain local insight into cancerous regions, separate tasks such as imaging segmentation need to be implemented to aid the doctors in treating patients, which doubles the training time and costs which renders the diagnosis system inefficient and difficult to be accepted by the public. To tackle this issue and drive AI-first medical solutions further, this paper proposes a multi-output network that follows a U-Net architecture for image segmentation output and features an additional convolutional neural networks (CNN) module for auxiliary classification output. Class activation maps are a method of providing insight into a convolutional neural network’s feature maps that leads to its classification but in the case of lung diseases, the region of interest is enhanced by U-net-assisted Class Activation Map (CAM) visualization. Therefore, our proposed model combines image segmentation models and classifiers to crop out only the lung region of a chest X-ray’s class activation map to provide a visualization that improves the explainability and is able to generate classification results simultaneously which builds trust for AI-led diagnosis systems. The proposed U-Net model achieves 97.61% accuracy and a dice coefficient of 0.97 on testing data from the COVID-QU-Ex Dataset which includes both diseased and healthy lungs.

Keywords: multi-output network model, U-net, class activation map, image classification, medical imaging analysis

Procedia PDF Downloads 176
250 Recent Climate Variability and Crop Production in the Central Highlands of Ethiopia

Authors: Arragaw Alemayehu, Woldeamlak Bewket

Abstract:

The aim of this study was to understand the influence of current climate variability on crop production in the central highlands of Ethiopia. We used monthly rainfall and temperature data from 132 points each representing a pixel of 10×10 km. The data are reconstructions based on station records and meteorological satellite observations. Production data of the five major crops in the area were collected from the Central Statistical Agency for the period 2004-2013 and for the main cropping season, locally known as Meher. The production data are at the Enumeration Area (EA ) level and hence the best available dataset on crop production. The results show statistically significant decreasing trends in March–May (Belg) rainfall in the area. However, June – September (Kiremt) rainfall showed increasing trends in Efratana Gidim and Menz Gera Meder which the latter is statistically significant. Annual rainfall also showed positive trends in the area except Basona Werana where significant negative trends were observed. On the other hand, maximum and minimum temperatures showed warming trends in the study area. Correlation results have shown that crop production and area of cultivation have positive correlation with rainfall, and negative with temperature. When the trends in crop production are investigated, most crops showed negative trends and below average production was observed. Regression results have shown that rainfall was the most important determinant of crop production in the area. It is concluded that current climate variability has a significant influence on crop production in the area and any unfavorable change in the local climate in the future will have serious implications for household level food security. Efforts to adapt to the ongoing climate change should begin from tackling the current climate variability and take a climate risk management approach.

Keywords: central highlands, climate variability, crop production, Ethiopia, regression, trend

Procedia PDF Downloads 418