Search results for: categorical datasets
808 A Comparison of Caesarean Section Indications and Characteristics in 2009 and 2020 in a Saudi Tertiary Hospital
Authors: Sarah K. Basudan, Ragad I. Al Jazzar, Zeinah Sulaihim, Hanan M. Al-Kadri
Abstract:
Background: The cesarean section rate has been increasing in recent years, with a wide range of etiologies contributing to this rise. This study aimed to assess cesarean section indications, outcomes, and complications in Riyadh, Saudi Arabia. Methods: A retrospective cohort study was conducted at King Abdulaziz Medical City. The study includes two cohorts of women who met the inclusion criteria: G1 (2009) and G2 (2020). The data were transferred to SPSS (Statistical Package for the Social Sciences) version 24 for analysis. Initial descriptive statistics were run for all variables, including numerical and categorical data. Numerical data were reported as mean and standard deviation, and categorical data as frequencies and percentages. Results: Data were collected from 399 women, divided into two groups, G1 (199) and G2 (200). The mean age of all participants was 32 ± 6 years; G1 and G2 differed significantly in mean age, at 30 ± 6 and 34 ± 5 respectively (p < 0.001), indicating fertility delayed by four years. Moreover, breech presentation was less likely to occur in G2 (OR 0.64, CI 0.21-0.62, p < 0.001). Nonetheless, maternal causes such as repeated cesarean sections and maternal medical conditions were more likely in G2 (OR 1.5, CI 1.04-2.38, p = 0.03 and OR 5.4, CI 1.12-23.9, p = 0.01, respectively). Furthermore, postpartum hemorrhage showed a 12% increase in G2 (OR 5.4, CI 2.2-13.4, p < 0.001). G2 newborns were more likely to be admitted to the neonatal intensive care unit (NICU) (OR 16, CI 7.4-38.7) and to special care baby (SCB) units (OR 7.2, CI 3.9-13.1), both with p < 0.001, compared to regular nursery admission. Conclusion: Multiple factors contribute to the increase in the cesarean section rate in a Saudi tertiary hospital; the suggested factors are previous cesarean sections, abnormal fetal heart rate, malpresentation, and maternal or fetal medical conditions.
Keywords: cesarean sections, maternal indications, maternal complications, neonatal condition
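Each comparison above reduces to a 2×2 contingency table per indication. As a quick illustration of how an odds ratio, its Wald 95% confidence interval, and a p-value are derived from such a table (the counts below are hypothetical, not the study's data):

```python
import numpy as np
from scipy import stats

# Hypothetical 2x2 table: rows = G2/G1, columns = outcome present/absent
a, b = 24, 176   # G2: outcome yes / no
c, d = 12, 187   # G1: outcome yes / no

or_hat = (a * d) / (b * c)
se_log_or = np.sqrt(1/a + 1/b + 1/c + 1/d)      # Wald SE of log(OR)
lo, hi = np.exp(np.log(or_hat) + np.array([-1, 1]) * 1.96 * se_log_or)

# Two-sided p-value from the chi-square test of independence
chi2, p, _, _ = stats.chi2_contingency([[a, b], [c, d]])
print(f"OR {or_hat:.2f}, 95% CI {lo:.2f}-{hi:.2f}, p = {p:.3f}")
```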
807 Dissimilarity Measure for General Histogram Data and Its Application to Hierarchical Clustering
Authors: K. Umbleja, M. Ichino
Abstract:
Symbolic data mining has been developed to analyze very large datasets; it is also useful when entry-specific details should remain hidden, and it is quickly gaining popularity as the datasets in need of analysis become ever larger. One type of symbolic data is the histogram, which can condense huge amounts of information into a single variable with a high level of granularity. Other types of symbolic data can also be described as histograms, which makes the histogram a very important and general symbolic data type: a method developed for histograms can also be applied to other types of symbolic data. Due to their complex structure, however, analyzing histograms is complicated. This paper proposes a method that compares two histogram-valued variables and therefore finds the dissimilarity between two histograms. The proposed method takes the Ichino-Yaguchi dissimilarity measure for mixed feature-type data analysis as a base and develops a dissimilarity measure specifically for histogram data, one that can compare histograms with different numbers of bins and different bin widths (so-called general histograms). The proposed dissimilarity measure is then used as a measure for clustering. Furthermore, a linkage method based on weighted averages is proposed, with the concept of cluster compactness used to measure the quality of the clustering. The method is validated by application to real datasets. As a result, the proposed dissimilarity measure is found to produce adequate and comparable results on general histograms without loss of detail or the need to transform the data.
Keywords: dissimilarity measure, hierarchical clustering, histograms, symbolic data analysis
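The authors' exact measure is defined in the paper; as a simplified stand-in, the sketch below rebins two general histograms (different edges and bin counts) onto the union of their edges and computes a plain L1 dissimilarity between the resulting probability masses. The rebinning step is what makes general histograms comparable; the L1 distance is a placeholder, not the Ichino-Yaguchi-based measure itself.

```python
import numpy as np

def rebin(edges, counts, common_edges):
    """Redistribute bin mass onto a finer common grid, assuming uniform
    density inside each original bin."""
    density = counts / (np.diff(edges) * counts.sum())   # probability density
    out = np.zeros(len(common_edges) - 1)
    for i, (lo, hi) in enumerate(zip(common_edges[:-1], common_edges[1:])):
        j = np.searchsorted(edges, lo, side="right") - 1
        if 0 <= j < len(density):
            out[i] = density[j] * (hi - lo)              # mass in this sub-bin
    return out

def l1_dissimilarity(e1, c1, e2, c2):
    common = np.union1d(e1, e2)
    return np.abs(rebin(e1, np.asarray(c1, float), common)
                  - rebin(e2, np.asarray(c2, float), common)).sum()

# Two histograms with different numbers of bins and different bin widths
print(l1_dissimilarity(np.array([0., 2., 5., 10.]), [3, 5, 2],
                       np.array([0., 1., 4., 6., 10.]), [1, 4, 3, 2]))
```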
Procedia PDF Downloads 162806 Distances over Incomplete Diabetes and Breast Cancer Data Based on Bhattacharyya Distance
Authors: Loai AbdAllah, Mahmoud Kaiyal
Abstract:
Missing values in real-world datasets are a common problem. Many algorithms have been developed to deal with this problem; most of them replace the missing values with a fixed value computed from the observed values. In our work, we use a distance function based on the Bhattacharyya distance, which measures the similarity of two probability distributions, to measure the distance between objects with missing values. The proposed distance distinguishes between known and unknown values: the distance between two known values is the Mahalanobis distance, while, when one of them is missing, the distance is computed based on the distribution of the known values for the coordinate that contains the missing value. This method was integrated with Wikaya, a digital health company developing a platform that helps to improve the prevention of chronic diseases such as diabetes and cancer. For Wikaya's recommendation system to work, distances between users need to be measured; since there are missing values in the collected data, a distance function over incomplete user profiles is needed. To evaluate the accuracy of the proposed distance function in reflecting the actual similarity between objects when some of them contain missing values, we integrated it within the framework of the k-nearest-neighbors (kNN) classifier, since its computation is based only on the similarity between objects. To validate this, we ran the algorithm over the diabetes and breast cancer datasets, standard benchmark datasets from the UCI repository. Our experiments show that the kNN classifier using our proposed distance function outperforms kNN using other existing methods.
Keywords: missing values, incomplete data, distance, incomplete diabetes data
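The paper builds its distance on the Bhattacharyya formulation; the sketch below is a simplified per-coordinate version under an independence assumption, not the authors' exact function. Known-known pairs contribute a standardized squared distance (a diagonal stand-in for Mahalanobis), and a missing coordinate contributes its expected squared distance under the feature's empirical distribution, E[(x - X)^2] = (x - mu)^2 + sigma^2. A sentinel value encodes missingness because scikit-learn rejects NaN inputs.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
mask = rng.random(X.shape) < 0.1               # inject 10% missing values

mu = np.array([X[~mask[:, j], j].mean() for j in range(X.shape[1])])
var = np.array([X[~mask[:, j], j].var() for j in range(X.shape[1])])

MISSING = -999.0                               # sentinel instead of NaN
X[mask] = MISSING

def incomplete_distance(u, v):
    d = 0.0
    for j in range(len(u)):
        um, vm = u[j] == MISSING, v[j] == MISSING
        if not um and not vm:
            d += (u[j] - v[j]) ** 2 / var[j]            # standardized squared distance
        elif um and vm:
            d += 2.0                                    # E[(U-V)^2]/sigma^2 for two random draws
        else:                                           # exactly one value missing
            x = v[j] if um else u[j]
            d += ((x - mu[j]) ** 2 + var[j]) / var[j]   # E[(x-X)^2] = (x-mu)^2 + sigma^2
    return np.sqrt(d)

knn = KNeighborsClassifier(n_neighbors=5, metric=incomplete_distance,
                           algorithm="brute").fit(X, y)
print("training accuracy:", knn.score(X, y))
```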
805 Detecting Memory-Related Gene Modules in sc/snRNA-seq Data by Deep-Learning
Authors: Yong Chen
Abstract:
Understanding the detailed molecular mechanisms of memory formation in engram cells is one of the most fundamental questions in neuroscience. Recent single-cell RNA-seq (scRNA-seq) and single-nucleus RNA-seq (snRNA-seq) techniques have allowed us to explore sparsely activated engram ensembles, enabling access to the molecular mechanisms that underlie experience-dependent memory formation and consolidation. However, the absence of specific and powerful computational methods to detect memory-related genes (modules) and their regulatory relationships in sc/snRNA-seq datasets has severely limited the analysis of the underlying mechanisms and memory coding principles in mammalian brains. Here, we present a deep-learning method named SCENTBOX to detect memory-related gene modules and the causal regulatory relationships among them from sc/snRNA-seq datasets. SCENTBOX first constructs a co-differential expression gene network (CEGN) from case-versus-control sc/snRNA-seq datasets. It then detects highly correlated modules of differentially expressed genes (DEGs) in the CEGN. Deep network embedding and attention-based convolutional neural network strategies are employed to precisely detect regulatory relationships among the DEGs in a module. We applied the method to scRNA-seq datasets of TRAP;Ai14 mouse neurons with fear memory and detected not only known memory-related genes but also modules and potential causal regulations. Our results reveal novel regulations within an interesting module that includes Arc, Bdnf, Creb, Dusp1, Rgs4, and Btg2. Overall, our method provides a general computational tool for processing sc/snRNA-seq data from case-versus-control studies and for the systematic investigation of fear-memory-related gene modules.
Keywords: sc/snRNA-seq, memory formation, deep learning, gene module, causal inference
804 A Pipeline for Detecting Copy Number Variation from Whole Exome Sequencing Using Comprehensive Tools
Authors: Cheng-Yang Lee, Petrus Tang, Tzu-Hao Chang
Abstract:
Copy number variations (CNVs) play an important role in many human diseases, such as autism, schizophrenia, and a number of cancers. Many disease-related variants are found in genome coding regions, and whole exome sequencing (WES) is a cost-effective and powerful technology for detecting variants enriched in exons, with potential applications in clinical settings. Although several algorithms have been developed to detect CNVs using WES, and each has been compared with others on its authors' own samples to find the most suitable method, there has been no consistent dataset across algorithms for evaluating CNV detection ability. Moreover, most algorithms use a command-line interface, which may greatly limit the analysis capability of many laboratories. We create a series of simulated WES datasets from UCSC hg19 chromosome 22, and then evaluate the CNV detection ability of 19 algorithms from the OMICtools database using these simulated datasets, computing the sensitivity, specificity, and accuracy of each algorithm for validation of the exome-derived CNVs. After comparing the 19 algorithms, we construct a platform that installs all of them in a virtual machine, such as VirtualBox, which can be set up conveniently on local computers, and we create a simple script that makes it easy to detect CNVs using the algorithms selected by users. We also build a table elaborating the properties of all the algorithms, such as input requirements and CNV detection ability, giving users a specification for choosing the optimum algorithm.
Keywords: whole exome sequencing, copy number variations, omictools, pipeline
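A hypothetical sketch of the evaluation step: given the simulated truth and a caller's output as per-bin CNV labels along chromosome 22, sensitivity, specificity, and accuracy follow directly from the confusion counts (bin size and coordinates are illustrative):

```python
import numpy as np

def evaluate_calls(truth, calls):
    """truth/calls: boolean arrays, one entry per genomic bin (True = CNV)."""
    tp = np.sum(truth & calls)
    tn = np.sum(~truth & ~calls)
    fp = np.sum(~truth & calls)
    fn = np.sum(truth & ~calls)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
    }

# Simulated chr22 split into 1 kb bins, one known deletion, one caller's result
n_bins = 51_304                     # ~51.3 Mb of chr22 at 1 kb resolution
truth = np.zeros(n_bins, dtype=bool); truth[10_000:10_050] = True
calls = np.zeros(n_bins, dtype=bool); calls[10_010:10_060] = True
print(evaluate_calls(truth, calls))
```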
803 Chemical Life Cycle Alternative Assessment as a Green Chemical Substitution Framework: A Feasibility Study
Authors: Sami Ayad, Mengshan Lee
Abstract:
The Sustainable Development Goals (SDGs) were designed as the best possible blueprint to achieve peace, prosperity, and, overall, a better and more sustainable future for the Earth and all its people, and such a blueprint is needed more than ever. The SDGs face many hurdles that may prevent them from becoming reality; one such hurdle, arguably, is the chemical pollution and unintended chemical impacts generated through the production of the various goods and resources we consume. Chemical alternatives assessment has proven to be a viable solution for chemical pollution management, filtering out hazardous chemicals in favor of greener alternatives. However, current substitution practice lacks crucial quantitative datasets (exposures and life cycle impacts) needed to ensure that no unintended trade-offs occur in the substitution process. A Chemical Life Cycle Alternative Assessment (CLiCAA) framework is proposed as a reliable and replicable alternative to Life Cycle Based Alternative Assessment (LCAA), as it integrates chemical molecular structure analysis and the Chemical Life Cycle Collaborative (CLiCC) web-based tool to fill the data gaps that the former frameworks suffer from. The CLiCAA framework consists of four filtering layers, the first two mandatory and the final two optional assessment and data extrapolation steps. Each layer covers relevant impact categories for each chemical, ranging from human to environmental impacts, which are assessed and aggregated into unique scores for overall comparable results, with little to no data required. A feasibility study demonstrates the efficiency and accuracy of CLiCAA while bridging cancer potency and exposure limit data, hoping to provide the necessary categorical impact information for every firm possible, especially those disadvantaged in research and resource management.
Keywords: chemical alternative assessment, LCA, LCAA, CLiCC, CLiCAA, chemical substitution framework, cancer potency data, chemical molecular structure analysis
802 Off-Topic Text Detection System Using a Hybrid Model
Authors: Usama Shahid
Abstract:
Be it written documents, news columns, or students' essays, verifying content can be a time-consuming task. Apart from spelling and grammar mistakes, the proofreader is also supposed to verify whether the content included in the essay or document is relevant. Irrelevant content in a document or essay is referred to as off-topic text, and in this paper we address the problem of off-topic text detection from a document using machine learning techniques. Our study aims to identify off-topic content in a document using an echo state network model, and we also compare the results with other models. A previous study used convolutional neural networks and TF-IDF to detect off-topic text. We rearrange the existing datasets and take new classifiers along with new word embeddings, implementing them on existing and new datasets in order to compare the results with the previously existing CNN model.
Keywords: off-topic text, text detection, echo state network, machine learning
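A minimal echo state network sketch for this kind of text classification, assuming documents have already been mapped to sequences of embedding vectors. The defining ESN property is that the random reservoir stays fixed and only the linear readout is trained; all sizes and the toy data are illustrative.

```python
import numpy as np
from sklearn.linear_model import RidgeClassifier

rng = np.random.default_rng(0)
n_in, n_res = 50, 300                       # embedding dim, reservoir size

W_in = rng.uniform(-0.5, 0.5, (n_res, n_in))
W = rng.normal(size=(n_res, n_res))
W *= 0.9 / np.max(np.abs(np.linalg.eigvals(W)))   # spectral radius 0.9 < 1

def reservoir_state(seq):
    """Run a sequence of word vectors through the reservoir; return final state."""
    x = np.zeros(n_res)
    for u in seq:
        x = np.tanh(W_in @ u + W @ x)
    return x

# Toy data: 100 'documents' of 20 word vectors each, two classes
docs = [rng.normal(loc=c, size=(20, n_in)) for c in (0, 1) for _ in range(50)]
labels = np.array([0] * 50 + [1] * 50)

states = np.array([reservoir_state(d) for d in docs])
clf = RidgeClassifier().fit(states, labels)        # only the readout is trained
print("training accuracy:", clf.score(states, labels))
```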
801 Training a Neural Network Using Input Dropout with Aggressive Reweighting (IDAR) on Datasets with Many Useless Features
Authors: Stylianos Kampakis
Abstract:
This paper presents a new algorithm for neural networks called Input Dropout with Aggressive Re-weighting (IDAR), aimed specifically at datasets with many useless features. IDAR combines two techniques, dropout of input neurons and aggressive re-weighting, in order to eliminate the influence of noisy features; the technique can be seen as a generalization of dropout. The algorithm is tested on two different benchmark datasets: a noisy version of the iris dataset and the MADELON dataset. Its performance is compared against three other popular techniques for dealing with useless features: L2 regularization, LASSO, and random forests. The results demonstrate that IDAR can be an effective technique for handling datasets with many useless features.
Keywords: neural networks, feature selection, regularization, aggressive re-weighting
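The paper defines IDAR precisely; as a rough illustration of the re-weighting half only (scikit-learn's MLP offers no input dropout), the sketch below trains a small network, scores each input by the L2 norm of its first-layer weights, and aggressively down-weights the weakest inputs before retraining. This is one plausible reading of "aggressive re-weighting", not the authors' algorithm.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

# 10 informative features buried among 90 useless ones
X, y = make_classification(n_samples=600, n_features=100, n_informative=10,
                           n_redundant=0, random_state=0)

weights = np.ones(X.shape[1])
for _ in range(3):                                       # a few re-weighting rounds
    train_w = weights.copy()
    net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=300,
                        random_state=0).fit(X * train_w, y)
    importance = np.linalg.norm(net.coefs_[0], axis=1)   # per-input weight norm
    weights *= (importance / importance.max()) ** 2      # "aggressive" squashing

kept = weights > 0.1 * weights.max()
print(f"{kept.sum()} of {len(kept)} inputs retain significant weight")
print("final training accuracy:", net.score(X * train_w, y))
```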
800 Machine Learning Model to Predict TB Bacteria-Resistant Drugs from TB Isolates
Authors: Rosa Tsegaye Aga, Xuan Jiang, Pavel Vazquez Faci, Siqing Liu, Simon Rayner, Endalkachew Alemu, Markos Abebe
Abstract:
Tuberculosis (TB) is a major cause of disease globally. In most cases, TB is treatable and curable, but only with proper treatment. Drug-resistant TB occurs when the bacteria become resistant to the drugs used to treat TB. Current strategies for identifying drug-resistant TB are laboratory-based and take a long time to identify the resistant bacteria so that the patient can be treated accordingly, but machine learning (ML) and data science can offer new approaches to the problem. In this study, we propose to develop an ML-based model that predicts the antibiotic resistance phenotypes of TB isolates in minutes, so that the right treatment can be given to the patient immediately. The study used whole genome sequence (WGS) data of TB isolates, extracted from the NCBI repository and containing samples from different countries, as training data to build the ML models. Samples from different countries were included to generalize over the large group of TB isolates from different regions of the world; this exposes the model to different behaviors of the TB bacteria and makes it robust. Model training considered three types of information extracted from the WGS data: all variants found within the candidate genes (F1), predetermined resistance-associated variants (F2), and the resistance-associated gene information for the particular drug. Two major datasets were constructed using this information: F1 and F2 were treated as two independent datasets, and the third type of information was used as the class label for both. Five machine learning algorithms were considered for training: Support Vector Machine (SVM), Random Forest (RF), Logistic Regression (LR), Gradient Boosting, and AdaBoost. Models were trained on F1, F2, and F1F2, the merged F1 and F2 dataset. Additionally, an ensemble approach was used: the F1 and F2 datasets were run through the gradient boosting algorithm, their outputs were combined into a single dataset, called the F1F2 ensemble dataset, and a model was trained on it with each of the five algorithms. As the experiments show, the ensemble-approach model trained on the gradient boosting outputs outperformed the rest of the models. In conclusion, this study suggests the ensemble approach, that is, the RF + gradient boosting model, for predicting the antibiotic resistance phenotypes of TB isolates, as it outperformed the rest of the models.
Keywords: machine learning, MTB, WGS, drug-resistant TB
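A hedged sketch of the described two-stage ensemble: out-of-fold gradient-boosting probabilities from two feature views (stand-ins for F1 and F2) are concatenated into an "ensemble dataset" on which a second-stage random forest is trained. Synthetic data replaces the WGS variant matrices, and the split into views is arbitrary.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict, train_test_split

# Synthetic stand-ins for the F1 (candidate-gene variants) and F2
# (known resistance variants) feature matrices of the same isolates
X, y = make_classification(n_samples=500, n_features=60, n_informative=15,
                           random_state=0)
F1, F2 = X[:, :40], X[:, 40:]

# Stage 1: out-of-fold GB probabilities per view (avoids leakage into stage 2)
p1 = cross_val_predict(GradientBoostingClassifier(random_state=0), F1, y,
                       cv=5, method="predict_proba")[:, 1]
p2 = cross_val_predict(GradientBoostingClassifier(random_state=0), F2, y,
                       cv=5, method="predict_proba")[:, 1]
F1F2_ensemble = np.column_stack([p1, p2])

# Stage 2: random forest on the ensemble dataset
Xtr, Xte, ytr, yte = train_test_split(F1F2_ensemble, y, random_state=0)
rf = RandomForestClassifier(random_state=0).fit(Xtr, ytr)
print("held-out accuracy:", rf.score(Xte, yte))
```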
799 Exploring Time-Series Phosphoproteomic Datasets in the Context of Network Models
Authors: Sandeep Kaur, Jenny Vuong, Marcel Julliard, Sean O'Donoghue
Abstract:
Time-series data are useful for modelling as they enable model evaluation. However, when reconstructing models from phosphoproteomic data, non-exact methods are often utilised, as knowledge of the network structure, such as which kinases and phosphatases lead to the observed phosphorylation state, is incomplete. Such reactions are therefore often hypothesised, which gives rise to uncertainty. Here, we propose a framework, implemented via a web-based tool (as an extension to Minardo), which, given time-series phosphoproteomic datasets, can generate κ models. The incompleteness and uncertainty in the generated model and reactions are clearly presented to the user visually. Furthermore, we demonstrate, via a toy EGF signalling model, the use of algorithmic verification to verify κ models: manually formulated requirements were evaluated against the model, leading to the highlighting of the nodes causing unsatisfiability (i.e., error-causing nodes). We aim to integrate such methods into our web-based tool and to demonstrate how the identified erroneous nodes can be presented to the user visually. Thus, in this research, we present a framework that enables a user to explore phosphoproteomic time-series data in the context of models. The observer can visualise which reactions in the model are highly uncertain and which nodes cause incorrect simulation outputs. A tool such as this enables an end-user to determine which empirical analysis to perform to reduce uncertainty in the presented model, thus enabling a better understanding of the underlying system.
Keywords: κ models, model verification, time-series phosphoproteomic datasets, uncertainty and error visualisation
798 Patient-Specific Modeling Algorithm for Medical Data Based on AUC
Authors: Guilherme Ribeiro, Alexandre Oliveira, Antonio Ferreira, Shyam Visweswaran, Gregory Cooper
Abstract:
Patient-specific models are instance-based learning algorithms that take advantage of the particular features of the patient case at hand to predict an outcome. We introduce two patient-specific algorithms based on the decision tree paradigm that use the AUC as the metric for selecting an attribute. We apply the patient-specific algorithms to predict outcomes in several datasets, including medical datasets. Compared to the entropy-based patient-specific decision path (PSDP) and CART methods, the AUC-based patient-specific decision path models performed equivalently on area under the ROC curve (AUC). Our results provide support for patient-specific methods as a promising approach for making clinical predictions.
Keywords: instance-based approach, area under the ROC curve, patient-specific decision path, clinical predictions
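A minimal sketch of the core step the abstract describes, AUC-driven attribute selection: each candidate attribute is scored by how well its raw values rank the outcome (roc_auc_score on a single column), and the best-scoring attribute is chosen for the next split. The dataset is a standard stand-in, not the authors' data.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import roc_auc_score

X, y = load_breast_cancer(return_X_y=True)
feature_names = load_breast_cancer().feature_names

def select_attribute_by_auc(X, y):
    """Score each attribute by the AUC of its values as a ranking of y.
    AUC is symmetric around 0.5, so take max(auc, 1 - auc)."""
    scores = []
    for j in range(X.shape[1]):
        auc = roc_auc_score(y, X[:, j])
        scores.append(max(auc, 1 - auc))
    return int(np.argmax(scores)), max(scores)

best_j, best_auc = select_attribute_by_auc(X, y)
print(f"split on '{feature_names[best_j]}' (AUC = {best_auc:.3f})")
```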
797 Index t-SNE: Tracking Dynamics of High-Dimensional Datasets with Coherent Embeddings
Authors: Gaelle Candel, David Naccache
Abstract:
t-SNE is an embedding method that the data science community has widely adopted. It supports two main tasks: displaying results by coloring items according to class or feature value, and, for forensics, giving a first overview of a dataset's distribution. Two interesting characteristics of t-SNE are its structure-preservation property and its answer to the crowding problem, in which all neighbors in high-dimensional space cannot be represented correctly in low-dimensional space. t-SNE preserves the local neighborhood, and similar items are nicely spaced by adjusting to the local density. These two characteristics produce a meaningful representation, in which a cluster's area is proportional to its size in number of items and relationships between clusters are materialized by closeness in the embedding. The algorithm is non-parametric: the transformation from high- to low-dimensional space is computed but not learned, so two initializations of the algorithm lead to two different embeddings. In a forensic setting, analysts would like to compare two or more datasets using their embeddings. A naive approach would be to embed all datasets together; however, this is costly, as the complexity of t-SNE is quadratic, and it would be infeasible for too many datasets. Another approach would be to learn a parametric model over an embedding built from a subset of the data. While this approach is highly scalable, points could be mapped to exactly the same position, making them indistinguishable, and such a model would be unable to adapt to new outliers or to concept drift. This paper presents a methodology for reusing an embedding to create a new one in which cluster positions are preserved. The optimization process minimizes two costs, one relative to the embedding's shape and the second relative to the match with the support embedding. The embedding-with-support process can be repeated more than once with each newly obtained embedding, and the successive embeddings can be used to study the impact of one variable on the dataset's distribution or to monitor changes over time. The method has the same complexity as t-SNE per embedding, and memory requirements are only doubled. For a dataset of n elements sorted and split into k subsets, the total embedding complexity is reduced from O(n²) to O(n²/k), and the memory requirement from n² to 2(n/k)², which enables computation on recent laptops. The method showed promising results on a real-world dataset, allowing the birth, evolution, and death of clusters to be observed. The proposed approach facilitates identifying significant trends and changes, empowering the monitoring of high-dimensional datasets' dynamics.
Keywords: concept drift, data visualization, dimension reduction, embedding, monitoring, reusability, t-SNE, unsupervised learning
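The paper's coupled cost function is its own; a crude approximation with off-the-shelf scikit-learn is to anchor each new batch's initialization to the previous embedding via nearest neighbors, so successive embeddings keep clusters in roughly coherent positions. This illustrates the reuse idea only, not the authors' method.

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.manifold import TSNE
from sklearn.neighbors import NearestNeighbors

# Support batch: embed normally
X_support, _ = make_blobs(n_samples=500, centers=4, random_state=0)
emb_support = TSNE(n_components=2, random_state=0).fit_transform(X_support)

# New batch: initialize each point at its nearest support point's coordinates,
# so clusters start (and tend to stay) where the support embedding put them
X_new, _ = make_blobs(n_samples=500, centers=4, random_state=1)
nn = NearestNeighbors(n_neighbors=1).fit(X_support)
idx = nn.kneighbors(X_new, return_distance=False).ravel()
jitter = np.random.default_rng(0).normal(scale=1e-4, size=(len(X_new), 2))
emb_new = TSNE(n_components=2, init=emb_support[idx] + jitter,
               random_state=0).fit_transform(X_new)
```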
796 Text Emotion Recognition by Multi-Head Attention Based Bidirectional LSTM Utilizing Multi-Level Classification
Authors: Vishwanath Pethri Kamath, Jayantha Gowda Sarapanahalli, Vishal Mishra, Siddhesh Balwant Bandgar
Abstract:
Recognition of emotional information is essential in any form of communication. The growth of HCI (human-computer interaction) in recent times indicates the importance of understanding the emotions expressed, which becomes crucial for improving the system or the interaction itself. In this research work, textual data are used for emotion recognition. Text, being the least expressive of the multimodal resources, poses various challenges, such as limited contextual information and the sequential nature of language construction. We propose a neural architecture to recognize no fewer than eight emotions from textual data derived from multiple datasets, using Google's pre-trained word2vec word embeddings and a multi-head attention-based bidirectional LSTM model with one-vs-all multi-level classification. The emotions targeted in this research are anger, disgust, fear, guilt, joy, sadness, shame, and surprise. Textual data from multiple datasets, such as the ISEAR, GoEmotions, and Affect datasets, were used to create the emotion dataset, with overlapping or conflicting samples handled by careful preprocessing. Our results show a significant improvement with the proposed architecture, with as much as a 10-point improvement in recognizing some emotions.
Keywords: text emotion recognition, bidirectional LSTM, multi-head attention, multi-level classification, Google word2vec word embeddings
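A minimal Keras sketch of the named architecture: embeddings feed a bidirectional LSTM, a multi-head attention layer attends over its outputs, and eight sigmoid units give the one-vs-all heads. Vocabulary size, dimensions, and head count are illustrative, and a trainable embedding layer stands in for the pre-trained word2vec vectors.

```python
import tensorflow as tf
from tensorflow.keras import layers

VOCAB, SEQ_LEN, EMB_DIM, N_EMOTIONS = 20_000, 80, 300, 8

inputs = layers.Input(shape=(SEQ_LEN,), dtype="int32")
x = layers.Embedding(VOCAB, EMB_DIM)(inputs)       # load word2vec weights here
x = layers.Bidirectional(layers.LSTM(128, return_sequences=True))(x)
x = layers.MultiHeadAttention(num_heads=4, key_dim=64)(x, x)  # self-attention
x = layers.GlobalAveragePooling1D()(x)
x = layers.Dropout(0.3)(x)
outputs = layers.Dense(N_EMOTIONS, activation="sigmoid")(x)   # one-vs-all heads

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```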
795 Use of Interpretable Evolved Search Query Classifiers for Sinhala Documents
Authors: Prasanna Haddela
Abstract:
Document analysis is a well-matured yet still active research field, partly as a result of the intricate nature of building computational tools, but also due to the inherent problems arising from the variety and complexity of human languages. Breaking down language barriers is vital in enabling access to a number of recent technologies. This paper investigates the application of document classification methods to new Sinhala datasets. This language is geographically isolated and rich in its own unique features. We examine the interpretability of the classification models, with a particular focus on the use of evolved Lucene search queries, generated using a genetic algorithm (GA), as a method of document classification, and we compare the accuracy and interpretability of these search queries with those of other popular classifiers. The results are promising and are roughly in line with previous work on English-language datasets.
Keywords: evolved search queries, Sinhala document classification, Lucene Sinhala analyzer, interpretable text classification, genetic algorithm
794 IoT-Based Early Identification of Guava (Psidium guajava) Leaf and Fruit Diseases
Authors: Daudi S. Simbeye, Mbazingwa E. Mkiramweni
Abstract:
Plant diseases have the potential to drastically diminish the quantity and quality of agricultural products. Guava (Psidium guajava), sometimes known as the apple of the tropics, is one of the most widely cultivated fruits in tropical regions. Monitoring plant health and diagnosing illnesses are essential for sustainable agriculture and require inspection of the visually evident patterns on plant leaves and fruits. Because of the minor variations among the symptoms of various guava illnesses, a professional opinion is required for disease diagnosis, and erroneous diagnoses may lead to improper pesticide application by farmers and result in economic losses. This study proposes a method that uses artificial intelligence (AI) to detect and classify the most widespread guava diseases by comparing images of leaves and fruits against datasets. An ESP32-CAM module is responsible for data collection, capturing images of guava leaves and fruits; these images form the datasets against which new samples are compared, supporting the diagnosis of plant diseases through the leaves and fruits, which is vital for the development of an effective automated agricultural system. The system test yielded highly accurate identification results: 99 percent accuracy in differentiating four guava fruit diseases (canker, mummification, dot, and rust) from healthy fruit. The proposed model has been interfaced with a mobile application so that smartphones can make a quick and reliable judgment, helping farmers instantly detect diseases and prevent future production losses by taking precautions beforehand.
Keywords: early identification, guava plants, fruit diseases, deep learning
793 Exploring the Applications of Neural Networks in the Adaptive Learning Environment
Authors: Baladitya Swaika, Rahul Khatry
Abstract:
Computer adaptive tests (CATs) are one of the most efficient ways of testing the cognitive abilities of students. CATs are based on item response theory (IRT), which rests on item selection and ability estimation using the statistical methods of maximum-information selection (or selection from the posterior) and maximum-likelihood (ML) or maximum a posteriori (MAP) estimators, respectively. This study aims at combining classical and Bayesian approaches to IRT to create a dataset that is then fed to a neural network, which automates the process of ability estimation, and at comparing the result to traditional CAT models designed using IRT. The study uses Python as the base coding language, pymc for statistical modelling of the IRT, and scikit-learn for the neural network implementation. On creation of the model and on comparison, it is found that the neural-network-based model performs 7-10% worse than the IRT model for score estimation. Although it performs worse, the neural network model can be used beneficially in back-ends to reduce time complexity: the IRT model has to re-calculate the ability every time it receives a request, whereas the prediction from a neural network can be done in a single step by an existing trained regressor. This study also proposes a new kind of framework whereby the neural network model incorporates feature sets beyond the normal IRT feature set, using a neural network's capacity to learn unknown functions to give rise to better CAT models. Categorical features, such as test type, could be learnt and incorporated into IRT functions with the help of techniques like logistic regression, yielding learned functions expressed as models that may not be trivial to express via equations. Such a framework, when implemented, would be highly advantageous in psychometrics and cognitive assessment. This study gives a brief overview of how neural networks can be used in adaptive testing, not only by reducing time complexity but also by incorporating newer and richer feature sets, eventually leading to higher-quality testing.
Keywords: computer adaptive tests, item response theory, machine learning, neural networks
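An illustrative sketch of the core idea: simulate responses under the two-parameter logistic (2PL) IRT model, P(correct) = 1 / (1 + exp(-a(θ - b))), then train a neural regressor to map a response pattern directly to the ability θ, replacing per-request ML/MAP estimation with a single forward pass. Sizes and distributions are assumptions, not the study's setup.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_examinees, n_items = 5000, 30
a = rng.uniform(0.5, 2.0, n_items)          # item discriminations
b = rng.normal(0, 1, n_items)               # item difficulties
theta = rng.normal(0, 1, n_examinees)       # true abilities

# 2PL response probabilities and simulated 0/1 responses
p = 1.0 / (1.0 + np.exp(-a * (theta[:, None] - b)))
R = (rng.random(p.shape) < p).astype(float)

Xtr, Xte, ttr, tte = train_test_split(R, theta, random_state=0)
net = MLPRegressor(hidden_layer_sizes=(64, 32), max_iter=500,
                   random_state=0).fit(Xtr, ttr)

# One forward pass replaces an iterative ML/MAP ability estimate per request
rmse = np.sqrt(np.mean((net.predict(Xte) - tte) ** 2))
print(f"ability-estimation RMSE: {rmse:.3f}")
```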
792 Machine Learning Application in Shovel Maintenance
Authors: Amir Taghizadeh Vahed, Adithya Thaduri
Abstract:
Shovels are the main components of the mining transportation system, and the productivity of a mine depends on their availability, given their high capital and operating costs. The unplanned failure or shutdown of a shovel results in higher repair costs and increased downtime, as well as increased indirect costs (i.e., loss of production and damage to the company's reputation). To mitigate these failures, predictive maintenance based on failure prediction can be a useful approach. Modern mining machinery such as shovels automatically collects huge datasets consisting of reliability and maintenance data; however, the gathered datasets are useless until information and knowledge are extracted from them. Machine learning, as well as data mining, which has played a major role in recent studies, has been used for this knowledge discovery process. In this study, data mining and machine learning approaches are implemented to detect not only anomalies but also patterns in a dataset, and further to detect failures.
Keywords: maintenance, machine learning, shovel, condition-based monitoring
791 Image Instance Segmentation Using Modified Mask R-CNN
Authors: Avatharam Ganivada, Krishna Shah
Abstract:
Mask R-CNN, recently introduced by the Facebook AI Research (FAIR) team, is mainly concerned with instance segmentation in images. The original Mask R-CNN is based on ResNet and a feature pyramid network (FPN), and employs a single dropout method. This paper provides a modified Mask R-CNN that adds multiple dropout methods to the network. The proposed model also utilizes the concepts of ResNet and FPN to extract stage-wise network feature maps, wherein a top-down network path with lateral connections is used to obtain semantically strong features. The proposed model produces three outputs for each object in an image: a class label, bounding-box coordinates, and an object mask. The performance of the proposed network is evaluated on the segmentation of every instance in images from the COCO and Cityscapes datasets, where it achieves better performance than state-of-the-art networks.
Keywords: instance segmentation, object detection, convolutional neural networks, deep learning, computer vision
790 Wireless Sensor Anomaly Detection Using Soft Computing
Authors: Mouhammd Alkasassbeh, Alaa Lasasmeh
Abstract:
We live in an era of rapid development as a result of significant scientific growth, and, like other technologies, wireless sensor networks (WSNs) play one of the main roles. Built on WSNs, ZigBee adds many features to devices, such as minimal cost and power consumption and increased range and connectivity of sensor nodes. ZigBee technology has come to be used in various fields, including science, engineering, and networking, and even in the medical aspects of intelligent buildings. In this work, we generated two main datasets, the first based on a tree topology and the second on a star topology. The datasets were evaluated with three machine learning (ML) algorithms: J48, meta.j48, and the multilayer perceptron (MLP). For each topology, network traffic was classified as normal or abnormal (attack). The datasets used in our work contained simulated data from Network Simulator 2 (NS2). On each dataset, the Bayesian-network-based meta.j48 classifier achieved the highest accuracy among the classifiers, at 99.7% and 99.2%, respectively.
Keywords: IDS, machine learning, WSN, ZigBee technology
789 Acute Hepatitis A Outbreak in Men Who Have Sex with Men in a Medical Center in Northern Taiwan
Authors: Yu-Tzu Hsu, Alice Wu, Hsiang-Kuang Tseng
Abstract:
Introduction: The hepatitis A virus causes acute hepatitis and is usually transmitted by a fecal-oral route through food contamination, which is more prevalent in areas with poor hygienic practices. Here, however, we describe a hepatitis A outbreak associated with a fecal-oral route of sexual behavior in men who have sex with men (MSM) in Northern Taiwan. Methods: We retrospectively collected patients with acute HAV infection at MacKay Memorial Hospital, Taipei, Taiwan, between July 2015 and November 2016. Demographic data (age, gender, onset time, and infection risk), laboratory data (GOT, GPT, bilirubin, HIV status, HBsAg, HCV antibody, and syphilis), clinical symptoms, and history of foreign travel were analyzed, and variables were compared between the HIV and non-HIV groups. Unless otherwise stated, continuous variables are expressed as mean ± SD, and categorical variables as number (percentage) for each item. The t-test was applied to compare continuous variables between the two groups, and the chi-square test was applied to categorical variables as a measure of association. Results: We collected 80 cases during the study period. Among them, 54 (67.5%) were MSM and 43 (53.8%) were HIV-positive. The average age was 32.6 ± 7.59 years. The average initial liver function values were 1324 IU/L for AST (GOT), 2100 IU/L for ALT (GPT), and 5.82 mg/dL for bilirubin. Seven (8.6%) cases were HBV carriers, five (6.3%) were positive for HCV antibody, and 15 (18.6%) were co-infected with syphilis. With regard to associated symptoms, 32 (40%) had fever, 46 (57.5%) had nausea, 34 (42.5%) had abdominal discomfort, and 46 (57.5%) had general malaise. Compared with non-HIV patients, HIV patients were more likely to be male (p=0.008), MSM (p=0.000), and co-infected with syphilis (p=0.000), and their transaminase levels improved more slowly (p=0.033, 0.027). Conclusion: The HAV outbreak in Northern Taiwan mainly occurred in the MSM population. Hereafter, our cohort data support a policy in Taiwan of providing one free dose of HAV vaccine to this population. Hopefully, the outbreak can be stopped by the free vaccine policy and public education.
Keywords: acute hepatitis A, men who have sex with men, human immunodeficiency virus, vaccine
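A short illustration of the two tests named in the methods, using scipy with hypothetical group data (not the study's numbers): ttest_ind for a continuous variable between two groups, and chi2_contingency for a categorical one.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Continuous variable (e.g., age) in two groups -> independent-samples t-test
age_hiv = rng.normal(33, 7, 43)
age_non_hiv = rng.normal(32, 8, 37)
t, p_t = stats.ttest_ind(age_hiv, age_non_hiv)
print(f"t-test: t = {t:.2f}, p = {p_t:.3f}")

# Categorical variable (e.g., syphilis co-infection) -> chi-square test
#           co-infected  not co-infected
table = [[14,           29],   # HIV group (hypothetical counts)
         [ 1,           36]]   # non-HIV group
chi2, p_c, dof, _ = stats.chi2_contingency(table)
print(f"chi-square: chi2 = {chi2:.2f}, dof = {dof}, p = {p_c:.3f}")
```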
788 Knowledge Graph Development to Connect Earth Metadata and Standard English Queries
Authors: Gabriel Montague, Max Vilgalys, Catherine H. Crawford, Jorge Ortiz, Dava Newman
Abstract:
There has never been so much publicly accessible atmospheric and environmental data. The possibilities of these data are exciting, but the sheer volume of available datasets presents a new challenge for researchers: the task of identifying and working with a new dataset has become more difficult with the amount and variety of available data. Datasets are often documented in ways that differ substantially from the common English used to describe the same topics. This presents a barrier not only for new scientists, but also for researchers looking for comparisons across multiple datasets or for specialists from other disciplines hoping to collaborate. This paper proposes a method for addressing this obstacle: creating a knowledge graph to bridge the gap between everyday English and the technical language surrounding these datasets. Knowledge graph generation is already a well-established field, although working with Earth data poses some unique challenges. One is the sheer size of the databases: it would be infeasible to replicate or analyze all the data stored by an organization like the National Aeronautics and Space Administration (NASA) or the European Space Agency. Instead, this approach identifies topics from the metadata available for datasets in NASA's Earthdata database, which can then be used to directly request and access the raw data from NASA. By starting with a single metadata standard, this paper establishes an approach that can be generalized to different databases, but it leaves the challenge of metadata harmonization for future work. Topics generated from the metadata are then linked to topics from a collection of English queries through a variety of standard and custom natural language processing (NLP) methods. The results of this method are compared to a baseline of Elasticsearch applied to the metadata. The comparison shows the benefits of the proposed knowledge graph system over existing methods, particularly in interpreting natural language queries and interpreting topics in metadata. For the research community, this work introduces an application of NLP to the ecological and environmental sciences, expanding the possibilities of how machine learning can be applied in this discipline. Perhaps more importantly, it establishes the foundation for a platform that can let common English access knowledge that previously required considerable effort and experience. By making these public data accessible to the general public, this work has the potential to transform environmental understanding, engagement, and action.
Keywords: Earth metadata, knowledge graphs, natural language processing, question-answer systems
787 Drone Classification Using a Conventional Model with Embedded Audio-Visual Features
Authors: Hrishi Rakshit, Pooneh Bagheri Zadeh
Abstract:
This paper investigates the performance of drone classification using conventional DCNNs with different hyperparameters when additional drone audio data are embedded in the dataset for training and classification. First, a custom dataset is created using drone images from the University of Southern California (USC) and Leeds Beckett University datasets, with embedded drone audio signals. Three well-known DCNN architectures, ResNet50, Darknet53, and ShuffleNet, are employed on the created dataset, tuning hyperparameters such as the learning rate, maximum number of epochs, and mini-batch size with different optimizers. Precision-recall curves and F1-score-versus-threshold curves are used to evaluate the performance of the classification algorithms. Experimental results show that ResNet50 has the highest efficiency compared to the other DCNN methods.
Keywords: drone classification, deep convolutional neural network, hyperparameters, drone audio signal
786 Generating Music with More Refined Emotions
Authors: Shao-Di Feng, Von-Wun Soo
Abstract:
Generating symbolic music with specific emotions is a challenging task because symbolic music datasets with emotion labels are scarce and incomplete. This research aims to generate music with more refined emotions from training datasets labeled only with the four quadrants of Russell's 2D emotion model. We build on the theory of Music FaderNets, mapping arousal and valence to low-level attributes, and construct a symbolic music generation model by combining a transformer with a GM-VAE. We adopt an in-attention mechanism for the model and improve it by allowing modulation by conditional information. We show that the music generation model can control the generation of music according to the emotions specified by users in terms of high-level linguistic expression, by manipulating the corresponding low-level musical attributes. Finally, we evaluate the model's performance using a pre-trained emotion classifier on EMOPIA, a pop piano MIDI dataset, and demonstrate through subjective listening evaluation that the model can correctly generate music with more refined emotions.
Keywords: music generation, music emotion controlling, deep learning, semi-supervised learning
785 Unsupervised Domain Adaptive Text Retrieval with Query Generation
Authors: Rui Yin, Haojie Wang, Xun Li
Abstract:
Recently, mainstream dense retrieval methods have obtained state-of-the-art results on some datasets and tasks. However, they require large amounts of training data, which are not available in most domains, and the severe performance degradation of dense retrievers on new data domains has limited dense retrieval to the few domains with large training datasets. In this paper, we propose an unsupervised domain-adaptation approach based on query generation. First, a generative model is used to generate relevant queries for each passage in the target corpus, and the generated queries are then used to mine negative passages. Finally, the query-passage pairs are labeled with a cross-encoder and used to train a domain-adapted dense retriever. Experiments show that our approach is more robust than previous methods in target domains, while requiring less unlabeled data.
Keywords: dense retrieval, query generation, unsupervised training, text retrieval
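A compressed sketch of the three described stages with the sentence-transformers library, which mirrors the GPL (Generative Pseudo Labeling) recipe this abstract resembles. The model checkpoints are public ones chosen for illustration and are assumptions, not the paper's models; BM25 negative mining is reduced to random in-corpus sampling to keep the sketch short.

```python
import random
from torch.utils.data import DataLoader
from sentence_transformers import (SentenceTransformer, CrossEncoder,
                                   InputExample, losses)
from transformers import pipeline

corpus = ["Passage about glaciers ...", "Passage about monsoons ...",
          "Passage about soil erosion ...", "Passage about permafrost ..."]

# 1) Generate a synthetic query per passage (doc2query-style T5 checkpoint)
qgen = pipeline("text2text-generation",
                model="BeIR/query-gen-msmarco-t5-base-v1")
queries = [qgen(p, max_length=32)[0]["generated_text"] for p in corpus]

# 2) Mine a negative passage per query (random stand-in; BM25 in practice)
negatives = [random.choice([c for c in corpus if c != p]) for p in corpus]

# 3) Pseudo-label (query, positive, negative) triples with a cross-encoder margin
ce = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
train = [InputExample(texts=[q, pos, neg],
                      label=float(ce.predict([(q, pos)])[0]
                                  - ce.predict([(q, neg)])[0]))
         for q, pos, neg in zip(queries, corpus, negatives)]

# 4) Train the domain-adapted dense retriever on the pseudo-labels
model = SentenceTransformer("sentence-transformers/msmarco-distilbert-base-v4")
loader = DataLoader(train, batch_size=2, shuffle=True)
model.fit(train_objectives=[(loader, losses.MarginMSELoss(model))], epochs=1)
```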
784 Content-Aware Image Augmentation for Medical Imaging Applications
Authors: Filip Rusak, Yulia Arzhaeva, Dadong Wang
Abstract:
Machine-learning-based computer-aided diagnosis (CAD) is gaining popularity in medical imaging and diagnostic radiology. However, it requires large amounts of high-quality, labeled training images. The training images may come from different sources, be acquired by radiography machines from different manufacturers, or be digital or digitized copies of film radiographs, with various sizes as well as different pixel intensity distributions. In this paper, a content-aware image augmentation method is presented to deal with these variations. The results of the proposed method have been validated graphically by plotting the removed and added seams of pixels on the original images. Two different chest X-ray (CXR) datasets are used in the experiments; the CXRs in the datasets differ in size, and some are digital CXRs while others are digitized from analog CXR films. With the proposed content-aware augmentation method, the seam carving algorithm is employed to resize CXRs and the corresponding labels in the form of image masks, followed by histogram matching to normalize the pixel intensities of the digital radiographs based on the pixel intensity values of the digitized radiographs. We implemented the algorithms, resized the well-known Montgomery dataset to the size of the most frequently used Japanese Society of Radiological Technology (JSRT) dataset, and normalized our digital CXRs for testing. This work resulted in a unified, off-the-shelf CXR dataset composed of the radiographs included in both the Montgomery and JSRT datasets. The experimental results show that even when the amount of augmentation is large, our algorithm preserves the important information in lung fields, local structures, and the global visual effect adequately. The proposed method can be used to augment training and testing image datasets so that a trained machine learning model can process CXRs from various sources, and it can potentially be used broadly in any medical imaging application.
Keywords: computer-aided diagnosis, image augmentation, lung segmentation, medical imaging, seam carving
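The histogram-matching half of the pipeline is available off the shelf in scikit-image; below is a minimal sketch with synthetic images standing in for the digital and digitized CXRs. The seam carving step is omitted here, since, to our understanding, recent scikit-image releases dropped their seam_carve implementation.

```python
import numpy as np
from skimage.exposure import match_histograms

rng = np.random.default_rng(0)

# Stand-ins: a 'digital' CXR with one intensity profile and a 'digitized
# film' reference with another
digital = rng.normal(0.6, 0.15, (256, 256)).clip(0, 1)
film_reference = rng.beta(2, 5, (256, 256))

# Remap the digital image's intensities onto the reference's distribution
normalized = match_histograms(digital, film_reference)

print("means before/after:", digital.mean(), normalized.mean(),
      "reference:", film_reference.mean())
```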
783 Detection of COVID-19 Cases From X-Ray Images Using Capsule-Based Network
Authors: Donya Ashtiani Haghighi, Amirali Baniasadi
Abstract:
Coronavirus disease (COVID-19) has spread rapidly all over the world since the end of 2019. Computed tomography (CT) scans and X-ray images are used to detect this disease. Different deep neural network (DNN)-based diagnosis solutions have been developed, mainly based on convolutional neural networks (CNNs), to accelerate the identification of COVID-19 cases. However, CNNs lose important information in intermediate layers and require large datasets. In this paper, a capsule network (CapsNet) is used instead, as capsule networks perform better than CNNs on small datasets. With hyperparameter tuning, the capsule-based framework achieves an accuracy of 0.9885, an F1-score of 0.9883, a precision of 0.9859, a recall of 0.9908, and an area under the curve (AUC) of 0.9948. Moreover, different dropout rates are investigated to decrease overfitting; a dropout rate of 0.1 shows the best results. Finally, we remove one convolution layer and decrease the number of trainable parameters to 146,752, which is a promising result.
Keywords: capsule network, dropout, hyperparameter tuning, classification
782 3D Building Model Utilizing Airborne LiDAR Dataset and Terrestrial Photographic Images
Authors: J. Jasmee, I. Roslina, A. Mohammed Yaziz & A.H Juazer Rizal
Abstract:
The need for an effective building information collection method is vital to support a diversity of land development activities. At present, remote sensing advances such as airborne LiDAR (light detection and ranging) constitute an established technology for building information collection, using the location and elevation of the reflecting laser points for the construction of 3D building models. In this study, LiDAR datasets and terrestrial photographic images of buildings are explored for the construction of 3D building models. It is found that the quantitative accuracy of the constructed 3D building model in the horizontal and vertical components was ±0.31 m (RMSE x,y) and ±0.145 m (RMSE z), respectively. The accuracies were computed based on sixty-nine (69) horizontal and twenty (20) vertical surveyed points. As for the qualitative assessment, the appearance of the 3D building model is shown to be adequate to support the requirements of LOD3 presentation based on the Open Geospatial Consortium (OGC) CityGML standard.
Keywords: LiDAR datasets, DSM, DTM, 3D building models
781 Evaluation of Ensemble Classifiers for Intrusion Detection
Authors: M. Govindarajan
Abstract:
One of the major developments in machine learning in the past decade is the ensemble method, which finds a highly accurate classifier by combining many moderately accurate component classifiers. In this research work, new ensemble classification methods are proposed: a homogeneous ensemble classifier using bagging and a heterogeneous ensemble classifier using arcing, and their performance is analyzed in terms of accuracy. A classifier ensemble is designed using a radial basis function (RBF) network and a support vector machine (SVM) as base classifiers. The feasibility and benefits of the proposed approaches are demonstrated by means of standard intrusion detection datasets. The main originality of the proposed approach lies in its three main parts: a preprocessing phase, a classification phase, and a combining phase. A wide range of comparative experiments is conducted on standard intrusion detection datasets, comparing the performance of the proposed homogeneous and heterogeneous ensemble classifiers with that of other standard ensemble methods: error-correcting output codes (ECOC) and Dagging for homogeneous ensembles, and majority voting and stacking for heterogeneous ensembles. The proposed ensemble methods provide a significant improvement in accuracy compared to individual classifiers; the proposed bagged RBF and SVM perform significantly better than ECOC and Dagging, and the proposed hybrid RBF-SVM performs significantly better than voting and stacking. Heterogeneous models also exhibit better results than homogeneous models on standard intrusion detection datasets.
Keywords: data mining, ensemble, radial basis function, support vector machine, accuracy
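A small sketch of both ensemble styles with scikit-learn: a homogeneous bagged SVM and a heterogeneous soft-voting ensemble. scikit-learn ships no RBF-network classifier, so an SVC with an RBF kernel and an MLP stand in for the paper's RBF/SVM pair, and a synthetic dataset stands in for the intrusion detection traffic.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, VotingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)

svm = make_pipeline(StandardScaler(), SVC(kernel="rbf", probability=True))
mlp = make_pipeline(StandardScaler(), MLPClassifier(max_iter=500, random_state=0))

# Homogeneous ensemble: bagging over one base classifier
bagged_svm = BaggingClassifier(svm, n_estimators=10, random_state=0)

# Heterogeneous ensemble: soft voting over different classifier families
hybrid = VotingClassifier([("svm", svm), ("mlp", mlp)], voting="soft")

for name, clf in [("bagged SVM", bagged_svm), ("hybrid vote", hybrid)]:
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: {acc:.3f}")
```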
780 Adaptive Neuro Fuzzy Inference System Model Based on Support Vector Regression for Stock Time Series Forecasting
Authors: Anita Setianingrum, Oki S. Jaya, Zuherman Rustam
Abstract:
Forecasting stock prices is a challenging task due to the complex time-series nature of the data, with the complexity arising from the many variables that affect the stock market. Many time-series models have been proposed before, but previous models still have some problems: (1) they depend on a subjective choice of technical indicators, and (2) they rely on assumptions about the variables, so they cannot be applied to all datasets. This paper therefore studies a novel adaptive neuro-fuzzy inference system (ANFIS) time-series model based on support vector regression (SVR) for forecasting the stock market. To evaluate the performance of the proposed models, stock market transaction data for the TAIEX and HSI from January to December 2015 were collected as experimental datasets. The results show that the method outperformed its counterparts in terms of accuracy.
Keywords: ANFIS, fuzzy time series, stock forecasting, SVR
779 Road Accidents Big Data Mining and Visualization Using Support Vector Machines
Authors: Usha Lokala, Srinivas Nowduri, Prabhakar K. Sharma
Abstract:
Useful information has been extracted from road accident data in the United Kingdom (UK) using data analytics methods, with the aim of avoiding possible accidents in rural and urban areas. The analysis makes use of several methodologies, such as data integration, support vector machines (SVMs), correlation machines, and multinomial goodness-of-fit tests. The datasets were imported from the UK Department for Transport with due permission. The information extracted from these huge datasets forms a basis for several predictions, which in turn help avoid unnecessary memory lapses. Since the data are expected to grow continuously over time, this work primarily proposes a new framework model that can be trained on, and adapt itself to, new data and make accurate predictions. This work also throws some light on the use of SVM methodology for text classification on the obtained traffic data. Finally, it emphasizes the uniqueness and adaptability of the SVM methodology, which makes it appropriate for this kind of research work.
Keywords: support vector machines (SVM), machine learning (ML), Department for Transport (DfT)
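A minimal sketch of the SVM classification step on accident-style records: categorical fields are one-hot encoded, the numeric field is scaled, and an SVC predicts a severity class. The columns and data are illustrative, not the DfT schema.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Illustrative accident records (hypothetical, not the actual DfT schema)
df = pd.DataFrame({
    "area": ["urban", "rural", "urban", "rural"] * 25,
    "light": ["day", "night", "night", "day"] * 25,
    "speed_limit": [30, 60, 30, 70] * 25,
    "severity": [0, 1, 0, 1] * 25,          # 0 = slight, 1 = serious
})
X, y = df.drop(columns="severity"), df["severity"]

pre = ColumnTransformer([
    ("cat", OneHotEncoder(), ["area", "light"]),
    ("num", StandardScaler(), ["speed_limit"]),
])
clf = Pipeline([("prep", pre),
                ("svc", SVC(kernel="rbf", class_weight="balanced"))])

Xtr, Xte, ytr, yte = train_test_split(X, y, stratify=y, random_state=0)
print("held-out accuracy:", clf.fit(Xtr, ytr).score(Xte, yte))
```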