Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 701

Search results for: time-series phosphoproteomic datasets

251 Improved Classification Procedure for Imbalanced and Overlapped Situations

Abstract:

The issue with imbalance and overlapping in the class distribution becomes important in various applications of data mining. The imbalanced dataset is a special case in classification problems in which the number of observations of one class (i.e., major class) heavily exceeds the number of observations of the other class (i.e., minor class). Overlapped dataset is the case where many observations are shared together between the two classes. Imbalanced and overlapped data can be frequently found in many real examples including fraud and abuse patients in healthcare, quality prediction in manufacturing, text classification, oil spill detection, remote sensing, and so on. The class imbalance and overlap problem is the challenging issue because this situation degrades the performance of most of the standard classification algorithms. In this study, we propose a classification procedure that can effectively handle imbalanced and overlapped datasets by splitting data space into three parts: nonoverlapping, light overlapping, and severe overlapping and applying the classification algorithm in each part. These three parts were determined based on the Hausdorff distance and the margin of the modified support vector machine. An experiments study was conducted to examine the properties of the proposed method and compared it with other classification algorithms. The results showed that the proposed method outperformed the competitors under various imbalanced and overlapped situations. Moreover, the applicability of the proposed method was demonstrated through the experiment with real data.

Keywords: classification, imbalanced data with class overlap, split data space, support vector machine

Procedia PDF Downloads 279

250 The “Bright Side” of COVID-19: Effects of Livestream Affordances on Consumer Purchase Willingness: Explicit IT Affordances Perspective

Authors: Isaac Owusu Asante, Yushi Jiang, Hailin Tao

Abstract:

Live streaming marketing, the new electronic commerce element, became an optional marketing channel following the COVID-19 pandemic. Many sellers have leveraged the features presented by live streaming to increase sales. Studies on live streaming have focused on gaming and consumers’ loyalty to brands through live streaming, using interview questionnaires. This study, however, was conducted to measure real-time observable interactions between consumers and sellers. Based on the affordance theory, this study conceptualized constructs representing the interactive features and examined how they drive consumers’ purchase willingness during live streaming sessions using 1238 datasets from Amazon Live, following the manual observation of transaction records. Using structural equation modeling, the ordinary least square regression suggests that live viewers, new followers, live chats, and likes positively affect purchase willingness. The Sobel and Monte Carlo tests show that new followers, live chats, and likes significantly mediate the relationship between live viewers and purchase willingness. The study introduces a new way of measuring interactions in live streaming commerce and proposes a way to manually gather data on consumer behaviors in live streaming platforms when the application programming interface (API) of such platforms does not support data mining algorithms.

Keywords: livestreaming marketing, live chats, live viewers, likes, new followers, purchase willingness

Procedia PDF Downloads 45

249 A Spatio-Temporal Analysis and Change Detection of Wetlands in Diamond Harbour, West Bengal, India Using Normalized Difference Water Index

Authors: Lopita Pal, Suresh V. Madha

Abstract:

Wetlands are areas of marsh, fen, peat land or water, whether natural or artificial, permanent or temporary, with water that is static or flowing, fresh, brackish or salt, including areas of marine water the depth of which at low tide does not exceed six metres. The rapidly expanding human population, large scale changes in land use/land cover, burgeoning development projects and improper use of watersheds all has caused a substantial decline of wetland resources in the world. Major degradations have been impacted from agricultural, industrial and urban developments leading to various types of pollutions and hydrological perturbations. Regular fishing activities and unsustainable grazing of animals are degrading the wetlands in a slow pace. The paper focuses on the spatio-temporal change detection of the area of the water body and the main cause of this depletion. The total area under study (22°19’87’’ N, 88°20’23’’ E) is a wetland region in West Bengal of 213 sq.km. The procedure used is the Normalized Difference Water Index (NDWI) from multi-spectral imagery and Landsat to detect the presence of surface water, and the datasets have been compared of the years 2016, 2006 and 1996. The result shows a sharp decline in the area of water body due to a rapid increase in the agricultural practices and the growing urbanization.

Keywords: spatio-temporal change, NDWI, urbanization, wetland

Procedia PDF Downloads 260

248 Advanced Data Visualization Techniques for Effective Decision-making in Oil and Gas Exploration and Production

Authors: Deepak Singh, Rail Kuliev

Abstract:

This research article explores the significance of advanced data visualization techniques in enhancing decision-making processes within the oil and gas exploration and production domain. With the oil and gas industry facing numerous challenges, effective interpretation and analysis of vast and diverse datasets are crucial for optimizing exploration strategies, production operations, and risk assessment. The article highlights the importance of data visualization in managing big data, aiding the decision-making process, and facilitating communication with stakeholders. Various advanced data visualization techniques, including 3D visualization, augmented reality (AR), virtual reality (VR), interactive dashboards, and geospatial visualization, are discussed in detail, showcasing their applications and benefits in the oil and gas sector. The article presents case studies demonstrating the successful use of these techniques in optimizing well placement, real-time operations monitoring, and virtual reality training. Additionally, the article addresses the challenges of data integration and scalability, emphasizing the need for future developments in AI-driven visualization. In conclusion, this research emphasizes the immense potential of advanced data visualization in revolutionizing decision-making processes, fostering data-driven strategies, and promoting sustainable growth and improved operational efficiency within the oil and gas exploration and production industry.

Keywords: augmented reality (AR), virtual reality (VR), interactive dashboards, real-time operations monitoring

Procedia PDF Downloads 51

247 Impact of Drought on Agriculture in the Upper Middle Gangetic Plain in India

Authors: Reshmita Nath

Abstract:

In this study, we investigate the spatiotemporal characteristics of drought in India and its impact on agriculture during the summer season (April to September). For our analysis, we have used Standardized Precipitation Evapotranspiration Index (SPEI) datasets between 1982 and 2012 at six-month timescale. Based on the criteria SPEI<-1 we obtain the vulnerability map and have found that the Humid subtropical Upper Middle Gangetic Plain (UMGP) region is highly drought prone with an occurrence frequency of 40-45%. This UMGP region contributes at least 18-20% of India’s annual cereal production. Not only the probability, but the region becomes more and more drought-prone in the recent decades. Moreover, the cereal production in the UMGP has experienced a gradual declining trend from 2000 onwards and this feature is consistent with the increase in drought affected areas from 20-25% to 50-60%, before and after 2000, respectively. The higher correlation coefficient (-0.69) between the changes in cereal production and drought affected areas confirms that at least 50% of the agricultural (cereal) losses is associated with drought. While analyzing the individual impact of precipitation and surface temperature anomalies on SPEI (6), we have found that in the UMGP region surface temperature plays the primary role in lowering of SPEI. The linkage is further confirmed by the correlation analysis between the SPEI (6) and surface temperature rise, which exhibits strong negative values in the UMGP region. Higher temperature might have caused more evaporation and drying, which therefore increases the area affected by drought in the recent decade.

Keywords: drought, agriculture, SPEI, Indo-Gangetic plain

Procedia PDF Downloads 227

246 Comparative Evaluation of Accuracy of Selected Machine Learning Classification Techniques for Diagnosis of Cancer: A Data Mining Approach

Authors: Rajvir Kaur, Jeewani Anupama Ginige

Abstract:

With recent trends in Big Data and advancements in Information and Communication Technologies, the healthcare industry is at the stage of its transition from clinician oriented to technology oriented. Many people around the world die of cancer because the diagnosis of disease was not done at an early stage. Nowadays, the computational methods in the form of Machine Learning (ML) are used to develop automated decision support systems that can diagnose cancer with high confidence in a timely manner. This paper aims to carry out the comparative evaluation of a selected set of ML classifiers on two existing datasets: breast cancer and cervical cancer. The ML classifiers compared in this study are Decision Tree (DT), Support Vector Machine (SVM), k-Nearest Neighbor (k-NN), Logistic Regression, Ensemble (Bagged Tree) and Artificial Neural Networks (ANN). The evaluation is carried out based on standard evaluation metrics Precision (P), Recall (R), F1-score and Accuracy. The experimental results based on the evaluation metrics show that ANN showed the highest-level accuracy (99.4%) when tested with breast cancer dataset. On the other hand, when these ML classifiers are tested with the cervical cancer dataset, Ensemble (Bagged Tree) technique gave better accuracy (93.1%) in comparison to other classifiers.

Keywords: artificial neural networks, breast cancer, classifiers, cervical cancer, f-score, machine learning, precision, recall

Procedia PDF Downloads 252

245 Human Immunodeficiency Virus (HIV) Test Predictive Modeling and Identify Determinants of HIV Testing for People with Age above Fourteen Years in Ethiopia Using Data Mining Techniques: EDHS 2011

Authors: S. Abera, T. Gidey, W. Terefe

Abstract:

Introduction: Testing for HIV is the key entry point to HIV prevention, treatment, and care and support services. Hence, predictive data mining techniques can greatly benefit to analyze and discover new patterns from huge datasets like that of EDHS 2011 data. Objectives: The objective of this study is to build a predictive modeling for HIV testing and identify determinants of HIV testing for adults with age above fourteen years using data mining techniques. Methods: Cross-Industry Standard Process for Data Mining (CRISP-DM) was used to predict the model for HIV testing and explore association rules between HIV testing and the selected attributes among adult Ethiopians. Decision tree, Naïve-Bayes, logistic regression and artificial neural networks of data mining techniques were used to build the predictive models. Results: The target dataset contained 30,625 study participants; of which 16, 515 (53.9%) were women. Nearly two-fifth; 17,719 (58%), have never been tested for HIV while the rest 12,906 (42%) had been tested. Ethiopians with higher wealth index, higher educational level, belonging 20 to 29 years old, having no stigmatizing attitude towards HIV positive person, urban residents, having HIV related knowledge, information about family planning on mass media and knowing a place where to get testing for HIV showed an increased patterns with respect to HIV testing. Conclusion and Recommendation: Public health interventions should consider the identified determinants to promote people to get testing for HIV.

Keywords: data mining, HIV, testing, ethiopia

Procedia PDF Downloads 467

244 Detection of Atrial Fibrillation Using Wearables via Attentional Two-Stream Heterogeneous Networks

Authors: Huawei Bai, Jianguo Yao, Fellow, IEEE

Abstract:

Atrial fibrillation (AF) is the most common form of heart arrhythmia and is closely associated with mortality and morbidity in heart failure, stroke, and coronary artery disease. The development of single spot optical sensors enables widespread photoplethysmography (PPG) screening, especially for AF, since it represents a more convenient and noninvasive approach. To our knowledge, most existing studies based on public and unbalanced datasets can barely handle the multiple noises sources in the real world and, also, lack interpretability. In this paper, we construct a large- scale PPG dataset using measurements collected from PPG wrist- watch devices worn by volunteers and propose an attention-based two-stream heterogeneous neural network (TSHNN). The first stream is a hybrid neural network consisting of a three-layer one-dimensional convolutional neural network (1D-CNN) and two-layer attention- based bidirectional long short-term memory (Bi-LSTM) network to learn representations from temporally sampled signals. The second stream extracts latent representations from the PPG time-frequency spectrogram using a five-layer CNN. The outputs from both streams are fed into a fusion layer for the outcome. Visualization of the attention weights learned demonstrates the effectiveness of the attention mechanism against noise. The experimental results show that the TSHNN outperforms all the competitive baseline approaches and with 98.09% accuracy, achieves state-of-the-art performance.

Keywords: PPG wearables, atrial fibrillation, feature fusion, attention mechanism, hyber network

Procedia PDF Downloads 87

243 Predicting the Compressive Strength of Geopolymer Concrete Using Machine Learning Algorithms: Impact of Chemical Composition and Curing Conditions

Authors: Aya Belal, Ahmed Maher Eltair, Maggie Ahmed Mashaly

Abstract:

Geopolymer concrete is gaining recognition as a sustainable alternative to conventional Portland Cement concrete due to its environmentally friendly nature, which is a key goal for Smart City initiatives. It has demonstrated its potential as a reliable material for the design of structural elements. However, the production of Geopolymer concrete is hindered by batch-to-batch variations, which presents a significant challenge to the widespread adoption of Geopolymer concrete. To date, Machine learning has had a profound impact on various fields by enabling models to learn from large datasets and predict outputs accurately. This paper proposes an integration between the current drift to Artificial Intelligence and the composition of Geopolymer mixtures to predict their mechanical properties. This study employs Python software to develop machine learning model in specific Decision Trees. The research uses the percentage oxides and the chemical composition of the Alkali Solution along with the curing conditions as the input independent parameters, irrespective of the waste products used in the mixture yielding the compressive strength of the mix as the output parameter. The results showed 90 % agreement of the predicted values to the actual values having the ratio of the Sodium Silicate to the Sodium Hydroxide solution being the dominant parameter in the mixture.

Keywords: decision trees, geopolymer concrete, machine learning, smart cities, sustainability

Procedia PDF Downloads 56

242 Decision Making System for Clinical Datasets

Authors: P. Bharathiraja

Abstract:

Computer Aided decision making system is used to enhance diagnosis and prognosis of diseases and also to assist clinicians and junior doctors in clinical decision making. Medical Data used for decision making should be definite and consistent. Data Mining and soft computing techniques are used for cleaning the data and for incorporating human reasoning in decision making systems. Fuzzy rule based inference technique can be used for classification in order to incorporate human reasoning in the decision making process. In this work, missing values are imputed using the mean or mode of the attribute. The data are normalized using min-ma normalization to improve the design and efficiency of the fuzzy inference system. The fuzzy inference system is used to handle the uncertainties that exist in the medical data. Equal-width-partitioning is used to partition the attribute values into appropriate fuzzy intervals. Fuzzy rules are generated using Class Based Associative rule mining algorithm. The system is trained and tested using heart disease data set from the University of California at Irvine (UCI) Machine Learning Repository. The data was split using a hold out approach into training and testing data. From the experimental results it can be inferred that classification using fuzzy inference system performs better than trivial IF-THEN rule based classification approaches. Furthermore it is observed that the use of fuzzy logic and fuzzy inference mechanism handles uncertainty and also resembles human decision making. The system can be used in the absence of a clinical expert to assist junior doctors and clinicians in clinical decision making.

Keywords: decision making, data mining, normalization, fuzzy rule, classification

Procedia PDF Downloads 489

241 Alpha: A Groundbreaking Avatar Merging User Dialogue with OpenAI's GPT-3.5 for Enhanced Reflective Thinking

Authors: Jonas Colin

Abstract:

Standing at the vanguard of AI development, Alpha represents an unprecedented synthesis of logical rigor and human abstraction, meticulously crafted to mirror the user's unique persona and personality, a feat previously unattainable in AI development. Alpha, an avant-garde artefact in the realm of artificial intelligence, epitomizes a paradigmatic shift in personalized digital interaction, amalgamating user-specific dialogic patterns with the sophisticated algorithmic prowess of OpenAI's GPT-3.5 to engender a platform for enhanced metacognitive engagement and individualized user experience. Underpinned by a sophisticated algorithmic framework, Alpha integrates vast datasets through a complex interplay of neural network models and symbolic AI, facilitating a dynamic, adaptive learning process. This integration enables the system to construct a detailed user profile, encompassing linguistic preferences, emotional tendencies, and cognitive styles, tailoring interactions to align with individual characteristics and conversational contexts. Furthermore, Alpha incorporates advanced metacognitive elements, enabling real-time reflection and adaptation in communication strategies. This self-reflective capability ensures continuous refinement of its interaction model, positioning Alpha not just as a technological marvel but as a harbinger of a new era in human-computer interaction, where machines engage with us on a deeply personal and cognitive level, transforming our interaction with the digital world.

Keywords: chatbot, GPT 3.5, metacognition, symbiose

Procedia PDF Downloads 33

240 Web Proxy Detection via Bipartite Graphs and One-Mode Projections

Authors: Zhipeng Chen, Peng Zhang, Qingyun Liu, Li Guo

Abstract:

With the Internet becoming the dominant channel for business and life, many IPs are increasingly masked using web proxies for illegal purposes such as propagating malware, impersonate phishing pages to steal sensitive data or redirect victims to other malicious targets. Moreover, as Internet traffic continues to grow in size and complexity, it has become an increasingly challenging task to detect the proxy service due to their dynamic update and high anonymity. In this paper, we present an approach based on behavioral graph analysis to study the behavior similarity of web proxy users. Specifically, we use bipartite graphs to model host communications from network traffic and build one-mode projections of bipartite graphs for discovering social-behavior similarity of web proxy users. Based on the similarity matrices of end-users from the derived one-mode projection graphs, we apply a simple yet effective spectral clustering algorithm to discover the inherent web proxy users behavior clusters. The web proxy URL may vary from time to time. Still, the inherent interest would not. So, based on the intuition, by dint of our private tools implemented by WebDriver, we examine whether the top URLs visited by the web proxy users are web proxies. Our experiment results based on real datasets show that the behavior clusters not only reduce the number of URLs analysis but also provide an effective way to detect the web proxies, especially for the unknown web proxies.

Keywords: bipartite graph, one-mode projection, clustering, web proxy detection

Procedia PDF Downloads 223

239 Using Risk Management Indicators in Decision Tree Analysis

Authors: Adel Ali Elshaibani

Abstract:

Risk management indicators augment the reporting infrastructure, particularly for the board and senior management, to identify, monitor, and manage risks. This enhancement facilitates improved decision-making throughout the banking organization. Decision tree analysis is a tool that visually outlines potential outcomes, costs, and consequences of complex decisions. It is particularly beneficial for analyzing quantitative data and making decisions based on numerical values. By calculating the expected value of each outcome, decision tree analysis can help assess the best course of action. In the context of banking, decision tree analysis can assist lenders in evaluating a customer’s creditworthiness, thereby preventing losses. However, applying these tools in developing countries may face several limitations, such as data availability, lack of technological infrastructure and resources, lack of skilled professionals, cultural factors, and cost. Moreover, decision trees can create overly complex models that do not generalize well to new data, known as overfitting. They can also be sensitive to small changes in the data, which can result in different tree structures and can become computationally expensive when dealing with large datasets. In conclusion, while risk management indicators and decision tree analysis are beneficial for decision-making in banks, their effectiveness is contingent upon how they are implemented and utilized by the board of directors, especially in the context of developing countries. It’s important to consider these limitations when planning to implement these tools in developing countries.

Keywords: risk management indicators, decision tree analysis, developing countries, board of directors, bank performance, risk management strategy, banking institutions

Procedia PDF Downloads 32

238 Heterogeneity of Soil Moisture and Its Impacts on the Mountainous Watershed Hydrology in Northwest China

Authors: Chansheng He, Zhongfu Wang, Xiao Bai, Jie Tian, Xin Jin

Abstract:

Heterogeneity of soil hydraulic properties directly affects hydrological processes at different scales. Understanding heterogeneity of soil hydraulic properties such as soil moisture is therefore essential for modeling watershed ecohydrological processes, particularly in hard to access, topographically complex mountainous watersheds. This study maps spatial variations of soil moisture by in situ observation network that consists of sampling points, zones, and tributaries, and monitors corresponding hydrological variables of air and soil temperatures, evapotranspiration, infiltration, and runoff in the Upper Reach of the Heihe River Watershed, a second largest inland river (terminal lake) with a drainage area of over 128,000 km² in Northwest China. Subsequently, the study uses a hydrological model, SWAT (Soil and Water Assessment Tool) to simulate the effects of heterogeneity of soil moisture on watershed hydrological processes. The spatial clustering method, Full-Order-CLK was employed to derive five soil heterogeneous zones (Configuration 97, 80, 65, 40, and 20) for soil input to SWAT. Results show the simulations by the SWAT model with the spatially clustered soil hydraulic information from the field sampling data had much better representation of the soil heterogeneity and more accurate performance than the model using the average soil property values for each soil type derived from the coarse soil datasets. Thus, incorporating detailed field sampling soil heterogeneity data greatly improves performance in hydrologic modeling.

Keywords: heterogeneity, soil moisture, SWAT, up-scaling

Procedia PDF Downloads 320

237 Extracellular Enzymes as Promising Soil Health Indicators: Assessing Response to Different Land Uses Using Long-Term Experiments

Authors: Munisath Khandoker, Stephan Haefele, Andy Gregory

Abstract:

Extracellular enzymes play a key role in soil organic carbon (SOC) decomposition and nutrient cycling and are known indicators for soil health; however, it is not understood how these enzymes respond to different land uses and their relationships to other soil properties have not been extensively reviewed. The relationships among the activities of three soil enzymes: β-glucosaminidase (NAG), phosphomonoesterase (PHO) and β-glucosidase (GLU), were examined. The impact of soil organic amendments, soil types and land management on soil enzyme activities were reviewed, and it was hypothesized that soils with increased SOC have increased enzyme activity. Long-term experiments at Rothamsted Research Woburn and Harpenden sites in the UK were used to evaluate how different management practices affect enzyme activity involved in carbon (C) and nitrogen (N) cycling in the soil. Samples were collected from soils with different organic treatments such as straw, farmyard manure (FYM), compost additions, cover crops and permanent grass cover to assess whether SOC can be linked with increased levels of enzymatic activity and what influence, if any, enzymatic activity has on total C and N in the soil. Investigating the interactions of important enzymes with soil characteristics and SOC can help to better understand the health of soils. Studies on long-term experiments with known histories and large datasets can better help with this. SOC tends to decrease during land use changes from natural ecosystems to agricultural systems; therefore, it is imperative that agricultural lands find ways to increase and/or maintain SOC in the soil.

Keywords: biological soil health indicators, extracellular enzymes, soil health, soil, microbiology

Procedia PDF Downloads 47

236 A Convolutional Neural Network-Based Model for Lassa fever Virus Prediction Using Patient Blood Smear Image

Authors: A. M. John-Otumu, M. M. Rahman, M. C. Onuoha, E. P. Ojonugwa

Abstract:

A Convolutional Neural Network (CNN) model for predicting Lassa fever was built using Python 3.8.0 programming language, alongside Keras 2.2.4 and TensorFlow 2.6.1 libraries as the development environment in order to reduce the current high risk of Lassa fever in West Africa, particularly in Nigeria. The study was prompted by some major flaws in existing conventional laboratory equipment for diagnosing Lassa fever (RT-PCR), as well as flaws in AI-based techniques that have been used for probing and prognosis of Lassa fever based on literature. There were 15,679 blood smear microscopic image datasets collected in total. The proposed model was trained on 70% of the dataset and tested on 30% of the microscopic images in avoid overfitting. A 3x3x3 convolution filter was also used in the proposed system to extract features from microscopic images. The proposed CNN-based model had a recall value of 96%, a precision value of 93%, an F1 score of 95%, and an accuracy of 94% in predicting and accurately classifying the images into clean or infected samples. Based on empirical evidence from the results of the literature consulted, the proposed model outperformed other existing AI-based techniques evaluated. If properly deployed, the model will assist physicians, medical laboratory scientists, and patients in making accurate diagnoses for Lassa fever cases, allowing the mortality rate due to the Lassa fever virus to be reduced through sound decision-making.

Keywords: artificial intelligence, ANN, blood smear, CNN, deep learning, Lassa fever

Procedia PDF Downloads 86

235 Progress in Combining Image Captioning and Visual Question Answering Tasks

Authors: Prathiksha Kamath, Pratibha Jamkhandi, Prateek Ghanti, Priyanshu Gupta, M. Lakshmi Neelima

Abstract:

Combining Image Captioning and Visual Question Answering (VQA) tasks have emerged as a new and exciting research area. The image captioning task involves generating a textual description that summarizes the content of the image. VQA aims to answer a natural language question about the image. Both these tasks include computer vision and natural language processing (NLP) and require a deep understanding of the content of the image and semantic relationship within the image and the ability to generate a response in natural language. There has been remarkable growth in both these tasks with rapid advancement in deep learning. In this paper, we present a comprehensive review of recent progress in combining image captioning and visual question-answering (VQA) tasks. We first discuss both image captioning and VQA tasks individually and then the various ways in which both these tasks can be integrated. We also analyze the challenges associated with these tasks and ways to overcome them. We finally discuss the various datasets and evaluation metrics used in these tasks. This paper concludes with the need for generating captions based on the context and captions that are able to answer the most likely asked questions about the image so as to aid the VQA task. Overall, this review highlights the significant progress made in combining image captioning and VQA, as well as the ongoing challenges and opportunities for further research in this exciting and rapidly evolving field, which has the potential to improve the performance of real-world applications such as autonomous vehicles, robotics, and image search.

Keywords: image captioning, visual question answering, deep learning, natural language processing

Procedia PDF Downloads 52

234 A Deep Learning Approach to Online Social Network Account Compromisation

Authors: Edward K. Boahen, Brunel E. Bouya-Moko, Changda Wang

Abstract:

The major threat to online social network (OSN) users is account compromisation. Spammers now spread malicious messages by exploiting the trust relationship established between account owners and their friends. The challenge in detecting a compromised account by service providers is validating the trusted relationship established between the account owners, their friends, and the spammers. Another challenge is the increase in required human interaction with the feature selection. Research available on supervised learning (machine learning) has limitations with the feature selection and accounts that cannot be profiled, like application programming interface (API). Therefore, this paper discusses the various behaviours of the OSN users and the current approaches in detecting a compromised OSN account, emphasizing its limitations and challenges. We propose a deep learning approach that addresses and resolve the constraints faced by the previous schemes. We detailed our proposed optimized nonsymmetric deep auto-encoder (OPT_NDAE) for unsupervised feature learning, which reduces the required human interaction levels in the selection and extraction of features. We evaluated our proposed classifier using the NSL-KDD and KDDCUP'99 datasets in a graphical user interface enabled Weka application. The results obtained indicate that our proposed approach outperformed most of the traditional schemes in OSN compromised account detection with an accuracy rate of 99.86%.

Keywords: computer security, network security, online social network, account compromisation

Procedia PDF Downloads 92

233 Refined Edge Detection Network

Authors: Omar Elharrouss, Youssef Hmamouche, Assia Kamal Idrissi, Btissam El Khamlichi, Amal El Fallah-Seghrouchni

Abstract:

Edge detection is represented as one of the most challenging tasks in computer vision, due to the complexity of detecting the edges or boundaries in real-world images that contains objects of different types and scales like trees, building as well as various backgrounds. Edge detection is represented also as a key task for many computer vision applications. Using a set of backbones as well as attention modules, deep-learning-based methods improved the detection of edges compared with the traditional methods like Sobel and Canny. However, images of complex scenes still represent a challenge for these methods. Also, the detected edges using the existing approaches suffer from non-refined results while the image output contains many erroneous edges. To overcome this, n this paper, by using the mechanism of residual learning, a refined edge detection network is proposed (RED-Net). By maintaining the high resolution of edges during the training process, and conserving the resolution of the edge image during the network stage, we make the pooling outputs at each stage connected with the output of the previous layer. Also, after each layer, we use an affined batch normalization layer as an erosion operation for the homogeneous region in the image. The proposed methods are evaluated using the most challenging datasets including BSDS500, NYUD, and Multicue. The obtained results outperform the designed edge detection networks in terms of performance metrics and quality of output images.

Keywords: edge detection, convolutional neural networks, deep learning, scale-representation, backbone

Procedia PDF Downloads 70

232 Classification of Hyperspectral Image Using Mathematical Morphological Operator-Based Distance Metric

Authors: Geetika Barman, B. S. Daya Sagar

Abstract:

In this article, we proposed a pixel-wise classification of hyperspectral images using a mathematical morphology operator-based distance metric called “dilation distance” and “erosion distance”. This method involves measuring the spatial distance between the spectral features of a hyperspectral image across the bands. The key concept of the proposed approach is that the “dilation distance” is the maximum distance a pixel can be moved without changing its classification, whereas the “erosion distance” is the maximum distance that a pixel can be moved before changing its classification. The spectral signature of the hyperspectral image carries unique class information and shape for each class. This article demonstrates how easily the dilation and erosion distance can measure spatial distance compared to other approaches. This property is used to calculate the spatial distance between hyperspectral image feature vectors across the bands. The dissimilarity matrix is then constructed using both measures extracted from the feature spaces. The measured distance metric is used to distinguish between the spectral features of various classes and precisely distinguish between each class. This is illustrated using both toy data and real datasets. Furthermore, we investigated the role of flat vs. non-flat structuring elements in capturing the spatial features of each class in the hyperspectral image. In order to validate, we compared the proposed approach to other existing methods and demonstrated empirically that mathematical operator-based distance metric classification provided competitive results and outperformed some of them.

Keywords: dilation distance, erosion distance, hyperspectral image classification, mathematical morphology

Procedia PDF Downloads 56

231 Size Effect on Shear Strength of Slender Reinforced Concrete Beams

Authors: Subhan Ahmad, Pradeep Bhargava, Ajay Chourasia

Abstract:

Shear failure in reinforced concrete beams without shear reinforcement leads to loss of property and life since a very little or no warning occurs before failure as in case of flexural failure. Shear strength of reinforced concrete beams decreases as its depth increases. This phenomenon is generally called as the size effect. In this paper, a comparative analysis is performed to estimate the performance of shear strength models in capturing the size effect of reinforced concrete beams made with conventional concrete, self-compacting concrete, and recycled aggregate concrete. Four shear strength models that account for the size effect in shear are selected from the literature and applied on the datasets of slender reinforced concrete beams. Beams prepared with conventional concrete, self-compacting concrete, and recycled aggregate concrete are considered for the analysis. Results showed that all the four models captured the size effect in shear effectively and produced conservative estimates of the shear strength for beams made with normal strength conventional concrete. These models yielded unconservative estimates for high strength conventional concrete beams with larger effective depths ( > 450 mm). Model of Bazant and Kim (1984) captured the size effect precisely and produced conservative estimates of shear strength of self-compacting concrete beams at all the effective depths. Also, shear strength models considered in this study produced unconservative estimates of shear strength for recycled aggregate concrete beams at all effective depths.

Keywords: reinforced concrete beams; shear strength; prediction models; size effect

Procedia PDF Downloads 132

230 Machine Learning Strategies for Data Extraction from Unstructured Documents in Financial Services

Authors: Delphine Vendryes, Dushyanth Sekhar, Baojia Tong, Matthew Theisen, Chester Curme

Abstract:

Much of the data that inform the decisions of governments, corporations and individuals are harvested from unstructured documents. Data extraction is defined here as a process that turns non-machine-readable information into a machine-readable format that can be stored, for instance, in a database. In financial services, introducing more automation in data extraction pipelines is a major challenge. Information sought by financial data consumers is often buried within vast bodies of unstructured documents, which have historically required thorough manual extraction. Automated solutions provide faster access to non-machine-readable datasets, in a context where untimely information quickly becomes irrelevant. Data quality standards cannot be compromised, so automation requires high data integrity. This multifaceted task is broken down into smaller steps: ingestion, table parsing (detection and structure recognition), text analysis (entity detection and disambiguation), schema-based record extraction, user feedback incorporation. Selected intermediary steps are phrased as machine learning problems. Solutions leveraging cutting-edge approaches from the fields of computer vision (e.g. table detection) and natural language processing (e.g. entity detection and disambiguation) are proposed.

Keywords: computer vision, entity recognition, finance, information retrieval, machine learning, natural language processing

Procedia PDF Downloads 87

229 Mining Riding Patterns in Bike-Sharing System Connecting with Public Transportation

Authors: Chong Zhang, Guoming Tang, Bin Ge, Jiuyang Tang

Abstract:

With the fast growing road traffic and increasingly severe traffic congestion, more and more citizens choose to use the public transportation for daily travelling. Meanwhile, the shared bike provides a convenient option for the first and last mile to the public transit. As of 2016, over one thousand cities around the world have deployed the bike-sharing system. The combination of these two transportations have stimulated the development of each other and made significant contribution to the reduction of carbon footprint. A lot of work has been done on mining the riding behaviors in various bike-sharing systems. Most of them, however, treated the bike-sharing system as an isolated system and thus their results provide little reference for the public transit construction and optimization. In this work, we treat the bike-sharing and public transit as a whole and investigate the customers’ bike-and-ride behaviors. Specifically, we develop a spatio-temporal traffic delivery model to study the riding patterns between the two transportation systems and explore the traffic characteristics (e.g., distributions of customer arrival/departure and traffic peak hours) from the time and space dimensions. During the model construction and evaluation, we make use of large open datasets from real-world bike-sharing systems (the CitiBike in New York, GoBike in San Francisco and BIXI in Montreal) along with corresponding public transit information. The developed two-dimension traffic model, as well as the mined bike-and-ride behaviors, can provide great help to the deployment of next-generation intelligent transportation systems.

Keywords: riding pattern mining, bike-sharing system, public transportation, bike-and-ride behavior

Procedia PDF Downloads 745

228 Developing an Out-of-Distribution Generalization Model Selection Framework through Impurity and Randomness Measurements and a Bias Index

Authors: Todd Zhou, Mikhail Yurochkin

Abstract:

Out-of-distribution (OOD) detection is receiving increasing amounts of attention in the machine learning research community, boosted by recent technologies, such as autonomous driving and image processing. This newly-burgeoning field has called for the need for more effective and efficient methods for out-of-distribution generalization methods. Without accessing the label information, deploying machine learning models to out-of-distribution domains becomes extremely challenging since it is impossible to evaluate model performance on unseen domains. To tackle this out-of-distribution detection difficulty, we designed a model selection pipeline algorithm and developed a model selection framework with different impurity and randomness measurements to evaluate and choose the best-performing models for out-of-distribution data. By exploring different randomness scores based on predicted probabilities, we adopted the out-of-distribution entropy and developed a custom-designed score, ”CombinedScore,” as the evaluation criterion. This proposed score was created by adding labeled source information into the judging space of the uncertainty entropy score using harmonic mean. Furthermore, the prediction bias was explored through the equality of opportunity violation measurement. We also improved machine learning model performance through model calibration. The effectiveness of the framework with the proposed evaluation criteria was validated on the Folktables American Community Survey (ACS) datasets.

Keywords: model selection, domain generalization, model fairness, randomness measurements, bias index

Procedia PDF Downloads 102

227 Effects of Dust Storm Events on Tuberculosis Incidence Rate in Northwest of China

Authors: Yun Wang, Ruoyu Wang, Tuo Chen, Guangxiu Liu, Guodong Chen, Wei Zhang

Abstract:

Tuberculosis (TB) is a major public health problem in China. China has the world's second largest tuberculosis epidemic (after India). Xinjiang almost has the highest annual attendance rate of TB in China, and the province is also famous because of its severe dust storms. The epidemic timing starts in February and ends in July, and the dust storm mainly distribute throughout the spring and early summer, which strongly indicate a close linkage between causative agent of TB and dust storm events. However, mechanisms responsible for the observed patterns are still not clearly indentified. By comparing the information on cases of TB from Centers for Disease Control of China annual reports with dust storm atmosphere datasets, we constructed the relationship between the large scale annual occurrence of TB in Xinjiang, a Northwest province of China, and dust storm occurrence. Regional atmospheric indexes of dust storm based on surface wind speed show a clear link between population dynamics of the disease and the climate disaster: the onset of epidemics and the dust storm defined by the atmospheric index share the same mean year. This study is the first that provides a clear demonstration of connections that exist between TB epidemics and dust storm events in China. The development of this study will undoubtedly help early warning for tuberculosis epidemic onset in China and help nationwide and international public health institutions and policy makers to better control TB disease in Norwest China.

Keywords: dust storm, tuberculosis, Xinjiang province, epidemic

Procedia PDF Downloads 417

226 A Review of Effective Gene Selection Methods for Cancer Classification Using Microarray Gene Expression Profile

Authors: Hala Alshamlan, Ghada Badr, Yousef Alohali

Abstract:

Cancer is one of the dreadful diseases, which causes considerable death rate in humans. DNA microarray-based gene expression profiling has been emerged as an efficient technique for cancer classification, as well as for diagnosis, prognosis, and treatment purposes. In recent years, a DNA microarray technique has gained more attraction in both scientific and in industrial fields. It is important to determine the informative genes that cause cancer to improve early cancer diagnosis and to give effective chemotherapy treatment. In order to gain deep insight into the cancer classification problem, it is necessary to take a closer look at the proposed gene selection methods. We believe that they should be an integral preprocessing step for cancer classification. Furthermore, finding an accurate gene selection method is a very significant issue in a cancer classification area because it reduces the dimensionality of microarray dataset and selects informative genes. In this paper, we classify and review the state-of-art gene selection methods. We proceed by evaluating the performance of each gene selection approach based on their classification accuracy and number of informative genes. In our evaluation, we will use four benchmark microarray datasets for the cancer diagnosis (leukemia, colon, lung, and prostate). In addition, we compare the performance of gene selection method to investigate the effective gene selection method that has the ability to identify a small set of marker genes, and ensure high cancer classification accuracy. To the best of our knowledge, this is the first attempt to compare gene selection approaches for cancer classification using microarray gene expression profile.

Keywords: gene selection, feature selection, cancer classification, microarray, gene expression profile

Procedia PDF Downloads 424

225 Physical and Morphological Response to Land Reclamation Projects in a Wave-Dominated Bay

Authors: Florian Monetti, Brett Beamsley, Peter McComb, Simon Weppe

Abstract:

Land reclamation from the ocean has considerably increased over past decades to support worldwide rapid urban growth. Reshaping the coastline, however, inevitably affects coastal systems. One of the main challenges for coastal oceanographers is to predict the physical and morphological responses for nearshore systems to man-made changes over multiple time-scales. Fully-coupled numerical models are powerful tools for simulating the wide range of interactions between flow field and bedform morphology. Restricted and inconsistent measurements, combined with limited computational resources, typically make this exercise complex and uncertain. In the present study, we investigate the impact of proposed land reclamation within a wave-dominated bay in New Zealand. For this purpose, we first calibrated our morphological model based on the long-term evolution of the bay resulting from land reclamation carried out in the 1950s. This included the application of sedimentological spin-up and reduction techniques based on historical bathymetry datasets. The updated bathymetry, including the proposed modifications of the bay, was then used to predict the effect of the proposed land reclamation on the wave climate and morphology of the bay after one decade. We show that reshaping the bay induces a distinct symmetrical response of the shoreline which likely will modify the nearshore wave patterns and consequently recreational activities in the area.

Keywords: coastal waves, impact of land reclamation, long-term coastal evolution, morphodynamic modeling

Procedia PDF Downloads 146

224 Enhancer: An Effective Transformer Architecture for Single Image Super Resolution

Authors: Pitigalage Chamath Chandira Peiris

Abstract:

A widely researched domain in the field of image processing in recent times has been single image super-resolution, which tries to restore a high-resolution image from a single low-resolution image. Many more single image super-resolution efforts have been completed utilizing equally traditional and deep learning methodologies, as well as a variety of other methodologies. Deep learning-based super-resolution methods, in particular, have received significant interest. As of now, the most advanced image restoration approaches are based on convolutional neural networks; nevertheless, only a few efforts have been performed using Transformers, which have demonstrated excellent performance on high-level vision tasks. The effectiveness of CNN-based algorithms in image super-resolution has been impressive. However, these methods cannot completely capture the non-local features of the data. Enhancer is a simple yet powerful Transformer-based approach for enhancing the resolution of images. A method for single image super-resolution was developed in this study, which utilized an efficient and effective transformer design. This proposed architecture makes use of a locally enhanced window transformer block to alleviate the enormous computational load associated with non-overlapping window-based self-attention. Additionally, it incorporates depth-wise convolution in the feed-forward network to enhance its ability to capture local context. This study is assessed by comparing the results obtained for popular datasets to those obtained by other techniques in the domain.

Keywords: single image super resolution, computer vision, vision transformers, image restoration

Procedia PDF Downloads 78

223 Neural Graph Matching for Modification Similarity Applied to Electronic Document Comparison

Authors: Po-Fang Hsu, Chiching Wei

Abstract:

In this paper, we present a novel neural graph matching approach applied to document comparison. Document comparison is a common task in the legal and financial industries. In some cases, the most important differences may be the addition or omission of words, sentences, clauses, or paragraphs. However, it is a challenging task without recording or tracing the whole edited process. Under many temporal uncertainties, we explore the potentiality of our approach to proximate the accurate comparison to make sure which element blocks have a relation of edition with others. In the beginning, we apply a document layout analysis that combines traditional and modern technics to segment layouts in blocks of various types appropriately. Then we transform this issue into a problem of layout graph matching with textual awareness. Regarding graph matching, it is a long-studied problem with a broad range of applications. However, different from previous works focusing on visual images or structural layout, we also bring textual features into our model for adapting this domain. Specifically, based on the electronic document, we introduce an encoder to deal with the visual presentation decoding from PDF. Additionally, because the modifications can cause the inconsistency of document layout analysis between modified documents and the blocks can be merged and split, Sinkhorn divergence is adopted in our neural graph approach, which tries to overcome both these issues with many-to-many block matching. We demonstrate this on two categories of layouts, as follows., legal agreement and scientific articles, collected from our real-case datasets.

Keywords: document comparison, graph matching, graph neural network, modification similarity, multi-modal

Procedia PDF Downloads 152

222 Evaluation of Real-Time Background Subtraction Technique for Moving Object Detection Using Fast-Independent Component Analysis

Authors: Naoum Abderrahmane, Boumehed Meriem, Alshaqaqi Belal

Abstract:

Background subtraction algorithm is a larger used technique for detecting moving objects in video surveillance to extract the foreground objects from a reference background image. There are many challenges to test a good background subtraction algorithm, like changes in illumination, dynamic background such as swinging leaves, rain, snow, and the changes in the background, for example, moving and stopping of vehicles. In this paper, we propose an efficient and accurate background subtraction method for moving object detection in video surveillance. The main idea is to use a developed fast-independent component analysis (ICA) algorithm to separate background, noise, and foreground masks from an image sequence in practical environments. The fast-ICA algorithm is adapted and adjusted with a matrix calculation and searching for an optimum non-quadratic function to be faster and more robust. Moreover, in order to estimate the de-mixing matrix and the denoising de-mixing matrix parameters, we propose to convert all images to YCrCb color space, where the luma component Y (brightness of the color) gives suitable results. The proposed technique has been verified on the publicly available datasets CD net 2012 and CD net 2014, and experimental results show that our algorithm can detect competently and accurately moving objects in challenging conditions compared to other methods in the literature in terms of quantitative and qualitative evaluations with real-time frame rate.

Keywords: background subtraction, moving object detection, fast-ICA, de-mixing matrix

Procedia PDF Downloads 71