Search results for: LiDAR datasets
726 On an Approach for Rule Generation in Association Rule Mining
Authors: B. Chandra
Abstract:
In Association Rule Mining, much attention has been paid for developing algorithms for large (frequent/closed/maximal) itemsets but very little attention has been paid to improve the performance of rule generation algorithms. Rule generation is an important part of Association Rule Mining. In this paper, a novel approach named NARG (Association Rule using Antecedent Support) has been proposed for rule generation that uses memory resident data structure named FCET (Frequent Closed Enumeration Tree) to find frequent/closed itemsets. In addition, the computational speed of NARG is enhanced by giving importance to the rules that have lower antecedent support. Comparative performance evaluation of NARG with fast association rule mining algorithm for rule generation has been done on synthetic datasets and real life datasets (taken from UCI Machine Learning Repository). Performance analysis shows that NARG is computationally faster in comparison to the existing algorithms for rule generation.Keywords: knowledge discovery, association rule mining, antecedent support, rule generation
Procedia PDF Downloads 324725 Using Machine Learning Techniques for Autism Spectrum Disorder Analysis and Detection in Children
Authors: Norah Mohammed Alshahrani, Abdulaziz Almaleh
Abstract:
Autism Spectrum Disorder (ASD) is a condition related to issues with brain development that affects how a person recognises and communicates with others which results in difficulties with interaction and communication socially and it is constantly growing. Early recognition of ASD allows children to lead safe and healthy lives and helps doctors with accurate diagnoses and management of conditions. Therefore, it is crucial to develop a method that will achieve good results and with high accuracy for the measurement of ASD in children. In this paper, ASD datasets of toddlers and children have been analyzed. We employed the following machine learning techniques to attempt to explore ASD and they are Random Forest (RF), Decision Tree (DT), Na¨ıve Bayes (NB) and Support Vector Machine (SVM). Then Feature selection was used to provide fewer attributes from ASD datasets while preserving model performance. As a result, we found that the best result has been provided by the Support Vector Machine (SVM), achieving 0.98% in the toddler dataset and 0.99% in the children dataset.Keywords: autism spectrum disorder, machine learning, feature selection, support vector machine
Procedia PDF Downloads 152724 Efficient Recommendation System for Frequent and High Utility Itemsets over Incremental Datasets
Authors: J. K. Kavitha, D. Manjula, U. Kanimozhi
Abstract:
Mining frequent and high utility item sets have gained much significance in the recent years. When the data arrives sporadically, incremental and interactive rule mining and utility mining approaches can be adopted to handle user’s dynamic environmental needs and avoid redundancies, using previous data structures, and mining results. The dependence on recommendation systems has exponentially risen since the advent of search engines. This paper proposes a model for building a recommendation system that suggests frequent and high utility item sets over dynamic datasets for a cluster based location prediction strategy to predict user’s trajectories using the Efficient Incremental Rule Mining (EIRM) algorithm and the Fast Update Utility Pattern Tree (FUUP) algorithm. Through comprehensive evaluations by experiments, this scheme has shown to deliver excellent performance.Keywords: data sets, recommendation system, utility item sets, frequent item sets mining
Procedia PDF Downloads 293723 Mapping of Urban Micro-Climate in Lyon (France) by Integrating Complementary Predictors at Different Scales into Multiple Linear Regression Models
Authors: Lucille Alonso, Florent Renard
Abstract:
The characterizations of urban heat island (UHI) and their interactions with climate change and urban climates are the main research and public health issue, due to the increasing urbanization of the population. These solutions require a better knowledge of the UHI and micro-climate in urban areas, by combining measurements and modelling. This study is part of this topic by evaluating microclimatic conditions in dense urban areas in the Lyon Metropolitan Area (France) using a combination of data traditionally used such as topography, but also from LiDAR (Light Detection And Ranging) data, Landsat 8 satellite observation and Sentinel and ground measurements by bike. These bicycle-dependent weather data collections are used to build the database of the variable to be modelled, the air temperature, over Lyon’s hyper-center. This study aims to model the air temperature, measured during 6 mobile campaigns in Lyon in clear weather, using multiple linear regressions based on 33 explanatory variables. They are of various categories such as meteorological parameters from remote sensing, topographic variables, vegetation indices, the presence of water, humidity, bare soil, buildings, radiation, urban morphology or proximity and density to various land uses (water surfaces, vegetation, bare soil, etc.). The acquisition sources are multiple and come from the Landsat 8 and Sentinel satellites, LiDAR points, and cartographic products downloaded from an open data platform in Greater Lyon. Regarding the presence of low, medium, and high vegetation, the presence of buildings and ground, several buffers close to these factors were tested (5, 10, 20, 25, 50, 100, 200 and 500m). The buffers with the best linear correlations with air temperature for ground are 5m around the measurement points, for low and medium vegetation, and for building 50m and for high vegetation is 100m. The explanatory model of the dependent variable is obtained by multiple linear regression of the remaining explanatory variables (Pearson correlation matrix with a |r| < 0.7 and VIF with < 5) by integrating a stepwise sorting algorithm. Moreover, holdout cross-validation is performed, due to its ability to detect over-fitting of multiple regression, although multiple regression provides internal validation and randomization (80% training, 20% testing). Multiple linear regression explained, on average, 72% of the variance for the study days, with an average RMSE of only 0.20°C. The impact on the model of surface temperature in the estimation of air temperature is the most important variable. Other variables are recurrent such as distance to subway stations, distance to water areas, NDVI, digital elevation model, sky view factor, average vegetation density, or building density. Changing urban morphology influences the city's thermal patterns. The thermal atmosphere in dense urban areas can only be analysed on a microscale to be able to consider the local impact of trees, streets, and buildings. There is currently no network of fixed weather stations sufficiently deployed in central Lyon and most major urban areas. Therefore, it is necessary to use mobile measurements, followed by modelling to characterize the city's multiple thermal environments.Keywords: air temperature, LIDAR, multiple linear regression, surface temperature, urban heat island
Procedia PDF Downloads 137722 Multimodal Data Fusion Techniques in Audiovisual Speech Recognition
Authors: Hadeer M. Sayed, Hesham E. El Deeb, Shereen A. Taie
Abstract:
In the big data era, we are facing a diversity of datasets from different sources in different domains that describe a single life event. These datasets consist of multiple modalities, each of which has a different representation, distribution, scale, and density. Multimodal fusion is the concept of integrating information from multiple modalities in a joint representation with the goal of predicting an outcome through a classification task or regression task. In this paper, multimodal fusion techniques are classified into two main classes: model-agnostic techniques and model-based approaches. It provides a comprehensive study of recent research in each class and outlines the benefits and limitations of each of them. Furthermore, the audiovisual speech recognition task is expressed as a case study of multimodal data fusion approaches, and the open issues through the limitations of the current studies are presented. This paper can be considered a powerful guide for interested researchers in the field of multimodal data fusion and audiovisual speech recognition particularly.Keywords: multimodal data, data fusion, audio-visual speech recognition, neural networks
Procedia PDF Downloads 112721 An Experimental Study on Some Conventional and Hybrid Models of Fuzzy Clustering
Authors: Jeugert Kujtila, Kristi Hoxhalli, Ramazan Dalipi, Erjon Cota, Ardit Murati, Erind Bedalli
Abstract:
Clustering is a versatile instrument in the analysis of collections of data providing insights of the underlying structures of the dataset and enhancing the modeling capabilities. The fuzzy approach to the clustering problem increases the flexibility involving the concept of partial memberships (some value in the continuous interval [0, 1]) of the instances in the clusters. Several fuzzy clustering algorithms have been devised like FCM, Gustafson-Kessel, Gath-Geva, kernel-based FCM, PCM etc. Each of these algorithms has its own advantages and drawbacks, so none of these algorithms would be able to perform superiorly in all datasets. In this paper we will experimentally compare FCM, GK, GG algorithm and a hybrid two-stage fuzzy clustering model combining the FCM and Gath-Geva algorithms. Firstly we will theoretically dis-cuss the advantages and drawbacks for each of these algorithms and we will describe the hybrid clustering model exploiting the advantages and diminishing the drawbacks of each algorithm. Secondly we will experimentally compare the accuracy of the hybrid model by applying it on several benchmark and synthetic datasets.Keywords: fuzzy clustering, fuzzy c-means algorithm (FCM), Gustafson-Kessel algorithm, hybrid clustering model
Procedia PDF Downloads 514720 Land Cover Mapping Using Sentinel-2, Landsat-8 Satellite Images, and Google Earth Engine: A Study Case of the Beterou Catchment
Authors: Ella Sèdé Maforikan
Abstract:
Accurate land cover mapping is essential for effective environmental monitoring and natural resources management. This study focuses on assessing the classification performance of two satellite datasets and evaluating the impact of different input feature combinations on classification accuracy in the Beterou catchment, situated in the northern part of Benin. Landsat-8 and Sentinel-2 images from June 1, 2020, to March 31, 2021, were utilized. Employing the Random Forest (RF) algorithm on Google Earth Engine (GEE), a supervised classification categorized the land into five classes: forest, savannas, cropland, settlement, and water bodies. GEE was chosen due to its high-performance computing capabilities, mitigating computational burdens associated with traditional land cover classification methods. By eliminating the need for individual satellite image downloads and providing access to an extensive archive of remote sensing data, GEE facilitated efficient model training on remote sensing data. The study achieved commendable overall accuracy (OA), ranging from 84% to 85%, even without incorporating spectral indices and terrain metrics into the model. Notably, the inclusion of additional input sources, specifically terrain features like slope and elevation, enhanced classification accuracy. The highest accuracy was achieved with Sentinel-2 (OA = 91%, Kappa = 0.88), slightly surpassing Landsat-8 (OA = 90%, Kappa = 0.87). This underscores the significance of combining diverse input sources for optimal accuracy in land cover mapping. The methodology presented herein not only enables the creation of precise, expeditious land cover maps but also demonstrates the prowess of cloud computing through GEE for large-scale land cover mapping with remarkable accuracy. The study emphasizes the synergy of different input sources to achieve superior accuracy. As a future recommendation, the application of Light Detection and Ranging (LiDAR) technology is proposed to enhance vegetation type differentiation in the Beterou catchment. Additionally, a cross-comparison between Sentinel-2 and Landsat-8 for assessing long-term land cover changes is suggested.Keywords: land cover mapping, Google Earth Engine, random forest, Beterou catchment
Procedia PDF Downloads 63719 Named Entity Recognition System for Tigrinya Language
Authors: Sham Kidane, Fitsum Gaim, Ibrahim Abdella, Sirak Asmerom, Yoel Ghebrihiwot, Simon Mulugeta, Natnael Ambassager
Abstract:
The lack of annotated datasets is a bottleneck to the progress of NLP in low-resourced languages. The work presented here consists of large-scale annotated datasets and models for the named entity recognition (NER) system for the Tigrinya language. Our manually constructed corpus comprises over 340K words tagged for NER, with over 118K of the tokens also having parts-of-speech (POS) tags, annotated with 12 distinct classes of entities, represented using several types of tagging schemes. We conducted extensive experiments covering convolutional neural networks and transformer models; the highest performance achieved is 88.8% weighted F1-score. These results are especially noteworthy given the unique challenges posed by Tigrinya’s distinct grammatical structure and complex word morphologies. The system can be an essential building block for the advancement of NLP systems in Tigrinya and other related low-resourced languages and serve as a bridge for cross-referencing against higher-resourced languages.Keywords: Tigrinya NER corpus, TiBERT, TiRoBERTa, BiLSTM-CRF
Procedia PDF Downloads 130718 Multilabel Classification with Neural Network Ensemble Method
Authors: Sezin Ekşioğlu
Abstract:
Multilabel classification has a huge importance for several applications, it is also a challenging research topic. It is a kind of supervised learning that contains binary targets. The distance between multilabel and binary classification is having more than one class in multilabel classification problems. Features can belong to one class or many classes. There exists a wide range of applications for multi label prediction such as image labeling, text categorization, gene functionality. Even though features are classified in many classes, they may not always be properly classified. There are many ensemble methods for the classification. However, most of the researchers have been concerned about better multilabel methods. Especially little ones focus on both efficiency of classifiers and pairwise relationships at the same time in order to implement better multilabel classification. In this paper, we worked on modified ensemble methods by getting benefit from k-Nearest Neighbors and neural network structure to address issues within a beneficial way and to get better impacts from the multilabel classification. Publicly available datasets (yeast, emotion, scene and birds) are performed to demonstrate the developed algorithm efficiency and the technique is measured by accuracy, F1 score and hamming loss metrics. Our algorithm boosts benchmarks for each datasets with different metrics.Keywords: multilabel, classification, neural network, KNN
Procedia PDF Downloads 155717 Evaluation of NASA POWER and CRU Precipitation and Temperature Datasets over a Desert-prone Yobe River Basin: An Investigation of the Impact of Drought in the North-East Arid Zone of Nigeria
Authors: Yusuf Dawa Sidi, Abdulrahman Bulama Bizi
Abstract:
The most dependable and precise source of climate data is often gauge observation. However, long-term records of gauge observations, on the other hand, are unavailable in many regions around the world. In recent years, a number of gridded climate datasets with high spatial and temporal resolutions have emerged as viable alternatives to gauge-based measurements. However, it is crucial to thoroughly evaluate their performance prior to utilising them in hydroclimatic applications. Therefore, this study aims to assess the effectiveness of NASA Prediction of Worldwide Energy Resources (NASA POWER) and Climate Research Unit (CRU) datasets in accurately estimating precipitation and temperature patterns within the dry region of Nigeria from 1990 to 2020. The study employs widely used statistical metrics and the Standardised Precipitation Index (SPI) to effectively capture the monthly variability of precipitation and temperature and inter-annual anomalies in rainfall. The findings suggest that CRU exhibited superior performance compared to NASA POWER in terms of monthly precipitation and minimum and maximum temperatures, demonstrating a high correlation and much lower error values for both RMSE and MAE. Nevertheless, NASA POWER has exhibited a moderate agreement with gauge observations in accurately replicating monthly precipitation. The analysis of the SPI reveals that the CRU product exhibits superior performance compared to NASA POWER in accurately reflecting inter-annual variations in rainfall anomalies. The findings of this study indicate that the CRU gridded product is often regarded as the most favourable gridded precipitation product.Keywords: CRU, climate change, precipitation, SPI, temperature
Procedia PDF Downloads 89716 Improving Fake News Detection Using K-means and Support Vector Machine Approaches
Authors: Kasra Majbouri Yazdi, Adel Majbouri Yazdi, Saeid Khodayi, Jingyu Hou, Wanlei Zhou, Saeed Saedy
Abstract:
Fake news and false information are big challenges of all types of media, especially social media. There is a lot of false information, fake likes, views and duplicated accounts as big social networks such as Facebook and Twitter admitted. Most information appearing on social media is doubtful and in some cases misleading. They need to be detected as soon as possible to avoid a negative impact on society. The dimensions of the fake news datasets are growing rapidly, so to obtain a better result of detecting false information with less computation time and complexity, the dimensions need to be reduced. One of the best techniques of reducing data size is using feature selection method. The aim of this technique is to choose a feature subset from the original set to improve the classification performance. In this paper, a feature selection method is proposed with the integration of K-means clustering and Support Vector Machine (SVM) approaches which work in four steps. First, the similarities between all features are calculated. Then, features are divided into several clusters. Next, the final feature set is selected from all clusters, and finally, fake news is classified based on the final feature subset using the SVM method. The proposed method was evaluated by comparing its performance with other state-of-the-art methods on several specific benchmark datasets and the outcome showed a better classification of false information for our work. The detection performance was improved in two aspects. On the one hand, the detection runtime process decreased, and on the other hand, the classification accuracy increased because of the elimination of redundant features and the reduction of datasets dimensions.Keywords: clustering, fake news detection, feature selection, machine learning, social media, support vector machine
Procedia PDF Downloads 176715 A Novel Heuristic for Analysis of Large Datasets by Selecting Wrapper-Based Features
Authors: Bushra Zafar, Usman Qamar
Abstract:
Large data sample size and dimensions render the effectiveness of conventional data mining methodologies. A data mining technique are important tools for collection of knowledgeable information from variety of databases and provides supervised learning in the form of classification to design models to describe vital data classes while structure of the classifier is based on class attribute. Classification efficiency and accuracy are often influenced to great extent by noisy and undesirable features in real application data sets. The inherent natures of data set greatly masks its quality analysis and leave us with quite few practical approaches to use. To our knowledge first time, we present a new approach for investigation of structure and quality of datasets by providing a targeted analysis of localization of noisy and irrelevant features of data sets. Machine learning is based primarily on feature selection as pre-processing step which offers us to select few features from number of features as a subset by reducing the space according to certain evaluation criterion. The primary objective of this study is to trim down the scope of the given data sample by searching a small set of important features which may results into good classification performance. For this purpose, a heuristic for wrapper-based feature selection using genetic algorithm and for discriminative feature selection an external classifier are used. Selection of feature based on its number of occurrence in the chosen chromosomes. Sample dataset has been used to demonstrate proposed idea effectively. A proposed method has improved average accuracy of different datasets is about 95%. Experimental results illustrate that proposed algorithm increases the accuracy of prediction of different diseases.Keywords: data mining, generic algorithm, KNN algorithms, wrapper based feature selection
Procedia PDF Downloads 316714 Quality Assurance for the Climate Data Store
Authors: Judith Klostermann, Miguel Segura, Wilma Jans, Dragana Bojovic, Isadora Christel Jimenez, Francisco Doblas-Reyees, Judit Snethlage
Abstract:
The Climate Data Store (CDS), developed by the Copernicus Climate Change Service (C3S) implemented by the European Centre for Medium-Range Weather Forecasts (ECMWF) on behalf of the European Union, is intended to become a key instrument for exploring climate data. The CDS contains both raw and processed data to provide information to the users about the past, present and future climate of the earth. It allows for easy and free access to climate data and indicators, presenting an important asset for scientists and stakeholders on the path for achieving a more sustainable future. The C3S Evaluation and Quality Control (EQC) is assessing the quality of the CDS by undertaking a comprehensive user requirement assessment to measure the users’ satisfaction. Recommendations will be developed for the improvement and expansion of the CDS datasets and products. User requirements will be identified on the fitness of the datasets, the toolbox, and the overall CDS service. The EQC function of the CDS will help C3S to make the service more robust: integrated by validated data that follows high-quality standards while being user-friendly. This function will be closely developed with the users of the service. Through their feedback, suggestions, and contributions, the CDS can become more accessible and meet the requirements for a diverse range of users. Stakeholders and their active engagement are thus an important aspect of CDS development. This will be achieved with direct interactions with users such as meetings, interviews or workshops as well as different feedback mechanisms like surveys or helpdesk services at the CDS. The results provided by the users will be categorized as a function of CDS products so that their specific interests will be monitored and linked to the right product. Through this procedure, we will identify the requirements and criteria for data and products in order to build the correspondent recommendations for the improvement and expansion of the CDS datasets and products.Keywords: climate data store, Copernicus, quality, user engagement
Procedia PDF Downloads 146713 An Unsupervised Domain-Knowledge Discovery Framework for Fake News Detection
Authors: Yulan Wu
Abstract:
With the rapid development of social media, the issue of fake news has gained considerable prominence, drawing the attention of both the public and governments. The widespread dissemination of false information poses a tangible threat across multiple domains of society, including politics, economy, and health. However, much research has concentrated on supervised training models within specific domains, their effectiveness diminishes when applied to identify fake news across multiple domains. To solve this problem, some approaches based on domain labels have been proposed. By segmenting news to their specific area in advance, judges in the corresponding field may be more accurate on fake news. However, these approaches disregard the fact that news records can pertain to multiple domains, resulting in a significant loss of valuable information. In addition, the datasets used for training must all be domain-labeled, which creates unnecessary complexity. To solve these problems, an unsupervised domain knowledge discovery framework for fake news detection is proposed. Firstly, to effectively retain the multidomain knowledge of the text, a low-dimensional vector for each news text to capture domain embeddings is generated. Subsequently, a feature extraction module utilizing the unsupervisedly discovered domain embeddings is used to extract the comprehensive features of news. Finally, a classifier is employed to determine the authenticity of the news. To verify the proposed framework, a test is conducted on the existing widely used datasets, and the experimental results demonstrate that this method is able to improve the detection performance for fake news across multiple domains. Moreover, even in datasets that lack domain labels, this method can still effectively transfer domain knowledge, which can educe the time consumed by tagging without sacrificing the detection accuracy.Keywords: fake news, deep learning, natural language processing, multiple domains
Procedia PDF Downloads 96712 Assessing Building Rooftop Potential for Solar Photovoltaic Energy and Rainwater Harvesting: A Sustainable Urban Plan for Atlantis, Western Cape
Authors: Adedayo Adeleke, Dineo Pule
Abstract:
The ongoing load-shedding in most parts of South Africa, combined with climate change causing severe drought conditions in Cape Town, has left electricity consumers seeking alternative sources of power and water. Solar energy, which is abundant in most parts of South Africa and is regarded as a clean and renewable source of energy, allows for the generation of electricity via solar photovoltaic systems. Rainwater harvesting is the collection and storage of rainwater from building rooftops, allowing people without access to water to collect it. The lack of dependable energy and water source must be addressed by shifting to solar energy via solar photovoltaic systems and rainwater harvesting. Before this can be done, the potential of building rooftops must be assessed to determine whether solar energy and rainwater harvesting will be able to meet or significantly contribute to Atlantis industrial areas' electricity and water demands. This research project presents methods and approaches for automatically extracting building rooftops in Atlantis industrial areas and evaluating their potential for solar photovoltaics and rainwater harvesting systems using Light Detection and Ranging (LiDAR) data and aerial imagery. The four objectives were to: (1) identify an optimal method of extracting building rooftops from aerial imagery and LiDAR data; (2) identify a suitable solar radiation model that can provide a global solar radiation estimate of the study area; (3) estimate solar photovoltaic potential overbuilding rooftop; and (4) estimate the amount of rainwater that can be harvested from the building rooftop in the study area. Mapflow, a plugin found in Quantum Geographic Information System(GIS) was used to automatically extract building rooftops using aerial imagery. The mean annual rainfall in Cape Town was obtained from a 29-year rainfall period (1991- 2020) and used to calculate the amount of rainwater that can be harvested from building rooftops. The potential for rainwater harvesting and solar photovoltaic systems was assessed, and it can be concluded that there is potential for these systems but only to supplement the existing resource supply and offer relief in times of drought and load-shedding.Keywords: roof potential, rainwater harvesting, urban plan, roof extraction
Procedia PDF Downloads 115711 Algorithm for Modelling Land Surface Temperature and Land Cover Classification and Their Interaction
Authors: Jigg Pelayo, Ricardo Villar, Einstine Opiso
Abstract:
The rampant and unintended spread of urban areas resulted in increasing artificial component features in the land cover types of the countryside and bringing forth the urban heat island (UHI). This paved the way to wide range of negative influences on the human health and environment which commonly relates to air pollution, drought, higher energy demand, and water shortage. Land cover type also plays a relevant role in the process of understanding the interaction between ground surfaces with the local temperature. At the moment, the depiction of the land surface temperature (LST) at city/municipality scale particularly in certain areas of Misamis Oriental, Philippines is inadequate as support to efficient mitigations and adaptations of the surface urban heat island (SUHI). Thus, this study purposely attempts to provide application on the Landsat 8 satellite data and low density Light Detection and Ranging (LiDAR) products in mapping out quality automated LST model and crop-level land cover classification in a local scale, through theoretical and algorithm based approach utilizing the principle of data analysis subjected to multi-dimensional image object model. The paper also aims to explore the relationship between the derived LST and land cover classification. The results of the presented model showed the ability of comprehensive data analysis and GIS functionalities with the integration of object-based image analysis (OBIA) approach on automating complex maps production processes with considerable efficiency and high accuracy. The findings may potentially lead to expanded investigation of temporal dynamics of land surface UHI. It is worthwhile to note that the environmental significance of these interactions through combined application of remote sensing, geographic information tools, mathematical morphology and data analysis can provide microclimate perception, awareness and improved decision-making for land use planning and characterization at local and neighborhood scale. As a result, it can aid in facilitating problem identification, support mitigations and adaptations more efficiently.Keywords: LiDAR, OBIA, remote sensing, local scale
Procedia PDF Downloads 282710 Optimizing Machine Learning Through Python Based Image Processing Techniques
Authors: Srinidhi. A, Naveed Ahmed, Twinkle Hareendran, Vriksha Prakash
Abstract:
This work reviews some of the advanced image processing techniques for deep learning applications. Object detection by template matching, image denoising, edge detection, and super-resolution modelling are but a few of the tasks. The paper looks in into great detail, given that such tasks are crucial preprocessing steps that increase the quality and usability of image datasets in subsequent deep learning tasks. We review some of the methods for the assessment of image quality, more specifically sharpness, which is crucial to ensure a robust performance of models. Further, we will discuss the development of deep learning models specific to facial emotion detection, age classification, and gender classification, which essentially includes the preprocessing techniques interrelated with model performance. Conclusions from this study pinpoint the best practices in the preparation of image datasets, targeting the best trade-off between computational efficiency and retaining important image features critical for effective training of deep learning models.Keywords: image processing, machine learning applications, template matching, emotion detection
Procedia PDF Downloads 14709 Privacy Preservation Concerns and Information Disclosure on Social Networks: An Ongoing Research
Authors: Aria Teimourzadeh, Marc Favier, Samaneh Kakavand
Abstract:
The emergence of social networks has revolutionized the exchange of information. Every behavior on these platforms contributes to the generation of data known as social network data that are processed, stored and published by the social network service providers. Hence, it is vital to investigate the role of these platforms in user data by considering the privacy measures, especially when we observe the increased number of individuals and organizations engaging with the current virtual platforms without being aware that the data related to their positioning, connections and behavior is uncovered and used by third parties. Performing analytics on social network datasets may result in the disclosure of confidential information about the individuals or organizations which are the members of these virtual environments. Analyzing separate datasets can reveal private information about relationships, interests and more, especially when the datasets are analyzed jointly. Intentional breaches of privacy is the result of such analysis. Addressing these privacy concerns requires an understanding of the nature of data being accumulated and relevant data privacy regulations, as well as motivations for disclosure of personal information on social network platforms. Some significant points about how user's online information is controlled by the influence of social factors and to what extent the users are concerned about future use of their personal information by the organizations, are highlighted in this paper. Firstly, this research presents a short literature review about the structure of a network and concept of privacy in Online Social Networks. Secondly, the factors of user behavior related to privacy protection and self-disclosure on these virtual communities are presented. In other words, we seek to demonstrates the impact of identified variables on user information disclosure that could be taken into account to explain the privacy preservation of individuals on social networking platforms. Thirdly, a few research directions are discussed to address this topic for new researchers.Keywords: information disclosure, privacy measures, privacy preservation, social network analysis, user experience
Procedia PDF Downloads 281708 Domain Adaptive Dense Retrieval with Query Generation
Authors: Rui Yin, Haojie Wang, Xun Li
Abstract:
Recently, mainstream dense retrieval methods have obtained state-of-the-art results on some datasets and tasks. However, they require large amounts of training data, which is not available in most domains. The severe performance degradation of dense retrievers on new data domains has limited the use of dense retrieval methods to only a few domains with large training datasets. In this paper, we propose an unsupervised domain-adaptive approach based on query generation. First, a generative model is used to generate relevant queries for each passage in the target corpus, and then, the generated queries are used for mining negative passages. Finally, the query-passage pairs are labeled with a cross-encoder and used to train a domain-adapted dense retriever. We also explore contrastive learning as a method for training domain-adapted dense retrievers and show that it leads to strong performance in various retrieval settings. Experiments show that our approach is more robust than previous methods in target domains that require less unlabeled data.Keywords: dense retrieval, query generation, contrastive learning, unsupervised training
Procedia PDF Downloads 104707 Bag of Local Features for Person Re-Identification on Large-Scale Datasets
Authors: Yixiu Liu, Yunzhou Zhang, Jianning Chi, Hao Chu, Rui Zheng, Libo Sun, Guanghao Chen, Fangtong Zhou
Abstract:
In the last few years, large-scale person re-identification has attracted a lot of attention from video surveillance since it has a potential application prospect in public safety management. However, it is still a challenging job considering the variation in human pose, the changing illumination conditions and the lack of paired samples. Although the accuracy has been significantly improved, the data dependence of the sample training is serious. To tackle this problem, a new strategy is proposed based on bag of visual words (BoVW) model of designing the feature representation which has been widely used in the field of image retrieval. The local features are extracted, and more discriminative feature representation is obtained by cross-view dictionary learning (CDL), then the assignment map is obtained through k-means clustering. Finally, the BoVW histograms are formed which encodes the images with the statistics of the feature classes in the assignment map. Experiments conducted on the CUHK03, Market1501 and MARS datasets show that the proposed method performs favorably against existing approaches.Keywords: bag of visual words, cross-view dictionary learning, person re-identification, reranking
Procedia PDF Downloads 195706 Real-Time Big-Data Warehouse a Next-Generation Enterprise Data Warehouse and Analysis Framework
Authors: Abbas Raza Ali
Abstract:
Big Data technology is gradually becoming a dire need of large enterprises. These enterprises are generating massively large amount of off-line and streaming data in both structured and unstructured formats on daily basis. It is a challenging task to effectively extract useful insights from the large scale datasets, even though sometimes it becomes a technology constraint to manage transactional data history of more than a few months. This paper presents a framework to efficiently manage massively large and complex datasets. The framework has been tested on a communication service provider producing massively large complex streaming data in binary format. The communication industry is bound by the regulators to manage history of their subscribers’ call records where every action of a subscriber generates a record. Also, managing and analyzing transactional data allows service providers to better understand their customers’ behavior, for example, deep packet inspection requires transactional internet usage data to explain internet usage behaviour of the subscribers. However, current relational database systems limit service providers to only maintain history at semantic level which is aggregated at subscriber level. The framework addresses these challenges by leveraging Big Data technology which optimally manages and allows deep analysis of complex datasets. The framework has been applied to offload existing Intelligent Network Mediation and relational Data Warehouse of the service provider on Big Data. The service provider has 50+ million subscriber-base with yearly growth of 7-10%. The end-to-end process takes not more than 10 minutes which involves binary to ASCII decoding of call detail records, stitching of all the interrogations against a call (transformations) and aggregations of all the call records of a subscriber.Keywords: big data, communication service providers, enterprise data warehouse, stream computing, Telco IN Mediation
Procedia PDF Downloads 175705 Learning from Small Amount of Medical Data with Noisy Labels: A Meta-Learning Approach
Authors: Gorkem Algan, Ilkay Ulusoy, Saban Gonul, Banu Turgut, Berker Bakbak
Abstract:
Computer vision systems recently made a big leap thanks to deep neural networks. However, these systems require correctly labeled large datasets in order to be trained properly, which is very difficult to obtain for medical applications. Two main reasons for label noise in medical applications are the high complexity of the data and conflicting opinions of experts. Moreover, medical imaging datasets are commonly tiny, which makes each data very important in learning. As a result, if not handled properly, label noise significantly degrades the performance. Therefore, a label-noise-robust learning algorithm that makes use of the meta-learning paradigm is proposed in this article. The proposed solution is tested on retinopathy of prematurity (ROP) dataset with a very high label noise of 68%. Results show that the proposed algorithm significantly improves the classification algorithm's performance in the presence of noisy labels.Keywords: deep learning, label noise, robust learning, meta-learning, retinopathy of prematurity
Procedia PDF Downloads 161704 SiamMask++: More Accurate Object Tracking through Layer Wise Aggregation in Visual Object Tracking
Authors: Hyunbin Choi, Jihyeon Noh, Changwon Lim
Abstract:
In this paper, we propose SiamMask++, an architecture that performs layer-wise aggregation and depth-wise cross-correlation and introduce multi-RPN module and multi-MASK module to improve EAO (Expected Average Overlap), a representative performance evaluation metric for Visual Object Tracking (VOT) challenge. The proposed architecture, SiamMask++, has two versions, namely, bi_SiamMask++, which satisfies the real time (56fps) on systems equipped with GPUs (Titan XP), and rf_SiamMask++, which combines mask refinement modules for EAO improvements. Tests are performed on VOT2016, VOT2018 and VOT2019, the representative datasets of Visual Object Tracking tasks labeled as rotated bounding boxes. SiamMask++ perform better than SiamMask on all the three datasets tested. SiamMask++ is achieved performance of 62.6% accuracy, 26.2% robustness and 39.8% EAO, especially on the VOT2018 dataset. Compared to SiamMask, this is an improvement of 4.18%, 37.17%, 23.99%, respectively. In addition, we do an experimental in-depth analysis of how much the introduction of features and multi modules extracted from the backbone affects the performance of our model in the VOT task.Keywords: visual object tracking, video, deep learning, layer wise aggregation, Siamese network
Procedia PDF Downloads 159703 Systematic Evaluation of Convolutional Neural Network on Land Cover Classification from Remotely Sensed Images
Authors: Eiman Kattan, Hong Wei
Abstract:
In using Convolutional Neural Network (CNN) for classification, there is a set of hyperparameters available for the configuration purpose. This study aims to evaluate the impact of a range of parameters in CNN architecture i.e. AlexNet on land cover classification based on four remotely sensed datasets. The evaluation tests the influence of a set of hyperparameters on the classification performance. The parameters concerned are epoch values, batch size, and convolutional filter size against input image size. Thus, a set of experiments were conducted to specify the effectiveness of the selected parameters using two implementing approaches, named pertained and fine-tuned. We first explore the number of epochs under several selected batch size values (32, 64, 128 and 200). The impact of kernel size of convolutional filters (1, 3, 5, 7, 10, 15, 20, 25 and 30) was evaluated against the image size under testing (64, 96, 128, 180 and 224), which gave us insight of the relationship between the size of convolutional filters and image size. To generalise the validation, four remote sensing datasets, AID, RSD, UCMerced and RSCCN, which have different land covers and are publicly available, were used in the experiments. These datasets have a wide diversity of input data, such as number of classes, amount of labelled data, and texture patterns. A specifically designed interactive deep learning GPU training platform for image classification (Nvidia Digit) was employed in the experiments. It has shown efficiency in both training and testing. The results have shown that increasing the number of epochs leads to a higher accuracy rate, as expected. However, the convergence state is highly related to datasets. For the batch size evaluation, it has shown that a larger batch size slightly decreases the classification accuracy compared to a small batch size. For example, selecting the value 32 as the batch size on the RSCCN dataset achieves the accuracy rate of 90.34 % at the 11th epoch while decreasing the epoch value to one makes the accuracy rate drop to 74%. On the other extreme, setting an increased value of batch size to 200 decreases the accuracy rate at the 11th epoch is 86.5%, and 63% when using one epoch only. On the other hand, selecting the kernel size is loosely related to data set. From a practical point of view, the filter size 20 produces 70.4286%. The last performed image size experiment shows a dependency in the accuracy improvement. However, an expensive performance gain had been noticed. The represented conclusion opens the opportunities toward a better classification performance in various applications such as planetary remote sensing.Keywords: CNNs, hyperparamters, remote sensing, land cover, land use
Procedia PDF Downloads 168702 Hyperspectral Band Selection for Oil Spill Detection Using Deep Neural Network
Authors: Asmau Mukhtar Ahmed, Olga Duran
Abstract:
Hydrocarbon (HC) spills constitute a significant problem that causes great concern to the environment. With the latest technology (hyperspectral images) and state of the earth techniques (image processing tools), hydrocarbon spills can easily be detected at an early stage to mitigate the effects caused by such menace. In this study; a controlled laboratory experiment was used, and clay soil was mixed and homogenized with different hydrocarbon types (diesel, bio-diesel, and petrol). The different mixtures were scanned with HYSPEX hyperspectral camera under constant illumination to generate the hypersectral datasets used for this experiment. So far, the Short Wave Infrared Region (SWIR) has been exploited in detecting HC spills with excellent accuracy. However, the Near-Infrared Region (NIR) is somewhat unexplored with regards to HC contamination and how it affects the spectrum of soils. In this study, Deep Neural Network (DNN) was applied to the controlled datasets to detect and quantify the amount of HC spills in soils in the Near-Infrared Region. The initial results are extremely encouraging because it indicates that the DNN was able to identify features of HC in the Near-Infrared Region with a good level of accuracy.Keywords: hydrocarbon, Deep Neural Network, short wave infrared region, near-infrared region, hyperspectral image
Procedia PDF Downloads 113701 SPARK: An Open-Source Knowledge Discovery Platform That Leverages Non-Relational Databases and Massively Parallel Computational Power for Heterogeneous Genomic Datasets
Authors: Thilina Ranaweera, Enes Makalic, John L. Hopper, Adrian Bickerstaffe
Abstract:
Data are the primary asset of biomedical researchers, and the engine for both discovery and research translation. As the volume and complexity of research datasets increase, especially with new technologies such as large single nucleotide polymorphism (SNP) chips, so too does the requirement for software to manage, process and analyze the data. Researchers often need to execute complicated queries and conduct complex analyzes of large-scale datasets. Existing tools to analyze such data, and other types of high-dimensional data, unfortunately suffer from one or more major problems. They typically require a high level of computing expertise, are too simplistic (i.e., do not fit realistic models that allow for complex interactions), are limited by computing power, do not exploit the computing power of large-scale parallel architectures (e.g. supercomputers, GPU clusters etc.), or are limited in the types of analysis available, compounded by the fact that integrating new analysis methods is not straightforward. Solutions to these problems, such as those developed and implemented on parallel architectures, are currently available to only a relatively small portion of medical researchers with access and know-how. The past decade has seen a rapid expansion of data management systems for the medical domain. Much attention has been given to systems that manage phenotype datasets generated by medical studies. The introduction of heterogeneous genomic data for research subjects that reside in these systems has highlighted the need for substantial improvements in software architecture. To address this problem, we have developed SPARK, an enabling and translational system for medical research, leveraging existing high performance computing resources, and analysis techniques currently available or being developed. It builds these into The Ark, an open-source web-based system designed to manage medical data. SPARK provides a next-generation biomedical data management solution that is based upon a novel Micro-Service architecture and Big Data technologies. The system serves to demonstrate the applicability of Micro-Service architectures for the development of high performance computing applications. When applied to high-dimensional medical datasets such as genomic data, relational data management approaches with normalized data structures suffer from unfeasibly high execution times for basic operations such as insert (i.e. importing a GWAS dataset) and the queries that are typical of the genomics research domain. SPARK resolves these problems by incorporating non-relational NoSQL databases that have been driven by the emergence of Big Data. SPARK provides researchers across the world with user-friendly access to state-of-the-art data management and analysis tools while eliminating the need for high-level informatics and programming skills. The system will benefit health and medical research by eliminating the burden of large-scale data management, querying, cleaning, and analysis. SPARK represents a major advancement in genome research technologies, vastly reducing the burden of working with genomic datasets, and enabling cutting edge analysis approaches that have previously been out of reach for many medical researchers.Keywords: biomedical research, genomics, information systems, software
Procedia PDF Downloads 270700 Extraction of Urban Building Damage Using Spectral, Height and Corner Information
Authors: X. Wang
Abstract:
Timely and accurate information on urban building damage caused by earthquake is important basis for disaster assessment and emergency relief. Very high resolution (VHR) remotely sensed imagery containing abundant fine-scale information offers a large quantity of data for detecting and assessing urban building damage in the aftermath of earthquake disasters. However, the accuracy obtained using spectral features alone is comparatively low, since building damage, intact buildings and pavements are spectrally similar. Therefore, it is of great significance to detect urban building damage effectively using multi-source data. Considering that in general height or geometric structure of buildings change dramatically in the devastated areas, a novel multi-stage urban building damage detection method, using bi-temporal spectral, height and corner information, was proposed in this study. The pre-event height information was generated using stereo VHR images acquired from two different satellites, while the post-event height information was produced from airborne LiDAR data. The corner information was extracted from pre- and post-event panchromatic images. The proposed method can be summarized as follows. To reduce the classification errors caused by spectral similarity and errors in extracting height information, ground surface, shadows, and vegetation were first extracted using the post-event VHR image and height data and were masked out. Two different types of building damage were then extracted from the remaining areas: the height difference between pre- and post-event was used for detecting building damage showing significant height change; the difference in the density of corners between pre- and post-event was used for extracting building damage showing drastic change in geometric structure. The initial building damage result was generated by combining above two building damage results. Finally, a post-processing procedure was adopted to refine the obtained initial result. The proposed method was quantitatively evaluated and compared to two existing methods in Port au Prince, Haiti, which was heavily hit by an earthquake in January 2010, using pre-event GeoEye-1 image, pre-event WorldView-2 image, post-event QuickBird image and post-event LiDAR data. The results showed that the method proposed in this study significantly outperformed the two comparative methods in terms of urban building damage extraction accuracy. The proposed method provides a fast and reliable method to detect urban building collapse, which is also applicable to relevant applications.Keywords: building damage, corner, earthquake, height, very high resolution (VHR)
Procedia PDF Downloads 213699 Shifted Window Based Self-Attention via Swin Transformer for Zero-Shot Learning
Authors: Yasaswi Palagummi, Sareh Rowlands
Abstract:
Generalised Zero-Shot Learning, often known as GZSL, is an advanced variant of zero-shot learning in which the samples in the unseen category may be either seen or unseen. GZSL methods typically have a bias towards the seen classes because they learn a model to perform recognition for both the seen and unseen classes using data samples from the seen classes. This frequently leads to the misclassification of data from the unseen classes into the seen classes, making the task of GZSL more challenging. In this work of ours, to solve the GZSL problem, we propose an approach leveraging the Shifted Window based Self-Attention in the Swin Transformer (Swin-GZSL) to work in the inductive GSZL problem setting. We run experiments on three popular benchmark datasets: CUB, SUN, and AWA2, which are specifically used for ZSL and its other variants. The results show that our model based on Swin Transformer has achieved state-of-the-art harmonic mean for two datasets -AWA2 and SUN and near-state-of-the-art for the other dataset - CUB. More importantly, this technique has a linear computational complexity, which reduces training time significantly. We have also observed less bias than most of the existing GZSL models.Keywords: generalised, zero-shot learning, inductive learning, shifted-window attention, Swin transformer, vision transformer
Procedia PDF Downloads 71698 An Ensemble Deep Learning Architecture for Imbalanced Classification of Thoracic Surgery Patients
Authors: Saba Ebrahimi, Saeed Ahmadian, Hedie Ashrafi
Abstract:
Selecting appropriate patients for surgery is one of the main issues in thoracic surgery (TS). Both short-term and long-term risks and benefits of surgery must be considered in the patient selection criteria. There are some limitations in the existing datasets of TS patients because of missing values of attributes and imbalanced distribution of survival classes. In this study, a novel ensemble architecture of deep learning networks is proposed based on stacking different linear and non-linear layers to deal with imbalance datasets. The categorical and numerical features are split using different layers with ability to shrink the unnecessary features. Then, after extracting the insight from the raw features, a novel biased-kernel layer is applied to reinforce the gradient of the minority class and cause the network to be trained better comparing the current methods. Finally, the performance and advantages of our proposed model over the existing models are examined for predicting patient survival after thoracic surgery using a real-life clinical data for lung cancer patients.Keywords: deep learning, ensemble models, imbalanced classification, lung cancer, TS patient selection
Procedia PDF Downloads 145697 Enhancing Spatial Interpolation: A Multi-Layer Inverse Distance Weighting Model for Complex Regression and Classification Tasks in Spatial Data Analysis
Authors: Yakin Hajlaoui, Richard Labib, Jean-François Plante, Michel Gamache
Abstract:
This study introduces the Multi-Layer Inverse Distance Weighting Model (ML-IDW), inspired by the mathematical formulation of both multi-layer neural networks (ML-NNs) and Inverse Distance Weighting model (IDW). ML-IDW leverages ML-NNs' processing capabilities, characterized by compositions of learnable non-linear functions applied to input features, and incorporates IDW's ability to learn anisotropic spatial dependencies, presenting a promising solution for nonlinear spatial interpolation and learning from complex spatial data. it employ gradient descent and backpropagation to train ML-IDW, comparing its performance against conventional spatial interpolation models such as Kriging and standard IDW on regression and classification tasks using simulated spatial datasets of varying complexity. the results highlight the efficacy of ML-IDW, particularly in handling complex spatial datasets, exhibiting lower mean square error in regression and higher F1 score in classification.Keywords: deep learning, multi-layer neural networks, gradient descent, spatial interpolation, inverse distance weighting
Procedia PDF Downloads 52