Search results for: LiDAR datasets
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 846

636 Fake News Detection Based on Fusion of Domain Knowledge and Expert Knowledge

Authors: Yulan Wu

Abstract:

The spread of fake news on social media has posed significant societal harm to the public and the nation, with its threats spanning various domains, including politics, economics, health, and more. News on social media often covers multiple domains, and existing models studied by researchers and relevant organizations often perform well on datasets from a single domain. However, when these methods are applied to social platforms with news spanning multiple domains, their performance significantly deteriorates. Existing research has attempted to enhance the detection performance of multi-domain datasets by adding single-domain labels to the data. However, these methods overlook the fact that a news article typically belongs to multiple domains, leading to the loss of domain knowledge information contained within the news text. To address this issue, research has found that news records in different domains often use different vocabularies to describe their content. In this paper, we propose a fake news detection framework that combines domain knowledge and expert knowledge. Firstly, it utilizes an unsupervised domain discovery module to generate a low-dimensional vector for each news article, representing domain embeddings, which can retain multi-domain knowledge of the news content. Then, a feature extraction module uses the domain embeddings discovered through unsupervised domain knowledge to guide multiple experts in extracting news knowledge for the total feature representation. Finally, a classifier is used to determine whether the news is fake or not. Experiments show that this approach can improve multi-domain fake news detection performance while reducing the cost of manually labeling domain labels.

Keywords: fake news, deep learning, natural language processing, multiple domains

Procedia PDF Downloads 73
635 Tractography Analysis and the Evolutionary Origin of Schizophrenia

Authors: Mouktafi Amine, Tahiri Asmaa

Abstract:

A substantial body of traditional medical research has been devoted to managing and treating mental disorders. To the best of our knowledge, however, the underlying causes of most psychological disorders still need to be explored further to inform early diagnosis, symptom management and treatment. The emerging field of evolutionary psychology is a promising avenue for addressing the origin of mental disorders, potentially leading to more effective treatments. Schizophrenia, as a topical mental disorder, has been linked to the evolutionary adaptation of the human brain, reflected in the brain connectivity and asymmetry directly linked to humans' higher cognition, in contrast to other primates, which are the closest living representation of the brain structure and connectivity of our earliest common African ancestors. As proposed in the evolutionary psychology literature, the pathophysiology of schizophrenia is expressed in, and directly linked to, altered connectivity between the Hippocampal Formation (HF) and the Dorsolateral Prefrontal Cortex (DLPFC). This paper presents the results of tractography analysis on multiple open-access Diffusion Weighted Imaging (DWI) datasets of healthy subjects, schizophrenia-affected subjects and primates to illustrate the relevance of the connectivity of these brain regions and the underlying evolutionary changes in the human brain. Deterministic fiber tracking and streamline analysis were used to generate connectivity matrices from the DWI datasets, which were overlaid to compute distances and highlight disconnectivity patterns in conjunction with other fiber tracking metrics: Fractional Anisotropy (FA), Mean Diffusivity (MD) and Radial Diffusivity (RD).

Keywords: tractography, diffusion weighted imaging, schizophrenia, evolutionary psychology

Procedia PDF Downloads 49
634 Using Visualization Techniques to Support Common Clinical Tasks in Clinical Documentation

Authors: Jonah Kenei, Elisha Opiyo

Abstract:

Electronic health records (EHRs), as repositories of patient information, are nowadays the most commonly used technology to record, store and review patient clinical records and to perform other clinical tasks. However, accurately identifying and retrieving relevant information from clinical records is difficult because clinical documents are largely unstructured. Medical practice therefore faces a challenge from the rapid growth of health information in EHRs, mostly in narrative text form. As a result, it is becoming important to manage the growing amount of data for a single patient effectively, and to visualize EHRs in a way that aids physicians in clinical tasks and medical decision-making. Applying text visualization techniques to unstructured clinical narrative texts is a new area of research that aims to provide better information extraction and retrieval for clinical decision support in scenarios where the data generated continues to grow. Clinical datasets in EHRs offer considerable potential for training accurate statistical models to classify facets of information, which can then be used to improve patient care and outcomes. However, in many clinical note datasets, the unstructured nature of clinical texts is a common problem. This paper examines the problem of taking raw clinical texts and mapping them into meaningful structures that can support healthcare professionals who use narrative texts. Our work is the result of a collaborative design process that was aided by empirical data collected through formal usability testing.

Keywords: classification, electronic health records, narrative texts, visualization

Procedia PDF Downloads 118
633 Factors Affecting Air Surface Temperature Variations in the Philippines

Authors: John Christian Lequiron, Gerry Bagtasa, Olivia Cabrera, Leoncio Amadore, Tolentino Moya

Abstract:

Changes in air surface temperature play an important role in the Philippines' economy, industry, health, and food production. While increasing global mean temperature in recent decades has prompted a number of climate change and variability studies in the Philippines, most studies still focus on rainfall and tropical cyclones. This study aims to investigate the trend and variability of observed air surface temperature and determine its major influencing factors in the Philippines. A non-parametric Mann-Kendall trend test was applied to the monthly mean temperature of 17 synoptic stations covering 56 years from 1960 to 2015, and a mean change of 0.58 °C, or a positive trend of 0.0105 °C/year (p < 0.05), was found. In addition, wavelet decomposition was used to determine the frequency of temperature variability, which shows 12-month, 30-80-month and more than 120-month cycles. This indicates strong annual variations, interannual variations that coincide with ENSO events, and interdecadal variations that are attributed to PDO and CO2 concentrations. Air surface temperature was also correlated with smoothed sunspot number and galactic cosmic rays; the results show a low to no effect. The influence of ENSO teleconnection on temperature, wind pattern, cloud cover, and outgoing longwave radiation during different ENSO phases had significant effects on regional temperature variability. In particular, an anomalous anticyclonic (cyclonic) flow east of the Philippines during the peak and decay phases of El Niño (La Niña) events leads to the advection of warm southeasterly (cold northeasterly) air mass over the country. Furthermore, an apparent increasing cloud cover trend is observed over the West Philippine Sea, including portions of the Philippines, and this is believed to lessen the effect of the increasing air surface temperature.
However, relative humidity was also found to be increasing, especially in the central part of the country, which results in a high positive trend of heat index, exacerbating the effects on human discomfort. Finally, gridded temperature datasets were assessed to examine the viability of using three high-resolution datasets in future climate analysis and model calibration and verification. Several error statistics (i.e., Pearson correlation, bias, MAE, and RMSE) were used for this validation. Results show that the gridded temperature datasets generally follow the observed surface temperature change and anomalies. In addition, they are more representative of regional temperature than a substitute for station-observed air temperature.
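
The Mann-Kendall trend test named in this abstract can be sketched as follows. This is a minimal illustration of the textbook test (S statistic, tie-corrected variance, continuity-corrected normal approximation), not the authors' code; the function name `mann_kendall` and the sample series are assumptions made for the example.

```python
import math
from collections import Counter


def mann_kendall(values):
    """Return (S, Z, two-sided p-value) for the Mann-Kendall trend test.

    `mann_kendall` is a hypothetical helper name, not taken from the paper.
    """
    n = len(values)
    # S is the number of increasing pairs minus the number of decreasing pairs.
    s = 0
    for i in range(n - 1):
        for j in range(i + 1, n):
            diff = values[j] - values[i]
            s += (diff > 0) - (diff < 0)
    # Variance of S under the no-trend hypothesis, corrected for tied values.
    var_s = n * (n - 1) * (2 * n + 5)
    for t in Counter(values).values():
        var_s -= t * (t - 1) * (2 * t + 5)
    var_s /= 18.0
    # Continuity-corrected normal approximation.
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))
    return s, z, p


# Made-up, strictly increasing series standing in for a temperature record.
s, z, p = mann_kendall([1, 2, 3, 4, 5, 6, 7, 8, 9, 10])
print(s, p < 0.05)
```

A positive S with p below the chosen significance level indicates an increasing monotonic trend; the test says nothing about the slope itself, which needs a separate estimator.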

Keywords: air surface temperature, carbon dioxide, ENSO, galactic cosmic rays, smoothed sunspot number

Procedia PDF Downloads 323
632 Similar Script Character Recognition on Kannada and Telugu

Authors: Gurukiran Veerapur, Nytik Birudavolu, Seetharam U. N., Chandravva Hebbi, R. Praneeth Reddy

Abstract:

This work presents a robust approach for the recognition of characters in Telugu and Kannada, two South Indian scripts with structurally similar characters. Recognizing the characters requires exhaustive datasets, but only a few are publicly available. As a result, we decided to create a dataset for one language (the source language), train the model with it, and then test it with the target language. Telugu is the target language in this work, whereas Kannada is the source language. The suggested method makes use of Canny edge features to increase character identification accuracy on images with noise and varying lighting. A dataset of 45,150 images containing printed Kannada characters was created. The Nudi software was used to automatically generate printed Kannada characters with different writing styles and variations. Manual labelling was employed to ensure the accuracy of the character labels. Deep learning models, namely a CNN (Convolutional Neural Network) and a Visual Attention neural network (VAN), were used to experiment with the dataset. A VAN architecture was adopted, incorporating additional channels for Canny edge features, as the results obtained with this approach were good. The model's accuracy on the combined Telugu and Kannada test dataset was 97.3%. Performance was better with Canny edge features applied than with a model that used only the original grayscale images. When tested on each language separately, the model's accuracy was 80.11% for Telugu characters and 98.01% for Kannada words. The model thus shows excellent accuracy when identifying and categorizing characters from these scripts.

Keywords: base characters, modifiers, guninthalu, aksharas, vattakshara, VAN

Procedia PDF Downloads 53
631 Audit of TPS photon beam dataset for small field output factors using OSLDs against RPC standard dataset

Authors: Asad Yousuf

Abstract:

Purpose: The aim of the present study was to audit the treatment planning system (TPS) beam dataset for small field output factors against the standard dataset produced by the Radiological Physics Center (RPC) from a multicenter study. Such data are crucial for the validity of special techniques, e.g., IMRT or stereotactic radiosurgery. Materials/Method: In this study, multiple small field size output factor datasets were measured and calculated for 6 to 18 MV x-ray beams using the RPC-recommended methods. These beam datasets were measured at 10 cm depth for 10 × 10 cm² to 2 × 2 cm² field sizes, defined by collimator jaws at 100 cm. The measurements were made with Landauer nanoDot OSLDs, whose volume is small enough to gather a full ionization reading even for the 1 × 1 cm² field size. At our institute, the beam data, including output factors, have been commissioned at 5 cm depth with an SAD setup. For comparison with the RPC data, the output factors were converted to an SSD setup using tissue phantom ratios. The SSD setup also enables coverage of the ion chamber in the 2 × 2 cm² field size. The measured output factors were also compared with those calculated by Eclipse™ treatment planning software. Result: The measured and calculated output factors agree with the RPC dataset within 1% and 4%, respectively. The large discrepancies in the TPS reflect the increased challenge in converting measured data into a commissioned beam model for very small fields. Conclusion: OSLDs are a simple, durable, and accurate tool to verify doses delivered using small photon beam fields down to a 1 × 1 cm² field size. The study emphasizes that the treatment planning system should always be evaluated for small field output factors to ensure accurate dose delivery in the clinical setting.

Keywords: small field dosimetry, optically stimulated luminescence, audit treatment, radiological physics center

Procedia PDF Downloads 327
630 Applied Bayesian Regularized Artificial Neural Network for Up-Scaling Wind Speed Profile and Distribution

Authors: Aghbalou Nihad, Charki Abderafi, Saida Rahali, Reklaoui Kamal

Abstract:

Maximizing the benefit from the wind energy potential is the main interest of wind power stakeholders. As a result, wind tower sizes are increasing radically. Nevertheless, choosing an appropriate wind turbine for a selected site requires an accurate estimate of the vertical wind profile, which is also imperative from a cost and maintenance strategy point of view. However, installing tall towers or even more expensive devices such as LIDAR or SODAR raises the costs of a wind power project. Various models have been developed within this framework, but they suffer from complexity, poor generalization and a lack of accuracy. In this work, we investigate the ability of a neural network trained using the Bayesian Regularization technique to estimate the wind speed profile up to a height of 100 m based on knowledge of wind speed at lower heights. Results show that the proposed approach can achieve satisfactory predictions and prove its suitability for generating wind speed profiles and probability distributions based on knowledge of wind speed at lower heights.

Keywords: bayesian regularization, neural network, wind shear, accuracy

Procedia PDF Downloads 501
629 Observed Changes in Constructed Precipitation at High Resolution in Southern Vietnam

Authors: Nguyen Tien Thanh, Günter Meon

Abstract:

Precipitation plays a key role in the water cycle, in defining local climatic conditions and in ecosystems. It is also an important input parameter for water resources management and hydrologic models. With spatially continuous data, the certainty of discharge predictions or other environmental factors is unquestionably better than without. Such data are, however, not always readily available for a small basin, especially in the coastal region of Vietnam, owing to a sparse network of meteorological stations (30 stations) along a long coast of 3260 km. Furthermore, available gridded precipitation datasets are not fine enough for use in hydrologic models. Under conditions of global warming, the application of spatial interpolation methods is crucial for climate change impact studies to obtain spatially continuous data. In recent research projects, although some methods perform better than others, no method yields the best results in all cases. The objective of this paper, therefore, is to investigate different spatial interpolation methods for daily precipitation over a small basin (approximately 400 km²) located in the coastal region of Southern Vietnam and to identify the most efficient interpolation method for this catchment. Five interpolation methods, namely Cressman, ordinary kriging, regression kriging, dual kriging and inverse distance weighting, were applied to identify the best method for the study area on the spatio-temporal scale (daily, 10 km x 10 km). A 30-year precipitation database was created and merged into available gridded datasets. Finally, observed changes in the constructed precipitation were analyzed. The results demonstrate that ordinary kriging interpolation is an effective approach for analyzing daily precipitation. Mixed trends of increasing and decreasing monthly, seasonal and annual precipitation were documented at significant levels.
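
Of the five interpolation methods listed, inverse distance weighting is the simplest to illustrate. The sketch below is a generic version of the technique, not the authors' implementation; the function name `idw`, the exponent `power=2.0`, and the station coordinates and rainfall values are all assumptions chosen for the example.

```python
import math


def idw(stations, target, power=2.0):
    """Inverse distance weighted estimate at `target`.

    stations: list of (x, y, value) tuples; target: (x, y).
    The name `idw` and the default exponent are illustrative assumptions.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for x, y, value in stations:
        distance = math.hypot(x - target[0], y - target[1])
        if distance == 0.0:
            # The target coincides with a station: return its observation.
            return value
        weight = 1.0 / distance ** power
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# Two made-up stations 2 km apart with daily rainfall of 10 mm and 20 mm.
gauges = [(0.0, 0.0, 10.0), (2.0, 0.0, 20.0)]
print(idw(gauges, (1.0, 0.0)))  # the midpoint gets equal weights
```

Unlike kriging, this estimator uses no model of spatial covariance, so it serves mainly as a baseline against which the kriging variants can be compared.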

Keywords: interpolation, precipitation, trend, vietnam

Procedia PDF Downloads 275
628 Automatic Near-Infrared Image Colorization Using Synthetic Images

Authors: Yoganathan Karthik, Guhanathan Poravi

Abstract:

Colorizing near-infrared (NIR) images poses unique challenges due to the absence of color information and the nuances in light absorption. In this paper, we present an approach to NIR image colorization utilizing a synthetic dataset generated from visible light images. Our method addresses two major challenges encountered in NIR image colorization: accurately colorizing objects with color variations and avoiding over/under saturation in dimly lit scenes. To tackle these challenges, we propose a Generative Adversarial Network (GAN)-based framework that learns to map NIR images to their corresponding colorized versions. The synthetic dataset ensures diverse color representations, enabling the model to effectively handle objects with varying hues and shades. Furthermore, the GAN architecture facilitates the generation of realistic colorizations while preserving the integrity of dimly lit scenes, thus mitigating issues related to over/under saturation. Experimental results on benchmark NIR image datasets demonstrate the efficacy of our approach in producing high-quality colorizations with improved color accuracy and naturalness. Quantitative evaluations and comparative studies validate the superiority of our method over existing techniques, showcasing its robustness and generalization capability across diverse NIR image scenarios. Our research not only contributes to advancing NIR image colorization but also underscores the importance of synthetic datasets and GANs in addressing domain-specific challenges in image processing tasks. The proposed framework holds promise for various applications in remote sensing, medical imaging, and surveillance where accurate color representation of NIR imagery is crucial for analysis and interpretation.

Keywords: computer vision, near-infrared images, automatic image colorization, generative adversarial networks, synthetic data

Procedia PDF Downloads 43
627 Transcriptomine: The Nuclear Receptor Signaling Transcriptome Database

Authors: Scott A. Ochsner, Christopher M. Watkins, Apollo McOwiti, David L. Steffen, Lauren B. Becnel, Neil J. McKenna

Abstract:

Understanding signaling by nuclear receptors (NRs) requires an appreciation of their cognate ligand- and tissue-specific transcriptomes. While target gene regulation data are abundant in this field, they reside in hundreds of discrete publications in formats refractory to routine query and analysis and, accordingly, their full value to the NR signaling community has not been realized. One of the mandates of the Nuclear Receptor Signaling Atlas (NURSA) is to facilitate access of the community to existing public datasets. Pursuant to this mandate we are developing a freely-accessible community web resource, Transcriptomine, to bring together the sum total of available expression array and RNA-Seq data points generated by the field in a single location. Transcriptomine currently contains over 25,000,000 gene fold change datapoints from over 1200 contrasts relevant to over 100 NRs, ligands and coregulators in over 200 tissues and cell lines. Transcriptomine is designed to accommodate a spectrum of end users ranging from the bench researcher to those with advanced bioinformatic training. Visualization tools allow users to build custom charts to compare and contrast patterns of gene regulation across different tissues and in response to different ligands. Our resource affords an entirely new paradigm for leveraging gene expression data in the NR signaling field, empowering users to query gene fold changes across diverse regulatory molecules, tissues and cell lines, target genes, biological functions and disease associations, and that would otherwise be prohibitive in terms of time and effort. Transcriptomine will be regularly updated with gene lists from future genome-wide expression array and expression-sequencing datasets in the NR signaling field.

Keywords: target gene database, informatics, gene expression, transcriptomics

Procedia PDF Downloads 273
626 Gene Expression Meta-Analysis of Potential Shared and Unique Pathways Between Autoimmune Diseases Under anti-TNFα Therapy

Authors: Charalabos Antonatos, Mariza Panoutsopoulou, Georgios K. Georgakilas, Evangelos Evangelou, Yiannis Vasilopoulos

Abstract:

The extended tissue damage and severe clinical outcomes of autoimmune diseases, accompanied by the high annual costs to the overall health care system, highlight the need for an efficient therapy. Increasing knowledge of the pathophysiology of specific chronic inflammatory diseases, namely Psoriasis (PsO), Inflammatory Bowel Diseases (IBD) consisting of Crohn’s disease (CD) and Ulcerative colitis (UC), and Rheumatoid Arthritis (RA), has provided insights into the underlying mechanisms that lead to the maintenance of the inflammation, such as Tumor Necrosis Factor alpha (TNF-α). Hence, the anti-TNFα biological agents represent an ideal therapeutic approach. Despite the efficacy of anti-TNFα agents, several clinical trials have shown that 20-40% of patients do not respond to treatment. Nowadays, high-throughput technologies are employed to elucidate the complex interactions in multifactorial phenotypes, the most ubiquitous being transcriptome quantification analyses. In this context, a random-effects meta-analysis of available gene expression cDNA microarray datasets was performed between responders and non-responders to anti-TNFα therapy in patients with IBD, PsO, and RA. Publicly available datasets were systematically searched from inception to the 10th of November 2020 and selected for further analysis if they assessed the response to anti-TNFα therapy with clinical score indexes from inflamed biopsies. Specifically, 4 IBD (79 responders/72 non-responders), 3 PsO (40 responders/11 non-responders) and 2 RA (16 responders/6 non-responders) datasets were selected. After the separate pre-processing of each dataset, 4 separate meta-analyses were conducted: three disease-specific and a single combined meta-analysis on the disease-specific results. The MetaVolcano R package (v.1.8.0) was utilized for a random-effects meta-analysis through the Restricted Maximum Likelihood (REML) method.
The top 1% of the most consistently perturbed genes in the included datasets was highlighted through the TopConfects approach while maintaining a 5% False Discovery Rate (FDR). Genes were considered Differentially Expressed Genes (DEGs) if they had P ≤ 0.05 and |log2(FC)| ≥ log2(1.25) and were perturbed in at least 75% of the included datasets. Over-representation analysis was performed using Gene Ontology and Reactome Pathways for both up- and down-regulated genes in all 4 performed meta-analyses. Protein-protein interaction networks were also incorporated in the subsequent analyses with STRING v11.5 and Cytoscape v3.9. Disease-specific meta-analyses detected multiple distinct pro-inflammatory and immune-related down-regulated genes for each disease, such as NFKBIA, IL36, and IRAK1, respectively. Pathway analyses revealed unique and shared pathways between each disease, such as Neutrophil Degranulation and Signaling by Interleukins. The combined meta-analysis unveiled 436 DEGs, 86 of which were up- and 350 down-regulated, confirming the aforementioned shared pathways and genes, as well as uncovering genes that participate in anti-inflammatory pathways, namely IL-10 signaling. The identification of key biological pathways and regulatory elements is imperative for the accurate prediction of the patient’s response to biological drugs. Meta-analysis of such gene expression data could help unravel the complex interactions implicated in the response to anti-TNFα therapy in patients with PsO, IBD, and RA, as well as distinguish gene clusters and pathways that are altered through this heterogeneous phenotype.
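
The three DEG criteria stated in this abstract (P ≤ 0.05, |log2(FC)| ≥ log2(1.25), perturbed in at least 75% of datasets) can be expressed as a simple filter. This is only a sketch of those thresholds; the function name `is_deg`, the record layout and the gene names `GENE_A` to `GENE_C` are hypothetical and do not come from the study or from the MetaVolcano package.

```python
import math

# |log2(FC)| cutoff quoted in the abstract: log2(1.25), roughly 0.32.
LOG2FC_CUTOFF = math.log2(1.25)


def is_deg(p_value, log2_fc, n_perturbed, n_datasets):
    """Apply the three thresholds from the abstract to one gene.

    `is_deg` and its argument names are illustrative assumptions.
    """
    return (
        p_value <= 0.05
        and abs(log2_fc) >= LOG2FC_CUTOFF
        and n_perturbed / n_datasets >= 0.75
    )


# Made-up records: (gene, p-value, log2 fold change, datasets perturbed, datasets total).
records = [
    ("GENE_A", 0.01, -0.50, 3, 4),  # passes all three criteria
    ("GENE_B", 0.01, 0.20, 4, 4),   # fold change too small
    ("GENE_C", 0.01, 1.00, 2, 4),   # perturbed in only 50% of datasets
]
degs = [gene for gene, p, fc, k, n in records if is_deg(p, fc, k, n)]
print(degs)
```

The absolute value on the fold change means up- and down-regulated genes are treated symmetrically, which matches the abstract's reporting of both directions.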

Keywords: anti-TNFα, autoimmune, meta-analysis, microarrays

Procedia PDF Downloads 180
625 Coastal Flood Mapping of Vulnerability Due to Sea Level Rise and Extreme Weather Events: A Case Study of St. Ives, UK

Authors: S. Vavias, T. R. Brewer, T. S. Farewell

Abstract:

Coastal floods have been identified as an important natural hazard that can cause significant damage to populated built-up areas, related infrastructure, ecosystems and habitats. This study attempts to fill the gap associated with the development of preliminary assessments of coastal flood vulnerability for compliance with the EU Directive on the Assessment and Management of Flood Risks (2007/60/EC). In this context, a methodology has been created that takes into account three major parameters: the maximum wave run-up modelled from historical weather observations, the highest tide according to historic time series, and the sea level rise projections due to climate change. A high-resolution digital terrain model (DTM) derived from LIDAR data has been used to integrate the estimated flood events in a GIS environment. The flood vulnerability map created shows potential risk areas and can play a crucial role in the coastal zone planning process. The proposed method has the potential to be a powerful tool for policy and decision makers for spatial planning and strategic management.
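
One simple way to combine the three parameters with a DTM is a static threshold on elevation. The sketch below assumes that the components add up to a single water level and that every cell at or below that level is flagged; both are assumptions made here for illustration, since the abstract does not state how the authors combine them. The function name `flood_mask` and all numeric values are hypothetical.

```python
def flood_mask(dtm, run_up_m, high_tide_m, sea_level_rise_m):
    """Flag DTM cells at or below a combined water level.

    Assumes the three components are simply additive, which is an
    illustrative simplification and not a method stated in the abstract.
    dtm: list of rows of ground elevations in metres.
    """
    water_level = run_up_m + high_tide_m + sea_level_rise_m
    return [[elevation <= water_level for elevation in row] for row in dtm]


# Made-up 2 x 2 elevation grid and made-up component heights (metres).
grid = [[1.0, 5.0], [3.9, 4.1]]
mask = flood_mask(grid, run_up_m=1.5, high_tide_m=2.0, sea_level_rise_m=0.5)
print(mask)
```

A threshold like this ignores hydraulic connectivity to the sea, so low-lying cells shielded by higher ground would also be flagged; a GIS workflow would normally add a connectivity check.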

Keywords: coastal floods, vulnerability mapping, climate change, extreme weather events

Procedia PDF Downloads 397
624 Feature Evaluation Based on Random Subspace and Multiple-K Ensemble

Authors: Jaehong Yu, Seoung Bum Kim

Abstract:

Clustering analysis can facilitate the extraction of intrinsic patterns in a dataset and reveal its natural groupings without requiring class information. For effective clustering analysis in high-dimensional datasets, unsupervised dimensionality reduction is an important task. Unsupervised dimensionality reduction can generally be achieved by feature extraction or feature selection. In many situations, feature selection methods are more appropriate than feature extraction methods because of their clear interpretation with respect to the original features. Unsupervised feature selection can be categorized into feature subset selection and feature ranking methods, and we focus on unsupervised feature ranking methods, which evaluate features based on their importance scores. Recently, several unsupervised feature ranking methods have been developed based on ensemble approaches to achieve higher accuracy and stability. However, most ensemble-based feature ranking methods require the true number of clusters. Furthermore, these algorithms evaluate feature importance depending on the ensemble clustering solution, and they produce undesirable evaluation results if the clustering solutions are inaccurate. To address these limitations, we propose an ensemble-based feature ranking method with random subspace and multiple-k ensemble (FRRM). The proposed FRRM algorithm evaluates the importance of each feature with the random subspace ensemble, and all evaluation results are combined with the ensemble importance scores. Moreover, by using the multiple-k ensemble idea, FRRM does not require the true number of clusters to be determined in advance. Experiments on various benchmark datasets were conducted to examine the properties of the proposed FRRM algorithm and to compare its performance with that of existing feature ranking methods. The experimental results demonstrated that the proposed FRRM outperformed the competitors.

Keywords: clustering analysis, multiple-k ensemble, random subspace-based feature evaluation, unsupervised feature ranking

Procedia PDF Downloads 339
623 Multi-scale Spatial and Unified Temporal Feature-fusion Network for Multivariate Time Series Anomaly Detection

Authors: Hang Yang, Jichao Li, Kewei Yang, Tianyang Lei

Abstract:

Multivariate time series anomaly detection is a significant research topic in the field of data mining, encompassing a wide range of applications across various industrial sectors such as traffic roads, financial logistics, and corporate production. The inherent spatial dependencies and temporal characteristics present in multivariate time series introduce challenges to the anomaly detection task. Previous studies have typically been based on the assumption that all variables belong to the same spatial hierarchy, neglecting the multi-level spatial relationships. To address this challenge, this paper proposes a multi-scale spatial and unified temporal feature fusion network, denoted as MSUT-Net, for multivariate time series anomaly detection. The proposed model employs a multi-level modeling approach, incorporating both temporal and spatial modules. The spatial module is designed to capture the spatial characteristics of multivariate time series data, utilizing an adaptive graph structure learning model to identify the multi-level spatial relationships between data variables and their attributes. The temporal module consists of a unified temporal processing module, which is tasked with capturing the temporal features of multivariate time series. This module is capable of simultaneously identifying temporal dependencies among different variables. Extensive testing on multiple publicly available datasets confirms that MSUT-Net achieves superior performance on the majority of datasets. Our method is able to model and accurately detect systems data with multi-level spatial relationships from a spatial-temporal perspective, providing a novel perspective for anomaly detection analysis.

Keywords: data mining, industrial system, multivariate time series, anomaly detection

Procedia PDF Downloads 14
622 Rd-PLS Regression: From the Analysis of Two Blocks of Variables to Path Modeling

Authors: E. Tchandao Mangamana, V. Cariou, E. Vigneau, R. Glele Kakai, E. M. Qannari

Abstract:

A new definition of a latent variable associated with a dataset makes it possible to propose variants of the PLS2 regression and the multi-block PLS (MB-PLS). We shall refer to these variants as Rd-PLS regression and Rd-MB-PLS, respectively, because they are inspired by both redundancy analysis and PLS regression. Usually, a latent variable t associated with a dataset Z is defined as a linear combination of the variables of Z with the constraint that the length of the loading weights vector equals 1. Formally, t=Zw with ‖w‖=1. Denoting by Z' the transpose of Z, we define herein a latent variable by t=ZZ'q with the constraint that the auxiliary variable q has a norm equal to 1. This new definition of a latent variable entails that, as previously, t is a linear combination of the variables in Z and, in addition, the loading vector w=Z'q is constrained to be a linear combination of the rows of Z. More importantly, t could be interpreted as a kind of projection of the auxiliary variable q onto the space generated by the variables in Z, since it is collinear to the first PLS1 component of q onto Z. Consider the situation in which we aim to predict a dataset Y from another dataset X. These two datasets relate to the same individuals and are assumed to be centered. Let us consider a latent variable u=YY'q to which we associate the variable t=XX'YY'q. Rd-PLS consists in seeking q (and therefore u and t) so that the covariance between t and u is maximum. The solution to this problem is straightforward and consists in setting q to the eigenvector of YY'XX'YY' associated with the largest eigenvalue. For the determination of higher order components, we deflate X and Y with respect to the latent variable t. Extending Rd-PLS to the context of multi-block data is relatively easy. Starting from a latent variable u=YY'q, we consider its 'projection' on the space generated by the variables of each block Xk (k=1, ..., K), namely tk=XkXk'YY'q.
Thereafter, Rd-MB-PLS seeks q in order to maximize the average of the covariances of u with tk (k=1, ..., K). The solution to this problem is given by q, the eigenvector of YY'XX'YY' associated with the largest eigenvalue, where X is the dataset obtained by horizontally merging datasets Xk (k=1, ..., K). For the determination of latent variables of order higher than 1, we use a deflation of Y and Xk with respect to the variable t=XX'YY'q. In the same vein, extending Rd-MB-PLS to the path modeling setting is straightforward. Methods are illustrated on the basis of case studies, and the prediction performance of Rd-PLS and Rd-MB-PLS is compared to that of PLS2 and MB-PLS.
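
The eigenvector solutions stated in this abstract follow in one step from the definitions. The derivation below is a sketch, not taken from the paper: transposes are written with \top, n denotes the number of individuals, and the 1/(n-1) covariance normalization is an assumption that does not change the maximizer.

```latex
\begin{align}
u &= YY^{\top}q, \qquad t = XX^{\top}YY^{\top}q, \qquad \|q\| = 1, \\
\operatorname{cov}(t,u) &= \tfrac{1}{n-1}\, t^{\top}u
   = \tfrac{1}{n-1}\, q^{\top}\, YY^{\top}XX^{\top}YY^{\top}\, q , \\
q^{\star} &= \arg\max_{\|q\|=1} \; q^{\top} M q ,
   \qquad M = YY^{\top}XX^{\top}YY^{\top} .
\end{align}
% M is symmetric, so the Rayleigh quotient q'Mq is maximized by the
% eigenvector of M associated with its largest eigenvalue.

% Multi-block case: with X = [X_1 | ... | X_K] (horizontal merge),
% XX' = \sum_k X_k X_k', hence
\begin{align}
\frac{1}{K}\sum_{k=1}^{K}\operatorname{cov}(t_k,u)
  &= \frac{1}{K(n-1)}\; q^{\top}\, YY^{\top}
     \Big(\sum_{k=1}^{K} X_k X_k^{\top}\Big) YY^{\top}\, q
   = \frac{1}{K(n-1)}\; q^{\top} M q .
\end{align}
```

Both criteria therefore share the same maximizer, which is why the single-block and multi-block solutions are eigenvectors of the same matrix.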

Keywords: multiblock data analysis, partial least squares regression, path modeling, redundancy analysis

Procedia PDF Downloads 147
621 Combinational Therapeutic Targeting of BRD4 and CDK7 Synergistically Induces Anticancer Effects in Hepatocellular Carcinoma

Authors: Xinxiu Li, Chuqian Zheng, Yanyan Qian, Hong Fan

Abstract:

Objectives: In hepatocellular carcinoma (HCC), oncogenes are continuously and robustly transcribed due to aberrant expression of essential components of the trans-acting super-enhancer (SE) complex. Preclinical and clinical trials are now being conducted on small-molecule inhibitors that target core transcriptional components, including bromodomain protein 4 (BRD4) and cyclin-dependent kinase 7 (CDK7), in a number of malignant tumors. This study aims to explore whether co-overexpression of BRD4 and CDK7 is a potential marker of worse prognosis and a combined therapeutic target in HCC. Methods: The expression patterns of BRD4 and CDK7 and their correlation with prognosis in HCC were analyzed using RNA sequencing data and survival data of HCC patients from the TCGA and GEO datasets. The protein levels of BRD4 and CDK7 were determined by immunohistochemistry (IHC), and survival data of patients were analyzed using the Kaplan-Meier method. The mRNA expression levels of genes in HCC cell lines were evaluated by quantitative PCR (q-PCR). CCK-8 and colony formation assays were conducted to assess the proliferation of HCC cells upon treatment with the BRD4 inhibitor JQ1 and/or the CDK7 inhibitor THZ1. Results: Analysis of the TCGA and GEO datasets showed that BRD4 and CDK7 were often overexpressed in HCC and were associated with poor prognosis. BRD4 or CDK7 overexpression was related to a lower survival rate. Notably, co-overexpression of CDK7 and BRD4 was a worse prognostic factor in HCC. Treatment with JQ1 or THZ1 alone had an inhibitory effect on cell proliferation; however, when JQ1 and THZ1 were combined, there was a more notable suppression of cell growth. At the same time, the combined use of JQ1 and THZ1 synergistically suppressed the expression of HCC driver genes.
Conclusion: Our research revealed that combined BRD4 and CDK7 expression can be a useful biomarker for HCC prognosis and that the combination of JQ1 and THZ1 can be a promising therapeutic strategy against HCC.

Keywords: BRD4, CDK7, cell proliferation, combined inhibition

Procedia PDF Downloads 54
620 Machine Learning Approaches Based on Recency, Frequency, Monetary (RFM) and K-Means for Predicting Electrical Failures and Voltage Reliability in Smart Cities

Authors: Panaya Sudta, Wanchalerm Patanacharoenwong, Prachya Bumrungkun

Abstract:

With the evolution of smart grids, ensuring the reliability and efficiency of electrical systems in smart cities has become crucial. This paper proposes a distinct approach that combines advanced machine learning techniques to accurately predict electrical failures and address voltage reliability issues. This approach aims to improve the accuracy and efficiency of reliability evaluations in smart cities. The aim of this research is to develop a comprehensive predictive model that accurately predicts electrical failures and voltage reliability in smart cities. This model integrates RFM analysis, K-means clustering, and LSTM networks to achieve this objective. The research utilizes RFM analysis, traditionally used in customer value assessment, to categorize and analyze electrical components based on their failure recency, frequency, and monetary impact. K-means clustering is employed to segment electrical components into distinct groups with similar characteristics and failure patterns. LSTM networks are used to capture the temporal dependencies and patterns in customer data. This integration of RFM, K-means, and LSTM results in a robust predictive tool for electrical failures and voltage reliability. The proposed model has been tested and validated on diverse electrical utility datasets. The results show a significant improvement in prediction accuracy and reliability compared to traditional methods, achieving an accuracy of 92.78% and an F1-score of 0.83. This research contributes to the proactive maintenance and optimization of electrical infrastructures in smart cities. It also enhances overall energy management and sustainability. The integration of advanced machine learning techniques in the predictive model demonstrates the potential for transforming the landscape of electrical system management within smart cities. The research utilizes diverse electrical utility datasets to develop and validate the predictive model.
RFM analysis, K-means clustering, and LSTM networks are applied to these datasets to analyze and predict electrical failures and voltage reliability. The research addresses the question of how accurately electrical failures and voltage reliability can be predicted in smart cities. It also investigates the effectiveness of integrating RFM analysis, K-means clustering, and LSTM networks in achieving this goal. The proposed approach presents a distinct, efficient, and effective solution for predicting and mitigating electrical failures and voltage issues in smart cities. It significantly improves prediction accuracy and reliability compared to traditional methods. This advancement contributes to the proactive maintenance and optimization of electrical infrastructures, overall energy management, and sustainability in smart cities.
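
The RFM and K-means steps described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the event format `(component_id, day, cost)`, the function names, and the deterministic initialization are all assumptions, and the LSTM stage is omitted.

```python
from collections import defaultdict


def rfm_table(events, now):
    """Build {component_id: (recency, frequency, monetary)} from failure events.

    events: iterable of (component_id, day, cost) tuples (hypothetical format).
    recency   = days since the most recent failure,
    frequency = number of failures,
    monetary  = total cost of failures.
    """
    last_day = {}
    frequency = defaultdict(int)
    monetary = defaultdict(float)
    for component_id, day, cost in events:
        last_day[component_id] = max(day, last_day.get(component_id, day))
        frequency[component_id] += 1
        monetary[component_id] += cost
    return {
        cid: (now - last_day[cid], frequency[cid], monetary[cid])
        for cid in last_day
    }


def kmeans(points, k, iterations=20):
    """Plain Lloyd's algorithm; the first k points seed the centers (assumption)."""
    centers = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest center by squared Euclidean distance.
        labels = [
            min(range(k), key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centers[j])))
            for p in points
        ]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

In practice the three RFM features would usually be rescaled before clustering, since the monetary value would otherwise dominate the distance.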

Keywords: electrical state prediction, smart grids, data-driven method, long short-term memory, RFM, k-means, machine learning

Procedia PDF Downloads 56
619 Arabic Lexicon Learning to Analyze Sentiment in Microblogs

Authors: Mahmoud B. Rokaya

Abstract:

The study of opinion mining and sentiment analysis includes the analysis of opinions, sentiments, evaluations, attitudes, and emotions. The rapid growth of social media, social networks, reviews, forum discussions, microblogs, and Twitter has led to parallel growth in the field of sentiment analysis. The field tries to develop effective tools that make it possible to capture the trends of people. There are two approaches in the field: lexicon-based and corpus-based methods. A lexicon-based method uses a sentiment lexicon, which includes sentiment words and phrases with assigned numeric scores. These scores reveal whether sentiment phrases are positive or negative, their intensity, and/or their emotional orientations. Creating lexicons manually is hard. This brings the need for adaptive automated methods for generating a lexicon. The proposed method generates dynamic lexicons based on the corpus and then classifies text using these lexicons. In the proposed method, different approaches are combined to generate lexicons from text. The proposed method classifies tweets into 5 classes instead of only positive or negative classes. The sentiment classification problem is written as an optimization problem in which finding the optimum sentiment lexicon is the goal. The solution was produced based on mathematical programming approaches to find the best lexicon to classify texts. A genetic algorithm was written to find the optimal lexicon. Then, a meta-level feature was extracted based on the optimal lexicon. The experiments were conducted on several datasets. Results, in terms of accuracy, recall, and F measure, outperformed the state-of-the-art methods proposed in the literature on some of the datasets. A better understanding of the Arabic language, the culture of Arab Twitter users, and the sentiment orientation of words in different contexts can be achieved based on the sentiment lexicons proposed by the algorithm.
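
To make the lexicon-based step concrete, here is a minimal sketch of scoring a text against a sentiment lexicon and mapping the score to five classes. The toy English lexicon, the score range, and the class thresholds are illustrative assumptions; the abstract does not give the learned lexicon or the class boundaries, and the genetic-algorithm search is not shown.

```python
# Hypothetical toy lexicon: word -> score in [-2, 2] (illustration only).
LEXICON = {"excellent": 2.0, "good": 1.0, "bad": -1.0, "terrible": -2.0}

LABELS = ["very negative", "negative", "neutral", "positive", "very positive"]


def score_text(text, lexicon):
    """Average the lexicon scores of the words found in the text (0.0 if none)."""
    hits = [lexicon[token] for token in text.lower().split() if token in lexicon]
    return sum(hits) / len(hits) if hits else 0.0


def classify(text, lexicon=LEXICON):
    """Map the average score to one of five classes (thresholds are assumptions)."""
    score = score_text(text, lexicon)
    if score <= -1.5:
        return LABELS[0]
    if score <= -0.5:
        return LABELS[1]
    if score < 0.5:
        return LABELS[2]
    if score < 1.5:
        return LABELS[3]
    return LABELS[4]
```

An optimizer such as the genetic algorithm mentioned above would then search over the lexicon scores to maximize classification quality on labeled tweets.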

Keywords: social media, Twitter sentiment, sentiment analysis, lexicon, genetic algorithm, evolutionary computation

Procedia PDF Downloads 188
618 Employing Remotely Sensed Soil and Vegetation Indices and Predicting by Long Short-Term Memory to Irrigation Scheduling Analysis

Authors: Elham Koohikerade, Silvio Jose Gumiere

Abstract:

In this research, irrigation is highlighted as crucial for improving both the yield and quality of potatoes due to their high sensitivity to soil moisture changes. The study presents a hybrid Long Short-Term Memory (LSTM) model aimed at optimizing irrigation scheduling in potato fields in Quebec City, Canada. This model integrates model-based and satellite-derived datasets to simulate soil moisture content, addressing the limitations of field data. Developed under the guidance of the Food and Agriculture Organization (FAO), the simulation approach compensates for the lack of direct soil sensor data, enhancing the LSTM model's predictions. The model was calibrated using indices like Surface Soil Moisture (SSM), Normalized Difference Vegetation Index (NDVI), Enhanced Vegetation Index (EVI), and Normalized Multi-band Drought Index (NMDI) to effectively forecast soil moisture reductions. Understanding soil moisture and plant development is crucial for assessing drought conditions and determining irrigation needs. This study validated the spectral characteristics of vegetation and soil using ECMWF Reanalysis v5 (ERA5) and Moderate Resolution Imaging Spectroradiometer (MODIS) data from 2019 to 2023, collected from agricultural areas in Dolbeau and Peribonka, Quebec. Parameters such as surface volumetric soil moisture (0-7 cm), NDVI, EVI, and NMDI were extracted from these images. A regional four-year dataset of soil and vegetation moisture was developed using a machine learning approach combining model-based and satellite-based datasets. The LSTM model predicts soil moisture dynamics hourly across different locations and times, with its accuracy verified through cross-validation and comparison with existing soil moisture datasets. The model effectively captures temporal dynamics, making it valuable for applications requiring soil moisture monitoring over time, such as anomaly detection and memory analysis.
By identifying typical peak soil moisture values and observing distribution shapes, irrigation can be scheduled to maintain soil moisture within Volumetric Soil Moisture (VSM) values of 0.25 to 0.30 m³/m³, avoiding under- and over-watering. The strong correlations between parcels suggest that a uniform irrigation strategy might be effective across multiple parcels, with adjustments based on specific parcel characteristics and historical data trends. The application of the LSTM model to predict soil moisture and vegetation indices yielded mixed results. While the model effectively captures the central tendency and temporal dynamics of soil moisture, it struggles with accurately predicting EVI, NDVI, and NMDI.
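
The scheduling rule implied by the 0.25 to 0.30 VSM band can be sketched as below, together with the standard NDVI formula. The function names and the three action labels are assumptions for illustration; the forecast values would come from the LSTM model, which is not reproduced here.

```python
# Target band for volumetric soil moisture, taken from the abstract.
VSM_LOW, VSM_HIGH = 0.25, 0.30


def ndvi(nir, red):
    """Normalized Difference Vegetation Index from near-infrared and red reflectance."""
    denominator = nir + red
    return (nir - red) / denominator if denominator != 0 else 0.0


def irrigation_action(predicted_vsm):
    """Hypothetical decision rule: keep predicted VSM inside the target band."""
    if predicted_vsm < VSM_LOW:
        return "irrigate"   # soil is drier than the band: under-watering risk
    if predicted_vsm > VSM_HIGH:
        return "hold"       # soil is wetter than the band: over-watering risk
    return "maintain"


def schedule(hourly_forecast):
    """Pair each forecast hour index with its action."""
    return [(hour, irrigation_action(vsm)) for hour, vsm in enumerate(hourly_forecast)]
```

For example, `schedule([0.24, 0.26, 0.31])` flags the first hour for irrigation and the third for holding back.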

Keywords: irrigation scheduling, LSTM neural network, remotely sensed indices, soil and vegetation monitoring

Procedia PDF Downloads 41
617 River Habitat Modeling for the Entire Macroinvertebrate Community

Authors: Pinna Beatrice, Laini Alex, Negro Giovanni, Burgazzi Gemma, Viaroli Pierluigi, Vezza Paolo

Abstract:

Habitat models rarely consider macroinvertebrates as ecological targets in rivers. Available approaches mainly focus on single macroinvertebrate species, not addressing the ecological needs and functionality of the entire community. This research aimed to provide an approach to model the habitat of the macroinvertebrate community. The approach is based on the recently developed Flow-T index, together with a Random Forest (RF) regression, which is employed to apply the Flow-T index at the meso-habitat scale. Using different datasets gathered from both field data collection and 2D hydrodynamic simulations, the model has been calibrated in the Trebbia river (2019 campaign), and then validated in the Trebbia, Taro, and Enza rivers (2020 campaign). The three rivers are characterized by a braiding morphology, gravel riverbeds, and summer low flows. The RF model selected 12 mesohabitat descriptors as important for the macroinvertebrate community. These descriptors belong to different frequency classes of water depth, flow velocity, substrate grain size, and connectivity to the main river channel. The cross-validation R² coefficient (R²𝒸ᵥ) of the training dataset is 0.71 for the Trebbia River (2019), whereas the R² coefficient for the validation datasets (Trebbia, Taro, and Enza Rivers 2020) is 0.63. The agreement between the simulated results and the experimental data shows sufficient accuracy and reliability. The outcomes of the study reveal that the model can identify the ecological response of the macroinvertebrate community to possible flow regime alterations and to possible river morphological modifications. Lastly, the proposed approach allows extending the MesoHABSIM methodology, widely used for the fish habitat assessment, to a different ecological target community. Further applications of the approach can be related to flow design in both perennial and non-perennial rivers, including river reaches in which fish fauna is absent.

Keywords: ecological flows, macroinvertebrate community, mesohabitat, river habitat modeling

Procedia PDF Downloads 94
616 Robustness of the Deep Chroma Extractor and Locally-Normalized Quarter Tone Filters in Automatic Chord Estimation under Reverberant Conditions

Authors: Luis Alvarado, Victor Poblete, Isaac Gonzalez, Yetzabeth Gonzalez

Abstract:

In MIREX 2016 (http://www.music-ir.org/mirex), the deep neural network (DNN)-based Deep Chroma Extractor, proposed by Korzeniowski and Widmer, reached the highest score in an audio chord recognition task. In the present paper, this tool is assessed under reverberant acoustic environments and distinct source-microphone distances. The evaluation dataset comprises The Beatles and Queen datasets. These datasets are sequentially re-recorded with a single microphone in a real reverberant chamber at four reverberation times (approximately 0 (anechoic), 1, 2, and 3 s), as well as four source-microphone distances (32, 64, 128, and 256 cm). It is expected that the performance of the trained DNN will decrease dramatically under these acoustic conditions, with signals degraded by room reverberation and distance to the source. Recently, the effect of the bio-inspired Locally-Normalized Cepstral Coefficients (LNCC) has been assessed in a text-independent speaker verification task using speech signals degraded by additive noise at different signal-to-noise ratios with variations of recording distance, and it has also been assessed under reverberant conditions with variations of recording distance. LNCC showed a performance as high as the state-of-the-art Mel Frequency Cepstral Coefficient filters. Based on these results, this paper proposes a variation of locally-normalized triangular filters called Locally-Normalized Quarter Tone (LNQT) filters. By using the LNQT spectrogram, robustness improvements of the trained Deep Chroma Extractor are expected compared with classical triangular filters, thereby compensating for the music signal degradation and improving the accuracy of the chord recognition system.
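
The abstract does not specify how the LNQT filters are built, so the following is only a sketch of what quarter-tone spacing means for a triangular filterbank: 24 equal steps per octave. The 440 Hz reference, the function names, and the plain (not locally normalized) triangular shape are assumptions.

```python
import math


def quarter_tone_centers(f_min, f_max, f_ref=440.0):
    """Center frequencies spaced a quarter tone apart (24 steps per octave).

    f_ref is an assumed tuning reference; steps are counted relative to it.
    """
    n_low = math.ceil(24 * math.log2(f_min / f_ref))
    n_high = math.floor(24 * math.log2(f_max / f_ref))
    return [f_ref * 2 ** (n / 24) for n in range(n_low, n_high + 1)]


def triangular_weight(f, left, center, right):
    """Weight of frequency f in a triangular filter with the given edges and peak."""
    if f <= left or f >= right:
        return 0.0
    if f <= center:
        return (f - left) / (center - left)
    return (right - f) / (right - center)
```

Two octaves from 220 Hz to 880 Hz yield 49 centers, each of which could serve as the peak of one triangular filter whose edges are the neighboring centers.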

Keywords: chord recognition, deep neural networks, feature extraction, music information retrieval

Procedia PDF Downloads 232
615 Towards Dynamic Estimation of Residential Building Energy Consumption in Germany: Leveraging Machine Learning and Public Data from England and Wales

Authors: Philipp Sommer, Amgad Agoub

Abstract:

The construction sector significantly impacts global CO₂ emissions, particularly through the energy usage of residential buildings. To address this, various governments, including Germany's, are focusing on reducing emissions via sustainable refurbishment initiatives. This study examines the application of machine learning (ML) to estimate energy demands dynamically in residential buildings and enhance the potential for large-scale sustainable refurbishment. A major challenge in Germany is the lack of extensive publicly labeled datasets for energy performance, as energy performance certificates, which provide critical data on building-specific energy requirements and consumption, are not available for all buildings or require on-site inspections. Conversely, England and other countries in the European Union (EU) have rich public datasets, providing a viable alternative for analysis. This research adapts insights from these English datasets to the German context by developing a comprehensive data schema and calibration dataset capable of predicting building energy demand effectively. The study proposes a minimal feature set, determined through feature importance analysis, to optimize the ML model. Findings indicate that ML significantly improves the scalability and accuracy of energy demand forecasts, supporting more effective emissions reduction strategies in the construction industry. Integrating energy performance certificates into municipal heat planning in Germany highlights the transformative impact of data-driven approaches on environmental sustainability. The goal is to identify and utilize key features from open data sources that significantly influence energy demand, creating an efficient forecasting model. Using Extreme Gradient Boosting (XGB) and data from energy performance certificates, effective features such as building type, year of construction, living space, insulation level, and building materials were incorporated. 
These were supplemented by data derived from descriptions of roofs, walls, windows, and floors, integrated into three datasets. The emphasis was on features accessible via remote sensing, which, along with other correlated characteristics, greatly improved the model's accuracy. The model was further validated using SHapley Additive exPlanations (SHAP) values and aggregated feature importance, which quantified the effects of individual features on the predictions. The refined model using remote sensing data showed a coefficient of determination (R²) of 0.64 and a mean absolute error (MAE) of 4.12, indicating predictions based on efficiency class 1-100 (G-A) may deviate by 4.12 points. This R² increased to 0.84 with the inclusion of more samples, with wall type emerging as the most predictive feature. After optimizing and incorporating related features like estimated primary energy consumption, the R² score for the training and test set reached 0.94, demonstrating good generalization. The study concludes that ML models significantly improve prediction accuracy over traditional methods, illustrating the potential of ML in enhancing energy efficiency analysis and planning. This supports better decision-making for energy optimization and highlights the benefits of developing and refining data schemas using open data to bolster sustainability in the building sector. The study underscores the importance of supporting open data initiatives to collect similar features and support the creation of comparable models in Germany, enhancing the outlook for environmental sustainability.
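
The coefficient of determination quoted in this abstract can be computed as sketched below; comparing its value on training and test data is one simple way to check the generalization claim. The function names are illustrative and the sketch is independent of the XGB model itself.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R² is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


def generalization_gap(train_pair, test_pair):
    """Difference between training and test R²; a small gap suggests good generalization."""
    return r_squared(*train_pair) - r_squared(*test_pair)
```

A model that always predicts the mean scores 0, and a perfect model scores 1.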

Keywords: machine learning, remote sensing, residential building, energy performance certificates, data-driven, heat planning

Procedia PDF Downloads 57
614 Simulations to Predict Solar Energy Potential by ERA5 Application at North Africa

Authors: U. Ali Rahoma, Nabil Esawy, Fawzia Ibrahim Moursy, A. H. Hassan, Samy A. Khalil, Ashraf S. Khamees

Abstract:

The design of any solar energy conversion system requires knowledge of solar radiation data obtained over a long period. Satellite data has been widely used to estimate solar energy where no ground observation of solar radiation is available, yet there are limitations on the temporal coverage of satellite data. Reanalysis is a “retrospective analysis” of atmospheric parameters generated by assimilating observation data from various sources, including ground observation, satellites, ships, and aircraft observation, with the output of NWP (Numerical Weather Prediction) models, to develop an exhaustive record of weather and climate parameters. The performance of the reanalysis dataset (ERA-5) for North Africa was evaluated against high-quality surface-measured data using statistical analysis. The global solar radiation (GSR) distribution was estimated over six selected locations in North Africa for the ten years from 2011 to 2020. The root mean square error (RMSE), mean bias error (MBE), and mean absolute error (MAE) of the reanalysis solar radiation data range from 0.079 to 0.222, 0.0145 to 0.198, and 0.055 to 0.178, respectively. A seasonal statistical analysis was performed to study the seasonal variation in dataset performance, which reveals significant variation of errors between seasons. The performance of the dataset also changes with the temporal resolution of the data used for comparison. The monthly mean values of data show better performance, but the accuracy of data is compromised. The solar radiation data of ERA-5 is used for preliminary solar resource assessment and power estimation. The correlation coefficient (R²) varies from 0.93 to 0.99 for the different selected sites in North Africa in the present research.
The goal of this research is to give a good representation of global solar radiation to support solar energy applications in all fields; this can be done by using gridded data from the European Centre for Medium-Range Weather Forecasts (ECMWF) and producing a new model that gives good results.
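
The three error statistics named in this abstract have standard definitions, sketched here for paired reanalysis estimates and ground observations. The function and argument names are illustrative; the sign convention for MBE (estimate minus observation) is an assumption.

```python
import math


def _errors(estimated, observed):
    """Pairwise differences, estimate minus observation."""
    if len(estimated) != len(observed) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    return [e - o for e, o in zip(estimated, observed)]


def rmse(estimated, observed):
    """Root mean square error: penalizes large deviations more heavily."""
    errors = _errors(estimated, observed)
    return math.sqrt(sum(err ** 2 for err in errors) / len(errors))


def mbe(estimated, observed):
    """Mean bias error: positive means the estimates are too high on average."""
    errors = _errors(estimated, observed)
    return sum(errors) / len(errors)


def mae(estimated, observed):
    """Mean absolute error: average magnitude of the deviations."""
    errors = _errors(estimated, observed)
    return sum(abs(err) for err in errors) / len(errors)
```

Because positive and negative errors cancel in MBE but not in MAE or RMSE, a small MBE alongside a large RMSE indicates scatter without systematic bias.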

Keywords: solar energy, solar radiation, ERA-5, potential energy

Procedia PDF Downloads 211
613 Rapid Building Detection in Population-Dense Regions with Overfitted Machine Learning Models

Authors: V. Mantey, N. Findlay, I. Maddox

Abstract:

The quality and quantity of global satellite data have been increasing exponentially in recent years as spaceborne systems become more affordable and the sensors themselves become more sophisticated. This is a valuable resource for many applications, including disaster management and relief. However, while more information can be valuable, the volume of data available is impossible to manually examine. Therefore, the question becomes how to extract as much information as possible from the data with limited manpower. Buildings are a key feature of interest in satellite imagery with applications including telecommunications, population models, and disaster relief. Machine learning tools are fast becoming one of the key resources to solve this problem, and models have been developed to detect buildings in optical satellite imagery. However, by and large, most models focus on affluent regions where buildings are generally larger and constructed further apart. This work is focused on the more difficult problem of detection in densely populated regions. The primary challenge with detecting small buildings in densely populated regions is both the spatial and spectral resolution of the optical sensor. Densely packed buildings with similar construction materials will be difficult to separate due to a similarity in color and because the physical separation between structures is either non-existent or smaller than the spatial resolution. This study finds that models trained until they overfit the input sample can perform better in these areas than a more robust, generalized model. An overfitted model takes less time to fine-tune from a generalized pre-trained model and requires fewer input data. The model developed for this study has also been fine-tuned using existing, open-source, building vector datasets. This is particularly valuable in the context of disaster relief, where information is required in a very short time span.
Leveraging existing datasets means that little to no manpower or time is required to collect data in the region of interest. The training period itself is also shorter for smaller datasets. Requiring less data means that only a few quality areas are necessary, and so any weaknesses or underpopulated regions in the data can be skipped over in favor of areas with higher quality vectors. In this study, a landcover classification model was developed in conjunction with the building detection tool to provide a secondary source to quality check the detected buildings. This has greatly reduced the false positive rate. The proposed methodologies have been implemented and integrated into a configurable production environment and have been employed for a number of large-scale commercial projects, including continent-wide DEM production, where the extracted building footprints are being used to enhance digital elevation models. Overfitted machine learning models are often considered too specific to have any predictive capacity. However, this study demonstrates that, in cases where input data is scarce, overfitted models can be judiciously applied to solve time-sensitive problems.
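
The abstract does not say how the landcover classification is used to check detections. One plausible reading, sketched here purely as an assumption, is to keep a detected building only if the landcover class at its centroid pixel is compatible with buildings. The class label, grid format, and function name are hypothetical.

```python
# Hypothetical landcover label for areas where buildings are plausible.
BUILT_UP = "built_up"


def filter_detections(detections, landcover, allowed=frozenset({BUILT_UP})):
    """Drop detections whose centroid falls on an incompatible landcover class.

    detections: list of (row, col) centroid pixels of detected buildings.
    landcover:  2D list of class labels on the same pixel grid.
    """
    kept = []
    for row, col in detections:
        if landcover[row][col] in allowed:
            kept.append((row, col))
    return kept
```

Detections over water or forest pixels would be discarded this way, which is one route to a lower false positive rate.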

Keywords: building detection, disaster relief, mask-RCNN, satellite mapping

Procedia PDF Downloads 169
612 Measures of Phylogenetic Support for Phylogenomic and the Whole Genomes of Two Lungfish Restate Lungfish and Origin of Land Vertebrates

Authors: Yunfeng Shan, Xiaoliang Wang, Youjun Zhou

Abstract:

Whole-genome data from two lungfish species, along with other species, present a valuable opportunity to reassess the longstanding debate regarding the evolutionary relationships among tetrapods, lungfishes, and coelacanths. However, the use of bootstrap support has become outdated for large-scale phylogenomic data. Without robust phylogenetic support, the phylogenetic trees become meaningless. Therefore, it is necessary to re-evaluate the phylogenies of tetrapods, lungfishes, and coelacanths using novel measures of phylogenetic support specifically designed for phylogenomic data, as the previous phylogenies were based on 100% bootstrap support. Our findings consistently provide strong evidence favoring lungfish as the closest living relative of tetrapods. This conclusion is based on high gene support confidence with confidence intervals exceeding 95%, high internode certainty, and high gene concordance factor. The evidence stems from two datasets containing recently deciphered whole genomes of two lungfish species, as well as five previous datasets derived from lungfish transcriptomes. These results yield fresh insights into the three hypotheses regarding the phylogenies of tetrapods, lungfishes, and coelacanths. Importantly, these hypotheses are not mere conjectures but are substantiated by a significant number of genes. Analyzing real biological data further demonstrates that the inclusion of additional taxa diminishes the number of orthologues and leads to more diverse tree topologies. Consequently, gene trees and species trees may not be identical even when whole-genome sequencing data is utilized. However, it is worth noting that many gene trees can accurately reflect the species tree if an appropriate number of taxa, typically ranging from six to ten, are sampled. 
Therefore, it is crucial to carefully select the number of taxa and an appropriate outgroup while excluding fast-evolving taxa as outgroups to mitigate the adverse effects of long-branch attraction (LBA) and achieve an accurate reconstruction of the species tree. This is particularly important as more whole-genome sequencing data becomes available.

Keywords: gene support confidence (GSC), origin of land vertebrates, coelacanth, two whole genomes of lungfishes, confidence intervals

Procedia PDF Downloads 87
611 A Dataset of Program Educational Objectives Mapped to ABET Outcomes: Data Cleansing, Exploratory Data Analysis and Modeling

Authors: Addin Osman, Anwar Ali Yahya, Mohammed Basit Kamal

Abstract:

Datasets or collections are becoming important assets in themselves and can now be accepted as a primary intellectual output of research. The quality and usage of datasets depend mainly on the context in which they have been collected, processed, analyzed, validated, and interpreted. This paper presents a collection of program educational objectives mapped to student outcomes, collected from self-study reports prepared by 32 engineering programs accredited by ABET. The manual mapping (classification) of this data is a notoriously tedious, time-consuming process. In addition, it requires experts in the area, who are mostly not available. The operational settings under which the collection has been produced are described. The collection has been cleansed and preprocessed, some features have been selected, and a preliminary exploratory data analysis has been performed to illustrate the properties and usefulness of the collection. Finally, the collection has been benchmarked using nine of the most widely used supervised multi-label classification techniques (Binary Relevance, Label Powerset, Classifier Chains, Pruned Sets, Random k-label sets, Ensemble of Classifier Chains, Ensemble of Pruned Sets, Multi-Label k-Nearest Neighbors and Back-Propagation Multi-Label Learning). The techniques have been compared with each other using five well-known measurements (Accuracy, Hamming Loss, Micro-F, Macro-F, and Macro-F). The Ensemble of Classifier Chains and Ensemble of Pruned Sets have achieved encouraging performance compared to the other multi-label classification methods tested. The Classifier Chains method has shown the worst performance. To recap, the benchmark has achieved promising results by utilizing the preliminary exploratory data analysis performed on the collection, proposing new trends for research and providing a baseline for future studies.
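
Two of the measurements listed in this abstract, Hamming Loss and Micro-F, can be sketched for multi-label predictions as follows. Labels are assumed to be encoded as 0/1 indicator rows, one row per objective and one column per student outcome; that encoding and the function names are assumptions.

```python
def hamming_loss(y_true, y_pred):
    """Fraction of individual label slots that are predicted incorrectly."""
    wrong = 0
    total = 0
    for true_row, pred_row in zip(y_true, y_pred):
        if len(true_row) != len(pred_row):
            raise ValueError("rows must have the same number of labels")
        wrong += sum(1 for t, p in zip(true_row, pred_row) if t != p)
        total += len(true_row)
    return wrong / total


def micro_f1(y_true, y_pred):
    """F1 computed from true/false positives and negatives pooled over all labels."""
    tp = fp = fn = 0
    for true_row, pred_row in zip(y_true, y_pred):
        for t, p in zip(true_row, pred_row):
            if t == 1 and p == 1:
                tp += 1
            elif t == 0 and p == 1:
                fp += 1
            elif t == 1 and p == 0:
                fn += 1
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0
```

Lower Hamming Loss is better, while higher Micro-F is better, so the two are read in opposite directions when ranking classifiers.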

Keywords: ABET, accreditation, benchmark collection, machine learning, program educational objectives, student outcomes, supervised multi-class classification, text mining

Procedia PDF Downloads 172
610 Between Reality and Fiction: Self-Representation as an Avatar and Its Effects on Self-Presence

Authors: Leonie Laskowitz

Abstract:

A self-confident appearance is a basic prerequisite for success in the world of work 4.0. Within a few seconds, people convey a first impression that usually lasts. Artificial intelligence is making it increasingly important how our virtual selves appear and communicate (nonverbally) in digital worlds such as the metaverse. In addition to the modified creation of an avatar, the field of photogrammetry is developing fast, creating exact likenesses of ourselves in virtual environments. Given the importance of self-representation in virtual space for future collaborations, it is important to investigate the impact of phenotype in virtual worlds and how an avatar type can profitably be used situationally. We analyzed the effect of self-similar versus desirable self-presentation as an avatar on one's self-awareness, considering various theoretical constructs in the area of self-awareness and stress stimuli. The avatars were arbitrarily created on the one hand and scanned on the other hand with the help of a lidar sensor, the state-of-the-art photogrammetry method. All subjects were exposed to the established Trier Social Stress Test. The results showed that especially insecure people prefer to create rather than be scanned when confronted with a stressful work situation. (1) If they are in a casual work environment and a relaxed situation, they prefer a 3D photorealistic avatar that reflects them in detail. (2) Confident people will give their avatar their true appearance in any situation, while insecure people would only do so for honesty and authenticity. (3) Thus, the choice of avatar type has considerable impact on self-confidence in different situations.

Keywords: avatar, virtual identity, self-presentation, metaverse, virtual reality, self-awareness

Procedia PDF Downloads 148
609 Fault Diagnosis of Manufacturing Systems Using AntTreeStoch with Parameter Optimization by ACO

Authors: Ouahab Kadri, Leila Hayet Mouss

Abstract:

In this paper, we present three diagnostic modules for complex and dynamic systems. These modules are based on three ant colony algorithms: AntTreeStoch, Lumer & Faieta, and Binary ant colony. We chose these algorithms for their simplicity and their wide range of applications. However, we cannot use these algorithms in their basic forms, as they have several limitations. To use them in a diagnostic system, we propose three variants. We tested these algorithms on datasets from two industrial systems: a clinkering system and a pasteurization system.

Keywords: ant colony algorithms, complex and dynamic systems, diagnosis, classification, optimization

Procedia PDF Downloads 298
608 Geospatial Multi-Criteria Evaluation to Predict Landslide Hazard Potential in the Catchment of Lake Naivasha, Kenya

Authors: Abdel Rahman Khider Hassan

Abstract:

This paper describes a multi-criteria geospatial model for predicting landslide hazard zonation (LHZ) in the Lake Naivasha catchment (Kenya), based on spatial analysis of integrated datasets of location-intrinsic parameters (slope stability factors) and external landslide-triggering factors (natural and man-made factors). The intrinsic dataset included lithology, slope geometry (slope inclination, aspect, elevation, and curvature) and land use/land cover. The landslide-triggering factors included rainfall as the climatic factor, in addition to the destructive effects reflected by the proximity of roads and the drainage network to areas that are susceptible to landslides. No published study on landslides was found for this area. Thus, digital datasets of the above spatial parameters were conveniently acquired, stored, manipulated and analyzed in a Geographical Information System (GIS) using a multi-criteria grid overlay technique (in the ArcGIS 10.2.2 environment). The landslide hazard zonation was derived by applying weights based on the relative contribution of each parameter to slope instability; the weighted parameter grids were then overlaid to generate a map of the potential landslide hazard zonation (LHZ) for the lake catchment. Of the total catchment area of 3200 km², most of the region (78.7%; 2518.4 km²) is susceptible to moderate landslide hazards, whilst about 13% (416 km²) is subject to high hazards. Only 1.0% (32 km²) of the catchment displays very high landslide hazards, and the remaining area (7.3%; 233.6 km²) displays a low probability of landslide hazards. This result confirms the importance of steep slope angles, lithology, vegetation land cover and slope orientation (aspect) as the major determining factors of slope failures.
The information provided by the landslide hazard zonation (LHZ) map could provide a basis for decision making and for mitigation measures to avoid potential losses caused by landslides in the Lake Naivasha catchment in the Kenya Highlands.
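The weighted grid overlay described in this abstract can be illustrated with a minimal NumPy sketch. It is not the authors' ArcGIS workflow: the factor names, the weights, the 1 to 4 rating scale, the class breaks and the cell area are all hypothetical values chosen for illustration, since the abstract does not publish them.

```python
import numpy as np

# Hypothetical factor weights (assumption: the abstract does not list its weights).
WEIGHTS = {
    "slope": 0.35,
    "lithology": 0.25,
    "land_cover": 0.15,
    "rainfall": 0.15,
    "road_drainage_proximity": 0.10,
}


def weighted_overlay(layers, weights):
    """Combine reclassified factor grids into one hazard score grid.

    Every grid must have the same shape and use the same rating scale
    (here an assumed scale of 1 = least hazardous to 4 = most hazardous).
    """
    if set(layers) != set(weights):
        raise ValueError("layers and weights must use the same factor names")
    shapes = {grid.shape for grid in layers.values()}
    if len(shapes) != 1:
        raise ValueError("all factor grids must have the same shape")
    total = sum(weights.values())  # normalise so the weights sum to 1
    return sum((weights[name] / total) * layers[name].astype(float)
               for name in layers)


def classify(score, breaks=(1.5, 2.5, 3.5)):
    """Map scores to classes 0=low, 1=moderate, 2=high, 3=very high.

    The break values are illustrative assumptions.
    """
    return np.digitize(score, breaks)


def class_areas_km2(classes, cell_area_km2, n_classes=4):
    """Return the area covered by each hazard class, in km²."""
    counts = np.bincount(classes.ravel(), minlength=n_classes)
    return counts * cell_area_km2


if __name__ == "__main__":
    # Toy 2x2 rating grid reused for every factor, purely for demonstration.
    ratings = np.array([[1, 2], [3, 4]])
    layers = {name: ratings for name in WEIGHTS}
    classes = classify(weighted_overlay(layers, WEIGHTS))
    print(classes)
    print(class_areas_km2(classes, cell_area_km2=0.5))
```

In a real study each factor grid would first be reclassified from its raw values (for example slope in degrees) to the common rating scale before the weights are applied.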

Keywords: decision making, geospatial, landslide, multi-criteria, Naivasha

Procedia PDF Downloads 206
607 Developing Primary Care Datasets for a National Asthma Audit

Authors: Rachael Andrews, Viktoria McMillan, Shuaib Nasser, Christopher M. Roberts

Abstract:

Background: The National Review of Asthma Deaths (NRAD) found that asthma management and care were inadequate in 26% of cases reviewed. Major shortfalls identified were in adherence to national guidelines and standards and, particularly, in the organisation of care, including supervision and monitoring in primary care, with 70% of cases reviewed having at least one avoidable factor in this area. 5.4 million people in the UK are diagnosed with and actively treated for asthma, and approximately 60,000 are admitted to hospital with acute exacerbations each year. The majority of people with asthma receive management and treatment solely in primary care. This has therefore created concern that many people within the UK are receiving sub-optimal asthma care, resulting in unnecessary morbidity and risk of adverse outcome. NRAD concluded that a national asthma audit programme should be established to measure and improve processes, organisation, and outcomes of asthma care. Objective: To develop a primary care dataset enabling extraction of information from GP practices in Wales and providing robust data from which results and lessons could be drawn to drive service development and improvement. Methods: A multidisciplinary group of experts, including general practitioners, primary care organisation representatives, and asthma patients, was formed and used as a source of governance and guidance. A review of asthma literature, guidance, and standards took place and was used to identify areas of asthma care which, if improved, would lead to better patient outcomes. Modified Delphi methodology was used to gain consensus from the expert group on which of the areas identified were to be prioritised, and an asthma patient and carer focus group was held to seek views and feedback on areas of asthma care that were important to them.
Areas of asthma care identified by both groups were mapped to asthma guidelines and standards to inform and develop primary and secondary care datasets covering both adult and paediatric care. Dataset development consisted of expert review and a targeted consultation process in order to seek broad stakeholder views and feedback. Results: Areas of asthma care identified as requiring prioritisation by the National Asthma Audit were: (i) Prescribing, (ii) Asthma diagnosis, (iii) Asthma Reviews, (iv) Personalised Asthma Action Plans (PAAPs), (v) Primary care follow-up after discharge from hospital (vi) Methodologies and primary care queries were developed to cover each of the areas of poor and variable asthma care identified, and the queries were designed to extract information directly from patients' electronic records. Conclusion: This paper describes the methodological approach followed to develop primary care datasets for a National Asthma Audit. It sets out the principles behind the establishment of a National Asthma Audit programme in response to a national asthma mortality review and describes the development activities undertaken. Key process elements included: (i) mapping identified areas of poor and variable asthma care to national guidelines and standards, (ii) early engagement of experts, including clinicians and patients, in the process, and (iii) targeted consultation on the queries to provide further insight into measures that were collectable, reproducible and relevant.

Keywords: asthma, primary care, general practice, dataset development

Procedia PDF Downloads 175