Search results for: imbalanced datasets
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 803

173 Ending Wars Over Water: Evaluating the Extent to Which Artificial Intelligence Can Be Used to Predict and Prevent Transboundary Water Conflicts

Authors: Akhila Potluru

Abstract:

Worldwide, more than 250 bodies of water are transboundary, meaning they cross the political boundaries of multiple countries. This creates a system of hydrological, economic, and social interdependence between communities reliant on these water sources. Transboundary water conflicts can occur as a result of this intense interdependence. Many factors contribute to the sparking of transboundary water conflicts, ranging from natural hydrological factors to hydro-political interactions. Previous attempts to predict transboundary water conflicts by analysing changes or trends in the contributing factors have typically failed because patterns in the data are hard to identify. However, artificial intelligence and machine learning have the potential to fill this gap and identify future ‘hotspots’ up to a year in advance by recognising patterns in data that humans cannot. This research determines the extent to which AI can be used to predict and prevent transboundary water conflicts, via a critical literature review of previous case studies and datasets where AI was deployed to predict water conflict. The research not only delivered a more nuanced understanding of previously undervalued factors that contribute to transboundary water conflicts (in particular, culture and disinformation) but also showed that, by detecting conflict early, governance bodies can engage in processes to de-escalate conflict by providing pre-emptive solutions. Looking forward, this carries significant implications for policy and water-sharing agreements, which may prevent water conflicts from developing into wide-scale disasters. Additionally, AI can be used to gain a fuller picture of water-based conflicts in areas where security concerns make it impossible to have staff on the ground. AI therefore enhances not only the depth but also the breadth of our knowledge about transboundary water conflicts. With demand for water constantly growing, competition between countries over shared water will increasingly lead to water conflict. There has never been a more important time to be able to accurately predict global water conflicts and take precautions to prevent them.

Keywords: artificial intelligence, machine learning, transboundary water conflict, water management

Procedia PDF Downloads 107
172 Optimizing Perennial Plants Image Classification by Fine-Tuning Deep Neural Networks

Authors: Khairani Binti Supyan, Fatimah Khalid, Mas Rina Mustaffa, Azreen Bin Azman, Amirul Azuani Romle

Abstract:

Perennial plant classification plays a significant role in various agricultural and environmental applications, assisting in plant identification, disease detection, and biodiversity monitoring. Nevertheless, attaining high accuracy in perennial plant image classification remains challenging due to the complex variations in plant appearance, the diverse range of environmental conditions under which images are captured, and the inherent variability in image quality stemming from factors such as lighting conditions, camera settings, and focus. This paper proposes an adaptation approach to optimize perennial plant image classification by fine-tuning pre-trained DNN models. It explores the efficacy of fine-tuning prevalent architectures, namely VGG16, ResNet50, and InceptionV3, leveraging transfer learning to tailor the models to the specific characteristics of perennial plant datasets. A subset of the MYLPHerbs dataset, consisting of 6 perennial plant species with 13,481 images captured under various environmental conditions, was used in the experiments. Different strategies for fine-tuning, including adjusting learning rates, training set sizes, data augmentation, and architectural modifications, were investigated. The experimental outcomes underscore the effectiveness of fine-tuning deep neural networks for perennial plant image classification, with ResNet50 showcasing the highest accuracy of 99.78%. Despite ResNet50's superior performance, both VGG16 and InceptionV3 achieved commendable accuracies of 99.67% and 99.37%, respectively. The overall outcomes reaffirm the robustness of the fine-tuning approach across different deep neural network architectures, offering insights into strategies for optimizing model performance in the domain of perennial plant image classification.
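For illustration, a minimal transfer-learning sketch in the spirit of the fine-tuning approach described above, assuming PyTorch/torchvision (not the authors' code; the paper's data pipeline, augmentation, and exact hyperparameters are omitted):

```python
# Hedged sketch: fine-tune a pre-trained ResNet50 for a 6-class task,
# matching the six perennial plant species mentioned in the abstract.
import torch
import torch.nn as nn
from torchvision import models

NUM_CLASSES = 6

model = models.resnet50(weights=models.ResNet50_Weights.IMAGENET1K_V2)
for p in model.parameters():        # freeze the pre-trained backbone
    p.requires_grad = False
model.fc = nn.Linear(model.fc.in_features, NUM_CLASSES)  # replace the head

optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

def train_step(images: torch.Tensor, labels: torch.Tensor) -> float:
    """One supervised step on a batch; only the new head is updated."""
    optimizer.zero_grad()
    loss = criterion(model(images), labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```

Unfreezing deeper layers at a reduced learning rate is the usual next step once the new head has converged.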

Keywords: perennial plants, image classification, deep neural networks, fine-tuning, transfer learning, VGG16, ResNet50, InceptionV3

Procedia PDF Downloads 67
171 Source Identification Model Based on Label Propagation and Graph Ordinary Differential Equations

Authors: Fuyuan Ma, Yuhan Wang, Junhe Zhang, Ying Wang

Abstract:

Identifying the sources of information dissemination is a pivotal task in the study of collective behaviors in networks, enabling us to discern and intercept the critical pathways through which information propagates from its origins. This allows for the control of the information’s dissemination impact in its early stages. Numerous methods for source detection rely on pre-existing, underlying propagation models as prior knowledge. Current models that eschew prior knowledge attempt to harness label propagation algorithms to model the statistical characteristics of propagation states or employ Graph Neural Networks (GNNs) for deep reverse modeling of the diffusion process. These approaches are either deficient in modeling the propagation patterns of information or are constrained by the over-smoothing problem inherent in GNNs, which prevents stacking sufficient model depth to uncover global propagation patterns. Consequently, we introduce the ODESI model. Initially, the model employs a label propagation algorithm to delineate the distribution density of infected states within a graph structure and extends the representation of infected states from integers to state vectors, which serve as the initial states of nodes. Subsequently, the model constructs a deep architecture based on GNN-coupled Ordinary Differential Equations (ODEs) to model the global propagation patterns of continuous propagation processes. Addressing the challenges associated with solving ODEs on graphs, we approximate the analytical solutions to reduce computational costs. Finally, we conduct simulation experiments on two real-world social network datasets, and the results affirm the efficacy of our proposed ODESI model in source identification tasks.
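As a purely illustrative sketch of the coupled idea (not the ODESI implementation), propagation over a graph can be written as a heat-equation-style ODE on node state vectors, dX/dt = -LX, and integrated with explicit Euler steps:

```python
# Illustrative only: diffuse initial infection-state vectors over a graph
# by integrating dX/dt = -L X, where L is the graph Laplacian.
import numpy as np

def propagate(adj: np.ndarray, x0: np.ndarray, dt: float = 0.1,
              steps: int = 50) -> np.ndarray:
    """adj: (n, n) adjacency matrix; x0: (n, d) initial node state vectors."""
    lap = np.diag(adj.sum(axis=1)) - adj   # combinatorial Laplacian D - A
    x = x0.astype(float).copy()
    for _ in range(steps):
        x = x - dt * lap @ x               # explicit Euler step
    return x
```

The ODESI model replaces this fixed linear operator with learned GNN layers and approximates the analytical solution to control cost, as the abstract notes.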

Keywords: source identification, ordinary differential equations, label propagation, complex networks

Procedia PDF Downloads 22
170 Hybrid CNN-SAR and Lee Filtering for Enhanced InSAR Phase Unwrapping and Coherence Optimization

Authors: Hadj Sahraoui Omar, Kebir Lahcen Wahib, Bennia Ahmed

Abstract:

Interferometric Synthetic Aperture Radar (InSAR) coherence is a crucial parameter for accurately monitoring ground deformation and environmental changes. However, coherence can be degraded by various factors such as temporal decorrelation, atmospheric disturbances, and geometric misalignments, limiting the reliability of InSAR measurements (Omar Hadj‐Sahraoui et al., 2019). To address this challenge, we propose an innovative hybrid approach that combines artificial intelligence (AI) with advanced filtering techniques to optimize interferometric coherence in InSAR data. Specifically, we introduce a Convolutional Neural Network (CNN) integrated with the Lee filter to enhance the performance of radar interferometry. This hybrid method leverages the strength of CNNs to automatically identify and mitigate the primary sources of decorrelation, while the Lee filter effectively reduces speckle noise, improving the overall quality of interferograms. We develop a deep learning-based model trained on multi-temporal and multi-frequency SAR datasets, enabling it to predict coherence patterns and enhance low-coherence regions. This hybrid CNN-SAR with Lee filtering significantly reduces noise and phase unwrapping errors, leading to more precise deformation maps. Experimental results demonstrate that our approach improves coherence by up to 30% compared to traditional filtering techniques, making it a robust solution for challenging scenarios such as urban environments, vegetated areas, and rapidly changing landscapes. Our method has potential applications in geohazard monitoring, urban planning, and environmental studies, offering a new avenue for enhancing InSAR data reliability through AI-powered optimization combined with robust filtering techniques.
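The Lee filter component has a standard textbook form; a minimal single-band sketch follows (illustrative; the paper's CNN component and its coupling are omitted, and the noise-variance handling here is an assumption):

```python
# Local-statistics Lee filter: blend each pixel with its local mean using a
# gain derived from local variance relative to an assumed noise variance.
import numpy as np
from scipy.ndimage import uniform_filter

def lee_filter(img: np.ndarray, size: int = 7,
               noise_var: float = 0.25) -> np.ndarray:
    mean = uniform_filter(img, size)
    mean_sq = uniform_filter(img * img, size)
    var = np.maximum(mean_sq - mean * mean, 0.0)   # local variance
    gain = var / (var + noise_var)                 # near 0 in flat areas
    return mean + gain * (img - mean)
```

In flat (pure-speckle) regions the gain approaches zero and the filter averages aggressively; near edges the local variance is high and detail is preserved, which is why such filtering pairs well with a learned despeckling stage.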

Keywords: CNN-SAR, Lee Filter, hybrid optimization, coherence, InSAR phase unwrapping, speckle noise reduction

Procedia PDF Downloads 14
169 Identification of Blood Biomarkers Unveiling Early Alzheimer's Disease Diagnosis Through Single-Cell RNA Sequencing Data and Autoencoders

Authors: Hediyeh Talebi, Shokoofeh Ghiam, Changiz Eslahchi

Abstract:

Traditionally, Alzheimer’s disease research has focused on genes with significant fold changes, potentially neglecting subtle but biologically important alterations. Our study introduces an integrative approach that highlights genes crucial to underlying biological processes, regardless of their fold change magnitude. Alzheimer's single-cell RNA-seq data from peripheral blood mononuclear cells (PBMCs) were extracted from the Gene Expression Omnibus (GEO). After quality control, normalization, scaling, batch effect correction, and clustering, differentially expressed genes (DEGs) were identified with adjusted p-values less than 0.05. These DEGs were categorized based on cell type, resulting in four datasets, each corresponding to a distinct cell type. To distinguish between cells from healthy individuals and those with Alzheimer's, an adversarial autoencoder with a classifier was employed. This allowed for the separation of healthy and diseased samples. To identify the most influential genes in this classification, the weight matrices of the network's encoder and classifier components were multiplied, and the analysis focused on the top 20 genes. The analysis revealed that while some of these genes exhibit a high fold change, others do not. These genes, which may be overlooked by previous methods due to their low fold change, were shown to be significant in our study. The findings highlight the critical role of genes with subtle alterations in diagnosing Alzheimer's disease, a facet frequently overlooked by conventional methods. These genes demonstrate remarkable discriminatory power, underscoring the need to integrate biological relevance with statistical measures in gene prioritization. This integrative approach enhances our understanding of the molecular mechanisms in Alzheimer’s disease and provides a promising direction for identifying potential therapeutic targets.
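A sketch of the gene-ranking step described above (variable names are illustrative, not from the authors' code): the encoder and classifier weight matrices are multiplied so each input gene receives an aggregate influence score on the healthy-versus-diseased decision:

```python
# Hedged sketch: score genes by |W_encoder @ W_classifier| and take the top 20.
import numpy as np

def top_genes(w_encoder: np.ndarray, w_classifier: np.ndarray,
              gene_names: list, k: int = 20):
    """w_encoder: (n_genes, latent_dim); w_classifier: (latent_dim, n_classes)."""
    influence = np.abs(w_encoder @ w_classifier).sum(axis=1)  # per-gene score
    order = np.argsort(influence)[::-1][:k]
    return [(gene_names[i], float(influence[i])) for i in order]
```

Note that this scores genes by their weight pathway through the network rather than by fold change, which is exactly why low-fold-change genes can surface.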

Keywords: alzheimer's disease, single-cell RNA-seq, neural networks, blood biomarkers

Procedia PDF Downloads 67
168 Predicting Radioactive Waste Glass Viscosity, Density and Dissolution with Machine Learning

Authors: Joseph Lillington, Tom Gout, Mike Harrison, Ian Farnan

Abstract:

The vitrification of high-level nuclear waste within borosilicate glass and its incorporation within a multi-barrier repository deep underground is widely accepted as the preferred disposal method. However, for this to happen, any safety case will require validation that the initially localized radionuclides will not be released into the near/far-field in considerable quantities. Therefore, accurate mechanistic models are necessary to predict glass dissolution, and these should be robust to a variety of incorporated waste species and leaching test conditions, particularly given substantial variations across international waste-streams. Here, machine learning is used to predict glass material properties (viscosity, density) and glass leaching model parameters from large-scale industrial data. A variety of different machine learning algorithms have been compared to assess performance. Density was predicted solely from composition, whereas viscosity additionally considered temperature. To predict suitable glass leaching model parameters, a large simulated dataset was created by coupling MATLAB and the chemical reactive-transport code HYTEC, considering the state-of-the-art GRAAL model (glass reactivity in allowance of the alteration layer). The trained models were then subsequently applied to the large-scale industrial, experimental data to identify potentially appropriate model parameters. Results indicate that ensemble methods can accurately predict viscosity as a function of temperature and composition across all three industrial datasets. Glass density prediction shows reliable learning performance, with predictions primarily being within the experimental uncertainty of the test data. Furthermore, machine learning can predict the behavior of glass dissolution model parameters, demonstrating potential value in GRAAL model development and in assessing suitable model parameters for large-scale industrial glass dissolution data.
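A minimal sketch of the property-prediction step, assuming a tabular dataset; the file and column names below are hypothetical placeholders, not the paper's industrial data:

```python
# Hedged sketch: an ensemble regressor for glass viscosity from composition
# and temperature, mirroring the kind of model compared in the study.
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

df = pd.read_csv("glass_data.csv")                    # hypothetical file
X = df[["SiO2", "B2O3", "Na2O", "temperature_K"]]     # hypothetical columns
y = df["viscosity_Pa_s"]

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = RandomForestRegressor(n_estimators=500, random_state=0).fit(X_tr, y_tr)
print("held-out R^2:", model.score(X_te, y_te))
```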

Keywords: machine learning, predictive modelling, pattern recognition, radioactive waste glass

Procedia PDF Downloads 117
167 The Systems Biology Verification Endeavor: Harness the Power of the Crowd to Address Computational and Biological Challenges

Authors: Stephanie Boue, Nicolas Sierro, Julia Hoeng, Manuel C. Peitsch

Abstract:

Systems biology relies on large numbers of data points and sophisticated methods to extract biologically meaningful signal and mechanistic understanding. For example, analyses of transcriptomics and proteomics data enable researchers to gain insights into the molecular differences in tissues exposed to diverse stimuli or test items. Whereas the interpretation of endpoints specifically measuring a mechanism is relatively straightforward, the interpretation of big data is more complex and would benefit from comparing results obtained with diverse analysis methods. The sbv IMPROVER project was created to implement solutions to verify systems biology data, methods, and conclusions. Computational challenges leveraging the wisdom of the crowd allow benchmarking methods for specific tasks, such as signature extraction and/or sample classification. Four challenges have already been successfully conducted and confirmed that the aggregation of predictions often leads to better results than individual predictions and that methods perform best in specific contexts. Whenever the scientific question of interest does not have a gold standard but may greatly benefit from the scientific community coming together to discuss approaches and results, datathons are set up. The inaugural sbv IMPROVER datathon was held in Singapore on 23-24 September 2016. It allowed bioinformaticians and data scientists to consolidate their ideas and work on the most promising methods as teams, after having initially reflected on the problem on their own. The outcome is a set of visualization and analysis methods that will be shared with the scientific community via the Garuda platform, an open connectivity platform that provides a framework to navigate through different applications, databases and services in biology and medicine. We will present the results we obtained when analyzing data with our network-based method, and introduce a datathon that will take place in Japan to encourage the analysis of the same datasets with other methods to allow for the consolidation of conclusions.

Keywords: big data interpretation, datathon, systems toxicology, verification

Procedia PDF Downloads 278
166 An Artificial Intelligence Framework to Forecast Air Quality

Authors: Richard Ren

Abstract:

Air pollution is a serious danger to international well-being and economies - it kills an estimated 7 million people every year and is projected to cost world economies $2.6 trillion by 2060 due to sick days, healthcare costs, and reduced productivity. In the United States alone, 60,000 premature deaths are caused by poor air quality. For this reason, there is a crucial need to develop effective methods to forecast air quality, which can mitigate air pollution’s detrimental public health effects and associated costs by helping people plan ahead and avoid exposure. The goal of this study is to propose an artificial intelligence framework for predicting future air quality based on timing variables (i.e., season, weekday/weekend), future weather forecasts, and past pollutant and air quality measurements. The proposed framework utilizes multiple machine learning algorithms (logistic regression, random forest, neural network) with different specifications and averages the results of the three top-performing models to eliminate inaccuracies, weaknesses, and biases from any one individual model. Over time, the proposed framework uses new data to self-adjust model parameters and increase prediction accuracy. To demonstrate its applicability, a prototype of this framework was created to forecast air quality in Los Angeles, California using datasets from the RP4 weather data repository and EPA pollutant measurement data. The results showed good agreement between the framework’s predictions and real-life observations, with an overall 92% model accuracy. The combined model is able to predict more accurately than any of the individual models, and it is able to reliably forecast season-based variations in air quality levels. Top air quality predictor variables were identified through the measurement of mean decrease in accuracy. This study proposed and demonstrated the efficacy of a comprehensive air quality prediction framework leveraging multiple machine learning algorithms to overcome individual algorithm shortcomings. Future enhancements should focus on expanding and testing a greater variety of modeling techniques within the proposed framework, testing the framework in different locations, and developing a platform to automatically publish future predictions in the form of a web or mobile application. Accurate predictions from this artificial intelligence framework can in turn be used to save and improve lives by allowing individuals to protect their health and allowing governments to implement effective pollution control measures.
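A sketch of the model-averaging idea named above: train the three algorithm families and average their class probabilities (a soft-voting ensemble). Data loading is omitted; `X_train`, `y_train`, and `X_future` are placeholders:

```python
# Hedged sketch: soft-voting ensemble of the three model families from the
# abstract; averaging probabilities damps any single model's weaknesses.
from sklearn.ensemble import RandomForestClassifier, VotingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

ensemble = VotingClassifier(
    estimators=[
        ("logreg", LogisticRegression(max_iter=1000)),
        ("forest", RandomForestClassifier(n_estimators=300, random_state=0)),
        ("mlp", MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500)),
    ],
    voting="soft",   # average predicted class probabilities
)
# ensemble.fit(X_train, y_train)
# air_quality_forecast = ensemble.predict(X_future)
```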

Keywords: air quality prediction, air pollution, artificial intelligence, machine learning algorithms

Procedia PDF Downloads 130
165 BiFormerDTA: Structural Embedding of Protein in Drug Target Affinity Prediction Using BiFormer

Authors: Leila Baghaarabani, Parvin Razzaghi, Mennatolla Magdy Mostafa, Ahmad Albaqsami, Al Warith Al Rushaidi, Masoud Al Rawahi

Abstract:

Predicting the interaction between drugs and their molecular targets is pivotal for advancing drug development processes. Due to time and cost limitations, computational approaches have emerged as an effective alternative for drug-target interaction (DTI) prediction. Most computational approaches utilize the drug molecule and protein sequence as input. This study not only utilizes these inputs but also introduces a protein representation developed using a masked protein language model. In this representation, for every individual amino acid residue within the protein sequence, there exists a corresponding probability distribution that indicates the likelihood of each amino acid being present at that particular position. Then, the similarity between each pair of amino acids is computed to create a similarity matrix. To encode the knowledge of the similarity matrix, Bi-Level Routing Attention (BiFormer) is utilized, which combines aspects of transformer-based models with protein sequence analysis and represents a significant advancement in the field of drug-protein interaction prediction. BiFormer has the ability to pinpoint the most effective regions of the protein sequence that are responsible for facilitating interactions between the protein and drugs, thereby enhancing the understanding of these critical interactions. Thus, it appears promising in its ability to capture the local structural relationships of proteins, enhancing the understanding of how they contribute to drug-protein interactions and thereby facilitating more accurate predictions. To evaluate the proposed method, it was tested on two widely recognized datasets: Davis and KIBA. A comprehensive series of experiments was conducted to illustrate its effectiveness in comparison to cutting-edge techniques.
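The residue-residue similarity matrix described above can be illustrated as follows (an assumption-laden sketch: each row of `probs` is a masked-LM probability distribution over the 20 amino acids for one sequence position, and cosine similarity stands in for whatever measure the authors use):

```python
# Hedged sketch: cosine similarity between per-position amino-acid
# distributions produced by a masked protein language model.
import numpy as np

def similarity_matrix(probs: np.ndarray) -> np.ndarray:
    """probs: (seq_len, 20) rows of per-position distributions.
    Returns a (seq_len, seq_len) similarity matrix."""
    unit = probs / np.maximum(np.linalg.norm(probs, axis=1, keepdims=True), 1e-12)
    return unit @ unit.T
```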

Keywords: BiFormer, transformer, protein language processing, self-attention mechanism, binding affinity, drug target interaction, similarity matrix, protein masked representation, protein language model

Procedia PDF Downloads 15
164 Resisting Adversarial Assaults: A Model-Agnostic Autoencoder Solution

Authors: Massimo Miccoli, Luca Marangoni, Alberto Aniello Scaringi, Alessandro Marceddu, Alessandro Amicone

Abstract:

The susceptibility of deep neural networks (DNNs) to adversarial manipulations is a recognized challenge within the computer vision domain. Adversarial examples, crafted by adding subtle yet malicious alterations to benign images, exploit this vulnerability. Various defense strategies have been proposed to safeguard DNNs against such attacks, stemming from diverse research hypotheses. Building upon prior work, our approach involves the utilization of autoencoder models. Autoencoders, a type of neural network, are trained to learn representations of training data and reconstruct inputs from these representations, typically minimizing reconstruction errors like mean squared error (MSE). Our autoencoder was trained on a dataset of benign examples, learning features specific to them. Consequently, when presented with significantly perturbed adversarial examples, the autoencoder exhibited high reconstruction errors. The architecture of the autoencoder was tailored to the dimensions of the images under evaluation. We considered various image sizes, constructing models differently for 256x256 and 512x512 images. Moreover, the choice of the computer vision model is crucial, as most adversarial attacks are designed with specific AI structures in mind. To mitigate this, we proposed a method to replace image-specific dimensions with a structure independent of both dimensions and neural network models, thereby enhancing robustness. Our multi-modal autoencoder reconstructs the spectral representation of images across the red-green-blue (RGB) color channels. To validate our approach, we conducted experiments using diverse datasets and subjected them to adversarial attacks using models such as ResNet50 and ViT_L_16 from the torchvision library. The autoencoder extracted features used in a classification model, resulting in an MSE (RGB) of 0.014, a classification accuracy of 97.33%, and a precision of 99%.
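The detection rule implied above reduces to thresholding reconstruction error; a minimal sketch (the `autoencoder` callable and the threshold value are assumptions for illustration):

```python
# Hedged sketch: flag inputs whose autoencoder reconstruction MSE is high.
# An AE trained only on benign images reconstructs benign inputs well, so
# large errors suggest adversarial perturbation.
import numpy as np

def flag_adversarial(autoencoder, images: np.ndarray,
                     threshold: float) -> np.ndarray:
    """images: (n, h, w, c) batch; returns a boolean mask of suspect inputs."""
    recon = autoencoder(images)
    mse = ((images - recon) ** 2).mean(axis=(1, 2, 3))
    return mse > threshold
```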

Keywords: adversarial attacks, malicious images detector, binary classifier, multimodal transformer autoencoder

Procedia PDF Downloads 114
163 Global Solar Irradiance: Data Imputation to Analyze Complementarity Studies of Energy in Colombia

Authors: Jeisson A. Estrella, Laura C. Herrera, Cristian A. Arenas

Abstract:

The Colombian electricity sector has been transformed by the insertion of new energy sources for electricity generation, one of them being solar energy, which is being promoted by companies interested in photovoltaic technology. The study of this technology is important for electricity generation in general and for the planning of the sector from the perspective of energy complementarity. It is precisely within this last approach that the project is located: we are interested in answering concerns about the reliability of the electrical system when climatic phenomena such as El Niño occur, and in defining whether it is viable to replace or expand thermoelectric plants with renewable electricity generation systems. In this regard, some difficulties related to the basic information on renewable energy sources must first be solved, as the measured data come from automatic weather stations administered by the Institute of Hydrology, Meteorology and Environmental Studies (IDEAM) and, in the study range (2005-2019), have significant amounts of missing data. For this reason, the overall objective of the project is to complete the global solar irradiance datasets to obtain time series that will allow the elaboration of energy complementarity analyses in a subsequent project. The filling of the databases will be done through numerical and statistical methods, which are basic techniques for undergraduate students in technical areas who are starting out as researchers.
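For short gaps, the numerical filling step can be as simple as time-based interpolation; a minimal pandas sketch under assumed file and column names (long gaps would need the study's more careful statistical methods):

```python
# Hedged sketch: fill short gaps in an hourly irradiance series.
import pandas as pd

series = pd.read_csv("irradiance_station.csv", parse_dates=["timestamp"],
                     index_col="timestamp")["ghi_w_m2"]   # hypothetical names
series = series.asfreq("h")                          # enforce a regular grid
filled = series.interpolate(method="time", limit=6)  # fill gaps up to 6 hours
print(f"missing before: {series.isna().sum()}, after: {filled.isna().sum()}")
```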

Keywords: time series, global solar irradiance, imputed data, energy complementarity

Procedia PDF Downloads 71
162 Automatic Generation of Census Enumeration Area and National Sampling Frame to Achieve Sustainable Development Goals

Authors: Sarchil H. Qader, Andrew Harfoot, Mathias Kuepie, Sabrina Juran, Attila Lazar, Andrew J. Tatem

Abstract:

The need for high-quality, reliable, and timely population data, including demographic information, to support the achievement of the sustainable development goals (SDGs) in all countries was recognized by the United Nations' 2030 Agenda for sustainable development. However, many low and middle-income countries lack reliable and recent census data. To achieve reliable and accurate census and survey outputs, up-to-date census enumeration areas and digital national sampling frames are critical. Census enumeration areas (EAs) are the smallest geographic units for collecting, disseminating, and analyzing census data and are often used as a national sampling frame to serve various socio-economic surveys. Even for countries that are wealthy and stable, creating and updating EAs is a difficult yet crucial step in preparing for a national census. Such a process is commonly done manually, either by digitizing small geographic units on high-resolution satellite imagery or walking the boundaries of units, both of which are extremely expensive. We have developed a user-friendly tool that could be employed to generate draft EA boundaries automatically. The tool is based on high-resolution gridded population and settlement datasets, GPS household locations, and building footprints, and uses publicly available natural, man-made and administrative boundaries. Initial outputs were produced in Burkina Faso, Paraguay, Somalia, Togo, Niger, Guinea, and Zimbabwe. The results indicate that the EAs are in line with international standards, including boundaries that are easily identifiable and follow ground features, have no overlaps, are compact and free of pockets and disjoints, and are nested within administrative boundaries.

Keywords: enumeration areas, national sampling frame, gridded population data, preEA tool

Procedia PDF Downloads 146
161 Re-Stating the Origin of Tetrapod Using Measures of Phylogenetic Support for Phylogenomic Data

Authors: Yunfeng Shan, Xiaoliang Wang, Youjun Zhou

Abstract:

Whole-genome data from two lungfish species, along with other species, present a valuable opportunity to re-investigate the longstanding debate regarding the evolutionary relationships among tetrapods, lungfishes, and coelacanths. However, the use of bootstrap support has become outdated for large-scale phylogenomic data. Without robust phylogenetic support, the phylogenetic trees become meaningless. Therefore, it is necessary to re-evaluate the phylogenies of tetrapods, lungfishes, and coelacanths using novel measures of phylogenetic support specifically designed for phylogenomic data, as the previous phylogenies were based on 100% bootstrap support. Our findings consistently provide strong evidence favoring lungfish as the closest living relative of tetrapods. This conclusion is based on high internode certainty, relative gene support, and high gene concordance factor. The evidence stems from five previous datasets derived from lungfish transcriptomes. These results yield fresh insights into the three hypotheses regarding the phylogenies of tetrapods, lungfishes, and coelacanths. Importantly, these hypotheses are not mere conjectures but are substantiated by a significant number of genes. Analyzing real biological data further demonstrates that the inclusion of additional taxa leads to more diverse tree topologies. Consequently, gene trees and species trees may not be identical even when whole-genome sequencing data is utilized. However, it is worth noting that many gene trees can accurately reflect the species tree if an appropriate number of taxa, typically ranging from six to ten, are sampled. Therefore, it is crucial to carefully select the number of taxa and an appropriate outgroup, such as slow-evolving species, while excluding fast-evolving taxa as outgroups to mitigate the adverse effects of long-branch attraction and achieve an accurate reconstruction of the species tree. This is particularly important as more whole-genome sequencing data becomes available.

Keywords: novel measures of phylogenetic support for phylogenomic data, gene concordance factor confidence, relative gene support, internode certainty, origin of tetrapods

Procedia PDF Downloads 60
160 Estimating Precipitable Water Vapour Using the Global Positioning System and Radio Occultation over Ethiopian Regions

Authors: Asmamaw Yehun, Tsegaye Gogie, Martin Vermeer, Addisu Hunegnaw

Abstract:

The Global Positioning System (GPS) is a space-based radio positioning system capable of providing continuous position, velocity, and time information to users anywhere on or near the surface of the Earth. The main objective of this work was to estimate the integrated precipitable water vapour (IPWV) using ground GPS and Low Earth Orbit (LEO) Radio Occultation (RO) to study its spatial-temporal variability. For LEO-GPS RO, we used Constellation Observing System for Meteorology, Ionosphere, and Climate (COSMIC) datasets. We estimated the daily and monthly mean IPWV using six selected ground-based GPS stations over the period 2012 to 2016 (a five-year period). This period was selected because continuous data were available at all Ethiopian GPS stations throughout it. We studied temporal, seasonal, diurnal, and vertical variations of precipitable water vapour using GPS observables processed with the precise geodetic GAMIT-GLOBK software package. Finally, we determined the cross-correlation of our GPS-derived IPWV values with those of the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-Interim reanalysis and of the second-generation National Oceanic and Atmospheric Administration (NOAA) Global Ensemble Forecast System Reforecast (GEFS/R) for validation and statistical comparison. There are higher IPWV values, ranging from 30 to 37.5 millimetres (mm), in Gambela and the Southern Regions of Ethiopia. Some parts of the Tigray, Amhara, and Oromia regions had low IPWV values, ranging from 8.62 to 15.27 mm. The correlation coefficient between GPS-derived IPWV and both ECMWF and GEFS/R exceeds 90%. We conclude that there are strong temporal, seasonal, diurnal, and vertical variations of precipitable water vapour in the study area.
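The conversion from GPS zenith wet delay (ZWD) to IPWV is standard in GPS meteorology; a worked sketch with representative Bevis-style constants (treat the constants and the mean-temperature proxy as typical literature values, not necessarily those used in this study):

```python
# Hedged sketch: map zenith wet delay to integrated precipitable water vapour.
RHO_W = 1000.0   # water density, kg m^-3
R_V = 461.5      # specific gas constant of water vapour, J kg^-1 K^-1
K2P = 22.1       # k2' in K hPa^-1 (Bevis-style value)
K3 = 3.739e5     # k3 in K^2 hPa^-1 (Bevis-style value)

def ipwv_mm(zwd_mm: float, t_surface_k: float) -> float:
    t_mean = 70.2 + 0.72 * t_surface_k   # Bevis mean-temperature proxy
    # dimensionless conversion factor; the 1e8 does the hPa unit bookkeeping
    pi = 1e8 / (RHO_W * R_V * (K3 / t_mean + K2P))
    return pi * zwd_mm

print(ipwv_mm(zwd_mm=150.0, t_surface_k=295.0))   # about 24 mm
```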

Keywords: GNSS, radio occultation, atmosphere, precipitable water vapour

Procedia PDF Downloads 86
159 Personalizing Human Physical Life Routines Recognition over Cloud-based Sensor Data via AI and Machine Learning

Authors: Kaushik Sathupadi, Sandesh Achar

Abstract:

Pervasive computing is a growing research field that aims to recognize human physical life routines (HPLR) based on body-worn sensors such as MEMS-based technologies. The use of these technologies for human activity recognition is progressively increasing. On the other hand, personalizing human life routines using numerous machine-learning techniques has always been an intriguing topic. While various methods have demonstrated the ability to recognize basic movement patterns, they still need improvement to anticipate the dynamics of human living patterns. This study introduces state-of-the-art techniques for recognizing static and dynamic patterns and forecasting those challenging activities from multi-fused sensors. Furthermore, numerous MEMS signals are extracted from one self-annotated IM-WSHA dataset and two benchmarked datasets. First, the acquired raw data are filtered with z-normalization and denoising methods. Then, major features are extracted from different domains: statistical features, local binary patterns, auto-regressive model coefficients, and intrinsic time-scale decomposition features. Next, the acquired features are optimized using maximum relevance and minimum redundancy (mRMR). Finally, an artificial neural network is applied to analyze the whole system's performance. As a result, we attained a 90.27% recognition rate for the self-annotated dataset, while HARTH and KU-HAR achieved 83% on nine living activities and 90.94% on 18 static and dynamic routines. Thus, the proposed HPLR system outperformed other state-of-the-art systems when evaluated against other methods in the literature.
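The mRMR step can be sketched as a greedy selection that trades mutual-information relevance against redundancy (illustrative, not the authors' implementation):

```python
# Hedged sketch: greedy mRMR-style feature selection with scikit-learn's
# mutual information estimators.
import numpy as np
from sklearn.feature_selection import mutual_info_classif, mutual_info_regression

def mrmr(X: np.ndarray, y: np.ndarray, k: int = 20) -> list:
    relevance = mutual_info_classif(X, y, random_state=0)
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        scores = np.full(X.shape[1], -np.inf)
        for j in range(X.shape[1]):
            if j in selected:
                continue
            redundancy = mutual_info_regression(
                X[:, selected], X[:, j], random_state=0).mean()
            scores[j] = relevance[j] - redundancy   # relevance minus redundancy
        selected.append(int(np.argmax(scores)))
    return selected
```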

Keywords: artificial intelligence, machine learning, gait analysis, local binary pattern (LBP), statistical features, micro-electro-mechanical systems (MEMS), maximum relevance and minimum redundancy (mRMR)

Procedia PDF Downloads 22
158 AI for Efficient Geothermal Exploration and Utilization

Authors: Velimir Monty Vesselinov, Trais Kliplhuis, Hope Jasperson

Abstract:

Artificial intelligence (AI) is a powerful tool in the geothermal energy sector, aiding in both exploration and utilization. Identifying promising geothermal sites can be challenging due to limited surface indicators and the need for expensive drilling to confirm subsurface resources. Geothermal reservoirs can be located deep underground and exhibit complex geological structures, making traditional exploration methods time-consuming and imprecise. AI algorithms can analyze vast datasets of geological, geophysical, and remote sensing data, including satellite imagery, seismic surveys, geochemistry, geology, etc. Machine learning algorithms can identify subtle patterns and relationships within this data, potentially revealing hidden geothermal potential in areas previously overlooked. To address these challenges, a SIML (Science-Informed Machine Learning) technology has been developed. SIML methods are different from traditional ML techniques. In both cases, the ML models are trained to predict the spatial distribution of an output (e.g., pressure, temperature, heat flux) based on a series of inputs (e.g., permeability, porosity, etc.). Traditional ML relies on deep and wide neural networks (NNs) based on simple algebraic mappings to represent complex processes. In contrast, the SIML neurons incorporate complex mappings (including constitutive relationships and physics/chemistry models). This results in ML models that have a physical meaning and satisfy physics laws and constraints. The prototype of the developed software, called GeoTGO, is accessible through the cloud. Our software prototype demonstrates how different data sources can be made available for processing, executes demonstrative SIML analyses, and presents the results in tabular and graphic form.

Keywords: science-informed machine learning, artificial intelligence, exploration, utilization, hidden geothermal

Procedia PDF Downloads 56
157 Comparison of Data Reduction Algorithms for Image-Based Point Cloud Derived Digital Terrain Models

Authors: M. Uysal, M. Yilmaz, I. Tiryakioğlu

Abstract:

A Digital Terrain Model (DTM) is a digital numerical representation of the Earth's surface. DTMs have been applied to a diverse field of tasks, such as urban planning, military applications, glacier mapping, and disaster management. Expressing the Earth's surface as a mathematical model would require an infinite number of point measurements. Since this is impossible, points at regular intervals are measured to characterize the Earth's surface, and a DTM of the Earth is generated. Hitherto, classical measurement techniques and photogrammetry have had widespread use in the construction of DTMs. At present, RADAR, LiDAR, and stereo satellite images are also used. In recent years, especially because of its advantages, Airborne Light Detection and Ranging (LiDAR) has seen increased use in DTM applications. A 3D point cloud is created with LiDAR technology by obtaining numerous point data. Recently, however, developments in image mapping methods and the use of unmanned aerial vehicles (UAVs) for photogrammetric data acquisition have increased DTM generation from image-based point clouds. The accuracy of the DTM depends on various factors such as the data collection method, the distribution of elevation points, the point density, the properties of the surface, and the interpolation methods. In this study, the random data reduction method is compared for DTMs generated from image-based point cloud data. The original image-based point cloud data set (100%) is reduced to a series of subsets by using a random algorithm, representing 75, 50, 25 and 5% of the original data set. Over the ANS campus of Afyon Kocatepe University as the test area, the DTM constructed from the original image-based point cloud data set is compared with DTMs interpolated from the reduced data sets by the Kriging interpolation method. The results show that the random data reduction method can be used to reduce image-based point cloud datasets to the 50% density level while still maintaining the quality of the DTM.
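The reduction experiment can be sketched in a few lines; here ordinary kriging is swapped for SciPy's linear interpolant to keep the example dependency-free, so treat it as a structural illustration rather than a reproduction of the study:

```python
# Hedged sketch: subsample a point cloud to 50% density, re-grid both the
# full and reduced sets, and compare surfaces by RMSE.
import numpy as np
from scipy.interpolate import griddata

rng = np.random.default_rng(0)
pts = rng.uniform(0, 100, size=(10000, 2))      # synthetic x, y coordinates
z = np.sin(pts[:, 0] / 10) + 0.1 * pts[:, 1]    # synthetic elevations

keep = rng.choice(len(pts), size=len(pts) // 2, replace=False)   # 50% level
gx, gy = np.mgrid[5:95:50j, 5:95:50j]
full = griddata(pts, z, (gx, gy), method="linear")
reduced = griddata(pts[keep], z[keep], (gx, gy), method="linear")
print("RMSE at 50% density:", np.sqrt(np.nanmean((full - reduced) ** 2)))
```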

Keywords: DTM, Unmanned Aerial Vehicle (UAV), uniform, random, kriging

Procedia PDF Downloads 158
156 Advances in Mathematical Sciences: Unveiling the Power of Data Analytics

Authors: Zahid Ullah, Atlas Khan

Abstract:

The rapid advancements in data collection, storage, and processing capabilities have led to an explosion of data in various domains. In this era of big data, mathematical sciences play a crucial role in uncovering valuable insights and driving informed decision-making through data analytics. The purpose of this abstract is to present the latest advances in mathematical sciences and their application in harnessing the power of data analytics. This abstract highlights the interdisciplinary nature of data analytics, showcasing how mathematics intersects with statistics, computer science, and other related fields to develop cutting-edge methodologies. It explores key mathematical techniques such as optimization, mathematical modeling, network analysis, and computational algorithms that underpin effective data analysis and interpretation. The abstract emphasizes the role of mathematical sciences in addressing real-world challenges across different sectors, including finance, healthcare, engineering, social sciences, and beyond. It showcases how mathematical models and statistical methods extract meaningful insights from complex datasets, facilitating evidence-based decision-making and driving innovation. Furthermore, the abstract emphasizes the importance of collaboration and knowledge exchange among researchers, practitioners, and industry professionals. It recognizes the value of interdisciplinary collaborations and the need to bridge the gap between academia and industry to ensure the practical application of mathematical advancements in data analytics. The abstract highlights the significance of ongoing research in mathematical sciences and its impact on data analytics. It emphasizes the need for continued exploration and innovation in mathematical methodologies to tackle emerging challenges in the era of big data and digital transformation. In summary, this abstract sheds light on the advances in mathematical sciences and their pivotal role in unveiling the power of data analytics. It calls for interdisciplinary collaboration, knowledge exchange, and ongoing research to further unlock the potential of mathematical methodologies in addressing complex problems and driving data-driven decision-making in various domains.

Keywords: mathematical sciences, data analytics, advances, unveiling

Procedia PDF Downloads 95
155 Analyzing the Street Pattern Characteristics on Young People’s Choice to Walk or Not: A Study Based on Accelerometer and Global Positioning Systems Data

Authors: Ebru Cubukcu, Gozde Eksioglu Cetintahra, Burcin Hepguzel Hatip, Mert Cubukcu

Abstract:

Obesity and overweight cause serious health problems. Public and private organizations aim to encourage walking in various ways in order to cope with the problem of obesity and overweight. This study aims to understand how the spatial characteristics of the urban street pattern, its connectivity and complexity, influence young people’s choice to walk or not. 185 public university students in Izmir, the third largest city in Turkey, participated in the study. Each participant wore an accelerometer and a global positioning system (GPS) device for a week. The accelerometer device records data on the intensity of the participant’s activity at a specified time interval, and the GPS device records the activities’ locations. Combining the two datasets, activity maps are derived. These maps are then used to differentiate the participants’ walk trips and motor vehicle trips. Given that, the frequencies of walk and motor vehicle trips are calculated at the street segment level, and the street segments are then categorized into two groups: ‘preferred by pedestrians’ and ‘preferred by motor vehicles’. Graph Theory-based accessibility indices are calculated to quantify the spatial characteristics of the streets in the sample. Six different indices are used: (I) edge density, (II) edge sinuosity, (III) eta index, (IV) node density, (V) order of a node, and (VI) beta index. T-tests show that the index values for the ‘preferred by pedestrians’ and ‘preferred by motor vehicles’ segments are significantly different. The findings indicate that the spatial characteristics of the street network have a measurable effect on young people’s choice to walk or not. Policy implications are discussed. This study is funded by the Scientific and Technological Research Council of Turkey, Project No: 116K358.
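Two of the listed indices have especially simple definitions; a toy networkx sketch follows (edge lengths are invented, and the study's exact index definitions may differ in detail):

```python
# Hedged sketch: beta index (edges per node) and eta index (mean edge
# length) on a toy street graph.
import networkx as nx

g = nx.Graph()
g.add_edge("a", "b", length=120.0)   # street segments with lengths in metres
g.add_edge("b", "c", length=80.0)
g.add_edge("c", "a", length=150.0)
g.add_edge("c", "d", length=60.0)

beta = g.number_of_edges() / g.number_of_nodes()          # beta index: E / V
total_len = sum(d["length"] for _, _, d in g.edges(data=True))
eta = total_len / g.number_of_edges()                     # eta index
print(f"beta = {beta:.2f}, eta = {eta:.1f} m")
```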

Keywords: graph theory, walkability, accessibility, street network

Procedia PDF Downloads 228
154 Exploring the Applications of Neural Networks in the Adaptive Learning Environment

Authors: Baladitya Swaika, Rahul Khatry

Abstract:

Computer Adaptive Tests (CATs) are one of the most efficient ways of testing the cognitive abilities of students. CATs are based on Item Response Theory (IRT), in which items are selected by maximum information (or selection from the posterior) and ability is estimated by maximum-likelihood (ML) or maximum a posteriori (MAP) estimators. This study aims at combining both classical and Bayesian approaches to IRT to create a dataset which is then fed to a neural network that automates the process of ability estimation, and at comparing it to traditional CAT models designed using IRT. This study uses Python as the base coding language, pymc for statistical modelling of the IRT, and scikit-learn for the neural network implementations. On creating the model and comparing, it is found that the neural-network-based model performs 7-10% worse than the IRT model for score estimation. Although performing poorly compared to the IRT model, the neural network model can be beneficially used in back-ends for reducing time complexity, as the IRT model has to re-calculate the ability every time it gets a request, whereas the prediction from a neural network can be done in a single step by an existing trained regressor. This study also proposes a new kind of framework whereby the neural network model could be used to incorporate feature sets other than the normal IRT feature set, using a neural network’s capacity to learn unknown functions to give rise to better CAT models. Categorical features like test type could be learnt and incorporated in IRT functions with the help of techniques like logistic regression, and can be used to learn functions and express models that may not be trivial to express via equations. Such a framework, when implemented, would be highly advantageous in psychometrics and cognitive assessments. This study gives a brief overview of how neural networks can be used in adaptive testing, not only by reducing time complexity but also by being able to incorporate newer and better datasets, which would eventually lead to higher-quality testing.
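The ability-estimation step the network is trained to mimic can be made concrete with a toy two-parameter logistic (2PL) example, here solved by a MAP grid search under a standard normal prior (item parameters are invented):

```python
# Hedged sketch: MAP ability estimate under the 2PL IRT model,
# P(correct) = 1 / (1 + exp(-a * (theta - b))).
import numpy as np

a = np.array([1.2, 0.8, 1.5])     # item discriminations (invented)
b = np.array([-0.5, 0.0, 1.0])    # item difficulties (invented)
resp = np.array([1, 1, 0])        # one examinee's responses

theta = np.linspace(-4, 4, 801)
p = 1 / (1 + np.exp(-a[:, None] * (theta[None, :] - b[:, None])))
loglik = (resp[:, None] * np.log(p) + (1 - resp[:, None]) * np.log(1 - p)).sum(0)
log_post = loglik - 0.5 * theta**2          # N(0, 1) prior, up to a constant
print("MAP ability estimate:", theta[np.argmax(log_post)])
```

A regressor trained on response-pattern-to-estimate pairs answers in one forward pass, which is the time-complexity advantage argued above.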

Keywords: computer adaptive tests, item response theory, machine learning, neural networks

Procedia PDF Downloads 176
153 Understanding the Classification of Rain Microstructure and Estimation of Z-R Relationship using a Micro Rain Radar in Tropical Region

Authors: Tomiwa, Akinyemi Clement

Abstract:

Tropical regions experience diverse and complex precipitation patterns, posing significant challenges for accurate rainfall estimation and forecasting. This study addresses the problem of effectively classifying tropical rain types and refining the Z-R (Reflectivity-Rain Rate) relationship to enhance rainfall estimation accuracy. Through a combination of remote sensing, meteorological analysis, and machine learning, the research aims to develop an advanced classification framework capable of distinguishing between different types of tropical rain based on their unique characteristics. This involves utilizing high-resolution satellite imagery, radar data, and atmospheric parameters to categorize precipitation events into distinct classes, providing a comprehensive understanding of tropical rain systems. Additionally, the study seeks to improve the Z-R relationship, a crucial aspect of rainfall estimation. One year of rainfall data was analyzed using a Micro Rain Radar (MRR) located at The Federal University of Technology Akure, Nigeria, measuring rainfall parameters from ground level to a height of 4.8 km with a vertical resolution of 0.16 km. Rain rates were classified into low (stratiform) and high (convective) based on various microstructural attributes such as rain rates, liquid water content, Drop Size Distribution (DSD), average fall speed of the drops, and radar reflectivity. By integrating diverse datasets and employing advanced statistical techniques, the study aims to enhance the precision of Z-R models, offering a more reliable means of estimating rainfall rates from radar reflectivity data. This refined Z-R relationship holds significant potential for improving our understanding of tropical rain systems and enhancing forecasting accuracy in regions prone to heavy precipitation.
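The Z-R refinement reduces to estimating the power law Z = aR^b, typically as a linear fit in log-log space; a worked sketch on synthetic data (the study would fit separate relations for stratiform and convective rain):

```python
# Hedged sketch: recover a and b in Z = a * R**b by log-log least squares.
import numpy as np

rng = np.random.default_rng(1)
R = rng.uniform(0.5, 50, 500)                     # rain rate, mm/h
Z = 200 * R**1.6 * rng.lognormal(0, 0.2, 500)     # noisy Marshall-Palmer-like Z

b, log_a = np.polyfit(np.log10(R), np.log10(Z), 1)
print(f"a = {10**log_a:.0f}, b = {b:.2f}")        # recovers roughly 200, 1.6
```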

Keywords: remote sensing, precipitation, drop size distribution, micro rain radar

Procedia PDF Downloads 40
152 Improving Cell Type Identification of Single Cell Data by Iterative Graph-Based Noise Filtering

Authors: Annika Stechemesser, Rachel Pounds, Emma Lucas, Chris Dawson, Julia Lipecki, Pavle Vrljicak, Jan Brosens, Sean Kehoe, Jason Yap, Lawrence Young, Sascha Ott

Abstract:

Advances in technology now make it possible to retrieve the genetic information of thousands of single cancerous cells. One of the key challenges in single cell analysis of cancerous tissue is to determine the number of different cell types and their characteristic genes within the sample to better understand the tumors and their reaction to different treatments. For this analysis to be possible, it is crucial to filter out background noise, as it can severely blur the downstream analysis and give misleading results. In-depth analysis of the state-of-the-art filtering methods for single cell data showed that they do, in some cases, not separate noisy and normal cells sufficiently. We introduced an algorithm that filters and clusters single cell data simultaneously without relying on certain genes or thresholds chosen by eye. It detects communities in a Shared Nearest Neighbor similarity network, which captures the similarities and dissimilarities of the cells, by optimizing the modularity, and then identifies and removes vertices that belong only weakly to their cluster. This strategy is based on the fact that noisy data instances are very likely to be similar to true cell types but do not match any of them well. Once the clustering is complete, we apply a set of evaluation metrics at the cluster level and accept or reject clusters based on the outcome. The performance of our algorithm was tested on three datasets and led to convincing results. We were able to replicate the results on a Peripheral Blood Mononuclear Cells dataset. Furthermore, we applied the algorithm to two samples of ovarian cancer from the same patient before and after chemotherapy. Comparing the standard approach to our algorithm, we found a hidden cell type in the ovarian post-chemotherapy data with interesting marker genes that are potentially relevant for medical research.
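The starting point of the algorithm, the Shared Nearest Neighbor graph, can be sketched directly (illustrative; community detection, modularity optimization, and the cluster-level evaluation metrics are omitted):

```python
# Hedged sketch: build an SNN weight matrix where the weight of a cell pair
# is the number of k-nearest neighbours they share.
import numpy as np
from sklearn.neighbors import NearestNeighbors

def snn_counts(X: np.ndarray, k: int = 20) -> np.ndarray:
    """X: (cells, genes) expression matrix; returns (cells, cells) counts."""
    _, idx = NearestNeighbors(n_neighbors=k).fit(X).kneighbors(X)
    member = np.zeros((X.shape[0], X.shape[0]), dtype=np.int32)
    for i, neigh in enumerate(idx):
        member[i, neigh] = 1                 # mark i's k nearest neighbours
    return member @ member.T                 # shared neighbours per pair
```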

Keywords: cancer research, graph theory, machine learning, single cell analysis

Procedia PDF Downloads 114
151 Two-Level Graph Causality to Detect and Predict Random Cyber-Attacks

Authors: Van Trieu, Shouhuai Xu, Yusheng Feng

Abstract:

Tracking attack trajectories can be difficult with limited information about the nature of the attack. It is even more difficult when attack information is collected by Intrusion Detection Systems (IDSs), as current IDSs have limitations in identifying malicious and anomalous traffic. Moreover, IDSs only point out suspicious events but do not show how the events relate to each other or which event possibly causes another to happen. Because of this, it is important to investigate new methods capable of tracking attack trajectories quickly, with less attack information and less dependency on IDSs, in order to prioritize actions during incident responses. This paper proposes a two-level graph causality framework for tracking attack trajectories in internet networks by leveraging observable malicious behaviors to detect which attack events most probably cause other events to occur in the system. Technically, given the time series of malicious events, the framework extracts events with useful features, such as attack time and port number, and applies conditional independence tests to detect relationships between attack events. Using academic datasets collected by IDSs, experimental results show that the framework can quickly detect causal pairs that offer meaningful insights into the nature of the internet network, given only reasonable restrictions on network size and structure. Without the framework's guidance, these insights could not be discovered by existing tools such as IDSs, and recovering them would cost expert human analysts significant time, if it were possible at all. The computational results from the proposed two-level graph network model reveal clear patterns and trends. In fact, for more than 85% of causal pairs, the average time difference between the causal and effect events is within 5 minutes in both computed and observed data. This result can be used as a preventive measure against future attacks. Although the forecast window may be short, from 0.24 seconds to 5 minutes, it is long enough to design a prevention protocol to block those attacks.

Keywords: causality, multilevel graph, cyber-attacks, prediction

Procedia PDF Downloads 157
150 Effect of Climate Variability on Children Health Outcomes in Rural Uganda

Authors: Emily Injete Amondo, Alisher Mirzabaev, Emmanuel Rukundo

Abstract:

Children in rural farming households are often vulnerable to a multitude of risks, including health risks associated with climate change and variability. Cognizant of this, the present study empirically traced the relationship between climate variability and nutritional health outcomes in rural children, while identifying the cause-and-effect transmission mechanisms. We combined four waves of the rich Uganda National Panel Survey (UNPS), part of the World Bank Living Standards Measurement Studies (LSMS), for the period 2009-2014, with long-term, high-frequency rainfall and temperature datasets. Self-reported drought and flood shock variables were further used in separate regressions for triangulation and robustness checks. Panel fixed effects regressions were applied in the empirical analysis, accounting for a variety of causal identification issues. The results showed significant negative effects on children's anthropometric measurements from moderate and extreme droughts, extreme wet spells, and heatwaves. By contrast, moderate wet spells were positively linked with nutritional measures. Agricultural production and child diarrhea were the main transmission channels, with heatwaves, droughts, and high rainfall variability negatively affecting crop output. The probability of diarrhea was positively related to increases in temperature and to dry spells. Results further revealed that children in households that engaged in ex-ante or anticipatory risk-reducing strategies, such as savings, had better health outcomes than those that engaged in ex-post coping, such as an involuntary change of diet. These results highlight the importance of adaptation in smoothing the harmful effects of climate variability on the health of rural households and children in Uganda.
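
A minimal sketch of the kind of panel fixed-effects specification described above is given below, using statsmodels. The column names (haz, drought, wave, child_id, household_id) are hypothetical stand-ins for the UNPS variables, not the study's actual specification.

```python
# Minimal sketch of a panel fixed-effects regression; all column names
# are hypothetical stand-ins, not the study's actual variables.
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("unps_children_panel.csv")  # hypothetical panel file

# Height-for-age z-score regressed on a drought-exposure indicator, with
# child and survey-wave fixed effects absorbed via dummies and standard
# errors clustered at the household level.
model = smf.ols("haz ~ drought + C(wave) + C(child_id)", data=df)
result = model.fit(cov_type="cluster", cov_kwds={"groups": df["household_id"]})
print(result.params["drought"], result.pvalues["drought"])
```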

Keywords: extreme weather events, undernutrition, diarrhea, agricultural production, gridded weather data

Procedia PDF Downloads 103
149 Mathematics Bridging Theory and Applications for a Data-Driven World

Authors: Zahid Ullah, Atlas Khan

Abstract:

In today's data-driven world, the role of mathematics in bridging the gap between theory and applications is becoming increasingly vital. This abstract highlights the significance of mathematics as a powerful tool for analyzing, interpreting, and extracting meaningful insights from vast amounts of data. By integrating mathematical principles with real-world applications, researchers can unlock the full potential of data-driven decision-making processes. This abstract delves into the various ways mathematics acts as a bridge connecting theoretical frameworks to practical applications. It explores the utilization of mathematical models, algorithms, and statistical techniques to uncover hidden patterns, trends, and correlations within complex datasets. Furthermore, it investigates the role of mathematics in enhancing predictive modeling, optimization, and risk assessment methodologies for improved decision-making in diverse fields such as finance, healthcare, engineering, and social sciences. The abstract also emphasizes the need for interdisciplinary collaboration between mathematicians, statisticians, computer scientists, and domain experts to tackle the challenges posed by the data-driven landscape. By fostering synergies between these disciplines, novel approaches can be developed to address complex problems and make data-driven insights accessible and actionable. Moreover, this abstract underscores the importance of robust mathematical foundations for ensuring the reliability and validity of data analysis. Rigorous mathematical frameworks not only provide a solid basis for understanding and interpreting results but also contribute to the development of innovative methodologies and techniques. In summary, this abstract advocates for the pivotal role of mathematics in bridging theory and applications in a data-driven world. By harnessing mathematical principles, researchers can unlock the transformative potential of data analysis, paving the way for evidence-based decision-making, optimized processes, and innovative solutions to the challenges of our rapidly evolving society.

Keywords: mathematics, bridging theory and applications, data-driven world, mathematical models

Procedia PDF Downloads 77
148 Evaluation of Video Quality Metrics and Performance Comparison on Contents Taken from Most Commonly Used Devices

Authors: Pratik Dhabal Deo, Manoj P.

Abstract:

With the increasing number of social media users, the amount of available video content has also significantly increased. The number of smartphone users is currently at its peak, and many people use their smartphones as their main photography and recording devices. There have been many developments in the field of Video Quality Assessment (VQA), and metrics such as VMAF and SSIM are considered among the best performing, but these metrics are predominantly evaluated on professionally captured video content produced with professional tools and lighting. No study has specifically examined the performance of these metrics on content captured by ordinary users on commonly available devices. Existing datasets contain huge numbers of videos from different high-end devices, which makes it difficult to analyze metric performance on content from the most used devices, even when they include content taken in poor lighting conditions with lower-end devices; such devices suffer a wide range of distortions because the spectrum of content recorded on them is so broad. In this paper, we present an analysis of objective VQA metrics on content taken only from the most used devices, focusing on full-reference metrics. To carry out this research, we created a custom dataset containing a total of 90 videos taken from the three most commonly used devices: an Android smartphone, an iOS smartphone, and a DSLR. On the videos taken on each of these devices, the six most common types of distortion that users face were applied, in addition to the already existing H.264 compression, based on four reference videos; each of the six distortions has three levels of degradation. The five most popular VQA metrics were evaluated on this dataset, and the highest and lowest values of each metric on the distortions were recorded. We found that blur is the artifact on which most of the metrics did not perform well. To understand the results better, the amount of blur in the dataset was calculated, and an additional evaluation of the metrics was performed using the HEVC codec, the successor to H.264, on the camera that proved to be the sharpest among the devices. The results show that as resolution increases, the metrics tend to become more accurate. The best performing metric is VQM, with very few inconsistencies and inaccurate results when H.264 compression is applied; when HEVC is applied, SSIM and VMAF perform significantly better.
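
As a toy illustration of how a full-reference metric reacts to increasing blur, the artifact the study found most problematic, the sketch below computes SSIM and PSNR on a synthetic frame at three blur levels using scikit-image. A real evaluation would run metrics such as VMAF and VQM on full videos.

```python
# Toy full-reference quality check on a synthetic frame; the frame and
# blur levels are stand-ins, not the paper's dataset or distortions.
import numpy as np
from scipy.ndimage import gaussian_filter
from skimage.metrics import structural_similarity, peak_signal_noise_ratio

rng = np.random.default_rng(0)
reference = rng.random((480, 640)).astype(np.float64)  # stand-in frame

for sigma in (0.5, 1.5, 3.0):                # three degradation levels
    distorted = gaussian_filter(reference, sigma=sigma)
    ssim = structural_similarity(reference, distorted, data_range=1.0)
    psnr = peak_signal_noise_ratio(reference, distorted, data_range=1.0)
    print(f"sigma={sigma}: SSIM={ssim:.3f}, PSNR={psnr:.1f} dB")
```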

Keywords: distortion, metrics, performance, resolution, video quality assessment

Procedia PDF Downloads 204
147 Multi-Elemental Analysis Using Inductively Coupled Plasma Mass Spectrometry for the Geographical Origin Discrimination of Greek Giant Beans “Gigantes Elefantes”

Authors: Eleni C. Mazarakioti, Anastasios Zotos, Anna-Akrivi Thomatou, Efthimios Kokkotos, Achilleas Kontogeorgos, Athanasios Ladavos, Angelos Patakas

Abstract:

“Gigantes Elefantes” is a particularly dynamic crop of giant beans cultivated in western Macedonia (Greece). This variety of large beans, grown in this area and specifically in the regions of Prespes and Kastoria, is a protected designation of origin (PDO) product with high nutritional quality. Mislabeling of geographical origin and blending with unidentified samples are common fraudulent practices in the Greek food market, with financial and possible health consequences. In recent decades, multi-elemental composition analysis has been used to identify the geographical origin of foods and agricultural products. In an attempt to discriminate the authenticity of Greek beans, multi-elemental analysis (Ag, Al, As, B, Ba, Be, Ca, Cd, Co, Cr, Cs, Cu, Fe, Ga, Ge, K, Li, Mg, Mn, Mo, Na, Nb, Ni, P, Pb, Rb, Re, Se, Sr, Ta, Ti, Tl, U, V, W, Zn, Zr) was performed by inductively coupled plasma mass spectrometry (ICP-MS) on 320 samples of beans originating from Greece (Prespes and Kastoria), China, and Poland. All samples were collected during the autumn of 2021. The obtained data were analysed by principal component analysis (PCA), an unsupervised statistical method that reduces the dimensionality of large datasets. Statistical analysis revealed a clear separation between beans cultivated in Greece and those from China and Poland. An adequate discrimination of geographical origin between bean samples originating from the two Greek regions, Prespes and Kastoria, was also evident. Our results suggest that multi-elemental analysis combined with an appropriate multivariate statistical method could be a useful tool for the geographical authentication of beans. Acknowledgment: This research has been financed by the Public Investment Programme/General Secretariat for Research and Innovation, under the call “YPOERGO 3” (code 2018SE01300000), project title: ‘Elaboration and implementation of methodology for authenticity and geographical origin assessment of agricultural products’.
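
A minimal sketch of the PCA step is shown below, standardizing the element concentrations and projecting onto the first two principal components with scikit-learn. The CSV layout (one row per sample, one column per element, plus an 'origin' label) is an assumed format, not the authors' actual data file.

```python
# Minimal sketch of PCA on standardized multi-element data; the file
# name and column layout are assumptions for illustration.
import pandas as pd
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

df = pd.read_csv("bean_elements.csv")            # hypothetical dataset
elements = df.drop(columns=["origin"])           # Ag, Al, As, B, Ba, ...

# Standardize each element, then project onto the first two components.
scores = PCA(n_components=2).fit_transform(
    StandardScaler().fit_transform(elements))
for origin in df["origin"].unique():             # e.g. Prespes, Kastoria, ...
    mask = (df["origin"] == origin).to_numpy()
    print(origin, scores[mask].mean(axis=0))     # group centroids in PC space
```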

Keywords: geographical origin, authenticity, multi-elemental analysis, beans, ICP-MS, PCA

Procedia PDF Downloads 79
146 Distant Speech Recognition Using Laser Doppler Vibrometer

Authors: Yunbin Deng

Abstract:

Most existing applications of automatic speech recognition rely on cooperative subjects at a short distance from a microphone. Standoff speech recognition using microphone arrays can extend the subject-to-sensor distance somewhat, but it is still limited to only a few feet. As such, most deployed applications of standoff speech recognition are limited to indoor use at short range. Moreover, these applications require an air path between the subject and the sensor to achieve a reasonable signal-to-noise ratio. This study reports long-range (50 feet) automatic speech recognition experiments using a Laser Doppler Vibrometer (LDV) sensor. It shows that the LDV sensor modality can extend the speech acquisition standoff distance far beyond microphone arrays, to hundreds of feet. In addition, LDV enables 'listening' through windows for uncooperative subjects. This enables new capabilities in automatic audio and speech intelligence, surveillance, and reconnaissance (ISR) for law enforcement, homeland security, and counter-terrorism applications. The Polytec LDV model OFV-505 is used in this study. To investigate the impact of different vibrating materials, five parallel LDV speech corpora, each consisting of 630 speakers, were collected from the vibrations of a glass window, a metal plate, a plastic box, a wood slate, and a concrete wall; these are common materials the application could encounter in daily life. These data were compared with their microphone counterparts to show the impact of the various materials on the spectrum of the LDV speech signal. State-of-the-art deep neural network modeling approaches are used to conduct continuous speaker-independent speech recognition on these LDV speech datasets. Preliminary phoneme recognition results using time-delay neural networks, bidirectional long short-term memory, and model fusion show great promise for using LDV for long-range speech recognition. To the author's best knowledge, this is the first time an LDV has been reported for a long-distance speech recognition application.
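
In the spirit of the modeling approaches mentioned above, the sketch below outlines a bidirectional LSTM phoneme recognizer trained with CTC loss in PyTorch. The feature dimensions, layer sizes, and phoneme inventory are assumptions, not the study's configuration.

```python
# Skeleton of a BLSTM phoneme recognizer with CTC loss; all sizes are
# assumed values, not the study's actual model configuration.
import torch
import torch.nn as nn

class BLSTMPhonemeRecognizer(nn.Module):
    def __init__(self, n_features=40, n_hidden=256, n_phonemes=40):
        super().__init__()
        self.blstm = nn.LSTM(n_features, n_hidden, num_layers=3,
                             bidirectional=True, batch_first=True)
        self.proj = nn.Linear(2 * n_hidden, n_phonemes + 1)  # +1 = CTC blank

    def forward(self, x):                 # x: (batch, frames, features)
        out, _ = self.blstm(x)
        return self.proj(out).log_softmax(dim=-1)

model = BLSTMPhonemeRecognizer()
ctc = nn.CTCLoss(blank=0)
frames = torch.randn(2, 300, 40)          # stand-in for LDV filterbank features
logp = model(frames).transpose(0, 1)      # CTC expects (frames, batch, classes)
targets = torch.randint(1, 41, (2, 50))   # dummy phoneme label sequences
loss = ctc(logp, targets,
           input_lengths=torch.full((2,), 300),
           target_lengths=torch.full((2,), 50))
```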

Keywords: covert speech acquisition, distant speech recognition, DSR, laser Doppler vibrometer, LDV, speech intelligence surveillance and reconnaissance, ISR

Procedia PDF Downloads 180
145 Fly Ash Based Geopolymer Concrete as Curbs, Pavement Bricks, and Wall Bricks

Authors: Marthin Dody Josias Sumajouw, Bryan Wijaya, Servie O. Dapas, Ronny E. Pandaleke, Banu Handono, Fabian J. Manoppo

Abstract:

Ordinary Portland Cement (OPC) plays a major role as a concrete binder in infrastructure construction; however, its production generates abundant CO2 emissions. To reduce the CO2 emissions associated with OPC concrete, geopolymer materials have become one of the solutions, being binders made from waste containing pozzolanic material. In the concrete industry, geopolymer concrete has evolved as a more environmentally friendly material than OPC concrete. Geopolymer concrete is produced without OPC and is therefore known as a cementless concrete material. It obtains silicon and aluminum from industrial by-products such as fly ash, ground granulated blast furnace slag, and kaolinite. A highly alkaline solution chemically activates the Si and Al, forming a matrix that holds together the loose aggregates as well as any unreacted components in the mixture; they dissolve in the alkaline activating solutions and polymerize into molecular chains, resulting in rigid binders. This research aims to obtain an eco-friendly material that can reduce the use of OPC as a binder and be used for infrastructure end products such as curbs, pavement bricks, and wall bricks. It was conducted as applied research to develop new, environmentally friendly products by utilizing fly ash. Three types of end products with various dimensions and mix designs were made and tested in the laboratory, yielding quantitative datasets used to identify patterns and relationships among density, compressive strength, flexural strength, and water absorption. The results show that geopolymer binders can be used for the production of curbs, pavement bricks, and wall bricks. Geopolymer curbs have an average compressive strength of 19.36 MPa, which can be classified as K-233 concrete. Geopolymer pavement bricks have an average compressive strength of 20.79 MPa; they can be used in parking areas and are classified as grade B pavement bricks according to SNI 03-0691-1996. Geopolymer wall bricks have an average compressive strength of 11.24 MPa, which can be classified as grade I wall bricks according to SNI 03-0349-1989.

Keywords: absorption, compressive strength, curbs, end products, geopolymer, pavement bricks, wall bricks

Procedia PDF Downloads 33
144 Leveraging Natural Language Processing for Legal Artificial Intelligence: A Longformer Approach for Taiwanese Legal Cases

Authors: Hsin Lee, Hsuan Lee

Abstract:

Legal artificial intelligence (LegalAI) has seen increasing application within legal systems, propelled by advancements in natural language processing (NLP). Compared with general documents, legal case documents are typically long text sequences with intrinsic logical structures, and most existing language models have difficulty capturing the long-distance dependencies between these structures. Another challenge is that, while the Judiciary of Taiwan has released legal judgments from various levels of courts over the years, there remains a significant obstacle: the lack of labeled datasets. This deficiency makes it difficult to train models with strong generalization capabilities and to accurately evaluate model performance. To date, models in Taiwan have yet to be specifically trained on judgment data. Given these challenges, this research proposes a Longformer-based pre-trained language model explicitly devised for retrieving similar judgments among Taiwanese legal documents. The model is trained on a self-constructed dataset that this research independently labeled for judgment similarity, thereby filling the void left by the absence of an existing labeled dataset for Taiwanese judgments. Strategies such as early stopping and gradient clipping are adopted to prevent overfitting and manage gradient explosion, respectively, thereby enhancing the model's performance. The model is evaluated using both this dataset and the Average Entropy of Offense-charged Clustering (AEOC) metric, which draws on the notion of similar case scenarios within the same type of legal case. Our experimental results illustrate the model's significant advances in handling similarity comparisons across extensive legal judgments. By enabling more efficient retrieval and analysis of legal case documents, our model holds the potential to facilitate legal research, aid legal decision-making, and contribute to the further development of LegalAI in Taiwan.
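
As a hedged sketch of judgment-similarity retrieval with a Longformer encoder, the snippet below mean-pools token embeddings and compares two documents by cosine similarity, using the public allenai/longformer-base-4096 checkpoint from Hugging Face. The study's Taiwanese legal model is not assumed to be publicly available, so this only illustrates the retrieval mechanics.

```python
# Judgment-similarity sketch; the public English checkpoint stands in
# for the study's Taiwanese-legal Longformer, which is an assumption.
import torch
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("allenai/longformer-base-4096")
model = AutoModel.from_pretrained("allenai/longformer-base-4096").eval()

def embed(text):
    """Mean-pool token embeddings of a (long) judgment into one vector."""
    enc = tokenizer(text, truncation=True, max_length=4096,
                    return_tensors="pt")
    with torch.no_grad():
        hidden = model(**enc).last_hidden_state      # (1, tokens, 768)
    mask = enc["attention_mask"].unsqueeze(-1)
    return (hidden * mask).sum(1) / mask.sum(1)      # masked mean pooling

judgment_a = "Full text of the query judgment ..."      # placeholder text
judgment_b = "Full text of a candidate judgment ..."    # placeholder text
similarity = torch.cosine_similarity(embed(judgment_a),
                                     embed(judgment_b)).item()
print(similarity)
```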

Keywords: legal artificial intelligence, computation and language, language model, Taiwanese legal cases

Procedia PDF Downloads 73