Search results for: imbalanced datasets
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 751

Search results for: imbalanced datasets

151 Identification of Blood Biomarkers Unveiling Early Alzheimer's Disease Diagnosis Through Single-Cell RNA Sequencing Data and Autoencoders

Authors: Hediyeh Talebi, Shokoofeh Ghiam, Changiz Eslahchi

Abstract:

Traditionally, Alzheimer’s disease research has focused on genes with significant fold changes, potentially neglecting subtle but biologically important alterations. Our study introduces an integrative approach that highlights genes crucial to underlying biological processes, regardless of their fold change magnitude. Alzheimer's Single-cell RNA-seq data related to the peripheral blood mononuclear cells (PBMC) was extracted from the Gene Expression Omnibus (GEO). After quality control, normalization, scaling, batch effect correction, and clustering, differentially expressed genes (DEGs) were identified with adjusted p-values less than 0.05. These DEGs were categorized based on cell-type, resulting in four datasets, each corresponding to a distinct cell type. To distinguish between cells from healthy individuals and those with Alzheimer's, an adversarial autoencoder with a classifier was employed. This allowed for the separation of healthy and diseased samples. To identify the most influential genes in this classification, the weight matrices in the network, which includes the encoder and classifier components, were multiplied, and focused on the top 20 genes. The analysis revealed that while some of these genes exhibit a high fold change, others do not. These genes, which may be overlooked by previous methods due to their low fold change, were shown to be significant in our study. The findings highlight the critical role of genes with subtle alterations in diagnosing Alzheimer's disease, a facet frequently overlooked by conventional methods. These genes demonstrate remarkable discriminatory power, underscoring the need to integrate biological relevance with statistical measures in gene prioritization. This integrative approach enhances our understanding of the molecular mechanisms in Alzheimer’s disease and provides a promising direction for identifying potential therapeutic targets.

Keywords: alzheimer's disease, single-cell RNA-seq, neural networks, blood biomarkers

Procedia PDF Downloads 36
150 Predicting Radioactive Waste Glass Viscosity, Density and Dissolution with Machine Learning

Authors: Joseph Lillington, Tom Gout, Mike Harrison, Ian Farnan

Abstract:

The vitrification of high-level nuclear waste within borosilicate glass and its incorporation within a multi-barrier repository deep underground is widely accepted as the preferred disposal method. However, for this to happen, any safety case will require validation that the initially localized radionuclides will not be considerably released into the near/far-field. Therefore, accurate mechanistic models are necessary to predict glass dissolution, and these should be robust to a variety of incorporated waste species and leaching test conditions, particularly given substantial variations across international waste-streams. Here, machine learning is used to predict glass material properties (viscosity, density) and glass leaching model parameters from large-scale industrial data. A variety of different machine learning algorithms have been compared to assess performance. Density was predicted solely from composition, whereas viscosity additionally considered temperature. To predict suitable glass leaching model parameters, a large simulated dataset was created by coupling MATLAB and the chemical reactive-transport code HYTEC, considering the state-of-the-art GRAAL model (glass reactivity in allowance of the alteration layer). The trained models were then subsequently applied to the large-scale industrial, experimental data to identify potentially appropriate model parameters. Results indicate that ensemble methods can accurately predict viscosity as a function of temperature and composition across all three industrial datasets. Glass density prediction shows reliable learning performance with predictions primarily being within the experimental uncertainty of the test data. Furthermore, machine learning can predict glass dissolution model parameters behavior, demonstrating potential value in GRAAL model development and in assessing suitable model parameters for large-scale industrial glass dissolution data.

Keywords: machine learning, predictive modelling, pattern recognition, radioactive waste glass

Procedia PDF Downloads 93
149 The Systems Biology Verification Endeavor: Harness the Power of the Crowd to Address Computational and Biological Challenges

Authors: Stephanie Boue, Nicolas Sierro, Julia Hoeng, Manuel C. Peitsch

Abstract:

Systems biology relies on large numbers of data points and sophisticated methods to extract biologically meaningful signal and mechanistic understanding. For example, analyses of transcriptomics and proteomics data enable to gain insights into the molecular differences in tissues exposed to diverse stimuli or test items. Whereas the interpretation of endpoints specifically measuring a mechanism is relatively straightforward, the interpretation of big data is more complex and would benefit from comparing results obtained with diverse analysis methods. The sbv IMPROVER project was created to implement solutions to verify systems biology data, methods, and conclusions. Computational challenges leveraging the wisdom of the crowd allow benchmarking methods for specific tasks, such as signature extraction and/or samples classification. Four challenges have already been successfully conducted and confirmed that the aggregation of predictions often leads to better results than individual predictions and that methods perform best in specific contexts. Whenever the scientific question of interest does not have a gold standard, but may greatly benefit from the scientific community to come together and discuss their approaches and results, datathons are set up. The inaugural sbv IMPROVER datathon was held in Singapore on 23-24 September 2016. It allowed bioinformaticians and data scientists to consolidate their ideas and work on the most promising methods as teams, after having initially reflected on the problem on their own. The outcome is a set of visualization and analysis methods that will be shared with the scientific community via the Garuda platform, an open connectivity platform that provides a framework to navigate through different applications, databases and services in biology and medicine. We will present the results we obtained when analyzing data with our network-based method, and introduce a datathon that will take place in Japan to encourage the analysis of the same datasets with other methods to allow for the consolidation of conclusions.

Keywords: big data interpretation, datathon, systems toxicology, verification

Procedia PDF Downloads 259
148 An Artificial Intelligence Framework to Forecast Air Quality

Authors: Richard Ren

Abstract:

Air pollution is a serious danger to international well-being and economies - it will kill an estimated 7 million people every year, costing world economies $2.6 trillion by 2060 due to sick days, healthcare costs, and reduced productivity. In the United States alone, 60,000 premature deaths are caused by poor air quality. For this reason, there is a crucial need to develop effective methods to forecast air quality, which can mitigate air pollution’s detrimental public health effects and associated costs by helping people plan ahead and avoid exposure. The goal of this study is to propose an artificial intelligence framework for predicting future air quality based on timing variables (i.e. season, weekday/weekend), future weather forecasts, as well as past pollutant and air quality measurements. The proposed framework utilizes multiple machine learning algorithms (logistic regression, random forest, neural network) with different specifications and averages the results of the three top-performing models to eliminate inaccuracies, weaknesses, and biases from any one individual model. Over time, the proposed framework uses new data to self-adjust model parameters and increase prediction accuracy. To demonstrate its applicability, a prototype of this framework was created to forecast air quality in Los Angeles, California using datasets from the RP4 weather data repository and EPA pollutant measurement data. The results showed good agreement between the framework’s predictions and real-life observations, with an overall 92% model accuracy. The combined model is able to predict more accurately than any of the individual models, and it is able to reliably forecast season-based variations in air quality levels. Top air quality predictor variables were identified through the measurement of mean decrease in accuracy. This study proposed and demonstrated the efficacy of a comprehensive air quality prediction framework leveraging multiple machine learning algorithms to overcome individual algorithm shortcomings. Future enhancements should focus on expanding and testing a greater variety of modeling techniques within the proposed framework, testing the framework in different locations, and developing a platform to automatically publish future predictions in the form of a web or mobile application. Accurate predictions from this artificial intelligence framework can in turn be used to save and improve lives by allowing individuals to protect their health and allowing governments to implement effective pollution control measures.Air pollution is a serious danger to international wellbeing and economies - it will kill an estimated 7 million people every year, costing world economies $2.6 trillion by 2060 due to sick days, healthcare costs, and reduced productivity. In the United States alone, 60,000 premature deaths are caused by poor air quality. For this reason, there is a crucial need to develop effective methods to forecast air quality, which can mitigate air pollution’s detrimental public health effects and associated costs by helping people plan ahead and avoid exposure. The goal of this study is to propose an artificial intelligence framework for predicting future air quality based on timing variables (i.e. season, weekday/weekend), future weather forecasts, as well as past pollutant and air quality measurements. The proposed framework utilizes multiple machine learning algorithms (logistic regression, random forest, neural network) with different specifications and averages the results of the three top-performing models to eliminate inaccuracies, weaknesses, and biases from any one individual model. Over time, the proposed framework uses new data to self-adjust model parameters and increase prediction accuracy. To demonstrate its applicability, a prototype of this framework was created to forecast air quality in Los Angeles, California using datasets from the RP4 weather data repository and EPA pollutant measurement data. The results showed good agreement between the framework’s predictions and real-life observations, with an overall 92% model accuracy. The combined model is able to predict more accurately than any of the individual models, and it is able to reliably forecast season-based variations in air quality levels. Top air quality predictor variables were identified through the measurement of mean decrease in accuracy. This study proposed and demonstrated the efficacy of a comprehensive air quality prediction framework leveraging multiple machine learning algorithms to overcome individual algorithm shortcomings. Future enhancements should focus on expanding and testing a greater variety of modeling techniques within the proposed framework, testing the framework in different locations, and developing a platform to automatically publish future predictions in the form of a web or mobile application. Accurate predictions from this artificial intelligence framework can in turn be used to save and improve lives by allowing individuals to protect their health and allowing governments to implement effective pollution control measures.Air pollution is a serious danger to international wellbeing and economies - it will kill an estimated 7 million people every year, costing world economies $2.6 trillion by 2060 due to sick days, healthcare costs, and reduced productivity. In the United States alone, 60,000 premature deaths are caused by poor air quality. For this reason, there is a crucial need to develop effective methods to forecast air quality, which can mitigate air pollution’s detrimental public health effects and associated costs by helping people plan ahead and avoid exposure. The goal of this study is to propose an artificial intelligence framework for predicting future air quality based on timing variables (i.e. season, weekday/weekend), future weather forecasts, as well as past pollutant and air quality measurements. The proposed framework utilizes multiple machine learning algorithms (logistic regression, random forest, neural network) with different specifications and averages the results of the three top-performing models to eliminate inaccuracies, weaknesses, and biases from any one individual model. Over time, the proposed framework uses new data to self-adjust model parameters and increase prediction accuracy. To demonstrate its applicability, a prototype of this framework was created to forecast air quality in Los Angeles, California using datasets from the RP4 weather data repository and EPA pollutant measurement data. The results showed good agreement between the framework’s predictions and real-life observations, with an overall 92% model accuracy. The combined model is able to predict more accurately than any of the individual models, and it is able to reliably forecast season-based variations in air quality levels. Top air quality predictor variables were identified through the measurement of mean decrease in accuracy. This study proposed and demonstrated the efficacy of a comprehensive air quality prediction framework leveraging multiple machine learning algorithms to overcome individual algorithm shortcomings. Future enhancements should focus on expanding and testing a greater variety of modeling techniques within the proposed framework, testing the framework in different locations, and developing a platform to automatically publish future predictions in the form of a web or mobile application. Accurate predictions from this artificial intelligence framework can in turn be used to save and improve lives by allowing individuals to protect their health and allowing governments to implement effective pollution control measures.

Keywords: air quality prediction, air pollution, artificial intelligence, machine learning algorithms

Procedia PDF Downloads 95
147 Resisting Adversarial Assaults: A Model-Agnostic Autoencoder Solution

Authors: Massimo Miccoli, Luca Marangoni, Alberto Aniello Scaringi, Alessandro Marceddu, Alessandro Amicone

Abstract:

The susceptibility of deep neural networks (DNNs) to adversarial manipulations is a recognized challenge within the computer vision domain. Adversarial examples, crafted by adding subtle yet malicious alterations to benign images, exploit this vulnerability. Various defense strategies have been proposed to safeguard DNNs against such attacks, stemming from diverse research hypotheses. Building upon prior work, our approach involves the utilization of autoencoder models. Autoencoders, a type of neural network, are trained to learn representations of training data and reconstruct inputs from these representations, typically minimizing reconstruction errors like mean squared error (MSE). Our autoencoder was trained on a dataset of benign examples; learning features specific to them. Consequently, when presented with significantly perturbed adversarial examples, the autoencoder exhibited high reconstruction errors. The architecture of the autoencoder was tailored to the dimensions of the images under evaluation. We considered various image sizes, constructing models differently for 256x256 and 512x512 images. Moreover, the choice of the computer vision model is crucial, as most adversarial attacks are designed with specific AI structures in mind. To mitigate this, we proposed a method to replace image-specific dimensions with a structure independent of both dimensions and neural network models, thereby enhancing robustness. Our multi-modal autoencoder reconstructs the spectral representation of images across the red-green-blue (RGB) color channels. To validate our approach, we conducted experiments using diverse datasets and subjected them to adversarial attacks using models such as ResNet50 and ViT_L_16 from the torch vision library. The autoencoder extracted features used in a classification model, resulting in an MSE (RGB) of 0.014, a classification accuracy of 97.33%, and a precision of 99%.

Keywords: adversarial attacks, malicious images detector, binary classifier, multimodal transformer autoencoder

Procedia PDF Downloads 46
146 Global Solar Irradiance: Data Imputation to Analyze Complementarity Studies of Energy in Colombia

Authors: Jeisson A. Estrella, Laura C. Herrera, Cristian A. Arenas

Abstract:

The Colombian electricity sector has been transforming through the insertion of new energy sources to generate electricity, one of them being solar energy, which is being promoted by companies interested in photovoltaic technology. The study of this technology is important for electricity generation in general and for the planning of the sector from the perspective of energy complementarity. Precisely in this last approach is where the project is located; we are interested in answering the concerns about the reliability of the electrical system when climatic phenomena such as El Niño occur or in defining whether it is viable to replace or expand thermoelectric plants. Reliability of the electrical system when climatic phenomena such as El Niño occur, or to define whether it is viable to replace or expand thermoelectric plants with renewable electricity generation systems. In this regard, some difficulties related to the basic information on renewable energy sources from measured data must first be solved, as these come from automatic weather stations. Basic information on renewable energy sources from measured data, since these come from automatic weather stations administered by the Institute of Hydrology, Meteorology and Environmental Studies (IDEAM) and, in the range of study (2005-2019), have significant amounts of missing data. For this reason, the overall objective of the project is to complete the global solar irradiance datasets to obtain time series to develop energy complementarity analyses in a subsequent project. Global solar irradiance data sets to obtain time series that will allow the elaboration of energy complementarity analyses in the following project. The filling of the databases will be done through numerical and statistical methods, which are basic techniques for undergraduate students in technical areas who are starting out as researchers technical areas who are starting out as researchers.

Keywords: time series, global solar irradiance, imputed data, energy complementarity

Procedia PDF Downloads 46
145 Automatic Generation of Census Enumeration Area and National Sampling Frame to Achieve Sustainable Development Goals

Authors: Sarchil H. Qader, Andrew Harfoot, Mathias Kuepie, Sabrina Juran, Attila Lazar, Andrew J. Tatem

Abstract:

The need for high-quality, reliable, and timely population data, including demographic information, to support the achievement of the sustainable development goals (SDGs) in all countries was recognized by the United Nations' 2030 Agenda for sustainable development. However, many low and middle-income countries lack reliable and recent census data. To achieve reliable and accurate census and survey outputs, up-to-date census enumeration areas and digital national sampling frames are critical. Census enumeration areas (EAs) are the smallest geographic units for collection, disseminating, and analyzing census data and are often used as a national sampling frame to serve various socio-economic surveys. Even for countries that are wealthy and stable, creating and updating EAs is a difficult yet crucial step in preparing for a national census. Such a process is commonly done manually, either by digitizing small geographic units on high-resolution satellite imagery or walking the boundaries of units, both of which are extremely expensive. We have developed a user-friendly tool that could be employed to generate draft EA boundaries automatically. The tool is based on high-resolution gridded population and settlement datasets, GPS household locations, building footprints and uses publicly available natural, man-made and administrative boundaries. Initial outputs were produced in Burkina Faso, Paraguay, Somalia, Togo, Niger, Guinea, and Zimbabwe. The results indicate that the EAs are in line with international standards, including boundaries that are easily identifiable and follow ground features, have no overlaps, are compact and free of pockets and disjoints, and the boundaries are nested within administrative boundaries.

Keywords: enumeration areas, national sampling frame, gridded population data, preEA tool

Procedia PDF Downloads 114
144 Re-Stating the Origin of Tetrapod Using Measures of Phylogenetic Support for Phylogenomic Data

Authors: Yunfeng Shan, Xiaoliang Wang, Youjun Zhou

Abstract:

Whole-genome data from two lungfish species, along with other species, present a valuable opportunity to re-investigate the longstanding debate regarding the evolutionary relationships among tetrapods, lungfishes, and coelacanths. However, the use of bootstrap support has become outdated for large-scale phylogenomic data. Without robust phylogenetic support, the phylogenetic trees become meaningless. Therefore, it is necessary to re-evaluate the phylogenies of tetrapods, lungfishes, and coelacanths using novel measures of phylogenetic support specifically designed for phylogenomic data, as the previous phylogenies were based on 100% bootstrap support. Our findings consistently provide strong evidence favoring lungfish as the closest living relative of tetrapods. This conclusion is based on high internode certainty, relative gene support, and high gene concordance factor. The evidence stems from five previous datasets derived from lungfish transcriptomes. These results yield fresh insights into the three hypotheses regarding the phylogenies of tetrapods, lungfishes, and coelacanths. Importantly, these hypotheses are not mere conjectures but are substantiated by a significant number of genes. Analyzing real biological data further demonstrates that the inclusion of additional taxa leads to more diverse tree topologies. Consequently, gene trees and species trees may not be identical even when whole-genome sequencing data is utilized. However, it is worth noting that many gene trees can accurately reflect the species tree if an appropriate number of taxa, typically ranging from six to ten, are sampled. Therefore, it is crucial to carefully select the number of taxa and an appropriate outgroup, such as slow-evolving species, while excluding fast-evolving taxa as outgroups to mitigate the adverse effects of long-branch attraction and achieve an accurate reconstruction of the species tree. This is particularly important as more whole-genome sequencing data becomes available.

Keywords: novel measures of phylogenetic support for phylogenomic data, gene concordance factor confidence, relative gene support, internode certainty, origin of tetrapods

Procedia PDF Downloads 34
143 Estimating Precipitable Water Vapour Using the Global Positioning System and Radio Occultation over Ethiopian Regions

Authors: Asmamaw Yehun, Tsegaye Gogie, Martin Vermeer, Addisu Hunegnaw

Abstract:

The Global Positioning System (GPS) is a space-based radio positioning system, which is capable of providing continuous position, velocity, and time information to users anywhere on or near the surface of the Earth. The main objective of this work was to estimate the integrated precipitable water vapour (IPWV) using ground GPS and Low Earth Orbit (LEO) Radio Occultation (RO) to study spatial-temporal variability. For LEO-GPS RO, we used Constellation Observing System for Meteorology, Ionosphere, and Climate (COSMIC) datasets. We estimated the daily and monthly mean of IPWV using six selected ground-based GPS stations over a period of range from 2012 to 2016 (i.e. five-years period). The main perspective for selecting the range period from 2012 to 2016 is that, continuous data were available during these periods at all Ethiopian GPS stations. We studied temporal, seasonal, diurnal, and vertical variations of precipitable water vapour using GPS observables extracted from the precise geodetic GAMIT-GLOBK software package. Finally, we determined the cross-correlation of our GPS-derived IPWV values with those of the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA-40 Interim reanalysis and of the second generation National Oceanic and Atmospheric Administration (NOAA) model ensemble Forecast System Reforecast (GEFS/R) for validation and static comparison. There are higher values of the IPWV range from 30 to 37.5 millimetres (mm) in Gambela and Southern Regions of Ethiopia. Some parts of Tigray, Amhara, and Oromia regions had low IPWV ranges from 8.62 to 15.27 mm. The correlation coefficient between GPS-derived IPWV with ECMWF and GEFS/R exceeds 90%. We conclude that there are highly temporal, seasonal, diurnal, and vertical variations of precipitable water vapour in the study area.

Keywords: GNSS, radio occultation, atmosphere, precipitable water vapour

Procedia PDF Downloads 59
142 AI for Efficient Geothermal Exploration and Utilization

Authors: Velimir "monty" Vesselinov, Trais Kliplhuis, Hope Jasperson

Abstract:

Artificial intelligence (AI) is a powerful tool in the geothermal energy sector, aiding in both exploration and utilization. Identifying promising geothermal sites can be challenging due to limited surface indicators and the need for expensive drilling to confirm subsurface resources. Geothermal reservoirs can be located deep underground and exhibit complex geological structures, making traditional exploration methods time-consuming and imprecise. AI algorithms can analyze vast datasets of geological, geophysical, and remote sensing data, including satellite imagery, seismic surveys, geochemistry, geology, etc. Machine learning algorithms can identify subtle patterns and relationships within this data, potentially revealing hidden geothermal potential in areas previously overlooked. To address these challenges, a SIML (Science-Informed Machine Learning) technology has been developed. SIML methods are different from traditional ML techniques. In both cases, the ML models are trained to predict the spatial distribution of an output (e.g., pressure, temperature, heat flux) based on a series of inputs (e.g., permeability, porosity, etc.). The traditional ML (a) relies on deep and wide neural networks (NNs) based on simple algebraic mappings to represent complex processes. In contrast, the SIML neurons incorporate complex mappings (including constitutive relationships and physics/chemistry models). This results in ML models that have a physical meaning and satisfy physics laws and constraints. The prototype of the developed software, called GeoTGO, is accessible through the cloud. Our software prototype demonstrates how different data sources can be made available for processing, executed demonstrative SIML analyses, and presents the results in a table and graphic form.

Keywords: science-informed machine learning, artificial inteligence, exploration, utilization, hidden geothermal

Procedia PDF Downloads 8
141 Comparison of Data Reduction Algorithms for Image-Based Point Cloud Derived Digital Terrain Models

Authors: M. Uysal, M. Yilmaz, I. Tiryakioğlu

Abstract:

Digital Terrain Model (DTM) is a digital numerical representation of the Earth's surface. DTMs have been applied to a diverse field of tasks, such as urban planning, military, glacier mapping, disaster management. In the expression of the Earth' surface as a mathematical model, an infinite number of point measurements are needed. Because of the impossibility of this case, the points at regular intervals are measured to characterize the Earth's surface and DTM of the Earth is generated. Hitherto, the classical measurement techniques and photogrammetry method have widespread use in the construction of DTM. At present, RADAR, LiDAR, and stereo satellite images are also used for the construction of DTM. In recent years, especially because of its superiorities, Airborne Light Detection and Ranging (LiDAR) has an increased use in DTM applications. A 3D point cloud is created with LiDAR technology by obtaining numerous point data. However recently, by the development in image mapping methods, the use of unmanned aerial vehicles (UAV) for photogrammetric data acquisition has increased DTM generation from image-based point cloud. The accuracy of the DTM depends on various factors such as data collection method, the distribution of elevation points, the point density, properties of the surface and interpolation methods. In this study, the random data reduction method is compared for DTMs generated from image based point cloud data. The original image based point cloud data set (100%) is reduced to a series of subsets by using random algorithm, representing the 75, 50, 25 and 5% of the original image based point cloud data set. Over the ANS campus of Afyon Kocatepe University as the test area, DTM constructed from the original image based point cloud data set is compared with DTMs interpolated from reduced data sets by Kriging interpolation method. The results show that the random data reduction method can be used to reduce the image based point cloud datasets to 50% density level while still maintaining the quality of DTM.

Keywords: DTM, Unmanned Aerial Vehicle (UAV), uniform, random, kriging

Procedia PDF Downloads 126
140 Advances in Mathematical Sciences: Unveiling the Power of Data Analytics

Authors: Zahid Ullah, Atlas Khan

Abstract:

The rapid advancements in data collection, storage, and processing capabilities have led to an explosion of data in various domains. In this era of big data, mathematical sciences play a crucial role in uncovering valuable insights and driving informed decision-making through data analytics. The purpose of this abstract is to present the latest advances in mathematical sciences and their application in harnessing the power of data analytics. This abstract highlights the interdisciplinary nature of data analytics, showcasing how mathematics intersects with statistics, computer science, and other related fields to develop cutting-edge methodologies. It explores key mathematical techniques such as optimization, mathematical modeling, network analysis, and computational algorithms that underpin effective data analysis and interpretation. The abstract emphasizes the role of mathematical sciences in addressing real-world challenges across different sectors, including finance, healthcare, engineering, social sciences, and beyond. It showcases how mathematical models and statistical methods extract meaningful insights from complex datasets, facilitating evidence-based decision-making and driving innovation. Furthermore, the abstract emphasizes the importance of collaboration and knowledge exchange among researchers, practitioners, and industry professionals. It recognizes the value of interdisciplinary collaborations and the need to bridge the gap between academia and industry to ensure the practical application of mathematical advancements in data analytics. The abstract highlights the significance of ongoing research in mathematical sciences and its impact on data analytics. It emphasizes the need for continued exploration and innovation in mathematical methodologies to tackle emerging challenges in the era of big data and digital transformation. In summary, this abstract sheds light on the advances in mathematical sciences and their pivotal role in unveiling the power of data analytics. It calls for interdisciplinary collaboration, knowledge exchange, and ongoing research to further unlock the potential of mathematical methodologies in addressing complex problems and driving data-driven decision-making in various domains.

Keywords: mathematical sciences, data analytics, advances, unveiling

Procedia PDF Downloads 62
139 Analyzing the Street Pattern Characteristics on Young People’s Choice to Walk or Not: A Study Based on Accelerometer and Global Positioning Systems Data

Authors: Ebru Cubukcu, Gozde Eksioglu Cetintahra, Burcin Hepguzel Hatip, Mert Cubukcu

Abstract:

Obesity and overweight cause serious health problems. Public and private organizations aim to encourage walking in various ways in order to cope with the problem of obesity and overweight. This study aims to understand how the spatial characteristics of urban street pattern, connectivity and complexity influence young people’s choice to walk or not. 185 public university students in Izmir, the third largest city in Turkey, participated in the study. Each participant had worn an accelerometer and a global positioning (GPS) device for a week. The accelerometer device records data on the intensity of the participant’s activity at a specified time interval, and the GPS device on the activities’ locations. Combining the two datasets, activity maps are derived. These maps are then used to differentiate the participants’ walk trips and motor vehicle trips. Given that, the frequency of walk and motor vehicle trips are calculated at the street segment level, and the street segments are then categorized into two as ‘preferred by pedestrians’ and ‘preferred by motor vehicles’. Graph Theory-based accessibility indices are calculated to quantify the spatial characteristics of the streets in the sample. Six different indices are used: (I) edge density, (II) edge sinuosity, (III) eta index, (IV) node density, (V) order of a node, and (VI) beta index. T-tests show that the index values for the ‘preferred by pedestrians’ and ‘preferred by motor vehicles’ are significantly different. The findings indicate that the spatial characteristics of the street network have a measurable effect on young people’s choice to walk or not. Policy implications are discussed. This study is funded by the Scientific and Technological Research Council of Turkey, Project No: 116K358.

Keywords: graph theory, walkability, accessibility, street network

Procedia PDF Downloads 189
138 Exploring the Applications of Neural Networks in the Adaptive Learning Environment

Authors: Baladitya Swaika, Rahul Khatry

Abstract:

Computer Adaptive Tests (CATs) is one of the most efficient ways for testing the cognitive abilities of students. CATs are based on Item Response Theory (IRT) which is based on item selection and ability estimation using statistical methods of maximum information selection/selection from posterior and maximum-likelihood (ML)/maximum a posteriori (MAP) estimators respectively. This study aims at combining both classical and Bayesian approaches to IRT to create a dataset which is then fed to a neural network which automates the process of ability estimation and then comparing it to traditional CAT models designed using IRT. This study uses python as the base coding language, pymc for statistical modelling of the IRT and scikit-learn for neural network implementations. On creation of the model and on comparison, it is found that the Neural Network based model performs 7-10% worse than the IRT model for score estimations. Although performing poorly, compared to the IRT model, the neural network model can be beneficially used in back-ends for reducing time complexity as the IRT model would have to re-calculate the ability every-time it gets a request whereas the prediction from a neural network could be done in a single step for an existing trained Regressor. This study also proposes a new kind of framework whereby the neural network model could be used to incorporate feature sets, other than the normal IRT feature set and use a neural network’s capacity of learning unknown functions to give rise to better CAT models. Categorical features like test type, etc. could be learnt and incorporated in IRT functions with the help of techniques like logistic regression and can be used to learn functions and expressed as models which may not be trivial to be expressed via equations. This kind of a framework, when implemented would be highly advantageous in psychometrics and cognitive assessments. This study gives a brief overview as to how neural networks can be used in adaptive testing, not only by reducing time-complexity but also by being able to incorporate newer and better datasets which would eventually lead to higher quality testing.

Keywords: computer adaptive tests, item response theory, machine learning, neural networks

Procedia PDF Downloads 157
137 Improving Cell Type Identification of Single Cell Data by Iterative Graph-Based Noise Filtering

Authors: Annika Stechemesser, Rachel Pounds, Emma Lucas, Chris Dawson, Julia Lipecki, Pavle Vrljicak, Jan Brosens, Sean Kehoe, Jason Yap, Lawrence Young, Sascha Ott

Abstract:

Advances in technology make it now possible to retrieve the genetic information of thousands of single cancerous cells. One of the key challenges in single cell analysis of cancerous tissue is to determine the number of different cell types and their characteristic genes within the sample to better understand the tumors and their reaction to different treatments. For this analysis to be possible, it is crucial to filter out background noise as it can severely blur the downstream analysis and give misleading results. In-depth analysis of the state-of-the-art filtering methods for single cell data showed that they do, in some cases, not separate noisy and normal cells sufficiently. We introduced an algorithm that filters and clusters single cell data simultaneously without relying on certain genes or thresholds chosen by eye. It detects communities in a Shared Nearest Neighbor similarity network, which captures the similarities and dissimilarities of the cells by optimizing the modularity and then identifies and removes vertices with a weak clustering belonging. This strategy is based on the fact that noisy data instances are very likely to be similar to true cell types but do not match any of these wells. Once the clustering is complete, we apply a set of evaluation metrics on the cluster level and accept or reject clusters based on the outcome. The performance of our algorithm was tested on three datasets and led to convincing results. We were able to replicate the results on a Peripheral Blood Mononuclear Cells dataset. Furthermore, we applied the algorithm to two samples of ovarian cancer from the same patient before and after chemotherapy. Comparing the standard approach to our algorithm, we found a hidden cell type in the ovarian postchemotherapy data with interesting marker genes that are potentially relevant for medical research.

Keywords: cancer research, graph theory, machine learning, single cell analysis

Procedia PDF Downloads 83
136 Two-Level Graph Causality to Detect and Predict Random Cyber-Attacks

Authors: Van Trieu, Shouhuai Xu, Yusheng Feng

Abstract:

Tracking attack trajectories can be difficult, with limited information about the nature of the attack. Even more difficult as attack information is collected by Intrusion Detection Systems (IDSs) due to the current IDSs having some limitations in identifying malicious and anomalous traffic. Moreover, IDSs only point out the suspicious events but do not show how the events relate to each other or which event possibly cause the other event to happen. Because of this, it is important to investigate new methods capable of performing the tracking of attack trajectories task quickly with less attack information and dependency on IDSs, in order to prioritize actions during incident responses. This paper proposes a two-level graph causality framework for tracking attack trajectories in internet networks by leveraging observable malicious behaviors to detect what is the most probable attack events that can cause another event to occur in the system. Technically, given the time series of malicious events, the framework extracts events with useful features, such as attack time and port number, to apply to the conditional independent tests to detect the relationship between attack events. Using the academic datasets collected by IDSs, experimental results show that the framework can quickly detect the causal pairs that offer meaningful insights into the nature of the internet network, given only reasonable restrictions on network size and structure. Without the framework’s guidance, these insights would not be able to discover by the existing tools, such as IDSs. It would cost expert human analysts a significant time if possible. The computational results from the proposed two-level graph network model reveal the obvious pattern and trends. In fact, more than 85% of causal pairs have the average time difference between the causal and effect events in both computed and observed data within 5 minutes. This result can be used as a preventive measure against future attacks. Although the forecast may be short, from 0.24 seconds to 5 minutes, it is long enough to be used to design a prevention protocol to block those attacks.

Keywords: causality, multilevel graph, cyber-attacks, prediction

Procedia PDF Downloads 137
135 Effect of Climate Variability on Children Health Outcomes in Rural Uganda

Authors: Emily Injete Amondo, Alisher Mirzabaev, Emmanuel Rukundo

Abstract:

Children in rural farming households are often vulnerable to a multitude of risks, including health risks associated with climate change and variability. Cognizant of this, this study empirically traced the relationship between climate variability and nutritional health outcomes in rural children while identifying the cause-and-effect transmission mechanisms. We combined four waves of the rich Uganda National Panel Survey (UNPS), part of the World Bank Living Standards Measurement Studies (LSMS) for the period 2009-2014, with long-term and high-frequency rainfall and temperature datasets. Self-reported drought and flood shock variables were further used in separate regressions for triangulation purposes and robustness checks. Panel fixed effects regressions were applied in the empirical analysis, accounting for a variety of causal identification issues. The results showed significant negative outcomes for children’s anthropometric measurements due to the impacts of moderate and extreme droughts, extreme wet spells, and heatwaves. On the contrary, moderate wet spells were positively linked with nutritional measures. Agricultural production and child diarrhea were the main transmission channels, with heatwaves, droughts, and high rainfall variability negatively affecting crop output. The probability of diarrhea was positively related to increases in temperature and dry spells. Results further revealed that children in households who engaged in ex-ante or anticipatory risk-reducing strategies such as savings had better health outcomes as opposed to those engaged in ex-post coping such as involuntary change of diet. These results highlight the importance of adaptation in smoothing the harmful effects of climate variability on the health of rural households and children in Uganda.

Keywords: extreme weather events, undernutrition, diarrhea, agricultural production, gridded weather data

Procedia PDF Downloads 81
134 Mathematics Bridging Theory and Applications for a Data-Driven World

Authors: Zahid Ullah, Atlas Khan

Abstract:

In today's data-driven world, the role of mathematics in bridging the gap between theory and applications is becoming increasingly vital. This abstract highlights the significance of mathematics as a powerful tool for analyzing, interpreting, and extracting meaningful insights from vast amounts of data. By integrating mathematical principles with real-world applications, researchers can unlock the full potential of data-driven decision-making processes. This abstract delves into the various ways mathematics acts as a bridge connecting theoretical frameworks to practical applications. It explores the utilization of mathematical models, algorithms, and statistical techniques to uncover hidden patterns, trends, and correlations within complex datasets. Furthermore, it investigates the role of mathematics in enhancing predictive modeling, optimization, and risk assessment methodologies for improved decision-making in diverse fields such as finance, healthcare, engineering, and social sciences. The abstract also emphasizes the need for interdisciplinary collaboration between mathematicians, statisticians, computer scientists, and domain experts to tackle the challenges posed by the data-driven landscape. By fostering synergies between these disciplines, novel approaches can be developed to address complex problems and make data-driven insights accessible and actionable. Moreover, this abstract underscores the importance of robust mathematical foundations for ensuring the reliability and validity of data analysis. Rigorous mathematical frameworks not only provide a solid basis for understanding and interpreting results but also contribute to the development of innovative methodologies and techniques. In summary, this abstract advocates for the pivotal role of mathematics in bridging theory and applications in a data-driven world. By harnessing mathematical principles, researchers can unlock the transformative potential of data analysis, paving the way for evidence-based decision-making, optimized processes, and innovative solutions to the challenges of our rapidly evolving society.

Keywords: mathematics, bridging theory and applications, data-driven world, mathematical models

Procedia PDF Downloads 48
133 Evaluation of Video Quality Metrics and Performance Comparison on Contents Taken from Most Commonly Used Devices

Authors: Pratik Dhabal Deo, Manoj P.

Abstract:

With the increasing number of social media users, the amount of video content available has also significantly increased. Currently, the number of smartphone users is at its peak, and many are increasingly using their smartphones as their main photography and recording devices. There have been a lot of developments in the field of Video Quality Assessment (VQA) and metrics like VMAF, SSIM etc. are said to be some of the best performing metrics, but the evaluation of these metrics is dominantly done on professionally taken video contents using professional tools, lighting conditions etc. No study particularly pinpointing the performance of the metrics on the contents taken by users on very commonly available devices has been done. Datasets that contain a huge number of videos from different high-end devices make it difficult to analyze the performance of the metrics on the content from most used devices even if they contain contents taken in poor lighting conditions using lower-end devices. These devices face a lot of distortions due to various factors since the spectrum of contents recorded on these devices is huge. In this paper, we have presented an analysis of the objective VQA metrics on contents taken only from most used devices and their performance on them, focusing on full-reference metrics. To carry out this research, we created a custom dataset containing a total of 90 videos that have been taken from three most commonly used devices, and android smartphone, an IOS smartphone and a DSLR. On the videos taken on each of these devices, the six most common types of distortions that users face have been applied on addition to already existing H.264 compression based on four reference videos. These six applied distortions have three levels of degradation each. A total of the five most popular VQA metrics have been evaluated on this dataset and the highest values and the lowest values of each of the metrics on the distortions have been recorded. Finally, it is found that blur is the artifact on which most of the metrics didn’t perform well. Thus, in order to understand the results better the amount of blur in the data set has been calculated and an additional evaluation of the metrics was done using HEVC codec, which is the next version of H.264 compression, on the camera that proved to be the sharpest among the devices. The results have shown that as the resolution increases, the performance of the metrics tends to become more accurate and the best performing metric among them is VQM with very few inconsistencies and inaccurate results when the compression applied is H.264, but when the compression is applied is HEVC, SSIM and VMAF have performed significantly better.

Keywords: distortion, metrics, performance, resolution, video quality assessment

Procedia PDF Downloads 182
132 Multi-Elemental Analysis Using Inductively Coupled Plasma Mass Spectrometry for the Geographical Origin Discrimination of Greek Giant Beans “Gigantes Elefantes”

Authors: Eleni C. Mazarakioti, Anastasios Zotos, Anna-Akrivi Thomatou, Efthimios Kokkotos, Achilleas Kontogeorgos, Athanasios Ladavos, Angelos Patakas

Abstract:

“Gigantes Elefantes” is a particularly dynamic crop of giant beans cultivated in western Macedonia (Greece). This variety of large beans growing in this area and specifically in the regions of Prespes and Kastoria is a protected designation of origin (PDO) species with high nutritional quality. Mislabeling of geographical origin and blending with unidentified samples are common fraudulent practices in Greek food market with financial and possible health consequences. In the last decades, multi-elemental composition analysis has been used in identifying the geographical origin of foods and agricultural products. In an attempt to discriminate the authenticity of Greek beans, multi-elemental analysis (Ag, Al, As, B, Ba, Be, Ca, Cd, Co, Cr, Cs, Cu, Fe, Ga, Ge, K, Li, Mg, Mn, Mo, Na, Nb, Ni, P, Pb, Rb, Re, Se, Sr, Ta, Ti, Tl, U, V, W, Zn, Zr) was performed by inductively coupled plasma mass spectrometry (ICP-MS) on 320 samples of beans, originated from Greece (Prespes and Kastoria), China and Poland. All samples were collected during the autumn of 2021. The obtained data were analysed by principal component analysis (PCA), an unsupervised statistical method, which allows for to reduce of the dimensionality of the enormous datasets. Statistical analysis revealed a clear separation of beans that had been cultivated in Greece compared with those from China and Poland. An adequate discrimination of geographical origin between bean samples originating from the two Greece regions, Prespes and Kastoria, was also evident. Our results suggest that multi-elemental analysis combined with the appropriate multivariate statistical method could be a useful tool for bean’s geographical authentication. Acknowledgment: This research has been financed by the Public Investment Programme/General Secretariat for Research and Innovation, under the call “YPOERGO 3, code 2018SE01300000: project title: ‘Elaboration and implementation of methodology for authenticity and geographical origin assessment of agricultural products.

Keywords: geographical origin, authenticity, multi-elemental analysis, beans, ICP-MS, PCA

Procedia PDF Downloads 50
131 Distant Speech Recognition Using Laser Doppler Vibrometer

Authors: Yunbin Deng

Abstract:

Most existing applications of automatic speech recognition relies on cooperative subjects at a short distance to a microphone. Standoff speech recognition using microphone arrays can extend the subject to sensor distance somewhat, but it is still limited to only a few feet. As such, most deployed applications of standoff speech recognitions are limited to indoor use at short range. Moreover, these applications require air passway between the subject and the sensor to achieve reasonable signal to noise ratio. This study reports long range (50 feet) automatic speech recognition experiments using a Laser Doppler Vibrometer (LDV) sensor. This study shows that the LDV sensor modality can extend the speech acquisition standoff distance far beyond microphone arrays to hundreds of feet. In addition, LDV enables 'listening' through the windows for uncooperative subjects. This enables new capabilities in automatic audio and speech intelligence, surveillance, and reconnaissance (ISR) for law enforcement, homeland security and counter terrorism applications. The Polytec LDV model OFV-505 is used in this study. To investigate the impact of different vibrating materials, five parallel LDV speech corpora, each consisting of 630 speakers, are collected from the vibrations of a glass window, a metal plate, a plastic box, a wood slate, and a concrete wall. These are the common materials the application could encounter in a daily life. These data were compared with the microphone counterpart to manifest the impact of various materials on the spectrum of the LDV speech signal. State of the art deep neural network modeling approaches is used to conduct continuous speaker independent speech recognition on these LDV speech datasets. Preliminary phoneme recognition results using time-delay neural network, bi-directional long short term memory, and model fusion shows great promise of using LDV for long range speech recognition. To author’s best knowledge, this is the first time an LDV is reported for long distance speech recognition application.

Keywords: covert speech acquisition, distant speech recognition, DSR, laser Doppler vibrometer, LDV, speech intelligence surveillance and reconnaissance, ISR

Procedia PDF Downloads 156
130 Leveraging Natural Language Processing for Legal Artificial Intelligence: A Longformer Approach for Taiwanese Legal Cases

Authors: Hsin Lee, Hsuan Lee

Abstract:

Legal artificial intelligence (LegalAI) has been increasing applications within legal systems, propelled by advancements in natural language processing (NLP). Compared with general documents, legal case documents are typically long text sequences with intrinsic logical structures. Most existing language models have difficulty understanding the long-distance dependencies between different structures. Another unique challenge is that while the Judiciary of Taiwan has released legal judgments from various levels of courts over the years, there remains a significant obstacle in the lack of labeled datasets. This deficiency makes it difficult to train models with strong generalization capabilities, as well as accurately evaluate model performance. To date, models in Taiwan have yet to be specifically trained on judgment data. Given these challenges, this research proposes a Longformer-based pre-trained language model explicitly devised for retrieving similar judgments in Taiwanese legal documents. This model is trained on a self-constructed dataset, which this research has independently labeled to measure judgment similarities, thereby addressing a void left by the lack of an existing labeled dataset for Taiwanese judgments. This research adopts strategies such as early stopping and gradient clipping to prevent overfitting and manage gradient explosion, respectively, thereby enhancing the model's performance. The model in this research is evaluated using both the dataset and the Average Entropy of Offense-charged Clustering (AEOC) metric, which utilizes the notion of similar case scenarios within the same type of legal cases. Our experimental results illustrate our model's significant advancements in handling similarity comparisons within extensive legal judgments. By enabling more efficient retrieval and analysis of legal case documents, our model holds the potential to facilitate legal research, aid legal decision-making, and contribute to the further development of LegalAI in Taiwan.

Keywords: legal artificial intelligence, computation and language, language model, Taiwanese legal cases

Procedia PDF Downloads 47
129 Assessing the Impact of Low Carbon Technology Integration on Electricity Distribution Networks: Advancing towards Local Area Energy Planning

Authors: Javier Sandoval Bustamante, Pardis Sheikhzadeh, Vijayanarasimha Hindupur Pakka

Abstract:

In the pursuit of achieving net-zero carbon emissions, the integration of low carbon technologies into electricity distribution networks is paramount. This paper delves into the critical assessment of how the integration of low carbon technologies, such as heat pumps, electric vehicle chargers, and photovoltaic systems, impacts the infrastructure and operation of electricity distribution networks. The study employs rigorous methodologies, including power flow analysis and headroom analysis, to evaluate the feasibility and implications of integrating these technologies into existing distribution systems. Furthermore, the research utilizes Local Area Energy Planning (LAEP) methodologies to guide local authorities and distribution network operators in formulating effective plans to meet regional and national decarbonization objectives. Geospatial analysis techniques, coupled with building physics and electric energy systems modeling, are employed to develop geographic datasets aimed at informing the deployment of low carbon technologies at the local level. Drawing upon insights from the Local Energy Net Zero Accelerator (LENZA) project, a comprehensive case study illustrates the practical application of these methodologies in assessing the rollout potential of LCTs. The findings not only shed light on the technical feasibility of integrating low carbon technologies but also provide valuable insights into the broader transition towards a sustainable and electrified energy future. This paper contributes to the advancement of knowledge in power electrical engineering by providing empirical evidence and methodologies to support the integration of low carbon technologies into electricity distribution networks. The insights gained are instrumental for policymakers, utility companies, and stakeholders involved in navigating the complex challenges of energy transition and achieving long-term sustainability goals.

Keywords: energy planning, energy systems, digital twins, power flow analysis, headroom analysis

Procedia PDF Downloads 11
128 Evaluation of Important Transcription Factors and Kinases in Regulating the Signaling Pathways of Cancer Stem Cells With Low and High Proliferation Rate Derived From Colorectal Cancer

Authors: Mohammad Hossein Habibi, Atena Sadat Hosseini

Abstract:

Colorectal cancer is the third leading cause of cancer-related death in the world. Colorectal cancer screening, early detection, and treatment programs could benefit from the most up-to-date information on the disease's burden, given the present worldwide trend of increasing colorectal cancer incidence. Tumor recurrence and resistance are exacerbated by the presence of chemotherapy-resistant cancer stem cells that can generate rapidly proliferating tumor cells. In addition, tumor cells can evolve chemoresistance through adaptation mechanisms. In this work, we used in silico analysis to select suitable GEO datasets. In this study, we compared slow-growing cancer stem cells with high-growth colorectal cancer-derived cancer stem cells. We then evaluated the signal pathways, transcription factors, and kinases associated with these two types of cancer stem cells. A total of 980 upregulated genes and 870 downregulated genes were clustered. MAPK signaling pathway, AGE-RAGE signaling pathway in diabetic complications, Fc gamma R-mediated phagocytosis, and Steroid biosynthesis signaling pathways were observed in upregulated genes. Also, caffeine metabolism, amino sugar and nucleotide sugar metabolism, TNF signaling pathway, and cytosolic DNA-sensing pathway were involved in downregulated genes. In the next step, we evaluated the best transcription factors and kinases in two types of cancer stem cells. In this regard, NR2F2, ZEB2, HEY1, and HDGF as transcription factors and PRDM5, SMAD, CBP, and KDM2B as critical kinases in upregulated genes. On the other hand, IRF1, SPDEF, NCOA1, and STAT1 transcription factors and CTNNB1 and CDH7 kinases were regulated low expression genes. Using bioinformatics analysis in the present study, we conducted an in-depth study of colorectal cancer stem cells at low and high growth rates so that we could take further steps to detect and even target these cells. Naturally, more additional tests are needed in this direction.

Keywords: colorectal cancer, bioinformatics analysis, transcription factor, kinases, cancer stem cells

Procedia PDF Downloads 99
127 Combining Multiscale Patterns of Weather and Sea States into a Machine Learning Classifier for Mid-Term Prediction of Extreme Rainfall in North-Western Mediterranean Sea

Authors: Pinel Sebastien, Bourrin François, De Madron Du Rieu Xavier, Ludwig Wolfgang, Arnau Pedro

Abstract:

Heavy precipitation constitutes a major meteorological threat in the western Mediterranean. Research has investigated the relationship between the states of the Mediterranean Sea and the atmosphere with the precipitation for short temporal windows. However, at a larger temporal scale, the precursor signals of heavy rainfall in the sea and atmosphere have drawn little attention. Moreover, despite ongoing improvements in numerical weather prediction, the medium-term forecasting of rainfall events remains a difficult task. Here, we aim to investigate the influence of early-spring environmental parameters on the following autumnal heavy precipitations. Hence, we develop a machine learning model to predict extreme autumnal rainfall with a 6-month lead time over the Spanish Catalan coastal area, based on i) the sea pattern (main current-LPC and Sea Surface Temperature-SST) at the mesoscale scale, ii) 4 European weather teleconnection patterns (NAO, WeMo, SCAND, MO) at synoptic scale, and iii) the hydrological regime of the main local river (Rhône River). The accuracy of the developed model classifier is evaluated via statistical analysis based on classification accuracy, logarithmic and confusion matrix by comparing with rainfall estimates from rain gauges and satellite observations (CHIRPS-2.0). Sensitivity tests are carried out by changing the model configuration, such as sea SST, sea LPC, river regime, and synoptic atmosphere configuration. The sensitivity analysis suggests a negligible influence from the hydrological regime, unlike SST, LPC, and specific teleconnection weather patterns. At last, this study illustrates how public datasets can be integrated into a machine learning model for heavy rainfall prediction and can interest local policies for management purposes.

Keywords: extreme hazards, sensitivity analysis, heavy rainfall, machine learning, sea-atmosphere modeling, precipitation forecasting

Procedia PDF Downloads 111
126 Comparison of Patient Stay at Withy Bush Same Day Emergency Care and Then Those at the Emergency Department

Authors: Joshua W. Edefo, Shafiul Azam, Murray D. Smith

Abstract:

Introduction: In April 2022, the Welsh Government introduced the six goals for urgent and emergency care programs. One of these goals was to provide access to clinically safe alternatives, leading to the establishment of the Same Day Emergency Care (SDEC) program. The SDEC initiative aims to offer viable options that maintain patient safety while avoiding unnecessary hospital stays. The aim of the study is to determine the duration of patient stay in SDEC and compare it with that of Emergency department (ED) stay to ascertain if one of the objectives of SDEC is achieved. Methods: Patient stays and attendance datasets were constructed from Withybush SDEC and ED patient records. These records were provided by Hywel Dda University Health Board Informatics. Some hypothetical pathways were identified, notably SDEC visits involving a single attendance and ED visits then immediately transferred to SDEC. Descriptive statistics were used to summarise the data, and hypothesis tests for mean differences used the student t-test. Propensity scoring was employed to match a set of ED patient stays to SDEC patient stays which were then used to determine the average treatment effect (ATE) to compare durations of stay in SDEC with ED. Regression methods were used to model the natural logarithm of the duration of SDEC attendance, and the level of statistical significance was set to 0.05. Results: SDEC visits involving a single attendance (170 of 384; 44.3%) is the most frequently observed pathway with patient length of stay at 256 minutes (95%CI 237.4 - 275.1). The next most frequently observed pathway of patient stay was SDEC attendance after presenting to ED (80 of 384; 20.8%) and gave the length of stay of 440 minutes (95%CI 351.6 - 529.2). Time spent in this pathway significantly increased by 184 minutes (95%CI 118.0 - 250.2, support for no difference p<0.001) compared to the most seen pathway. When SDEC data were compared with ED, the estimate for the ATE from SDEC single attendance was -272 minutes (95%CI -334.1 - -210.5; p<0.001), while that of ED then SDEC pathway was -50.6 min (95%CI -182.7-81.5; p=0.453). Conclusion: When patients are admitted to SDEC and successfully discharged, their stays are significantly shorter, approximately 4.5 hours, compared to patients who spend their entire stay in the Emergency Department. That difference vanishes when the patient stay includes a period spent previously in ED before transfer to SDEC.

Keywords: attendance, emergency-department, patient-stay, same-day-emergency-care

Procedia PDF Downloads 20
125 Computational Pipeline for Lynch Syndrome Detection: Integrating Alignment, Variant Calling, and Annotations

Authors: Rofida Gamal, Mostafa Mohammed, Mariam Adel, Marwa Gamal, Marwa kamal, Ayat Saber, Maha Mamdouh, Amira Emad, Mai Ramadan

Abstract:

Lynch Syndrome is an inherited genetic condition associated with an increased risk of colorectal and other cancers. Detecting Lynch Syndrome in individuals is crucial for early intervention and preventive measures. This study proposes a computational pipeline for Lynch Syndrome detection by integrating alignment, variant calling, and annotation. The pipeline leverages popular tools such as FastQC, Trimmomatic, BWA, bcftools, and ANNOVAR to process the input FASTQ file, perform quality trimming, align reads to the reference genome, call variants, and annotate them. It is believed that the computational pipeline was applied to a dataset of Lynch Syndrome cases, and its performance was evaluated. It is believed that the quality check step ensured the integrity of the sequencing data, while the trimming process is thought to have removed low-quality bases and adaptors. In the alignment step, it is believed that the reads were accurately mapped to the reference genome, and the subsequent variant calling step is believed to have identified potential genetic variants. The annotation step is believed to have provided functional insights into the detected variants, including their effects on known Lynch Syndrome-associated genes. The results obtained from the pipeline revealed Lynch Syndrome-related positions in the genome, providing valuable information for further investigation and clinical decision-making. The pipeline's effectiveness was demonstrated through its ability to streamline the analysis workflow and identify potential genetic markers associated with Lynch Syndrome. It is believed that the computational pipeline presents a comprehensive and efficient approach to Lynch Syndrome detection, contributing to early diagnosis and intervention. The modularity and flexibility of the pipeline are believed to enable customization and adaptation to various datasets and research settings. Further optimization and validation are believed to be necessary to enhance performance and applicability across diverse populations.

Keywords: Lynch Syndrome, computational pipeline, alignment, variant calling, annotation, genetic markers

Procedia PDF Downloads 47
124 Comparative Analysis of Reinforcement Learning Algorithms for Autonomous Driving

Authors: Migena Mana, Ahmed Khalid Syed, Abdul Malik, Nikhil Cherian

Abstract:

In recent years, advancements in deep learning enabled researchers to tackle the problem of self-driving cars. Car companies use huge datasets to train their deep learning models to make autonomous cars a reality. However, this approach has certain drawbacks in that the state space of possible actions for a car is so huge that there cannot be a dataset for every possible road scenario. To overcome this problem, the concept of reinforcement learning (RL) is being investigated in this research. Since the problem of autonomous driving can be modeled in a simulation, it lends itself naturally to the domain of reinforcement learning. The advantage of this approach is that we can model different and complex road scenarios in a simulation without having to deploy in the real world. The autonomous agent can learn to drive by finding the optimal policy. This learned model can then be easily deployed in a real-world setting. In this project, we focus on three RL algorithms: Q-learning, Deep Deterministic Policy Gradient (DDPG), and Proximal Policy Optimization (PPO). To model the environment, we have used TORCS (The Open Racing Car Simulator), which provides us with a strong foundation to test our model. The inputs to the algorithms are the sensor data provided by the simulator such as velocity, distance from side pavement, etc. The outcome of this research project is a comparative analysis of these algorithms. Based on the comparison, the PPO algorithm gives the best results. When using PPO algorithm, the reward is greater, and the acceleration, steering angle and braking are more stable compared to the other algorithms, which means that the agent learns to drive in a better and more efficient way in this case. Additionally, we have come up with a dataset taken from the training of the agent with DDPG and PPO algorithms. It contains all the steps of the agent during one full training in the form: (all input values, acceleration, steering angle, break, loss, reward). This study can serve as a base for further complex road scenarios. Furthermore, it can be enlarged in the field of computer vision, using the images to find the best policy.

Keywords: autonomous driving, DDPG (deep deterministic policy gradient), PPO (proximal policy optimization), reinforcement learning

Procedia PDF Downloads 120
123 Embedded Visual Perception for Autonomous Agricultural Machines Using Lightweight Convolutional Neural Networks

Authors: René A. Sørensen, Søren Skovsen, Peter Christiansen, Henrik Karstoft

Abstract:

Autonomous agricultural machines act in stochastic surroundings and therefore, must be able to perceive the surroundings in real time. This perception can be achieved using image sensors combined with advanced machine learning, in particular Deep Learning. Deep convolutional neural networks excel in labeling and perceiving color images and since the cost of high-quality RGB-cameras is low, the hardware cost of good perception depends heavily on memory and computation power. This paper investigates the possibility of designing lightweight convolutional neural networks for semantic segmentation (pixel wise classification) with reduced hardware requirements, to allow for embedded usage in autonomous agricultural machines. Using compression techniques, a lightweight convolutional neural network is designed to perform real-time semantic segmentation on an embedded platform. The network is trained on two large datasets, ImageNet and Pascal Context, to recognize up to 400 individual classes. The 400 classes are remapped into agricultural superclasses (e.g. human, animal, sky, road, field, shelterbelt and obstacle) and the ability to provide accurate real-time perception of agricultural surroundings is studied. The network is applied to the case of autonomous grass mowing using the NVIDIA Tegra X1 embedded platform. Feeding case-specific images to the network results in a fully segmented map of the superclasses in the image. As the network is still being designed and optimized, only a qualitative analysis of the method is complete at the abstract submission deadline. Proceeding this deadline, the finalized design is quantitatively evaluated on 20 annotated grass mowing images. Lightweight convolutional neural networks for semantic segmentation can be implemented on an embedded platform and show competitive performance with regards to accuracy and speed. It is feasible to provide cost-efficient perceptive capabilities related to semantic segmentation for autonomous agricultural machines.

Keywords: autonomous agricultural machines, deep learning, safety, visual perception

Procedia PDF Downloads 366
122 Crime Prevention with Artificial Intelligence

Authors: Mehrnoosh Abouzari, Shahrokh Sahraei

Abstract:

Today, with the increase in quantity and quality and variety of crimes, the discussion of crime prevention has faced a serious challenge that human resources alone and with traditional methods will not be effective. One of the developments in the modern world is the presence of artificial intelligence in various fields, including criminal law. In fact, the use of artificial intelligence in criminal investigations and fighting crime is a necessity in today's world. The use of artificial intelligence is far beyond and even separate from other technologies in the struggle against crime. Second, its application in criminal science is different from the discussion of prevention and it comes to the prediction of crime. Crime prevention in terms of the three factors of the offender, the offender and the victim, following a change in the conditions of the three factors, based on the perception of the criminal being wise, and therefore increasing the cost and risk of crime for him in order to desist from delinquency or to make the victim aware of self-care and possibility of exposing him to danger or making it difficult to commit crimes. While the presence of artificial intelligence in the field of combating crime and social damage and dangers, like an all-seeing eye, regardless of time and place, it sees the future and predicts the occurrence of a possible crime, thus prevent the occurrence of crimes. The purpose of this article is to collect and analyze the studies conducted on the use of artificial intelligence in predicting and preventing crime. How capable is this technology in predicting crime and preventing it? The results have shown that the artificial intelligence technologies in use are capable of predicting and preventing crime and can find patterns in the data set. find large ones in a much more efficient way than humans. In crime prediction and prevention, the term artificial intelligence can be used to refer to the increasing use of technologies that apply algorithms to large sets of data to assist or replace police. The use of artificial intelligence in our debate is in predicting and preventing crime, including predicting the time and place of future criminal activities, effective identification of patterns and accurate prediction of future behavior through data mining, machine learning and deep learning, and data analysis, and also the use of neural networks. Because the knowledge of criminologists can provide insight into risk factors for criminal behavior, among other issues, computer scientists can match this knowledge with the datasets that artificial intelligence uses to inform them.

Keywords: artificial intelligence, criminology, crime, prevention, prediction

Procedia PDF Downloads 58