Search results for: bioinformatic predictions
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 668

668 Bioinformatic Screening of Metagenomic Fosmid Libraries for Identification of Biosynthetic Pathways Derived from the Colombian Soils

Authors: María Fernanda Quiceno Vallejo, Patricia del Portillo, María Mercedes Zambrano, Jeisson Alejandro Triana, Dayana Calderon, Juan Manuel Anzola

Abstract:

Microorganisms from tropical ecosystems can be novel in terms of adaptations and conservation. Given the macrodiversity of Colombian ecosystems, it is possible that this diversity is also present in Colombian soils. Tropical soil bacteria could therefore offer a potentially novel source of bioactive compounds. In this study we analyzed a metagenomic fosmid library constructed with tropical bacterial DNA with the aim of understanding its underlying diversity and functional potential. 8640 clones from the fosmid library were sequenced with Nanopore MinION technology and then analyzed with bioinformatic tools such as Prokka, antiSMASH and BAGEL4 in order to identify functional biosynthetic pathways in the sequences. The strains showed ample differences in their biosynthetic pathways. In total we identified 4 pathways related to aryl polyene synthesis, 12 related to terpenes, 22 related to NRPs (non-ribosomal peptides), 11 related to PKs (polyketide synthases) and 7 related to RiPPs (bacteriocins). We designed primers for the metagenomic clones with the most biosynthetic gene clusters (BGCs), samples 6 and 2. The results show the biotechnological and pharmacological potential of tropical ecosystems. Overall, this work provides an overview of the genomic and functional potential of Colombian soil and sets the groundwork for further exploration of tropical metagenomic sequencing.

Keywords: bioactives, biosynthetic pathways, bioinformatic, bacterial gene clusters, secondary metabolites

Procedia PDF Downloads 164
667 Bioinformatic Approaches in Population Genetics and Phylogenetic Studies

Authors: Masoud Sheidai

Abstract:

Biologists working in population genetics and phylogeny face different research tasks, such as assessing populations’ genetic variability and divergence, species relatedness, the evolution of genetic and morphological characters, and the identification of DNA SNPs with adaptive potential. To tackle these problems and reach concise conclusions, they must use proper and efficient statistical and bioinformatic methods as well as suitable genetic and morphological characteristics. In recent years, different bioinformatic and statistical methods, based on various well-documented assumptions, have become the proper analytical tools in the hands of researchers. Species delineation is usually carried out with clustering methods such as K-means clustering, based on distance measures appropriate to the studied features of the organisms. A well-defined species is assumed to be separated from other taxa by molecular barcodes. Species relationships are studied using molecular markers, which are analyzed by methods such as multidimensional scaling (MDS) and principal coordinate analysis (PCoA). Population structuring and genetic divergence within species are usually investigated by PCoA and PCA methods and network diagrams, supported by bootstrapping of the data. The association of genes and DNA sequences with ecological and geographical variables is determined by latent factor mixed models (LFMM) and redundancy analysis (RDA), which are based on Bayesian and distance methods, respectively. Molecular and morphological characters that differentiate the studied species may be identified by linear discriminant analysis (DA) and discriminant analysis of principal components (DAPC). We illustrate these methods and the related conclusions with examples from different edible and medicinal plant species.
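The K-means clustering step described above can be sketched in a few lines of plain Python; the two "populations" below are hypothetical two-character measurements, not data from the study:

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Plain k-means: assign each point to the nearest centroid,
    then recompute centroids, until assignments stabilise."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            clusters[j].append(p)
        new = [tuple(sum(col) / len(cl) for col in zip(*cl)) if cl else centroids[i]
               for i, cl in enumerate(clusters)]
        if new == centroids:  # assignments stable: converged
            break
        centroids = new
    return centroids, clusters

# Two hypothetical "populations" measured on two morphological characters
pop_a = [(0.1, 0.2), (0.2, 0.1), (0.0, 0.0)]
pop_b = [(5.0, 5.1), (5.2, 4.9), (4.8, 5.0)]
cents, cls = kmeans(pop_a + pop_b, k=2)
```

On well-separated groups like these, the two recovered centroids land near the respective population means, which is the delineation behaviour the abstract relies on.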

Keywords: GWAS analysis, K-Means clustering, LFMM, multidimensional scaling, redundancy analysis

Procedia PDF Downloads 122
666 A Computational Approach for the Prediction of Relevant Olfactory Receptors in Insects

Authors: Zaide Montes Ortiz, Jorge Alberto Molina, Alejandro Reyes

Abstract:

Insects are extremely successful organisms. A sophisticated olfactory system is in part responsible for their survival and reproduction. The detection of volatile organic compounds can positively or negatively affect many behaviors in insects. Compounds such as carbon dioxide (CO2), ammonium, indol, and lactic acid are essential for many species of mosquitoes, like Anopheles gambiae, to locate vertebrate hosts. For instance, in A. gambiae, the olfactory receptor AgOR2 is strongly activated by indol, which accounts for almost 30% of human sweat. On the other hand, in some insects of agricultural importance, the detection and identification of pheromone receptors (PRs) in lepidopteran species has become a promising field for integrated pest management. For example, when the pheromone receptor BmOR1 was disrupted by transcription activator-like effector nucleases (TALENs), the sensitivity to bombykol was completely removed, affecting the pheromone-source searching behavior in male moths. Thus, the detection and identification of olfactory receptors in insect genomes is fundamental to improve our understanding of ecological interactions and to provide alternatives for integrated pest and vector management. Hence, the objective of this study is to propose a bioinformatic workflow to enhance the detection and identification of potential olfactory receptors in the genomes of relevant insects. Applying hidden Markov models (HMMs) and different computational tools, potential candidates for pheromone receptors in Tuta absoluta were obtained, as well as potential carbon dioxide receptors in Rhodnius prolixus, the main vector of Chagas disease. This study showed the validity of a bioinformatic workflow with the potential to improve the identification of certain olfactory receptors in different orders of insects.

Keywords: bioinformatic workflow, insects, olfactory receptors, protein prediction

Procedia PDF Downloads 148
665 Application of Smplify-X Algorithm with Enhanced Gender Classifier in 3D Human Pose Estimation

Authors: Jiahe Liu, Hongyang Yu, Miao Luo, Feng Qian

Abstract:

The widespread application of 3D human body reconstruction spans various fields. SMPLify-X, an algorithm reliant on single-image input, employs three distinct body parameter templates and therefore requires gender classification of the individuals in the input image. The original SMPLify-X framework trains a ResNet18 network as its gender classifier, with the threshold set at 0.9; images falling below this threshold are designated as gender-neutral. That model achieved 62.38% accurate predictions and 7.54% incorrect predictions. Our improvement replaces it with a refined MobileNet network and raises the threshold to 0.97. Consequently, we attained 78.89% accurate predictions and a mere 0.2% incorrect predictions, markedly enhancing prediction precision and enabling more precise 3D human body reconstruction.
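The thresholding logic described above can be sketched as follows; the 0.97 cutoff mirrors the abstract, but the function name and example probabilities are illustrative, not from the paper's code:

```python
def classify_gender(p_female, threshold=0.97):
    """Map a classifier's softmax confidence to a body template.
    Below the threshold the subject is treated as neutral, so the
    gender-neutral body template is used instead of a possibly wrong one."""
    p = max(p_female, 1.0 - p_female)  # confidence of the winning class
    if p < threshold:
        return "neutral"
    return "female" if p_female >= 0.5 else "male"

print(classify_gender(0.99))  # confident
print(classify_gender(0.80))  # below threshold, falls back to neutral
```

Raising the threshold trades a larger "neutral" bucket for fewer confidently wrong template choices, which is exactly the precision gain the abstract reports.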

Keywords: SMPLX, mobileNet, gender classification, 3D human reconstruction

Procedia PDF Downloads 98
664 Analysis of Osmotin as Transcription Factor/Cell Signaling Modulator Using Bioinformatic Tools

Authors: Usha Kiran, M. Z. Abdin

Abstract:

Osmotin is an abundant cationic multifunctional protein discovered in cells of tobacco (Nicotiana tabacum L. var Wisconsin 38) adapted to an environment of low osmotic potential. It protects plants from pathogens and is hence placed in the PRP (pathogenesis-related protein) family. Osmotin-induced proline accumulation has been reported in plants, including transgenic tomato and strawberry, conferring tolerance against both biotic and abiotic stresses. The exact mechanism by which osmotin induces proline is, however, not yet known. These observations led us to hypothesize that osmotin-induced proline accumulation could be due to its involvement as a transcription factor and/or cell signaling pathway modulator in proline biosynthesis. The present investigation was therefore undertaken to analyze the osmotin protein as a transcription factor or cell signaling modulator using bioinformatics tools. The results of available online DNA-binding motif search programs revealed that osmotin does not contain DNA-binding motifs. Alignment of the osmotin protein with the protein sequences from DATF showed homology in the range of 0-20%, again suggesting that it does not contain a DNA-binding motif. Further, superimposition of the osmotin 3D structure on modeled Arabidopsis transcription factors using Chimera, performed to search for a unique DNA-binding domain, also suggested its absence. We did, however, find evidence implicating osmotin in cell signaling. From these results, we conclude that osmotin is not a transcription factor but rather regulates proline biosynthesis and accumulation through cell signaling during abiotic stresses.

Keywords: osmotin, cell signaling modulator, bioinformatic tools, protein

Procedia PDF Downloads 271
663 Bioinformatic Prediction of Hub Genes by Analysis of Signaling Pathways, Transcriptional Regulatory Networks and DNA Methylation Pattern in Colon Cancer

Authors: Ankan Roy, Niharika, Samir Kumar Patra

Abstract:

An anomalous nexus of complex topological assemblies and spatiotemporal epigenetic choreography at chromosomal territories may form the most sophisticated regulatory layer of gene expression in cancer. Colon cancer is one of the leading malignant neoplasms of the lower gastrointestinal tract worldwide, yet there is still a paucity of information about the complex molecular mechanisms of colonic carcinogenesis. Bioinformatic prediction and analysis help to identify essential genes and significant pathways for monitoring and conquering this deadly disease. The present study investigates potential hub genes as biomarkers and effective therapeutic targets for colon cancer treatment. Colon cancer gene expression profile datasets GSE44076, GSE20916, and GSE37364 were downloaded from the Gene Expression Omnibus (GEO) database and thoroughly screened using the GEO2R tool and Funrich software to identify common differentially expressed genes (DEGs). Further approaches, including Gene Ontology (GO) and KEGG pathway analysis, protein-protein interaction (PPI) network construction and hub gene investigation, overall survival (OS) analysis, gene correlation analysis, methylation pattern analysis, and hub gene-transcription factor regulatory network construction, were performed and validated using various bioinformatics tools. Initially, we identified 166 DEGs, comprising 68 up-regulated and 98 down-regulated genes. Up-regulated genes are mainly associated with cytokine-cytokine receptor interaction, the IL17 signaling pathway, ECM-receptor interaction, focal adhesion, and the PI3K-Akt pathway. Down-regulated genes are enriched in metabolic pathways, retinol metabolism, steroid hormone biosynthesis, and bile secretion. From the protein-protein interaction network, thirty hub genes with high connectivity were selected using the MCODE and cytoHubba plugins. Survival analysis, expression validation, correlation analysis, and methylation pattern analysis were further verified using TCGA data. Finally, we predicted COL1A1, COL1A2, COL4A1, SPP1, SPARC, and THBS2 as potential master regulators in colonic carcinogenesis. Moreover, our experimental data highlight that disruption of lipid rafts and the RAS/MAPK signaling cascade affects this gene hub at the mRNA level. We identified COL1A1, COL1A2, COL4A1, SPP1, SPARC, and THBS2 as determinant hub genes in colon cancer progression. They can be considered biomarkers for diagnosis and promising therapeutic targets in colon cancer treatment. Additionally, our experimental data indicate that signaling pathways act as a connecting link between the membrane hub and the gene hub.
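The initial DEG screening step can be illustrated with a toy log2-fold-change filter; the expression values below are invented for illustration, and a real GEO2R screen additionally applies moderated t-statistics and multiple-testing correction:

```python
import math

def select_degs(tumor, normal, lfc_cut=1.0):
    """Toy differential-expression screen: log2 fold change of mean
    tumor vs. mean normal expression, split into up/down lists."""
    up, down = [], []
    for gene in tumor:
        lfc = math.log2(sum(tumor[gene]) / len(tumor[gene])) - \
              math.log2(sum(normal[gene]) / len(normal[gene]))
        if lfc >= lfc_cut:
            up.append(gene)
        elif lfc <= -lfc_cut:
            down.append(gene)
    return up, down

# Hypothetical expression values (two samples per condition)
tumor  = {"COL1A1": [8.0, 9.0], "RETSAT": [1.0, 1.2], "ACTB": [5.0, 5.1]}
normal = {"COL1A1": [2.0, 2.2], "RETSAT": [4.0, 4.4], "ACTB": [5.0, 5.2]}
up, down = select_degs(tumor, normal)
```

With these made-up numbers, COL1A1 (one of the abstract's hub genes) comes out up-regulated and the retinol-metabolism gene down-regulated, while the housekeeping gene passes neither cutoff.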

Keywords: hub genes, colon cancer, DNA methylation, epigenetic engineering, bioinformatic predictions

Procedia PDF Downloads 128
662 Multivariate Output-Associative RVM for Multi-Dimensional Affect Predictions

Authors: Achut Manandhar, Kenneth D. Morton, Peter A. Torrione, Leslie M. Collins

Abstract:

Current trends in affect recognition research are to consider continuous observations from spontaneous natural interactions using multiple feature modalities, to represent affect in terms of continuous dimensions, to incorporate spatio-temporal correlation among affect dimensions, and to provide fast affect predictions. These research efforts have been propelled by a growing push to develop affect recognition systems that enable seamless real-time human-computer interaction in a wide variety of applications. Motivated by these desired attributes of an affect recognition system, in this work a multi-dimensional affect prediction approach is proposed by integrating the multivariate Relevance Vector Machine (MVRVM) with the recently developed Output-associative Relevance Vector Machine (OARVM). The resulting approach provides fast continuous affect predictions by jointly modeling the multiple affect dimensions and their correlations. Experiments on the RECOLA database show that the proposed approach performs competitively with the OARVM while providing faster predictions during testing.

Keywords: dimensional affect prediction, output-associative RVM, multivariate regression, fast testing

Procedia PDF Downloads 284
661 Metrology-Inspired Methods to Assess the Biases of Artificial Intelligence Systems

Authors: Belkacem Laimouche

Abstract:

With the field of artificial intelligence (AI) experiencing exponential growth, fueled by technological advancements that pave the way for increasingly innovative and promising applications, there is an escalating need to develop rigorous methods for assessing their performance in pursuit of transparency and equity. This article proposes a metrology-inspired statistical framework for evaluating bias and explainability in AI systems. Drawing from the principles of metrology, we propose a pioneering approach, using a concrete example, to evaluate the accuracy and precision of AI models, as well as to quantify the sources of measurement uncertainty that can lead to bias in their predictions. Furthermore, we explore a statistical approach for evaluating the explainability of AI systems based on their ability to provide interpretable and transparent explanations of their predictions.

Keywords: artificial intelligence, metrology, measurement uncertainty, prediction error, bias, machine learning algorithms, probabilistic models, interlaboratory comparison, data analysis, data reliability, measurement of bias impact on predictions, improvement of model accuracy and reliability

Procedia PDF Downloads 103
660 Uncertainty Estimation in Neural Networks through Transfer Learning

Authors: Ashish James, Anusha James

Abstract:

The impressive predictive performance of deep learning techniques on a wide range of tasks has led to their widespread use. Estimating the confidence of these predictions is paramount for improving the safety and reliability of such systems; however, the uncertainty estimates provided by neural networks (NNs) tend to be overconfident and unreasonable. Ensembles of NNs typically produce good predictions, but their uncertainty estimates tend to be inconsistent. Motivated by these observations, this paper presents a framework that quantitatively estimates uncertainties by leveraging advances in transfer learning through slight modifications to existing training pipelines. The algorithm is developed with deployment in mind for real-world problems that already enjoy good predictive performance, by reusing those pretrained models. The idea is to capture the behavior of the NN trained for the base task by augmenting it with uncertainty estimates from a supplementary network. A series of experiments with known and unknown distributions shows that the proposed approach produces well-calibrated uncertainty estimates along with high-quality predictions.
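The idea of augmenting a frozen base model with a supplementary uncertainty estimator can be sketched with a simple stand-in, in which a binned residual model plays the role of the supplementary network; the base model and all data below are synthetic, not the paper's method in detail:

```python
def fit_uncertainty_head(base_model, xs, ys, bins=5):
    """Stand-in for the supplementary network: per input region, record
    the spread of the frozen base model's errors on held-out data and
    reuse it as an uncertainty estimate at test time."""
    lo, hi = min(xs), max(xs)
    width = (hi - lo) / bins or 1.0
    sq_err = [[] for _ in range(bins)]
    for x, y in zip(xs, ys):
        b = min(int((x - lo) / width), bins - 1)
        sq_err[b].append((y - base_model(x)) ** 2)
    var = [sum(r) / len(r) if r else 0.0 for r in sq_err]

    def predict(x):
        b = min(max(int((x - lo) / width), 0), bins - 1)
        return base_model(x), var[b] ** 0.5  # (prediction, std estimate)
    return predict

def base(x):  # "pretrained" model, kept frozen
    return 2.0 * x

xs = [0.1 * i for i in range(50)]
ys = [2.0 * x + (0.5 if x > 2.5 else 0.05) * ((i % 3) - 1)  # noisier for x > 2.5
      for i, x in enumerate(xs)]
head = fit_uncertainty_head(base, xs, ys)
```

The head leaves the base prediction untouched and attaches a larger uncertainty in the input region where the held-out residuals are larger, which is the calibration behaviour the abstract aims for.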

Keywords: uncertainty estimation, neural networks, transfer learning, regression

Procedia PDF Downloads 134
659 Research on the Cognition and Actual Phenomenon of School Bullying from the Perspective of Students

Authors: Chia-Chun Wu, Yu-Hsien Sung

Abstract:

This study aims to examine the consistency between students’ predictions and their actual observations on the bullying prevalence rate among different types of high-risk victims, thereby clarifying the reliability of students’ reports on the identification of bullying. A total of 1,732 Taiwanese students (734 males and 998 females) participated in this study. A Rasch model was adopted for data analysis. The results showed that students with “personality or behavioral issues” are more likely to be bullied in schools, based on both students’ predictions and actual observations. Moreover, the results differed significantly between genders and between various educational levels in students’ predictions and their actual observations on the bullying prevalence rate of different types of high-risk victims. To summarize, this study not only suggests that students’ reports on the identification of bullying are accurate and could be a valuable reference in terms of recognizing a bullying incident, but it also argues that more attention should be paid to students’ gender and educational level when taking their perspectives into consideration when it comes to identifying bullying behaviors.

Keywords: school bullying, student, bullying recognition, high-risk victims

Procedia PDF Downloads 83
658 On Improving Breast Cancer Prediction Using GRNN-CP

Authors: Kefaya Qaddoum

Abstract:

The aim of this study is to predict breast cancer and to construct a supportive model that will stimulate a more reliable prediction, a factor that is fundamental for public health. In this study, we utilize general regression neural networks (GRNN) to replace point predictions with prediction intervals that achieve a reasonable level of confidence. The mechanism employed here utilizes a machine learning framework called conformal prediction (CP) in order to assign consistent confidence measures to predictions, which are combined with the GRNN. We apply the resulting algorithm to the problem of breast cancer diagnosis. The results show that the predictions constructed by this method are reasonable and could be useful in practice.
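A minimal sketch of combining a GRNN (in its Gaussian-kernel, Nadaraya-Watson regression form) with split conformal prediction, on synthetic two-feature data; the paper's actual diagnosis features and calibration scheme are not reproduced here:

```python
import math

def grnn(train_x, train_y, sigma=0.5):
    """GRNN point prediction: a Gaussian-kernel weighted average
    of the training targets."""
    def predict(x):
        w = [math.exp(-sum((a - b) ** 2 for a, b in zip(x, t)) / (2 * sigma ** 2))
             for t in train_x]
        return sum(wi * yi for wi, yi in zip(w, train_y)) / sum(w)
    return predict

def conformal_band(predict, cal_x, cal_y):
    """Split conformal prediction: the band half-width comes from the
    calibration residuals (here, conservatively, their maximum; a full
    implementation uses a (n+1)-adjusted empirical quantile)."""
    q = max(abs(y - predict(x)) for x, y in zip(cal_x, cal_y))
    return lambda x: (predict(x) - q, predict(x) + q)

# Synthetic regression data, y = x0 + x1
train_x = [(i / 10, (i % 5) / 10) for i in range(30)]
train_y = [a + b for a, b in train_x]
cal_x = [(0.15, 0.25), (0.85, 0.05), (1.55, 0.35)]
cal_y = [0.40, 0.90, 1.90]
point = grnn(train_x, train_y)
band = conformal_band(point, cal_x, cal_y)
```

By construction the band covers every calibration target; the interval width, rather than a bare point estimate, is what carries the confidence information the abstract refers to.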

Keywords: neural network, conformal prediction, cancer classification, regression

Procedia PDF Downloads 290
657 Bioinformatic Design of a Non-toxic Modified Adjuvant from the Native A1 Structure of Cholera Toxin with Membrane Synthetic Peptide of Naegleria fowleri

Authors: Frida Carrillo Morales, Maria Maricela Carrasco Yépez, Saúl Rojas Hernández

Abstract:

Naegleria fowleri is the causative agent of primary amebic meningoencephalitis, an acute and fulminant disease that affects humans. It has been reported that, despite the existence of therapeutic options, the mortality rate of this disease is 97%. Hence the need for vaccines that confer protection against it, and for adjuvants that enhance the immune response. In this regard, our working group previously obtained a peptide designed from the membrane protein MP2CL5 of Naegleria fowleri, called Smp145, which was shown to be immunogenic; it would nevertheless be of great importance to enhance its immunological response by co-administering it with a non-toxic adjuvant. The objective of this work was therefore the bioinformatic design of a peptide of the Naegleria fowleri membrane protein MP2CL5 conjugated with a non-toxic adjuvant modified from the native A1 structure of cholera toxin. Different bioinformatics tools were used to obtain a model with a modification at amino acid 61 of the A1 subunit of the CT (CTA1), to which the Smp145 peptide was added; the two molecules were joined with a 13-glycine linker. Regarding the results, the modification in CTA1 bound to the peptide produces a reduction in the toxicity of the molecule in in silico experiments; likewise, the predicted binding of Smp145 to the B-cell receptor suggests that the molecule is directed specifically to the BCR, decreasing its native enzymatic activity. The stereochemical evaluation showed that the generated model has a high number of adequately predicted residues. The ERRAT test, which evaluates the confidence with which regions exceeding the error values can be rejected, gave the generated model a high score, indicating good structural resolution. The conjugated peptide designed in this work will therefore allow us to proceed with its chemical synthesis and subsequently use it in the mouse model of protection against meningitis caused by N. fowleri.

Keywords: immunology, vaccines, pathogens, infectious disease

Procedia PDF Downloads 91
656 Predictions for the Anisotropy in Thermal Conductivity in Polymers Subjected to Model Flows by Combination of the eXtended Pom-Pom Model and the Stress-Thermal Rule

Authors: David Nieto Simavilla, Wilco M. H. Verbeeten

Abstract:

The viscoelastic behavior of polymeric flows under isothermal conditions has been extensively researched. However, most processing of polymeric materials occurs under non-isothermal conditions, and understanding the linkage between the thermo-physical properties and the process state variables remains a challenge. Furthermore, the cost and energy required to manufacture, recycle, and dispose of polymers are strongly affected by the thermo-physical properties and their dependence on state variables such as temperature and stress. Experiments show that thermal conductivity in flowing polymers is anisotropic (i.e., direction dependent). This phenomenon has previously been omitted in the study and simulation of industrially relevant flows. Our work combines experimental evidence of a universal relationship between the thermal conductivity and stress tensors (the stress-thermal rule) with differential constitutive equations for the viscoelastic behavior of polymers to provide predictions for the anisotropy in thermal conductivity in uniaxial, planar, equibiaxial, and shear flow in commercial polymers. A particular focus is placed on the eXtended Pom-Pom model, which is able to capture the non-linear behavior in both shear and elongational flows. The predictions provided by this approach are amenable to implementation in finite element packages, since viscoelastic and thermal behavior can be described by a single equation. Our results include predictions of flow-induced anisotropy in thermal conductivity for low- and high-density polyethylene, as well as confirmation of our method through comparison with a number of thermoplastic systems for which measurements of anisotropy in thermal conductivity are available. Remarkably, this approach allows for universal predictions of anisotropy in thermal conductivity that can be used in simulations of complex flows in which only the most fundamental rheological behavior of the material has been previously characterized (i.e., there is no need for adjusting parameters other than those in the constitutive model). Accounting for polymers' anisotropy in thermal conductivity in industrially relevant flows benefits the optimization of manufacturing processes as well as the mechanical and thermal performance of finished plastic products during use.
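The stress-thermal rule referred to above, in which the anisotropic (deviatoric) part of the conductivity tensor is taken proportional to the deviatoric stress, can be sketched numerically; the equilibrium conductivity k_eq, the stress-thermal coefficient Ct, and the stress state below are illustrative values, not measured ones:

```python
def conductivity_tensor(k_eq, stress, Ct):
    """Stress-thermal rule sketch: k_ij = k_eq * delta_ij
    + Ct * (sigma_ij - (tr sigma / 3) * delta_ij), so the deviatoric
    part of k is proportional to the deviatoric stress."""
    tr = sum(stress[i][i] for i in range(3)) / 3.0
    return [[k_eq * (i == j) + Ct * (stress[i][j] - tr * (i == j))
             for j in range(3)] for i in range(3)]

# Illustrative uniaxial stress along x: conductivity rises along the
# stretch direction and drops in the transverse directions
sigma = [[2.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
k = conductivity_tensor(k_eq=0.2, stress=sigma, Ct=0.05)
```

Note that subtracting the trace keeps the mean (isotropic) conductivity fixed at k_eq: only the directional split changes with stress, which is what "flow-induced anisotropy" means here.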

Keywords: anisotropy, differential constitutive models, flow simulations in polymers, thermal conductivity

Procedia PDF Downloads 180
655 Predictions of Values in a Causticizing Process

Authors: R. Andreola, O. A. A. Santos, L. M. M. Jorge

Abstract:

An industrial system for the production of white liquor at a paper mill, Klabin Paraná Papéis, formed by ten reactors, was modeled, simulated, and analyzed. The developed model considered possible water losses by evaporation and reaction, in addition to variations in the volumetric flow of lime mud across the reactors due to composition variations. The model predictions agreed well with the process measurements at the plant, and the results showed that the slaking reaction is nearly complete at the third causticizing reactor, while causticizing ends by the seventh reactor. Water loss due to reaction and evaporation occurs more pronouncedly during slaking than in the final causticizing reactors; nevertheless, the lime mud flow remains nearly constant across the reactors.

Keywords: causticizing, lime, prediction, process

Procedia PDF Downloads 352
654 Energy Performance Gaps in Residences: An Analysis of the Variables That Cause Energy Gaps and Their Impact

Authors: Amrutha Kishor

Abstract:

Today, with rising global warming and the depletion of resources, every industry is moving toward sustainability and energy efficiency. As part of this movement, it is nowadays obligatory for architects to play their part by creating energy predictions for their designs. In many cases, however, these predictions do not reflect the real quantities of energy that newly built buildings use when operating. These discrepancies can be described as 'energy performance gaps'. This study aims to determine their underlying reasons. Seven houses designed by Allan Joyce Architects, UK, from 1998 until 2019 were considered for this study. The data from the residents' energy bills were cross-referenced with the predictions made with the software SefairaPro and with energy reports. Results indicated that the predictions did not match the actual energy usage. An account of how energy was used in these seven houses was compiled by means of personal interviews. The main factors considered in the study were occupancy patterns, heating systems and usage, lighting profile and usage, and appliance profile and usage. The study found that the main reason for the energy gaps was the discrepancy between predicted and actual occupant usage and patterns of energy consumption. This study is particularly useful for energy-conscious architectural firms seeking to fine-tune their approach to designing houses and analysing their energy performance. As the findings reveal that energy usage in homes varies based on the way residents use the space, they help deduce the most efficient technological combinations. This information can be used to set guidelines for future policies and regulations related to energy consumption in homes. The study can also be used by the developers of simulation software to understand how architects use their product and drive improvements in future versions.

Keywords: architectural simulation, energy efficient design, energy performance gaps, environmental design

Procedia PDF Downloads 118
653 Visualization-Based Feature Extraction for Classification in Real-Time Interaction

Authors: Ágoston Nagy

Abstract:

This paper introduces a method that uses unsupervised machine learning to visualize the feature space of a dataset in 2D, in order to find the most characteristic segments in the set. After dimension reduction, users can select clusters by manually drawing around them. Selected clusters are recorded into a data model that is used for later predictions based on real-time data. Predictions are made with supervised learning, using the Gesture Recognition Toolkit. The paper introduces two example applications: a semantic audio organizer for analyzing incoming sounds, and a gesture database organizer in which gestural data (recorded by a Leap Motion controller) is visualized for further manipulation.
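The manual cluster selection step, picking the embedded 2D points that fall inside a hand-drawn lasso, reduces to a point-in-polygon test; the embedding coordinates and lasso below are hypothetical:

```python
def point_in_polygon(pt, poly):
    """Ray-casting test: a point is inside if a horizontal ray from it
    crosses the polygon's edges an odd number of times."""
    x, y = pt
    inside = False
    n = len(poly)
    for i in range(n):
        (x1, y1), (x2, y2) = poly[i], poly[(i + 1) % n]
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside

def select_cluster(points, drawn_polygon):
    """Return the indices of embedded 2D points inside a hand-drawn lasso."""
    return [i for i, p in enumerate(points) if point_in_polygon(p, drawn_polygon)]

# Hypothetical 2D embedding of four samples and a square "lasso"
embedded = [(0.2, 0.3), (0.8, 0.9), (0.25, 0.35), (0.9, 0.1)]
lasso = [(0.0, 0.0), (0.5, 0.0), (0.5, 0.5), (0.0, 0.5)]
picked = select_cluster(embedded, lasso)
```

The selected indices would then be labeled and handed to the supervised stage for real-time prediction.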

Keywords: gesture recognition, machine learning, real-time interaction, visualization

Procedia PDF Downloads 352
652 Predicting Relative Performance of Sector Exchange Traded Funds Using Machine Learning

Authors: Jun Wang, Ge Zhang

Abstract:

Machine learning has been used in many areas today. It thrives at reviewing large volumes of data and identifying patterns and trends that might not be apparent to a human. Given the huge potential benefit and the amount of data available in the financial market, it is not surprising to see machine learning applied to various financial products. While future prices of financial securities are extremely difficult to forecast, we study them from a different angle. Instead of trying to forecast future prices, we apply machine learning algorithms to predict the direction of future price movement, in particular, whether a sector Exchange Traded Fund (ETF) would outperform or underperform the market in the next week or in the next month. We apply several machine learning algorithms for this prediction. The algorithms are Linear Discriminant Analysis (LDA), k-Nearest Neighbors (KNN), Decision Tree (DT), Gaussian Naive Bayes (GNB), and Neural Networks (NN). We show that these machine learning algorithms, most notably GNB and NN, have some predictive power in forecasting out-performance and under-performance out of sample. We also try to explore whether it is possible to utilize the predictions from these algorithms to outperform the buy-and-hold strategy of the S&P 500 index. The trading strategy to explore out-performance predictions does not perform very well, but the trading strategy to explore under-performance predictions can earn higher returns than simply holding the S&P 500 index out of sample.
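A Gaussian Naive Bayes direction classifier of the kind listed above can be sketched in pure Python; the weekly features, labels, and values below are invented for illustration and are not the study's data:

```python
import math

def fit_gnb(X, y):
    """Gaussian Naive Bayes: per class, fit an independent normal to
    each feature; predict the class with the highest log posterior."""
    stats = {}
    for c in set(y):
        rows = [x for x, label in zip(X, y) if label == c]
        means = [sum(col) / len(col) for col in zip(*rows)]
        vars_ = [max(sum((v - m) ** 2 for v in col) / len(col), 1e-9)
                 for col, m in zip(zip(*rows), means)]
        stats[c] = (len(rows) / len(y), means, vars_)

    def predict(x):
        def log_post(c):
            prior, means, vars_ = stats[c]
            return math.log(prior) + sum(
                -0.5 * math.log(2 * math.pi * v) - (xi - m) ** 2 / (2 * v)
                for xi, m, v in zip(x, means, vars_))
        return max(stats, key=log_post)
    return predict

# Hypothetical weekly features: (ETF momentum, relative volume)
X = [(0.02, 1.1), (0.03, 1.3), (0.01, 1.2),
     (-0.02, 0.9), (-0.03, 0.8), (-0.01, 0.7)]
y = ["out", "out", "out", "under", "under", "under"]
predict = fit_gnb(X, y)
```

In the study's setting, "out"/"under" would mark whether the sector ETF beat the S&P 500 over the following week or month, and predictions like these would feed the trading strategies compared against buy-and-hold.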

Keywords: machine learning, ETF prediction, dynamic trading, asset allocation

Procedia PDF Downloads 98
651 Review on Rainfall Prediction Using Machine Learning Technique

Authors: Prachi Desai, Ankita Gandhi, Mitali Acharya

Abstract:

Rainfall forecasting is mainly used to predict rainfall in a specified area and to determine its future rainfall conditions. Rainfall is always a global issue, as it affects all major aspects of life; the agricultural, fisheries, forestry, and tourism industries, among others, are widely affected by these conditions. Studies point to insufficient availability of water resources and an increase in water demand in the near future. We already have a forecast system that uses a deep convolutional neural network (CNN) to forecast monthly rainfall and climate changes, and we have compared the CNN against artificial neural networks (ANNs). Machine learning techniques used in rainfall prediction include the ARIMA model, ANN, linear regression (LR), SVM, etc. The dataset on which we experiment was gathered online and covers the years 1901 to 2018. Test results suggest more realistic improvements than conventional rainfall forecasts.

Keywords: ANN, CNN, supervised learning, machine learning, deep learning

Procedia PDF Downloads 199
650 D-Wave Quantum Computing Ising Model: A Case Study for Forecasting of Heat Waves

Authors: Dmytro Zubov, Francesco Volponi

Abstract:

In this paper, D-Wave quantum computing Ising model is used for the forecasting of positive extremes of daily mean air temperature. Forecast models are designed with two to five qubits, which represent 2-, 3-, 4-, and 5-day historical data respectively. Ising model’s real-valued weights and dimensionless coefficients are calculated using daily mean air temperatures from 119 places around the world, as well as sea level (Aburatsu, Japan). In comparison with current methods, this approach is better suited to predict heat wave values because it does not require the estimation of a probability distribution from scarce observations. Proposed forecast quantum computing algorithm is simulated based on traditional computer architecture and combinatorial optimization of Ising model parameters for the Ronald Reagan Washington National Airport dataset with 1-day lead-time on learning sample (1975-2010 yr). Analysis of the forecast accuracy (ratio of successful predictions to total number of predictions) on the validation sample (2011-2014 yr) shows that Ising model with three qubits has 100 % accuracy, which is quite significant as compared to other methods. However, number of identified heat waves is small (only one out of nineteen in this case). Other models with 2, 4, and 5 qubits have 20 %, 3.8 %, and 3.8 % accuracy respectively. Presented three-qubit forecast model is applied for prediction of heat waves at other five locations: Aurel Vlaicu, Romania – accuracy is 28.6 %; Bratislava, Slovakia – accuracy is 21.7 %; Brussels, Belgium – accuracy is 33.3 %; Sofia, Bulgaria – accuracy is 50 %; Akhisar, Turkey – accuracy is 21.4 %. These predictions are not ideal, but not zeros. They can be used independently or together with other predictions generated by different method(s). The loss of human life, as well as environmental, economic, and material damage, from extreme air temperatures could be reduced if some of heat waves are predicted. 
Even a small success rate implies a large socio-economic benefit.
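As a rough illustration of the approach, the following sketch brute-forces the ground state of a tiny Ising model in which historical days are clamped spins and the forecast day is a free spin. All fields and couplings are illustrative placeholders, not the fitted coefficients from the study, and classical enumeration stands in for the D-Wave annealer:

```python
import itertools

def ising_ground_state(h, J):
    """Brute-force the minimum-energy spin configuration of a small Ising
    model  E(s) = sum_i h[i]*s[i] + sum_{i<j} J[(i,j)]*s[i]*s[j],
    with spins s[i] in {-1, +1}.  Enumeration is feasible only for a
    handful of spins, matching the 2- to 5-qubit models described above."""
    n = len(h)
    best_s, best_e = None, float("inf")
    for s in itertools.product((-1, 1), repeat=n):
        e = sum(h[i] * s[i] for i in range(n))
        e += sum(J[(i, j)] * s[i] * s[j] for (i, j) in J)
        if e < best_e:
            best_s, best_e = s, e
    return best_s, best_e

def forecast_heat_wave(history_hot, h_forecast, coupling):
    """Three-qubit sketch: two spins are clamped to the last two days
    (+1 = hot, -1 = not hot) via strong local fields, the third spin is
    the 1-day-ahead forecast.  Coupling signs and magnitudes are
    hypothetical placeholders, not the paper's calibrated coefficients."""
    CLAMP = 10.0  # large field pins the historical spins to the data
    h = [-CLAMP * history_hot[0], -CLAMP * history_hot[1], h_forecast]
    J = {(0, 2): -coupling, (1, 2): -coupling}  # hot days pull the forecast hot
    s, _ = ising_ground_state(h, J)
    return s[2] == 1  # True -> heat-wave day predicted
```

On real hardware the annealer would sample low-energy configurations instead of enumerating them; the encoding of history and forecast into spins is the part this sketch illustrates.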

Keywords: heat wave, D-wave, forecast, Ising model, quantum computing

Procedia PDF Downloads 496
649 Exploring the Impact of Input Sequence Lengths on Long Short-Term Memory-Based Streamflow Prediction in Flashy Catchments

Authors: Farzad Hosseini Hossein Abadi, Cristina Prieto Sierra, Cesar Álvarez Díaz

Abstract:

Accurately predicting streamflow in flashy catchments prone to floods is a major research and operational challenge in hydrological modeling. Recent advances in deep learning, particularly Long Short-Term Memory (LSTM) networks, have shown promise for accurate hydrological predictions at daily and hourly time scales. In this work, a multi-timescale LSTM (MTS-LSTM) network was applied to regional hydrological prediction at an hourly time scale in flashy catchments. The case study includes 40 catchments located in the Basque Country, in the north of Spain. We explore the impact of hyperparameters on the performance of streamflow predictions from regional deep learning models through systematic hyperparameter tuning, identifying optimal regional values for different catchments. The results show that predictions are highly accurate, with Nash-Sutcliffe (NSE) and Kling-Gupta (KGE) metric values as high as 0.98 and 0.97, respectively. A principal component analysis reveals that the hyperparameter controlling the length of the input sequence contributes most significantly to prediction performance. The findings suggest that the input sequence length has a crucial impact on model performance. Moreover, catchment-scale analysis reveals distinct optimal sequence lengths for individual basins, highlighting the need to customize this hyperparameter to each catchment's characteristics. This aligns with the well-known "uniqueness of place" paradigm. In prior research, tuning the input sequence length of LSTMs has received limited attention in streamflow prediction: it was initially set to 365 days to capture a full annual water cycle, and later work, performing limited systematic tuning via grid search, revised it to 270 days. Despite the significance of this hyperparameter for hydrological predictions, however, studies have usually overlooked its tuning and fixed it at 365 days. This study, employing a simultaneous systematic hyperparameter tuning approach, emphasizes the critical role of the input sequence length as an influential hyperparameter in configuring LSTMs for regional streamflow prediction. Proper tuning of this hyperparameter is essential for achieving accurate hourly predictions with deep learning models.
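The sequence-length tuning described above can be sketched as a simple grid search. Here a moving-average surrogate replaces the MTS-LSTM (retraining a network per candidate length is the expensive step in the actual study), so the code illustrates only the tuning loop, not the model:

```python
def seq_length_score(series, seq_len):
    """Mean-squared error of a naive surrogate model that predicts the
    next value as the mean of the previous `seq_len` values.  This stands
    in for retraining an LSTM at each candidate input-sequence length."""
    errs = [(series[t] - sum(series[t - seq_len:t]) / seq_len) ** 2
            for t in range(seq_len, len(series))]
    return sum(errs) / len(errs)

def tune_sequence_length(series, candidates):
    """Grid search over candidate input-sequence lengths (e.g. the
    270- vs 365-step choices discussed above), returning the length
    that minimises validation error for this catchment."""
    return min(candidates, key=lambda L: seq_length_score(series, L))
```

The per-catchment loop in the paper would repeat this search independently for each basin, consistent with the distinct optimal lengths reported.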

Keywords: LSTMs, streamflow, hyperparameters, hydrology

Procedia PDF Downloads 69
648 A Study for Area-level Mosquito Abundance Prediction by Using Supervised Machine Learning Point-level Predictor

Authors: Theoktisti Makridou, Konstantinos Tsaprailis, George Arvanitakis, Charalampos Kontoes

Abstract:

In the literature, data-driven approaches to mosquito abundance prediction rely on supervised machine learning models trained with historical in-situ measurements. A drawback of this approach is that once a model is trained on point-level (specific x, y coordinates) measurements, its predictions again refer to the point level. These point-level predictions limit the applicability of such solutions, since many early-warning and mitigation applications need predictions at the area level, such as a municipality or village. In this study, we apply a data-driven predictive model that relies on open satellite Earth Observation and geospatial data and is trained with historical point-level in-situ measurements of mosquito abundance. We then propose a methodology to extend a point-level predictive model to a broader, area-level prediction. Our methodology relies on random spatial sampling of the area of interest (similar to a Poisson hard-core process), obtaining the EO and geomorphological information for each sample, making a point-wise prediction for each sample, and aggregating the predictions to represent the average mosquito abundance of the area. We quantify the performance of the transformation from point-level to area-level predictions and analyze which parameters affect it positively or negatively. The goal of this study is to propose a methodology that predicts the mosquito abundance of a given area by relying on point-level predictions, and to provide qualitative insights into the expected performance of the area-level prediction. We applied our methodology to historical data (for Culex pipiens) from two areas of interest (the Veneto region of Italy and Central Macedonia in Greece). In both cases, the results were consistent. The mean mosquito abundance of a given area can be estimated with accuracy similar to that of the point-level predictor, and sometimes even better. The density of the samples used to represent an area has a positive effect on performance, whereas the raw number of sampling points, without reference to the size of the area, is not informative about performance. Additionally, we found that the distance between the sampling points and the real in-situ measurements used for training did not strongly affect performance.
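The point-to-area aggregation step can be sketched as follows; `point_model` and `polygon_sampler` are hypothetical stand-ins for the trained point-level predictor and an area-of-interest sampler:

```python
import random

def area_level_prediction(point_model, polygon_sampler, n_samples,
                          min_dist=0.0, seed=0):
    """Aggregate a point-level predictor into an area-level estimate by
    averaging predictions at randomly sampled locations, rejecting points
    closer than `min_dist` to any accepted point, in the spirit of a
    Poisson hard-core process.  `point_model(x, y)` returns the point-wise
    abundance prediction; `polygon_sampler(rng)` draws a random (x, y)
    inside the area of interest."""
    rng = random.Random(seed)
    pts = []
    while len(pts) < n_samples:
        x, y = polygon_sampler(rng)
        if all((x - px) ** 2 + (y - py) ** 2 >= min_dist ** 2
               for px, py in pts):
            pts.append((x, y))
    preds = [point_model(x, y) for x, y in pts]
    return sum(preds) / len(preds)  # mean abundance over the area
```

The study's finding that sample density (not raw count) drives performance corresponds here to choosing `n_samples` in proportion to the polygon's area.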

Keywords: mosquito abundance, supervised machine learning, culex pipiens, spatial sampling, west nile virus, earth observation data

Procedia PDF Downloads 147
647 Partially-Averaged Navier-Stokes for Computations of Flow Around Three-Dimensional Ahmed Bodies

Authors: Maryam Mirzaei, Siniša Krajnović

Abstract:

The paper reports a study on the prediction of flows around simplified vehicles using Partially-Averaged Navier-Stokes (PANS). Numerical simulations are performed for two simplified vehicles: a slanted-back Ahmed body at Re = 30,000 and a square-back Ahmed body at Re = 300,000. A comparison of the resolved and modeled physical flow scales is made with corresponding LES and experimental data for a better understanding of the performance of the PANS model. The PANS model is compared at coarse and fine grid resolutions, and the results indicate that even a coarse-grid PANS simulation is able to produce flow predictions fairly close to those from a well-resolved LES simulation. The results also indicate the possibility of further improving the predictions by employing a finer grid resolution.
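A key ingredient of PANS is the unresolved-to-total kinetic-energy ratio f_k, which sets how much of the turbulence is modeled versus resolved on a given grid. One common a priori estimate ties f_k to the ratio of grid spacing to turbulence length scale; the scaling below is illustrative, as prefactors vary across the literature and the paper's exact choice is not reproduced here:

```python
def pans_fk(k, eps, delta):
    """A priori estimate of the PANS unresolved-to-total kinetic-energy
    ratio f_k from grid spacing `delta` and RANS turbulence quantities
    (k: turbulent kinetic energy, eps: dissipation rate).  Uses the common
    scaling f_k ~ (delta / Lambda)**(2/3) with Lambda = k**1.5 / eps;
    f_k -> 0 approaches a fully resolved (DNS-like) computation and
    f_k = 1 recovers RANS."""
    Lam = k ** 1.5 / eps  # integral length scale of the turbulence
    return min(1.0, (delta / Lam) ** (2.0 / 3.0))
```

This makes concrete why the coarse- and fine-grid PANS runs behave differently: refining the grid lowers f_k, shifting more of the flow physics from the model to the resolved scales.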

Keywords: partially-averaged Navier-Stokes, large eddy simulation, PANS, LES, Ahmed body

Procedia PDF Downloads 597
646 Patient-Specific Modeling Algorithm for Medical Data Based on AUC

Authors: Guilherme Ribeiro, Alexandre Oliveira, Antonio Ferreira, Shyam Visweswaran, Gregory Cooper

Abstract:

Patient-specific models are instance-based learning algorithms that take advantage of the particular features of the patient case at hand to predict an outcome. We introduce two patient-specific algorithms based on the decision-tree paradigm that use the AUC as the metric for selecting an attribute. We apply the patient-specific algorithms to predict outcomes in several datasets, including medical datasets. Compared to the entropy-based patient-specific decision path (PSDP) and CART methods, the AUC-based patient-specific decision path models performed equivalently on area under the ROC curve (AUC). Our results support patient-specific methods as a promising approach for making clinical predictions.
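A sketch of AUC-based attribute selection in the spirit of the paper (not its exact algorithm) might look like this, using the rank-sum formulation of the AUC:

```python
def auc(values, labels):
    """Area under the ROC curve via the rank-sum (Mann-Whitney)
    formulation: the probability that a randomly chosen positive case
    scores higher than a randomly chosen negative one (ties count 1/2)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def select_attribute(rows, labels):
    """Pick the attribute (column index) whose values best rank the
    outcome by AUC, scoring with max(AUC, 1 - AUC) so that inversely
    predictive attributes are also rewarded.  A hypothetical splitting
    criterion for illustration, not the paper's published procedure."""
    n_attr = len(rows[0])
    def score(j):
        a = auc([r[j] for r in rows], labels)
        return max(a, 1.0 - a)
    return max(range(n_attr), key=score)
```

In a decision-path learner this selection would be repeated on the subset of training cases matching the patient's feature values at each step.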

Keywords: instance-based approach, area under the ROC curve, patient-specific decision path, clinical predictions

Procedia PDF Downloads 477
645 A Non-Linear Eddy Viscosity Model for Turbulent Natural Convection in Geophysical Flows

Authors: J. P. Panda, K. Sasmal, H. V. Warrior

Abstract:

Eddy viscosity models in turbulence modeling can be broadly classified as linear or nonlinear. Linear formulations are simple and require fewer computational resources, but they have the disadvantage that they cannot predict the actual flow pattern in complex geophysical flows where streamline curvature and swirling motion are predominant. A constitutive equation for the Reynolds stress anisotropy is adopted for the formulation of the eddy viscosity, including all possible higher-order terms quadratic in the mean velocity gradients, and a simplified model is developed for actual oceanic flows in which only the vertical velocity gradients are important. The new model is incorporated into the one-dimensional General Ocean Turbulence Model (GOTM). Two realistic oceanic test cases (OWS Papa and FLEX'76) have been investigated. The new model's predictions match the observational data well and improve on those of the two-equation k-epsilon model. The proposed model can be easily incorporated into the three-dimensional Princeton Ocean Model (POM) to simulate a wide range of oceanic processes. In practice, this model can be implemented in coastal regions, where transverse shear induces higher vorticity, and for the prediction of flow in estuaries and lakes, where the depth is comparatively small. The model's predictions of marine turbulence and related quantities (e.g., sea surface temperature, surface heat flux, and vertical temperature profiles) can be utilized in short-term ocean and climate forecasting and warning systems.
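For reference, a quadratic constitutive relation of the kind described, following the standard tensor expansion of the anisotropy in the mean velocity gradients, takes the general form below; the coefficients are model-specific and are not reproduced from the paper:

```latex
% Reynolds stress anisotropy b_ij expanded to terms quadratic in the mean
% velocity gradients; S_ij is the mean strain-rate tensor, W_ij the mean
% rotation-rate tensor, and beta_1..beta_4 are model constants.
b_{ij} = \beta_1 \frac{k}{\varepsilon} S_{ij}
       + \beta_2 \frac{k^2}{\varepsilon^2}\left(S_{ik}S_{kj} - \tfrac{1}{3}S_{kl}S_{kl}\,\delta_{ij}\right)
       + \beta_3 \frac{k^2}{\varepsilon^2}\left(W_{ik}S_{kj} - S_{ik}W_{kj}\right)
       + \beta_4 \frac{k^2}{\varepsilon^2}\left(W_{ik}W_{kj} - \tfrac{1}{3}W_{kl}W_{kl}\,\delta_{ij}\right)
```

In the simplified oceanic case described above, only the vertical gradients of the horizontal velocities enter S_ij and W_ij, which eliminates most of the quadratic terms and makes the model inexpensive to evaluate inside GOTM.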

Keywords: eddy viscosity, turbulence modeling, GOTM, CFD

Procedia PDF Downloads 201
644 Primer Design for the Detection of Secondary Metabolite Biosynthetic Pathways in Metagenomic Data

Authors: Jeisson Alejandro Triana, Maria Fernanda Quiceno Vallejo, Patricia del Portillo, Juan Manuel Anzola

Abstract:

Most of the antimicrobials discovered so far are secondary metabolites. The potential for new natural products in this category increases as new microbial genomes and metagenomes are sequenced. Despite these advances, there is no systematic way to interrogate metagenomic clones for their potential to contain clusters of genes related to these pathways. Here we analyzed 52 biosynthetic pathways from the antiSMASH database at the protein-domain level in order to identify domains of high specificity and sensitivity with respect to specific biosynthetic pathways. These domains turned out to have various degrees of divergence at the DNA level. We propose PCR assays targeting such domains in silico and corroborated one of them by Sanger sequencing.
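A simplified sketch of primer-candidate screening against a conserved-domain sequence, using standard rule-of-thumb filters; real primer design (and presumably the authors' pipeline) also checks hairpins, primer dimers, and 3'-end stability:

```python
def gc_content(seq):
    """Fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)

def wallace_tm(seq):
    """Rule-of-thumb melting temperature (Wallace rule) for short
    oligonucleotides: Tm = 2*(A+T) + 4*(G+C) degrees C."""
    seq = seq.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    return 2 * at + 4 * gc

def candidate_primers(template, length=20, gc_range=(0.4, 0.6),
                      tm_range=(52, 62)):
    """Slide a window along a conserved-domain DNA sequence and keep the
    windows whose GC content and Tm fall in typical PCR design ranges.
    Returns (start_position, primer_sequence) pairs."""
    out = []
    for i in range(len(template) - length + 1):
        p = template[i:i + length]
        if (gc_range[0] <= gc_content(p) <= gc_range[1]
                and tm_range[0] <= wallace_tm(p) <= tm_range[1]):
            out.append((i, p))
    return out
```

Running such a filter over the low-divergence domain regions identified in the analysis yields a shortlist of primer pairs that can then be validated in vitro, as done here by Sanger sequencing.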

Keywords: bioinformatics, antiSMASH, antibiotics, secondary metabolites, natural products, protein domains

Procedia PDF Downloads 176
643 Bioinformatic Strategies for the Production of Glycoproteins in Algae

Authors: Fadi Saleh, Çığdem Sezer Zhmurov

Abstract:

Biopharmaceuticals represent one of the fastest-developing fields within biotechnology, and the biological macromolecules produced inside cells have a variety of therapeutic applications. In the past, mammalian cells, especially CHO cells, have been employed for the production of biopharmaceuticals because these cells can achieve human-like post-translational modifications (PTMs). These systems, however, carry notable disadvantages, such as high production costs, vulnerability to contamination, and limited scalability. This research focuses on the use of microalgae as a bioreactor system for the synthesis of biopharmaceutical glycoproteins with respect to PTMs, particularly N-glycosylation, reflecting a growing interest in microalgae as a potential substitute for more conventional expression systems. Microalgae offer a number of advantages, including rapid growth rates, the absence of common human pathogens, controlled scalability in bioreactors, and the capacity to perform some PTMs. The potential of microalgae to produce recombinant proteins with favorable characteristics thus makes them a promising platform for biopharmaceutical production. The study examines the N-glycosylation pathways of different species of microalgae. This investigation is important because N-glycosylation, the process by which carbohydrate groups are linked to proteins, profoundly influences the stability, activity, and overall performance of glycoproteins. Additionally, bioinformatics methodologies are employed to elucidate the genetic pathways involved in N-glycosylation in microalgae, with the intention of modifying these organisms to produce glycoproteins suitable for human use. In this way, a comparative analysis of the N-glycosylation pathways of humans and microalgae can be used to bridge the two systems and produce biopharmaceuticals with humanized glycosylation profiles in microalgal hosts. The results underline the potential of microalgae to overcome some of the limitations associated with traditional biopharmaceutical production systems. By genetically modifying microalgae to produce glycoproteins with human-compatible N-glycosylation, the study may contribute to a cost-effective and scalable means of producing high-quality biopharmaceuticals, benefiting the biopharmaceutical sector with a novel, green, and efficient expression platform. This work is therefore a thorough study of the viability of microalgae as an efficient platform for producing biopharmaceutical glycoproteins: based on an in-depth bioinformatic analysis of microalgal N-glycosylation pathways, it sets out a platform for engineering them to produce human-compatible glycoproteins. The findings have significant implications for the biopharmaceutical industry, opening up a new way of developing safer, more efficient, and more economically feasible biopharmaceutical manufacturing platforms.

Keywords: microalgae, glycoproteins, post-translational modification, genome

Procedia PDF Downloads 24
642 Hourly Solar Radiations Predictions for Anticipatory Control of Electrically Heated Floor: Use of Online Weather Conditions Forecast

Authors: Helene Thieblemont, Fariborz Haghighat

Abstract:

Energy storage systems play a crucial role in decreasing building energy consumption during peak periods and in expanding the use of renewable energies in buildings. To provide high building thermal performance, the energy storage system has to be properly controlled to ensure good energy performance while maintaining satisfactory thermal comfort for the building's occupants. In the case of passive discharge storage, the required amount of energy must be defined in advance to avoid overheating the building. Consequently, anticipatory supervisory control strategies have been developed that forecast future energy demand and production in order to coordinate systems. Such strategies rest on predictions, mainly of the weather. However, while forecast hourly outdoor temperatures can be found online with high accuracy, solar radiation predictions are usually not available online. To estimate them, this paper proposes an approach based on the forecast of weather conditions. Several methods for correlating hourly weather-condition forecasts with real hourly solar radiation are compared. Results show that using weather-condition forecasts allows next-day solar radiation to be estimated with acceptable accuracy. Moreover, this technique provides hourly data that may be used in building models. As a result, this solar radiation prediction model may help to implement model-based controllers such as Model Predictive Control.
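The correlation step can be sketched as a one-variable least-squares fit. Using cloud-cover fraction as the forecast predictor is an assumption made here for illustration, since the paper compares several ways of correlating forecast conditions with measured radiation:

```python
def fit_linear(xs, ys):
    """Ordinary least squares for y ~ a + b*x, closed form."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b  # intercept a, slope b

def predict_radiation(cloud_cover, a, b):
    """Hourly solar radiation (W/m2) estimated from a forecast cloud-cover
    fraction (0 = clear, 1 = overcast), clipped at zero.  The linear form
    is an illustrative placeholder for the correlation methods compared
    in the paper."""
    return max(0.0, a + b * cloud_cover)
```

The fitted coefficients would be calibrated per site on historical pairs of forecast conditions and measured radiation, then applied to the next day's online forecast to feed the anticipatory controller.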

Keywords: anticipatory control, model predictive control, solar radiation forecast, thermal storage

Procedia PDF Downloads 270
641 Stacking Ensemble Approach for Combining Different Methods in Real Estate Prediction

Authors: Sol Girouard, Zona Kostic

Abstract:

A home is often the largest and most expensive purchase a person makes. Whether the decision leads to a successful outcome is determined by a combination of critical factors. In this paper, we propose a method that efficiently handles all the factors in residential real estate and performs predictions over a feature space of high dimensionality while controlling for overfitting. The proposed method is built on gradient descent and boosting algorithms and uses a mixed optimization technique to improve predictive power. Since a single model usually cannot handle all the cases, our approach builds multiple models based on different subsets of the predictors. The algorithm was tested on 3 million homes across the U.S., and the experimental results demonstrate the efficiency of this approach, outperforming techniques currently used in price forecasting. As the real estate market changes daily, our proposed algorithm capitalizes on new events, allowing more efficient predictions.
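A minimal sketch of the stacking idea follows: non-negative blend weights for two base models are learned by a small coordinate search. The actual method builds multiple boosted models over different predictor subsets, and a real meta-learner would be fitted on out-of-fold predictions to control the overfitting the paper emphasises:

```python
def fit_stack_weights(base_preds, y):
    """Learn non-negative meta-weights for a stacked ensemble by a tiny
    coordinate search minimising squared error of the weighted blend.
    `base_preds` is a list of prediction lists, one per base model."""
    m = len(base_preds)
    w = [1.0 / m] * m

    def sse(weights):
        return sum((sum(weights[j] * base_preds[j][i] for j in range(m)) - yi) ** 2
                   for i, yi in enumerate(y))

    step = 0.25
    while step > 1e-4:
        improved = False
        for j in range(m):
            for d in (step, -step):
                cand = w[:]
                cand[j] = max(0.0, cand[j] + d)  # keep weights non-negative
                if sse(cand) < sse(w):
                    w = cand
                    improved = True
        if not improved:
            step /= 2  # refine the search once no move at this scale helps
    return w

def stack_predict(base_preds_row, w):
    """Blend one row of base-model predictions with the learned weights."""
    return sum(wj * pj for wj, pj in zip(w, base_preds_row))
```

In the full system each entry of `base_preds` would come from a gradient-boosted model trained on its own subset of the residential predictors.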

Keywords: real estate prediction, gradient descent, boosting, ensemble methods, active learning, training

Procedia PDF Downloads 274
640 Improving Predictions of Coastal Benthic Invertebrate Occurrence and Density Using a Multi-Scalar Approach

Authors: Stephanie Watson, Fabrice Stephenson, Conrad Pilditch, Carolyn Lundquist

Abstract:

Spatial data detailing both the distribution and density of functionally important marine species are needed to inform management decisions. Species distribution models (SDMs) have proven helpful in this regard; however, models often focus only on species occurrences derived from spatially expansive datasets and lack the resolution and detail required to inform regional management decisions. Boosted regression trees (BRT) were used to produce high-resolution SDMs (250 m) at two spatial scales, predicting probability of occurrence, abundance (count per sample unit), density (count per km²), and uncertainty for seven coastal seafloor taxa that vary in habitat use and distribution, in order to examine prediction differences and implications for coastal management. We investigated whether small-scale, regionally focused models (82,000 km²) can provide improved predictions compared with data-rich national-scale models (4.2 million km²). We explored the variability in predictions across model type (occurrence vs. abundance) and model scale to determine whether specific taxon models or model types are more robust to geographical variability. National-scale occurrence models correlated well with broad-scale environmental predictors, resulting in higher AUC (area under the receiver operating characteristic curve) and deviance-explained scores; however, they tended to overpredict in the coastal environment and lacked spatially differentiated detail for some taxa. Regional models had lower overall performance, but for some taxa the spatial predictions were more differentiated at a localised ecological scale. National density models were often spatially refined and highlighted areas of ecological relevance, producing more useful outputs than regional-scale models. A two-scale approach aids the selection of the most optimal combination of models to create a spatially informative density model, as results contrasted between model type and scale for specific taxa. It is vital, however, that robust predictions of occurrence and abundance are generated as inputs for the combined density model, as areas that do not spatially align between models can be discarded. This study demonstrates the variability in SDM outputs created over different geographical scales and highlights implications and opportunities for managers utilising these tools for regional conservation, particularly in data-limited environments.
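Boosted regression trees can be sketched with depth-1 stumps fitted to residuals. This toy version omits the bagging, tree-complexity tuning, and environmental predictor stack a production SDM workflow would use:

```python
def fit_stump(X, r):
    """Best depth-1 regression tree (stump) on residuals r: choose the
    feature/threshold split minimising squared error, predicting the mean
    residual on each side of the split."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [r[i] for i, row in enumerate(X) if row[j] <= t]
            right = [r[i] for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue
            ml, mr = sum(left) / len(left), sum(right) / len(right)
            sse = (sum((v - ml) ** 2 for v in left)
                   + sum((v - mr) ** 2 for v in right))
            if best is None or sse < best[0]:
                best = (sse, j, t, ml, mr)
    _, j, t, ml, mr = best
    return lambda row, j=j, t=t, ml=ml, mr=mr: ml if row[j] <= t else mr

def fit_brt(X, y, n_trees=50, lr=0.1):
    """Minimal gradient-boosted regression trees: each stump fits the
    current residuals and is added with shrinkage factor `lr`.  A toy
    stand-in for the BRT SDMs in the study."""
    f0 = sum(y) / len(y)
    stumps, pred = [], [f0] * len(y)
    for _ in range(n_trees):
        resid = [yi - pi for yi, pi in zip(y, pred)]
        s = fit_stump(X, resid)
        stumps.append(s)
        pred = [pi + lr * s(row) for pi, row in zip(pred, X)]
    return lambda row: f0 + lr * sum(s(row) for s in stumps)
```

For occurrence models the same boosting loop would be run on a classification loss; here a regression target stands in for the abundance and density responses.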

Keywords: benthic ecology, spatial modelling, multi-scalar modelling, marine conservation

Procedia PDF Downloads 75
639 Road Accidents Bigdata Mining and Visualization Using Support Vector Machines

Authors: Usha Lokala, Srinivas Nowduri, Prabhakar K. Sharma

Abstract:

Useful information has been extracted from road accident data in the United Kingdom (UK) using data analytics methods, with the aim of avoiding possible accidents in rural and urban areas. The analysis makes use of several methodologies, including data integration, support vector machines (SVM), correlation machines, and multinomial goodness-of-fit measures. The datasets were imported from the UK traffic department with due permission, and the information extracted from them forms a basis for several predictions. Since the data is expected to grow continuously over time, this work primarily proposes a new framework model which can be trained on, and adapt itself to, new data and make accurate predictions. The work also throws some light on the use of the SVM methodology for text classification on the obtained traffic data. Finally, it emphasizes the uniqueness and adaptability of the SVM methodology for this kind of research.
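A minimal linear soft-margin SVM, fitted by subgradient descent on the hinge loss, illustrates the classifier family used; the features and data here are synthetic placeholders, not the UK accident records:

```python
def train_linear_svm(X, y, lam=0.001, lr=0.05, epochs=2000):
    """Linear soft-margin SVM fitted by full-batch subgradient descent on
    the L2-regularised hinge loss; labels y must be in {-1, +1}.  A sketch
    of the SVM classifiers applied to the accident data, not the production
    pipeline (which would also handle text features)."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        gw, gb = [lam * wj for wj in w], 0.0  # regularisation gradient
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge-loss subgradient for violating points
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b

def svm_predict(w, b, x):
    """Sign of the decision function: +1 or -1."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

In the framework proposed above, such a model would be periodically retrained as new accident records arrive, which is what allows it to adapt to growing data.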

Keywords: support vector machines (SVM), machine learning (ML), department for transport (DfT)

Procedia PDF Downloads 274