Search results for: seismic prediction equations
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 4776

Search results for: seismic prediction equations

2886 Method to Find a ε-Optimal Control of Stochastic Differential Equation Driven by a Brownian Motion

Authors: Francys Souza, Alberto Ohashi, Dorival Leao

Abstract:

We present a general solution for finding the ε-optimal controls for non-Markovian stochastic systems as stochastic differential equations driven by Brownian motion, which is a problem recognized as a difficult solution. The contribution appears in the development of mathematical tools to deal with modeling and control of non-Markovian systems, whose applicability in different areas is well known. The methodology used consists to discretize the problem through a random discretization. In this way, we transform an infinite dimensional problem in a finite dimensional, thereafter we use measurable selection arguments, to find a control on an explicit form for the discretized problem. Then, we prove the control found for the discretized problem is a ε-optimal control for the original problem. Our theory provides a concrete description of a rather general class, among the principals, we can highlight financial problems such as portfolio control, hedging, super-hedging, pairs-trading and others. Therefore, our main contribution is the development of a tool to explicitly the ε-optimal control for non-Markovian stochastic systems. The pathwise analysis was made through a random discretization jointly with measurable selection arguments, has provided us with a structure to transform an infinite dimensional problem into a finite dimensional. The theory is applied to stochastic control problems based on path-dependent stochastic differential equations, where both drift and diffusion components are controlled. We are able to explicitly show optimal control with our method.

Keywords: dynamic programming equation, optimal control, stochastic control, stochastic differential equation

Procedia PDF Downloads 188
2885 Time and Cost Prediction Models for Language Classification Over a Large Corpus on Spark

Authors: Jairson Barbosa Rodrigues, Paulo Romero Martins Maciel, Germano Crispim Vasconcelos

Abstract:

This paper presents an investigation of the performance impacts regarding the variation of five factors (input data size, node number, cores, memory, and disks) when applying a distributed implementation of Naïve Bayes for text classification of a large Corpus on the Spark big data processing framework. Problem: The algorithm's performance depends on multiple factors, and knowing before-hand the effects of each factor becomes especially critical as hardware is priced by time slice in cloud environments. Objectives: To explain the functional relationship between factors and performance and to develop linear predictor models for time and cost. Methods: the solid statistical principles of Design of Experiments (DoE), particularly the randomized two-level fractional factorial design with replications. This research involved 48 real clusters with different hardware arrangements. The metrics were analyzed using linear models for screening, ranking, and measurement of each factor's impact. Results: Our findings include prediction models and show some non-intuitive results about the small influence of cores and the neutrality of memory and disks on total execution time, and the non-significant impact of data input scale on costs, although notably impacts the execution time.

Keywords: big data, design of experiments, distributed machine learning, natural language processing, spark

Procedia PDF Downloads 120
2884 Easymodel: Web-based Bioinformatics Software for Protein Modeling Based on Modeller

Authors: Alireza Dantism

Abstract:

Presently, describing the function of a protein sequence is one of the most common problems in biology. Usually, this problem can be facilitated by studying the three-dimensional structure of proteins. In the absence of a protein structure, comparative modeling often provides a useful three-dimensional model of the protein that is dependent on at least one known protein structure. Comparative modeling predicts the three-dimensional structure of a given protein sequence (target) mainly based on its alignment with one or more proteins of known structure (templates). Comparative modeling consists of four main steps 1. Similarity between the target sequence and at least one known template structure 2. Alignment of target sequence and template(s) 3. Build a model based on alignment with the selected template(s). 4. Prediction of model errors 5. Optimization of the built model There are many computer programs and web servers that automate the comparative modeling process. One of the most important advantages of these servers is that it makes comparative modeling available to both experts and non-experts, and they can easily do their own modeling without the need for programming knowledge, but some other experts prefer using programming knowledge and do their modeling manually because by doing this they can maximize the accuracy of their modeling. In this study, a web-based tool has been designed to predict the tertiary structure of proteins using PHP and Python programming languages. This tool is called EasyModel. EasyModel can receive, according to the user's inputs, the desired unknown sequence (which we know as the target) in this study, the protein sequence file (template), etc., which also has a percentage of similarity with the primary sequence, and its third structure Predict the unknown sequence and present the results in the form of graphs and constructed protein files.

Keywords: structural bioinformatics, protein tertiary structure prediction, modeling, comparative modeling, modeller

Procedia PDF Downloads 97
2883 Use of Front-Face Fluorescence Spectroscopy and Multiway Analysis for the Prediction of Olive Oil Quality Features

Authors: Omar Dib, Rita Yaacoub, Luc Eveleigh, Nathalie Locquet, Hussein Dib, Ali Bassal, Christophe B. Y. Cordella

Abstract:

The potential of front-face fluorescence coupled with chemometric techniques, namely parallel factor analysis (PARAFAC) and multiple linear regression (MLR) as a rapid analysis tool to characterize Lebanese virgin olive oils was investigated. Fluorescence fingerprints were acquired directly on 102 Lebanese virgin olive oil samples in the range of 280-540 nm in excitation and 280-700 nm in emission. A PARAFAC model with seven components was considered optimal with a residual of 99.64% and core consistency value of 78.65. The model revealed seven main fluorescence profiles in olive oil and was mainly associated with tocopherols, polyphenols, chlorophyllic compounds and oxidation/hydrolysis products. 23 MLR regression models based on PARAFAC scores were generated, the majority of which showed a good correlation coefficient (R > 0.7 for 12 predicted variables), thus satisfactory prediction performances. Acid values, peroxide values, and Delta K had the models with the highest predictions, with R values of 0.89, 0.84 and 0.81 respectively. Among fatty acids, linoleic and oleic acids were also highly predicted with R values of 0.8 and 0.76, respectively. Factors contributing to the model's construction were related to common fluorophores found in olive oil, mainly chlorophyll, polyphenols, and oxidation products. This study demonstrates the interest of front-face fluorescence as a promising tool for quality control of Lebanese virgin olive oils.

Keywords: front-face fluorescence, Lebanese virgin olive oils, multiple Linear regressions, PARAFAC analysis

Procedia PDF Downloads 453
2882 Deep Learning Framework for Predicting Bus Travel Times with Multiple Bus Routes: A Single-Step Multi-Station Forecasting Approach

Authors: Muhammad Ahnaf Zahin, Yaw Adu-Gyamfi

Abstract:

Bus transit is a crucial component of transportation networks, especially in urban areas. Any intelligent transportation system must have accurate real-time information on bus travel times since it minimizes waiting times for passengers at different stations along a route, improves service reliability, and significantly optimizes travel patterns. Bus agencies must enhance the quality of their information service to serve their passengers better and draw in more travelers since people waiting at bus stops are frequently anxious about when the bus will arrive at their starting point and when it will reach their destination. For solving this issue, different models have been developed for predicting bus travel times recently, but most of them are focused on smaller road networks due to their relatively subpar performance in high-density urban areas on a vast network. This paper develops a deep learning-based architecture using a single-step multi-station forecasting approach to predict average bus travel times for numerous routes, stops, and trips on a large-scale network using heterogeneous bus transit data collected from the GTFS database. Over one week, data was gathered from multiple bus routes in Saint Louis, Missouri. In this study, Gated Recurrent Unit (GRU) neural network was followed to predict the mean vehicle travel times for different hours of the day for multiple stations along multiple routes. Historical time steps and prediction horizon were set up to 5 and 1, respectively, which means that five hours of historical average travel time data were used to predict average travel time for the following hour. The spatial and temporal information and the historical average travel times were captured from the dataset for model input parameters. As adjacency matrices for the spatial input parameters, the station distances and sequence numbers were used, and the time of day (hour) was considered for the temporal inputs. Other inputs, including volatility information such as standard deviation and variance of journey durations, were also included in the model to make it more robust. The model's performance was evaluated based on a metric called mean absolute percentage error (MAPE). The observed prediction errors for various routes, trips, and stations remained consistent throughout the day. The results showed that the developed model could predict travel times more accurately during peak traffic hours, having a MAPE of around 14%, and performed less accurately during the latter part of the day. In the context of a complicated transportation network in high-density urban areas, the model showed its applicability for real-time travel time prediction of public transportation and ensured the high quality of the predictions generated by the model.

Keywords: gated recurrent unit, mean absolute percentage error, single-step forecasting, travel time prediction.

Procedia PDF Downloads 72
2881 Simulation of Glass Breakage Using Voronoi Random Field Tessellations

Authors: Michael A. Kraus, Navid Pourmoghaddam, Martin Botz, Jens Schneider, Geralt Siebert

Abstract:

Fragmentation analysis of tempered glass gives insight into the quality of the tempering process and defines a certain degree of safety as well. Different standard such as the European EN 12150-1 or the American ASTM C 1048/CPSC 16 CFR 1201 define a minimum number of fragments required for soda-lime safety glass on the basis of fragmentation test results for classification. This work presents an approach for the glass breakage pattern prediction using a Voronoi Tesselation over Random Fields. The random Voronoi tessellation is trained with and validated against data from several breakage patterns. The fragments in observation areas of 50 mm x 50 mm were used for training and validation. All glass specimen used in this study were commercially available soda-lime glasses at three different thicknesses levels of 4 mm, 8 mm and 12 mm. The results of this work form a Bayesian framework for the training and prediction of breakage patterns of tempered soda-lime glass using a Voronoi Random Field Tesselation. Uncertainties occurring in this process can be well quantified, and several statistical measures of the pattern can be preservation with this method. Within this work it was found, that different Random Fields as basis for the Voronoi Tesselation lead to differently well fitted statistical properties of the glass breakage patterns. As the methodology is derived and kept general, the framework could be also applied to other random tesselations and crack pattern modelling purposes.

Keywords: glass breakage predicition, Voronoi Random Field Tessellation, fragmentation analysis, Bayesian parameter identification

Procedia PDF Downloads 160
2880 Behavior of Beam-Column Nodes Reinforced Concrete in Earthquake Zones

Authors: Zaidour Mohamed, Ghalem Ali Jr., Achit Henni Mohamed

Abstract:

This project is destined to study pole junctions of reinforced concrete beams subjected to seismic loads. A literature review was made to clarify the work done by researchers in the last three decades and especially the results of the last two years that were studied for the determination of the method of calculating the transverse reinforcement in the different nodes of a structure. For implementation efforts in the columns and beams of a building R + 4 in zone 3 were calculated using the finite element method through software. These results are the basis of our work which led to the calculation of the transverse reinforcement of the nodes of the structure in question.

Keywords: beam–column joints, cyclic loading, shearing force, damaged joint

Procedia PDF Downloads 550
2879 Artificial Neural Network in Ultra-High Precision Grinding of Borosilicate-Crown Glass

Authors: Goodness Onwuka, Khaled Abou-El-Hossein

Abstract:

Borosilicate-crown (BK7) glass has found broad application in the optic and automotive industries and the growing demands for nanometric surface finishes is becoming a necessity in such applications. Thus, it has become paramount to optimize the parameters influencing the surface roughness of this precision lens. The research was carried out on a 4-axes Nanoform 250 precision lathe machine with an ultra-high precision grinding spindle. The experiment varied the machining parameters of feed rate, wheel speed and depth of cut at three levels for different combinations using Box Behnken design of experiment and the resulting surface roughness values were measured using a Taylor Hobson Dimension XL optical profiler. Acoustic emission monitoring technique was applied at a high sampling rate to monitor the machining process while further signal processing and feature extraction methods were implemented to generate the input to a neural network algorithm. This paper highlights the training and development of a back propagation neural network prediction algorithm through careful selection of parameters and the result show a better classification accuracy when compared to a previously developed response surface model with very similar machining parameters. Hence artificial neural network algorithms provide better surface roughness prediction accuracy in the ultra-high precision grinding of BK7 glass.

Keywords: acoustic emission technique, artificial neural network, surface roughness, ultra-high precision grinding

Procedia PDF Downloads 305
2878 A Machine Learning Approach for Performance Prediction Based on User Behavioral Factors in E-Learning Environments

Authors: Naduni Ranasinghe

Abstract:

E-learning environments are getting more popular than any other due to the impact of COVID19. Even though e-learning is one of the best solutions for the teaching-learning process in the academic process, it’s not without major challenges. Nowadays, machine learning approaches are utilized in the analysis of how behavioral factors lead to better adoption and how they related to better performance of the students in eLearning environments. During the pandemic, we realized the academic process in the eLearning approach had a major issue, especially for the performance of the students. Therefore, an approach that investigates student behaviors in eLearning environments using a data-intensive machine learning approach is appreciated. A hybrid approach was used to understand how each previously told variables are related to the other. A more quantitative approach was used referred to literature to understand the weights of each factor for adoption and in terms of performance. The data set was collected from previously done research to help the training and testing process in ML. Special attention was made to incorporating different dimensionality of the data to understand the dependency levels of each. Five independent variables out of twelve variables were chosen based on their impact on the dependent variable, and by considering the descriptive statistics, out of three models developed (Random Forest classifier, SVM, and Decision tree classifier), random forest Classifier (Accuracy – 0.8542) gave the highest value for accuracy. Overall, this work met its goals of improving student performance by identifying students who are at-risk and dropout, emphasizing the necessity of using both static and dynamic data.

Keywords: academic performance prediction, e learning, learning analytics, machine learning, predictive model

Procedia PDF Downloads 157
2877 Effect of Mach Number for Gust-Airfoil Interatcion Noise

Authors: ShuJiang Jiang

Abstract:

The interaction of turbulence with airfoil is an important noise source in many engineering fields, including helicopters, turbofan, and contra-rotating open rotor engines, where turbulence generated in the wake of upstream blades interacts with the leading edge of downstream blades and produces aerodynamic noise. One approach to study turbulence-airfoil interaction noise is to model the oncoming turbulence as harmonic gusts. A compact noise source produces a dipole-like sound directivity pattern. However, when the acoustic wavelength is much smaller than the airfoil chord length, the airfoil needs to be treated as a non-compact source, and the gust-airfoil interaction becomes more complicated and results in multiple lobes generated in the radiated sound directivity. Capturing the short acoustic wavelength is a challenge for numerical simulations. In this work, simulations are performed for gust-airfoil interaction at different Mach numbers, using a high-fidelity direct Computational AeroAcoustic (CAA) approach based on a spectral/hp element method, verified by a CAA benchmark case. It is found that the squared sound pressure varies approximately as the 5th power of Mach number, which changes slightly with the observer location. This scaling law can give a better sound prediction than the flat-plate theory for thicker airfoils. Besides, another prediction method, based on the flat-plate theory and CAA simulation, has been proposed to give better predictions than the scaling law for thicker airfoils.

Keywords: aeroacoustics, gust-airfoil interaction, CFD, CAA

Procedia PDF Downloads 78
2876 Creep Analysis and Rupture Evaluation of High Temperature Materials

Authors: Yuexi Xiong, Jingwu He

Abstract:

The structural components in an energy facility such as steam turbine machines are operated under high stress and elevated temperature in an endured time period and thus the creep deformation and creep rupture failure are important issues that need to be addressed in the design of such components. There are numerous creep models being used for creep analysis that have both advantages and disadvantages in terms of accuracy and efficiency. The Isochronous Creep Analysis is one of the simplified approaches in which a full-time dependent creep analysis is avoided and instead an elastic-plastic analysis is conducted at each time point. This approach has been established based on the rupture dependent creep equations using the well-known Larson-Miller parameter. In this paper, some fundamental aspects of creep deformation and the rupture dependent creep models are reviewed and the analysis procedures using isochronous creep curves are discussed. Four rupture failure criteria are examined from creep fundamental perspectives including criteria of Stress Damage, Strain Damage, Strain Rate Damage, and Strain Capability. The accuracy of these criteria in predicting creep life is discussed and applications of the creep analysis procedures and failure predictions of simple models will be presented. In addition, a new failure criterion is proposed to improve the accuracy and effectiveness of the existing criteria. Comparisons are made between the existing criteria and the new one using several examples materials. Both strain increase and stress relaxation form a full picture of the creep behaviour of a material under high temperature in an endured time period. It is important to bear this in mind when dealing with creep problems. Accordingly there are two sets of rupture dependent creep equations. While the rupture strength vs LMP equation shows how the rupture time depends on the stress level under load controlled condition, the strain rate vs rupture time equation reflects how the rupture time behaves under strain-controlled condition. Among the four existing failure criteria for rupture life predictions, the Stress Damage and Strain Damage Criteria provide the most conservative and non-conservative predictions, respectively. The Strain Rate and Strain Capability Criteria provide predictions in between that are believed to be more accurate because the strain rate and strain capability are more determined quantities than stress to reflect the creep rupture behaviour. A modified Strain Capability Criterion is proposed making use of the two sets of creep equations and therefore is considered to be more accurate than the original Strain Capability Criterion.

Keywords: creep analysis, high temperature mateials, rapture evalution, steam turbine machines

Procedia PDF Downloads 290
2875 Chemical Reaction, Heat and Mass Transfer on Unsteady MHD Flow along a Vertical Stretching Sheet with Heat Generation/Absorption and Variable Viscosity

Authors: Jatindra Lahkar

Abstract:

The effect of chemical reaction on laminar mixed convection flow and heat and mass transfer along a vertical unsteady stretching sheet is investigated, in the presence of heat generation/absorption with variable viscosity and viscous dissipation. The governing non-linear partial differential equations are reduced to ordinary differential equations using similarity transformation and solved numerically using the fourth order Runge-Kutta method along with shooting technique. The effects of various flow parameters on the velocity, temperature and concentration distributions are analyzed and presented graphically. Skin-friction coefficient, Nusselt number and Sherwood number are derived at the sheet. It is observed that the influence of chemical reaction, the fluid flow along the sheet accelerate with the increase of chemical reaction parameter, on the other hand, temperature of the fluid increases with increase of chemical reaction parameter but concentration of the fluid reduces with it. The boundary layer decreases on the surface of the sheet for all values of unsteadiness parameter, increasing values of the chemical reaction parameter. The increases in the values of Sc cause the species concentration and its boundary layer thickness to decrease resulting in less induced flow and higher fluid temperatures. This is depicted in the decreases in the velocity and species concentration and increases in the fluid temperature as Sc increases.

Keywords: chemical reaction, heat generation/absorption, magnetic number, unsteadiness, variable viscosity

Procedia PDF Downloads 307
2874 An Object-Oriented Modelica Model of the Water Level Swell during Depressurization of the Reactor Pressure Vessel of the Boiling Water Reactor

Authors: Rafal Bryk, Holger Schmidt, Thomas Mull, Ingo Ganzmann, Oliver Herbst

Abstract:

Prediction of the two-phase water mixture level during fast depressurization of the Reactor Pressure Vessel (RPV) resulting from an accident scenario is an important issue from the view point of the reactor safety. Since the level swell may influence the behavior of some passive safety systems, it has been recognized that an assumption which at the beginning may be considered as a conservative one, not necessary leads to a conservative result. This paper discusses outcomes obtained during simulations of the water dynamics and heat transfer during sudden depressurization of a vessel filled up to a certain level with liquid water under saturation conditions and with the rest of the vessel occupied by saturated steam. In case of the pressure decrease e.g. due to the main steam line break, the liquid water evaporates abruptly, being a reason thereby, of strong transients in the vessel. These transients and the sudden emergence of void in the region occupied at the beginning by liquid, cause elevation of the two-phase mixture. In this work, several models calculating the water collapse and swell levels are presented and validated against experimental data. Each of the models uses different approach to calculate void fraction. The object-oriented models were developed with the Modelica modelling language and the OpenModelica environment. The models represent the RPV of the Integral Test Facility Karlstein (INKA) – a dedicated test rig for simulation of KERENA – a new Boiling Water Reactor design of Framatome. The models are based on dynamic mass and energy equations. They are divided into several dynamic volumes in each of which, the fluid may be single-phase liquid, steam or a two-phase mixture. The heat transfer between the wall of the vessel and the fluid is taken into account. Additional heat flow rate may be applied to the first volume of the vessel in order to simulate the decay heat of the reactor core in a similar manner as it is simulated at INKA. The comparison of the simulations results against the reference data shows a good agreement.

Keywords: boiling water reactor, level swell, Modelica, RPV depressurization, thermal-hydraulics

Procedia PDF Downloads 210
2873 A Prediction Method of Pollutants Distribution Pattern: Flare Motion Using Computational Fluid Dynamics (CFD) Fluent Model with Weather Research Forecast Input Model during Transition Season

Authors: Benedictus Asriparusa, Lathifah Al Hakimi, Aulia Husada

Abstract:

A large amount of energy is being wasted by the release of natural gas associated with the oil industry. This release interrupts the environment particularly atmosphere layer condition globally which contributes to global warming impact. This research presents an overview of the methods employed by researchers in PT. Chevron Pacific Indonesia in the Minas area to determine a new prediction method of measuring and reducing gas flaring and its emission. The method emphasizes advanced research which involved analytical studies, numerical studies, modeling, and computer simulations, amongst other techniques. A flaring system is the controlled burning of natural gas in the course of routine oil and gas production operations. This burning occurs at the end of a flare stack or boom. The combustion process releases emissions of greenhouse gases such as NO2, CO2, SO2, etc. This condition will affect the chemical composition of air and environment around the boundary layer mainly during transition season. Transition season in Indonesia is absolutely very difficult condition to predict its pattern caused by the difference of two air mass conditions. This paper research focused on transition season in 2013. A simulation to create the new pattern of the pollutants distribution is needed. This paper has outlines trends in gas flaring modeling and current developments to predict the dominant variables in the pollutants distribution. A Fluent model is used to simulate the distribution of pollutants gas coming out of the stack, whereas WRF model output is used to overcome the limitations of the analysis of meteorological data and atmospheric conditions in the study area. Based on the running model, the most influence factor was wind speed. The goal of the simulation is to predict the new pattern based on the time of fastest wind and slowest wind occurs for pollutants distribution. According to the simulation results, it can be seen that the fastest wind (last of March) moves pollutants in a horizontal direction and the slowest wind (middle of May) moves pollutants vertically. Besides, the design of flare stack in compliance according to EPA Oil and Gas Facility Stack Parameters likely shows pollutants concentration remains on the under threshold NAAQS (National Ambient Air Quality Standards).

Keywords: flare motion, new prediction, pollutants distribution, transition season, WRF model

Procedia PDF Downloads 556
2872 Improved Soil and Snow Treatment with the Rapid Update Cycle Land-Surface Model for Regional and Global Weather Predictions

Authors: Tatiana G. Smirnova, Stan G. Benjamin

Abstract:

Rapid Update Cycle (RUC) land surface model (LSM) was a land-surface component in several generations of operational weather prediction models at the National Center for Environment Prediction (NCEP) at the National Oceanic and Atmospheric Administration (NOAA). It was designed for short-range weather predictions with an emphasis on severe weather and originally was intentionally simple to avoid uncertainties from poorly known parameters. Nevertheless, the RUC LSM, when coupled with the hourly-assimilating atmospheric model, can produce a realistic evolution of time-varying soil moisture and temperature, as well as the evolution of snow cover on the ground surface. This result is possible only if the soil/vegetation/snow component of the coupled weather prediction model has sufficient skill to avoid long-term drift. RUC LSM was first implemented in the operational NCEP Rapid Update Cycle (RUC) weather model in 1998 and later in the Weather Research Forecasting Model (WRF)-based Rapid Refresh (RAP) and High-resolution Rapid Refresh (HRRR). Being available to the international WRF community, it was implemented in operational weather models in Austria, New Zealand, and Switzerland. Based on the feedback from the US weather service offices and the international WRF community and also based on our own validation, RUC LSM has matured over the years. Also, a sea-ice module was added to RUC LSM for surface predictions over the Arctic sea-ice. Other modifications include refinements to the snow model and a more accurate specification of albedo, roughness length, and other surface properties. At present, RUC LSM is being tested in the regional application of the Unified Forecast System (UFS). The next generation UFS-based regional Rapid Refresh FV3 Standalone (RRFS) model will replace operational RAP and HRRR at NCEP. Over time, RUC LSM participated in several international model intercomparison projects to verify its skill using observed atmospheric forcing. The ESM-SnowMIP was the last of these experiments focused on the verification of snow models for open and forested regions. The simulations were performed for ten sites located in different climatic zones of the world forced with observed atmospheric conditions. While most of the 26 participating models have more sophisticated snow parameterizations than in RUC, RUC LSM got a high ranking in simulations of both snow water equivalent and surface temperature. However, ESM-SnowMIP experiment also revealed some issues in the RUC snow model, which will be addressed in this paper. One of them is the treatment of grid cells partially covered with snow. RUC snow module computes energy and moisture budgets of snow-covered and snow-free areas separately by aggregating the solutions at the end of each time step. Such treatment elevates the importance of computing in the model snow cover fraction. Improvements to the original simplistic threshold-based approach have been implemented and tested both offline and in the coupled weather model. The detailed description of changes to the snow cover fraction and other modifications to RUC soil and snow parameterizations will be described in this paper.

Keywords: land-surface models, weather prediction, hydrology, boundary-layer processes

Procedia PDF Downloads 88
2871 Two-Dimensional Observation of Oil Displacement by Water in a Petroleum Reservoir through Numerical Simulation and Application to a Petroleum Reservoir

Authors: Ahmad Fahim Nasiry, Shigeo Honma

Abstract:

We examine two-dimensional oil displacement by water in a petroleum reservoir. The pore fluid is immiscible, and the porous media is homogenous and isotropic in the horizontal direction. Buckley-Leverett theory and a combination of Laplacian and Darcy’s law are used to study the fluid flow through porous media, and the Laplacian that defines the dispersion and diffusion of fluid in the sand using heavy oil is discussed. The reservoir is homogenous in the horizontal direction, as expressed by the partial differential equation. Two main factors which are observed are the water saturation and pressure distribution in the reservoir, and they are evaluated for predicting oil recovery in two dimensions by a physical and mathematical simulation model. We review the numerical simulation that solves difficult partial differential reservoir equations. Based on the numerical simulations, the saturation and pressure equations are calculated by the iterative alternating direction implicit method and the iterative alternating direction explicit method, respectively, according to the finite difference assumption. However, to understand the displacement of oil by water and the amount of water dispersion in the reservoir better, an interpolated contour line of the water distribution of the five-spot pattern, that provides an approximate solution which agrees well with the experimental results, is also presented. Finally, a computer program is developed to calculate the equation for pressure and water saturation and to draw the pressure contour line and water distribution contour line for the reservoir.

Keywords: numerical simulation, immiscible, finite difference, IADI, IDE, waterflooding

Procedia PDF Downloads 331
2870 Big Data in Telecom Industry: Effective Predictive Techniques on Call Detail Records

Authors: Sara ElElimy, Samir Moustafa

Abstract:

Mobile network operators start to face many challenges in the digital era, especially with high demands from customers. Since mobile network operators are considered a source of big data, traditional techniques are not effective with new era of big data, Internet of things (IoT) and 5G; as a result, handling effectively different big datasets becomes a vital task for operators with the continuous growth of data and moving from long term evolution (LTE) to 5G. So, there is an urgent need for effective Big data analytics to predict future demands, traffic, and network performance to full fill the requirements of the fifth generation of mobile network technology. In this paper, we introduce data science techniques using machine learning and deep learning algorithms: the autoregressive integrated moving average (ARIMA), Bayesian-based curve fitting, and recurrent neural network (RNN) are employed for a data-driven application to mobile network operators. The main framework included in models are identification parameters of each model, estimation, prediction, and final data-driven application of this prediction from business and network performance applications. These models are applied to Telecom Italia Big Data challenge call detail records (CDRs) datasets. The performance of these models is found out using a specific well-known evaluation criteria shows that ARIMA (machine learning-based model) is more accurate as a predictive model in such a dataset than the RNN (deep learning model).

Keywords: big data analytics, machine learning, CDRs, 5G

Procedia PDF Downloads 139
2869 Predicting Costs in Construction Projects with Machine Learning: A Detailed Study Based on Activity-Level Data

Authors: Soheila Sadeghi

Abstract:

Construction projects are complex and often subject to significant cost overruns due to the multifaceted nature of the activities involved. Accurate cost estimation is crucial for effective budget planning and resource allocation. Traditional methods for predicting overruns often rely on expert judgment or analysis of historical data, which can be time-consuming, subjective, and may fail to consider important factors. However, with the increasing availability of data from construction projects, machine learning techniques can be leveraged to improve the accuracy of overrun predictions. This study applied machine learning algorithms to enhance the prediction of cost overruns in a case study of a construction project. The methodology involved the development and evaluation of two machine learning models: Random Forest and Neural Networks. Random Forest can handle high-dimensional data, capture complex relationships, and provide feature importance estimates. Neural Networks, particularly Deep Neural Networks (DNNs), are capable of automatically learning and modeling complex, non-linear relationships between input features and the target variable. These models can adapt to new data, reduce human bias, and uncover hidden patterns in the dataset. The findings of this study demonstrate that both Random Forest and Neural Networks can significantly improve the accuracy of cost overrun predictions compared to traditional methods. The Random Forest model also identified key cost drivers and risk factors, such as changes in the scope of work and delays in material delivery, which can inform better project risk management. However, the study acknowledges several limitations. First, the findings are based on a single construction project, which may limit the generalizability of the results to other projects or contexts. Second, the dataset, although comprehensive, may not capture all relevant factors influencing cost overruns, such as external economic conditions or political factors. Third, the study focuses primarily on cost overruns, while schedule overruns are not explicitly addressed. Future research should explore the application of machine learning techniques to a broader range of projects, incorporate additional data sources, and investigate the prediction of both cost and schedule overruns simultaneously.

Keywords: cost prediction, machine learning, project management, random forest, neural networks

Procedia PDF Downloads 56
2868 Effective Stacking of Deep Neural Models for Automated Object Recognition in Retail Stores

Authors: Ankit Sinha, Soham Banerjee, Pratik Chattopadhyay

Abstract:

Automated product recognition in retail stores is an important real-world application in the domain of Computer Vision and Pattern Recognition. In this paper, we consider the problem of automatically identifying the classes of the products placed on racks in retail stores from an image of the rack and information about the query/product images. We improve upon the existing approaches in terms of effectiveness and memory requirement by developing a two-stage object detection and recognition pipeline comprising of a Faster-RCNN-based object localizer that detects the object regions in the rack image and a ResNet-18-based image encoder that classifies the detected regions into the appropriate classes. Each of the models is fine-tuned using appropriate data sets for better prediction and data augmentation is performed on each query image to prepare an extensive gallery set for fine-tuning the ResNet-18-based product recognition model. This encoder is trained using a triplet loss function following the strategy of online-hard-negative-mining for improved prediction. The proposed models are lightweight and can be connected in an end-to-end manner during deployment to automatically identify each product object placed in a rack image. Extensive experiments using Grozi-32k and GP-180 data sets verify the effectiveness of the proposed model.

Keywords: retail stores, faster-RCNN, object localization, ResNet-18, triplet loss, data augmentation, product recognition

Procedia PDF Downloads 156
2867 Feature Analysis of Predictive Maintenance Models

Authors: Zhaoan Wang

Abstract:

Research in predictive maintenance modeling has improved in the recent years to predict failures and needed maintenance with high accuracy, saving cost and improving manufacturing efficiency. However, classic prediction models provide little valuable insight towards the most important features contributing to the failure. By analyzing and quantifying feature importance in predictive maintenance models, cost saving can be optimized based on business goals. First, multiple classifiers are evaluated with cross-validation to predict the multi-class of failures. Second, predictive performance with features provided by different feature selection algorithms are further analyzed. Third, features selected by different algorithms are ranked and combined based on their predictive power. Finally, linear explainer SHAP (SHapley Additive exPlanations) is applied to interpret classifier behavior and provide further insight towards the specific roles of features in both local predictions and global model behavior. The results of the experiments suggest that certain features play dominant roles in predictive models while others have significantly less impact on the overall performance. Moreover, for multi-class prediction of machine failures, the most important features vary with type of machine failures. The results may lead to improved productivity and cost saving by prioritizing sensor deployment, data collection, and data processing of more important features over less importance features.

Keywords: automated supply chain, intelligent manufacturing, predictive maintenance machine learning, feature engineering, model interpretation

Procedia PDF Downloads 133
2866 Non-Linear Assessment of Chromatographic Lipophilicity and Model Ranking of Newly Synthesized Steroid Derivatives

Authors: Milica Karadzic, Lidija Jevric, Sanja Podunavac-Kuzmanovic, Strahinja Kovacevic, Anamarija Mandic, Katarina Penov Gasi, Marija Sakac, Aleksandar Okljesa, Andrea Nikolic

Abstract:

The present paper deals with chromatographic lipophilicity prediction of newly synthesized steroid derivatives. The prediction was achieved using in silico generated molecular descriptors and quantitative structure-retention relationship (QSRR) methodology with the artificial neural networks (ANN) approach. Chromatographic lipophilicity of the investigated compounds was expressed as retention factor value logk. For QSRR modeling, a feedforward back-propagation ANN with gradient descent learning algorithm was applied. Using the novel sum of ranking differences (SRD) method generated ANN models were ranked. The aim was to distinguish the most consistent QSRR model that can be found, and similarity or dissimilarity between the models that could be noticed. In this study, SRD was performed with average values of retention factor value logk as reference values. An excellent correlation between experimentally observed retention factor value logk and values predicted by the ANN was obtained with a correlation coefficient higher than 0.9890. Statistical results show that the established ANN models can be applied for required purpose. This article is based upon work from COST Action (TD1305), supported by COST (European Cooperation in Science and Technology).

Keywords: artificial neural networks, liquid chromatography, molecular descriptors, steroids, sum of ranking differences

Procedia PDF Downloads 319
2865 Prediction for DC-AC PWM Inverters DC Pulsed Current Sharing from Passive Parallel Battery-Supercapacitor Energy Storage Systems

Authors: Andreas Helwig, John Bell, Wangmo

Abstract:

Hybrid energy storage systems (HESS) are gaining popularity for grid energy storage (ESS) driven by the increasingly dynamic nature of energy demands, requiring both high energy and high power density. Particularly the ability of energy storage systems via inverters to respond to increasing fluctuation in energy demands, the combination of lithium Iron Phosphate (LFP) battery and supercapacitor (SC) is a particular example of complex electro-chemical devices that may provide benefit to each other for pulse width modulated DC to AC inverter application. This is due to SC’s ability to respond to instantaneous, high-current demands and batteries' long-term energy delivery. However, there is a knowledge gap on the current sharing mechanism within a HESS supplying a load powered by high-frequency pulse-width modulation (PWM) switching to understand the mechanism of aging in such HESS. This paper investigates the prediction of current utilizing various equivalent circuits for SC to investigate sharing between battery and SC in MATLAB/Simulink simulation environment. The findings predict a significant reduction of battery current when the battery is used in a hybrid combination with a supercapacitor as compared to a battery-only model. The impact of PWM inverter carrier switching frequency on current requirements was analyzed between 500Hz and 31kHz. While no clear trend emerged, models predicted optimal frequencies for minimized current needs.

Keywords: hybrid energy storage, carrier frequency, PWM switching, equivalent circuit models

Procedia PDF Downloads 26
2864 Hansen Solubility Parameter from Surface Measurements

Authors: Neveen AlQasas, Daniel Johnson

Abstract:

Membranes for water treatment are an established technology that attracts great attention due to its simplicity and cost effectiveness. However, membranes in operation suffer from the adverse effect of membrane fouling. Bio-fouling is a phenomenon that occurs at the water-membrane interface, and is a dynamic process that is initiated by the adsorption of dissolved organic material, including biomacromolecules, on the membrane surface. After initiation, attachment of microorganisms occurs, followed by biofilm growth. The biofilm blocks the pores of the membrane and consequently results in reducing the water flux. Moreover, the presence of a fouling layer can have a substantial impact on the membrane separation properties. Understanding the mechanism of the initiation phase of biofouling is a key point in eliminating the biofouling on membrane surfaces. The adhesion and attachment of different fouling materials is affected by the surface properties of the membrane materials. Therefore, surface properties of different polymeric materials had been studied in terms of their surface energies and Hansen solubility parameters (HSP). The difference between the combined HSP parameters (HSP distance) allows prediction of the affinity of two materials to each other. The possibilities of measuring the HSP of different polymer films via surface measurements, such as contact angle has been thoroughly investigated. Knowing the HSP of a membrane material and the HSP of a specific foulant, facilitate the estimation of the HSP distance between the two, and therefore the strength of attachment to the surface. Contact angle measurements using fourteen different solvents on five different polymeric films were carried out using the sessile drop method. Solvents were ranked as good or bad solvents using different ranking method and ranking was used to calculate the HSP of each polymeric film. Results clearly indicate the absence of a direct relation between contact angle values of each film and the HSP distance between each polymer film and the solvents used. Therefore, estimating HSP via contact angle alone is not sufficient. However, it was found if the surface tensions and viscosities of the used solvents are taken in to the account in the analysis of the contact angle values, a prediction of the HSP from contact angle measurements is possible. This was carried out via training of a neural network model. The trained neural network model has three inputs, contact angle value, surface tension and viscosity of solvent used. The model is able to predict the HSP distance between the used solvent and the tested polymer (material). The HSP distance prediction is further used to estimate the total and individual HSP parameters of each tested material. The results showed an accuracy of about 90% for all the five studied films

Keywords: surface characterization, hansen solubility parameter estimation, contact angle measurements, artificial neural network model, surface measurements

Procedia PDF Downloads 94
2863 Aeroacoustics Investigations of Unsteady 3D Airfoil for Different Angle Using Computational Fluid Dynamics Software

Authors: Haydar Kepekçi, Baha Zafer, Hasan Rıza Güven

Abstract:

Noise disturbance is one of the major factors considered in the fast development of aircraft technology. This paper reviews the flow field, which is examined on the 2D NACA0015 and 3D NACA0012 blade profile using SST k-ω turbulence model to compute the unsteady flow field. We inserted the time-dependent flow area variables in Ffowcs-Williams and Hawkings (FW-H) equations as an input and Sound Pressure Level (SPL) values will be computed for different angles of attack (AoA) from the microphone which is positioned in the computational domain to investigate effect of augmentation of unsteady 2D and 3D airfoil region noise level. The computed results will be compared with experimental data which are available in the open literature. As results; one of the calculated Cp is slightly lower than the experimental value. This difference could be due to the higher Reynolds number of the experimental data. The ANSYS Fluent software was used in this study. Fluent includes well-validated physical modeling capabilities to deliver fast, accurate results across the widest range of CFD and multiphysics applications. This paper includes a study which is on external flow over an airfoil. The case of 2D NACA0015 has approximately 7 million elements and solves compressible fluid flow with heat transfer using the SST turbulence model. The other case of 3D NACA0012 has approximately 3 million elements.

Keywords: 3D blade profile, noise disturbance, aeroacoustics, Ffowcs-Williams and Hawkings (FW-H) equations, k-ω-SST turbulence model

Procedia PDF Downloads 212
2862 Vibration Analysis of Magnetostrictive Nano-Plate by Using Modified Couple Stress and Nonlocal Elasticity Theories

Authors: Hamed Khani Arani, Mohammad Shariyat, Armaghan Mohammadian

Abstract:

In the present study, the free vibration of magnetostrictive nano-plate (MsNP) resting on the Pasternak foundation is investigated. Firstly, the modified couple stress (MCS) and nonlocal elasticity theories are compared together and taken into account to consider the small scale effects; in this paper not only two theories are analyzed but also it improves the MCS theory is more accurate than nonlocal elasticity theory in such problems. A feedback control system is utilized to investigate the effects of a magnetic field. First-order shear deformation theory (FSDT), Hamilton’s principle and energy method are utilized in order to drive the equations of motion and these equations are solved by differential quadrature method (DQM) for simply supported boundary conditions. The MsNP undergoes in-plane forces in x and y directions. In this regard, the dimensionless frequency is plotted to study the effects of small scale parameter, magnetic field, aspect ratio, thickness ratio and compression and tension loads. Results indicate that these parameters play a key role on the natural frequency. According to the above results, MsNP can be used in the communications equipment, smart control vibration of nanostructure especially in sensor and actuators such as wireless linear micro motor and smart nano valves in injectors.

Keywords: feedback control system, magnetostrictive nano-plate, modified couple stress theory, nonlocal elasticity theory, vibration analysis

Procedia PDF Downloads 135
2861 A Comparison Between Different Discretization Techniques for the Doyle-Fuller-Newman Li+ Battery Model

Authors: Davide Gotti, Milan Prodanovic, Sergio Pinilla, David Muñoz-Torrero

Abstract:

Since its proposal, the Doyle-Fuller-Newman (DFN) lithium-ion battery model has gained popularity in the electrochemical field. In fact, this model provides the user with theoretical support for designing the lithium-ion battery parameters, such as the material particle or the diffusion coefficient adjustment direction. However, the model is mathematically complex as it is composed of several partial differential equations (PDEs) such as Fick’s law of diffusion, the MacInnes and Ohm’s equations, among other phenomena. Thus, to efficiently use the model in a time-domain simulation environment, the selection of the discretization technique is of a pivotal importance. There are several numerical methods available in the literature that can be used to carry out this task. In this study, a comparison between the explicit Euler, Crank-Nicolson, and Chebyshev discretization methods is proposed. These three methods are compared in terms of accuracy, stability, and computational times. Firstly, the explicit Euler discretization technique is analyzed. This method is straightforward to implement and is computationally fast. In this work, the accuracy of the method and its stability properties are shown for the electrolyte diffusion partial differential equation. Subsequently, the Crank-Nicolson method is considered. It represents a combination of the implicit and explicit Euler methods that has the advantage of being of the second order in time and is intrinsically stable, thus overcoming the disadvantages of the simpler Euler explicit method. As shown in the full paper, the Crank-Nicolson method provides accurate results when applied to the DFN model. Its stability does not depend on the integration time step, thus it is feasible for both short- and long-term tests. This last remark is particularly important as this discretization technique would allow the user to implement parameter estimation and optimization techniques such as system or genetic parameter identification methods using this model. Finally, the Chebyshev discretization technique is implemented in the DFN model. This discretization method features swift convergence properties and, as other spectral methods used to solve differential equations, achieves the same accuracy with a smaller number of discretization nodes. However, as shown in the literature, these methods are not suitable for handling sharp gradients, which are common during the first instants of the charge and discharge phases of the battery. The numerical results obtained and presented in this study aim to provide the guidelines on how to select the adequate discretization technique for the DFN model according to the type of application to be performed, highlighting the pros and cons of the three methods. Specifically, the non-eligibility of the simple Euler method for longterm tests will be presented. Afterwards, the Crank-Nicolson and the Chebyshev discretization methods will be compared in terms of accuracy and computational times under a wide range of battery operating scenarios. These include both long-term simulations for aging tests, and short- and mid-term battery charge/discharge cycles, typically relevant in battery applications like grid primary frequency and inertia control and electrical vehicle breaking and acceleration.

Keywords: Doyle-Fuller-Newman battery model, partial differential equations, discretization, numerical methods

Procedia PDF Downloads 23
2860 Policy Effectiveness in the Situation of Economic Recession

Authors: S. K. Ashiquer Rahman

Abstract:

The proper policy handling might not able to attain the target since some of recessions, e.g., pandemic-led crises, the variables shocks of the economics. At the level of this situation, the Central bank implements the monetary policy to choose increase the exogenous expenditure and level of money supply consecutively for booster level economic growth, whether the monetary policy is relatively more effective than fiscal policy in altering real output growth of a country or both stand for relatively effective in the direction of output growth of a country. The dispute with reference to the relationship between the monetary policy and fiscal policy is centered on the inflationary penalty of the shortfall financing by the fiscal authority. The latest variables socks of economics as well as the pandemic-led crises, central banks around the world predicted just about a general dilemma in relation to increase rates to face the or decrease rates to sustain the economic movement. Whether the prices hang about fundamentally unaffected, the aggregate demand has also been hold a significantly negative attitude by the outbreak COVID-19 pandemic. To empirically investigate the effects of economics shocks associated COVID-19 pandemic, the paper considers the effectiveness of the monetary policy and fiscal policy that linked to the adjustment mechanism of different economic variables. To examine the effects of economics shock associated COVID-19 pandemic towards the effectiveness of Monetary Policy and Fiscal Policy in the direction of output growth of a Country, this paper uses the Simultaneous equations model under the estimation of Two-Stage Least Squares (2SLS) and Ordinary Least Squares (OLS) Method.

Keywords: IS-LM framework, pandemic. Economics variables shocks, simultaneous equations model, output growth

Procedia PDF Downloads 95
2859 Study of the Persian Gulf’s and Oman Sea’s Numerical Tidal Currents

Authors: Fatemeh Sadat Sharifi

Abstract:

In this research, a barotropic model was employed to consider the tidal studies in the Persian Gulf and Oman Sea, where the only sufficient force was the tidal force. To do that, a finite-difference, free-surface model called Regional Ocean Modeling System (ROMS), was employed on the data over the Persian Gulf and Oman Sea. To analyze flow patterns of the region, the results of limited size model of The Finite Volume Community Ocean Model (FVCOM) were appropriated. The two points were determined since both are one of the most critical water body in case of the economy, biology, fishery, Shipping, navigation, and petroleum extraction. The OSU Tidal Prediction Software (OTPS) tide and observation data validated the modeled result. Next, tidal elevation and speed, and tidal analysis were interpreted. Preliminary results determine a significant accuracy in the tidal height compared with observation and OTPS data, declaring that tidal currents are highest in Hormuz Strait and the narrow and shallow region between Iranian coasts and Islands. Furthermore, tidal analysis clarifies that the M_2 component has the most significant value. Finally, the Persian Gulf tidal currents are divided into two branches: the first branch converts from south to Qatar and via United Arab Emirate rotates to Hormuz Strait. The secondary branch, in north and west, extends up to the highest point in the Persian Gulf and in the head of Gulf turns counterclockwise.

Keywords: numerical model, barotropic tide, tidal currents, OSU tidal prediction software, OTPS

Procedia PDF Downloads 131
2858 On Consolidated Predictive Model of the Natural History of Breast Cancer Considering Primary Tumor and Secondary Distant Metastases Growth in Patients with Lymph Nodes Metastases

Authors: Ella Tyuryumina, Alexey Neznanov

Abstract:

This paper is devoted to mathematical modelling of the progression and stages of breast cancer. We propose Consolidated mathematical growth model of primary tumor and secondary distant metastases growth in patients with lymph nodes metastases (CoM-III) as a new research tool. We are interested in: 1) modelling the whole natural history of primary tumor and secondary distant metastases growth in patients with lymph nodes metastases; 2) developing adequate and precise CoM-III which reflects relations between primary tumor and secondary distant metastases; 3) analyzing the CoM-III scope of application; 4) implementing the model as a software tool. Firstly, the CoM-III includes exponential tumor growth model as a system of determinate nonlinear and linear equations. Secondly, mathematical model corresponds to TNM classification. It allows to calculate different growth periods of primary tumor and secondary distant metastases growth in patients with lymph nodes metastases: 1) ‘non-visible period’ for primary tumor; 2) ‘non-visible period’ for secondary distant metastases growth in patients with lymph nodes metastases; 3) ‘visible period’ for secondary distant metastases growth in patients with lymph nodes metastases. The new predictive tool: 1) is a solid foundation to develop future studies of breast cancer models; 2) does not require any expensive diagnostic tests; 3) is the first predictor which makes forecast using only current patient data, the others are based on the additional statistical data. Thus, the CoM-III model and predictive software: a) detect different growth periods of primary tumor and secondary distant metastases growth in patients with lymph nodes metastases; b) make forecast of the period of the distant metastases appearance in patients with lymph nodes metastases; c) have higher average prediction accuracy than the other tools; d) can improve forecasts on survival of breast cancer and facilitate optimization of diagnostic tests. The following are calculated by CoM-III: the number of doublings for ‘non-visible’ and ‘visible’ growth period of secondary distant metastases; tumor volume doubling time (days) for ‘non-visible’ and ‘visible’ growth period of secondary distant metastases. The CoM-III enables, for the first time, to predict the whole natural history of primary tumor and secondary distant metastases growth on each stage (pT1, pT2, pT3, pT4) relying only on primary tumor sizes. Summarizing: a) CoM-III describes correctly primary tumor and secondary distant metastases growth of IA, IIA, IIB, IIIB (T1-4N1-3M0) stages in patients with lymph nodes metastases (N1-3); b) facilitates the understanding of the appearance period and inception of secondary distant metastases.

Keywords: breast cancer, exponential growth model, mathematical model, primary tumor, secondary metastases, survival

Procedia PDF Downloads 302
2857 Profiling Risky Code Using Machine Learning

Authors: Zunaira Zaman, David Bohannon

Abstract:

This study explores the application of machine learning (ML) for detecting security vulnerabilities in source code. The research aims to assist organizations with large application portfolios and limited security testing capabilities in prioritizing security activities. ML-based approaches offer benefits such as increased confidence scores, false positives and negatives tuning, and automated feedback. The initial approach using natural language processing techniques to extract features achieved 86% accuracy during the training phase but suffered from overfitting and performed poorly on unseen datasets during testing. To address these issues, the study proposes using the abstract syntax tree (AST) for Java and C++ codebases to capture code semantics and structure and generate path-context representations for each function. The Code2Vec model architecture is used to learn distributed representations of source code snippets for training a machine-learning classifier for vulnerability prediction. The study evaluates the performance of the proposed methodology using two datasets and compares the results with existing approaches. The Devign dataset yielded 60% accuracy in predicting vulnerable code snippets and helped resist overfitting, while the Juliet Test Suite predicted specific vulnerabilities such as OS-Command Injection, Cryptographic, and Cross-Site Scripting vulnerabilities. The Code2Vec model achieved 75% accuracy and a 98% recall rate in predicting OS-Command Injection vulnerabilities. The study concludes that even partial AST representations of source code can be useful for vulnerability prediction. The approach has the potential for automated intelligent analysis of source code, including vulnerability prediction on unseen source code. State-of-the-art models using natural language processing techniques and CNN models with ensemble modelling techniques did not generalize well on unseen data and faced overfitting issues. However, predicting vulnerabilities in source code using machine learning poses challenges such as high dimensionality and complexity of source code, imbalanced datasets, and identifying specific types of vulnerabilities. Future work will address these challenges and expand the scope of the research.

Keywords: code embeddings, neural networks, natural language processing, OS command injection, software security, code properties

Procedia PDF Downloads 107