Search results for: accuracy
3030 Hybrid Intelligent Optimization Methods for Optimal Design of Horizontal-Axis Wind Turbine Blades
Authors: E. Tandis, E. Assareh
Abstract:
The optimal shape of MW wind turbine blades is, in a number of cases, designed through evolutionary algorithms coupled with mathematical modeling (Blade Element Momentum Theory). Among optimization methods, evolutionary algorithms enjoy many advantages, particularly stability. However, they usually require a large number of function evaluations, and since there are many local extremes, the optimization method has to locate the global extreme accurately. The present paper introduces a new population-based hybrid algorithm called the Genetic-Based Bees Algorithm (GBBA), intended to design the optimal shape of MW wind turbine blades. The method employs crossover and neighborhood-searching operators taken from the Genetic Algorithm (GA) and the Bees Algorithm (BA), respectively, to provide good performance in both accuracy and convergence speed. Twenty-one blade designs were considered based on chord length, twist angle and tip speed ratio using GA results, and were compared with the BA and GBBA optimum designs targeting the power coefficient and solidity. The results suggest that the final shape obtained by the proposed hybrid algorithm performs better than either BA or GA, and that accuracy and convergence speed both increase when the GBBA is employed. Keywords: Blade Design, Optimization, Genetic Algorithm, Bees Algorithm, Genetic-Based Bees Algorithm, Large Wind Turbine
Procedia PDF Downloads 316
3029 Diagnostic Accuracy of Core Biopsy in Patients Presenting with Axillary Lymphadenopathy and Suspected Non-Breast Malignancy
Authors: Monisha Edirisooriya, Wilma Jack, Dominique Twelves, Jennifer Royds, Fiona Scott, Nicola Mason, Arran Turnbull, J. Michael Dixon
Abstract:
Introduction: Excision biopsy has been the investigation of choice for patients presenting with pathological axillary lymphadenopathy without a breast abnormality. Core biopsy of nodes can provide sufficient tissue for diagnosis and has advantages in terms of morbidity and speed of diagnosis. This study evaluates the diagnostic accuracy of core biopsy in patients presenting with axillary lymphadenopathy. Methods: Between 2009 and 2019, 165 patients referred to the Edinburgh Breast Unit had a total of 179 axillary lymph node core biopsies. Results: 152 (92%) of the 165 initial core biopsies were deemed to contain adequate nodal tissue. Core biopsy correctly established malignancy in 75 of the 78 patients with haematological malignancy (96%) and in all 28 patients with metastatic carcinoma (100%) and correctly diagnosed benign changes in 49 of 57 (86%) patients with benign conditions. There were no false positives and no false negatives. In 67 (85.9%) of the 78 patients with hematological malignancy, there was sufficient material in the first core biopsy to allow the pathologist to make an actionable diagnosis and not ask for more tissue sampling prior to treatment. There were no complications of core biopsy. On follow up, none of the patients with benign cores has been shown to have malignancy in the axilla and none with lymphoma had their initial disease incorrectly classified. Conclusions: This study shows that core biopsy is now the investigation of choice for patients presenting with axillary lymphadenopathy even in those suspected as having lymphoma.Keywords: core biopsy, excision biopsy, axillary lymphadenopathy, non-breast malignancy
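As a quick illustration of the diagnostic-accuracy arithmetic behind the figures quoted above, the short Python sketch below recomputes the reported adequacy and per-group diagnosis rates from the raw counts in the abstract; the helper function is illustrative and not part of the original study.

```python
# Minimal sketch: recomputing the headline rates reported above from the raw counts.
# The counts come from the abstract; the helper name is illustrative only.

def rate(successes, total):
    """Return a proportion and a formatted percentage string."""
    p = successes / total
    return p, f"{100 * p:.0f}%"

adequacy = rate(152, 165)          # cores with adequate nodal tissue
haematological = rate(75, 78)      # haematological malignancy correctly established
metastatic = rate(28, 28)          # metastatic carcinoma correctly established
benign = rate(49, 57)              # benign changes correctly diagnosed

print("Adequate tissue:", adequacy[1])        # ~92%
print("Haematological:", haematological[1])   # ~96%
print("Metastatic carcinoma:", metastatic[1]) # 100%
print("Benign:", benign[1])                   # ~86%
```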
Procedia PDF Downloads 241
3028 Validation of Asymptotic Techniques to Predict Bistatic Radar Cross Section
Authors: M. Pienaar, J. W. Odendaal, J. C. Smit, J. Joubert
Abstract:
Simulations are commonly used to predict the bistatic radar cross section (RCS) of military targets since characterization measurements can be expensive and time consuming. It is thus important to accurately predict the bistatic RCS of targets. Computational electromagnetic (CEM) methods can be used for bistatic RCS prediction. CEM methods are divided into full-wave and asymptotic methods. Full-wave methods are numerical approximations to the exact solution of Maxwell’s equations. These methods are very accurate but are computationally very intensive and time consuming. Asymptotic techniques make simplifying assumptions in solving Maxwell's equations and are thus less accurate but require less computational resources and time. Asymptotic techniques can thus be very valuable for the prediction of bistatic RCS of electrically large targets, due to the decreased computational requirements. This study extends previous work by validating the accuracy of asymptotic techniques to predict bistatic RCS through comparison with full-wave simulations as well as measurements. Validation is done with canonical structures as well as complex realistic aircraft models instead of only looking at a complex slicy structure. The slicy structure is a combination of canonical structures, including cylinders, corner reflectors and cubes. Validation is done over large bistatic angles and at different polarizations. Bistatic RCS measurements were conducted in a compact range, at the University of Pretoria, South Africa. The measurements were performed at different polarizations from 2 GHz to 6 GHz. Fixed bistatic angles of β = 30.8°, 45° and 90° were used. The measurements were calibrated with an active calibration target. The EM simulation tool FEKO was used to generate simulated results. The full-wave multi-level fast multipole method (MLFMM) simulated results together with the measured data were used as reference for validation. The accuracy of physical optics (PO) and geometrical optics (GO) was investigated. Differences relating to amplitude, lobing structure and null positions were observed between the asymptotic, full-wave and measured data. PO and GO were more accurate at angles close to the specular scattering directions and the accuracy seemed to decrease as the bistatic angle increased. At large bistatic angles PO did not perform well due to the shadow regions not being treated appropriately. PO also did not perform well for canonical structures where multi-bounce was the main scattering mechanism. PO and GO do not account for diffraction but these inaccuracies tended to decrease as the electrical size of objects increased. It was evident that both asymptotic techniques do not properly account for bistatic structural shadowing. Specular scattering was calculated accurately even if targets did not meet the electrically large criteria. It was evident that the bistatic RCS prediction performance of PO and GO depends on incident angle, frequency, target shape and observation angle. The improved computational efficiency of the asymptotic solvers yields a major advantage over full-wave solvers and measurements; however, there is still much room for improvement of the accuracy of these asymptotic techniques.Keywords: asymptotic techniques, bistatic RCS, geometrical optics, physical optics
Procedia PDF Downloads 258
3027 Modeling Fertility and Production of Hazelnut Cultivars through the Artificial Neural Network under Climate Change of Karaj
Authors: Marziyeh Khavari
Abstract:
In recent decades, climate change, global warming, and the growing world population have posed challenges such as increasing food consumption and shortage of resources. Assessing how climate change could disturb crops, especially hazelnut production, is crucial for sustainable agricultural production. For hazelnut cultivation in mid-warm conditions, such as in Iran, we present an investigation of climate parameters and of how strongly they affect the fertility and nut production of hazelnut trees. The climate of the northern zones of Iran was therefore investigated (1960-2017), and an upward trend in temperature was found. Furthermore, a descriptive analysis performed on six cultivars over seven years shows how this small-scale survey can demonstrate the effects of climate change on hazelnut production and its stability. Results showed that some climate parameters, such as solar radiation, soil temperature, relative humidity, and precipitation, have a more significant effect on nut production. Moreover, some cultivars, for instance Negret and Segorbe, produced more stable yields, while Mervill de Boliver recorded the most variation during the study. Another requirement is to train and validate a model that simulates nut production through a neural network and linear regression. The study developed the ANN model and estimated its generalization capability with criteria such as RMSE, SSE, and accuracy factors for the dependent and independent variables (environmental and yield traits). The models were trained and tested, and the accuracy of the model is adequate to predict hazelnut production under fluctuations in weather parameters. Keywords: climate change, neural network, hazelnut, global warming
Procedia PDF Downloads 132
3026 A Methodology for Automatic Diversification of Document Categories
Authors: Dasom Kim, Chen Liu, Myungsu Lim, Su-Hyeon Jeon, ByeoungKug Jeon, Kee-Young Kwahk, Namgyu Kim
Abstract:
Recently, numerous documents, including unstructured data and text, have been created due to the rapid increase in the usage of social media and the Internet. Each document is usually assigned a specific category for the convenience of users. In the past, categorization was performed manually. Manual categorization, however, cannot guarantee accuracy and also requires a large amount of time and considerable cost. Many studies have been conducted on the automatic creation of categories to overcome the limitations of manual categorization. Unfortunately, most of these methods cannot be applied to categorizing complex documents with multiple topics, because they assume that one document can be assigned to one category only. To overcome this limitation, some studies have attempted to categorize each document into multiple categories. However, they are also limited in that their learning process requires training on a multi-categorized document set; these methods therefore cannot be applied to the multi-categorization of most documents unless multi-categorized training sets are provided. To remove the requirement for a multi-categorized training set imposed by traditional multi-categorization algorithms, we previously proposed a new methodology that can extend the category of a single-categorized document to multiple categories by analyzing the relationships among categories, topics, and documents. In this paper, we design a survey-based verification scenario for estimating the accuracy of our automatic categorization methodology. Keywords: big data analysis, document classification, multi-category, text mining, topic analysis
Procedia PDF Downloads 272
3025 Enhanced Planar Pattern Tracking for an Outdoor Augmented Reality System
Authors: L. Yu, W. K. Li, S. K. Ong, A. Y. C. Nee
Abstract:
In this paper, a scalable augmented reality framework for handheld devices is presented. The presented framework is enabled by using a server-client data communication structure, in which the search for tracking targets among a database of images is performed on the server-side while pixel-wise 3D tracking is performed on the client-side, which, in this case, is a handheld mobile device. Image search on the server-side adopts a residual-enhanced image descriptors representation that gives the framework a scalability property. The tracking algorithm on the client-side is based on a gravity-aligned feature descriptor which takes the advantage of a sensor-equipped mobile device and an optimized intensity-based image alignment approach that ensures the accuracy of 3D tracking. Automatic content streaming is achieved by using a key-frame selection algorithm, client working phase monitoring and standardized rules for content communication between the server and client. The recognition accuracy test performed on a standard dataset shows that the method adopted in the presented framework outperforms the Bag-of-Words (BoW) method that has been used in some of the previous systems. Experimental test conducted on a set of video sequences indicated the real-time performance of the tracking system with a frame rate at 15-30 frames per second. The presented framework is exposed to be functional in practical situations with a demonstration application on a campus walk-around.Keywords: augmented reality framework, server-client model, vision-based tracking, image search
Procedia PDF Downloads 275
3024 Fast Approximate Bayesian Contextual Cold Start Learning (FAB-COST)
Authors: Jack R. McKenzie, Peter A. Appleby, Thomas House, Neil Walton
Abstract:
Cold-start is a notoriously difficult problem which can occur in recommendation systems, and arises when there is insufficient information to draw inferences for users or items. To address this challenge, a contextual bandit algorithm – the Fast Approximate Bayesian Contextual Cold Start Learning algorithm (FAB-COST) – is proposed, which is designed to provide improved accuracy compared to the traditionally used Laplace approximation in the logistic contextual bandit, while controlling both algorithmic complexity and computational cost. To this end, FAB-COST uses a combination of two moment projection variational methods: Expectation Propagation (EP), which performs well at the cold start, but becomes slow as the amount of data increases; and Assumed Density Filtering (ADF), which has slower growth of computational cost with data size but requires more data to obtain an acceptable level of accuracy. By switching from EP to ADF when the dataset becomes large, it is able to exploit their complementary strengths. The empirical justification for FAB-COST is presented, and systematically compared to other approaches on simulated data. In a benchmark against the Laplace approximation on real data consisting of over 670, 000 impressions from autotrader.co.uk, FAB-COST demonstrates at one point increase of over 16% in user clicks. On the basis of these results, it is argued that FAB-COST is likely to be an attractive approach to cold-start recommendation systems in a variety of contexts.Keywords: cold-start learning, expectation propagation, multi-armed bandits, Thompson Sampling, variational inference
Procedia PDF Downloads 108
3023 Procedural Protocol for Dual Energy Computed Tomography (DECT) Inversion
Authors: Rezvan Ravanfar Haghighi, S. Chatterjee, Pratik Kumar, V. C. Vani, Priya Jagia, Sanjiv Sharma, Susama Rani Mandal, R. Lakshmy
Abstract:
The dual energy computed tomography (DECT) aims at noting the HU(V) values for the sample at two different voltages V=V1, V2 and thus obtain the electron densities (ρe) and effective atomic number (Zeff) of the substance. In the present paper, we aim to obtain a numerical algorithm by which (ρe, Zeff) can be obtained from the HU(100) and HU(140) data, where V=100, 140 kVp. The idea is to use this inversion method to characterize and distinguish between the lipid and fibrous coronary artery plaques.With the idea to develop the inversion algorithm for low Zeff materials, as is the case with non calcified coronary artery plaque, we prepare aqueous samples whose calculated values of (ρe, Zeff) lie in the range (2.65×1023≤ ρe≤ 3.64×1023 per cc ) and (6.80≤ Zeff ≤ 8.90). We fill the phantom with these known samples and experimentally determine HU(100) and HU(140) for the same pixels. Knowing that the HU(V) values are related to the attenuation coefficient of the system, we present an algorithm by which the (ρe, Zeff) is calibrated with respect to (HU(100), HU(140)). The calibration is done with a known set of 20 samples; its accuracy is checked with a different set of 23 known samples. We find that the calibration gives the ρe with an accuracy of ± 4% while Zeff is found within ±1% of the actual value, the confidence being 95%.In this inversion method (ρe, Zeff) of the scanned sample can be found by eliminating the effects of the CT machine and also by ensuring that the determination of the two unknowns (ρe, Zeff) does not interfere with each other. It is found that this algorithm can be used for prediction of chemical characteristic (ρe, Zeff) of unknown scanned materials with 95% confidence level, by inversion of the DECT data.Keywords: chemical composition, dual-energy computed tomography, inversion algorithm
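The abstract does not give the exact functional form of the calibration, so the following Python sketch only illustrates the general idea of a least-squares calibration mapping the (HU(100), HU(140)) pair to (ρe, Zeff): a quadratic surface is assumed, and all sample values are hypothetical placeholders rather than the study's phantom data.

```python
import numpy as np

# Illustrative calibration sketch (not the authors' exact algorithm): fit rho_e and
# Z_eff as quadratic surfaces in (HU100, HU140) using known calibration samples,
# then apply the fitted surfaces to new measurements. All numbers are made up.

def design_matrix(hu100, hu140):
    # Quadratic basis in the two HU readings (an assumed functional form).
    return np.column_stack([
        np.ones_like(hu100), hu100, hu140,
        hu100**2, hu140**2, hu100 * hu140,
    ])

# Hypothetical calibration data: HU at 100 kVp, HU at 140 kVp, known rho_e and Z_eff.
hu100 = np.array([30.0, 45.0, 60.0, 75.0, 90.0, 105.0])
hu140 = np.array([28.0, 41.0, 55.0, 68.0, 82.0, 95.0])
rho_e = np.array([2.70, 2.85, 3.00, 3.15, 3.30, 3.45])   # x1e23 electrons per cc
z_eff = np.array([6.9, 7.3, 7.7, 8.1, 8.5, 8.9])

A = design_matrix(hu100, hu140)
coef_rho, *_ = np.linalg.lstsq(A, rho_e, rcond=None)
coef_z, *_ = np.linalg.lstsq(A, z_eff, rcond=None)

# Inversion for an unknown scanned sample (hypothetical HU pair).
new = design_matrix(np.array([52.0]), np.array([48.0]))
print("predicted rho_e:", new @ coef_rho)
print("predicted Z_eff:", new @ coef_z)
```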
Procedia PDF Downloads 438
3022 Interpretation of the Russia-Ukraine 2022 War via N-Gram Analysis
Authors: Elcin Timur Cakmak, Ayse Oguzlar
Abstract:
This study presents the results of bigram and trigram analyses of tweets sent by Twitter users about the Russia-Ukraine war. On February 24, 2022, Russian President Vladimir Putin declared a military operation against Ukraine, and all eyes turned to this war. Many people living in Russia and Ukraine reacted to the war, protested, and expressed deep concern as they felt the safety of their families and their futures were at stake. Most people, especially those living in Russia and Ukraine, express their views on the war in different ways, most commonly through social media. Many people prefer to convey their feelings using Twitter, one of the most frequently used social media tools. Since the beginning of the war, thousands of tweets about it have been posted on Twitter from many countries of the world. These tweets were extracted from the data sources through the Twitter API and analysed using the Python programming language. The aim of the study is to find the word sequences in these tweets by the n-gram method, which is widely used in computational linguistics and natural language processing. The tweet language used in the study is English. The data set consists of data obtained from Twitter between February 24, 2022, and April 24, 2022. Tweets carrying the #ukraine, #russia, #war, #putin, and #zelensky hashtags together were captured as raw data, and the remaining tweets were included in the analysis stage after being cleaned in the preprocessing stage. In the data analysis part, sentiments were extracted to characterize the messages people send about the war on Twitter; negative messages make up the majority of all tweets, at 63.6%. Furthermore, the most frequently used bigram and trigram word groups were found: “he, is”, “I, do”, and “I, am” for bigrams, and “I, do, not”, “I, am, not”, and “I, can, not” for trigrams. In the machine learning phase, classification accuracy was measured with the Classification and Regression Trees (CART) and Naïve Bayes (NB) algorithms, applied separately to bigrams and trigrams. For bigrams, NB gave the highest accuracy and F-measure values, while CART gave the highest precision and recall. For trigrams, CART achieved the highest accuracy, precision, and F-measure, and NB the highest recall. Keywords: classification algorithms, machine learning, sentiment analysis, Twitter
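A minimal Python sketch of the bigram/trigram counting step described above follows; the two token lists are placeholders for cleaned, tokenised tweets, and only the standard library is used.

```python
from collections import Counter

# Minimal sketch of the n-gram counting step. Tweets are assumed to be already
# cleaned and tokenised; the two example tweets below are placeholders.

def ngrams(tokens, n):
    """Yield consecutive n-token tuples from a token list."""
    return zip(*(tokens[i:] for i in range(n)))

tweets = [
    ["i", "am", "not", "able", "to", "sleep"],
    ["i", "do", "not", "support", "this", "war"],
]

bigrams, trigrams = Counter(), Counter()
for tokens in tweets:
    bigrams.update(ngrams(tokens, 2))
    trigrams.update(ngrams(tokens, 3))

print(bigrams.most_common(3))
print(trigrams.most_common(3))
```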
Procedia PDF Downloads 73
3021 Using Mathematical Models to Predict the Academic Performance of Students from Initial Courses in Engineering School
Authors: Martín Pratto Burgos
Abstract:
The Engineering School of the University of the Republic in Uruguay offers an Introductory Mathematical Course from the second semester of 2019. This course has been designed to assist students in preparing themselves for math courses that are essential for Engineering Degrees, namely Math1, Math2, and Math3 in this research. The research proposes to build a model that can accurately predict the student's activity and academic progress based on their performance in the three essential Mathematical courses. Additionally, there is a need for a model that can forecast the incidence of the Introductory Mathematical Course in the three essential courses approval during the first academic year. The techniques used are Principal Component Analysis and predictive modelling using the Generalised Linear Model. The dataset includes information from 5135 engineering students and 12 different characteristics based on activity and course performance. Two models are created for a type of data that follows a binomial distribution using the R programming language. Model 1 is based on a variable's p-value being less than 0.05, and Model 2 uses the stepAIC function to remove variables and get the lowest AIC score. After using Principal Component Analysis, the main components represented in the y-axis are the approval of the Introductory Mathematical Course, and the x-axis is the approval of Math1 and Math2 courses as well as student activity three years after taking the Introductory Mathematical Course. Model 2, which considered student’s activity, performed the best with an AUC of 0.81 and an accuracy of 84%. According to Model 2, the student's engagement in school activities will continue for three years after the approval of the Introductory Mathematical Course. This is because they have successfully completed the Math1 and Math2 courses. Passing the Math3 course does not have any effect on the student’s activity. Concerning academic progress, the best fit is Model 1. It has an AUC of 0.56 and an accuracy rate of 91%. The model says that if the student passes the three first-year courses, they will progress according to the timeline set by the curriculum. Both models show that the Introductory Mathematical Course does not directly affect the student’s activity and academic progress. The best model to explain the impact of the Introductory Mathematical Course on the three first-year courses was Model 1. It has an AUC of 0.76 and 98% accuracy. The model shows that if students pass the Introductory Mathematical Course, it will help them to pass Math1 and Math2 courses without affecting their performance on the Math3 course. Matching the three predictive models, if students pass Math1 and Math2 courses, they will stay active for three years after taking the Introductory Mathematical Course, and also, they will continue following the recommended engineering curriculum. Additionally, the Introductory Mathematical Course helps students to pass Math1 and Math2 when they start Engineering School. Models obtained in the research don't consider the time students took to pass the three Math courses, but they can successfully assess courses in the university curriculum.Keywords: machine-learning, engineering, university, education, computational models
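The models above were fitted in R with glm and stepAIC; the sketch below is only a rough Python analogue of a binomial GLM (logistic regression) evaluated with AUC and accuracy, using synthetic placeholder features rather than the actual 5135-student dataset.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, accuracy_score
from sklearn.model_selection import train_test_split

# Hedged analogue of the binomial-GLM workflow described above (the original used
# R's glm/stepAIC). Features and labels below are synthetic placeholders.

rng = np.random.default_rng(0)
n = 1000
X = rng.normal(size=(n, 3))  # e.g. intro-course result, Math1 result, Math2 result
logits = 1.2 * X[:, 0] + 0.8 * X[:, 1] - 0.3 * X[:, 2]
y = rng.binomial(1, 1 / (1 + np.exp(-logits)))  # 1 = student stays active / progresses

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)

proba = model.predict_proba(X_te)[:, 1]
print("AUC:", roc_auc_score(y_te, proba))
print("accuracy:", accuracy_score(y_te, model.predict(X_te)))
```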
Procedia PDF Downloads 94
3020 Using Virtual Reality Exergaming to Improve Health of College Students
Authors: Juanita Wallace, Mark Jackson, Bethany Jurs
Abstract:
Introduction: Exergames, VR games used as a form of exercise, are being used to reduce sedentary lifestyles in a vast number of populations. However, there is a distinct lack of research comparing the physiological response during VR exergaming to that of traditional exercise. The purpose of this study was to provide a foundational investigation establishing changes in physiological responses resulting from VR exergaming in a college-aged population. Methods: In this IRB-approved study, college-aged students were recruited to play a virtual reality exergame (Beat Saber) on the Oculus Quest 2 (Facebook, 2021) in either a control group (CG) or a training group (TG). Both groups consisted of subjects who were not habitual users of virtual reality. The CG played VR one time per week for three weeks, and the TG played 150 min/week for three weeks. Each group played the same nine Beat Saber songs, in a randomized order, during 30-minute sessions. Song difficulty was increased during play based on song performance. Subjects completed a pre- and posttest at which the following was collected: • Beat Saber Game Metrics: song level played, song score, number of beats completed per song and accuracy (beats completed/total beats) • Physiological Data: heart rate (max and avg.), active calories • Demographics Results: A total of 20 subjects completed the study; nine in the CG (3 males, 6 females) and 11 (5 males, 6 females) in the TG. • Beat Saber Song Metrics: The TG improved performance from a normal/hard difficulty to hard/expert. The CG stayed at the normal/hard difficulty. At the pretest there was no difference in game accuracy between groups. However, at the posttest the CG had a higher accuracy. • Physiological Data (Table 1): Average heart rates were similar between the TG and CG at both the pre- and posttest. However, the TG expended more total calories. Discussion: Due to the lack of peer-reviewed literature on exergaming using Beat Saber, the results of this study cannot be directly compared. However, the results of this study can be compared with the previously established trends for traditional exercise. In traditional exercise, an increase in training volume equates to increased efficiency at the activity. The TG should naturally increase in difficulty at a faster rate than the CG because they played 150 minutes per week. Heart rate and caloric responses also increase during traditional exercise as load increases (i.e., speed or resistance). The TG reported an increase in total calories due to a higher difficulty of play. The decrease in song accuracy in the TG can be explained by the increased difficulty of play. Conclusion: VR exergaming is comparable to traditional exercise for loads within 50-70% of maximum heart rate. The ability to use VR for health could motivate individuals who do not engage in traditional exercise. In addition, individuals in health professions can and should promote VR exergaming as a viable way to increase physical activity and improve health in their clients/patients. Keywords: virtual reality, exergaming, health, heart rate, wellness
Procedia PDF Downloads 187
3019 Composite Approach to Extremism and Terrorism Web Content Classification
Authors: Kolade Olawande Owoeye, George Weir
Abstract:
Terrorism and extremism activities on the internet are becoming among the most significant threats to national security because of their potential dangers. In response to this challenge, law enforcement and security authorities are actively implementing comprehensive measures to counter the use of the internet for terrorism. Achieving these measures requires intelligence gathering via the internet, including real-time monitoring of websites used for recruitment and information dissemination, among other operations by extremist groups. However, with billions of active webpages, real-time monitoring of all of them becomes almost impossible. To narrow down the search domain, efficient webpage classification techniques are needed. This research proposes a new approach, the SentiPosit-based method, which combines features of the Posit-based method and the SentiStrength-based method for the classification of terrorism and extremism webpages. The experiment was carried out on 7500 webpages obtained through the TENE web crawler by the International Cyber Crime Research Centre (ICCRC). The webpages were manually grouped into three classes, ‘pro-extremist’, ‘anti-extremist’ and ‘neutral’, with 2500 webpages in each category. A supervised learning algorithm was then applied to the labelled dataset in order to build the model. The results obtained were compared with an existing classification method using prediction accuracy and runtime. It was observed that the proposed hybrid approach produced better classification accuracy than existing approaches within a reasonable runtime. Keywords: sentiposit, classification, extremism, terrorism
Procedia PDF Downloads 278
3018 Multi-Temporal Mapping of Built-up Areas Using Daytime and Nighttime Satellite Images Based on Google Earth Engine Platform
Authors: S. Hutasavi, D. Chen
Abstract:
The built-up area is a significant proxy for measuring regional economic growth and reflects the Gross Provincial Product (GPP). However, an up-to-date and reliable database of built-up areas is not always available, especially in developing countries. Cloud-based geospatial analysis platforms such as Google Earth Engine (GEE) give those countries the accessibility and computational power needed to generate built-up data. Therefore, this study aims to extract the built-up areas in the Eastern Economic Corridor (EEC), Thailand, using daytime and nighttime satellite imagery on the GEE platform. Normalized indices were generated from the Landsat 8 surface reflectance dataset, including the Normalized Difference Built-up Index (NDBI), Built-up Index (BUI), and Modified Built-up Index (MBUI), and applied to identify built-up areas in the EEC. The results show that MBUI performs better than BUI and NDBI, with the highest accuracy of 0.85 and a Kappa of 0.82. Moreover, after incorporating nighttime light data from the Visible Infrared Imaging Radiometer Suite (VIIRS) Day/Night Band (DNB), the overall classification accuracy improved from 79% to 90%, and the error in total built-up area decreased from 29% to 0.7%. The results suggest that MBUI with nighttime light imagery is appropriate for built-up area extraction and can be utilized for further study of the socioeconomic impacts of regional development policy over the EEC region. Keywords: built-up area extraction, google earth engine, adaptive thresholding method, rapid mapping
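The paper's exact MBUI formulation is not given in the abstract, so the sketch below shows only the standard NDBI step on Landsat 8 surface-reflectance bands (B5 = NIR, B6 = SWIR1) with a simple threshold; the reflectance arrays are placeholders.

```python
import numpy as np

# Sketch of the NDBI step only; the MBUI definition used in the paper is not
# reproduced here. Reflectance arrays are placeholders standing in for Landsat 8
# surface-reflectance bands (B5 = NIR, B6 = SWIR1).

nir = np.array([[0.32, 0.28], [0.25, 0.30]])
swir1 = np.array([[0.21, 0.35], [0.33, 0.22]])

ndbi = (swir1 - nir) / (swir1 + nir + 1e-9)   # small epsilon avoids division by zero
built_up_mask = ndbi > 0.0                    # simple fixed threshold for illustration

print(ndbi)
print(built_up_mask)
```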
Procedia PDF Downloads 125
3017 Disease Level Assessment in Wheat Plots Using a Residual Deep Learning Algorithm
Authors: Felipe A. Guth, Shane Ward, Kevin McDonnell
Abstract:
The assessment of disease levels in crop fields is an important and time-consuming task that generally relies on the expert knowledge of trained individuals. Image classification in agricultural problems has historically been based on classical machine learning strategies that feed hand-engineered features into a classification algorithm. This approach tends not to produce results with high accuracy and generalization across the classes when the elements have significant variability. The advent of deep convolutional neural networks has revolutionized the field of machine learning, especially in computer vision tasks. These networks have great learning capacity and have been applied successfully to image classification and object detection tasks in recent years. The objective of this work was to propose a new method based on deep convolutional neural networks for the task of disease level monitoring. Common RGB images of winter wheat were obtained during a growing season. Five categories of disease level presence were produced, in collaboration with agronomists, for the algorithm to classify. Disease level assessments performed by experts provided ground truth data for the disease score of the same winter wheat plots where the RGB images were acquired. The system had an overall accuracy of 84% in the discrimination of the disease level classes. Keywords: crop disease assessment, deep learning, precision agriculture, residual neural networks
Procedia PDF Downloads 331
3016 A Review of Effective Gene Selection Methods for Cancer Classification Using Microarray Gene Expression Profile
Authors: Hala Alshamlan, Ghada Badr, Yousef Alohali
Abstract:
Cancer is one of the dreadful diseases, which causes considerable death rate in humans. DNA microarray-based gene expression profiling has been emerged as an efficient technique for cancer classification, as well as for diagnosis, prognosis, and treatment purposes. In recent years, a DNA microarray technique has gained more attraction in both scientific and in industrial fields. It is important to determine the informative genes that cause cancer to improve early cancer diagnosis and to give effective chemotherapy treatment. In order to gain deep insight into the cancer classification problem, it is necessary to take a closer look at the proposed gene selection methods. We believe that they should be an integral preprocessing step for cancer classification. Furthermore, finding an accurate gene selection method is a very significant issue in a cancer classification area because it reduces the dimensionality of microarray dataset and selects informative genes. In this paper, we classify and review the state-of-art gene selection methods. We proceed by evaluating the performance of each gene selection approach based on their classification accuracy and number of informative genes. In our evaluation, we will use four benchmark microarray datasets for the cancer diagnosis (leukemia, colon, lung, and prostate). In addition, we compare the performance of gene selection method to investigate the effective gene selection method that has the ability to identify a small set of marker genes, and ensure high cancer classification accuracy. To the best of our knowledge, this is the first attempt to compare gene selection approaches for cancer classification using microarray gene expression profile.Keywords: gene selection, feature selection, cancer classification, microarray, gene expression profile
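As an illustration of the generic "gene selection + classification" pipeline that such reviews compare, the following Python sketch uses a univariate ANOVA filter followed by a linear SVM on a synthetic expression matrix; it is not one of the reviewed methods or benchmark datasets.

```python
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Generic sketch of a gene-selection-then-classification pipeline. A univariate
# ANOVA filter stands in for the selection step; the expression matrix is synthetic
# (samples x genes), not one of the benchmark microarray datasets.

X, y = make_classification(n_samples=72, n_features=2000, n_informative=20, random_state=0)

pipeline = make_pipeline(
    SelectKBest(score_func=f_classif, k=50),   # keep 50 candidate "marker" genes
    SVC(kernel="linear"),
)

scores = cross_val_score(pipeline, X, y, cv=5)
print("mean CV accuracy:", scores.mean())
```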
Procedia PDF Downloads 454
3015 Simulation Analysis of a Full-Scale Five-Story Building with Vibration Control Dampers
Authors: Naohiro Nakamura
Abstract:
Analysis methods that accurately estimate the behavior of buildings during earthquakes are very important for improving the seismic safety of such buildings. Recently, the use of damping devices has increased significantly, and there is a particular need to appropriately evaluate the behavior of buildings with such devices during earthquakes at the design stage. At present, however, the accuracy of the analysis evaluations is not sufficient. One reason is that the accuracy of current analysis methods has not been appropriately verified, because there are very limited data on the behavior of actual buildings during earthquakes. Many types of shaking table tests of large structures are performed at the '3-Dimensional Full-Scale Earthquake Testing Facility' (nicknamed 'E-Defense') operated by the National Research Institute for Earth Science and Disaster Prevention (NIED). In this study, simulations using 3-dimensional analysis models were conducted for a shaking table test of a 5-story steel-frame structure with dampers. The results of the analysis correspond favorably to the test results announced afterward by the committee. However, the suitability of the parameters and models used in the analysis and the influence they had on the responses remain unclear. Hence, we conducted additional analyses and studies of these models and parameters. In this paper, outlines of the test are given and the analysis model used is explained. Next, the analysis results are compared with the test results. Then, additional analyses, concerning the hysteresis curve of the dampers and the beam-end stiffness of the frame, are investigated. Keywords: three-dimensional analysis, E-defense, full-scale experiment, vibration control damper
Procedia PDF Downloads 190
3014 Artificial Neural Network in Ultra-High Precision Grinding of Borosilicate-Crown Glass
Authors: Goodness Onwuka, Khaled Abou-El-Hossein
Abstract:
Borosilicate-crown (BK7) glass has found broad application in the optic and automotive industries and the growing demands for nanometric surface finishes is becoming a necessity in such applications. Thus, it has become paramount to optimize the parameters influencing the surface roughness of this precision lens. The research was carried out on a 4-axes Nanoform 250 precision lathe machine with an ultra-high precision grinding spindle. The experiment varied the machining parameters of feed rate, wheel speed and depth of cut at three levels for different combinations using Box Behnken design of experiment and the resulting surface roughness values were measured using a Taylor Hobson Dimension XL optical profiler. Acoustic emission monitoring technique was applied at a high sampling rate to monitor the machining process while further signal processing and feature extraction methods were implemented to generate the input to a neural network algorithm. This paper highlights the training and development of a back propagation neural network prediction algorithm through careful selection of parameters and the result show a better classification accuracy when compared to a previously developed response surface model with very similar machining parameters. Hence artificial neural network algorithms provide better surface roughness prediction accuracy in the ultra-high precision grinding of BK7 glass.Keywords: acoustic emission technique, artificial neural network, surface roughness, ultra-high precision grinding
Procedia PDF Downloads 305
3013 A Machine Learning Approach for Performance Prediction Based on User Behavioral Factors in E-Learning Environments
Authors: Naduni Ranasinghe
Abstract:
E-learning environments are getting more popular than any other due to the impact of COVID19. Even though e-learning is one of the best solutions for the teaching-learning process in the academic process, it’s not without major challenges. Nowadays, machine learning approaches are utilized in the analysis of how behavioral factors lead to better adoption and how they related to better performance of the students in eLearning environments. During the pandemic, we realized the academic process in the eLearning approach had a major issue, especially for the performance of the students. Therefore, an approach that investigates student behaviors in eLearning environments using a data-intensive machine learning approach is appreciated. A hybrid approach was used to understand how each previously told variables are related to the other. A more quantitative approach was used referred to literature to understand the weights of each factor for adoption and in terms of performance. The data set was collected from previously done research to help the training and testing process in ML. Special attention was made to incorporating different dimensionality of the data to understand the dependency levels of each. Five independent variables out of twelve variables were chosen based on their impact on the dependent variable, and by considering the descriptive statistics, out of three models developed (Random Forest classifier, SVM, and Decision tree classifier), random forest Classifier (Accuracy – 0.8542) gave the highest value for accuracy. Overall, this work met its goals of improving student performance by identifying students who are at-risk and dropout, emphasizing the necessity of using both static and dynamic data.Keywords: academic performance prediction, e learning, learning analytics, machine learning, predictive model
Procedia PDF Downloads 157
3012 Computer Countenanced Diagnosis of Skin Nodule Detection and Histogram Augmentation: Extracting System for Skin Cancer
Authors: S. Zith Dey Babu, S. Kour, S. Verma, C. Verma, V. Pathania, A. Agrawal, V. Chaudhary, A. Manoj Puthur, R. Goyal, A. Pal, T. Danti Dey, A. Kumar, K. Wadhwa, O. Ved
Abstract:
Background: Skin cancer is now a pressing topic in the field of medical science, and its spread is drastically affecting the health and well-being of the global population. Methods: The extracted image of the skin tumor cannot be used directly for diagnosis, since the stored image contains irregularities such as those around the lesion center. The approach first locates the foreground of the extracted skin appearance, and image partitioning (segmentation) models are presented to sort out the disturbance in the picture. Results: After partitioning, feature extraction is performed using a genetic algorithm (GA), and finally classification is performed between the training and test data to evaluate the image at large scale and help doctors reach the right prediction. To improve the existing system, we set our objectives with an analysis; the efficiency of the natural selection process and histogram enhancement are essential in that respect. The GA is applied to reduce the false-positive rate. Conclusions: The objective of this work is to improve effectiveness, and the GA brings down the false-positive rate. Combining deep learning and medical image processing provides superior accuracy, and proportional handling enables reusability without errors. Keywords: computer-aided system, detection, image segmentation, morphology
Procedia PDF Downloads 150
3011 Evaluating Multiple Diagnostic Tests: An Application to Cervical Intraepithelial Neoplasia
Authors: Areti Angeliki Veroniki, Sofia Tsokani, Evangelos Paraskevaidis, Dimitris Mavridis
Abstract:
The plethora of diagnostic test accuracy (DTA) studies has led to the increased use of systematic reviews and meta-analysis of DTA studies. Clinicians and healthcare professionals often consult DTA meta-analyses to make informed decisions regarding the optimum test to choose and use for a given setting. For example, the human papilloma virus (HPV) DNA, mRNA, and cytology can be used for the cervical intraepithelial neoplasia grade 2+ (CIN2+) diagnosis. But which test is the most accurate? Studies directly comparing test accuracy are not always available, and comparisons between multiple tests create a network of DTA studies that can be synthesized through a network meta-analysis of diagnostic tests (DTA-NMA). The aim is to summarize the DTA-NMA methods for at least three index tests presented in the methodological literature. We illustrate the application of the methods using a real data set for the comparative accuracy of HPV DNA, HPV mRNA, and cytology tests for cervical cancer. A search was conducted in PubMed, Web of Science, and Scopus from inception until the end of July 2019 to identify full-text research articles that describe a DTA-NMA method for three or more index tests. Since the joint classification of the results from one index against the results of another index test amongst those with the target condition and amongst those without the target condition are rarely reported in DTA studies, only methods requiring the 2x2 tables of the results of each index test against the reference standard were included. Studies of any design published in English were eligible for inclusion. Relevant unpublished material was also included. Ten relevant studies were finally included to evaluate their methodology. DTA-NMA methods that have been presented in the literature together with their advantages and disadvantages are described. In addition, using 37 studies for cervical cancer obtained from a published Cochrane review as a case study, an application of the identified DTA-NMA methods to determine the most promising test (in terms of sensitivity and specificity) for use as the best screening test to detect CIN2+ is presented. As a conclusion, different approaches for the comparative DTA meta-analysis of multiple tests may conclude to different results and hence may influence decision-making. Acknowledgment: This research is co-financed by Greece and the European Union (European Social Fund- ESF) through the Operational Programme «Human Resources Development, Education and Lifelong Learning 2014-2020» in the context of the project “Extension of Network Meta-Analysis for the Comparison of Diagnostic Tests ” (MIS 5047640).Keywords: colposcopy, diagnostic test, HPV, network meta-analysis
Procedia PDF Downloads 139
3010 Text Localization in Fixed-Layout Documents Using Convolutional Networks in a Coarse-to-Fine Manner
Authors: Beier Zhu, Rui Zhang, Qi Song
Abstract:
Text contained within fixed-layout documents can be of great semantic value and so requires a high localization accuracy, such as ID cards, invoices, cheques, and passports. Recently, algorithms based on deep convolutional networks achieve high performance on text detection tasks. However, for text localization in fixed-layout documents, such algorithms detect word bounding boxes individually, which ignores the layout information. This paper presents a novel architecture built on convolutional neural networks (CNNs). A global text localization network and a regional bounding-box regression network are introduced to tackle the problem in a coarse-to-fine manner. The text localization network simultaneously locates word bounding points, which takes the layout information into account. The bounding-box regression network inputs the features pooled from arbitrarily sized RoIs and refine the localizations. These two networks share their convolutional features and are trained jointly. A typical type of fixed-layout documents: ID cards, is selected to evaluate the effectiveness of the proposed system. These networks are trained on data cropped from nature scene images, and synthetic data produced by a synthetic text generation engine. Experiments show that our approach locates high accuracy word bounding boxes and achieves state-of-the-art performance.Keywords: bounding box regression, convolutional networks, fixed-layout documents, text localization
Procedia PDF Downloads 194
3009 Artificial Intelligence in Melanoma Prognosis: A Narrative Review
Authors: Shohreh Ghasemi
Abstract:
Introduction: Melanoma is a complex disease with various clinical and histopathological features that impact prognosis and treatment decisions. Traditional methods of melanoma prognosis involve manual examination and interpretation of clinical and histopathological data by dermatologists and pathologists. However, the subjective nature of these assessments can lead to inter-observer variability and suboptimal prognostic accuracy. AI, with its ability to analyze vast amounts of data and identify patterns, has emerged as a promising tool for improving melanoma prognosis. Methods: A comprehensive literature search was conducted to identify studies that employed AI techniques for melanoma prognosis. The search included databases such as PubMed and Google Scholar, using keywords such as "artificial intelligence," "melanoma," and "prognosis." Studies published between 2010 and 2022 were considered. The selected articles were critically reviewed, and relevant information was extracted. Results: The review identified various AI methodologies utilized in melanoma prognosis, including machine learning algorithms, deep learning techniques, and computer vision. These techniques have been applied to diverse data sources, such as clinical images, dermoscopy images, histopathological slides, and genetic data. Studies have demonstrated the potential of AI in accurately predicting melanoma prognosis, including survival outcomes, recurrence risk, and response to therapy. AI-based prognostic models have shown comparable or even superior performance compared to traditional methods.Keywords: artificial intelligence, melanoma, accuracy, prognosis prediction, image analysis, personalized medicine
Procedia PDF Downloads 81
3008 A Sensor Placement Methodology for Chemical Plants
Authors: Omid Ataei Nia, Karim Salahshoor
Abstract:
In this paper, a new precise and reliable sensor network methodology is introduced for unit processes and operations using the Constriction Coefficient Particle Swarm Optimization (CPSO) method. CPSO is introduced as a new search engine for optimal sensor network design purposes. Furthermore, a Square Root Unscented Kalman Filter (SRUKF) algorithm is employed as a new data reconciliation technique to enhance the stability and accuracy of the filter. The proposed design procedure incorporates precision, cost, observability, reliability together with importance-of-variables (IVs) as a novel measure in Instrumentation Criteria (IC). To the best of our knowledge, no comprehensive approach has yet been proposed in the literature to take into account the importance of variables in the sensor network design procedure. In this paper, specific weight is assigned to each sensor, measuring a process variable in the sensor network to indicate the importance of that variable over the others to cater to the ultimate sensor network application requirements. A set of distinct scenarios has been conducted to evaluate the performance of the proposed methodology in a simulated Continuous Stirred Tank Reactor (CSTR) as a highly nonlinear process plant benchmark. The obtained results reveal the efficacy of the proposed method, leading to significant improvement in accuracy with respect to other alternative sensor network design approaches and securing the definite allocation of sensors to the most important process variables in sensor network design as a novel achievement.Keywords: constriction coefficient PSO, importance of variable, MRMSE, reliability, sensor network design, square root unscented Kalman filter
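The paper's composite instrumentation criterion (precision, cost, observability, reliability, importance of variables) is not public, so the Python sketch below only illustrates the constriction-coefficient PSO search engine itself (Clerc-Kennedy formulation, χ ≈ 0.73 for c1 = c2 = 2.05) on a placeholder objective.

```python
import numpy as np

# Minimal constriction-coefficient PSO (Clerc-Kennedy) on a toy objective. The real
# sensor-network objective is not specified in the abstract, so a placeholder
# sphere function is minimised instead.

def objective(x):
    return np.sum(x**2, axis=1)   # placeholder cost to minimise

rng = np.random.default_rng(1)
dim, swarm, iters = 5, 30, 200
c1 = c2 = 2.05
phi = c1 + c2
chi = 2.0 / abs(2.0 - phi - np.sqrt(phi**2 - 4.0 * phi))   # ~0.7298

pos = rng.uniform(-5, 5, (swarm, dim))
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_val = objective(pos)
gbest = pbest[np.argmin(pbest_val)].copy()

for _ in range(iters):
    r1, r2 = rng.random((swarm, dim)), rng.random((swarm, dim))
    vel = chi * (vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos))
    pos = pos + vel
    val = objective(pos)
    improved = val < pbest_val
    pbest[improved], pbest_val[improved] = pos[improved], val[improved]
    gbest = pbest[np.argmin(pbest_val)].copy()

print("best value found:", pbest_val.min())
```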
Procedia PDF Downloads 160
3007 Improving the Technology of Assembly by Use of Computer Calculations
Authors: Mariya V. Yanyukina, Michael A. Bolotov
Abstract:
Assembling accuracy is the degree of accordance between the actual values of the parameters obtained during assembly, and the values specified in the assembly drawings and technical specifications. However, the assembling accuracy depends not only on the quality of the production process but also on the correctness of the assembly process. Therefore, preliminary calculations of assembly stages are carried out to verify the correspondence of real geometric parameters to their acceptable values. In the aviation industry, most calculations involve interacting dimensional chains. This greatly complicates the task. Solving such problems requires a special approach. The purpose of this article is to carry out the problem of improving the technology of assembly of aviation units by use of computer calculations. One of the actual examples of the assembly unit, in which there is an interacting dimensional chain, is the turbine wheel of gas turbine engine. Dimensional chain of turbine wheel is formed by geometric parameters of disk and set of blades. The interaction of the dimensional chain consists in the formation of two chains. The first chain is formed by the dimensions that determine the location of the grooves for the installation of the blades, and the dimensions of the blade roots. The second dimensional chain is formed by the dimensions of the airfoil shroud platform. The interaction of the dimensional chain of the turbine wheel is the interdependence of the first and second chains by means of power circuits formed by a plurality of middle parts of the turbine blades. The timeliness of the calculation of the dimensional chain of the turbine wheel is the need to improve the technology of assembly of this unit. The task at hand contains geometric and mathematical components; therefore, its solution can be implemented following the algorithm: 1) research and analysis of production errors by geometric parameters; 2) development of a parametric model in the CAD system; 3) creation of set of CAD-models of details taking into account actual or generalized distributions of errors of geometrical parameters; 4) calculation model in the CAE-system, loading of various combinations of models of parts; 5) the accumulation of statistics and analysis. The main task is to pre-simulate the assembly process by calculating the interacting dimensional chains. The article describes the approach to the solution from the point of view of mathematical statistics, implemented in the software package Matlab. Within the framework of the study, there are data on the measurement of the components of the turbine wheel-blades and disks, as a result of which it is expected that the assembly process of the unit will be optimized by solving dimensional chains.Keywords: accuracy, assembly, interacting dimension chains, turbine
Procedia PDF Downloads 373
3006 Validation of Electrical Field Effect on Electrostatic Desalter Modeling with Experimental Laboratory Data
Authors: Fatemeh Yazdanmehr, Iulian Nistor
Abstract:
The scope of the current study is the validation, against laboratory data, of the electric field effect in a mathematical model of electrostatic desalting. The research focused on developing a model for an existing operating desalting unit of one of the Iranian heavy oil fields with a 75 MBPD production capacity. The high temperature of the inlet oil to the dehydration unit reduces oil recovery, so mathematical modeling of the desalter operation parameters is very significant. The operating data of the existing production unit were used to check the accuracy of the mathematical desalting plant model. The inlet oil temperature to the desalter was decreased from 110 to 80°C, and the desalter electrical field was increased from 0.75 to 2.5 kV/cm. The model results show that the desalter parameter changes meet the water-oil specification while increasing oil production and, consequently, annual income. In addition, changing the desalter operating conditions reduces the environmental footprint because of the reduction in flare gas. To verify the accuracy of the selected electrostatic desalter electrical field, laboratory data were used; a lab test was performed on a crude oil sample. The results include the dehydration efficiency in the presence of a demulsifier and under an electrical field (0.75 kV) at various temperatures. Comparing the laboratory results with the electrostatic desalter mathematical model shows an acceptable error of 1-3 percent, which confirms the validity of the changes to the desalter specification and operating conditions. Keywords: desalter, electrical field, demulsification, mathematical modeling, water-oil separation
Procedia PDF Downloads 140
3005 [Keynote Talk]: sEMG Interface Design for Locomotion Identification
Authors: Rohit Gupta, Ravinder Agarwal
Abstract:
Surface electromyographic (sEMG) signal has the potential to identify the human activities and intention. This potential is further exploited to control the artificial limbs using the sEMG signal from residual limbs of amputees. The paper deals with the development of multichannel cost efficient sEMG signal interface for research application, along with evaluation of proposed class dependent statistical approach of the feature selection method. The sEMG signal acquisition interface was developed using ADS1298 of Texas Instruments, which is a front-end interface integrated circuit for ECG application. Further, the sEMG signal is recorded from two lower limb muscles for three locomotions namely: Plane Walk (PW), Stair Ascending (SA), Stair Descending (SD). A class dependent statistical approach is proposed for feature selection and also its performance is compared with 12 preexisting feature vectors. To make the study more extensive, performance of five different types of classifiers are compared. The outcome of the current piece of work proves the suitability of the proposed feature selection algorithm for locomotion recognition, as compared to other existing feature vectors. The SVM Classifier is found as the outperformed classifier among compared classifiers with an average recognition accuracy of 97.40%. Feature vector selection emerges as the most dominant factor affecting the classification performance as it holds 51.51% of the total variance in classification accuracy. The results demonstrate the potentials of the developed sEMG signal acquisition interface along with the proposed feature selection algorithm.Keywords: classifiers, feature selection, locomotion, sEMG
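The class-dependent statistical feature selection is not fully specified in the abstract; the Python sketch below instead shows a common sEMG baseline for comparison: standard time-domain features per window followed by an SVM, on synthetic stand-in signals for the three locomotion classes.

```python
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Sketch of a standard sEMG pipeline: time-domain features per window, then an SVM.
# The paper's class-dependent statistical feature selection is not reproduced; the
# signal windows below are synthetic stand-ins for the PW / SA / SD classes.

def features(window):
    mav = np.mean(np.abs(window))                # mean absolute value
    rms = np.sqrt(np.mean(window**2))            # root mean square
    zc = np.sum(np.diff(np.sign(window)) != 0)   # zero crossings
    wl = np.sum(np.abs(np.diff(window)))         # waveform length
    return [mav, rms, zc, wl]

rng = np.random.default_rng(0)
scales = np.repeat([0.5, 1.0, 1.5], 60)[:, None]       # different activation levels
windows = rng.normal(scale=scales, size=(180, 256))
labels = np.repeat([0, 1, 2], 60)                       # 0=PW, 1=SA, 2=SD placeholders

X = np.array([features(w) for w in windows])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
print("mean CV accuracy:", cross_val_score(clf, X, labels, cv=5).mean())
```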
Procedia PDF Downloads 293
3004 Effect of Needle Height on Discharge Coefficient and Cavitation Number
Authors: Mohammadreza Nezamirad, Sepideh Amirahmadian, Nasim Sabetpour, Azadeh Yazdi, Amirmasoud Hamedi
Abstract:
Cavitation inside a diesel injector nozzle is investigated using the Reynolds-averaged Navier-Stokes equations, with the Schnerr-Sauer model used to model cavitation inside the nozzle. The carrying fluid in the current study is diesel fuel. The flow is first verified by comparison with previous experimental data, and it was found that the k-epsilon turbulence model leads to better accuracy than the k-omega turbulence model. Moreover, the mass flow rate obtained numerically is compared with the experimental value, and the discrepancy was found to be less than 5 percent, which shows the accuracy of the current results. Finally, a real-size four-hole nozzle is investigated, and the flow inside it is visualized in terms of velocity profile, discharge coefficient, and cavitation number. It was found that the mesh density could be reduced significantly by utilizing periodic boundary conditions. The velocity contour at the mid-nozzle showed that the maximum velocity occurs at the end of the needle, before the flow enters the orifice area. Last but not least, for the same boundary conditions with different needle heights, it was found that as needle height increases, the cavitation number and the discharge coefficient increase, and these increases are more pronounced at smaller needle heights. Keywords: cavitation, diesel fuel, CFD, real size nozzle, mass flow rate
Procedia PDF Downloads 1483003 Diagnostic Accuracy of the Tuberculin Skin Test for Tuberculosis Diagnosis: Interest of Using ROC Curve and Fagan’s Nomogram
Authors: Nouira Mariem, Ben Rayana Hazem, Ennigrou Samir
Abstract:
Background and aim: During the past decade, the frequency of extrapulmonary forms of tuberculosis has increased. These forms are under-diagnosed using conventional tests. The aim of this study was to evaluate the performance of the Tuberculin Skin Test (TST) for the diagnosis of tuberculosis, using the ROC curve and Fagan’s nomogram methodology. Methods: This was a case-control, multicenter study in 11 anti-tuberculosis centers in Tunisia, during the period from June to November 2014. The cases were adults aged between 18 and 55 years with confirmed tuberculosis; controls were free from tuberculosis. A data collection sheet was filled out and a TST was performed for each participant. Diagnostic accuracy measures of the TST were estimated using the ROC curve and the area under the curve to derive the sensitivity and specificity of a chosen cut-off point, and Fagan’s nomogram was used to estimate its predictive values. Results: Overall, 1053 patients were enrolled, comprising 339 cases (sex-ratio (M/F)=0.87) and 714 controls (sex-ratio (M/F)=0.99). The mean age was 38.3±11.8 years for cases and 33.6±11 years for controls. The mean diameter of the TST induration was significantly higher among cases than controls (13.7 mm vs. 6.2 mm; p=10⁻⁶). The area under the curve was 0.789 [95% CI: 0.758-0.819; p=0.01], corresponding to a moderate discriminating power for this test. The most discriminative cut-off value of the TST, associated with the best sensitivity (73.7%) and specificity (76.6%) pair, was about 11 mm, with a Youden index of 0.503. The positive and negative predictive values were 3.11% and 99.52%, respectively. Conclusion: In view of these results, we can conclude that the TST can be used for tuberculosis diagnosis with good sensitivity and specificity. However, the measurement of skin induration and its interpretation are operator-dependent and remain difficult and subjective. Combining the TST with another test, such as the QuantiFERON test, would be a good alternative. Keywords: tuberculosis, tuberculin skin test, ROC curve, cut-off
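For context, Fagan’s nomogram is a graphical shortcut for Bayes’ theorem on the odds scale: the post-test probability follows from the pre-test probability and the likelihood ratios implied by sensitivity and specificity. The sketch below uses the 11 mm cut-off values reported above; the 1% pre-test probability is an assumption, chosen only because it yields predictive values close to those quoted.

```python
def post_test_probability(pretest, sensitivity, specificity, positive_result=True):
    """Bayes on the odds scale, i.e. what Fagan's nomogram reads off graphically."""
    lr = sensitivity / (1 - specificity) if positive_result else (1 - sensitivity) / specificity
    pre_odds = pretest / (1 - pretest)
    post_odds = pre_odds * lr
    return post_odds / (1 + post_odds)

sens, spec = 0.737, 0.766   # reported for the ~11 mm cut-off
pretest = 0.01              # assumed pre-test probability of tuberculosis

print("P(TB | TST+):", round(post_test_probability(pretest, sens, spec, True), 4))   # ~0.031
print("P(TB | TST-):", round(post_test_probability(pretest, sens, spec, False), 4))  # ~0.003
```

With a pre-test probability of about 1%, this calculation reproduces predictive values close to the 3.11% and 99.52% reported above.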
Procedia PDF Downloads 673002 Change Detection Analysis on Support Vector Machine Classifier of Land Use and Land Cover Changes: Case Study on Yangon
Authors: Khin Mar Yee, Mu Mu Than, Kyi Lint, Aye Aye Oo, Chan Mya Hmway, Khin Zar Chi Winn
Abstract:
The dynamic Land Use and Land Cover (LULC) changes in Yangon have generally resulted in improved human welfare and economic development over the last twenty years. Mapping LULC is crucially important for the sustainable development of the environment. However, exact data on how environmental factors influence the LULC situation at various scales are difficult to obtain, because the natural environment is composed of non-homogeneous surface features and the satellite data therefore also contain mixed pixels. The main objective of this study is to calculate the accuracy of change detection of LULC changes using Support Vector Machines (SVMs). The main data for this research were satellite images from 1996, 2006 and 2015. Change detection statistics were computed to compile a detailed tabulation of changes between two classification images, and the SVM process was applied with a soft approach at the allocation as well as the testing stage to achieve higher accuracy. The results of this paper showed that vegetation and cultivated area decreased (by an average total of 29% from 1996 to 2015) because of conversion to built-up area, which more than doubled (an average total of 30% from 1996 to 2015). The error matrix and confidence limits led to the validation of the LULC mapping results. Keywords: land use and land cover change, change detection, image processing, support vector machines
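A minimal sketch of the per-date SVM classification and the change-detection cross-tabulation is given below. The spectral features, class labels, and pixel values are synthetic stand-ins rather than the Yangon imagery, and the "soft approach" is approximated here by a probabilistic SVM.

```python
# Sketch: classify pixels for two dates with an SVM, then cross-tabulate transitions.
import numpy as np
import pandas as pd
from sklearn.svm import SVC

classes = ["vegetation", "cultivated", "built-up", "water"]
rng = np.random.default_rng(1)

# Training samples: spectral features (e.g. band reflectances) with known labels.
X_train = rng.normal(size=(400, 4)) + np.repeat(np.arange(4), 100)[:, None]
y_train = np.repeat(classes, 100)
clf = SVC(kernel="rbf", probability=True).fit(X_train, y_train)  # soft (probabilistic) SVM

# Classify the same pixel locations for two dates, then tabulate class-to-class changes.
pixels_1996 = rng.normal(size=(1000, 4)) + rng.integers(0, 4, 1000)[:, None]
pixels_2015 = rng.normal(size=(1000, 4)) + rng.integers(0, 4, 1000)[:, None]
map_1996 = clf.predict(pixels_1996)
map_2015 = clf.predict(pixels_2015)

change_matrix = pd.crosstab(pd.Series(map_1996, name="1996"),
                            pd.Series(map_2015, name="2015"))
print(change_matrix)  # detailed tabulation of changes between the two classifications
```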
Procedia PDF Downloads 1383001 Profiling Risky Code Using Machine Learning
Authors: Zunaira Zaman, David Bohannon
Abstract:
This study explores the application of machine learning (ML) for detecting security vulnerabilities in source code. The research aims to assist organizations with large application portfolios and limited security testing capabilities in prioritizing security activities. ML-based approaches offer benefits such as increased confidence scores, tuning of false positives and negatives, and automated feedback. The initial approach, which used natural language processing techniques to extract features, achieved 86% accuracy during the training phase but suffered from overfitting and performed poorly on unseen datasets during testing. To address these issues, the study proposes using the abstract syntax tree (AST) for Java and C++ codebases to capture code semantics and structure and to generate path-context representations for each function. The Code2Vec model architecture is used to learn distributed representations of source code snippets for training a machine-learning classifier for vulnerability prediction. The study evaluates the performance of the proposed methodology on two datasets and compares the results with existing approaches. The Devign dataset yielded 60% accuracy in predicting vulnerable code snippets and helped resist overfitting, while the Juliet Test Suite was used to predict specific vulnerability types such as OS Command Injection, cryptographic, and Cross-Site Scripting vulnerabilities. The Code2Vec model achieved 75% accuracy and a 98% recall rate in predicting OS Command Injection vulnerabilities. The study concludes that even partial AST representations of source code can be useful for vulnerability prediction. The approach has the potential for automated, intelligent analysis of source code, including vulnerability prediction on unseen source code. State-of-the-art models using natural language processing techniques and CNN models with ensemble modelling techniques did not generalize well on unseen data and faced overfitting issues. However, predicting vulnerabilities in source code using machine learning poses challenges such as the high dimensionality and complexity of source code, imbalanced datasets, and identifying specific types of vulnerabilities. Future work will address these challenges and expand the scope of the research. Keywords: code embeddings, neural networks, natural language processing, OS command injection, software security, code properties
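To illustrate the kind of representation Code2Vec consumes, here is a simplified path-context extractor written against Python's own ast module rather than the Java/C++ parsers used in the study; the leaf selection and path encoding are assumptions and far cruder than the real extraction.

```python
# Sketch: Code2Vec-style (leaf, AST path, leaf) triples from a Python snippet.
import ast
import itertools

def leaves_with_ancestors(tree):
    """Collect (token, list of ancestor AST nodes from the root) for simple leaves."""
    found = []
    def walk(node, ancestors):
        ancestors = ancestors + [node]
        if isinstance(node, ast.Name):
            found.append((node.id, ancestors))
        elif isinstance(node, ast.Constant):
            found.append((repr(node.value), ancestors))
        for child in ast.iter_child_nodes(node):
            walk(child, ancestors)
    walk(tree, [])
    return found

def path_contexts(source):
    """(leaf, node-type path via the lowest common ancestor, leaf) triples."""
    leaves = leaves_with_ancestors(ast.parse(source))
    contexts = []
    for (tok_a, anc_a), (tok_b, anc_b) in itertools.combinations(leaves, 2):
        i = 0
        while i < min(len(anc_a), len(anc_b)) and anc_a[i] is anc_b[i]:
            i += 1                                  # skip the shared ancestors
        up = [type(n).__name__ for n in reversed(anc_a[i:])]
        lca = type(anc_a[i - 1]).__name__           # lowest common ancestor
        down = [type(n).__name__ for n in anc_b[i:]]
        contexts.append((tok_a, "^".join(up + [lca] + down), tok_b))
    return contexts

sample = "def scale(x, factor):\n    y = x * factor\n    return y + 1\n"
for ctx in path_contexts(sample)[:4]:
    print(ctx)
```

Each printed triple, e.g. ('y', 'Name^Assign^BinOp^Name', 'x'), is the bag-of-path-contexts form that a Code2Vec-style model embeds and aggregates before classification.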
Procedia PDF Downloads 106