Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 196

Search results for: naive bayes

106 Using Crowd-Sourced Data to Assess Safety in Developing Countries: The Case Study of Eastern Cairo, Egypt

Authors: Mahmoud Ahmed Farrag, Ali Zain Elabdeen Heikal, Mohamed Shawky Ahmed, Ahmed Osama Amer

Abstract:

Crowd-sourced data refers to data that is collected and shared by a large number of individuals or organizations, often through the use of digital technologies such as mobile devices and social media. The shortage in crash data collection in developing countries makes it difficult to fully understand and address road safety issues in these regions. In developing countries, crowd-sourced data can be a valuable tool for improving road safety, particularly in urban areas where the majority of road crashes occur. This study is -to our best knowledge- the first to develop safety performance functions using crowd-sourced data by adopting a negative binomial structure model and the Full Bayes model to investigate traffic safety for urban road networks and provide insights into the impact of roadway characteristics. Furthermore, as a part of the safety management process, network screening has been undergone through applying two different methods to rank the most hazardous road segments: PCR method (adopted in the Highway Capacity Manual HCM) as well as a graphical method using GIS tools to compare and validate. Lastly, recommendations were suggested for policymakers to ensure safer roads.

Keywords: crowdsourced data, road crashes, safety performance functions, Full Bayes models, network screening

Procedia PDF Downloads 52

105 Breast Cancer Survivability Prediction via Classifier Ensemble

Authors: Mohamed Al-Badrashiny, Abdelghani Bellaachia

Abstract:

This paper presents a classifier ensemble approach for predicting the survivability of the breast cancer patients using the latest database version of the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute. The system consists of two main components; features selection and classifier ensemble components. The features selection component divides the features in SEER database into four groups. After that it tries to find the most important features among the four groups that maximizes the weighted average F-score of a certain classification algorithm. The ensemble component uses three different classifiers, each of which models different set of features from SEER through the features selection module. On top of them, another classifier is used to give the final decision based on the output decisions and confidence scores from each of the underlying classifiers. Different classification algorithms have been examined; the best setup found is by using the decision tree, Bayesian network, and Na¨ıve Bayes algorithms for the underlying classifiers and Na¨ıve Bayes for the classifier ensemble step. The system outperforms all published systems to date when evaluated against the exact same data of SEER (period of 1973-2002). It gives 87.39% weighted average F-score compared to 85.82% and 81.34% of the other published systems. By increasing the data size to cover the whole database (period of 1973-2014), the overall weighted average F-score jumps to 92.4% on the held out unseen test set.

Keywords: classifier ensemble, breast cancer survivability, data mining, SEER

Procedia PDF Downloads 328

104 Principal Component Analysis Combined Machine Learning Techniques on Pharmaceutical Samples by Laser Induced Breakdown Spectroscopy

Authors: Kemal Efe Eseller, Göktuğ Yazici

Abstract:

Laser-induced breakdown spectroscopy (LIBS) is a rapid optical atomic emission spectroscopy which is used for material identification and analysis with the advantages of in-situ analysis, elimination of intensive sample preparation, and micro-destructive properties for the material to be tested. LIBS delivers short pulses of laser beams onto the material in order to create plasma by excitation of the material to a certain threshold. The plasma characteristics, which consist of wavelength value and intensity amplitude, depends on the material and the experiment’s environment. In the present work, medicine samples’ spectrum profiles were obtained via LIBS. Medicine samples’ datasets include two different concentrations for both paracetamol based medicines, namely Aferin and Parafon. The spectrum data of the samples were preprocessed via filling outliers based on quartiles, smoothing spectra to eliminate noise and normalizing both wavelength and intensity axis. Statistical information was obtained and principal component analysis (PCA) was incorporated to both the preprocessed and raw datasets. The machine learning models were set based on two different train-test splits, which were 70% training – 30% test and 80% training – 20% test. Cross-validation was preferred to protect the models against overfitting; thus the sample amount is small. The machine learning results of preprocessed and raw datasets were subjected to comparison for both splits. This is the first time that all supervised machine learning classification algorithms; consisting of Decision Trees, Discriminant, naïve Bayes, Support Vector Machines (SVM), k-NN(k-Nearest Neighbor) Ensemble Learning and Neural Network algorithms; were incorporated to LIBS data of paracetamol based pharmaceutical samples, and their different concentrations on preprocessed and raw dataset in order to observe the effect of preprocessing.

Keywords: machine learning, laser-induced breakdown spectroscopy, medicines, principal component analysis, preprocessing

Procedia PDF Downloads 87

103 Constructing a Semi-Supervised Model for Network Intrusion Detection

Authors: Tigabu Dagne Akal

Abstract:

While advances in computer and communications technology have made the network ubiquitous, they have also rendered networked systems vulnerable to malicious attacks devised from a distance. These attacks or intrusions start with attackers infiltrating a network through a vulnerable host and then launching further attacks on the local network or Intranet. Nowadays, system administrators and network professionals can attempt to prevent such attacks by developing intrusion detection tools and systems using data mining technology. In this study, the experiments were conducted following the Knowledge Discovery in Database Process Model. The Knowledge Discovery in Database Process Model starts from selection of the datasets. The dataset used in this study has been taken from Massachusetts Institute of Technology Lincoln Laboratory. After taking the data, it has been pre-processed. The major pre-processing activities include fill in missed values, remove outliers; resolve inconsistencies, integration of data that contains both labelled and unlabelled datasets, dimensionality reduction, size reduction and data transformation activity like discretization tasks were done for this study. A total of 21,533 intrusion records are used for training the models. For validating the performance of the selected model a separate 3,397 records are used as a testing set. For building a predictive model for intrusion detection J48 decision tree and the Naïve Bayes algorithms have been tested as a classification approach for both with and without feature selection approaches. The model that was created using 10-fold cross validation using the J48 decision tree algorithm with the default parameter values showed the best classification accuracy. The model has a prediction accuracy of 96.11% on the training datasets and 93.2% on the test dataset to classify the new instances as normal, DOS, U2R, R2L and probe classes. The findings of this study have shown that the data mining methods generates interesting rules that are crucial for intrusion detection and prevention in the networking industry. Future research directions are forwarded to come up an applicable system in the area of the study.

Keywords: intrusion detection, data mining, computer science, data mining

Procedia PDF Downloads 296

102 An Overbooking Model for Car Rental Service with Different Types of Cars

Authors: Naragain Phumchusri, Kittitach Pongpairoj

Abstract:

Overbooking is a very useful revenue management technique that could help reduce costs caused by either undersales or oversales. In this paper, we propose an overbooking model for two types of cars that can minimize the total cost for car rental service. With two types of cars, there is an upgrade possibility for lower type to upper type. This makes the model more complex than one type of cars scenario. We have found that convexity can be proved in this case. Sensitivity analysis of the parameters is conducted to observe the effects of relevant parameters on the optimal solution. Model simplification is proposed using multiple linear regression analysis, which can help estimate the optimal overbooking level using appropriate independent variables. The results show that the overbooking level from multiple linear regression model is relatively close to the optimal solution (with the adjusted R-squared value of at least 72.8%). To evaluate the performance of the proposed model, the total cost was compared with the case where the decision maker uses a naïve method for the overbooking level. It was found that the total cost from optimal solution is only 0.5 to 1 percent (on average) lower than the cost from regression model, while it is approximately 67% lower than the cost obtained by the naïve method. It indicates that our proposed simplification method using regression analysis can effectively perform in estimating the overbooking level.

Keywords: overbooking, car rental industry, revenue management, stochastic model

Procedia PDF Downloads 172

101 Study of Circulatory MiR-122 and MiR-130a Expression among Chronic Hepatitis C Egyptian Patients

Authors: Hend K. Moosa, Eman A. Rashwan, Ezzat M. Hassan, Amany A. Ghazy, Amel G. Sheredy

Abstract:

The stability of microRNA (miR) in the circulation can show a great progress toward the discovery of non-invasive diagnostic and prognostic biomarkers in many diseases. In the present study, circulatory miR-122 and miR-130a were analysed in chronic hepatitis C Egyptian patients in predicting the clinical outcome of interferon treatment. In addition, their expression levels were correlated to viral RNA levels, necro-inflammatory markers (AST, ALT) and to each other. This study was conducted on 51 subjects where 36 were chronic HCV patients in which they were divided into naive and interferon treated HCV patients (responders and non-responders) and 15 matched healthy controls. Serum quantification of miR-122 and miR-130a were performed by quantitative Real-time Polymerase Chain Reaction (qRT-PCR). The results showed a significant upregulation of miR-122 in non-responder patients (P=0.049). By receiver operating characteristic analysis curve, miR-122 revealed 65% sensitivity and 92.3% specificity in predicting non-responsiveness of patients to IFN treatment, while miR-130a showed a sensitivity of 100% and specificity of 53.85%. Remarkably, there was a significant positive correlation between miR-122 and miR-130a in naive HCV patients (r=0.714, p=0.003). However, there was no significant correlation between serum miR-122, miR-130a expression levels and necro-inflammatory markers (AST, ALT). To conclude, miR-122 and miR-130a have a significant association with viral RNA levels and accordingly, they may have a synergistic power in promoting viral replication. Interestingly, miR-122 and miR-130a have a predictive power in predicting clinical outcome of IFN treatment which can be further studied in currently used drugs in order to reduce the socio-economic burden of potentially non-responders.

Keywords: hepatitis C, microRNA, miR-122, miR-130a

Procedia PDF Downloads 170

100 Interpretation of the Russia-Ukraine 2022 War via N-Gram Analysis

Authors: Elcin Timur Cakmak, Ayse Oguzlar

Abstract:

This study presents the results of the tweets sent by Twitter users on social media about the Russia-Ukraine war by bigram and trigram methods. On February 24, 2022, Russian President Vladimir Putin declared a military operation against Ukraine, and all eyes were turned to this war. Many people living in Russia and Ukraine reacted to this war and protested and also expressed their deep concern about this war as they felt the safety of their families and their futures were at stake. Most people, especially those living in Russia and Ukraine, express their views on the war in different ways. The most popular way to do this is through social media. Many people prefer to convey their feelings using Twitter, one of the most frequently used social media tools. Since the beginning of the war, it is seen that there have been thousands of tweets about the war from many countries of the world on Twitter. These tweets accumulated in data sources are extracted using various codes for analysis through Twitter API and analysed by Python programming language. The aim of the study is to find the word sequences in these tweets by the n-gram method, which is known for its widespread use in computational linguistics and natural language processing. The tweet language used in the study is English. The data set consists of the data obtained from Twitter between February 24, 2022, and April 24, 2022. The tweets obtained from Twitter using the #ukraine, #russia, #war, #putin, #zelensky hashtags together were captured as raw data, and the remaining tweets were included in the analysis stage after they were cleaned through the preprocessing stage. In the data analysis part, the sentiments are found to present what people send as a message about the war on Twitter. Regarding this, negative messages make up the majority of all the tweets as a ratio of %63,6. Furthermore, the most frequently used bigram and trigram word groups are found. Regarding the results, the most frequently used word groups are “he, is”, “I, do”, “I, am” for bigrams. Also, the most frequently used word groups are “I, do, not”, “I, am, not”, “I, can, not” for trigrams. In the machine learning phase, the accuracy of classifications is measured by Classification and Regression Trees (CART) and Naïve Bayes (NB) algorithms. The algorithms are used separately for bigrams and trigrams. We gained the highest accuracy and F-measure values by the NB algorithm and the highest precision and recall values by the CART algorithm for bigrams. On the other hand, the highest values for accuracy, precision, and F-measure values are achieved by the CART algorithm, and the highest value for the recall is gained by NB for trigrams.

Keywords: classification algorithms, machine learning, sentiment analysis, Twitter

Procedia PDF Downloads 73

99 Theta-Phase Gamma-Amplitude Coupling as a Neurophysiological Marker in Neuroleptic-Naive Schizophrenia

Authors: Jun Won Kim

Abstract:

Objective: Theta-phase gamma-amplitude coupling (TGC) was used as a novel evidence-based tool to reflect the dysfunctional cortico-thalamic interaction in patients with schizophrenia. However, to our best knowledge, no studies have reported the diagnostic utility of the TGC in the resting-state electroencephalographic (EEG) of neuroleptic-naive patients with schizophrenia compared to healthy controls. Thus, the purpose of this EEG study was to understand the underlying mechanisms in patients with schizophrenia by comparing the TGC at rest between two groups and to evaluate the diagnostic utility of TGC. Method: The subjects included 90 patients with schizophrenia and 90 healthy controls. All patients were diagnosed with schizophrenia according to the criteria of Diagnostic and Statistical Manual of Mental Disorders, 4th edition (DSM-IV) by two independent psychiatrists using semi-structured clinical interviews. Because patients were either drug-naïve (first episode) or had not been taking psychoactive drugs for one month before the study, we could exclude the influence of medications. Five frequency bands were defined for spectral analyses: delta (1–4 Hz), theta (4–8 Hz), slow alpha (8–10 Hz), fast alpha (10–13.5 Hz), beta (13.5–30 Hz), and gamma (30-80 Hz). The spectral power of the EEG data was calculated with fast Fourier Transformation using the 'spectrogram.m' function of the signal processing toolbox in Matlab. An analysis of covariance (ANCOVA) was performed to compare the TGC results between the groups, which were adjusted using a Bonferroni correction (P < 0.05/19 = 0.0026). Receiver operator characteristic (ROC) analysis was conducted to examine the discriminating ability of the TGC data for schizophrenia diagnosis. Results: The patients with schizophrenia showed a significant increase in the resting-state TGC at all electrodes. The delta, theta, slow alpha, fast alpha, and beta powers showed low accuracies of 62.2%, 58.4%, 56.9%, 60.9%, and 59.0%, respectively, in discriminating the patients with schizophrenia from the healthy controls. The ROC analysis performed on the TGC data generated the most accurate result among the EEG measures, displaying an overall classification accuracy of 92.5%. Conclusion: As TGC includes phase, which contains information about neuronal interactions from the EEG recording, TGC is expected to be useful for understanding the mechanisms the dysfunctional cortico-thalamic interaction in patients with schizophrenia. The resting-state TGC value was increased in the patients with schizophrenia compared to that in the healthy controls and had a higher discriminating ability than the other parameters. These findings may be related to the compensatory hyper-arousal patterns of the dysfunctional default-mode network (DMN) in schizophrenia. Further research exploring the association between TGC and medical or psychiatric conditions that may confound EEG signals will help clarify the potential utility of TGC.

Keywords: quantitative electroencephalography (QEEG), theta-phase gamma-amplitude coupling (TGC), schizophrenia, diagnostic utility

Procedia PDF Downloads 143

98 Transcriptional Differences in B cell Subpopulations over the Course of Preclinical Autoimmunity Development

Authors: Aleksandra Bylinska, Samantha Slight-Webb, Kevin Thomas, Miles Smith, Susan Macwana, Nicolas Dominguez, Eliza Chakravarty, Joan T. Merrill, Judith A. James, Joel M. Guthridge

Abstract:

Background: Systemic Lupus Erythematosus (SLE) is an interferon-related autoimmune disease characterized by B cell dysfunction. One of the main hallmarks is a loss of tolerance to self-antigens leading to increased levels of autoantibodies against nuclear components (ANAs). However, up to 20% of healthy ANA+ individuals will not develop clinical illness. SLE is more prevalent among women and minority populations (African, Asian American and Hispanics). Moreover, African Americans have a stronger interferon (IFN) signature and develop more severe symptoms. The exact mechanisms involved in ethnicity-dependent B cell dysregulation and the progression of autoimmune disease from ANA+ healthy individuals to clinical disease remains unclear. Methods: Peripheral blood mononuclear cells (PBMCs) from African (AA) and European American (EA) ANA- (n=12), ANA+ (n=12) and SLE (n=12) individuals were assessed by multimodal scRNA-Seq/CITE-Seq methods to examine differential gene signatures in specific B cell subsets. Library preparation was done with a 10X Genomics Chromium according to established protocols and sequenced on Illumina NextSeq. The data were further analyzed for distinct cluster identification and differential gene signatures in the Seurat package in R and pathways analysis was performed using Ingenuity Pathways Analysis (IPA). Results: Comparing all subjects, 14 distinct B cell clusters were identified using a community detection algorithm and visualized with Uniform Manifold Approximation Projection (UMAP). The proportion of each of those clusters varied by disease status and ethnicity. Transitional B cells trended higher in ANA+ healthy individuals, especially in AA. Ribonucleoprotein high population (HNRNPH1 elevated, heterogeneous nuclear ribonucleoprotein, RNP-Hi) of proliferating Naïve B cells were more prevalent in SLE patients, specifically in EA. Interferon-induced protein high population (IFIT-Hi) of Naive B cells are increased in EA ANA- individuals. The proportion of memory B cells and plasma cells clusters tend to be expanded in SLE patients. As anticipated, we observed a higher signature of cytokine-related pathways, especially interferon, in SLE individuals. Pathway analysis among AA individuals revealed an NRF2-mediated Oxidative Stress response signature in the transitional B cell cluster, not seen in EA individuals. TNFR1/2 and Sirtuin Signaling pathway genes were higher in AA IFIT-Hi Naive B cells, whereas they were not detected in EA individuals. Interferon signaling was observed in B cells in both ethnicities. Oxidative phosphorylation was found in age-related B cells (ABCs) for both ethnicities, whereas Death Receptor Signaling was found only in EA patients in these cells. Interferon-related transcription factors were elevated in ABCs and IFIT-Hi Naive B cells in SLE subjects of both ethnicities. Conclusions: ANA+ healthy individuals have altered gene expression pathways in B cells that might drive apoptosis and subsequent clinical autoimmune pathogenesis. Increases in certain regulatory pathways may delay progression to SLE. Further, AA individuals have more elevated activation pathways that may make them more susceptible to SLE.

Keywords:

Procedia PDF Downloads 175

97 Data-Driven Surrogate Models for Damage Prediction of Steel Liquid Storage Tanks under Seismic Hazard

Authors: Laura Micheli, Majd Hijazi, Mahmoud Faytarouni

Abstract:

The damage reported by oil and gas industrial facilities revealed the utmost vulnerability of steel liquid storage tanks to seismic events. The failure of steel storage tanks may yield devastating and long-lasting consequences on built and natural environments, including the release of hazardous substances, uncontrolled fires, and soil contamination with hazardous materials. It is, therefore, fundamental to reliably predict the damage that steel liquid storage tanks will likely experience under future seismic hazard events. The seismic performance of steel liquid storage tanks is usually assessed using vulnerability curves obtained from the numerical simulation of a tank under different hazard scenarios. However, the computational demand of high-fidelity numerical simulation models, such as finite element models, makes the vulnerability assessment of liquid storage tanks time-consuming and often impractical. As a solution, this paper presents a surrogate model-based strategy for predicting seismic-induced damage in steel liquid storage tanks. In the proposed strategy, the surrogate model is leveraged to reduce the computational demand of time-consuming numerical simulations. To create the data set for training the surrogate model, field damage data from past earthquakes reconnaissance surveys and reports are collected. Features representative of steel liquid storage tank characteristics (e.g., diameter, height, liquid level, yielding stress) and seismic excitation parameters (e.g., peak ground acceleration, magnitude) are extracted from the field damage data. The collected data are then utilized to train a surrogate model that maps the relationship between tank characteristics, seismic hazard parameters, and seismic-induced damage via a data-driven surrogate model. Different types of surrogate algorithms, including naïve Bayes, k-nearest neighbors, decision tree, and random forest, are investigated, and results in terms of accuracy are reported. The model that yields the most accurate predictions is employed to predict future damage as a function of tank characteristics and seismic hazard intensity level. Results show that the proposed approach can be used to estimate the extent of damage in steel liquid storage tanks, where the use of data-driven surrogates represents a viable alternative to computationally expensive numerical simulation models.

Keywords: damage prediction , data-driven model, seismic performance, steel liquid storage tanks, surrogate model

Procedia PDF Downloads 143

96 Comparison of Deep Learning and Machine Learning Algorithms to Diagnose and Predict Breast Cancer

Authors: F. Ghazalnaz Sharifonnasabi, Iman Makhdoom

Abstract:

Breast cancer is a serious health concern that affects many people around the world. According to a study published in the Breast journal, the global burden of breast cancer is expected to increase significantly over the next few decades. The number of deaths from breast cancer has been increasing over the years, but the age-standardized mortality rate has decreased in some countries. It’s important to be aware of the risk factors for breast cancer and to get regular check- ups to catch it early if it does occur. Machin learning techniques have been used to aid in the early detection and diagnosis of breast cancer. These techniques, that have been shown to be effective in predicting and diagnosing the disease, have become a research hotspot. In this study, we consider two deep learning approaches including: Multi-Layer Perceptron (MLP), and Convolutional Neural Network (CNN). We also considered the five-machine learning algorithm titled: Decision Tree (C4.5), Naïve Bayesian (NB), Support Vector Machine (SVM), K-Nearest Neighbors (KNN) Algorithm and XGBoost (eXtreme Gradient Boosting) on the Breast Cancer Wisconsin Diagnostic dataset. We have carried out the process of evaluating and comparing classifiers involving selecting appropriate metrics to evaluate classifier performance and selecting an appropriate tool to quantify this performance. The main purpose of the study is predicting and diagnosis breast cancer, applying the mentioned algorithms and also discovering of the most effective with respect to confusion matrix, accuracy and precision. It is realized that CNN outperformed all other classifiers and achieved the highest accuracy (0.982456). The work is implemented in the Anaconda environment based on Python programing language.

Keywords: breast cancer, multi-layer perceptron, Naïve Bayesian, SVM, decision tree, convolutional neural network, XGBoost, KNN

Procedia PDF Downloads 75

95 The Involvement of the Homing Receptors CCR7 and CD62L in the Pathogenesis of Graft-Versus-Host Disease

Authors: Federico Herrera, Valle Gomez García de Soria, Itxaso Portero Sainz, Carlos Fernández Arandojo, Mercedes Royg, Ana Marcos Jimenez, Anna Kreutzman, Cecilia MuñozCalleja

Abstract:

Introduction: Graft-versus-host disease (GVHD) still remains the major complication associated with allogeneic stem cell transplantation (SCT). The pathogenesis involves migration of donor naïve T-cells into recipient secondary lymphoid organs. Two molecules are important in this process: CD62L and CCR7, which are characteristically expressed in naïve/central memory T-cells. With this background, we aimed to study the influence of CCR7 and CD62L on donor lymphocytes in the development and severity of GVHD. Material and methods: This single center study included 98 donor-recipient pairs. Samples were collected prospectively from the apheresis product and phenotyped by flow cytometry. CCR7 and CD62L expression in CD4+ and CD8+ T-cells were compared between patients who developed acute (n=40) or chronic GVHD (n=33) and those who did not (n=38). Results: The patients who developed acute GVHD were transplanted with a higher percentage of CCR7+CD4+ T-cells (p = 0.05) compared to the no GVHD group. These results were confirmed when these patients were divided in degrees according to the severity of the disease; the more severe disease, the higher percentage of CCR7+CD4+ T-cells. Conversely, chronic GVHD patients received a higher percentage of CCR7+CD8+ T-cells (p=0.02) in comparison to those who did not develop the complication. These data were also confirmed when patients were subdivided in degrees of the disease severity. A multivariable analysis confirmed that percentage of CCR7+CD4+ T-cells is a predictive factor of acute GVHD whereas the percentage of CCR7+CD8+ T-cells is a predictive factor of chronic GVHD. In vitro functional assays (migration and activation assays) supported the idea of CCR7+ T-cells were involved in the development of GVHD. As low levels of CD62L expression were detected in all apheresis products, we tested the hypothesis that CD62L was shed during apheresis procedure. Comparing CD62L surface levels in T-cells from the same donor immediately before collecting the apheresis product, and the final apheresis product we found that this process down-regulated CD62L in both CD4+ and CD8+ T cells (p=0.008). Interestingly, when CD62L levels were analysed in days 30 or 60 after engraftment, they recovered to baseline (p=0.008). However, to investigate the relation between CD62L expression and the development of GVHD in the recipient samples after the engraftment, no differences were observed comparing patients with GVHD to those who did not develop the disease. Discussion: Our prospective study indicates that the CCR7+ T-cells from the donor, which include naïve and central memory T-cells, contain the alloreactive cells with a high ability to mediate GVHD (in the case of both migration and activation). Therefore we suggest that the proportion and functional properties of CCR7+CD4+ and CCR7+CD8+ T-cells in the apheresis could act as a predictive biomarker to both acute and chronic GVHD respectively. Importantly, our study precludes that CD62L is lost in the apheresis and therefore it is not a reliable biomarker for the development of GVHD.

Keywords: CCR7, CD62L, GVHD, SCT

Procedia PDF Downloads 288

94 Evaluation of Gesture-Based Password: User Behavioral Features Using Machine Learning Algorithms

Authors: Lakshmidevi Sreeramareddy, Komalpreet Kaur, Nane Pothier

Abstract:

Graphical-based passwords have existed for decades. Their major advantage is that they are easier to remember than an alphanumeric password. However, their disadvantage (especially recognition-based passwords) is the smaller password space, making them more vulnerable to brute force attacks. Graphical passwords are also highly susceptible to the shoulder-surfing effect. The gesture-based password method that we developed is a grid-free, template-free method. In this study, we evaluated the gesture-based passwords for usability and vulnerability. The results of the study are significant. We developed a gesture-based password application for data collection. Two modes of data collection were used: Creation mode and Replication mode. In creation mode (Session 1), users were asked to create six different passwords and reenter each password five times. In replication mode, users saw a password image created by some other user for a fixed duration of time. Three different duration timers, such as 5 seconds (Session 2), 10 seconds (Session 3), and 15 seconds (Session 4), were used to mimic the shoulder-surfing attack. After the timer expired, the password image was removed, and users were asked to replicate the password. There were 74, 57, 50, and 44 users participated in Session 1, Session 2, Session 3, and Session 4 respectfully. In this study, the machine learning algorithms have been applied to determine whether the person is a genuine user or an imposter based on the password entered. Five different machine learning algorithms were deployed to compare the performance in user authentication: namely, Decision Trees, Linear Discriminant Analysis, Naive Bayes Classifier, Support Vector Machines (SVMs) with Gaussian Radial Basis Kernel function, and K-Nearest Neighbor. Gesture-based password features vary from one entry to the next. It is difficult to distinguish between a creator and an intruder for authentication. For each password entered by the user, four features were extracted: password score, password length, password speed, and password size. All four features were normalized before being fed to a classifier. Three different classifiers were trained using data from all four sessions. Classifiers A, B, and C were trained and tested using data from the password creation session and the password replication with a timer of 5 seconds, 10 seconds, and 15 seconds, respectively. The classification accuracies for Classifier A using five ML algorithms are 72.5%, 71.3%, 71.9%, 74.4%, and 72.9%, respectively. The classification accuracies for Classifier B using five ML algorithms are 69.7%, 67.9%, 70.2%, 73.8%, and 71.2%, respectively. The classification accuracies for Classifier C using five ML algorithms are 68.1%, 64.9%, 68.4%, 71.5%, and 69.8%, respectively. SVMs with Gaussian Radial Basis Kernel outperform other ML algorithms for gesture-based password authentication. Results confirm that the shorter the duration of the shoulder-surfing attack, the higher the authentication accuracy. In conclusion, behavioral features extracted from the gesture-based passwords lead to less vulnerable user authentication.

Keywords: authentication, gesture-based passwords, machine learning algorithms, shoulder-surfing attacks, usability

Procedia PDF Downloads 107

93 MANIFEST-2, a Global, Phase 3, Randomized, Double-Blind, Active-Control Study of Pelabresib (CPI-0610) and Ruxolitinib vs. Placebo and Ruxolitinib in JAK Inhibitor-Naïve Myelofibrosis Patients

Authors: Claire Harrison, Raajit K. Rampal, Vikas Gupta, Srdan Verstovsek, Moshe Talpaz, Jean-Jacques Kiladjian, Ruben Mesa, Andrew Kuykendall, Alessandro Vannucchi, Francesca Palandri, Sebastian Grosicki, Timothy Devos, Eric Jourdan, Marielle J. Wondergem, Haifa Kathrin Al-Ali, Veronika Buxhofer-Ausch, Alberto Alvarez-Larrán, Sanjay Akhani, Rafael Muñoz-Carerras, Yury Sheykin, Gozde Colak, Morgan Harris, John Mascarenhas

Abstract:

Myelofibrosis (MF) is characterized by bone marrow fibrosis, anemia, splenomegaly and constitutional symptoms. Progressive bone marrow fibrosis results from aberrant megakaryopoeisis and expression of proinflammatory cytokines, both of which are heavily influenced by bromodomain and extraterminal domain (BET)-mediated gene regulation and lead to myeloproliferation and cytopenias. Pelabresib (CPI-0610) is an oral small-molecule investigational inhibitor of BET protein bromodomains currently being developed for the treatment of patients with MF. It is designed to downregulate BET target genes and modify nuclear factor kappa B (NF-κB) signaling. MANIFEST-2 was initiated based on data from Arm 3 of the ongoing Phase 2 MANIFEST study (NCT02158858), which is evaluating the combination of pelabresib and ruxolitinib in Janus kinase inhibitor (JAKi) treatment-naïve patients with MF. Primary endpoint analyses showed splenic and symptom responses in 68% and 56% of 84 enrolled patients, respectively. MANIFEST-2 (NCT04603495) is a global, Phase 3, randomized, double-blind, active-control study of pelabresib and ruxolitinib versus placebo and ruxolitinib in JAKi treatment-naïve patients with primary MF, post-polycythemia vera MF or post-essential thrombocythemia MF. The aim of this study is to evaluate the efficacy and safety of pelabresib in combination with ruxolitinib. Here we report updates from a recent protocol amendment. The MANIFEST-2 study schema is shown in Figure 1. Key eligibility criteria include a Dynamic International Prognostic Scoring System (DIPSS) score of Intermediate-1 or higher, platelet count ≥100 × 10^9/L, spleen volume ≥450 cc by computerized tomography or magnetic resonance imaging, ≥2 symptoms with an average score ≥3 or a Total Symptom Score (TSS) of ≥10 using the Myelofibrosis Symptom Assessment Form v4.0, peripheral blast count <5% and Eastern Cooperative Oncology Group performance status ≤2. Patient randomization will be stratified by DIPSS risk category (Intermediate-1 vs Intermediate-2 vs High), platelet count (>200 × 10^9/L vs 100–200 × 10^9/L) and spleen volume (≥1800 cm^3 vs <1800 cm^3). Double-blind treatment (pelabresib or matching placebo) will be administered once daily for 14 consecutive days, followed by a 7 day break, which is considered one cycle of treatment. Ruxolitinib will be administered twice daily for all 21 days of the cycle. The primary endpoint is SVR35 response (≥35% reduction in spleen volume from baseline) at Week 24, and the key secondary endpoint is TSS50 response (≥50% reduction in TSS from baseline) at Week 24. Other secondary endpoints include safety, pharmacokinetics, changes in bone marrow fibrosis, duration of SVR35 response, duration of TSS50 response, progression-free survival, overall survival, conversion from transfusion dependence to independence and rate of red blood cell transfusion for the first 24 weeks. Study recruitment is ongoing; 400 patients (200 per arm) from North America, Europe, Asia and Australia will be enrolled. The study opened for enrollment in November 2020. MANIFEST-2 was initiated based on data from the ongoing Phase 2 MANIFEST study with the aim of assessing the efficacy and safety of pelabresib and ruxolitinib in JAKi treatment-naïve patients with MF. MANIFEST-2 is currently open for enrollment.

Keywords: CPI-0610, JAKi treatment-naïve, MANIFEST-2, myelofibrosis, pelabresib

Procedia PDF Downloads 201

92 Prevalence of Pretreatment Drug HIV-1 Mutations in Moscow, Russia

Authors: Daria Zabolotnaya, Svetlana Degtyareva, Veronika Kanestri, Danila Konnov

Abstract:

An adequate choice of the initial antiretroviral treatment determines the treatment efficacy. In the clinical guidelines in Russia non-nucleoside reverse transcriptase inhibitors (NNRTIs) are still considered to be an option for first-line treatment while pretreatment drug resistance (PDR) testing is not routinely performed. We conducted a cohort retrospective study in HIV-positive treatment naïve patients of the H-clinic (Moscow, Russia) who performed PDR testing from July 2017 to November 2021. All the information was obtained from the medical records anonymously. We analyzed the mutations in reverse transcriptase and protease genes. RT-sequences were obtained by AmpliSens HIV-Resist-Seq kit. Drug resistance was defined using the HIVdb Program v. 8.9-1. PDR was estimated using the Stanford algorithm. Descriptive statistics were performed in Excel (Microsoft Office, 2019). A total of 261 HIV-1 infected patients were enrolled in the study including 197 (75.5%) male and 64 (24.5%) female. The mean age was 34.6±8.3 years. The median CD4 count – 521 cells/µl (IQR 367-687 cells/µl). Data on risk factors of HIV-infection were scarce. The total quantity of strains containing mutations in the reverse transcriptase gene was 75 (28.7%). From these 5 (1.9%) mutations were associated with PDR to nucleoside reverse transcriptase inhibitors (NRTIs) and 30 (11.5%) – with PDR to NNRTIs. The number of strains with mutations in protease gene was 43 (16.5%), from these only 3 (1.1%) mutations were associated with resistance to protease inhibitors. For NNRTIs the most prevalent PDR mutations were E138A, V106I. Most of the HIV variants exhibited a single PDR mutation, 2 were found in 3 samples. Most of HIV variants with PDR mutation displayed a single drug class resistance mutation. 2/37 (5.4%) strains had both NRTIs and NNRTIs mutations. There were no strains identified with PDR mutations to all three drug classes. Though earlier data demonstrated a lower level of PDR in HIV treatment naïve population in Russia and our cohort can be not fully representative as it is taken from the private clinic, it reflects the trend of increasing PDR especially to NNRTIs. Therefore, we consider either pretreatment testing or giving the priority to other drugs as first-line treatment necessary.

Keywords: HIV, resistance, mutations, treatment

Procedia PDF Downloads 94

91 Evaluation of Modern Natural Language Processing Techniques via Measuring a Company's Public Perception

Authors: Burak Oksuzoglu, Savas Yildirim, Ferhat Kutlu

Abstract:

Opinion mining (OM) is one of the natural language processing (NLP) problems to determine the polarity of opinions, mostly represented on a positive-neutral-negative axis. The data for OM is usually collected from various social media platforms. In an era where social media has considerable control over companies’ futures, it’s worth understanding social media and taking actions accordingly. OM comes to the fore here as the scale of the discussion about companies increases, and it becomes unfeasible to gauge opinion on individual levels. Thus, the companies opt to automize this process by applying machine learning (ML) approaches to their data. For the last two decades, OM or sentiment analysis (SA) has been mainly performed by applying ML classification algorithms such as support vector machines (SVM) and Naïve Bayes to a bag of n-gram representations of textual data. With the advent of deep learning and its apparent success in NLP, traditional methods have become obsolete. Transfer learning paradigm that has been commonly used in computer vision (CV) problems started to shape NLP approaches and language models (LM) lately. This gave a sudden rise to the usage of the pretrained language model (PTM), which contains language representations that are obtained by training it on the large datasets using self-supervised learning objectives. The PTMs are further fine-tuned by a specialized downstream task dataset to produce efficient models for various NLP tasks such as OM, NER (Named-Entity Recognition), Question Answering (QA), and so forth. In this study, the traditional and modern NLP approaches have been evaluated for OM by using a sizable corpus belonging to a large private company containing about 76,000 comments in Turkish: SVM with a bag of n-grams, and two chosen pre-trained models, multilingual universal sentence encoder (MUSE) and bidirectional encoder representations from transformers (BERT). The MUSE model is a multilingual model that supports 16 languages, including Turkish, and it is based on convolutional neural networks. The BERT is a monolingual model in our case and transformers-based neural networks. It uses a masked language model and next sentence prediction tasks that allow the bidirectional training of the transformers. During the training phase of the architecture, pre-processing operations such as morphological parsing, stemming, and spelling correction was not used since the experiments showed that their contribution to the model performance was found insignificant even though Turkish is a highly agglutinative and inflective language. The results show that usage of deep learning methods with pre-trained models and fine-tuning achieve about 11% improvement over SVM for OM. The BERT model achieved around 94% prediction accuracy while the MUSE model achieved around 88% and SVM did around 83%. The MUSE multilingual model shows better results than SVM, but it still performs worse than the monolingual BERT model.

Keywords: BERT, MUSE, opinion mining, pretrained language model, SVM, Turkish

Procedia PDF Downloads 146

90 Application of MALDI-MS to Differentiate SARS-CoV-2 and Non-SARS-CoV-2 Symptomatic Infections in the Early and Late Phases of the Pandemic

Authors: Dmitriy Babenko, Sergey Yegorov, Ilya Korshukov, Aidana Sultanbekova, Valentina Barkhanskaya, Tatiana Bashirova, Yerzhan Zhunusov, Yevgeniya Li, Viktoriya Parakhina, Svetlana Kolesnichenko, Yeldar Baiken, Aruzhan Pralieva, Zhibek Zhumadilova, Matthew S. Miller, Gonzalo H. Hortelano, Anar Turmuhambetova, Antonella E. Chesca, Irina Kadyrova

Abstract:

Introduction: The rapidly evolving COVID-19 pandemic, along with the re-emergence of pathogens causing acute respiratory infections (ARI), has necessitated the development of novel diagnostic tools to differentiate various causes of ARI. MALDI-MS, due to its wide usage and affordability, has been proposed as a potential instrument for diagnosing SARS-CoV-2 versus non-SARS-CoV-2 ARI. The aim of this study was to investigate the potential of MALDI-MS in conjunction with a machine learning model to accurately distinguish between symptomatic infections caused by SARS-CoV-2 and non-SARS-CoV-2 during both the early and later phases of the pandemic. Furthermore, this study aimed to analyze mass spectrometry (MS) data obtained from nasal swabs of healthy individuals. Methods: We gathered mass spectra from 252 samples, comprising 108 SARS-CoV-2-positive samples obtained in 2020 (Covid 2020), 7 SARS-CoV- 2-positive samples obtained in 2023 (Covid 2023), 71 samples from symptomatic individuals without SARS-CoV-2 (Control non-Covid ARVI), and 66 samples from healthy individuals (Control healthy). All the samples were subjected to RT-PCR testing. For data analysis, we employed the caret R package to train and test seven machine-learning algorithms: C5.0, KNN, NB, RF, SVM-L, SVM-R, and XGBoost. We conducted a training process using a five-fold (outer) nested repeated (five times) ten-fold (inner) cross-validation with a randomized stratified splitting approach. Results: In this study, we utilized the Covid 2020 dataset as a case group and the non-Covid ARVI dataset as a control group to train and test various machine learning (ML) models. Among these models, XGBoost and SVM-R demonstrated the highest performance, with accuracy values of 0.97 [0.93, 0.97] and 0.95 [0.95; 0.97], specificity values of 0.86 [0.71; 0.93] and 0.86 [0.79; 0.87], and sensitivity values of 0.984 [0.984; 1.000] and 1.000 [0.968; 1.000], respectively. When examining the Covid 2023 dataset, the Naive Bayes model achieved the highest classification accuracy of 43%, while XGBoost and SVM-R achieved accuracies of 14%. For the healthy control dataset, the accuracy of the models ranged from 0.27 [0.24; 0.32] for k-nearest neighbors to 0.44 [0.41; 0.45] for the Support Vector Machine with a radial basis function kernel. Conclusion: Therefore, ML models trained on MALDI MS of nasopharyngeal swabs obtained from patients with Covid during the initial phase of the pandemic, as well as symptomatic non-Covid individuals, showed excellent classification performance, which aligns with the results of previous studies. However, when applied to swabs from healthy individuals and a limited sample of patients with Covid in the late phase of the pandemic, ML models exhibited lower classification accuracy.

Keywords: SARS-CoV-2, MALDI-TOF MS, ML models, nasopharyngeal swabs, classification

Procedia PDF Downloads 108

89 Lung Icams and Vcam-1 in Innate and Adaptive Immunity to Influenza Infections: Implications for Vaccination Strategies

Authors: S. Kozlovski, S.W. Feigelson, R. Alon

Abstract:

The b2 integrin ligands ICAM-1 ICAM-2 and the endothelial VLA-4 integrin ligand VCAM-1 are constitutively expressed on different lung vessels and on high endothelial venules (HEVs), the main portal for lymphocyte entry from the blood into lung draining lymph nodes. ICAMs are also ubiquitously expressed by many antigen-presenting leukocytes and have been traditionally suggested as critical for the various antigen-specific immune synapses generated by these distinct leukocytes and specific naïve and effector T cells. Loss of both ICAM-1 and ICAM-2 on the lung vasculature reduces the ability to patrol monocytes and Tregs to patrol the lung vasculature at a steady state. Our new findings suggest, however, that in terms of innate leukocyte trafficking into the lung lamina propria, both constitutively expressed and virus-induced vascular VCAM-1 can functionally compensate for the loss of these ICAMs. In a mouse model for influenza infection, neutrophil and NK cell recruitment and clearance of influenza remained normal in mice deficient in both ICAMs. Strikingly, mice deficient in both ICAMs also mounted normal influenza-specific CD8 proliferation and differentiation. In addition, these mice normally combated secondary influenza infection, indicating that the presence of ICAMs on conventional dendritic cells (cDCs) that present viral antigens are not required for immune synapse formation between these APCs and naïve CD8 T cells as previously suggested. Furthermore, long-lasting humoral responses critical for protection from a secondary homosubtypic influenza infection were also normal in mice deficient in both ICAM-1 and ICAM-2. Collectively, our results suggest that the expression of ICAM-1 and ICAM-2 on lung endothelial and epithelial cells, as well as on DCs and B cells, is not critical for the generation of innate or adaptive anti-viral immunity in the lungs. Our findings also suggest that endothelial VCAM-1 can substitute for the functions of vascular ICAMs in leukocyte trafficking into various lung compartments.

Keywords: emigration, ICAM-1, lymph nodes, VCAM-1

Procedia PDF Downloads 128

88 Seismic Perimeter Surveillance System (Virtual Fence) for Threat Detection and Characterization Using Multiple ML Based Trained Models in Weighted Ensemble Voting

Authors: Vivek Mahadev, Manoj Kumar, Neelu Mathur, Brahm Dutt Pandey

Abstract:

Perimeter guarding and protection of critical installations require prompt intrusion detection and assessment to take effective countermeasures. Currently, visual and electronic surveillance are the primary methods used for perimeter guarding. These methods can be costly and complicated, requiring careful planning according to the location and terrain. Moreover, these methods often struggle to detect stealthy and camouflaged insurgents. The object of the present work is to devise a surveillance technique using seismic sensors that overcomes the limitations of existing systems. The aim is to improve intrusion detection, assessment, and characterization by utilizing seismic sensors. Most of the similar systems have only two types of intrusion detection capability viz., human or vehicle. In our work we could even categorize further to identify types of intrusion activity such as walking, running, group walking, fence jumping, tunnel digging and vehicular movements. A virtual fence of 60 meters at GCNEP, Bahadurgarh, Haryana, India, was created by installing four underground geophones at a distance of 15 meters each. The signals received from these geophones are then processed to find unique seismic signatures called features. Various feature optimization and selection methodologies, such as LightGBM, Boruta, Random Forest, Logistics, Recursive Feature Elimination, Chi-2 and Pearson Ratio were used to identify the best features for training the machine learning models. The trained models were developed using algorithms such as supervised support vector machine (SVM) classifier, kNN, Decision Tree, Logistic Regression, Naïve Bayes, and Artificial Neural Networks. These models were then used to predict the category of events, employing weighted ensemble voting to analyze and combine their results. The models were trained with 1940 training events and results were evaluated with 831 test events. It was observed that using the weighted ensemble voting increased the efficiency of predictions. In this study we successfully developed and deployed the virtual fence using geophones. Since these sensors are passive, do not radiate any energy and are installed underground, it is impossible for intruders to locate and nullify them. Their flexibility, quick and easy installation, low costs, hidden deployment and unattended surveillance make such systems especially suitable for critical installations and remote facilities with difficult terrain. This work demonstrates the potential of utilizing seismic sensors for creating better perimeter guarding and protection systems using multiple machine learning models in weighted ensemble voting. In this study the virtual fence achieved an intruder detection efficiency of over 97%.

Keywords: geophone, seismic perimeter surveillance, machine learning, weighted ensemble method

Procedia PDF Downloads 78

87 Confidence Intervals for Quantiles in the Two-Parameter Exponential Distributions with Type II Censored Data

Authors: Ayman Baklizi

Abstract:

Based on type II censored data, we consider interval estimation of the quantiles of the two-parameter exponential distribution and the difference between the quantiles of two independent two-parameter exponential distributions. We derive asymptotic intervals, Bayesian, as well as intervals based on the generalized pivot variable. We also include some bootstrap intervals in our comparisons. The performance of these intervals is investigated in terms of their coverage probabilities and expected lengths.

Keywords: asymptotic intervals, Bayes intervals, bootstrap, generalized pivot variables, two-parameter exponential distribution, quantiles

Procedia PDF Downloads 415

86 IL-23, an Inflammatory Cytokine, Decreased by Shark Cartilage and Vitamin A Oral Treatment in Patient with Gastric Cancer

Authors: Razieh Zarei, Hassan zm, Abolghasem Ajami, Darush Moslemi, Narges Afsary, Amrollah Mostafa-zade

Abstract:

Introduction: IL-23 is responsible for the differentiation and expansion of Th17/ThIL-17 cells from naive CD4+ T cells. Therefore, may be IL-23/IL17 axis involve in a variety of allergic and autoimmune diseases, such as RA, MS, inflammatory bowel disease (IBD), and asthma. TGF-β is also share for the differentiation Th17 producing IL-17 and CD4+CD25+Foxp3hiT regulatory cells from naïve CD4+ T cells which are involved in the regulation of immune response, maintaining immunological self-tolerance and immune homeostasis ,and the control of autoimmunity and cancer surveillance. Therefore, T regulatory cells play a key role in autoimmunity, allergy, cancer, infectious disease, and the induction of transplantation tolerance. Vitamin A and it's derivatives (retinoids) inhibit or reverse the carcinogenic process in some types of cancers in oral cavity,head and neck, breast, skin, liver, and blood cells. Shark is a murine organism and its cartilage has antitumor peptides to prevent angiogenesis, in vitro. Our purpose is whether simultaneous oral treatment vitamin A and shark cartilage can modulate IL-23/IL-17 and CD4CD25Foxp3 T regulatory cell/TGF-β pathways and Th1/Th2 immunity in patients with gastric cancer. Materials and Methods: First investigated an imbalanced supernatant of cytokines exist in patients with gastric cancer by ELISA. Associated with cytokines measuring such as IL-23,IL-17,TGF-β,IL-4 and γ-IFN, then flow cytometry was employed to determine whether the peripheral blood mononuclear cells such as CD4+CD25+Foxp3highT regulatory cells in patients with gastric cancer were changed correspondingly. Results: An imbalance between IL-17 secretion and TGF-β/Foxp3 t regulatory cell pathway and so, Th1 immunity (γ-IFN production) and TH2 immunity (IL-4 secretion) was not seen in patients with gastric cancer treated by vitamin A and shark cartilage. But, the simultaneously presented down-regulation of IL-23 indicated, at least cytokine level. Conclusion: Il-23, as a pro-angiogenesis cytokine, probably, help to tumor growth. Hence, suggested that down-regulation of IL-23, at least cytokine level, is useful for anti-tumor immune responses in patients with gastric cancer.

Keywords: IL-23/IL17 axis, TGF-β/CD4CD25Foxp3 T regulatory pathway, γ-IFN, IL-4, shark cartilage and gastric cancer

Procedia PDF Downloads 395

85 Review of Different Machine Learning Algorithms

Authors: Syed Romat Ali Shah, Bilal Shoaib, Saleem Akhtar, Munib Ahmad, Shahan Sadiqui

Abstract:

Classification is a data mining technique, which is recognizedon Machine Learning (ML) algorithm. It is used to classifythe individual articlein a knownofinformation into a set of predefinemodules or group. Web mining is also a portion of that sympathetic of data mining methods. The main purpose of this paper to analysis and compare the performance of Naïve Bayse Algorithm, Decision Tree, K-Nearest Neighbor (KNN), Artificial Neural Network (ANN)and Support Vector Machine (SVM). This paper consists of different ML algorithm and their advantages and disadvantages and also define research issues.

Keywords: Data Mining, Web Mining, classification, ML Algorithms

Procedia PDF Downloads 303

84 Forecasting Regional Data Using Spatial Vars

Authors: Taisiia Gorshkova

Abstract:

Since the 1980s, spatial correlation models have been used more often to model regional indicators. An increasingly popular method for studying regional indicators is modeling taking into account spatial relationships between objects that are part of the same economic zone. In 2000s the new class of model – spatial vector autoregressions was developed. The main difference between standard and spatial vector autoregressions is that in the spatial VAR (SpVAR), the values of indicators at time t may depend on the values of explanatory variables at the same time t in neighboring regions and on the values of explanatory variables at time t-k in neighboring regions. Thus, VAR is a special case of SpVAR in the absence of spatial lags, and the spatial panel data model is a special case of spatial VAR in the absence of time lags. Two specifications of SpVAR were applied to Russian regional data for 2000-2017. The values of GRP and regional CPI are used as endogenous variables. The lags of GRP, CPI and the unemployment rate were used as explanatory variables. For comparison purposes, the standard VAR without spatial correlation was used as “naïve” model. In the first specification of SpVAR the unemployment rate and the values of depending variables, GRP and CPI, in neighboring regions at the same moment of time t were included in equations for GRP and CPI respectively. To account for the values of indicators in neighboring regions, the adjacency weight matrix is used, in which regions with a common sea or land border are assigned a value of 1, and the rest - 0. In the second specification the values of depending variables in neighboring regions at the moment of time t were replaced by these values in the previous time moment t-1. According to the results obtained, when inflation and GRP of neighbors are added into the model both inflation and GRP are significantly affected by their previous values, and inflation is also positively affected by an increase in unemployment in the previous period and negatively affected by an increase in GRP in the previous period, which corresponds to economic theory. GRP is not affected by either the inflation lag or the unemployment lag. When the model takes into account lagged values of GRP and inflation in neighboring regions, the results of inflation modeling are practically unchanged: all indicators except the unemployment lag are significant at a 5% significance level. For GRP, in turn, GRP lags in neighboring regions also become significant at a 5% significance level. For both spatial and “naïve” VARs the RMSE were calculated. The minimum RMSE are obtained via SpVAR with lagged explanatory variables. Thus, according to the results of the study, it can be concluded that SpVARs can accurately model both the actual values of macro indicators (particularly CPI and GRP) and the general situation in the regions

Keywords: forecasting, regional data, spatial econometrics, vector autoregression

Procedia PDF Downloads 141

83 A Computer-Aided System for Tooth Shade Matching

Authors: Zuhal Kurt, Meral Kurt, Bilge T. Bal, Kemal Ozkan

Abstract:

Shade matching and reproduction is the most important element of success in prosthetic dentistry. Until recently, shade matching procedure was implemented by dentists visual perception with the help of shade guides. Since many factors influence visual perception; tooth shade matching using visual devices (shade guides) is highly subjective and inconsistent. Subjective nature of this process has lead to the development of instrumental devices. Nowadays, colorimeters, spectrophotometers, spectroradiometers and digital image analysing systems are used for instrumental shade selection. Instrumental devices have advantages that readings are quantifiable, can obtain more rapidly and simply, objectively and precisely. However, these devices have noticeable drawbacks. For example, translucent structure and irregular surfaces of teeth lead to defects on measurement with these devices. Also between the results acquired by devices with different measurement principles may make inconsistencies. So, its obligatory to search for new methods for dental shade matching process. A computer-aided system device; digital camera has developed rapidly upon today. Currently, advances in image processing and computing have resulted in the extensive use of digital cameras for color imaging. This procedure has a much cheaper process than the use of traditional contact-type color measurement devices. Digital cameras can be taken by the place of contact-type instruments for shade selection and overcome their disadvantages. Images taken from teeth show morphology and color texture of teeth. In last decades, a new method was recommended to compare the color of shade tabs taken by a digital camera using color features. This method showed that visual and computer-aided shade matching systems should be used as concatenated. Recently using methods of feature extraction techniques are based on shape description and not used color information. However, color is mostly experienced as an essential property in depicting and extracting features from objects in the world around us. When local feature descriptors with color information are extended by concatenating color descriptor with the shape descriptor, that descriptor will be effective on visual object recognition and classification task. Therefore, the color descriptor is to be used in combination with a shape descriptor it does not need to contain any spatial information, which leads us to use local histograms. This local color histogram method is remain reliable under variation of photometric changes, geometrical changes and variation of image quality. So, coloring local feature extraction methods are used to extract features, and also the Scale Invariant Feature Transform (SIFT) descriptor used to for shape description in the proposed method. After the combination of these descriptors, the state-of-art descriptor named by Color-SIFT will be used in this study. Finally, the image feature vectors obtained from quantization algorithm are fed to classifiers such as Nearest Neighbor (KNN), Naive Bayes or Support Vector Machines (SVM) to determine label(s) of the visual object category or matching. In this study, SVM are used as classifiers for color determination and shade matching. Finally, experimental results of this method will be compared with other recent studies. It is concluded from the study that the proposed method is remarkable development on computer aided tooth shade determination system.

Keywords: classifiers, color determination, computer-aided system, tooth shade matching, feature extraction

Procedia PDF Downloads 444

82 A Human Activity Recognition System Based on Sensory Data Related to Object Usage

Authors: M. Abdullah, Al-Wadud

Abstract:

Sensor-based activity recognition systems usually accounts which sensors have been activated to perform an activity. The system then combines the conditional probabilities of those sensors to represent different activities and takes the decision based on that. However, the information about the sensors which are not activated may also be of great help in deciding which activity has been performed. This paper proposes an approach where the sensory data related to both usage and non-usage of objects are utilized to make the classification of activities. Experimental results also show the promising performance of the proposed method.

Keywords: Naïve Bayesian, based classification, activity recognition, sensor data, object-usage model

Procedia PDF Downloads 322

81 Machine Learning Methods for Network Intrusion Detection

Authors: Mouhammad Alkasassbeh, Mohammad Almseidin

Abstract:

Network security engineers work to keep services available all the time by handling intruder attacks. Intrusion Detection System (IDS) is one of the obtainable mechanisms that is used to sense and classify any abnormal actions. Therefore, the IDS must be always up to date with the latest intruder attacks signatures to preserve confidentiality, integrity, and availability of the services. The speed of the IDS is a very important issue as well learning the new attacks. This research work illustrates how the Knowledge Discovery and Data Mining (or Knowledge Discovery in Databases) KDD dataset is very handy for testing and evaluating different Machine Learning Techniques. It mainly focuses on the KDD preprocess part in order to prepare a decent and fair experimental data set. The J48, MLP, and Bayes Network classifiers have been chosen for this study. It has been proven that the J48 classifier has achieved the highest accuracy rate for detecting and classifying all KDD dataset attacks, which are of type DOS, R2L, U2R, and PROBE.

Keywords: IDS, DDoS, MLP, KDD

Procedia PDF Downloads 234

80 Predicting Groundwater Areas Using Data Mining Techniques: Groundwater in Jordan as Case Study

Authors: Faisal Aburub, Wael Hadi

Abstract:

Data mining is the process of extracting useful or hidden information from a large database. Extracted information can be used to discover relationships among features, where data objects are grouped according to logical relationships; or to predict unseen objects to one of the predefined groups. In this paper, we aim to investigate four well-known data mining algorithms in order to predict groundwater areas in Jordan. These algorithms are Support Vector Machines (SVMs), Naïve Bayes (NB), K-Nearest Neighbor (kNN) and Classification Based on Association Rule (CBA). The experimental results indicate that the SVMs algorithm outperformed other algorithms in terms of classification accuracy, precision and F1 evaluation measures using the datasets of groundwater areas that were collected from Jordanian Ministry of Water and Irrigation.

Keywords: classification, data mining, evaluation measures, groundwater

Procedia PDF Downloads 280

79 EEG-Based Screening Tool for School Student’s Brain Disorders Using Machine Learning Algorithms

Authors: Abdelrahman A. Ramzy, Bassel S. Abdallah, Mohamed E. Bahgat, Sarah M. Abdelkader, Sherif H. ElGohary

Abstract:

Attention-Deficit/Hyperactivity Disorder (ADHD), epilepsy, and autism affect millions of children worldwide, many of which are undiagnosed despite the fact that all of these disorders are detectable in early childhood. Late diagnosis can cause severe problems due to the late treatment and to the misconceptions and lack of awareness as a whole towards these disorders. Moreover, electroencephalography (EEG) has played a vital role in the assessment of neural function in children. Therefore, quantitative EEG measurement will be utilized as a tool for use in the evaluation of patients who may have ADHD, epilepsy, and autism. We propose a screening tool that uses EEG signals and machine learning algorithms to detect these disorders at an early age in an automated manner. The proposed classifiers used with epilepsy as a step taken for the work done so far, provided an accuracy of approximately 97% using SVM, Naïve Bayes and Decision tree, while 98% using KNN, which gives hope for the work yet to be conducted.

Keywords: ADHD, autism, epilepsy, EEG, SVM

Procedia PDF Downloads 190

78 Modeling of System Availability and Bayesian Analysis of Bivariate Distribution

Authors: Muhammad Farooq, Ahtasham Gul

Abstract:

To meet the desired standard, it is important to monitor and analyze different engineering processes to get desired output. The bivariate distributions got a lot of attention in recent years to describe the randomness of natural as well as artificial mechanisms. In this article, a bivariate model is constructed using two independent models developed by the nesting approach to study the effect of each component on reliability for better understanding. Further, the Bayes analysis of system availability is studied by considering prior parametric variations in the failure time and repair time distributions. Basic statistical characteristics of marginal distribution, like mean median and quantile function, are discussed. We use inverse Gamma prior to study its frequentist properties by conducting Monte Carlo Markov Chain (MCMC) sampling scheme.

Keywords: reliability, system availability Weibull, inverse Lomax, Monte Carlo Markov Chain, Bayesian

Procedia PDF Downloads 71

77 Point Estimation for the Type II Generalized Logistic Distribution Based on Progressively Censored Data

Authors: Rana Rimawi, Ayman Baklizi

Abstract:

Skewed distributions are important models that are frequently used in applications. Generalized distributions form a class of skewed distributions and gain widespread use in applications because of their flexibility in data analysis. More specifically, the Generalized Logistic Distribution with its different types has received considerable attention recently. In this study, based on progressively type-II censored data, we will consider point estimation in type II Generalized Logistic Distribution (Type II GLD). We will develop several estimators for its unknown parameters, including maximum likelihood estimators (MLE), Bayes estimators and linear estimators (BLUE). The estimators will be compared using simulation based on the criteria of bias and Mean square error (MSE). An illustrative example of a real data set will be given.

Keywords: point estimation, type II generalized logistic distribution, progressive censoring, maximum likelihood estimation

Procedia PDF Downloads 198