Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 292

Search results for: Bayes estimator

232 Parameter Estimation for Contact Tracing in Graph-Based Models

Authors: Augustine Okolie, Johannes Müller, Mirjam Kretzchmar

Abstract:

We adopt a maximum-likelihood framework to estimate parameters of a stochastic susceptible-infected-recovered (SIR) model with contact tracing on a rooted random tree. Given the number of detectees per index case, our estimator allows to determine the degree distribution of the random tree as well as the tracing probability. Since we do not discover all infectees via contact tracing, this estimation is non-trivial. To keep things simple and stable, we develop an approximation suited for realistic situations (contract tracing probability small, or the probability for the detection of index cases small). In this approximation, the only epidemiological parameter entering the estimator is the basic reproduction number R0. The estimator is tested in a simulation study and applied to covid-19 contact tracing data from India. The simulation study underlines the efficiency of the method. For the empirical covid-19 data, we are able to compare different degree distributions and perform a sensitivity analysis. We find that particularly a power-law and a negative binomial degree distribution meet the data well and that the tracing probability is rather large. The sensitivity analysis shows no strong dependency on the reproduction number.

Keywords: stochastic SIR model on graph, contact tracing, branching process, parameter inference

Procedia PDF Downloads 46

231 A Supervised Approach for Detection of Singleton Spam Reviews

Authors: Atefeh Heydari, Mohammadali Tavakoli, Naomie Salim

Abstract:

In recent years, we have witnessed that online reviews are the most important source of customers’ opinion. They are progressively more used by individuals and organisations to make purchase and business decisions. Unfortunately, for the reason of profit or fame, frauds produce deceptive reviews to hoodwink potential customers. Their activities mislead not only potential customers to make appropriate purchasing decisions and organisations to reshape their business, but also opinion mining techniques by preventing them from reaching accurate results. Spam reviews could be divided into two main groups, i.e. multiple and singleton spam reviews. Detecting a singleton spam review that is the only review written by a user ID is extremely challenging due to lack of clue for detection purposes. Singleton spam reviews are very harmful and various features and proofs used in multiple spam reviews detection are not applicable in this case. Current research aims to propose a novel supervised technique to detect singleton spam reviews. To achieve this, various features are proposed in this study and are to be combined with the most appropriate features extracted from literature and employed in a classifier. In order to compare the performance of different classifiers, SVM and naive Bayes classification algorithms were used for model building. The results revealed that SVM was more accurate than naive Bayes and our proposed technique is capable to detect singleton spam reviews effectively.

Keywords: classification algorithms, Naïve Bayes, opinion review spam detection, singleton review spam detection, support vector machine

Procedia PDF Downloads 278

230 Estimation of Population Mean under Random Non-Response in Two-Phase Successive Sampling

Authors: M. Khalid, G. N. Singh

Abstract:

In this paper, we have considered the problem of estimation for population mean, on current (second) occasion in the presence of random non response in two-occasion successive sampling under two phase set-up. Modified exponential type estimators have been proposed, and their properties are studied under the assumptions that numbers of sampling units follow a distribution due to random non response situations. The performances of the proposed estimators are compared with linear combinations of two estimators, (a) sample mean estimator for fresh sample and (b) ratio estimator for matched sample under the complete response situations. Results are demonstrated through empirical studies which present the effectiveness of the proposed estimators. Suitable recommendations have been made to the survey practitioners.

Keywords: successive sampling, random non-response, auxiliary variable, bias, mean square error

Procedia PDF Downloads 488

229 Local Interpretable Model-agnostic Explanations (LIME) Approach to Email Spam Detection

Authors: Rohini Hariharan, Yazhini R., Blessy Maria Mathew

Abstract:

The task of detecting email spam is a very important one in the era of digital technology that needs effective ways of curbing unwanted messages. This paper presents an approach aimed at making email spam categorization algorithms transparent, reliable and more trustworthy by incorporating Local Interpretable Model-agnostic Explanations (LIME). Our technique assists in providing interpretable explanations for specific classifications of emails to help users understand the decision-making process by the model. In this study, we developed a complete pipeline that incorporates LIME into the spam classification framework and allows creating simplified, interpretable models tailored to individual emails. LIME identifies influential terms, pointing out key elements that drive classification results, thus reducing opacity inherent in conventional machine learning models. Additionally, we suggest a visualization scheme for displaying keywords that will improve understanding of categorization decisions by users. We test our method on a diverse email dataset and compare its performance with various baseline models, such as Gaussian Naive Bayes, Multinomial Naive Bayes, Bernoulli Naive Bayes, Support Vector Classifier, K-Nearest Neighbors, Decision Tree, and Logistic Regression. Our testing results show that our model surpasses all other models, achieving an accuracy of 96.59% and a precision of 99.12%.

Keywords: text classification, LIME (local interpretable model-agnostic explanations), stemming, tokenization, logistic regression.

Procedia PDF Downloads 16

228 Early Stage Suicide Ideation Detection Using Supervised Machine Learning and Neural Network Classifier

Authors: Devendra Kr Tayal, Vrinda Gupta, Aastha Bansal, Khushi Singh, Sristi Sharma, Hunny Gaur

Abstract:

In today's world, suicide is a serious problem. In order to save lives, early suicide attempt detection and prevention should be addressed. A good number of at-risk people utilize social media platforms to talk about their issues or find knowledge on related chores. Twitter and Reddit are two of the most common platforms that are used for expressing oneself. Extensive research has already been done in this field. Through supervised classification techniques like Nave Bayes, Bernoulli Nave Bayes, and Multiple Layer Perceptron on a Reddit dataset, we demonstrate the early recognition of suicidal ideation. We also performed comparative analysis on these approaches and used accuracy, recall score, F1 score, and precision score for analysis.

Keywords: machine learning, suicide ideation detection, supervised classification, natural language processing

Procedia PDF Downloads 62

227 Normalizing Logarithms of Realized Volatility in an ARFIMA Model

Authors: G. L. C. Yap

Abstract:

Modelling realized volatility with high-frequency returns is popular as it is an unbiased and efficient estimator of return volatility. A computationally simple model is fitting the logarithms of the realized volatilities with a fractionally integrated long-memory Gaussian process. The Gaussianity assumption simplifies the parameter estimation using the Whittle approximation. Nonetheless, this assumption may not be met in the finite samples and there may be a need to normalize the financial series. Based on the empirical indices S&P500 and DAX, this paper examines the performance of the linear volatility model pre-treated with normalization compared to its existing counterpart. The empirical results show that by including normalization as a pre-treatment procedure, the forecast performance outperforms the existing model in terms of statistical and economic evaluations.

Keywords: Gaussian process, long-memory, normalization, value-at-risk, volatility, Whittle estimator

Procedia PDF Downloads 329

226 Investigating the Efficiency of Stratified Double Median Ranked Set Sample for Estimating the Population Mean

Authors: Mahmoud I. Syam

Abstract:

Stratified double median ranked set sampling (SDMRSS) method is suggested for estimating the population mean. The SDMRSS is compared with the simple random sampling (SRS), stratified simple random sampling (SSRS), and stratified ranked set sampling (SRSS). It is shown that SDMRSS estimator is an unbiased of the population mean and more efficient than SRS, SSRS, and SRSS. Also, by SDMRSS, we can increase the efficiency of mean estimator for specific value of the sample size. SDMRSS is applied on real life examples, and the results of the example agreed the theoretical results.

Keywords: efficiency, double ranked set sampling, median ranked set sampling, ranked set sampling, stratified

Procedia PDF Downloads 217

225 Estimation of a Finite Population Mean under Random Non Response Using Improved Nadaraya and Watson Kernel Weights

Authors: Nelson Bii, Christopher Ouma, John Odhiambo

Abstract:

Non-response is a potential source of errors in sample surveys. It introduces bias and large variance in the estimation of finite population parameters. Regression models have been recognized as one of the techniques of reducing bias and variance due to random non-response using auxiliary data. In this study, it is assumed that random non-response occurs in the survey variable in the second stage of cluster sampling, assuming full auxiliary information is available throughout. Auxiliary information is used at the estimation stage via a regression model to address the problem of random non-response. In particular, the auxiliary information is used via an improved Nadaraya-Watson kernel regression technique to compensate for random non-response. The asymptotic bias and mean squared error of the estimator proposed are derived. Besides, a simulation study conducted indicates that the proposed estimator has smaller values of the bias and smaller mean squared error values compared to existing estimators of finite population mean. The proposed estimator is also shown to have tighter confidence interval lengths at a 95% coverage rate. The results obtained in this study are useful, for instance, in choosing efficient estimators of the finite population mean in demographic sample surveys.

Keywords: mean squared error, random non-response, two-stage cluster sampling, confidence interval lengths

Procedia PDF Downloads 107

224 A Study of Permission-Based Malware Detection Using Machine Learning

Authors: Ratun Rahman, Rafid Islam, Akin Ahmed, Kamrul Hasan, Hasan Mahmud

Abstract:

Malware is becoming more prevalent, and several threat categories have risen dramatically in recent years. This paper provides a bird's-eye view of the world of malware analysis. The efficiency of five different machine learning methods (Naive Bayes, K-Nearest Neighbor, Decision Tree, Random Forest, and TensorFlow Decision Forest) combined with features picked from the retrieval of Android permissions to categorize applications as harmful or benign is investigated in this study. The test set consists of 1,168 samples (among these android applications, 602 are malware and 566 are benign applications), each consisting of 948 features (permissions). Using the permission-based dataset, the machine learning algorithms then produce accuracy rates above 80%, except the Naive Bayes Algorithm with 65% accuracy. Of the considered algorithms TensorFlow Decision Forest performed the best with an accuracy of 90%.

Keywords: android malware detection, machine learning, malware, malware analysis

Procedia PDF Downloads 122

223 Aliasing Free and Additive Error in Spectra for Alpha Stable Signals

Authors: R. Sabre

Abstract:

This work focuses on the symmetric alpha stable process with continuous time frequently used in modeling the signal with indefinitely growing variance, often observed with an unknown additive error. The objective of this paper is to estimate this error from discrete observations of the signal. For that, we propose a method based on the smoothing of the observations via Jackson polynomial kernel and taking into account the width of the interval where the spectral density is non-zero. This technique allows avoiding the “Aliasing phenomenon” encountered when the estimation is made from the discrete observations of a process with continuous time. We have studied the convergence rate of the estimator and have shown that the convergence rate improves in the case where the spectral density is zero at the origin. Thus, we set up an estimator of the additive error that can be subtracted for approaching the original signal without error.

Keywords: spectral density, stable processes, aliasing, non parametric

Procedia PDF Downloads 104

222 Population Size Estimation Based on the GPD

Authors: O. Anan, D. Böhning, A. Maruotti

Abstract:

The purpose of the study is to estimate the elusive target population size under a truncated count model that accounts for heterogeneity. The purposed estimator is based on the generalized Poisson distribution (GPD), which extends the Poisson distribution by adding a dispersion parameter. Thus, it becomes an useful model for capture-recapture data where concurrent events are not homogeneous. In addition, it can account for over-dispersion and under-dispersion. The ratios of neighboring frequency counts are used as a tool for investigating the validity of whether generalized Poisson or Poisson distribution. Since capture-recapture approaches do not provide the zero counts, the estimated parameters can be achieved by modifying the EM-algorithm technique for the zero-truncated generalized Poisson distribution. The properties and the comparative performance of proposed estimator were investigated through simulation studies. Furthermore, some empirical examples are represented insights on the behavior of the estimators.

Keywords: capture, recapture methods, ratio plot, heterogeneous population, zero-truncated count

Procedia PDF Downloads 415

221 A Kolmogorov-Smirnov Type Goodness-Of-Fit Test of Multinomial Logistic Regression Model in Case-Control Studies

Authors: Chen Li-Ching

Abstract:

The multinomial logistic regression model is used popularly for inferring the relationship of risk factors and disease with multiple categories. This study based on the discrepancy between the nonparametric maximum likelihood estimator and semiparametric maximum likelihood estimator of the cumulative distribution function to propose a Kolmogorov-Smirnov type test statistic to assess adequacy of the multinomial logistic regression model for case-control data. A bootstrap procedure is presented to calculate the critical value of the proposed test statistic. Empirical type I error rates and powers of the test are performed by simulation studies. Some examples will be illustrated the implementation of the test.

Keywords: case-control studies, goodness-of-fit test, Kolmogorov-Smirnov test, multinomial logistic regression

Procedia PDF Downloads 427

220 Using Arellano-Bover/Blundell-Bond Estimator in Dynamic Panel Data Analysis – Case of Finnish Housing Price Dynamics

Authors: Janne Engblom, Elias Oikarinen

Abstract:

A panel dataset is one that follows a given sample of individuals over time, and thus provides multiple observations on each individual in the sample. Panel data models include a variety of fixed and random effects models which form a wide range of linear models. A special case of panel data models are dynamic in nature. A complication regarding a dynamic panel data model that includes the lagged dependent variable is endogeneity bias of estimates. Several approaches have been developed to account for this problem. In this paper, the panel models were estimated using the Arellano-Bover/Blundell-Bond Generalized method of moments (GMM) estimator which is an extension of the Arellano-Bond model where past values and different transformations of past values of the potentially problematic independent variable are used as instruments together with other instrumental variables. The Arellano–Bover/Blundell–Bond estimator augments Arellano–Bond by making an additional assumption that first differences of instrument variables are uncorrelated with the fixed effects. This allows the introduction of more instruments and can dramatically improve efficiency. It builds a system of two equations—the original equation and the transformed one—and is also known as system GMM. In this study, Finnish housing price dynamics were examined empirically by using the Arellano–Bover/Blundell–Bond estimation technique together with ordinary OLS. The aim of the analysis was to provide a comparison between conventional fixed-effects panel data models and dynamic panel data models. The Arellano–Bover/Blundell–Bond estimator is suitable for this analysis for a number of reasons: It is a general estimator designed for situations with 1) a linear functional relationship; 2) one left-hand-side variable that is dynamic, depending on its own past realizations; 3) independent variables that are not strictly exogenous, meaning they are correlated with past and possibly current realizations of the error; 4) fixed individual effects; and 5) heteroskedasticity and autocorrelation within individuals but not across them. Based on data of 14 Finnish cities over 1988-2012 differences of short-run housing price dynamics estimates were considerable when different models and instrumenting were used. Especially, the use of different instrumental variables caused variation of model estimates together with their statistical significance. This was particularly clear when comparing estimates of OLS with different dynamic panel data models. Estimates provided by dynamic panel data models were more in line with theory of housing price dynamics.

Keywords: dynamic model, fixed effects, panel data, price dynamics

Procedia PDF Downloads 1431

219 The Usage of Bridge Estimator for Hegy Seasonal Unit Root Tests

Authors: Huseyin Guler, Cigdem Kosar

Abstract:

The aim of this study is to propose Bridge estimator for seasonal unit root tests. Seasonality is an important factor for many economic time series. Some variables may contain seasonal patterns and forecasts that ignore important seasonal patterns have a high variance. Therefore, it is very important to eliminate seasonality for seasonal macroeconomic data. There are some methods to eliminate the impacts of seasonality in time series. One of them is filtering the data. However, this method leads to undesired consequences in unit root tests, especially if the data is generated by a stochastic seasonal process. Another method to eliminate seasonality is using seasonal dummy variables. Some seasonal patterns may result from stationary seasonal processes, which are modelled using seasonal dummies but if there is a varying and changing seasonal pattern over time, so the seasonal process is non-stationary, deterministic seasonal dummies are inadequate to capture the seasonal process. It is not suitable to use seasonal dummies for modeling such seasonally nonstationary series. Instead of that, it is necessary to take seasonal difference if there are seasonal unit roots in the series. Different alternative methods are proposed in the literature to test seasonal unit roots, such as Dickey, Hazsa, Fuller (DHF) and Hylleberg, Engle, Granger, Yoo (HEGY) tests. HEGY test can be also used to test the seasonal unit root in different frequencies (monthly, quarterly, and semiannual). Another issue in unit root tests is the lag selection. Lagged dependent variables are added to the model in seasonal unit root tests as in the unit root tests to overcome the autocorrelation problem. In this case, it is necessary to choose the lag length and determine any deterministic components (i.e., a constant and trend) first, and then use the proper model to test for seasonal unit roots. However, this two-step procedure might lead size distortions and lack of power in seasonal unit root tests. Recent studies show that Bridge estimators are good in selecting optimal lag length while differentiating nonstationary versus stationary models for nonseasonal data. The advantage of this estimator is the elimination of the two-step nature of conventional unit root tests and this leads a gain in size and power. In this paper, the Bridge estimator is proposed to test seasonal unit roots in a HEGY model. A Monte-Carlo experiment is done to determine the efficiency of this approach and compare the size and power of this method with HEGY test. Since Bridge estimator performs well in model selection, our approach may lead to some gain in terms of size and power over HEGY test.

Keywords: bridge estimators, HEGY test, model selection, seasonal unit root

Procedia PDF Downloads 298

218 Evaluation of Sensor Pattern Noise Estimators for Source Camera Identification

Authors: Benjamin Anderson-Sackaney, Amr Abdel-Dayem

Abstract:

This paper presents a comprehensive survey of recent source camera identification (SCI) systems. Then, the performance of various sensor pattern noise (SPN) estimators was experimentally assessed, under common photo response non-uniformity (PRNU) frameworks. The experiments used 1350 natural and 900 flat-field images, captured by 18 individual cameras. 12 different experiments, grouped into three sets, were conducted. The results were analyzed using the receiver operator characteristic (ROC) curves. The experimental results demonstrated that combining the basic SPN estimator with a wavelet-based filtering scheme provides promising results. However, the phase SPN estimator fits better with both patch-based (BM3D) and anisotropic diffusion (AD) filtering schemes.

Keywords: sensor pattern noise, source camera identification, photo response non-uniformity, anisotropic diffusion, peak to correlation energy ratio

Procedia PDF Downloads 414

217 Comparing Xbar Charts: Conventional versus Reweighted Robust Estimation Methods for Univariate Data Sets

Authors: Ece Cigdem Mutlu, Burak Alakent

Abstract:

Maintaining the quality of manufactured products at a desired level depends on the stability of process dispersion and location parameters and detection of perturbations in these parameters as promptly as possible. Shewhart control chart is the most widely used technique in statistical process monitoring to monitor the quality of products and control process mean and variability. In the application of Xbar control charts, sample standard deviation and sample mean are known to be the most efficient conventional estimators in determining process dispersion and location parameters, respectively, based on the assumption of independent and normally distributed datasets. On the other hand, there is no guarantee that the real-world data would be normally distributed. In the cases of estimated process parameters from Phase I data clouded with outliers, efficiency of traditional estimators is significantly reduced, and performance of Xbar charts are undesirably low, e.g. occasional outliers in the rational subgroups in Phase I data set may considerably affect the sample mean and standard deviation, resulting a serious delay in detection of inferior products in Phase II. For more efficient application of control charts, it is required to use robust estimators against contaminations, which may exist in Phase I. In the current study, we present a simple approach to construct robust Xbar control charts using average distance to the median, Qn-estimator of scale, M-estimator of scale with logistic psi-function in the estimation of process dispersion parameter, and Harrell-Davis qth quantile estimator, Hodge-Lehmann estimator and M-estimator of location with Huber psi-function and logistic psi-function in the estimation of process location parameter. Phase I efficiency of proposed estimators and Phase II performance of Xbar charts constructed from these estimators are compared with the conventional mean and standard deviation statistics both under normality and against diffuse-localized and symmetric-asymmetric contaminations using 50,000 Monte Carlo simulations on MATLAB. Consequently, it is found that robust estimators yield parameter estimates with higher efficiency against all types of contaminations, and Xbar charts constructed using robust estimators have higher power in detecting disturbances, compared to conventional methods. Additionally, utilizing individuals charts to screen outlier subgroups and employing different combination of dispersion and location estimators on subgroups and individual observations are found to improve the performance of Xbar charts.

Keywords: average run length, M-estimators, quality control, robust estimators

Procedia PDF Downloads 166

216 Safety Effect of Smart Right-Turn Design at Intersections

Authors: Upal Barua

Abstract:

The risk of severe crashes at high-speed right-turns at intersections is a major safety concern these days. The application of a smart right-turn at an intersection is increasing day by day to address is an issue. The design, ‘Smart Right-turn’ consists of a narrow-angle of channelization at approximately 70°. This design increases the cone of vision of the right-tuning drivers towards the crossing pedestrians as well as traffic on the cross-road. As part of the Safety Improvement Program in Austin Transportation Department, several smart right-turns were constructed at high crash intersections where high-speed right-turns were found to be a contributing factor. This paper features the state of the art techniques applied in planning, engineering, designing and construction of this smart right-turn, key factors driving the success, and lessons learned in the process. This paper also presents the significant crash reductions achieved from the application of this smart right-turn design using Empirical Bayes method. The result showed that smart right-turns can reduce overall right-turn crashes by 43% and severe right-turn crashes by 70%.

Keywords: smart right-turn, intersection, cone of vision, empirical Bayes method

Procedia PDF Downloads 234

215 Optimal Linear Quadratic Digital Tracker for the Discrete-Time Proper System with an Unknown Disturbance

Authors: Jason Sheng-Hong Tsai, Faezeh Ebrahimzadeh, Min-Ching Chung, Shu-Mei Guo, Leang-San Shieh, Tzong-Jiy Tsai, Li Wang

Abstract:

In this paper, we first construct a new state and disturbance estimator using discrete-time proportional plus integral observer to estimate the system state and the unknown external disturbance for the discrete-time system with an input-to-output direct-feedthrough term. Then, the generalized optimal linear quadratic digital tracker design is applied to construct a proportional plus integral observer-based tracker for the system with an unknown external disturbance to have a desired tracking performance. Finally, a numerical simulation is given to demonstrate the effectiveness of the new application of our proposed approach.

Keywords: non-minimum phase system, optimal linear quadratic tracker, proportional plus integral observer, state and disturbance estimator

Procedia PDF Downloads 479

214 Sentiment Analysis of Ensemble-Based Classifiers for E-Mail Data

Authors: Muthukumarasamy Govindarajan

Abstract:

Detection of unwanted, unsolicited mails called spam from email is an interesting area of research. It is necessary to evaluate the performance of any new spam classifier using standard data sets. Recently, ensemble-based classifiers have gained popularity in this domain. In this research work, an efficient email filtering approach based on ensemble methods is addressed for developing an accurate and sensitive spam classifier. The proposed approach employs Naive Bayes (NB), Support Vector Machine (SVM) and Genetic Algorithm (GA) as base classifiers along with different ensemble methods. The experimental results show that the ensemble classifier was performing with accuracy greater than individual classifiers, and also hybrid model results are found to be better than the combined models for the e-mail dataset. The proposed ensemble-based classifiers turn out to be good in terms of classification accuracy, which is considered to be an important criterion for building a robust spam classifier.

Keywords: accuracy, arcing, bagging, genetic algorithm, Naive Bayes, sentiment mining, support vector machine

Procedia PDF Downloads 112

213 X̄ and S Control Charts based on Weighted Standard Deviation Method

Authors: Derya Karagöz

Abstract:

A Shewhart chart based on normality assumption is not appropriate for skewed distributions since its Type-I error rate is inflated. This study presents X̄ and S control charts for monitoring the process variability for skewed distributions. We propose Weighted Standard Deviation (WSD) X̄ and S control charts. Standard deviation estimator is applied to monitor the process variability for estimating the process standard deviation, in the case of the W SD X̄ and S control charts as this estimator is simple and easy to compute. Unlike the Shewhart control chart, the proposed charts provide asymmetric limits in accordance with the direction and degree of skewness to construct the upper and lower limits. The performances of the proposed charts are compared with other heuristic charts for skewed distributions by using Simulation study. The Simulation studies show that the proposed control charts have good properties for skewed distributions and large sample sizes.

Keywords: weighted standard deviation, MAD, skewed distributions, S control charts

Procedia PDF Downloads 368

212 A Probabilistic Theory of the Buy-Low and Sell-High for Algorithmic Trading

Authors: Peter Shi

Abstract:

Algorithmic trading is a rapidly expanding domain within quantitative finance, constituting a substantial portion of trading volumes in the US financial market. The demand for rigorous and robust mathematical theories underpinning these trading algorithms is ever-growing. In this study, the author establishes a new stock market model that integrates the Efficient Market Hypothesis and the statistical arbitrage. The model, for the first time, finds probabilistic relations between the rational price and the market price in terms of the conditional expectation. The theory consequently leads to a mathematical justification of the old market adage: buy-low and sell-high. The thresholds for “low” and “high” are precisely derived using a max-min operation on Bayes’s error. This explicit connection harmonizes the Efficient Market Hypothesis and Statistical Arbitrage, demonstrating their compatibility in explaining market dynamics. The amalgamation represents a pioneering contribution to quantitative finance. The study culminates in comprehensive numerical tests using historical market data, affirming that the “buy-low” and “sell-high” algorithm derived from this theory significantly outperforms the general market over the long term in four out of six distinct market environments.

Keywords: efficient market hypothesis, behavioral finance, Bayes' decision, algorithmic trading, risk control, stock market

Procedia PDF Downloads 44

211 Applied Complement of Probability and Information Entropy for Prediction in Student Learning

Authors: Kennedy Efosa Ehimwenma, Sujatha Krishnamoorthy, Safiya Al‑Sharji

Abstract:

The probability computation of events is in the interval of [0, 1], which are values that are determined by the number of outcomes of events in a sample space S. The probability Pr(A) that an event A will never occur is 0. The probability Pr(B) that event B will certainly occur is 1. This makes both events A and B a certainty. Furthermore, the sum of probabilities Pr(E₁) + Pr(E₂) + … + Pr(Eₙ) of a finite set of events in a given sample space S equals 1. Conversely, the difference of the sum of two probabilities that will certainly occur is 0. This paper first discusses Bayes, the complement of probability, and the difference of probability for occurrences of learning-events before applying them in the prediction of learning objects in student learning. Given the sum of 1; to make a recommendation for student learning, this paper proposes that the difference of argMaxPr(S) and the probability of student-performance quantifies the weight of learning objects for students. Using a dataset of skill-set, the computational procedure demonstrates i) the probability of skill-set events that have occurred that would lead to higher-level learning; ii) the probability of the events that have not occurred that requires subject-matter relearning; iii) accuracy of the decision tree in the prediction of student performance into class labels and iv) information entropy about skill-set data and its implication on student cognitive performance and recommendation of learning.

Keywords: complement of probability, Bayes’ rule, prediction, pre-assessments, computational education, information theory

Procedia PDF Downloads 127

210 New Estimation in Autoregressive Models with Exponential White Noise by Using Reversible Jump MCMC Algorithm

Authors: Suparman Suparman

Abstract:

A white noise in autoregressive (AR) model is often assumed to be normally distributed. In application, the white noise usually do not follows a normal distribution. This paper aims to estimate a parameter of AR model that has a exponential white noise. A Bayesian method is adopted. A prior distribution of the parameter of AR model is selected and then this prior distribution is combined with a likelihood function of data to get a posterior distribution. Based on this posterior distribution, a Bayesian estimator for the parameter of AR model is estimated. Because the order of AR model is considered a parameter, this Bayesian estimator cannot be explicitly calculated. To resolve this problem, a method of reversible jump Markov Chain Monte Carlo (MCMC) is adopted. A result is a estimation of the parameter AR model can be simultaneously calculated.

Keywords: autoregressive (AR) model, exponential white Noise, bayesian, reversible jump Markov Chain Monte Carlo (MCMC)

Procedia PDF Downloads 327

209 A Nonlinear Dynamical System with Application

Authors: Abdullah Eqal Al Mazrooei

Abstract:

In this paper, a nonlinear dynamical system is presented. This system is a bilinear class. The bilinear systems are very important kind of nonlinear systems because they have many applications in real life. They are used in biology, chemistry, manufacturing, engineering, and economics where linear models are ineffective or inadequate. They have also been recently used to analyze and forecast weather conditions. Bilinear systems have three advantages: First, they define many problems which have a great applied importance. Second, they give us approximations to nonlinear systems. Thirdly, they have a rich geometric and algebraic structures, which promises to be a fruitful field of research for scientists and applications. The type of nonlinearity that is treated and analyzed consists of bilinear interaction between the states vectors and the system input. By using some properties of the tensor product, these systems can be transformed to linear systems. But, here we discuss the nonlinearity when the state vector is multiplied by itself. So, this model will be able to handle evolutions according to the Lotka-Volterra models or the Lorenz weather models, thus enabling a wider and more flexible application of such models. Here we apply by using an estimator to estimate temperatures. The results prove the efficiency of the proposed system.

Keywords: Lorenz models, nonlinear systems, nonlinear estimator, state-space model

Procedia PDF Downloads 233

208 A Novel Software Model for Enhancement of System Performance and Security through an Optimal Placement of PMU and FACTS

Authors: R. Kiran, B. R. Lakshmikantha, R. V. Parimala

Abstract:

Secure operation of power systems requires monitoring of the system operating conditions. Phasor measurement units (PMU) are the device, which uses synchronized signals from the GPS satellites, and provide the phasors information of voltage and currents at a given substation. The optimal locations for the PMUs must be determined, in order to avoid redundant use of PMUs. The objective of this paper is to make system observable by using minimum number of PMUs & the implementation of stability software at 22OkV grid for on-line estimation of the power system transfer capability based on voltage and thermal limitations and for security monitoring. This software utilizes State Estimator (SE) and synchrophasor PMU data sets for determining the power system operational margin under normal and contingency conditions. This software improves security of transmission system by continuously monitoring operational margin expressed in MW or in bus voltage angles, and alarms the operator if the margin violates a pre-defined threshold.

Keywords: state estimator (SE), flexible ac transmission systems (FACTS), optimal location, phasor measurement units (PMU)

Procedia PDF Downloads 384

207 Weighted Rank Regression with Adaptive Penalty Function

Authors: Kang-Mo Jung

Abstract:

The use of regularization for statistical methods has become popular. The least absolute shrinkage and selection operator (LASSO) framework has become the standard tool for sparse regression. However, it is well known that the LASSO is sensitive to outliers or leverage points. We consider a new robust estimation which is composed of the weighted loss function of the pairwise difference of residuals and the adaptive penalty function regulating the tuning parameter for each variable. Rank regression is resistant to regression outliers, but not to leverage points. By adopting a weighted loss function, the proposed method is robust to leverage points of the predictor variable. Furthermore, the adaptive penalty function gives us good statistical properties in variable selection such as oracle property and consistency. We develop an efficient algorithm to compute the proposed estimator using basic functions in program R. We used an optimal tuning parameter based on the Bayesian information criterion (BIC). Numerical simulation shows that the proposed estimator is effective for analyzing real data set and contaminated data.

Keywords: adaptive penalty function, robust penalized regression, variable selection, weighted rank regression

Procedia PDF Downloads 430

206 Predictive Analysis of the Stock Price Market Trends with Deep Learning

Authors: Suraj Mehrotra

Abstract:

The stock market is a volatile, bustling marketplace that is a cornerstone of economics. It defines whether companies are successful or in spiral. A thorough understanding of it is important - many companies have whole divisions dedicated to analysis of both their stock and of rivaling companies. Linking the world of finance and artificial intelligence (AI), especially the stock market, has been a relatively recent development. Predicting how stocks will do considering all external factors and previous data has always been a human task. With the help of AI, however, machine learning models can help us make more complete predictions in financial trends. Taking a look at the stock market specifically, predicting the open, closing, high, and low prices for the next day is very hard to do. Machine learning makes this task a lot easier. A model that builds upon itself that takes in external factors as weights can predict trends far into the future. When used effectively, new doors can be opened up in the business and finance world, and companies can make better and more complete decisions. This paper explores the various techniques used in the prediction of stock prices, from traditional statistical methods to deep learning and neural networks based approaches, among other methods. It provides a detailed analysis of the techniques and also explores the challenges in predictive analysis. For the accuracy of the testing set, taking a look at four different models - linear regression, neural network, decision tree, and naïve Bayes - on the different stocks, Apple, Google, Tesla, Amazon, United Healthcare, Exxon Mobil, J.P. Morgan & Chase, and Johnson & Johnson, the naïve Bayes model and linear regression models worked best. For the testing set, the naïve Bayes model had the highest accuracy along with the linear regression model, followed by the neural network model and then the decision tree model. The training set had similar results except for the fact that the decision tree model was perfect with complete accuracy in its predictions, which makes sense. This means that the decision tree model likely overfitted the training set when used for the testing set.

Keywords: machine learning, testing set, artificial intelligence, stock analysis

Procedia PDF Downloads 57

205 Median-Based Nonparametric Estimation of Returns in Mean-Downside Risk Portfolio Frontier

Authors: H. Ben Salah, A. Gannoun, C. de Peretti, A. Trabelsi

Abstract:

The Downside Risk (DSR) model for portfolio optimisation allows to overcome the drawbacks of the classical mean-variance model concerning the asymetry of returns and the risk perception of investors. This model optimization deals with a positive definite matrix that is endogenous with respect to portfolio weights. This aspect makes the problem far more difficult to handle. For this purpose, Athayde (2001) developped a new recurcive minimization procedure that ensures the convergence to the solution. However, when a finite number of observations is available, the portfolio frontier presents an appearance which is not very smooth. In order to overcome that, Athayde (2003) proposed a mean kernel estimation of the returns, so as to create a smoother portfolio frontier. This technique provides an effect similar to the case in which we had continuous observations. In this paper, taking advantage on the the robustness of the median, we replace the mean estimator in Athayde's model by a nonparametric median estimator of the returns. Then, we give a new version of the former algorithm (of Athayde (2001, 2003)). We eventually analyse the properties of this improved portfolio frontier and apply this new method on real examples.

Keywords: Downside Risk, Kernel Method, Median, Nonparametric Estimation, Semivariance

Procedia PDF Downloads 459

204 Environmentally Adaptive Acoustic Echo Suppression for Barge-in Speech Recognition

Authors: Jong Han Joo, Jung Hoon Lee, Young Sun Kim, Jae Young Kang, Seung Ho Choi

Abstract:

In this study, we propose a novel technique for acoustic echo suppression (AES) during speech recognition under barge-in conditions. Conventional AES methods based on spectral subtraction apply fixed weights to the estimated echo path transfer function (EPTF) at the current signal segment and to the EPTF estimated until the previous time interval. We propose a new approach that adaptively updates weight parameters in response to abrupt changes in the acoustic environment due to background noises or double-talk. Furthermore, we devised a voice activity detector and an initial time-delay estimator for barge-in speech recognition in communication networks. The initial time delay is estimated using log-spectral distance measure, as well as cross-correlation coefficients. The experimental results show that the developed techniques can be successfully applied in barge-in speech recognition systems.

Keywords: acoustic echo suppression, barge-in, speech recognition, echo path transfer function, initial delay estimator, voice activity detector

Procedia PDF Downloads 342

203 An Integrated Lightweight Naïve Bayes Based Webpage Classification Service for Smartphone Browsers

Authors: Mayank Gupta, Siba Prasad Samal, Vasu Kakkirala

Abstract:

The internet world and its priorities have changed considerably in the last decade. Browsing on smart phones has increased manifold and is set to explode much more. Users spent considerable time browsing different websites, that gives a great deal of insight into user’s preferences. Instead of plain information classifying different aspects of browsing like Bookmarks, History, and Download Manager into useful categories would improve and enhance the user’s experience. Most of the classification solutions are server side that involves maintaining server and other heavy resources. It has security constraints and maybe misses on contextual data during classification. On device, classification solves many such problems, but the challenge is to achieve accuracy on classification with resource constraints. This on device classification can be much more useful in personalization, reducing dependency on cloud connectivity and better privacy/security. This approach provides more relevant results as compared to current standalone solutions because it uses content rendered by browser which is customized by the content provider based on user’s profile. This paper proposes a Naive Bayes based lightweight classification engine targeted for a resource constraint devices. Our solution integrates with Web Browser that in turn triggers classification algorithm. Whenever a user browses a webpage, this solution extracts DOM Tree data from the browser’s rendering engine. This DOM data is a dynamic, contextual and secure data that can’t be replicated. This proposal extracts different features of the webpage that runs on an algorithm to classify into multiple categories. Naive Bayes based engine is chosen in this solution for its inherent advantages in using limited resources compared to other classification algorithms like Support Vector Machine, Neural Networks, etc. Naive Bayes classification requires small memory footprint and less computation suitable for smartphone environment. This solution has a feature to partition the model into multiple chunks that in turn will facilitate less usage of memory instead of loading a complete model. Classification of the webpages done through integrated engine is faster, more relevant and energy efficient than other standalone on device solution. This classification engine has been tested on Samsung Z3 Tizen hardware. The Engine is integrated into Tizen Browser that uses Chromium Rendering Engine. For this solution, extensive dataset is sourced from dmoztools.net and cleaned. This cleaned dataset has 227.5K webpages which are divided into 8 generic categories ('education', 'games', 'health', 'entertainment', 'news', 'shopping', 'sports', 'travel'). Our browser integrated solution has resulted in 15% less memory usage (due to partition method) and 24% less power consumption in comparison with standalone solution. This solution considered 70% of the dataset for training the data model and the rest 30% dataset for testing. An average accuracy of ~96.3% is achieved across the above mentioned 8 categories. This engine can be further extended for suggesting Dynamic tags and using the classification for differential uses cases to enhance browsing experience.

Keywords: chromium, lightweight engine, mobile computing, Naive Bayes, Tizen, web browser, webpage classification

Procedia PDF Downloads 135