Search results for: Bayes estimator
Commenced in January 2007
Frequency: Monthly
Edition: International
Paper Count: 292

Search results for: Bayes estimator

202 Detecting Venomous Files in IDS Using an Approach Based on Data Mining Algorithm

Authors: Sukhleen Kaur

Abstract:

In security groundwork, Intrusion Detection System (IDS) has become an important component. The IDS has received increasing attention in recent years. IDS is one of the effective way to detect different kinds of attacks and malicious codes in a network and help us to secure the network. Data mining techniques can be implemented to IDS, which analyses the large amount of data and gives better results. Data mining can contribute to improving intrusion detection by adding a level of focus to anomaly detection. So far the study has been carried out on finding the attacks but this paper detects the malicious files. Some intruders do not attack directly, but they hide some harmful code inside the files or may corrupt those file and attack the system. These files are detected according to some defined parameters which will form two lists of files as normal files and harmful files. After that data mining will be performed. In this paper a hybrid classifier has been used via Naive Bayes and Ripper classification methods. The results show how the uploaded file in the database will be tested against the parameters and then it is characterised as either normal or harmful file and after that the mining is performed. Moreover, when a user tries to mine on harmful file it will generate an exception that mining cannot be made on corrupted or harmful files.

Keywords: data mining, association, classification, clustering, decision tree, intrusion detection system, misuse detection, anomaly detection, naive Bayes, ripper

Procedia PDF Downloads 389
201 Home Range and Spatial Interaction Modelling of Black Bears

Authors: Fekadu L. Bayisa, Elvan Ceyhan, Todd D. Steury

Abstract:

Interaction between individuals within the same species is an important component of population dynamics. An interaction can be either static (based on spatial overlap) or dynamic (based on movement interactions). Using GPS collar data, we can quantify both static and dynamic interactions between black bears. The goal of this work is to determine the level of black bear interactions using the 95% and 50% home ranges, as well as to model black bear spatial interactions, which could be attraction, avoidance/repulsion, or a lack of interaction at all, to gain new insights and improve our understanding of ecological processes. Recent methodological developments in home range estimation, inhomogeneous multitype/cross-type summary statistics, and envelope testing methods are explored to study the nature of black bear interactions. Our findings, in general, indicate that the black bears of one type in our data set tend to cluster around another type.

Keywords: autocorrelated kernel density estimator, cross-type summary function, inhomogeneous multitype Poisson process, kernel density estimator, minimum convex polygon, pointwise and global envelope tests

Procedia PDF Downloads 49
200 Carbohydrate Intake Estimation in Type I Diabetic Patients Described by UVA/Padova Model

Authors: David A. Padilla, Rodolfo Villamizar

Abstract:

In recent years, closed loop control strategies have been developed in order to establish a healthy glucose profile in type 1 diabetic mellitus (T1DM) patients. However, the controller itself is unable to define a suitable reference trajectory for glucose. In this paper, a control strategy Is proposed where the shape of the reference trajectory is generated bases in the amount of carbohydrates present during the digestive process, due to the effect of carbohydrate intake. Since there no exists a sensor to measure the amount of carbohydrates consumed, an estimator is proposed. Thus this paper presents the entire process of designing a carbohydrate estimator, which allows estimate disturbance for a predictive controller (MPC) in a T1MD patient, the estimation will be used to establish a profile of reference and improve the response of the controller by providing the estimated information of ingested carbohydrates. The dynamics of the diabetic model used are due to the equations described by the UVA/Padova model of the T1DMS simulator, the system was developed and simulated in Simulink, taking into account the noise and limitations of the glucose control system actuators.

Keywords: estimation, glucose control, predictive controller, MPC, UVA/Padova

Procedia PDF Downloads 229
199 A Comparative Study of Additive and Nonparametric Regression Estimators and Variable Selection Procedures

Authors: Adriano Z. Zambom, Preethi Ravikumar

Abstract:

One of the biggest challenges in nonparametric regression is the curse of dimensionality. Additive models are known to overcome this problem by estimating only the individual additive effects of each covariate. However, if the model is misspecified, the accuracy of the estimator compared to the fully nonparametric one is unknown. In this work the efficiency of completely nonparametric regression estimators such as the Loess is compared to the estimators that assume additivity in several situations, including additive and non-additive regression scenarios. The comparison is done by computing the oracle mean square error of the estimators with regards to the true nonparametric regression function. Then, a backward elimination selection procedure based on the Akaike Information Criteria is proposed, which is computed from either the additive or the nonparametric model. Simulations show that if the additive model is misspecified, the percentage of time it fails to select important variables can be higher than that of the fully nonparametric approach. A dimension reduction step is included when nonparametric estimator cannot be computed due to the curse of dimensionality. Finally, the Boston housing dataset is analyzed using the proposed backward elimination procedure and the selected variables are identified.

Keywords: additive model, nonparametric regression, variable selection, Akaike Information Criteria

Procedia PDF Downloads 240
198 Kernel-Based Double Nearest Proportion Feature Extraction for Hyperspectral Image Classification

Authors: Hung-Sheng Lin, Cheng-Hsuan Li

Abstract:

Over the past few years, kernel-based algorithms have been widely used to extend some linear feature extraction methods such as principal component analysis (PCA), linear discriminate analysis (LDA), and nonparametric weighted feature extraction (NWFE) to their nonlinear versions, kernel principal component analysis (KPCA), generalized discriminate analysis (GDA), and kernel nonparametric weighted feature extraction (KNWFE), respectively. These nonlinear feature extraction methods can detect nonlinear directions with the largest nonlinear variance or the largest class separability based on the given kernel function. Moreover, they have been applied to improve the target detection or the image classification of hyperspectral images. The double nearest proportion feature extraction (DNP) can effectively reduce the overlap effect and have good performance in hyperspectral image classification. The DNP structure is an extension of the k-nearest neighbor technique. For each sample, there are two corresponding nearest proportions of samples, the self-class nearest proportion and the other-class nearest proportion. The term “nearest proportion” used here consider both the local information and other more global information. With these settings, the effect of the overlap between the sample distributions can be reduced. Usually, the maximum likelihood estimator and the related unbiased estimator are not ideal estimators in high dimensional inference problems, particularly in small data-size situation. Hence, an improved estimator by shrinkage estimation (regularization) is proposed. Based on the DNP structure, LDA is included as a special case. In this paper, the kernel method is applied to extend DNP to kernel-based DNP (KDNP). In addition to the advantages of DNP, KDNP surpasses DNP in the experimental results. According to the experiments on the real hyperspectral image data sets, the classification performance of KDNP is better than that of PCA, LDA, NWFE, and their kernel versions, KPCA, GDA, and KNWFE.

Keywords: feature extraction, kernel method, double nearest proportion feature extraction, kernel double nearest feature extraction

Procedia PDF Downloads 298
197 Optimization of Hate Speech and Abusive Language Detection on Indonesian-language Twitter using Genetic Algorithms

Authors: Rikson Gultom

Abstract:

Hate Speech and Abusive language on social media is difficult to detect, usually, it is detected after it becomes viral in cyberspace, of course, it is too late for prevention. An early detection system that has a fairly good accuracy is needed so that it can reduce conflicts that occur in society caused by postings on social media that attack individuals, groups, and governments in Indonesia. The purpose of this study is to find an early detection model on Twitter social media using machine learning that has high accuracy from several machine learning methods studied. In this study, the support vector machine (SVM), Naïve Bayes (NB), and Random Forest Decision Tree (RFDT) methods were compared with the Support Vector machine with genetic algorithm (SVM-GA), Nave Bayes with genetic algorithm (NB-GA), and Random Forest Decision Tree with Genetic Algorithm (RFDT-GA). The study produced a comparison table for the accuracy of the hate speech and abusive language detection model, and presented it in the form of a graph of the accuracy of the six algorithms developed based on the Indonesian-language Twitter dataset, and concluded the best model with the highest accuracy.

Keywords: abusive language, hate speech, machine learning, optimization, social media

Procedia PDF Downloads 100
196 Performance Comparison of Situation-Aware Models for Activating Robot Vacuum Cleaner in a Smart Home

Authors: Seongcheol Kwon, Jeongmin Kim, Kwang Ryel Ryu

Abstract:

We assume an IoT-based smart-home environment where the on-off status of each of the electrical appliances including the room lights can be recognized in a real time by monitoring and analyzing the smart meter data. At any moment in such an environment, we can recognize what the household or the user is doing by referring to the status data of the appliances. In this paper, we focus on a smart-home service that is to activate a robot vacuum cleaner at right time by recognizing the user situation, which requires a situation-aware model that can distinguish the situations that allow vacuum cleaning (Yes) from those that do not (No). We learn as our candidate models a few classifiers such as naïve Bayes, decision tree, and logistic regression that can map the appliance-status data into Yes and No situations. Our training and test data are obtained from simulations of user behaviors, in which a sequence of user situations such as cooking, eating, dish washing, and so on is generated with the status of the relevant appliances changed in accordance with the situation changes. During the simulation, both the situation transition and the resulting appliance status are determined stochastically. To compare the performances of the aforementioned classifiers we obtain their learning curves for different types of users through simulations. The result of our empirical study reveals that naïve Bayes achieves a slightly better classification accuracy than the other compared classifiers.

Keywords: situation-awareness, smart home, IoT, machine learning, classifier

Procedia PDF Downloads 392
195 Charting Sentiments with Naive Bayes and Logistic Regression

Authors: Jummalla Aashrith, N. L. Shiva Sai, K. Bhavya Sri

Abstract:

The swift progress of web technology has not only amassed a vast reservoir of internet data but also triggered a substantial surge in data generation. The internet has metamorphosed into one of the dynamic hubs for online education, idea dissemination, as well as opinion-sharing. Notably, the widely utilized social networking platform Twitter is experiencing considerable expansion, providing users with the ability to share viewpoints, participate in discussions spanning diverse communities, and broadcast messages on a global scale. The upswing in online engagement has sparked a significant curiosity in subjective analysis, particularly when it comes to Twitter data. This research is committed to delving into sentiment analysis, focusing specifically on the realm of Twitter. It aims to offer valuable insights into deciphering information within tweets, where opinions manifest in a highly unstructured and diverse manner, spanning a spectrum from positivity to negativity, occasionally punctuated by neutrality expressions. Within this document, we offer a comprehensive exploration and comparative assessment of modern approaches to opinion mining. Employing a range of machine learning algorithms such as Naive Bayes and Logistic Regression, our investigation plunges into the domain of Twitter data streams. We delve into overarching challenges and applications inherent in the realm of subjectivity analysis over Twitter.

Keywords: machine learning, sentiment analysis, visualisation, python

Procedia PDF Downloads 22
194 Parkinson’s Disease Detection Analysis through Machine Learning Approaches

Authors: Muhtasim Shafi Kader, Fizar Ahmed, Annesha Acharjee

Abstract:

Machine learning and data mining are crucial in health care, as well as medical information and detection. Machine learning approaches are now being utilized to improve awareness of a variety of critical health issues, including diabetes detection, neuron cell tumor diagnosis, COVID 19 identification, and so on. Parkinson’s disease is basically a disease for our senior citizens in Bangladesh. Parkinson's Disease indications often seem progressive and get worst with time. People got affected trouble walking and communicating with the condition advances. Patients can also have psychological and social vagaries, nap problems, hopelessness, reminiscence loss, and weariness. Parkinson's disease can happen in both men and women. Though men are affected by the illness at a proportion that is around partial of them are women. In this research, we have to get out the accurate ML algorithm to find out the disease with a predictable dataset and the model of the following machine learning classifiers. Therefore, nine ML classifiers are secondhand to portion study to use machine learning approaches like as follows, Naive Bayes, Adaptive Boosting, Bagging Classifier, Decision Tree Classifier, Random Forest classifier, XBG Classifier, K Nearest Neighbor Classifier, Support Vector Machine Classifier, and Gradient Boosting Classifier are used.

Keywords: naive bayes, adaptive boosting, bagging classifier, decision tree classifier, random forest classifier, XBG classifier, k nearest neighbor classifier, support vector classifier, gradient boosting classifier

Procedia PDF Downloads 99
193 Breast Cancer Survivability Prediction via Classifier Ensemble

Authors: Mohamed Al-Badrashiny, Abdelghani Bellaachia

Abstract:

This paper presents a classifier ensemble approach for predicting the survivability of the breast cancer patients using the latest database version of the Surveillance, Epidemiology, and End Results (SEER) Program of the National Cancer Institute. The system consists of two main components; features selection and classifier ensemble components. The features selection component divides the features in SEER database into four groups. After that it tries to find the most important features among the four groups that maximizes the weighted average F-score of a certain classification algorithm. The ensemble component uses three different classifiers, each of which models different set of features from SEER through the features selection module. On top of them, another classifier is used to give the final decision based on the output decisions and confidence scores from each of the underlying classifiers. Different classification algorithms have been examined; the best setup found is by using the decision tree, Bayesian network, and Na¨ıve Bayes algorithms for the underlying classifiers and Na¨ıve Bayes for the classifier ensemble step. The system outperforms all published systems to date when evaluated against the exact same data of SEER (period of 1973-2002). It gives 87.39% weighted average F-score compared to 85.82% and 81.34% of the other published systems. By increasing the data size to cover the whole database (period of 1973-2014), the overall weighted average F-score jumps to 92.4% on the held out unseen test set.

Keywords: classifier ensemble, breast cancer survivability, data mining, SEER

Procedia PDF Downloads 293
192 The Reproducibility and Repeatability of Modified Likelihood Ratio for Forensics Handwriting Examination

Authors: O. Abiodun Adeyinka, B. Adeyemo Adesesan

Abstract:

The forensic use of handwriting depends on the analysis, comparison, and evaluation decisions made by forensic document examiners. When using biometric technology in forensic applications, it is necessary to compute Likelihood Ratio (LR) for quantifying strength of evidence under two competing hypotheses, namely the prosecution and the defense hypotheses wherein a set of assumptions and methods for a given data set will be made. It is therefore important to know how repeatable and reproducible our estimated LR is. This paper evaluated the accuracy and reproducibility of examiners' decisions. Confidence interval for the estimated LR were presented so as not get an incorrect estimate that will be used to deliver wrong judgment in the court of Law. The estimate of LR is fundamentally a Bayesian concept and we used two LR estimators, namely Logistic Regression (LoR) and Kernel Density Estimator (KDE) for this paper. The repeatability evaluation was carried out by retesting the initial experiment after an interval of six months to observe whether examiners would repeat their decisions for the estimated LR. The experimental results, which are based on handwriting dataset, show that LR has different confidence intervals which therefore implies that LR cannot be estimated with the same certainty everywhere. Though the LoR performed better than the KDE when tested using the same dataset, the two LR estimators investigated showed a consistent region in which LR value can be estimated confidently. These two findings advance our understanding of LR when used in computing the strength of evidence in handwriting using forensics.

Keywords: confidence interval, handwriting, kernel density estimator, KDE, logistic regression LoR, repeatability, reproducibility

Procedia PDF Downloads 94
191 Efficient Principal Components Estimation of Large Factor Models

Authors: Rachida Ouysse

Abstract:

This paper proposes a constrained principal components (CnPC) estimator for efficient estimation of large-dimensional factor models when errors are cross sectionally correlated and the number of cross-sections (N) may be larger than the number of observations (T). Although principal components (PC) method is consistent for any path of the panel dimensions, it is inefficient as the errors are treated to be homoskedastic and uncorrelated. The new CnPC exploits the assumption of bounded cross-sectional dependence, which defines Chamberlain and Rothschild’s (1983) approximate factor structure, as an explicit constraint and solves a constrained PC problem. The CnPC method is computationally equivalent to the PC method applied to a regularized form of the data covariance matrix. Unlike maximum likelihood type methods, the CnPC method does not require inverting a large covariance matrix and thus is valid for panels with N ≥ T. The paper derives a convergence rate and an asymptotic normality result for the CnPC estimators of the common factors. We provide feasible estimators and show in a simulation study that they are more accurate than the PC estimator, especially for panels with N larger than T, and the generalized PC type estimators, especially for panels with N almost as large as T.

Keywords: high dimensionality, unknown factors, principal components, cross-sectional correlation, shrinkage regression, regularization, pseudo-out-of-sample forecasting

Procedia PDF Downloads 119
190 Methods of Variance Estimation in Two-Phase Sampling

Authors: Raghunath Arnab

Abstract:

The two-phase sampling which is also known as double sampling was introduced in 1938. In two-phase sampling, samples are selected in phases. In the first phase, a relatively large sample of size is selected by some suitable sampling design and only information on the auxiliary variable is collected. During the second phase, a sample of size is selected either from, the sample selected in the first phase or from the entire population by using a suitable sampling design and information regarding the study and auxiliary variable is collected. Evidently, two phase sampling is useful if the auxiliary information is relatively easy and cheaper to collect than the study variable as well as if the strength of the relationship between the variables and is high. If the sample is selected in more than two phases, the resulting sampling design is called a multi-phase sampling. In this article we will consider how one can use data collected at the first phase sampling at the stages of estimation of the parameter, stratification, selection of sample and their combinations in the second phase in a unified setup applicable to any sampling design and wider classes of estimators. The problem of the estimation of variance will also be considered. The variance of estimator is essential for estimating precision of the survey estimates, calculation of confidence intervals, determination of the optimal sample sizes and for testing of hypotheses amongst others. Although, the variance is a non-negative quantity but its estimators may not be non-negative. If the estimator of variance is negative, then it cannot be used for estimation of confidence intervals, testing of hypothesis or measure of sampling error. The non-negativity properties of the variance estimators will also be studied in details.

Keywords: auxiliary information, two-phase sampling, varying probability sampling, unbiased estimators

Procedia PDF Downloads 560
189 Housing Price Dynamics: Comparative Study of 1980-1999 and the New Millenium

Authors: Janne Engblom, Elias Oikarinen

Abstract:

The understanding of housing price dynamics is of importance to a great number of agents: to portfolio investors, banks, real estate brokers and construction companies as well as to policy makers and households. A panel dataset is one that follows a given sample of individuals over time, and thus provides multiple observations on each individual in the sample. Panel data models include a variety of fixed and random effects models which form a wide range of linear models. A special case of panel data models is dynamic in nature. A complication regarding a dynamic panel data model that includes the lagged dependent variable is endogeneity bias of estimates. Several approaches have been developed to account for this problem. In this paper, the panel models were estimated using the Common Correlated Effects estimator (CCE) of dynamic panel data which also accounts for cross-sectional dependence which is caused by common structures of the economy. In presence of cross-sectional dependence standard OLS gives biased estimates. In this study, U.S housing price dynamics were examined empirically using the dynamic CCE estimator with first-difference of housing price as the dependent and first-differences of per capita income, interest rate, housing stock and lagged price together with deviation of housing prices from their long-run equilibrium level as independents. These deviations were also estimated from the data. The aim of the analysis was to provide estimates with comparisons of estimates between 1980-1999 and 2000-2012. Based on data of 50 U.S cities over 1980-2012 differences of short-run housing price dynamics estimates were mostly significant when two time periods were compared. Significance tests of differences were provided by the model containing interaction terms of independents and time dummy variable. Residual analysis showed very low cross-sectional correlation of the model residuals compared with the standard OLS approach. This means a good fit of CCE estimator model. Estimates of the dynamic panel data model were in line with the theory of housing price dynamics. Results also suggest that dynamics of a housing market is evolving over time.

Keywords: dynamic model, panel data, cross-sectional dependence, interaction model

Procedia PDF Downloads 228
188 Exploring the Role of Data Mining in Crime Classification: A Systematic Literature Review

Authors: Faisal Muhibuddin, Ani Dijah Rahajoe

Abstract:

This in-depth exploration, through a systematic literature review, scrutinizes the nuanced role of data mining in the classification of criminal activities. The research focuses on investigating various methodological aspects and recent developments in leveraging data mining techniques to enhance the effectiveness and precision of crime categorization. Commencing with an exposition of the foundational concepts of crime classification and its evolutionary dynamics, this study details the paradigm shift from conventional methods towards approaches supported by data mining, addressing the challenges and complexities inherent in the modern crime landscape. Specifically, the research delves into various data mining techniques, including K-means clustering, Naïve Bayes, K-nearest neighbour, and clustering methods. A comprehensive review of the strengths and limitations of each technique provides insights into their respective contributions to improving crime classification models. The integration of diverse data sources takes centre stage in this research. A detailed analysis explores how the amalgamation of structured data (such as criminal records) and unstructured data (such as social media) can offer a holistic understanding of crime, enriching classification models with more profound insights. Furthermore, the study explores the temporal implications in crime classification, emphasizing the significance of considering temporal factors to comprehend long-term trends and seasonality. The availability of real-time data is also elucidated as a crucial element in enhancing responsiveness and accuracy in crime classification.

Keywords: data mining, classification algorithm, naïve bayes, k-means clustering, k-nearest neigbhor, crime, data analysis, sistematic literature review

Procedia PDF Downloads 26
187 Efficient Estimation for the Cox Proportional Hazards Cure Model

Authors: Khandoker Akib Mohammad

Abstract:

While analyzing time-to-event data, it is possible that a certain fraction of subjects will never experience the event of interest, and they are said to be cured. When this feature of survival models is taken into account, the models are commonly referred to as cure models. In the presence of covariates, the conditional survival function of the population can be modelled by using the cure model, which depends on the probability of being uncured (incidence) and the conditional survival function of the uncured subjects (latency), and a combination of logistic regression and Cox proportional hazards (PH) regression is used to model the incidence and latency respectively. In this paper, we have shown the asymptotic normality of the profile likelihood estimator via asymptotic expansion of the profile likelihood and obtain the explicit form of the variance estimator with an implicit function in the profile likelihood. We have also shown the efficient score function based on projection theory and the profile likelihood score function are equal. Our contribution in this paper is that we have expressed the efficient information matrix as the variance of the profile likelihood score function. A simulation study suggests that the estimated standard errors from bootstrap samples (SMCURE package) and the profile likelihood score function (our approach) are providing similar and comparable results. The numerical result of our proposed method is also shown by using the melanoma data from SMCURE R-package, and we compare the results with the output obtained from the SMCURE package.

Keywords: Cox PH model, cure model, efficient score function, EM algorithm, implicit function, profile likelihood

Procedia PDF Downloads 109
186 Low Complexity Carrier Frequency Offset Estimation for Cooperative Orthogonal Frequency Division Multiplexing Communication Systems without Cyclic Prefix

Authors: Tsui-Tsai Lin

Abstract:

Cooperative orthogonal frequency division multiplexing (OFDM) transmission, which possesses the advantages of better connectivity, expanded coverage, and resistance to frequency selective fading, has been a more powerful solution for the physical layer in wireless communications. However, such a hybrid scheme suffers from the carrier frequency offset (CFO) effects inherited from the OFDM-based systems, which lead to a significant degradation in performance. In addition, insertion of a cyclic prefix (CP) at each symbol block head for combating inter-symbol interference will lead to a reduction in spectral efficiency. The design on the CFO estimation for the cooperative OFDM system without CP is a suspended problem. This motivates us to develop a low complexity CFO estimator for the cooperative OFDM decode-and-forward (DF) communication system without CP over the multipath fading channel. Especially, using a block-type pilot, the CFO estimation is first derived in accordance with the least square criterion. A reliable performance can be obtained through an exhaustive two-dimensional (2D) search with a penalty of heavy computational complexity. As a remedy, an alternative solution realized with an iteration approach is proposed for the CFO estimation. In contrast to the 2D-search estimator, the iterative method enjoys the advantage of the substantially reduced implementation complexity without sacrificing the estimate performance. Computer simulations have been presented to demonstrate the efficacy of the proposed CFO estimation.

Keywords: cooperative transmission, orthogonal frequency division multiplexing (OFDM), carrier frequency offset, iteration

Procedia PDF Downloads 241
185 A Comparative Analysis of Classification Models with Wrapper-Based Feature Selection for Predicting Student Academic Performance

Authors: Abdullah Al Farwan, Ya Zhang

Abstract:

In today’s educational arena, it is critical to understand educational data and be able to evaluate important aspects, particularly data on student achievement. Educational Data Mining (EDM) is a research area that focusing on uncovering patterns and information in data from educational institutions. Teachers, if they are able to predict their students' class performance, can use this information to improve their teaching abilities. It has evolved into valuable knowledge that can be used for a wide range of objectives; for example, a strategic plan can be used to generate high-quality education. Based on previous data, this paper recommends employing data mining techniques to forecast students' final grades. In this study, five data mining methods, Decision Tree, JRip, Naive Bayes, Multi-layer Perceptron, and Random Forest with wrapper feature selection, were used on two datasets relating to Portuguese language and mathematics classes lessons. The results showed the effectiveness of using data mining learning methodologies in predicting student academic success. The classification accuracy achieved with selected algorithms lies in the range of 80-94%. Among all the selected classification algorithms, the lowest accuracy is achieved by the Multi-layer Perceptron algorithm, which is close to 70.45%, and the highest accuracy is achieved by the Random Forest algorithm, which is close to 94.10%. This proposed work can assist educational administrators to identify poor performing students at an early stage and perhaps implement motivational interventions to improve their academic success and prevent educational dropout.

Keywords: classification algorithms, decision tree, feature selection, multi-layer perceptron, Naïve Bayes, random forest, students’ academic performance

Procedia PDF Downloads 131
184 Stereo Motion Tracking

Authors: Yudhajit Datta, Hamsi Iyer, Jonathan Bandi, Ankit Sethia

Abstract:

Motion Tracking and Stereo Vision are complicated, albeit well-understood problems in computer vision. Existing softwares that combine the two approaches to perform stereo motion tracking typically employ complicated and computationally expensive procedures. The purpose of this study is to create a simple and effective solution capable of combining the two approaches. The study aims to explore a strategy to combine the two techniques of two-dimensional motion tracking using Kalman Filter; and depth detection of object using Stereo Vision. In conventional approaches objects in the scene of interest are observed using a single camera. However for Stereo Motion Tracking; the scene of interest is observed using video feeds from two calibrated cameras. Using two simultaneous measurements from the two cameras a calculation for the depth of the object from the plane containing the cameras is made. The approach attempts to capture the entire three-dimensional spatial information of each object at the scene and represent it through a software estimator object. In discrete intervals, the estimator tracks object motion in the plane parallel to plane containing cameras and updates the perpendicular distance value of the object from the plane containing the cameras as depth. The ability to efficiently track the motion of objects in three-dimensional space using a simplified approach could prove to be an indispensable tool in a variety of surveillance scenarios. The approach may find application from high security surveillance scenes such as premises of bank vaults, prisons or other detention facilities; to low cost applications in supermarkets and car parking lots.

Keywords: kalman filter, stereo vision, motion tracking, matlab, object tracking, camera calibration, computer vision system toolbox

Procedia PDF Downloads 295
183 Enhancement of Primary User Detection in Cognitive Radio by Scattering Transform

Authors: A. Moawad, K. C. Yao, A. Mansour, R. Gautier

Abstract:

The detecting of an occupied frequency band is a major issue in cognitive radio systems. The detection process becomes difficult if the signal occupying the band of interest has faded amplitude due to multipath effects. These effects make it hard for an occupying user to be detected. This work mitigates the missed-detection problem in the context of cognitive radio in frequency-selective fading channel by proposing blind channel estimation method that is based on scattering transform. By initially applying conventional energy detection, the missed-detection probability is evaluated, and if it is greater than or equal to 50%, channel estimation is applied on the received signal followed by channel equalization to reduce the channel effects. In the proposed channel estimator, we modify the Morlet wavelet by using its first derivative for better frequency resolution. A mathematical description of the modified function and its frequency resolution is formulated in this work. The improved frequency resolution is required to follow the spectral variation of the channel. The channel estimation error is evaluated in the mean-square sense for different channel settings, and energy detection is applied to the equalized received signal. The simulation results show improvement in reducing the missed-detection probability as compared to the detection based on principal component analysis. This improvement is achieved at the expense of increased estimator complexity, which depends on the number of wavelet filters as related to the channel taps. Also, the detection performance shows an improvement in detection probability for low signal-to-noise scenarios over principal component analysis- based energy detection.

Keywords: channel estimation, cognitive radio, scattering transform, spectrum sensing

Procedia PDF Downloads 174
182 Household's Willingness to Pay for Safe Non-Timber Forest Products at Morikouali-Ye Community Forest in Cameroon

Authors: Eke Balla Sophie Michelle

Abstract:

Forest provides a wide range of environmental goods and services among which, biodiversity or consumption goods and constitute public goods. Despite the importance of non-timber forest products (NTFPs) in sustaining livelihood and poverty smoothening in rural communities, they are highly depleted and poorly conserved. Yokadouma is a town where NTFPs is a renewable resource in active exploitation. It has been found that such exploitation is done in the same conditions as other localities that have experienced a rapid depletion of their NTFPs in destination to cities across Cameroon, Central Africa, and overseas. Given these realities, it is necessary to access the consequences of this overexploitation through negative effects on both the population and the environment. Therefore, to enhance participatory conservation initiatives, this study determines the household’s willingness to pay in community forest (CF) of Morikouali-ye, eastern region of Cameroon, for sustainable exploitation of NTFPs using contingent valuation method (CVM) through two approaches, one parametric (Logit model) and the other non-parametric (estimator of the Turnbull lower bound). The results indicate that five species are the most collected in the study area: Irvingia gabonensis, the Ricinodendron heudelotii, Gnetum, the Jujube and bark, their sale contributes significantly to 41 % of total household income. The average willingness to pay through the Logit model and the Turnbull estimator is 6845.2861 FCFA and 4940 FCFA respectively per household per year with a social cost of degradation estimated at 3237820.3253 FCFA years. The probability to pay increases with income, gender, number of women in the household, age, the commercial activity of NTFPs and decreases with the concept of sustainable development.

Keywords: non timber forest product, contingent valuation method, willingness to pay, sustainable development

Procedia PDF Downloads 409
181 Trends of Seasonal and Annual Rainfall in the South-Central Climatic Zone of Bangladesh Using Mann-Kendall Trend Test

Authors: M. T. Islam, S. H. Shakif, R. Hasan, S. H. Kobi

Abstract:

Investigation of rainfall trends is crucial considering climate change, food security, and the economy of a particular region. This research aims to study seasonal and annual precipitation trends and their abrupt changes over time in the south-central climatic zone of Bangladesh using monthly time series data of 50 years (1970-2019). A trend-free pre-whitening method has been employed to make necessary adjustments for autocorrelations in the rainfall data. Trends in rainfall and their intensity have been observed using the non-parametric Mann-Kendall test and Theil-Sen estimator. Significant changes and fluctuation points in the data series have been detected using the sequential Mann-Kendall test at the 95% confidence limit. The study findings show that most of the rainfall stations in the study area have a decreasing precipitation pattern throughout all seasons. The maximum decline in the rainfall intensity has been found for the Tangail station (-8.24 mm/year) during monsoon. Madaripur and Chandpur stations have shown slight positive trends in post-monsoon rainfall. In terms of annual precipitation, a negative rainfall pattern has been identified in each station, with a maximum decrement (-) of 14.48 mm/year at Chandpur. However, all the trends are statistically non-significant within the 95% confidence interval, and their monotonic association with time ranges from very weak to weak. From the sequential Mann-Kendall test, the year of changing points for annual and seasonal downward precipitation trends occur mostly after the 90s for Dhaka and Barishal stations. For Chandpur, the fluctuation points arrive after the mid-70s in most cases.

Keywords: trend analysis, Mann-Kendall test, Theil-Sen estimator, sequential Mann-Kendall test, rainfall trend

Procedia PDF Downloads 52
180 Landslide Susceptibility Mapping Using Soft Computing in Amhara Saint

Authors: Semachew M. Kassa, Africa M Geremew, Tezera F. Azmatch, Nandyala Darga Kumar

Abstract:

Frequency ratio (FR) and analytical hierarchy process (AHP) methods are developed based on past landslide failure points to identify the landslide susceptibility mapping because landslides can seriously harm both the environment and society. However, it is still difficult to select the most efficient method and correctly identify the main driving factors for particular regions. In this study, we used fourteen landslide conditioning factors (LCFs) and five soft computing algorithms, including Random Forest (RF), Support Vector Machine (SVM), Logistic Regression (LR), Artificial Neural Network (ANN), and Naïve Bayes (NB), to predict the landslide susceptibility at 12.5 m spatial scale. The performance of the RF (F1-score: 0.88, AUC: 0.94), ANN (F1-score: 0.85, AUC: 0.92), and SVM (F1-score: 0.82, AUC: 0.86) methods was significantly better than the LR (F1-score: 0.75, AUC: 0.76) and NB (F1-score: 0.73, AUC: 0.75) method, according to the classification results based on inventory landslide points. The findings also showed that around 35% of the study region was made up of places with high and very high landslide risk (susceptibility greater than 0.5). The very high-risk locations were primarily found in the western and southeastern regions, and all five models showed good agreement and similar geographic distribution patterns in landslide susceptibility. The towns with the highest landslide risk include Amhara Saint Town's western part, the Northern part, and St. Gebreal Church villages, with mean susceptibility values greater than 0.5. However, rainfall, distance to road, and slope were typically among the top leading factors for most villages. The primary contributing factors to landslide vulnerability were slightly varied for the five models. Decision-makers and policy planners can use the information from our study to make informed decisions and establish policies. It also suggests that various places should take different safeguards to reduce or prevent serious damage from landslide events.

Keywords: artificial neural network, logistic regression, landslide susceptibility, naïve Bayes, random forest, support vector machine

Procedia PDF Downloads 36
179 Parameters Estimation of Multidimensional Possibility Distributions

Authors: Sergey Sorokin, Irina Sorokina, Alexander Yazenin

Abstract:

We present a solution to the Maxmin u/E parameters estimation problem of possibility distributions in m-dimensional case. Our method is based on geometrical approach, where minimal area enclosing ellipsoid is constructed around the sample. Also we demonstrate that one can improve results of well-known algorithms in fuzzy model identification task using Maxmin u/E parameters estimation.

Keywords: possibility distribution, parameters estimation, Maxmin u\E estimator, fuzzy model identification

Procedia PDF Downloads 436
178 Using Derivative Free Method to Improve the Error Estimation of Numerical Quadrature

Authors: Chin-Yun Chen

Abstract:

Numerical integration is an essential tool for deriving different physical quantities in engineering and science. The effectiveness of a numerical integrator depends on different factors, where the crucial one is the error estimation. This work presents an error estimator that combines a derivative free method to improve the performance of verified numerical quadrature.

Keywords: numerical quadrature, error estimation, derivative free method, interval computation

Procedia PDF Downloads 431
177 Moment Estimators of the Parameters of Zero-One Inflated Negative Binomial Distribution

Authors: Rafid Saeed Abdulrazak Alshkaki

Abstract:

In this paper, zero-one inflated negative binomial distribution is considered, along with some of its structural properties, then its parameters were estimated using the method of moments. It is found that the method of moments to estimate the parameters of the zero-one inflated negative binomial models is not a proper method and may give incorrect conclusions.

Keywords: zero one inflated models, negative binomial distribution, moments estimator, non negative integer sampling

Procedia PDF Downloads 253
176 Comparison of Receiver Operating Characteristic Curve Smoothing Methods

Authors: D. Sigirli

Abstract:

The Receiver Operating Characteristic (ROC) curve is a commonly used statistical tool for evaluating the diagnostic performance of screening and diagnostic test with continuous or ordinal scale results which aims to predict the presence or absence probability of a condition, usually a disease. When the test results were measured as numeric values, sensitivity and specificity can be computed across all possible threshold values which discriminate the subjects as diseased and non-diseased. There are infinite numbers of possible decision thresholds along the continuum of the test results. The ROC curve presents the trade-off between sensitivity and the 1-specificity as the threshold changes. The empirical ROC curve which is a non-parametric estimator of the ROC curve is robust and it represents data accurately. However, especially for small sample sizes, it has a problem of variability and as it is a step function there can be different false positive rates for a true positive rate value and vice versa. Besides, the estimated ROC curve being in a jagged form, since the true ROC curve is a smooth curve, it underestimates the true ROC curve. Since the true ROC curve is assumed to be smooth, several smoothing methods have been explored to smooth a ROC curve. These include using kernel estimates, using log-concave densities, to fit parameters for the specified density function to the data with the maximum-likelihood fitting of univariate distributions or to create a probability distribution by fitting the specified distribution to the data nd using smooth versions of the empirical distribution functions. In the present paper, we aimed to propose a smooth ROC curve estimation based on the boundary corrected kernel function and to compare the performances of ROC curve smoothing methods for the diagnostic test results coming from different distributions in different sample sizes. We performed simulation study to compare the performances of different methods for different scenarios with 1000 repetitions. It is seen that the performance of the proposed method was typically better than that of the empirical ROC curve and only slightly worse compared to the binormal model when in fact the underlying samples were generated from the normal distribution.

Keywords: empirical estimator, kernel function, smoothing, receiver operating characteristic curve

Procedia PDF Downloads 123
175 Using the Bootstrap for Problems Statistics

Authors: Brahim Boukabcha, Amar Rebbouh

Abstract:

The bootstrap method based on the idea of exploiting all the information provided by the initial sample, allows us to study the properties of estimators. In this article we will present a theoretical study on the different methods of bootstrapping and using the technique of re-sampling in statistics inference to calculate the standard error of means of an estimator and determining a confidence interval for an estimated parameter. We apply these methods tested in the regression models and Pareto model, giving the best approximations.

Keywords: bootstrap, error standard, bias, jackknife, mean, median, variance, confidence interval, regression models

Procedia PDF Downloads 354
174 Chaotic Behavior in Monetary Systems: Comparison among Different Types of Taylor Rule

Authors: Reza Moosavi Mohseni, Wenjun Zhang, Jiling Cao

Abstract:

The aim of the present study is to detect the chaotic behavior in monetary economic relevant dynamical system. The study employs three different forms of Taylor rules: current, forward, and backward looking. The result suggests the existence of the chaotic behavior in all three systems. In addition, the results strongly represent that using expectations especially rational expectation hypothesis can increase complexity of the system and leads to more chaotic behavior.

Keywords: taylor rule, monetary system, chaos theory, lyapunov exponent, GMM estimator

Procedia PDF Downloads 490
173 Confidence Intervals for Quantiles in the Two-Parameter Exponential Distributions with Type II Censored Data

Authors: Ayman Baklizi

Abstract:

Based on type II censored data, we consider interval estimation of the quantiles of the two-parameter exponential distribution and the difference between the quantiles of two independent two-parameter exponential distributions. We derive asymptotic intervals, Bayesian, as well as intervals based on the generalized pivot variable. We also include some bootstrap intervals in our comparisons. The performance of these intervals is investigated in terms of their coverage probabilities and expected lengths.

Keywords: asymptotic intervals, Bayes intervals, bootstrap, generalized pivot variables, two-parameter exponential distribution, quantiles

Procedia PDF Downloads 385