Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 292

Search results for: Bayes estimator

142 Sentiment Analysis on the East Timor Accession Process to the ASEAN

Authors: Marcelino Caetano Noronha, Vosco Pereira, Jose Soares Pinto, Ferdinando Da C. Saores

Abstract:

One particularly popular social media platform is YouTube, a video-sharing platform where users can upload videos and other users can like, dislike, or comment on them. In this study, we conduct a binary classification task on YouTube video comments and reviews from users regarding the accession process of Timor Leste to become the eleventh member of the Association of Southeast Asian Nations (ASEAN). We scrape the data directly from the public YouTube video and apply several pre-processing and weighting techniques. Before classification, we categorize the data into two classes, positive and negative. For classification, we apply the Support Vector Machine (SVM) algorithm and compare it with the Naïve Bayes algorithm. In the experiment, SVM achieved an accuracy of 84.1%, a precision of 94.5%, and a recall of 73.8%.
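Accuracy, precision, and recall measure different things. The Python sketch below computes all three from true and predicted labels, assuming label 1 (a hypothetical encoding) marks the positive class being scored; the label lists are made up for illustration. A precision well above recall, as reported for SVM in this study, means few negative comments were labeled positive, while a fair share of positive comments were missed.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return (accuracy, precision, recall) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)  # true positives
    fp = sum(1 for t, p in pairs if t != positive and p == positive)  # false positives
    fn = sum(1 for t, p in pairs if t == positive and p != positive)  # false negatives
    accuracy = sum(1 for t, p in pairs if t == p) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

# Hypothetical labels: 1 = positive comment, 0 = negative comment
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(binary_metrics(y_true, y_pred))  # (0.625, 0.6666666666666666, 0.5)
```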

Keywords: classification, YouTube, sentiment analysis, support vector machine

141 Non-Parametric, Unconditional Quantile Estimation of Efficiency in Microfinance Institutions

Authors: Komlan Sedzro

Abstract:

We apply the non-parametric, unconditional, hyperbolic order-α quantile estimator to appraise the relative efficiency of microfinance institutions (MFIs) in Africa in terms of outreach. Our purpose is to verify whether these institutions, which must constantly strike a compromise between their social role and their financial sustainability, are operationally efficient. Using data on African MFIs extracted from the Microfinance Information eXchange (MIX) database and covering the period 2004 to 2006, we find that the more efficient MFIs are also the most profitable. This result is in line with the view that social performance does not contradict the pursuit of excellent financial performance. Our results also show that the largest MFIs in terms of assets and those charging the highest fees are not necessarily the most efficient.

Keywords: data envelopment analysis, microfinance institutions, quantile estimation of efficiency, social and financial performance

140 Incorporating Information Gain in Regular Expressions Based Classifiers

Authors: Rosa L. Figueroa, Christopher A. Flores, Qing Zeng-Treitler

Abstract:

A regular expression is a sequence of characters that describes a text pattern. In clinical research, regular expressions are usually created manually by programmers together with domain experts, although there have recently been several efforts to generate them automatically. This article presents a text classification algorithm based on regexes. The algorithm, named REX, was designed and implemented as a simplified method to create regexes that classify Spanish text automatically. To resolve ambiguous cases, such as when multiple labels are assigned to a test example, REX includes an information gain method. Two datasets were used to evaluate the algorithm's effectiveness in clinical text classification tasks. The results indicate that the proposed regular expression based classifier performs statistically better in terms of accuracy and F-measure than Support Vector Machine and Naïve Bayes on both datasets.
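REX's own information gain procedure is not spelled out here, but the information gain of a regex can be illustrated in the standard way: treat "the pattern matches the text" as a binary feature and measure how much it reduces the entropy of the labels. The Python sketch below assumes exactly that; the Spanish snippets, labels, and pattern are hypothetical.

```python
import math
import re
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def regex_information_gain(pattern, texts, labels):
    """Information gain of the binary feature 'pattern matches the text'."""
    rx = re.compile(pattern, re.IGNORECASE)
    matched = [lab for txt, lab in zip(texts, labels) if rx.search(txt)]
    unmatched = [lab for txt, lab in zip(texts, labels) if not rx.search(txt)]
    n = len(labels)
    conditional = (len(matched) / n) * entropy(matched) + (len(unmatched) / n) * entropy(unmatched)
    return entropy(labels) - conditional

# Hypothetical clinical snippets and labels
texts = ["paciente fumador", "no fuma", "fumador activo", "sin antecedentes"]
labels = ["smoker", "non-smoker", "smoker", "non-smoker"]
print(regex_information_gain(r"\bfumador\b", texts, labels))  # 1.0
```

A pattern that splits the examples perfectly by class scores the full label entropy (1 bit here); one that matches regardless of class scores 0.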

Keywords: information gain, regular expressions, smith-waterman algorithm, text classification

139 Sentiment Classification of Documents

Authors: Swarnadip Ghosh

Abstract:

Sentiment analysis is the process of detecting the contextual polarity of text; in other words, it determines whether a piece of writing is positive, negative, or neutral. Sentiment analysis of documents is of great importance today, when vast amounts of information are stored in databases and on the World Wide Web. An efficient algorithm to elicit such information would be beneficial for social, economic, and medical purposes. In this project, we have developed an algorithm to classify a document as positive or negative. Using our algorithm, we obtained a feature set from the data and classified the documents based on this feature set. Importantly, our classification does not use the independence assumption made by many procedures such as Naive Bayes, which makes the algorithm more general in scope. Moreover, because such data are sparse and high-dimensional, we did not use the empirical distribution for estimation; instead, we developed a method based on the degree of close clustering of the data points. We applied our algorithm to a movie review dataset obtained from IMDb and obtained satisfactory results.

Keywords: sentiment, runs test, cross validation, higher dimensional pmf estimation

138 An Adjusted Network Information Criterion for Model Selection in Statistical Neural Network Models

Authors: Christopher Godwin Udomboso, Angela Unna Chukwu, Isaac Kwame Dontwi

Abstract:

In selecting a Statistical Neural Network model, the Network Information Criterion (NIC) has been observed to be biased with respect to sample size, because it does not account for sample size. Selecting a model from a set of fitted candidate models requires objective, data-driven criteria. In this paper, we derive and investigate the Adjusted Network Information Criterion (ANIC), based on Kullback's symmetric divergence, which is designed to be an asymptotically unbiased estimator of the expected Kullback-Leibler information of a fitted model. The analyses show that, in general, the ANIC improves model selection across more sample sizes than the NIC does.

Keywords: statistical neural network, network information criterion, adjusted network information criterion, transfer function

137 Evaluation of Robust Feature Descriptors for Texture Classification

Authors: Jia-Hong Lee, Mei-Yi Wu, Hsien-Tsung Kuo

Abstract:

Texture is an important characteristic of real and synthetic scenes. Texture analysis plays a critical role in surface inspection and provides important techniques for a variety of applications. Although several descriptors have been proposed to extract texture features, object recognition remains a difficult task because of the complex nature of texture. Recently, many robust and scale-invariant image features such as SIFT, SURF, and ORB have been used successfully in image retrieval and object recognition. In this paper, we compare the texture classification performance of these feature descriptors combined with k-means clustering. Different classifiers, including K-NN, Naive Bayes, Back Propagation Neural Network, Decision Tree, and KStar, were applied to three texture image sets: UIUCTex, KTH-TIPS, and Brodatz. Experimental results show that SIFT achieves the best average accuracy on UIUCTex and KTH-TIPS, while SURF performs best on the Brodatz texture set. Among all classifiers used, the BP neural network performs best in test set classification.

Keywords: texture classification, texture descriptor, SIFT, SURF, ORB

136 Robust Variable Selection Based on Schwarz Information Criterion for Linear Regression Models

Authors: Shokrya Saleh A. Alshqaq, Abdullah Ali H. Ahmadini

Abstract:

The Schwarz information criterion (SIC) is a popular tool for selecting the best variables in regression datasets. However, SIC is defined using an unbounded estimator, namely least squares (LS), which is highly sensitive to outlying observations, especially bad leverage points. A method for robust variable selection based on SIC for linear regression models is therefore needed. This study investigates the robustness properties of SIC by deriving its influence function and proposes a robust SIC based on the MM-estimation scale. The aim is to produce a criterion that can effectively select accurate models in the presence of vertical outliers and high leverage points. The advantages of the proposed robust SIC are demonstrated through a simulation study and an analysis of a real dataset.
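For reference, the classical LS-based SIC that the robust version replaces can be computed as n·log(RSS/n) + k·log(n), where k counts the fitted coefficients. The Python sketch below scores every predictor subset this way on synthetic data. It does not implement the paper's MM-based robust SIC, which would replace the LS residual scale with a robust scale estimate; the data and coefficients are hypothetical.

```python
import itertools
import numpy as np

def sic_ls(design, y):
    """Classical SIC for a least-squares fit: n*log(RSS/n) + k*log(n)."""
    n, k = design.shape
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    rss = float(np.sum((y - design @ beta) ** 2))
    return n * np.log(rss / n) + k * np.log(n)

def best_subset_by_sic(X, y):
    """Score every non-empty predictor subset (intercept always included); lowest SIC wins."""
    n, p = X.shape
    intercept = np.ones((n, 1))
    best_score, best_subset = np.inf, None
    for size in range(1, p + 1):
        for subset in itertools.combinations(range(p), size):
            design = np.hstack([intercept, X[:, list(subset)]])
            score = sic_ls(design, y)
            if score < best_score:
                best_score, best_subset = score, subset
    return best_subset, best_score

# Synthetic data (hypothetical): only predictors 0 and 2 affect y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = 1.0 + 2.0 * X[:, 0] - 1.5 * X[:, 2] + rng.normal(scale=0.5, size=200)
subset, score = best_subset_by_sic(X, y)
print(subset, score)
```

A single bad leverage point can inflate RSS for the correct subset and change which subset wins, which is the sensitivity the robust SIC targets.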

Keywords: influence function, robust variable selection, robust regression, Schwarz information criterion

135 A Dynamic Panel Model to Evaluate the Impact of Debt Relief on Poverty

Authors: Loujaina Abdelwahed

Abstract:

Debt relief granted to low- and middle-income countries effectively provides additional funds that governments can use to increase public investment in poverty-reducing services, alleviating poverty and boosting economic growth. However, little is known about the extent to which the poor benefit from this increased public investment. This study assesses the impact on poverty reduction of debt relief granted through multiple initiatives during the 1990s. In particular, it assesses the impact on the level, depth, and severity of poverty in 76 low- and middle-income countries over the period 1990-2011. Debt relief is found to have a significant impact in reducing the level, depth, and severity of poverty. Analysis of the different types of debt relief reveals that debt service relief reduces poverty, whereas debt principal relief does not have a significant impact.

Keywords: debt relief, developing countries, HIPC, poverty, system GMM estimator

134 Machine Learning Automatic Detection on Twitter Cyberbullying

Authors: Raghad A. Altowairgi

Abstract:

With the widespread use of social media platforms, young people tend to use them extensively as their first means of communication because of their ease and modernity. However, these platforms often create fertile ground for bullies to practice aggressive behavior against their victims. Platform usage cannot be reduced, but intelligent mechanisms can be implemented to reduce the abuse, and this is where machine learning comes in. Understanding and classifying text can help minimize cyberbullying, and artificial intelligence techniques have expanded to provide applied tools for addressing the phenomenon. In this research, machine learning models are built to classify text into two classes: cyberbullying and non-cyberbullying. The data are preprocessed in four stages: removing characters that do not provide meaningful information to the models, tokenization, stop-word removal, and lowercasing. Bag-of-Words (BoW) and TF-IDF are used as the main features for five classifiers: logistic regression, Naïve Bayes, Random Forest, XGBoost, and CatBoost, which score 92%, 90%, 92%, 91%, and 86%, respectively.
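Bag-of-Words and TF-IDF features can be sketched in a few lines of Python. This assumes the common variant with tf = count/length and idf = log(N/df); the abstract does not say which TF-IDF variant was used (libraries often add smoothing), and the example sentences are hypothetical.

```python
import math
from collections import Counter

def bag_of_words(doc):
    """BoW: raw term counts after lowercasing and whitespace tokenization."""
    return Counter(doc.lower().split())

def tfidf(docs):
    """One {term: weight} dict per document, with tf = count/len and idf = log(N/df)."""
    bows = [bag_of_words(d) for d in docs]
    n_docs = len(bows)
    df = Counter(term for bow in bows for term in bow)  # each term counted once per document
    vectors = []
    for bow in bows:
        total = sum(bow.values())
        vectors.append({t: (c / total) * math.log(n_docs / df[t]) for t, c in bow.items()})
    return vectors

docs = ["you are stupid", "you are kind", "have a nice day"]
vecs = tfidf(docs)
print(vecs[0])  # "stupid" (in one document) outweighs "you" and "are" (in two)
```

Terms that occur in every document get an idf of zero, which is why TF-IDF down-weights words shared by both classes.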

Keywords: cyberbullying, machine learning, Bag-of-Words, term frequency-inverse document frequency, natural language processing, Catboost

133 Spatial Point Process Analysis of Dengue Fever in Tainan, Taiwan

Authors: Ya-Mei Chang

Abstract:

This research applies spatio-temporal point process methods to dengue fever data from Tainan. The spatio-temporal intensity function of the dataset is assumed to be separable. Kernel estimation is a widely used approach to estimating intensity functions, and the intensity function is very helpful for studying the relationship between the spatio-temporal point process and covariates. Because the covariate effects might be nonlinear, a nonparametric smoothing estimator is used to detect nonlinearity in the covariate effects. A fitted parametric model can then describe the influence of the covariates on dengue fever. Correlation between the data points is detected using the K-function. The results of this research could provide useful information to help the government and other stakeholders make decisions.
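Kernel intensity estimation and the K-function can both be sketched for a purely spatial pattern in Python: a Gaussian-kernel intensity estimate and a naive Ripley's K estimate without edge correction. This ignores the temporal component and the covariates of the study, and the point pattern is simulated, not the Tainan data. Under complete spatial randomness K(r) ≈ πr², so values clearly above that suggest clustering (the naive estimate runs slightly low near the boundary).

```python
import math
import random

def kernel_intensity(points, x, y, h):
    """Gaussian-kernel estimate of the intensity (expected events per unit area) at (x, y)."""
    total = sum(math.exp(-((px - x) ** 2 + (py - y) ** 2) / (2 * h * h)) for px, py in points)
    return total / (2 * math.pi * h * h)

def ripley_k(points, r, area):
    """Naive Ripley's K (no edge correction): area * #ordered pairs within r / (n(n-1))."""
    n = len(points)
    close_pairs = sum(
        1 for i in range(n) for j in range(n)
        if i != j and math.dist(points[i], points[j]) <= r)
    return area * close_pairs / (n * (n - 1))

# Hypothetical pattern: 300 events scattered uniformly over a unit square.
random.seed(1)
pts = [(random.random(), random.random()) for _ in range(300)]
print(kernel_intensity(pts, 0.5, 0.5, h=0.15))           # should be near 300
print(ripley_k(pts, 0.1, area=1.0), math.pi * 0.1 ** 2)  # compare with the CSR value pi*r^2
```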

Keywords: dengue fever, spatial point process, kernel estimation, covariate effect

132 Using Machine Learning Techniques for Autism Spectrum Disorder Analysis and Detection in Children

Authors: Norah Mohammed Alshahrani, Abdulaziz Almaleh

Abstract:

Autism Spectrum Disorder (ASD) is a condition related to brain development that affects how a person recognises and communicates with others, resulting in difficulties with social interaction and communication, and its prevalence is constantly growing. Early recognition of ASD allows children to lead safe and healthy lives and helps doctors with accurate diagnosis and management of the condition. It is therefore crucial to develop a method that measures ASD in children with high accuracy. In this paper, ASD datasets of toddlers and children have been analyzed. We employed the following machine learning techniques to explore ASD: Random Forest (RF), Decision Tree (DT), Naïve Bayes (NB), and Support Vector Machine (SVM). Feature selection was then used to reduce the number of attributes in the ASD datasets while preserving model performance. The best result was provided by SVM, which achieved 0.98 on the toddler dataset and 0.99 on the children dataset.

Keywords: autism spectrum disorder, machine learning, feature selection, support vector machine

131 An Application to Predict the Best Study Path for Information Technology Students in Learning Institutes

Authors: L. S. Chathurika

Abstract:

Early prediction of student performance is an important factor in achieving academic excellence. In Sri Lanka, whatever their study stream in secondary education, students lay the foundation for higher studies during the first year of their degree or diploma program. The information technology (IT) field has improved the education domain by letting students select specialization areas that showcase their talents and skills, such as software engineering, network administration, database administration, and multimedia design. After completing the first year, students try to select the best path by considering numerous factors. The purpose of this experiment is to predict the best study path using machine learning algorithms. Five classification algorithms (decision tree, support vector machine, artificial neural network, Naïve Bayes, and logistic regression) were selected and tested. The support vector machine obtained the highest accuracy, 82.4%. The features that affect the selection of the best study path were then identified.

Keywords: algorithm, classification, evaluation, features, testing, training

130 Model for Introducing Products to New Customers through Decision Tree Using Algorithm C4.5 (J-48)

Authors: Komol Phaisarn, Anuphan Suttimarn, Vitchanan Keawtong, Kittisak Thongyoun, Chaiyos Jamsawang

Abstract:

This article analyzes insurance data containing information on customer decisions when purchasing a life insurance pay package. The data were analyzed in order to offer new customers the Life Insurance Perfect Pay package that meets their needs as closely as possible. Basic data on insurance pay packages were collected for data mining, thus reducing the scattering of information. The data were then classified to obtain a decision model, or decision tree, using Algorithm C4.5 (J-48). In the classification, WEKA tools were used to build the model, and testing datasets were used to check the accuracy of the decision tree. Validation showed that the model classified 68.43% of cases correctly, while 31.25% were errors. The same dataset was then tested with other models, namely Naive Bayes and ZeroR. The results showed that J-48 predicted more accurately. The researcher therefore applied the decision tree in a program that introduces the product to new customers, to help them decide on purchasing the insurance package that best meets their needs.

Keywords: decision tree, data mining, customers, life insurance pay package

129 Recursive Parametric Identification of a Doubly Fed Induction Generator-Based Wind Turbine

Authors: A. El Kachani, E. Chakir, A. Ait Laachir, A. Niaaniaa, J. Zerouaoui

Abstract:

This paper presents an adaptive controller based on recursive parametric identification, applied to a wind turbine based on a doubly fed induction machine (DFIG), to compensate for faults and ensure efficient operation of the DFIG. The proposed adaptive controller is based on the recursive least squares algorithm, which takes as the best estimate of the parameter vector the vector x that minimizes a quadratic criterion. Furthermore, this method can improve the speed and precision of a model-based controller. The proposed controller is validated via simulation on a 5.5 kW DFIG-based wind turbine. The results obtained appear good and show the advantages of an adaptive controller based on the recursive least squares algorithm.
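A minimal Python sketch of the recursive least squares update follows, assuming a generic linear-in-parameters model y_k = phi_k^T theta + noise. The first-order ARX plant, its coefficients, the forgetting factor `lam`, and the initial covariance scale `delta` are hypothetical; the DFIG model structure used in the paper is not reproduced here.

```python
import numpy as np

def rls_identify(regressors, outputs, n_params, lam=1.0, delta=1000.0):
    """Recursive least squares, updating theta one (phi_k, y_k) sample at a time.

    Tracks the minimizer of sum_k lam**(N-k) * (y_k - phi_k @ theta)**2, up to the
    influence of the initial covariance P0 = delta * I. lam = 1.0 means no forgetting.
    """
    theta = np.zeros(n_params)
    P = delta * np.eye(n_params)
    for phi, y in zip(regressors, outputs):
        phi = np.asarray(phi, dtype=float)
        gain = P @ phi / (lam + phi @ P @ phi)     # gain vector for this sample
        theta = theta + gain * (y - phi @ theta)    # correct by the prediction error
        P = (P - np.outer(gain, phi @ P)) / lam     # update the covariance-like matrix
    return theta

# Hypothetical first-order ARX plant: y_k = 0.8*y_{k-1} + 0.5*u_{k-1} + small noise.
rng = np.random.default_rng(0)
u = rng.normal(size=500)
y = np.zeros(500)
for k in range(1, 500):
    y[k] = 0.8 * y[k - 1] + 0.5 * u[k - 1] + 0.01 * rng.normal()
regressors = [(y[k - 1], u[k - 1]) for k in range(1, 500)]
theta = rls_identify(regressors, y[1:], n_params=2)
print(theta)  # approximately [0.8, 0.5]
```

In an adaptive controller, the controller gains would be recomputed from `theta` after each update; setting `lam` slightly below 1 lets the estimate track slowly drifting parameters.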

Keywords: adaptive controller, recursive least squares algorithm, wind turbine, doubly fed induction generator

128 A Predictive Machine Learning Model of the Survival of Female-led and Co-Led Small and Medium Enterprises in the UK

Authors: Mais Khader, Xingjie Wei

Abstract:

This research sheds light on female entrepreneurs by providing new insights into the survival of companies led by women in the UK. The study aims to build a predictive machine learning model of the survival of female-led and co-led small and medium enterprises (SMEs) in the UK over the period 2000-2020. The model uses a combination of financial and non-financial features related to both companies and their directors to predict SME survival, and the contribution of each feature to the resulting model was studied. Five machine learning models were used: Decision Tree, AdaBoost, Naïve Bayes, Logistic Regression, and SVM. AdaBoost had the highest performance of the five, with an accuracy of 73% and an AUC of 80%. The results show that company size, management experience, financial performance, industry, region, and the percentage of women in management have high feature importance in predicting company survival.

Keywords: company survival, entrepreneurship, females, machine learning, SMEs

127 The Effect of Political Characteristics on the Budget Balance of Local Governments: A Dynamic System Generalized Method of Moments Data Approach

Authors: Stefanie M. Vanneste, Stijn Goeminne

Abstract:

This paper studies the effect of the political characteristics of 308 Flemish municipalities on their budget balance in the period 1995-2011. All local governments operate in the same economic and financial setting; however, some have high budget balances while others have low ones. The aim of this paper is to explain the differences in municipal budget balances by a number of economic, socio-demographic, and political variables. The economic and socio-demographic variables are used as control variables, while the focus of the paper is on the political variables. We test four hypotheses from the literature: (i) the partisan hypothesis, which tests whether left-wing governments have lower budget balances; (ii) the fragmentation hypothesis, which states that more fragmented governments have lower budget balances; (iii) the government power hypothesis, under which more powerful governments would achieve higher budget balances; and (iv) the opportunistic budget cycle hypothesis, which tests whether politicians manipulate the economic situation before elections to maximize their chances of reelection and therefore have lower budget balances before elections. Our paper contributes to the existing literature in several ways. First, we use the whole array of political variables rather than a selection of them. Second, we use a homogeneous database with the same budget and election rules, which makes it easier to focus on political factors without having to control for differences between political systems. Third, our research extends the existing literature on Flemish municipalities, as it is the first dynamic study of local budget balances. We use a dynamic panel data model. Because two lagged dependent variables enter as explanatory variables, we employ the system GMM (Generalized Method of Moments) estimator.
We consider this the most suitable estimator because political panel data are rather persistent. Our empirical results show that ideological position and coalition power are less important in explaining the budget balance. Political fragmentation, on the other hand, has a negative and significant effect: ceteris paribus, the more parties in a coalition, the worse the budget balance. Our results also provide evidence of an opportunistic budget cycle: budget balances are lower in pre-election years than in other years, as incumbents try to increase their chances of reelection. An additional finding is that the incremental effect of the budget balance is very important and should not be ignored, as it is in much empirical research. The coefficients of the lagged dependent variables are always positive and highly significant, indicating that the budget balance is subject to incrementalism: the entire policy cannot be changed from one year to the next, so actions taken in recent years still affect the current budget balance. Only a relatively small amount of research on the budget balance takes this considerable incremental effect into account. Our findings survive several robustness checks.
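System GMM (Blundell-Bond) is too involved for a short example, so the Python sketch below instead shows the simpler Anderson-Hsiao instrumental-variable estimator that difference and system GMM build on: first-differencing removes each unit's fixed effect, and the lagged level y_{t-2} instruments the lagged difference. It uses one lagged dependent variable rather than the paper's two, and the panel is simulated. Pooled OLS is included to show the upward bias that motivates these estimators.

```python
import numpy as np

def simulate_panel(n_units, n_periods, rho, rng, burn_in=50):
    """Simulate y_it = rho * y_i,t-1 + alpha_i + eps_it; return an (n_units, n_periods) array."""
    alpha = rng.normal(size=n_units)          # unit-specific fixed effects
    y = np.zeros(n_units)
    out = np.empty((n_units, n_periods))
    for s in range(burn_in + n_periods):
        y = rho * y + alpha + rng.normal(size=n_units)
        if s >= burn_in:
            out[:, s - burn_in] = y
    return out

def pooled_ols(y):
    """Slope of y_t on y_{t-1} with a common intercept; ignores alpha_i, so it is biased upward."""
    x = y[:, :-1].ravel()
    target = y[:, 1:].ravel()
    xc = x - x.mean()
    return float(np.sum(xc * (target - target.mean())) / np.sum(xc ** 2))

def anderson_hsiao(y):
    """IV estimate of rho: difference out alpha_i, instrument dy_{t-1} with the level y_{t-2}."""
    dy = np.diff(y, axis=1)   # dy[:, j] = y[:, j+1] - y[:, j]
    dy_t = dy[:, 1:]          # dy_t for t = 2..T-1
    dy_lag = dy[:, :-1]       # dy_{t-1}
    z = y[:, :-2]             # instrument: y_{t-2}
    return float(np.sum(z * dy_t) / np.sum(z * dy_lag))

rng = np.random.default_rng(1)
panel = simulate_panel(n_units=5000, n_periods=10, rho=0.5, rng=rng)
print(anderson_hsiao(panel), pooled_ols(panel))  # IV near 0.5; pooled OLS well above it
```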

Keywords: budget balance, fragmentation, ideology, incrementalism, municipalities, opportunistic budget cycle, panel data, political characteristics, power, system GMM

126 A Scalable Model of Fair Socioeconomic Relations Based on Blockchain and Machine Learning Algorithms-1: On Hyperinteraction and Intuition

Authors: Merey M. Sarsengeldin, Alexandr S. Kolokhmatov, Galiya Seidaliyeva, Alexandr Ozerov, Sanim T. Imatayeva

Abstract:

This series of interdisciplinary studies attempts to investigate and develop a scalable model of fair socioeconomic relations based on blockchain, using positive psychology techniques and machine learning algorithms for data analytics. In this particular study, we use a hyperinteraction approach and intuition to investigate their influence on the 'wisdom of crowds' via a mobile application created for the purpose of this research. Along with a public blockchain and a private Decentralized Autonomous Organization (DAO), which we built on the Ethereum blockchain, we developed a model of fair financial relations among DAO members. We developed a smart contract, the so-called Fair Price Protocol, and used it to implement the model. The data obtained from the mobile application were analyzed with ML algorithms. The model was tested on football matches.

Keywords: blockchain, Naïve Bayes algorithm, hyperinteraction, intuition, wisdom of crowd, decentralized autonomous organization

125 Nonparametric Quantile Regression for Multivariate Spatial Data

Authors: S. H. Arnaud Kanga, O. Hili, S. Dabo-Niang

Abstract:

Spatial prediction is of interest in several fields, such as agriculture, environmental sciences, ecology, and econometrics. Although multiple non-parametric prediction methods exist for spatial data, they are based on the conditional expectation. This paper takes a different approach by examining a non-parametric spatial predictor of the conditional quantile. The study focuses on a stationary multidimensional spatial process over a rectangular domain. The proposed quantile is obtained by inverting the conditional distribution function, whose proposed estimator depends on three kernels: one controls the distance between spatial locations, while the other two control the distance between observations. In addition, the almost complete convergence and the convergence in mean of order q of the kernel predictor are obtained when the sample is α-mixing. Because it predicts a quantile rather than a mean, this approach offers better accuracy by overcoming sensitivity to extreme values and outliers.
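The inversion step can be sketched in Python under stated assumptions: Gaussian kernels on the spatial distance and the covariate distance, an integrated Gaussian kernel (the normal CDF) smoothing the response, and bisection to invert the estimated conditional CDF. The bandwidths `hs`, `hx`, `hy` and the simulated grid data are hypothetical, and the data are generated independently rather than from an α-mixing spatial process.

```python
import math
import random

def gauss(u):
    """Unnormalized Gaussian kernel (normalization cancels in the ratio below)."""
    return math.exp(-0.5 * u * u)

def norm_cdf(u):
    """Integrated Gaussian kernel, used to smooth the indicator 1{Y_i <= y}."""
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def cond_cdf(y, x0, s0, data, hs, hx, hy):
    """Kernel estimate of P(Y <= y | X = x0) around site s0; data holds (site, x, y) triples."""
    num = den = 0.0
    for site, x, yi in data:
        w = gauss(math.dist(site, s0) / hs) * gauss((x - x0) / hx)
        num += w * norm_cdf((y - yi) / hy)
        den += w
    return num / den

def cond_quantile(alpha, x0, s0, data, hs, hx, hy, lo=-50.0, hi=50.0, iters=60):
    """Invert the estimated conditional CDF by bisection to get the alpha-quantile."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if cond_cdf(mid, x0, s0, data, hs, hx, hy) < alpha:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Hypothetical data on a 20 x 20 grid: Y = 2X + N(0, 0.5^2).
random.seed(0)
data = []
for row in range(20):
    for col in range(20):
        x = random.random()
        data.append(((row, col), x, 2.0 * x + random.gauss(0.0, 0.5)))
med = cond_quantile(0.5, 0.5, (10, 10), data, hs=5.0, hx=0.1, hy=0.2)
q90 = cond_quantile(0.9, 0.5, (10, 10), data, hs=5.0, hx=0.1, hy=0.2)
print(med, q90)  # the median should be near 2 * 0.5 = 1.0
```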

Keywords: conditional quantile, kernel, nonparametric, stationary

124 Nonparametric Specification Testing for the Drift of the Short Rate Diffusion Process Using a Panel of Yields

Authors: John Knight, Fuchun Li, Yan Xu

Abstract:

Based on a new nonparametric estimator of the drift function, we propose a consistent test for the parametric specification of the drift function in the short rate diffusion process using observations from a panel of yields. The test statistic is shown to follow an asymptotic normal distribution under the null hypothesis that the parametric drift function is correctly specified, and to diverge to infinity under the alternative. Taking daily 7-day European rates as a proxy for the short rate, we use our test to examine whether the drift of the short rate diffusion process is linear or nonlinear, an important unresolved issue in the short rate modeling literature. The test results indicate that none of the drift functions in this literature adequately captures the dynamics of the drift, although nonlinear specifications perform better than linear ones.
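The paper's estimator uses a panel of yields; as a simpler point of reference, the Python sketch below shows the standard single-series Nadaraya-Watson drift estimate, mu(x) ≈ E[(r_{t+dt} - r_t) / dt | r_t = x], applied to a simulated Vasicek process (a linear-drift model chosen here purely as an assumption). A specification test would compare such a nonparametric estimate with a fitted parametric drift; that test is not implemented here.

```python
import numpy as np

def simulate_vasicek(kappa, theta, sigma, r0, dt, n, rng):
    """Euler scheme for dr = kappa * (theta - r) dt + sigma dW."""
    r = np.empty(n + 1)
    r[0] = r0
    shocks = rng.normal(scale=np.sqrt(dt), size=n)
    for i in range(n):
        r[i + 1] = r[i] + kappa * (theta - r[i]) * dt + sigma * shocks[i]
    return r

def nw_drift(r, dt, grid, h):
    """Nadaraya-Watson estimate of mu(x) = E[(r_{t+dt} - r_t) / dt | r_t = x] on a grid."""
    x = r[:-1]
    scaled_increments = np.diff(r) / dt
    estimates = []
    for g in grid:
        w = np.exp(-0.5 * ((x - g) / h) ** 2)  # Gaussian kernel weights
        estimates.append(np.sum(w * scaled_increments) / np.sum(w))
    return np.array(estimates)

# Hypothetical daily series with true drift 1.0 * (0.05 - r).
rng = np.random.default_rng(42)
dt = 1 / 252
r = simulate_vasicek(kappa=1.0, theta=0.05, sigma=0.02, r0=0.05, dt=dt, n=100_000, rng=rng)
grid = np.array([0.04, 0.05, 0.06])
print(nw_drift(r, dt, grid, h=0.005))  # roughly +0.01, 0, -0.01 under this linear drift
```

Drift estimates are noisy because the increments are dominated by diffusion noise at daily frequency, which is one reason the drift's functional form is hard to pin down empirically.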

Keywords: diffusion process, nonparametric estimation, derivative security price, drift function and volatility function

123 Study of Cavitation Erosion of Pump-Storage Hydro Power Plant Prototype

Authors: Tine Cencič, Marko Hočevar, Brane Širok

Abstract:

An experimental investigation was carried out to detect cavitation in a pump-storage hydropower plant prototype that suffers from cavitation in pump mode. Vibrations and acoustic emission on the turbine bearing housing and pressure fluctuations in the draft tube were measured, and the corresponding signals were recorded and analyzed. The analysis focused on the high-frequency content of the measured variables. The prototype was operated at various input loads and Thoma numbers. Several cavitation estimators were evaluated by the coefficient of determination between the Thoma number and each estimator. The best results were achieved with a compound discharge coefficient cavitation estimator. Cavitation estimators were evaluated in several frequency intervals. Cavitation erosion was also predicted in order to choose appropriate maintenance and repair periods.
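The abstract does not define its estimators, but the general idea of a high-frequency cavitation estimator scored by its coefficient of determination against the Thoma number can be sketched in Python: take the RMS of the signal content within a frequency band, then fit a straight line against the Thoma number. The 10-20 kHz band, the sampling rate, and the synthetic signals are hypothetical, and this is not the compound discharge coefficient estimator.

```python
import numpy as np

def band_rms(signal, fs, f_low, f_high):
    """RMS of the part of the signal with frequency in [f_low, f_high] Hz (FFT masking)."""
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    mask = (freqs >= f_low) & (freqs <= f_high)
    band = np.fft.irfft(spectrum * mask, n=len(signal))
    return float(np.sqrt(np.mean(band ** 2)))

def r_squared(x, y):
    """Coefficient of determination of the least-squares line y = a + b*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b, a = np.polyfit(x, y, 1)
    residuals = y - (a + b * x)
    return 1.0 - np.sum(residuals ** 2) / np.sum((y - y.mean()) ** 2)

# Synthetic example (hypothetical): lower Thoma number -> stronger broadband noise.
rng = np.random.default_rng(0)
fs = 50_000.0
t = np.arange(0, 0.2, 1.0 / fs)
thoma = np.array([0.10, 0.15, 0.20, 0.25, 0.30])
estimates = []
for sigma in thoma:
    noise_level = 2.0 - 5.0 * sigma
    signal = np.sin(2 * np.pi * 50 * t) + noise_level * rng.normal(size=t.size)
    estimates.append(band_rms(signal, fs, 10_000.0, 20_000.0))
print(estimates, r_squared(thoma, estimates))
```

Repeating the loop for several `(f_low, f_high)` bands and keeping the band with the highest R² mirrors evaluating estimators in several frequency intervals.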

Keywords: cavitation erosion, turbine, cavitation measurement, fluid dynamics

122 Sustainable Development Goals: The Effect of a Board Structure on the Sustainability Performance

Authors: V. Naciti, L. Pulejo, F. Cesaroni

Abstract:

This study empirically analyzes whether the composition of the board of directors (BoD) enhances sustainability performance, in order to understand how the BoD contributes to integrating the Sustainable Development Goals (SDGs) into the business. Hypotheses are developed based on agency theory and stakeholder theory. Using a two-step system generalized method of moments (SGMM) estimator, with data from the Sustainalytics and Compustat databases for 362 firms in six regions, we find that firms with more board diversity and a separation of the chair and CEO roles have higher sustainability performance. Moreover, our findings show that a higher number of independent directors is negatively associated with sustainability performance. This study contributes to the literature on corporate governance and firm performance by demonstrating that board composition contributes to better sustainability performance: by implementing particular corporate governance mechanisms, firms can integrate the SDGs into their corporate strategy.

Keywords: sustainable development goals, corporate governance, board of directors, sustainability performance

121 Analyses of Reference Evapotranspiration in West of Iran under Climate Change

Authors: Saeed Jahanbakhsh Asl, Yaghob Dinpazhoh, Masoumeh Foroughi

Abstract:

Reference evapotranspiration (ET₀) is an important element of the water cycle that integrates atmospheric demand and surface conditions, and analyzing changes in ET₀ is of great significance for understanding climate change and its impacts on hydrology. Because ET₀ reflects the combined effect of several climate variables, increases in air temperature should lead to increases in ET₀. ET₀ was estimated at 18 meteorological stations in western Iran using the globally accepted Food and Agriculture Organization (FAO) Penman-Monteith (FAO-56 PM) method. Trends in ET₀ were detected using the Mann-Kendall (MK) test, and the slopes of the trend lines were computed using Sen's slope estimator. The results showed significant increasing as well as decreasing trends in annual and monthly ET₀, although the trends were predominantly increasing. At the monthly scale, increasing trends outnumbered decreasing trends in the majority of the warm months of the year.
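The Mann-Kendall test and Sen's slope can both be sketched in Python: the Mann-Kendall statistic S with its normal approximation (without corrections for ties or serial correlation), and Sen's slope as the median of all pairwise slopes. The annual ET₀ series below is hypothetical.

```python
import math
import statistics
from itertools import combinations

def mann_kendall(x):
    """Mann-Kendall trend test without tie correction; returns (S, Z, two-sided p-value)."""
    n = len(x)
    s = sum((x[j] > x[i]) - (x[j] < x[i]) for i, j in combinations(range(n), 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18.0
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)   # continuity correction
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = 2.0 * (1.0 - 0.5 * (1.0 + math.erf(abs(z) / math.sqrt(2.0))))
    return s, z, p

def sens_slope(x):
    """Sen's slope: median of all pairwise slopes (x[j] - x[i]) / (j - i) with j > i."""
    return statistics.median((x[j] - x[i]) / (j - i) for i, j in combinations(range(len(x)), 2))

# Hypothetical annual ET0 totals (mm) for ten consecutive years
et0 = [1210, 1225, 1218, 1240, 1236, 1251, 1262, 1258, 1270, 1285]
print(mann_kendall(et0))  # S = 39, significant increasing trend
print(sens_slope(et0))    # trend magnitude in mm per year
```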

Keywords: climate change, Mann–Kendall, Penman-Monteith method (FAO-56 PM), reference crop evapotranspiration

120 Development of Fake News Model Using Machine Learning through Natural Language Processing

Authors: Sajjad Ahmed, Knut Hinkelmann, Flavio Corradini

Abstract:

Fake news detection research is still at an early stage, as fake news is a relatively new phenomenon of public interest. Machine learning helps solve complex problems and build AI systems, especially in cases involving tacit or unknown knowledge. To identify fake news, we applied three machine learning classifiers: Passive Aggressive, Naïve Bayes, and Support Vector Machine. Simple classification alone is not fully adequate for fake news detection, because general classification methods are not specialized for fake news. By integrating machine learning with text-based processing, we can detect fake news and build classifiers for news data. Text classification mainly focuses on extracting various text features and then incorporating those features into classification. The main challenge in this area is the lack of an efficient way to distinguish fake from non-fake news, owing to the unavailability of corpora. We applied the three classifiers to two publicly available datasets. Experimental analysis on these datasets indicates encouraging, improved performance.
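Of the three classifiers, Passive Aggressive is the least familiar, so a minimal Python sketch of the PA-I update on bag-of-words features follows. The toy headlines, the +1/-1 label encoding, and the aggressiveness parameter `C` are hypothetical; the paper's features and settings are not given.

```python
from collections import Counter, defaultdict

def features(text):
    """Bag-of-words counts after lowercasing and whitespace tokenization."""
    return Counter(text.lower().split())

def train_pa1(texts, labels, C=1.0, epochs=5):
    """PA-I: on nonzero hinge loss, move w just enough (step capped by C) to fix the margin."""
    w = defaultdict(float)
    for _ in range(epochs):
        for text, y in zip(texts, labels):
            x = features(text)
            if not x:
                continue  # empty text: nothing to update
            margin = y * sum(w[t] * v for t, v in x.items())
            loss = max(0.0, 1.0 - margin)
            if loss > 0.0:
                tau = min(C, loss / sum(v * v for v in x.values()))
                for t, v in x.items():
                    w[t] += tau * y * v
    return w

def predict(w, text):
    """Return +1 (fake) or -1 (real) from the sign of the linear score."""
    score = sum(w.get(t, 0.0) * v for t, v in features(text).items())
    return 1 if score >= 0 else -1

texts = [
    "shocking miracle cure doctors hate",
    "you won't believe this secret trick",
    "parliament passes budget bill",
    "central bank holds interest rates",
]
labels = [1, 1, -1, -1]  # +1 = fake, -1 = real (hypothetical)
w = train_pa1(texts, labels)
print(predict(w, "miracle trick revealed"), predict(w, "bank passes rates decision"))  # 1 -1
```

The algorithm stays passive on examples already classified with margin and reacts aggressively only to mistakes, which suits streaming news data.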

Keywords: fake news detection, natural language processing, machine learning, classification techniques

119 Cross Project Software Fault Prediction at Design Phase

Authors: Pradeep Singh, Shrish Verma

Abstract:

Software fault prediction models are created using source code, metrics processed from the same or a previous version of the code, and related fault data. Some companies do not store or track all the artifacts required for software fault prediction. To construct a fault prediction model for such a company, training data from other projects is one potential solution. The earlier a fault is predicted, the less it costs to correct. The training data consist of metrics data and related fault data at the function/module level. This paper investigates early-stage fault prediction using cross-project data, focusing on design metrics. In this study, an empirical analysis is carried out to validate design metrics for cross-project fault prediction. The machine learning technique used for evaluation is Naïve Bayes. The design-phase metrics of other projects can serve as an initial guideline for projects where no previous fault data are available. We analyze seven datasets from the NASA Metrics Data Program, which provide design as well as code metrics. Overall, the cross-project results are comparable to those of within-company learning.
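Cross-project prediction amounts to fitting a model on one project's metrics and predicting on another project's modules. A minimal Gaussian Naive Bayes sketch in Python follows; the two-column design metrics and both "projects" are hypothetical stand-ins, not NASA Metrics Data Program data, and the abstract does not say which Naïve Bayes variant was used.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(X, y):
    """Estimate class priors plus per-feature means and variances for each class."""
    groups = defaultdict(list)
    for row, label in zip(X, y):
        groups[label].append(row)
    model = {}
    for label, rows in groups.items():
        cols = list(zip(*rows))
        means = [sum(c) / len(c) for c in cols]
        # A small constant keeps a zero-variance feature from dividing by zero.
        variances = [sum((v - m) ** 2 for v in c) / len(c) + 1e-9 for c, m in zip(cols, means)]
        model[label] = (len(rows) / len(X), means, variances)
    return model

def predict_gaussian_nb(model, row):
    """Pick the class with the largest log posterior, treating features as independent."""
    def log_posterior(label):
        prior, means, variances = model[label]
        return math.log(prior) + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(row, means, variances))
    return max(model, key=log_posterior)

# Hypothetical design metrics per module, e.g. [design complexity, fan-out]; 1 = faulty.
project_a_X = [[1, 2], [2, 1], [1, 1], [2, 2], [8, 9], [9, 7], [7, 8], [8, 8]]
project_a_y = [0, 0, 0, 0, 1, 1, 1, 1]
project_b_X = [[2, 1], [9, 9]]  # modules from another project with no fault history

model = fit_gaussian_nb(project_a_X, project_a_y)
print([predict_gaussian_nb(model, row) for row in project_b_X])  # [0, 1]
```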

Keywords: software metrics, fault prediction, cross project, within project.

118 Feature Weighting Comparison Based on Clustering Centers in the Detection of Diabetic Retinopathy

Authors: Kemal Polat

Abstract:

In this paper, three feature weighting methods have been used to improve the classification performance of diabetic retinopathy (DR) detection. To classify DR, features extracted from the output of several retinal image processing algorithms, such as image-level, lesion-specific and anatomical components, have been fed into the classifier algorithms. The dataset used in this study was taken from the University of California, Irvine (UCI) machine learning repository. Feature weighting methods, including fuzzy c-means clustering based feature weighting, subtractive clustering based feature weighting, and Gaussian mixture clustering based feature weighting, have been used and compared with each other in the classification of DR. After feature weighting, five different classifier algorithms have been used: multi-layer perceptron (MLP), k-nearest neighbor (k-NN), decision tree, support vector machine (SVM), and Naïve Bayes. The hybrid method combining subtractive clustering based feature weighting with the decision tree classifier obtained a classification accuracy of 100% in the screening of DR. These results demonstrate that the proposed hybrid scheme is very promising for medical data set classification.
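
Feature weighting matters because it rescales each feature before a classifier sees it. The sketch below does not reproduce the clustering-based weight computation; it assumes a weight vector has already been produced (set by hand here as a stand-in) and shows how that weighting can change a k-NN decision. The feature values, labels and weights are all hypothetical.

```python
import math

def apply_weights(X, weights):
    """Multiply every feature column by its weight (weights are hypothetical here)."""
    return [[v * w for v, w in zip(row, weights)] for row in X]

def knn_predict(X_train, y_train, x, k=1):
    """Majority label among the k nearest training points (Euclidean distance)."""
    dists = sorted((math.dist(row, x), label) for row, label in zip(X_train, y_train))
    top = [label for _, label in dists[:k]]
    return max(set(top), key=top.count)

# Hypothetical two-feature training set and query.
X_train = [[1.0, 10.0], [5.0, 0.0]]
y_train = ["DR", "no DR"]
query = [1.2, 1.0]

# Stand-in for weights a clustering-based method would compute.
weights = [1.0, 0.1]

print(knn_predict(X_train, y_train, query))                      # no DR
print(knn_predict(apply_weights(X_train, weights), y_train,
                  apply_weights([query], weights)[0]))           # DR
```

Down-weighting the second feature shrinks its contribution to the distance, so the query's nearest neighbor, and therefore its label, changes.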

Keywords: machine learning, data weighting, classification, data mining

117 Linear Quadratic Gaussian/Loop Transfer Recover Control Flight Control on a Nonlinear Model

Authors: T. Sanches, K. Bousson

Abstract:

As part of the development of a 4D autopilot system for unmanned aerial vehicles (UAVs), i.e. a time-dependent robust trajectory generation and control algorithm, this work addresses the problem of optimal path control based on flight sensor data output, which may be unreliable under certain circumstances due to noise in data acquisition and/or transmission. Although several filtering methods, such as the Kalman-Bucy filter or Linear Quadratic Gaussian/Loop Transfer Recovery (LQG/LTR) control, are available, the sheer complexity of the control system, together with the robustness and reliability required of such a system on a UAV for airworthiness-certifiable autonomous flight, called for the development of a proper robust filter for a nonlinear system, as a way to further mitigate error propagation to the control system and improve its performance. Accordingly, this paper proposes a nonlinear algorithm based on LQG/LTR, validated through computational simulation testing.
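
LQG control pairs a linear-quadratic regulator with a Kalman filter that estimates the state from noisy measurements. The sketch below shows only that estimator half, as a scalar discrete-time Kalman filter; it is the standard linear filter, not the paper's nonlinear algorithm, it omits loop transfer recovery, and the model `a`, noise variances `q` and `r`, and the measurement values are hypothetical.

```python
def kalman_1d(measurements, a=1.0, q=1e-4, r=0.25, x0=0.0, p0=1.0):
    """Scalar Kalman filter for x[k+1] = a*x[k] + w and z[k] = x[k] + v.

    q and r are the process and measurement noise variances (hypothetical values).
    """
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: propagate the state estimate and its variance.
        x = a * x
        p = a * p * a + q
        # Update: blend prediction and measurement using the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1 - k) * p
        estimates.append(x)
    return estimates

# Hypothetical noisy readings of a quantity whose true value is 5.0.
z = [5.3, 4.6, 5.2, 4.9, 5.4, 4.7, 5.1, 4.8, 5.0, 5.2]
estimates = kalman_1d(z)
print(round(estimates[-1], 2))
```

The gain `k` starts near 0.8 while the initial estimate is uncertain and shrinks as `p` falls, so later measurements move the estimate less; that smoothing is what protects the downstream controller from sensor noise.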

Keywords: autonomous flight, LQG/LTR, nonlinear state estimator, robust flight control

116 Polarity Classification of Social Media Comments in Turkish

Authors: Migena Ceyhan, Zeynep Orhan, Dimitrios Karras

Abstract:

People in modern societies continuously share their experiences, emotions, and thoughts about different areas of life. This information reaches almost everyone in real time and can have an important impact on shaping people’s way of living. This phenomenon is well recognized and exploited by market representatives seeking to profit from it. Given the abundance of information, people and organizations are looking for efficient tools that filter vast amounts of data into important information ready for analysis. This paper is a modest contribution in this field, describing the process of automatically classifying social media comments in the Turkish language as positive or negative. Once the data are gathered and preprocessed, feature sets of selected single words or groups of words are built according to the characteristics of the language used in the texts. These features are then used to train and test a system with different machine learning algorithms (Naïve Bayes, Sequential Minimal Optimization, J48, and Bayesian Linear Regression). The resulting high accuracies can provide important feedback for decision-makers to improve business strategies accordingly.
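
Features built from "single words or groups of words" correspond to unigrams and bigrams. A minimal multinomial Naïve Bayes sketch with add-one smoothing over such features follows; the four Turkish comments, the whitespace tokenizer and the function names are hypothetical illustrations rather than the paper's corpus or preprocessing.

```python
import math
from collections import Counter

def features(text):
    """Unigrams plus adjacent-word bigrams (hypothetical whitespace tokenizer)."""
    tokens = text.lower().split()
    return tokens + [a + "_" + b for a, b in zip(tokens, tokens[1:])]

def train_mnb(docs):
    """Count documents per class and feature occurrences per class."""
    class_docs = Counter(label for _, label in docs)
    counts = {label: Counter() for label in class_docs}
    for text, label in docs:
        counts[label].update(features(text))
    vocab = set().union(*counts.values())
    return class_docs, counts, vocab

def classify(model, text):
    class_docs, counts, vocab = model
    total_docs = sum(class_docs.values())
    best, best_score = None, -math.inf
    for label, n_docs in class_docs.items():
        total = sum(counts[label].values())
        score = math.log(n_docs / total_docs)
        for f in features(text):
            # Laplace (add-one) smoothing over the shared vocabulary.
            score += math.log((counts[label][f] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

# Hypothetical comments: "very nice product", "great service, very satisfied",
# "terrible shipping", "product very bad, did not like it at all".
docs = [
    ("çok güzel ürün", "positive"),
    ("harika hizmet çok memnunum", "positive"),
    ("berbat kargo", "negative"),
    ("ürün çok kötü hiç beğenmedim", "negative"),
]
model = train_mnb(docs)
print(classify(model, "harika ürün"))    # positive
print(classify(model, "kargo berbat"))   # negative
```

Turkish is agglutinative, so real systems usually add stemming or morphological analysis before counting; plain surface tokens are used here only for brevity.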

Keywords: feature selection, machine learning, natural language processing, sentiment analysis, social media reviews

115 Hybrid Robust Estimation via Median Filter and Wavelet Thresholding with Automatic Boundary Correction

Authors: Alsaidi M. Altaher, Mohd Tahir Ismail

Abstract:

Wavelet thresholding has been a powerful tool in curve estimation and data analysis. In the presence of outliers, however, this nonparametric estimator cannot suppress them. This study proposes a new two-stage combined method that applies a median filter as a primary step before wavelet thresholding. After the outliers in a signal are suppressed by the median filter, classical wavelet thresholding is applied to remove the remaining noise. We use automatic boundary correction, with a low-order polynomial model or a local polynomial model as a more realistic rule for correcting the bias in the boundary region, instead of classical assumptions such as periodic or symmetric boundaries. A simulation experiment has been conducted to evaluate the numerical performance of the proposed method. The results provide strong evidence that the proposed method is extremely effective at correcting the boundary bias and eliminating sensitivity to outliers.
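
The two-stage idea can be sketched in a few lines: a running median removes isolated spikes, then wavelet shrinkage removes the remaining noise. The sketch assumes a window of 3, a single-level Haar transform, a MAD noise estimate and the universal soft threshold; it does not implement the paper's polynomial boundary correction (the median window simply shrinks at the edges), and the test signal is hypothetical.

```python
import math
import statistics

def median_filter(x, width=3):
    """Running median; the window shrinks at the edges (no boundary correction)."""
    half = width // 2
    return [statistics.median(x[max(0, i - half): i + half + 1]) for i in range(len(x))]

def haar_soft_threshold(x):
    """One-level Haar DWT, soft thresholding of details, inverse DWT (len(x) must be even)."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x), 2)]
    sigma = statistics.median(abs(d) for d in detail) / 0.6745   # MAD noise estimate
    lam = sigma * math.sqrt(2.0 * math.log(len(x)))              # universal threshold
    detail = [math.copysign(max(abs(d) - lam, 0.0), d) for d in detail]
    out = []
    for a, d in zip(approx, detail):
        out += [(a + d) / s, (a - d) / s]
    return out

def hybrid_denoise(x, width=3):
    """Stage 1: median filter for outliers; stage 2: wavelet thresholding for noise."""
    return haar_soft_threshold(median_filter(x, width))

# Hypothetical flat signal with one outlier at index 5.
signal = [1.0] * 16
signal[5] = 10.0
cleaned = hybrid_denoise(signal)
print(max(abs(v - 1.0) for v in cleaned) < 1e-9)  # True
```

Running wavelet thresholding alone on this signal would leave part of the spike in the approximation coefficients, which is the outlier sensitivity the median stage is meant to remove.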

Keywords: boundary correction, median filter, simulation, wavelet thresholding

114 Variogram Fitting Based on the Wilcoxon Norm

Authors: Hazem Al-Mofleh, John Daniels, Joseph McKean

Abstract:

Within geostatistics research, effective estimation of the variogram points has been examined, particularly in developing robust alternatives. The parametric fit of these variogram points, which ultimately defines the kriging weights, has not, however, received the same attention from a robust perspective. This paper proposes the use of the non-linear Wilcoxon norm instead of weighted non-linear least squares as a robust variogram fitting alternative. First, we introduce the concept of variogram estimation and fitting. Then, as an alternative to non-linear weighted least squares, we discuss the non-linear Wilcoxon estimator. Next, the robustness properties of the non-linear Wilcoxon estimator are demonstrated using a contaminated spatial data set. Finally, under simulated conditions, variogram points are estimated and fit for spatial processes with increasing levels of contamination. In fitting these variogram points, both the non-linear weighted least squares and the non-linear Wilcoxon fits are examined for efficiency. At all levels of contamination (including 0%), using a robust estimation and robust fitting procedure, the non-weighted Wilcoxon fit outperforms weighted least squares.
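
The Wilcoxon norm replaces the sum of squared residuals with Jaeckel's rank-based dispersion, D(e) = Σ a(R(e_i)) e_i with scores a(i) = √12 (i/(n+1) − 1/2). The sketch below fits only the sill of an exponential variogram model without a nugget, with the range fixed at 3 for brevity, and compares it with an unweighted least-squares fit (the paper uses weighted least squares). The lags, the model, the single contaminated point and the grid are hypothetical.

```python
import math

def wilcoxon_dispersion(residuals):
    """Jaeckel's dispersion with Wilcoxon scores a(i) = sqrt(12) * (i/(n+1) - 1/2)."""
    n = len(residuals)
    order = sorted(range(n), key=lambda i: residuals[i])
    return sum(math.sqrt(12) * ((rank + 1) / (n + 1) - 0.5) * residuals[i]
               for rank, i in enumerate(order))

def exp_shape(h, rng=3.0):
    """Exponential variogram shape 1 - exp(-h/range), no nugget (hypothetical model)."""
    return 1.0 - math.exp(-h / rng)

def fit_sill_wilcoxon(lags, gamma, grid):
    """Grid search for the sill that minimizes the Wilcoxon dispersion."""
    return min(grid, key=lambda s: wilcoxon_dispersion(
        [g - s * exp_shape(h) for h, g in zip(lags, gamma)]))

def fit_sill_ls(lags, gamma):
    """Closed-form unweighted least-squares sill for gamma = s * shape(h)."""
    f = [exp_shape(h) for h in lags]
    return sum(g * fi for g, fi in zip(gamma, f)) / sum(fi * fi for fi in f)

lags = [1, 2, 3, 4, 5, 6, 7, 8]
gamma_hat = [2.0 * exp_shape(h) for h in lags]   # synthetic points, true sill = 2
gamma_hat[-1] = 10.0                              # one contaminated point

grid = [i / 100 for i in range(1, 501)]
s_wil = fit_sill_wilcoxon(lags, gamma_hat, grid)
s_ls = fit_sill_ls(lags, gamma_hat)
print(s_wil)             # 2.0
print(round(s_ls, 2))    # 3.74
```

Because the Wilcoxon dispersion is proportional to the sum of pairwise absolute residual differences, the single contaminated point pulls the fit linearly rather than quadratically, so here the rank-based sill stays at the true value while least squares is dragged upward.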

Keywords: non-linear Wilcoxon, robust estimation, variogram estimation, Wilcoxon norm

113 Neural Network Based Compressor Flow Estimator in an Aircraft Vapor Cycle System

Authors: Justin Reverdi, Sixin Zhang, Serge Gratton, Said Aoues, Thomas Pellegrini

Abstract:

In vapor cycle systems, the flow sensor plays a key role in various monitoring and control tasks. However, physical sensors can be expensive, inaccurate, heavy, cumbersome, or highly sensitive to vibrations, which is especially problematic when they are embedded in an aircraft. Designing a virtual sensor based on other standard sensors is a good alternative. In this paper, a data-driven model using a Convolutional Neural Network is proposed to estimate the compressor flow. To fit the model to our dataset, we tested different loss functions. We show in our application that a Dynamic Time Warping based loss function called DILATE leads to better dynamic performance than the vanilla mean squared error (MSE) loss function. DILATE also allows a trade-off to be chosen between static and dynamic performance.
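
DILATE combines a shape term, built on soft Dynamic Time Warping, with a temporal distortion term. The sketch below computes only the soft-DTW shape term for two 1-D sequences and contrasts it with MSE on a time-shifted step; it omits the temporal term, gradients and the CNN, and the smoothing parameter `gamma` and the sequences are hypothetical.

```python
import math

def softmin(values, gamma):
    """-gamma * log(sum(exp(-v / gamma))), computed stably; inf entries contribute nothing."""
    finite = [v for v in values if v != math.inf]
    if not finite:
        return math.inf
    m = min(finite)
    return m - gamma * math.log(sum(math.exp(-(v - m) / gamma) for v in finite))

def soft_dtw(x, y, gamma=0.1):
    """Soft-DTW with squared-error cost between 1-D sequences x and y."""
    n, m = len(x), len(y)
    R = [[math.inf] * (m + 1) for _ in range(n + 1)]
    R[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (x[i - 1] - y[j - 1]) ** 2
            R[i][j] = cost + softmin([R[i - 1][j - 1], R[i - 1][j], R[i][j - 1]], gamma)
    return R[n][m]

def mse(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y)) / len(x)

# Hypothetical flow profile and a prediction that is one step early.
target = [0, 0, 1, 1, 1, 0]
pred = [0, 1, 1, 1, 0, 0]
print(round(mse(pred, target), 3))        # 0.333
print(soft_dtw(pred, target, gamma=0.01))
```

MSE heavily penalizes the one-step shift even though the shape is right, whereas soft-DTW, which aligns the sequences before comparing them, stays close to zero; DILATE's extra temporal term exists so that such shifts are still penalized in a controlled way.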

Keywords: deep learning, dynamic time warping, vapor cycle system, virtual sensor
