Thanh Nguyen

Publications

1 Performance Comparison of ADTree and Naive Bayes Algorithms for Spam Filtering

Authors: Thanh Nguyen, Andrei Doncescu, Pierre Siegel

Abstract:

Classification is an important data mining technique and could be used as data filtering in artificial intelligence. The broad application of classification for all kind of data leads to be used in nearly every field of our modern life. Classification helps us to put together different items according to the feature items decided as interesting and useful. In this paper, we compare two classification methods Naïve Bayes and ADTree use to detect spam e-mail. This choice is motivated by the fact that Naive Bayes algorithm is based on probability calculus while ADTree algorithm is based on decision tree. The parameter settings of the above classifiers use the maximization of true positive rate and minimization of false positive rate. The experiment results present classification accuracy and cost analysis in view of optimal classifier choice for Spam Detection. It is point out the number of attributes to obtain a tradeoff between number of them and the classification accuracy.

Keywords: Data Mining, classification, Decision Tree, spam filtering, naive Bayes

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1119

Abstracts

2 Performance Comparison of ADTree and Naive Bayes Algorithms for Spam Filtering

Authors: Thanh Nguyen, Andrei Doncescu, Pierre Siegel

Abstract:

Classification is an important data mining technique and could be used as data filtering in artificial intelligence. The broad application of classification for all kind of data leads to be used in nearly every field of our modern life. Classification helps us to put together different items according to the feature items decided as interesting and useful. In this paper, we compare two classification methods Naïve Bayes and ADTree use to detect spam e-mail. This choice is motivated by the fact that Naive Bayes algorithm is based on probability calculus while ADTree algorithm is based on decision tree. The parameter settings of the above classifiers use the maximization of true positive rate and minimization of false positive rate. The experiment results present classification accuracy and cost analysis in view of optimal classifier choice for Spam Detection. It is point out the number of attributes to obtain a tradeoff between number of them and the classification accuracy.

Keywords: Data Mining, classification, Decision Tree, naive Bayes, spam filtering

Procedia PDF Downloads 265
1 Organic Farming Profitability: Evidence from South Korea

Authors: Saem Lee, Thanh Nguyen, Hio-Jung Shin, Thomas Koellner

Abstract:

Land-use management has an influence on the provision of ecosystem service in dynamic, agricultural landscapes. Agricultural land use is important for maintaining the productivity and sustainability of agricultural ecosystems. However, in Korea, intensive farming activities in this highland agricultural zone, the upper stream of Soyang has led to contaminated soil caused by over-use pesticides and fertilizers. This has led to decrease in water and soil quality, which has consequences for ecosystem services and human wellbeing. Conventional farming has still high percentage in this area and there is no special measure to prevent low water quality caused by farming activities. Therefore, the adoption of environmentally friendly farming has been considered one of the alternatives that lead to improved water quality and increase in biomass production. Concurrently, farm households with environmentally friendly farming have occupied still low rates. Therefore, our research involved a farm household survey spanning conventional farming, the farm in transition and organic farming in Soyang watershed. Another purpose of our research was to compare economic advantage of the farmers adopting environmentally friendly farming and non-adaptors and to investigate the different factors by logistic regression analysis with socio-economic and benefit-cost ratio variables. The results found that farmers with environmentally friendly farming tended to be younger than conventional farming and farmer in transition. They are similar in terms of gender which was predominately male. Farmers with environmentally friendly farming were more educated and had less farming experience than conventional farming and farmer in transition. Based on the benefit-cost analysis, total costs that farm in transition farmers spent for one year are about two times as much as the sum of costs in environmentally friendly farming. The benefit of organic farmers was assessed with 2,800 KRW per household per year. In logistic regression, the factors having statistical significance are subsidy and district, residence period and benefit-cost ratio. And district and residence period have the negative impact on the practice of environmentally friendly farming techniques. The results of our research make a valuable contribution to provide important information to describe Korean policy-making for agricultural and water management and to consider potential approaches to policy that would substantiate ways beneficial for sustainable resource management.

Keywords: Profitability, Organic Farming, Logistic Regression, agricultural land-use

Procedia PDF Downloads 266