Search results for: spam

18 Layout Based Spam Filtering

Abstract:

Due to the constant increase in the volume of information available to applications in fields varying from medical diagnosis to web search engines, accurate support of similarity becomes an important task. This is also the case of spam filtering techniques where the similarities between the known and incoming messages are the fundaments of making the spam/not spam decision. We present a novel approach to filtering based solely on layout, whose goal is not only to correctly identify spam, but also warn about major emerging threats. We propose a mathematical formulation of the email message layout and based on it we elaborate an algorithm to separate different types of emails and find the new, numerically relevant spam types.

Keywords: Clustering, layout, k-means, spam.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1649

17 Spam E-mail: How Malaysian E-mail Users Deal with It?

Authors: Yanti Rosmunie Bujang, Husnayati Hussin

Abstract:

This paper attempts to discuss the spam issue from the Malaysian e-mail users- perspective. The purpose is to discover how Malaysian users handle the spam e-mail problem. From the experiences we hope to discover the necessary effort needed to be undertaken to face this problem in the context of Malaysia. A survey was conducted to understand how Malaysian individual perceived spam and what they actually do with the spam e-mail they received in their daily life. The findings indicate that the level of awareness on spam issue in action is still low and need some extra effort by government and relevant agencies to increase their level of awareness.

Keywords: E-mail, Malaysia, spam, users' perspective.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1888

16 Facebook Spam and Spam Filter Using Artificial Neural Networks

Authors: Fahim A., Mutahira N. Naseem

Abstract:

Spam is any unwanted electronic message or material in any form posted too many people. As the world is growing as global world, social networking sites play an important role in making world global providing people from different parts of the world a platform to meet and express their views. Among different social networking sites Facebook become the leading one. With increase in usage different users start abusive use of Facebook by posting or creating ways to post spam. This paper highlights the potential spam types nowadays Facebook users’ faces. This paper also provide the reason how user become victim to spam attack. A methodology is proposed in the end discusses how to handle different types of spam.

Keywords: Artificial neural networks, Facebook spam, social networking sites, spam filter.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3138

15 Image Spam Detection Using Color Features and K-Nearest Neighbor Classification

Authors: T. Kumaresan, S. Sanjushree, C. Palanisamy

Abstract:

Image spam is a kind of email spam where the spam text is embedded with an image. It is a new spamming technique being used by spammers to send their messages to bulk of internet users. Spam email has become a big problem in the lives of internet users, causing time consumption and economic losses. The main objective of this paper is to detect the image spam by using histogram properties of an image. Though there are many techniques to automatically detect and avoid this problem, spammers employing new tricks to bypass those techniques, as a result those techniques are inefficient to detect the spam mails. In this paper we have proposed a new method to detect the image spam. Here the image features are extracted by using RGB histogram, HSV histogram and combination of both RGB and HSV histogram. Based on the optimized image feature set classification is done by using k- Nearest Neighbor(k-NN) algorithm. Experimental result shows that our method has achieved better accuracy. From the result it is known that combination of RGB and HSV histogram with k-NN algorithm gives the best accuracy in spam detection.

Keywords: File Type, HSV Histogram, k-NN, RGB Histogram, Spam Detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2142

14 Adaptive Naïve Bayesian Anti-Spam Engine

Authors: Wojciech P. Gajewski

Abstract:

The problem of spam has been seriously troubling the Internet community during the last few years and currently reached an alarming scale. Observations made at CERN (European Organization for Nuclear Research located in Geneva, Switzerland) show that spam mails can constitute up to 75% of daily SMTP traffic. A naïve Bayesian classifier based on a Bag Of Words representation of an email is widely used to stop this unwanted flood as it combines good performance with simplicity of the training and classification processes. However, facing the constantly changing patterns of spam, it is necessary to assure online adaptability of the classifier. This work proposes combining such a classifier with another NBC (naïve Bayesian classifier) based on pairs of adjacent words. Only the latter will be retrained with examples of spam reported by users. Tests are performed on considerable sets of mails both from public spam archives and CERN mailboxes. They suggest that this architecture can increase spam recall without affecting the classifier precision as it happens when only the NBC based on single words is retrained.

Keywords: Text classification, naïve Bayesian classification, spam, email.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4415

13 Analysis of Classifications of Unsolicited Bulk Emails

Authors: Jatinderkumar R. Saini, Apurva A. Desai

Abstract:

In recent times, the problem of Unsolicited Bulk Email (UBE) or commonly known as Spam Email, has increased at a tremendous growth rate. We present an analysis of survey based on classifications of UBE in various research works. There are many research instances for classification between spam and non-spam emails but very few research instances are available for classification of spam emails, per se. This paper does not intend to assert some UBE classification to be better than the others nor does it propose any new classification but it bemoans the lack of harmony on number and definition of categories proposed by different researchers. The paper also elaborates on factors like intent of spammer, content of UBE and ambiguity in different categories as proposed in related research works of classifications of UBE.

Keywords: E-mail, Scams, Spam Email, Unsolicited Bulk Email(UBE)

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1728

12 A Framework for Review Spam Detection Research

Authors: Mohammadali Tavakoli, Atefeh Heydari, Zuriati Ismail, Naomie Salim

Abstract:

With the increasing number of people reviewing products online in recent years, opinion sharing websites has become the most important source of customers’ opinions. Unfortunately, spammers generate and post fake reviews in order to promote or demote brands and mislead potential customers. These are notably destructive not only for potential customers, but also for business holders and manufacturers. However, research in this area is not adequate, and many critical problems related to spam detection have not been solved to date. To provide green researchers in the domain with a great aid, in this paper, we have attempted to create a highquality framework to make a clear vision on review spam-detection methods. In addition, this report contains a comprehensive collection of detection metrics used in proposed spam-detection approaches. These metrics are extremely applicable for developing novel detection methods.

Keywords: Fake reviews, Feature collection, Opinion spam, Spam detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2517

11 An Efficient Spam Mail Detection by Counter Technique

Authors: Raheleh Kholghi, Soheil Behnam Roudsari, Alireza Nemaney Pour

Abstract:

Spam mails are unwanted mails sent to large number of users. Spam mails not only consume the network resources, but cause security threats as well. This paper proposes an efficient technique to detect, and to prevent spam mail in the sender side rather than the receiver side. This technique is based on a counter set on the sender server. When a mail is transmitted to the server, the mail server checks the number of the recipients based on its counter policy. The counter policy performed by the mail server is based on some pre-defined criteria. When the number of recipients exceeds the counter policy, the mail server discontinues the rest of the process, and sends a failure mail to sender of the mail; otherwise the mail is transmitted through the network. By using this technique, the usage of network resources such as bandwidth, and memory is preserved. The simulation results in real network show that when the counter is set on the sender side, the time required for spam mail detection is 100 times faster than the time the counter is set on the receiver side, and the network resources are preserved largely compared with other anti-spam mail techniques in the receiver side.

Keywords: Anti-spam, Mail server, Sender side, Spam mail

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1768

10 Identification of Spam Keywords Using Hierarchical Category in C2C E-commerce

Authors: Shao Bo Cheng, Yong-Jin Han, Se Young Park, Seong-Bae Park

Abstract:

Consumer-to-Consumer (C2C) E-commerce has been growing at a very high speed in recent years. Since identical or nearly-same kinds of products compete one another by relying on keyword search in C2C E-commerce, some sellers describe their products with spam keywords that are popular but are not related to their products. Though such products get more chances to be retrieved and selected by consumers than those without spam keywords, the spam keywords mislead the consumers and waste their time. This problem has been reported in many commercial services like ebay and taobao, but there have been little research to solve this problem. As a solution to this problem, this paper proposes a method to classify whether keywords of a product are spam or not. The proposed method assumes that a keyword for a given product is more reliable if the keyword is observed commonly in specifications of products which are the same or the same kind as the given product. This is because that a hierarchical category of a product in general determined precisely by a seller of the product and so is the specification of the product. Since higher layers of the hierarchical category represent more general kinds of products, a reliable degree is differently determined according to the layers. Hence, reliable degrees from different layers of a hierarchical category become features for keywords and they are used together with features only from specifications for classification of the keywords. Support Vector Machines are adopted as a basic classifier using the features, since it is powerful, and widely used in many classification tasks. In the experiments, the proposed method is evaluated with a golden standard dataset from Yi-han-wang, a Chinese C2C E-commerce, and is compared with a baseline method that does not consider the hierarchical category. The experimental results show that the proposed method outperforms the baseline in F1-measure, which proves that spam keywords are effectively identified by a hierarchical category in C2C E-commerce.

Keywords: Spam Keyword, E-commerce, keyword features, spam filtering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2508

9 Bin Bloom Filter Using Heuristic Optimization Techniques for Spam Detection

Authors: N. Arulanand, K. Premalatha

Abstract:

Bloom filter is a probabilistic and memory efficient data structure designed to answer rapidly whether an element is present in a set. It tells that the element is definitely not in the set but its presence is with certain probability. The trade-off to use Bloom filter is a certain configurable risk of false positives. The odds of a false positive can be made very low if the number of hash function is sufficiently large. For spam detection, weight is attached to each set of elements. The spam weight for a word is a measure used to rate the e-mail. Each word is assigned to a Bloom filter based on its weight. The proposed work introduces an enhanced concept in Bloom filter called Bin Bloom Filter (BBF). The performance of BBF over conventional Bloom filter is evaluated under various optimization techniques. Real time data set and synthetic data sets are used for experimental analysis and the results are demonstrated for bin sizes 4, 5, 6 and 7. Finally analyzing the results, it is found that the BBF which uses heuristic techniques performs better than the traditional Bloom filter in spam detection.

Keywords: Cuckoo search algorithm, levy’s flight, metaheuristic, optimal weight.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2261

8 Performance Comparison of ADTree and Naive Bayes Algorithms for Spam Filtering

Authors: Thanh Nguyen, Andrei Doncescu, Pierre Siegel

Abstract:

Classification is an important data mining technique and could be used as data filtering in artificial intelligence. The broad application of classification for all kind of data leads to be used in nearly every field of our modern life. Classification helps us to put together different items according to the feature items decided as interesting and useful. In this paper, we compare two classification methods Naïve Bayes and ADTree use to detect spam e-mail. This choice is motivated by the fact that Naive Bayes algorithm is based on probability calculus while ADTree algorithm is based on decision tree. The parameter settings of the above classifiers use the maximization of true positive rate and minimization of false positive rate. The experiment results present classification accuracy and cost analysis in view of optimal classifier choice for Spam Detection. It is point out the number of attributes to obtain a tradeoff between number of them and the classification accuracy.

Keywords: Classification, data mining, spam filtering, naive Bayes, decision tree.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1497

7 DWM-CDD: Dynamic Weighted Majority Concept Drift Detection for Spam Mail Filtering

Authors: Leili Nosrati, Alireza Nemaney Pour

Abstract:

Although e-mail is the most efficient and popular communication method, unwanted and mass unsolicited e-mails, also called spam mail, endanger the existence of the mail system. This paper proposes a new algorithm called Dynamic Weighted Majority Concept Drift Detection (DWM-CDD) for content-based filtering. The design purposes of DWM-CDD are first to accurate the performance of the previously proposed algorithms, and second to speed up the time to construct the model. The results show that DWM-CDD can detect both sudden and gradual changes quickly and accurately. Moreover, the time needed for model construction is less than previously proposed algorithms.

Keywords: Concept drift, Content-based filtering, E-mail, Spammail.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1962

6 Identification of Non-Lexicon Non-Slang Unigrams in Body-enhancement Medicinal UBE

Authors: Jatinderkumar R. Saini, Apurva A. Desai

Abstract:

Email has become a fast and cheap means of online communication. The main threat to email is Unsolicited Bulk Email (UBE), commonly called spam email. The current work aims at identification of unigrams in more than 2700 UBE that advertise body-enhancement drugs. The identification is based on the requirement that the unigram is neither present in dictionary, nor is a slang term. The motives of the paper are many fold. This is an attempt to analyze spamming behaviour and employment of wordmutation technique. On the side-lines of the paper, we have attempted to better understand the spam, the slang and their interplay. The problem has been addressed by employing Tokenization technique and Unigram BOW model. We found that the non-lexicon words constitute nearly 66% of total number of lexis of corpus whereas non-slang words constitute nearly 2.4% of non-lexicon words. Further, non-lexicon non-slang unigrams composed of 2 lexicon words, form more than 71% of the total number of such unigrams. To the best of our knowledge, this is the first attempt to analyze usage of non-lexicon non-slang unigrams in any kind of UBE.

Keywords: Body Enhancement, Lexicon, Medicinal, Slang, Unigram, Unsolicited Bulk e-mail (UBE)

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1820

5 Analysis of Spamming Threats and Some Possible Solutions for Online Social Networking Sites (OSNS)

Authors: Dilip Singh Sisodia, Shrish Verma

Abstract:

In this paper we are presenting some spamming techniques their behaviour and possible solutions. We have analyzed how Spammers enters into online social networking sites (OSNSs) to target them and diverse techniques used by them for this purpose. Spamming is very common issue in present era of Internet especially through Online Social Networking Sites (like Facebook, Twitter, and Google+ etc.). Spam messages keep wasting Internet bandwidth and the storage space of servers. On social networking sites; spammers often disguise themselves by creating fake accounts and hijacking user’s accounts for personal gains. They behave like normal user and they continue to change their spamming strategy. Following spamming techniques are discussed in this paper like clickjacking, social engineered attacks, cross site scripting, URL shortening, and drive by download. We have used elgg framework for demonstration of some of spamming threats and respective implementation of solutions.

Keywords: Online social networking sites, spam attacks, Internet, clickjacking/likejacking, drive-by-download, URL shortening, cross site scripting, socially engineered attacks, elgg framework.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2359

4 A Content Vector Model for Text Classification

Authors: Eric Jiang

Abstract:

As a popular rank-reduced vector space approach, Latent Semantic Indexing (LSI) has been used in information retrieval and other applications. In this paper, an LSI-based content vector model for text classification is presented, which constructs multiple augmented category LSI spaces and classifies text by their content. The model integrates the class discriminative information from the training data and is equipped with several pertinent feature selection and text classification algorithms. The proposed classifier has been applied to email classification and its experiments on a benchmark spam testing corpus (PU1) have shown that the approach represents a competitive alternative to other email classifiers based on the well-known SVM and naïve Bayes algorithms.

Keywords: Feature Selection, Latent Semantic Indexing, Text Classification, Vector Space Model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1885

3 Cyber Security in Nigeria: A Collaboration between Communities and Professionals

Authors: K. Alese Boniface, K. Adu Michael, K. Owa Victor

Abstract:

Security can be defined as the degree of resistance to, or protection from harm. It applies to any vulnerable and valuable assets, such as persons, dwellings, communities, nations or organizations. Cybercrime is any crime committed or facilitated via the Internet. It is any criminal activity involving computers and networks. It can range from fraud to unsolicited emails (spam). It includes the distant theft of government or corporate secrets through criminal trespass into remote systems around the globe. Nigeria like any other nations of the world is currently having her own share of the menace that has been used even as tools by terrorists. This paper is an attempt at presenting cyber security as an issue that requires a coordinated national response. It also acknowledges and advocates the key roles to be played by stakeholders and the importance of forging strong partnerships to prevent and tackle cybercrime in Nigeria.

Keywords: Security, Cybercrime, Internet, Government, Stakeholders, Partnerships.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2817

2 Improve of Evaluation Method for Information Security Levels of CIIP (Critical Information Infrastructure Protection)

Authors: Dong-Young Yoo, Jong-Whoi Shin, Gang Shin Lee, Jae-Il Lee

Abstract:

As the disfunctions of the information society and social development progress, intrusion problems such as malicious replies, spam mail, private information leakage, phishing, and pharming, and side effects such as the spread of unwholesome information and privacy invasion are becoming serious social problems. Illegal access to information is also becoming a problem as the exchange and sharing of information increases on the basis of the extension of the communication network. On the other hand, as the communication network has been constructed as an international, global system, the legal response against invasion and cyber-attack from abroad is facing its limit. In addition, in an environment where the important infrastructures are managed and controlled on the basis of the information communication network, such problems pose a threat to national security. Countermeasures to such threats are developed and implemented on a yearly basis to protect the major infrastructures of information communication. As a part of such measures, we have developed a methodology for assessing the information protection level which can be used to establish the quantitative object setting method required for the improvement of the information protection level.

Keywords: Information Security Evaluation Methodology, Critical Information Infrastructure Protection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1660

1 Evaluation of Short-Term Load Forecasting Techniques Applied for Smart Micro Grids

Authors: Xiaolei Hu, Enrico Ferrera, Riccardo Tomasi, Claudio Pastrone

Abstract:

Load Forecasting plays a key role in making today's and future's Smart Energy Grids sustainable and reliable. Accurate power consumption prediction allows utilities to organize in advance their resources or to execute Demand Response strategies more effectively, which enables several features such as higher sustainability, better quality of service, and affordable electricity tariffs. It is easy yet effective to apply Load Forecasting at larger geographic scale, i.e. Smart Micro Grids, wherein the lower available grid flexibility makes accurate prediction more critical in Demand Response applications. This paper analyses the application of short-term load forecasting in a concrete scenario, proposed within the EU-funded GreenCom project, which collect load data from single loads and households belonging to a Smart Micro Grid. Three short-term load forecasting techniques, i.e. linear regression, artificial neural networks, and radial basis function network, are considered, compared, and evaluated through absolute forecast errors and training time. The influence of weather conditions in Load Forecasting is also evaluated. A new definition of Gain is introduced in this paper, which innovatively serves as an indicator of short-term prediction capabilities of time spam consistency. Two models, 24- and 1-hour-ahead forecasting, are built to comprehensively compare these three techniques.

Keywords: Short-term load forecasting, smart micro grid, linear regression, artificial neural networks, radial basis function network, Gain.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2602