Search results for: Spam Keyword

49 Identification of Spam Keywords Using Hierarchical Category in C2C E-commerce

Authors: Shao Bo Cheng, Yong-Jin Han, Se Young Park, Seong-Bae Park

Abstract:

Consumer-to-Consumer (C2C) E-commerce has been growing at a very high speed in recent years. Since identical or nearly-same kinds of products compete one another by relying on keyword search in C2C E-commerce, some sellers describe their products with spam keywords that are popular but are not related to their products. Though such products get more chances to be retrieved and selected by consumers than those without spam keywords, the spam keywords mislead the consumers and waste their time. This problem has been reported in many commercial services like ebay and taobao, but there have been little research to solve this problem. As a solution to this problem, this paper proposes a method to classify whether keywords of a product are spam or not. The proposed method assumes that a keyword for a given product is more reliable if the keyword is observed commonly in specifications of products which are the same or the same kind as the given product. This is because that a hierarchical category of a product in general determined precisely by a seller of the product and so is the specification of the product. Since higher layers of the hierarchical category represent more general kinds of products, a reliable degree is differently determined according to the layers. Hence, reliable degrees from different layers of a hierarchical category become features for keywords and they are used together with features only from specifications for classification of the keywords. Support Vector Machines are adopted as a basic classifier using the features, since it is powerful, and widely used in many classification tasks. In the experiments, the proposed method is evaluated with a golden standard dataset from Yi-han-wang, a Chinese C2C E-commerce, and is compared with a baseline method that does not consider the hierarchical category. The experimental results show that the proposed method outperforms the baseline in F1-measure, which proves that spam keywords are effectively identified by a hierarchical category in C2C E-commerce.

Keywords: Spam Keyword, E-commerce, keyword features, spam filtering.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2457

48 Layout Based Spam Filtering

Authors: Claudiu N.Musat

Abstract:

Due to the constant increase in the volume of information available to applications in fields varying from medical diagnosis to web search engines, accurate support of similarity becomes an important task. This is also the case of spam filtering techniques where the similarities between the known and incoming messages are the fundaments of making the spam/not spam decision. We present a novel approach to filtering based solely on layout, whose goal is not only to correctly identify spam, but also warn about major emerging threats. We propose a mathematical formulation of the email message layout and based on it we elaborate an algorithm to separate different types of emails and find the new, numerically relevant spam types.

Keywords: Clustering, layout, k-means, spam.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1600

47 Spam E-mail: How Malaysian E-mail Users Deal with It?

Authors: Yanti Rosmunie Bujang, Husnayati Hussin

Abstract:

This paper attempts to discuss the spam issue from the Malaysian e-mail users- perspective. The purpose is to discover how Malaysian users handle the spam e-mail problem. From the experiences we hope to discover the necessary effort needed to be undertaken to face this problem in the context of Malaysia. A survey was conducted to understand how Malaysian individual perceived spam and what they actually do with the spam e-mail they received in their daily life. The findings indicate that the level of awareness on spam issue in action is still low and need some extra effort by government and relevant agencies to increase their level of awareness.

Keywords: E-mail, Malaysia, spam, users' perspective.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1842

46 Facebook Spam and Spam Filter Using Artificial Neural Networks

Authors: Fahim A., Mutahira N. Naseem

Abstract:

Spam is any unwanted electronic message or material in any form posted too many people. As the world is growing as global world, social networking sites play an important role in making world global providing people from different parts of the world a platform to meet and express their views. Among different social networking sites Facebook become the leading one. With increase in usage different users start abusive use of Facebook by posting or creating ways to post spam. This paper highlights the potential spam types nowadays Facebook users’ faces. This paper also provide the reason how user become victim to spam attack. A methodology is proposed in the end discusses how to handle different types of spam.

Keywords: Artificial neural networks, Facebook spam, social networking sites, spam filter.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 3086

45 Image Spam Detection Using Color Features and K-Nearest Neighbor Classification

Authors: T. Kumaresan, S. Sanjushree, C. Palanisamy

Abstract:

Image spam is a kind of email spam where the spam text is embedded with an image. It is a new spamming technique being used by spammers to send their messages to bulk of internet users. Spam email has become a big problem in the lives of internet users, causing time consumption and economic losses. The main objective of this paper is to detect the image spam by using histogram properties of an image. Though there are many techniques to automatically detect and avoid this problem, spammers employing new tricks to bypass those techniques, as a result those techniques are inefficient to detect the spam mails. In this paper we have proposed a new method to detect the image spam. Here the image features are extracted by using RGB histogram, HSV histogram and combination of both RGB and HSV histogram. Based on the optimized image feature set classification is done by using k- Nearest Neighbor(k-NN) algorithm. Experimental result shows that our method has achieved better accuracy. From the result it is known that combination of RGB and HSV histogram with k-NN algorithm gives the best accuracy in spam detection.

Keywords: File Type, HSV Histogram, k-NN, RGB Histogram, Spam Detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2086

44 Adaptive Naïve Bayesian Anti-Spam Engine

Authors: Wojciech P. Gajewski

Abstract:

The problem of spam has been seriously troubling the Internet community during the last few years and currently reached an alarming scale. Observations made at CERN (European Organization for Nuclear Research located in Geneva, Switzerland) show that spam mails can constitute up to 75% of daily SMTP traffic. A naïve Bayesian classifier based on a Bag Of Words representation of an email is widely used to stop this unwanted flood as it combines good performance with simplicity of the training and classification processes. However, facing the constantly changing patterns of spam, it is necessary to assure online adaptability of the classifier. This work proposes combining such a classifier with another NBC (naïve Bayesian classifier) based on pairs of adjacent words. Only the latter will be retrained with examples of spam reported by users. Tests are performed on considerable sets of mails both from public spam archives and CERN mailboxes. They suggest that this architecture can increase spam recall without affecting the classifier precision as it happens when only the NBC based on single words is retrained.

Keywords: Text classification, naïve Bayesian classification, spam, email.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4363

43 Analysis of Classifications of Unsolicited Bulk Emails

Authors: Jatinderkumar R. Saini, Apurva A. Desai

Abstract:

In recent times, the problem of Unsolicited Bulk Email (UBE) or commonly known as Spam Email, has increased at a tremendous growth rate. We present an analysis of survey based on classifications of UBE in various research works. There are many research instances for classification between spam and non-spam emails but very few research instances are available for classification of spam emails, per se. This paper does not intend to assert some UBE classification to be better than the others nor does it propose any new classification but it bemoans the lack of harmony on number and definition of categories proposed by different researchers. The paper also elaborates on factors like intent of spammer, content of UBE and ambiguity in different categories as proposed in related research works of classifications of UBE.

Keywords: E-mail, Scams, Spam Email, Unsolicited Bulk Email(UBE)

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1671

42 A Framework for Review Spam Detection Research

Authors: Mohammadali Tavakoli, Atefeh Heydari, Zuriati Ismail, Naomie Salim

Abstract:

With the increasing number of people reviewing products online in recent years, opinion sharing websites has become the most important source of customers’ opinions. Unfortunately, spammers generate and post fake reviews in order to promote or demote brands and mislead potential customers. These are notably destructive not only for potential customers, but also for business holders and manufacturers. However, research in this area is not adequate, and many critical problems related to spam detection have not been solved to date. To provide green researchers in the domain with a great aid, in this paper, we have attempted to create a highquality framework to make a clear vision on review spam-detection methods. In addition, this report contains a comprehensive collection of detection metrics used in proposed spam-detection approaches. These metrics are extremely applicable for developing novel detection methods.

Keywords: Fake reviews, Feature collection, Opinion spam, Spam detection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2461

41 An Efficient Spam Mail Detection by Counter Technique

Authors: Raheleh Kholghi, Soheil Behnam Roudsari, Alireza Nemaney Pour

Abstract:

Spam mails are unwanted mails sent to large number of users. Spam mails not only consume the network resources, but cause security threats as well. This paper proposes an efficient technique to detect, and to prevent spam mail in the sender side rather than the receiver side. This technique is based on a counter set on the sender server. When a mail is transmitted to the server, the mail server checks the number of the recipients based on its counter policy. The counter policy performed by the mail server is based on some pre-defined criteria. When the number of recipients exceeds the counter policy, the mail server discontinues the rest of the process, and sends a failure mail to sender of the mail; otherwise the mail is transmitted through the network. By using this technique, the usage of network resources such as bandwidth, and memory is preserved. The simulation results in real network show that when the counter is set on the sender side, the time required for spam mail detection is 100 times faster than the time the counter is set on the receiver side, and the network resources are preserved largely compared with other anti-spam mail techniques in the receiver side.

Keywords: Anti-spam, Mail server, Sender side, Spam mail

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1729

40 Bin Bloom Filter Using Heuristic Optimization Techniques for Spam Detection

Authors: N. Arulanand, K. Premalatha

Abstract:

Bloom filter is a probabilistic and memory efficient data structure designed to answer rapidly whether an element is present in a set. It tells that the element is definitely not in the set but its presence is with certain probability. The trade-off to use Bloom filter is a certain configurable risk of false positives. The odds of a false positive can be made very low if the number of hash function is sufficiently large. For spam detection, weight is attached to each set of elements. The spam weight for a word is a measure used to rate the e-mail. Each word is assigned to a Bloom filter based on its weight. The proposed work introduces an enhanced concept in Bloom filter called Bin Bloom Filter (BBF). The performance of BBF over conventional Bloom filter is evaluated under various optimization techniques. Real time data set and synthetic data sets are used for experimental analysis and the results are demonstrated for bin sizes 4, 5, 6 and 7. Finally analyzing the results, it is found that the BBF which uses heuristic techniques performs better than the traditional Bloom filter in spam detection.

Keywords: Cuckoo search algorithm, levy’s flight, metaheuristic, optimal weight.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2211

39 Performance Comparison of ADTree and Naive Bayes Algorithms for Spam Filtering

Authors: Thanh Nguyen, Andrei Doncescu, Pierre Siegel

Abstract:

Classification is an important data mining technique and could be used as data filtering in artificial intelligence. The broad application of classification for all kind of data leads to be used in nearly every field of our modern life. Classification helps us to put together different items according to the feature items decided as interesting and useful. In this paper, we compare two classification methods Naïve Bayes and ADTree use to detect spam e-mail. This choice is motivated by the fact that Naive Bayes algorithm is based on probability calculus while ADTree algorithm is based on decision tree. The parameter settings of the above classifiers use the maximization of true positive rate and minimization of false positive rate. The experiment results present classification accuracy and cost analysis in view of optimal classifier choice for Spam Detection. It is point out the number of attributes to obtain a tradeoff between number of them and the classification accuracy.

Keywords: Classification, data mining, spam filtering, naive Bayes, decision tree.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1445

38 Keyword Network Analysis on the Research Trends of Life-Long Education for People with Disabilities in Korea

Authors: Jakyoung Kim, Sungwook Jang

Abstract:

The purpose of this study is to examine the research trends of life-long education for people with disabilities using a keyword network analysis. For this purpose, 151 papers were selected from 594 papers retrieved using keywords such as 'people with disabilities' and 'life-long education' in the Korean Education and Research Information Service. The Keyword network analysis was constructed by extracting and coding the keyword used in the title of the selected papers. The frequency of the extracted keywords, the centrality of degree, and betweenness was analyzed by the keyword network. The results of the keyword network analysis are as follows. First, the main keywords that appeared frequently in the study of life-long education for people with disabilities were 'people with disabilities', 'life-long education', 'developmental disabilities', 'current situations', 'development'. The research trends of life-long education for people with disabilities are focused on the current status of the life-long education and the program development. Second, the keyword network analysis and visualization showed that the keywords with high frequency of occurrences also generally have high degree centrality and betweenness centrality. In terms of the keyword network diagram, it was confirmed that research trends of life-long education for people with disabilities are centered on six prominent keywords. Based on these results, it was discussed that life-long education for people with disabilities in the future needs to expand the subjects and the supporting areas of the life-long education, and the research needs to be further expanded into more detailed and specific areas.

Keywords: Life-long education, people with disabilities, research trends, keyword network analysis.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1193

37 Identifying Potential Partnership for Open Innovation by using Bibliographic Coupling and Keyword Vector Mapping

Authors: Inchae Park, Byungun Yoon

Abstract:

As open innovation has received increasingly attention in the management of innovation, the importance of identifying potential partnership is increasing. This paper suggests a methodology to identify the interested parties as one of Innovation intermediaries to enable open innovation with patent network. To implement the methodology, multi-stage patent citation analysis such as bibliographic coupling and information visualization method such as keyword vector mapping are utilized. This paper has contribution in that it can present meaningful collaboration keywords to identified potential partners in network since not only citation information but also patent textual information is used.

Keywords: Open innovation, partner selection, bibliographic coupling, Keyword vector mapping, patent network.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1743

36 DWM-CDD: Dynamic Weighted Majority Concept Drift Detection for Spam Mail Filtering

Authors: Leili Nosrati, Alireza Nemaney Pour

Abstract:

Although e-mail is the most efficient and popular communication method, unwanted and mass unsolicited e-mails, also called spam mail, endanger the existence of the mail system. This paper proposes a new algorithm called Dynamic Weighted Majority Concept Drift Detection (DWM-CDD) for content-based filtering. The design purposes of DWM-CDD are first to accurate the performance of the previously proposed algorithms, and second to speed up the time to construct the model. The results show that DWM-CDD can detect both sudden and gradual changes quickly and accurately. Moreover, the time needed for model construction is less than previously proposed algorithms.

Keywords: Concept drift, Content-based filtering, E-mail, Spammail.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1907

35 An Open Source Advertisement System

Authors: Pushkar Umaranikar, Chris Pollett

Abstract:

An online advertisement system and its implementation for the Yioop open source search engine are presented. This system supports both selling advertisements and displaying them within search results. The selling of advertisements is done using a system to auction off daily impressions for keyword searches. This is an open, ascending price auction system in which all accepted bids will receive a fraction of the auctioned day’s impressions. New bids in our system are required to be at least one half of the sum of all previous bids ensuring the number of accepted bids is logarithmic in the total ad spend on a keyword for a day. The mechanics of creating an advertisement, attaching keywords to it, and adding it to an advertisement inventory are described. The algorithm used to go from accepted bids for a keyword to which ads are displayed at search time is also presented. We discuss properties of our system and compare it to existing auction systems and systems for selling online advertisements.

Keywords: Online markets, online ad system, online auctions, search engines.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1314

34 Effective Keyword and Similarity Thresholds for the Discovery of Themes from the User Web Access Patterns

Authors: Haider A Ramadhan, Khalil Shihab

Abstract:

Clustering techniques have been used by many intelligent software agents to group similar access patterns of the Web users into high level themes which express users intentions and interests. However, such techniques have been mostly focusing on one salient feature of the Web document visited by the user, namely the extracted keywords. The major aim of these techniques is to come up with an optimal threshold for the number of keywords needed to produce more focused themes. In this paper we focus on both keyword and similarity thresholds to generate themes with concentrated themes, and hence build a more sound model of the user behavior. The purpose of this paper is two fold: use distance based clustering methods to recognize overall themes from the Proxy log file, and suggest an efficient cut off levels for the keyword and similarity thresholds which tend to produce more optimal clusters with better focus and efficient size.

Keywords: Data mining, knowledge discovery, clustering, dataanalysis, Web log analysis, theme based searching.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1407

33 Image Retrieval: Techniques, Challenge, and Trend

Authors: Hui Hui Wang, Dzulkifli Mohamad, N.A Ismail

Abstract:

This paper attempts to discuss the evolution of the retrieval techniques focusing on development, challenges and trends of the image retrieval. It highlights both the already addressed and outstanding issues. The explosive growth of image data leads to the need of research and development of Image Retrieval. However, Image retrieval researches are moving from keyword, to low level features and to semantic features. Drive towards semantic features is due to the problem of the keywords which can be very subjective and time consuming while low level features cannot always describe high level concepts in the users- mind.

Keywords: content based image retrieval, keyword based imageretrieval, semantic gap, semantic image retrieval.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2477

32 Identification of Non-Lexicon Non-Slang Unigrams in Body-enhancement Medicinal UBE

Authors: Jatinderkumar R. Saini, Apurva A. Desai

Abstract:

Email has become a fast and cheap means of online communication. The main threat to email is Unsolicited Bulk Email (UBE), commonly called spam email. The current work aims at identification of unigrams in more than 2700 UBE that advertise body-enhancement drugs. The identification is based on the requirement that the unigram is neither present in dictionary, nor is a slang term. The motives of the paper are many fold. This is an attempt to analyze spamming behaviour and employment of wordmutation technique. On the side-lines of the paper, we have attempted to better understand the spam, the slang and their interplay. The problem has been addressed by employing Tokenization technique and Unigram BOW model. We found that the non-lexicon words constitute nearly 66% of total number of lexis of corpus whereas non-slang words constitute nearly 2.4% of non-lexicon words. Further, non-lexicon non-slang unigrams composed of 2 lexicon words, form more than 71% of the total number of such unigrams. To the best of our knowledge, this is the first attempt to analyze usage of non-lexicon non-slang unigrams in any kind of UBE.

Keywords: Body Enhancement, Lexicon, Medicinal, Slang, Unigram, Unsolicited Bulk e-mail (UBE)

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1766

31 Interactive, Topic-Oriented Search Support by a Centroid-Based Text Categorisation

Authors: Mario Kubek, Herwig Unger

Abstract:

Centroid terms are single words that semantically and topically characterise text documents and so may serve as their very compact representation in automatic text processing. In the present paper, centroids are used to measure the relevance of text documents with respect to a given search query. Thus, a new graphbased paradigm for searching texts in large corpora is proposed and evaluated against keyword-based methods. The first, promising experimental results demonstrate the usefulness of the centroid-based search procedure. It is shown that especially the routing of search queries in interactive and decentralised search systems can be greatly improved by applying this approach. A detailed discussion on further fields of its application completes this contribution.

Keywords: Search algorithm, centroid, query, keyword, cooccurrence, categorisation.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 576

30 Analyzing Keyword Networks for the Identification of Correlated Research Topics

Authors: Thiago M. R. Dias, Patrícia M. Dias, Gray F. Moita

Abstract:

The production and publication of scientific works have increased significantly in the last years, being the Internet the main factor of access and distribution of these works. Faced with this, there is a growing interest in understanding how scientific research has evolved, in order to explore this knowledge to encourage research groups to become more productive. Therefore, the objective of this work is to explore repositories containing data from scientific publications and to characterize keyword networks of these publications, in order to identify the most relevant keywords, and to highlight those that have the greatest impact on the network. To do this, each article in the study repository has its keywords extracted and in this way the network is characterized, after which several metrics for social network analysis are applied for the identification of the highlighted keywords.

Keywords: Extraction and data integration, bibliometrics, scientometrics.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 639

29 Analysis of Spamming Threats and Some Possible Solutions for Online Social Networking Sites (OSNS)

Authors: Dilip Singh Sisodia, Shrish Verma

Abstract:

In this paper we are presenting some spamming techniques their behaviour and possible solutions. We have analyzed how Spammers enters into online social networking sites (OSNSs) to target them and diverse techniques used by them for this purpose. Spamming is very common issue in present era of Internet especially through Online Social Networking Sites (like Facebook, Twitter, and Google+ etc.). Spam messages keep wasting Internet bandwidth and the storage space of servers. On social networking sites; spammers often disguise themselves by creating fake accounts and hijacking user’s accounts for personal gains. They behave like normal user and they continue to change their spamming strategy. Following spamming techniques are discussed in this paper like clickjacking, social engineered attacks, cross site scripting, URL shortening, and drive by download. We have used elgg framework for demonstration of some of spamming threats and respective implementation of solutions.

Keywords: Online social networking sites, spam attacks, Internet, clickjacking/likejacking, drive-by-download, URL shortening, cross site scripting, socially engineered attacks, elgg framework.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2312

28 Automatic Text Summarization

Authors: Mohamed Abdel Fattah, Fuji Ren

Abstract:

This work proposes an approach to address automatic text summarization. This approach is a trainable summarizer, which takes into account several features, including sentence position, positive keyword, negative keyword, sentence centrality, sentence resemblance to the title, sentence inclusion of name entity, sentence inclusion of numerical data, sentence relative length, Bushy path of the sentence and aggregated similarity for each sentence to generate summaries. First we investigate the effect of each sentence feature on the summarization task. Then we use all features score function to train genetic algorithm (GA) and mathematical regression (MR) models to obtain a suitable combination of feature weights. The proposed approach performance is measured at several compression rates on a data corpus composed of 100 English religious articles. The results of the proposed approach are promising.

Keywords: Automatic Summarization, Genetic Algorithm, Mathematical Regression, Text Features.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2273

27 A Text Mining Technique Using Association Rules Extraction

Authors: Hany Mahgoub, Dietmar Rösner, Nabil Ismail, Fawzy Torkey

Abstract:

This paper describes text mining technique for automatically extracting association rules from collections of textual documents. The technique called, Extracting Association Rules from Text (EART). It depends on keyword features for discover association rules amongst keywords labeling the documents. In this work, the EART system ignores the order in which the words occur, but instead focusing on the words and their statistical distributions in documents. The main contributions of the technique are that it integrates XML technology with Information Retrieval scheme (TFIDF) (for keyword/feature selection that automatically selects the most discriminative keywords for use in association rules generation) and use Data Mining technique for association rules discovery. It consists of three phases: Text Preprocessing phase (transformation, filtration, stemming and indexing of the documents), Association Rule Mining (ARM) phase (applying our designed algorithm for Generating Association Rules based on Weighting scheme GARW) and Visualization phase (visualization of results). Experiments applied on WebPages news documents related to the outbreak of the bird flu disease. The extracted association rules contain important features and describe the informative news included in the documents collection. The performance of the EART system compared with another system that uses the Apriori algorithm throughout the execution time and evaluating extracted association rules.

Keywords: Text mining, data mining, association rule mining

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 4372

26 A Content Vector Model for Text Classification

Authors: Eric Jiang

Abstract:

As a popular rank-reduced vector space approach, Latent Semantic Indexing (LSI) has been used in information retrieval and other applications. In this paper, an LSI-based content vector model for text classification is presented, which constructs multiple augmented category LSI spaces and classifies text by their content. The model integrates the class discriminative information from the training data and is equipped with several pertinent feature selection and text classification algorithms. The proposed classifier has been applied to email classification and its experiments on a benchmark spam testing corpus (PU1) have shown that the approach represents a competitive alternative to other email classifiers based on the well-known SVM and naïve Bayes algorithms.

Keywords: Feature Selection, Latent Semantic Indexing, Text Classification, Vector Space Model.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1838

25 Cyber Security in Nigeria: A Collaboration between Communities and Professionals

Authors: K. Alese Boniface, K. Adu Michael, K. Owa Victor

Abstract:

Security can be defined as the degree of resistance to, or protection from harm. It applies to any vulnerable and valuable assets, such as persons, dwellings, communities, nations or organizations. Cybercrime is any crime committed or facilitated via the Internet. It is any criminal activity involving computers and networks. It can range from fraud to unsolicited emails (spam). It includes the distant theft of government or corporate secrets through criminal trespass into remote systems around the globe. Nigeria like any other nations of the world is currently having her own share of the menace that has been used even as tools by terrorists. This paper is an attempt at presenting cyber security as an issue that requires a coordinated national response. It also acknowledges and advocates the key roles to be played by stakeholders and the importance of forging strong partnerships to prevent and tackle cybercrime in Nigeria.

Keywords: Security, Cybercrime, Internet, Government, Stakeholders, Partnerships.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2762

24 Lessons Learned from Observing User Behavior through Repeated Usability Evaluations

Authors: Hanmin Jung, Mikyoung Lee, Won-kyung Sung

Abstract:

Academic research information service is a must for surveying previous studies in research and development process. OntoFrame is an academic research information service under Semantic Web framework different from simple keyword-based services such as CiteSeer and Google Scholar. The first purpose of this study is for revealing user behavior in their surveys, the objects of using academic research information services, and their needs. The second is for applying lessons learned from the results to OntoFrame.

Keywords: User Behavior, Usability Evaluation, OntoFrame, CiteSeer, Google Scholar, Academic Research Information Service.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1471

23 Improve of Evaluation Method for Information Security Levels of CIIP (Critical Information Infrastructure Protection)

Authors: Dong-Young Yoo, Jong-Whoi Shin, Gang Shin Lee, Jae-Il Lee

Abstract:

As the disfunctions of the information society and social development progress, intrusion problems such as malicious replies, spam mail, private information leakage, phishing, and pharming, and side effects such as the spread of unwholesome information and privacy invasion are becoming serious social problems. Illegal access to information is also becoming a problem as the exchange and sharing of information increases on the basis of the extension of the communication network. On the other hand, as the communication network has been constructed as an international, global system, the legal response against invasion and cyber-attack from abroad is facing its limit. In addition, in an environment where the important infrastructures are managed and controlled on the basis of the information communication network, such problems pose a threat to national security. Countermeasures to such threats are developed and implemented on a yearly basis to protect the major infrastructures of information communication. As a part of such measures, we have developed a methodology for assessing the information protection level which can be used to establish the quantitative object setting method required for the improvement of the information protection level.

Keywords: Information Security Evaluation Methodology, Critical Information Infrastructure Protection.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1613

22 A Review of Existing Turnover Intention Theories

Authors: Pauline E. Ngo-Henha

Abstract:

Existing turnover intention theories are reviewed in this paper. This review was conducted with the help of the search keyword “turnover intention theories” in Google Scholar during the month of July 2017. These theories include: The Theory of Organizational Equilibrium (TOE), Social Exchange Theory, Job Embeddedness Theory, Herzberg’s Two-Factor Theory, the Resource-Based View, Equity Theory, Human Capital Theory, and the Expectancy Theory. One of the limitations of this review paper is that data were only collected from Google Scholar where many papers were sometimes not freely accessible. However, this paper attempts to contribute to the research in clarifying the distinction between theories and models in the context of turnover intention.

Keywords: Job embeddedness theory, theory of organizational equilibrium (TOE), Herzberg’s two-factor theory, turnover intention theories, theories and models.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 21983

21 A Secure Mobile OTP Authentication Scheme for User Mobility Cloud VDI Environment

Authors: Jong-won Lee

Abstract:

Since Cloud environment has appeared as the most powerful keyword in the computing industry, the growth in VDI (Virtual Desktop Infrastructure) became remarkable in domestic market. In recent years, with the trend that mobile devices such as smartphones and pads spread so rapidly, the strengths of VDI that allows people to access and perform business on the move along with companies' office needs expedite more rapid spread of VDI. In this paper, mobile OTP (One-Time Password) authentication method is proposed to secure mobile device portability through rapid and secure authentication using mobile devices such as mobile phones or pads, which does not require additional purchase or possession of OTP tokens of users. To facilitate diverse and wide use of Services in the future, service should be continuous and stable, and above all, security should be considered the most important to meet advanced portability and user accessibility, the strengths of VDI.

Keywords: Cloud, VDI, OTP, Mobility

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 2000

20 Text Summarization for Oil and Gas Drilling Topic

Authors: Y. Y. Chen, O. M. Foong, S. P. Yong, Kurniawan Iwan

Abstract:

Information sharing and gathering are important in the rapid advancement era of technology. The existence of WWW has caused rapid growth of information explosion. Readers are overloaded with too many lengthy text documents in which they are more interested in shorter versions. Oil and gas industry could not escape from this predicament. In this paper, we develop an Automated Text Summarization System known as AutoTextSumm to extract the salient points of oil and gas drilling articles by incorporating statistical approach, keywords identification, synonym words and sentence-s position. In this study, we have conducted interviews with Petroleum Engineering experts and English Language experts to identify the list of most commonly used keywords in the oil and gas drilling domain. The system performance of AutoTextSumm is evaluated using the formulae of precision, recall and F-score. Based on the experimental results, AutoTextSumm has produced satisfactory performance with F-score of 0.81.

Keywords: Keyword's probability, synonym sets.

Procedia APA BibTeX Chicago EndNote Harvard JSON MLA RIS XML ISO 690 PDF Downloads 1681