Commenced in January 2007

Frequency: Monthly

Edition: International

Paper Count: 94

Search results for: spam keyword

94 Identification of Spam Keywords Using Hierarchical Category in C2C E-Commerce

Authors: Shao Bo Cheng, Yong-Jin Han, Se Young Park, Seong-Bae Park

Abstract:

Consumer-to-Consumer (C2C) E-commerce has been growing at a very high speed in recent years. Since identical or nearly-same kinds of products compete one another by relying on keyword search in C2C E-commerce, some sellers describe their products with spam keywords that are popular but are not related to their products. Though such products get more chances to be retrieved and selected by consumers than those without spam keywords, the spam keywords mislead the consumers and waste their time. This problem has been reported in many commercial services like e-bay and taobao, but there have been little research to solve this problem. As a solution to this problem, this paper proposes a method to classify whether keywords of a product are spam or not. The proposed method assumes that a keyword for a given product is more reliable if the keyword is observed commonly in specifications of products which are the same or the same kind as the given product. This is because that a hierarchical category of a product in general determined precisely by a seller of the product and so is the specification of the product. Since higher layers of the hierarchical category represent more general kinds of products, a reliable degree is differently determined according to the layers. Hence, reliable degrees from different layers of a hierarchical category become features for keywords and they are used together with features only from specifications for classification of the keywords. Support Vector Machines are adopted as a basic classifier using the features, since it is powerful, and widely used in many classification tasks. In the experiments, the proposed method is evaluated with a golden standard dataset from Yi-han-wang, a Chinese C2C e-commerce, and is compared with a baseline method that does not consider the hierarchical category. The experimental results show that the proposed method outperforms the baseline in F1-measure, which proves that spam keywords are effectively identified by a hierarchical category in C2C e-commerce.

Keywords: spam keyword, e-commerce, keyword features, spam ﬁltering

Procedia PDF Downloads 266

93 Facebook Spam and Spam Filter Using Artificial Neural Networks

Authors: A. Fahim, Mutahira N. Naseem

Abstract:

SPAM is any unwanted electronic message or material in any form posted to many people. As the world is growing as global world, social networking sites play an important role in making world global providing people from different parts of the world a platform to meet and express their views. Among different social networking sites facebook become the leading one. With increase in usage different users start abusive use of facebook by posting or creating ways to post spam. This paper highlights the potential spam types nowadays facebook users faces. This paper also provide the reason how user become victim to spam attack. A methodology is proposed in the end discusses how to handle different types of spam.

Keywords: artificial neural networks, facebook spam, social networking sites, spam filter

Procedia PDF Downloads 341

92 A Framework for Review Spam Detection Research

Authors: Mohammadali Tavakoli, Atefeh Heydari, Zuriati Ismail, Naomie Salim

Abstract:

With the increasing number of people reviewing products online in recent years, opinion sharing websites has become the most important source of customers’ opinions. Unfortunately, spammers generate and post fake reviews in order to promote or demote brands and mislead potential customers. These are notably destructive not only for potential customers but also for business holders and manufacturers. However, research in this area is not adequate, and many critical problems related to spam detection have not been solved to date. To provide green researchers in the domain with a great aid, in this paper, we have attempted to create a high-quality framework to make a clear vision on review spam-detection methods. In addition, this report contains a comprehensive collection of detection metrics used in proposed spam-detection approaches. These metrics are extremely applicable for developing novel detection methods.

Keywords: fake reviews, feature collection, opinion spam, spam detection

Procedia PDF Downloads 381

91 A Supervised Approach for Detection of Singleton Spam Reviews

Authors: Atefeh Heydari, Mohammadali Tavakoli, Naomie Salim

Abstract:

In recent years, we have witnessed that online reviews are the most important source of customers’ opinion. They are progressively more used by individuals and organisations to make purchase and business decisions. Unfortunately, for the reason of profit or fame, frauds produce deceptive reviews to hoodwink potential customers. Their activities mislead not only potential customers to make appropriate purchasing decisions and organisations to reshape their business, but also opinion mining techniques by preventing them from reaching accurate results. Spam reviews could be divided into two main groups, i.e. multiple and singleton spam reviews. Detecting a singleton spam review that is the only review written by a user ID is extremely challenging due to lack of clue for detection purposes. Singleton spam reviews are very harmful and various features and proofs used in multiple spam reviews detection are not applicable in this case. Current research aims to propose a novel supervised technique to detect singleton spam reviews. To achieve this, various features are proposed in this study and are to be combined with the most appropriate features extracted from literature and employed in a classifier. In order to compare the performance of different classifiers, SVM and naive Bayes classification algorithms were used for model building. The results revealed that SVM was more accurate than naive Bayes and our proposed technique is capable to detect singleton spam reviews effectively.

Keywords: classification algorithms, Naïve Bayes, opinion review spam detection, singleton review spam detection, support vector machine

Procedia PDF Downloads 277

90 A Comparative Analysis of Hyper-Parameters Using Neural Networks for E-Mail Spam Detection

Authors: Syed Mahbubuz Zaman, A. B. M. Abrar Haque, Mehedi Hassan Nayeem, Misbah Uddin Sagor

Abstract:

Everyday e-mails are being used by millions of people as an effective form of communication over the Internet. Although e-mails allow high-speed communication, there is a constant threat known as spam. Spam e-mail is often called junk e-mails which are unsolicited and sent in bulk. These unsolicited emails cause security concerns among internet users because they are being exposed to inappropriate content. There is no guaranteed way to stop spammers who use static filters as they are bypassed very easily. In this paper, a smart system is proposed that will be using neural networks to approach spam in a different way, and meanwhile, this will also detect the most relevant features that will help to design the spam filter. Also, a comparison of different parameters for different neural network models has been shown to determine which model works best within suitable parameters.

Keywords: long short-term memory, bidirectional long short-term memory, gated recurrent unit, natural language processing, natural language processing

Procedia PDF Downloads 174

89 Towards an Adversary-Aware ML-Based Detector of Spam on Twitter Hashtags

Authors: Niddal Imam, Vassilios G. Vassilakis

Abstract:

After analysing messages posted by health-related spam campaigns in Twitter Arabic hashtags, we found that these campaigns use unique hijacked accounts (we call them adversarial hijacked accounts) as adversarial examples to fool deployed ML-based spam detectors. Existing ML-based models build a behaviour profile for each user to detect hijacked accounts. This approach is not applicable for detecting spam in Twitter hashtags since they are computationally expensive. Hence, we propose an adversary-aware ML-based detector, which includes a newly designed feature (avg posts) to improve the detection of spam tweets posted by the adversarial hijacked accounts at a tweet-level in trending hashtags. The proposed detector was designed considering three key points: robustness, adaptability, and interpretability. The new feature leverages the account’s temporal patterns (i.e., account age and number of posts). It is faster to compute compared to features discussed in the literature and improves the accuracy of detecting the identified hijacked accounts by 73%.

Keywords: Twitter spam detection, adversarial examples, evasion attack, adversarial concept drift, account hijacking, trending hashtag

Procedia PDF Downloads 33

88 The Impact of Keyword and Full Video Captioning on Listening Comprehension

Authors: Elias Bensalem

Abstract:

This study investigates the effect of two types of captioning (full and keyword captioning) on listening comprehension. Thirty-six university-level EFL students participated in the study. They were randomly assigned to watch three video clips under three conditions. The first group watched the video clips with full captions. The second group watched the same video clips with keyword captions. The control group watched the video clips without captions. After watching each clip, participants took a listening comprehension test. At the end of the experiment, participants completed a questionnaire to measure their perceptions about the use of captions and the video clips they watched. Results indicated that the full captioning group significantly outperformed both the keyword captioning and the no captioning group on the listening comprehension tests. However, this study did not find any significant difference between the keyword captioning group and the no captioning group. Results of the survey suggest that keyword captioning were a source of distraction for participants.

Keywords: captions, EFL, listening comprehension, video

Procedia PDF Downloads 227

87 Performance Comparison of ADTree and Naive Bayes Algorithms for Spam Filtering

Authors: Thanh Nguyen, Andrei Doncescu, Pierre Siegel

Abstract:

Classification is an important data mining technique and could be used as data filtering in artificial intelligence. The broad application of classification for all kind of data leads to be used in nearly every field of our modern life. Classification helps us to put together different items according to the feature items decided as interesting and useful. In this paper, we compare two classification methods Naïve Bayes and ADTree use to detect spam e-mail. This choice is motivated by the fact that Naive Bayes algorithm is based on probability calculus while ADTree algorithm is based on decision tree. The parameter settings of the above classifiers use the maximization of true positive rate and minimization of false positive rate. The experiment results present classification accuracy and cost analysis in view of optimal classifier choice for Spam Detection. It is point out the number of attributes to obtain a tradeoff between number of them and the classification accuracy.

Keywords: classification, data mining, spam filtering, naive bayes, decision tree

Procedia PDF Downloads 381

86 Empirical Study on Factors Influencing SEO

Authors: Pakinee Aimmanee, Phoom Chokratsamesiri

Abstract:

Search engine has become an essential tool nowadays for people to search for their needed information on the internet. In this work, we evaluate the performance of the search engine from three factors: the keyword frequency, the number of inbound links, and the difficulty of the keyword. The evaluations are based on the ranking position and the number of days that Google has seen or detect the webpage. We find that the keyword frequency and the difficulty of the keyword do not affect the Google ranking where the number of inbound links gives remarkable improvement of the ranking position. The optimal number of inbound links found in the experiment is 10.

Keywords: SEO, information retrieval, web search, knowledge technologies

Procedia PDF Downloads 255

85 Sentiment Analysis of Ensemble-Based Classifiers for E-Mail Data

Authors: Muthukumarasamy Govindarajan

Abstract:

Detection of unwanted, unsolicited mails called spam from email is an interesting area of research. It is necessary to evaluate the performance of any new spam classifier using standard data sets. Recently, ensemble-based classifiers have gained popularity in this domain. In this research work, an efficient email filtering approach based on ensemble methods is addressed for developing an accurate and sensitive spam classifier. The proposed approach employs Naive Bayes (NB), Support Vector Machine (SVM) and Genetic Algorithm (GA) as base classifiers along with different ensemble methods. The experimental results show that the ensemble classifier was performing with accuracy greater than individual classifiers, and also hybrid model results are found to be better than the combined models for the e-mail dataset. The proposed ensemble-based classifiers turn out to be good in terms of classification accuracy, which is considered to be an important criterion for building a robust spam classifier.

Keywords: accuracy, arcing, bagging, genetic algorithm, Naive Bayes, sentiment mining, support vector machine

Procedia PDF Downloads 111

84 Keyword Network Analysis on the Research Trends of Life-Long Education for People with Disabilities in Korea

Authors: Jakyoung Kim, Sungwook Jang

Abstract:

The purpose of this study is to examine the research trends of life-long education for people with disabilities using a keyword network analysis. For this purpose, 151 papers were selected from 594 papers retrieved using keywords such as 'people with disabilities' and 'life-long education' in the Korean Education and Research Information Service. The Keyword network analysis was constructed by extracting and coding the keyword used in the title of the selected papers. The frequency of the extracted keywords, the centrality of degree, and betweenness was analyzed by the keyword network. The results of the keyword network analysis are as follows. First, the main keywords that appeared frequently in the study of life-long education for people with disabilities were 'people with disabilities', 'life-long education', 'developmental disabilities', 'current situations', 'development'. The research trends of life-long education for people with disabilities are focused on the current status of the life-long education and the program development. Second, the keyword network analysis and visualization showed that the keywords with high frequency of occurrences also generally have high degree centrality and betweenness centrality. In terms of the keyword network diagram, it was confirmed that research trends of life-long education for people with disabilities are centered on six prominent keywords. Based on these results, it was discussed that life-long education for people with disabilities in the future needs to expand the subjects and the supporting areas of the life-long education, and the research needs to be further expanded into more detailed and specific areas.

Keywords: life-long education, people with disabilities, research trends, keyword network analysis

Procedia PDF Downloads 309

83 Modified Active (MA) Algorithm to Generate Semantic Web Related Clustered Hierarchy for Keyword Search

Authors: G. Leena Giri, Archana Mathur, S. H. Manjula, K. R. Venugopal, L. M. Patnaik

Abstract:

Keyword search in XML documents is based on the notion of lowest common ancestors in the labelled trees model of XML documents and has recently gained a lot of research interest in the database community. In this paper, we propose the Modified Active (MA) algorithm which is an improvement over the active clustering algorithm by taking into consideration the entity aspect of the nodes to find the level of the node pertaining to a particular keyword input by the user. A portion of the bibliography database is used to experimentally evaluate the modified active algorithm and results show that it performs better than the active algorithm. Our modification improves the response time of the system and thereby increases the efficiency of the system.

Keywords: keyword matching patterns, MA algorithm, semantic search, knowledge management

Procedia PDF Downloads 372

82 Analysis of Spamming Threats and Some Possible Solutions for Online Social Networking Sites (OSNS)

Authors: Dilip Singh Sisodia, Shrish Verma

Abstract:

Spamming is the most common issue seen nowadays in the Internet especially in Online Social Networking Sites (like Facebook, Twitter, and Google+ etc.). Spam messages keep wasting Internet bandwidth and the storage space of servers. On social network sites; spammers often disguise themselves by creating fake accounts and hijacking user’s accounts for personal gains. They behave like normal user and they continue to change their spamming strategy. To prevent this, most modern spam-filtering solutions are deployed on the receiver side; they are good at filtering spam for end users. In this paper we are presenting some spamming techniques their behaviour and possible solutions. We have analyzed how Spammers enters into online social networking sites (OSNSs) and how they target it and the techniques they use for it. The five discussed techniques of spamming techniques which are clickjacking, social engineered attacks, cross site scripting, URL shortening, and drive by download. We have used elgg framework for demonstration of some of spamming threats and respective implementation of solutions.

Keywords: online social networking sites, spam, attacks, internet, clickjacking / likejacking, drive-by-download, URL shortening, networking, socially engineered attacks, elgg framework

Procedia PDF Downloads 299

81 Optimization Query Image Using Search Relevance Re-Ranking Process

Authors: T. G. Asmitha Chandini

Abstract:

Web-based image search re-ranking, as an successful method to get better the results. In a query keyword, the first stair is store the images is first retrieve based on the text-based information. The user to select a query keywordimage, by using this query keyword other images are re-ranked based on their visual properties with images.Now a day to day, people projected to match images in a semantic space which is used attributes or reference classes closely related to the basis of semantic image. though, understanding a worldwide visual semantic space to demonstrate highly different images from the web is difficult and inefficient. The re-ranking images, which automatically offline part learns dissimilar semantic spaces for different query keywords. The features of images are projected into their related semantic spaces to get particular images. At the online stage, images are re-ranked by compare their semantic signatures obtained the semantic précised by the query keyword image. The query-specific semantic signatures extensively improve both the proper and efficiency of image re-ranking.

Keywords: Query, keyword, image, re-ranking, semantic, signature

Procedia PDF Downloads 525

80 Effect of the Keyword Strategy on Lexical Semantic Acquisition: Recognition, Retention and Comprehension in an English as Second Language Context

Authors: Fatima Muhammad Shitu

Abstract:

This study seeks to investigate the effect of the keyword strategy on lexico–semantic acquisition, recognition, retention and comprehension in an ESL context. The aim of the study is to determine whether the keyword strategy can be used to enhance acquisition. As a quasi- experimental research, the objectives of the study include: To determine the extent to which the scores obtained by the subjects, who were trained on the use of the keyword strategy for acquisition, differ at the pre-tests and the post–tests and also to find out the relationship in the scores obtained at these tests levels. The sample for the study consists of 300 hundred undergraduate ESL Students in the Federal College of Education, Kano. The seventy-five lexical items for acquisition belong to the lexical field category known as register, and they include Medical, Agriculture and Photography registers (MAP). These were divided in the ratio twenty-five (25) lexical items in each lexical field. The testing technique was used to collect the data while the descriptive and inferential statistics were employed for data analysis. For the purpose of testing, the two kinds of tests administered at each test level include the WARRT (Word Acquisition, Recognition, and Retention Test) and the CCPT (Cloze Comprehension Passage Test). The results of the study revealed that there are significant differences in the scores obtained between the pre-tests, and the post–tests and there are no correlations in the scores obtained as well. This implies that the keyword strategy has effectively enhanced the acquisition of the lexical items studied.

Keywords: keyword, lexical, semantics, strategy

Procedia PDF Downloads 280

79 Local Interpretable Model-agnostic Explanations (LIME) Approach to Email Spam Detection

Authors: Rohini Hariharan, Yazhini R., Blessy Maria Mathew

Abstract:

The task of detecting email spam is a very important one in the era of digital technology that needs effective ways of curbing unwanted messages. This paper presents an approach aimed at making email spam categorization algorithms transparent, reliable and more trustworthy by incorporating Local Interpretable Model-agnostic Explanations (LIME). Our technique assists in providing interpretable explanations for specific classifications of emails to help users understand the decision-making process by the model. In this study, we developed a complete pipeline that incorporates LIME into the spam classification framework and allows creating simplified, interpretable models tailored to individual emails. LIME identifies influential terms, pointing out key elements that drive classification results, thus reducing opacity inherent in conventional machine learning models. Additionally, we suggest a visualization scheme for displaying keywords that will improve understanding of categorization decisions by users. We test our method on a diverse email dataset and compare its performance with various baseline models, such as Gaussian Naive Bayes, Multinomial Naive Bayes, Bernoulli Naive Bayes, Support Vector Classifier, K-Nearest Neighbors, Decision Tree, and Logistic Regression. Our testing results show that our model surpasses all other models, achieving an accuracy of 96.59% and a precision of 99.12%.

Keywords: text classification, LIME (local interpretable model-agnostic explanations), stemming, tokenization, logistic regression.

Procedia PDF Downloads 15

78 Keyword Advertising: Still Need Construction in European Union; Perspective on Interflora vs. Marks and Spencer

Authors: Mohammadbagher Asghariaghamashhadi

Abstract:

Internet users normally are automatically linked to an advertisement sponsored by a bidder when Internet users enter any trademarked keyword on a search engine. This advertisement appears beside the search results. Through the process of keyword advertising, advertisers can connect with many Internet users and let them know about their goods and services. This concept has generated heated disagreements among legal scholars, trademark proprietors, advertisers, search engine owners, and consumers. Therefore, use of trademarks in keyword advertising has been one of the most debatable issues in trademark law for several years. This entirely new way of using trademarks over the Internet has provoked a discussion concerning the core concepts of trademark law. In respect to legal issues, European Union (EU) trademark law is mostly governed by the Trademark Directive and the Community Trademark Regulation. Article 5 of the directive and Article 9 of the trademark regulation determine the circumstances in which a trademark owner holds the right to prohibit a third party’s use of his/her registered sign. Harmonized EU trademark law proved to be ambiguous on whether using of a trademark is amounted to trademark infringement or not. The case law of the European Court of Justice (ECJ), with reference to this legislation, is mostly unfavorable to trademark owners. This ambivalence was also exhibited by the case law of EU Member States. European keyword advertisers simply could not tell which use of a competitor‘s trademark was lawful. In recent years, ECJ has continuously expanded the scope and reach of trademark protection in the EU. It is notable that Inconsistencies in the Court’s system of infringement criteria clearly come to the fore and this approach has been criticized by analysts who believe that the Court should have adopted a more traditional approach to the analysis of trademark infringement, which was suggested by its Advocate General, in order to arrive at the same conclusion. Regarding case law of keyword advertising within Europe, one of the most disputable cases is Interflora vs. Marks and Spencer, which is still on-going. This study examines and critically analyzes the decisions of the ECJ, the high court of England, and the Court of Appeals of England and address critically keyword advertising issue within European trademark legislation.

Keywords: ECJ, Google, Interflora, keyword advertising, Marks and Spencer, trademark infringement

Procedia PDF Downloads 314

77 Graph-Based Semantical Extractive Text Analysis

Authors: Mina Samizadeh

Abstract:

In the past few decades, there has been an explosion in the amount of available data produced from various sources with different topics. The availability of this enormous data necessitates us to adopt effective computational tools to explore the data. This leads to an intense growing interest in the research community to develop computational methods focused on processing this text data. A line of study focused on condensing the text so that we are able to get a higher level of understanding in a shorter time. The two important tasks to do this are keyword extraction and text summarization. In keyword extraction, we are interested in finding the key important words from a text. This makes us familiar with the general topic of a text. In text summarization, we are interested in producing a short-length text which includes important information about the document. The TextRank algorithm, an unsupervised learning method that is an extension of the PageRank (algorithm which is the base algorithm of Google search engine for searching pages and ranking them), has shown its efficacy in large-scale text mining, especially for text summarization and keyword extraction. This algorithm can automatically extract the important parts of a text (keywords or sentences) and declare them as a result. However, this algorithm neglects the semantic similarity between the different parts. In this work, we improved the results of the TextRank algorithm by incorporating the semantic similarity between parts of the text. Aside from keyword extraction and text summarization, we develop a topic clustering algorithm based on our framework, which can be used individually or as a part of generating the summary to overcome coverage problems.

Keywords: keyword extraction, n-gram extraction, text summarization, topic clustering, semantic analysis

Procedia PDF Downloads 40

76 An Open Source Advertisement System

Authors: Pushkar Umaranikar, Chris Pollett

Abstract:

An online advertisement system and its implementation for the Yioop open source search engine are presented. This system supports both selling advertisements and displaying them within search results. The selling of advertisements is done using a system to auction off daily impressions for keyword searches. This is an open, ascending price auction system in which all accepted bids will receive a fraction of the auctioned day’s impressions. New bids in our system are required to be at least one half of the sum of all previous bids ensuring the number of accepted bids is logarithmic in the total ad spend on a keyword for a day. The mechanics of creating an advertisement, attaching keywords to it, and adding it to an advertisement inventory are described. The algorithm used to go from accepted bids for a keyword to which ads are displayed at search time is also presented. We discuss properties of our system and compare it to existing auction systems and systems for selling online advertisements.

Keywords: online markets, online ad system, online auctions, search engines

Procedia PDF Downloads 290

75 Sentence vs. Keyword Content Analysis in Intellectual Capital Disclosures Study

Authors: Martin Surya Mulyadi, Yunita Anwar, Rosinta Ria Panggabean

Abstract:

Major transformations in economic activity from an agricultural economy to knowledge economy have led to an increasing focus on intellectual capital (IC) that has been characterized by continuous innovation, the spread of digital and communication technologies, intangible and human factors. IC is defined as the possession of knowledge and experience, professional knowledge and skill, proper relationships and technological capacities, which when applied will give organizations a competitive advantage. All of IC report/disclosure could be captured from the corporate annual report as it is a communication device that allows a corporation to connect with various external and internal stakeholders. This study was conducted using sentence-content analysis of IC disclosure in the annual report. This research aims to analyze whether the keyword-content analysis is reliable research methodology for IC disclosure related research.

Keywords: intellectual capital, intellectual capital disclosure, content analysis, annual report, sentence analysis, keyword analysis

Procedia PDF Downloads 333

74 Interactive, Topic-Oriented Search Support by a Centroid-Based Text Categorisation

Authors: Mario Kubek, Herwig Unger

Abstract:

Centroid terms are single words that semantically and topically characterise text documents and so may serve as their very compact representation in automatic text processing. In the present paper, centroids are used to measure the relevance of text documents with respect to a given search query. Thus, a new graphbased paradigm for searching texts in large corpora is proposed and evaluated against keyword-based methods. The first, promising experimental results demonstrate the usefulness of the centroid-based search procedure. It is shown that especially the routing of search queries in interactive and decentralised search systems can be greatly improved by applying this approach. A detailed discussion on further fields of its application completes this contribution.

Keywords: search algorithm, centroid, query, keyword, co-occurrence, categorisation

Procedia PDF Downloads 250

73 Analyzing Keyword Networks for the Identification of Correlated Research Topics

Authors: Thiago M. R. Dias, Patrícia M. Dias, Gray F. Moita

Abstract:

The production and publication of scientific works have increased significantly in the last years, being the Internet the main factor of access and distribution of these works. Faced with this, there is a growing interest in understanding how scientific research has evolved, in order to explore this knowledge to encourage research groups to become more productive. Therefore, the objective of this work is to explore repositories containing data from scientific publications and to characterize keyword networks of these publications, in order to identify the most relevant keywords, and to highlight those that have the greatest impact on the network. To do this, each article in the study repository has its keywords extracted and in this way the network is characterized, after which several metrics for social network analysis are applied for the identification of the highlighted keywords.

Keywords: bibliometrics, data analysis, extraction and data integration, scientometrics

Procedia PDF Downloads 214

72 Finding Related Scientific Documents Using Formal Concept Analysis

Authors: Nadeem Akhtar, Hira Javed

Abstract:

An important aspect of research is literature survey. Availability of a large amount of literature across different domains triggers the need for optimized systems which provide relevant literature to researchers. We propose a search system based on keywords for text documents. This experimental approach provides a hierarchical structure to the document corpus. The documents are labelled with keywords using KEA (Keyword Extraction Algorithm) and are automatically organized in a lattice structure using Formal Concept Analysis (FCA). This groups the semantically related documents together. The hierarchical structure, based on keywords gives out only those documents which precisely contain them. This approach open doors for multi-domain research. The documents across multiple domains which are indexed by similar keywords are grouped together. A hierarchical relationship between keywords is obtained. To signify the effectiveness of the approach, we have carried out the experiment and evaluation on Semeval-2010 Dataset. Results depict that the presented method is considerably successful in indexing of scientific papers.

Keywords: formal concept analysis, keyword extraction algorithm, scientific documents, lattice

Procedia PDF Downloads 300

71 Some Tips for Increasing Online Services Safety

Authors: Mohsen Rezaee

Abstract:

Although robust security softwares, including anti-viruses, anti-spywares, anti-spam and firewalls are amalgamated with new technologies such as safe zone, hybrid cloud, sand box and etc., and although it can be said that they have managed to prepare highest level of security against viruses, spywares and other malwares in 2012, in fact, hacker attacks to websites are increasingly becoming more and more complicated. Because of security matters developments it can be said it was expected to happen so. Here in this work we try to point out some functional and vital notes to enhance security on the web, enabling the user to browse safely in unlimited web world and to use virtual space securely.

Keywords: firewalls, security, web services, computer science

Procedia PDF Downloads 363

70 Evidence of a Negativity Bias in the Keywords of Scientific Papers

Authors: Kseniia Zviagintseva, Brett Buttliere

Abstract:

Science is fundamentally a problem-solving enterprise, and scientists pay more attention to the negative things, that cause them dissonance and negative affective state of uncertainty or contradiction. While this is agreed upon by philosophers of science, there are few empirical demonstrations. Here we examine the keywords from those papers published by PLoS in 2014 and show with several sentiment analyzers that negative keywords are studied more than positive keywords. Our dataset is the 927,406 keywords of 32,870 scientific articles in all fields published in 2014 by the journal PLOS ONE (collected from Altmetric.com). Counting how often the 47,415 unique keywords are used, we can examine whether those negative topics are studied more than positive. In order to find the sentiment of the keywords, we utilized two sentiment analysis tools, Hu and Liu (2004) and SentiStrength (2014). The results below are for Hu and Liu as these are the less convincing results. The average keyword was utilized 19.56 times, with half of the keywords being utilized only 1 time and the maximum number of uses being 18,589 times. The keywords identified as negative were utilized 37.39 times, on average, with the positive keywords being utilized 14.72 times and the neutral keywords - 19.29, on average. This difference is only marginally significant, with an F value of 2.82, with a p of .05, but one must keep in mind that more than half of the keywords are utilized only 1 time, artificially increasing the variance and driving the effect size down. To examine more closely, we looked at those top 25 most utilized keywords that have a sentiment. Among the top 25, there are only two positive words, ‘care’ and ‘dynamics’, in position numbers 5 and 13 respectively, with all the rest being identified as negative. ‘Diseases’ is the most studied keyword with 8,790 uses, with ‘cancer’ and ‘infectious’ being the second and fourth most utilized sentiment-laden keywords. The sentiment analysis is not perfect though, as the words ‘diseases’ and ‘disease’ are split by taking 1st and 3rd positions. Combining them, they remain as the most common sentiment-laden keyword, being utilized 13,236 times. More than just splitting the words, the sentiment analyzer logs ‘regression’ and ‘rat’ as negative, and these should probably be considered false positives. Despite these potential problems, the effect is apparent, as even the positive keywords like ‘care’ could or should be considered negative, since this word is most commonly utilized as a part of ‘health care’, ‘critical care’ or ‘quality of care’ and generally associated with how to improve it. All in all, the results suggest that negative concepts are studied more, also providing support for the notion that science is most generally a problem-solving enterprise. The results also provide evidence that negativity and contradiction are related to greater productivity and positive outcomes.

Keywords: bibliometrics, keywords analysis, negativity bias, positive and negative words, scientific papers, scientometrics

Procedia PDF Downloads 157

69 Simple Ways to Enhance the Security of Web Services

Authors: Majid Azarniush, Soroush Mokallaei

Abstract:

Although robust security software, including anti-viruses, anti spy wares, anti-spam and firewalls, are amalgamated with new technologies such as Safe Zone, Hybrid Cloud, Sand Box etc., and it can be said that they have managed to prepare highest level of security against viruses, spy wares and other malwares in 2012, but in fact hackers' attacks to websites are increasingly becoming more and more complicated. Because of security matters and developments, it can be said that it was expected to happen so. Here in this work, we try to point out to some functional and vital notes to enhance security on the web enabling the user to browse safely in no limit web world and to use virtual space securely.

Keywords: firewalls, security, web services, software

Procedia PDF Downloads 453

68 Frontier Dynamic Tracking in the Field of Urban Plant and Habitat Research: Data Visualization and Analysis Based on Journal Literature

Authors: Shao Qi

Abstract:

The article uses the CiteSpace knowledge graph analysis tool to sort and visualize the journal literature on urban plants and habitats in the Web of Science and China National Knowledge Infrastructure databases. Based on a comprehensive interpretation of the visualization results of various data sources and the description of the intrinsic relationship between high-frequency keywords using knowledge mapping, the research hotspots, processes and evolution trends in this field are analyzed. Relevant case studies are also conducted for the hotspot contents to explore the means of landscape intervention and synthesize the understanding of research theories. The results show that (1) from 1999 to 2022, the research direction of urban plants and habitats gradually changed from focusing on plant and animal extinction and biological invasion to the field of human urban habitat creation, ecological restoration, and ecosystem services. (2) The results of keyword emergence and keyword growth trend analysis show that habitat creation research has shown a rapid and stable growth trend since 2017, and ecological restoration has gained long-term sustained attention since 2004. The hotspots of future research on urban plants and habitats in China may focus on habitat creation and ecological restoration.

Keywords: research trends, visual analysis, habitat creation, ecological restoration

Procedia PDF Downloads 37

67 Methodologies for Deriving Semantic Technical Information Using an Unstructured Patent Text Data

Authors: Jaehyung An, Sungjoo Lee

Abstract:

Patent documents constitute an up-to-date and reliable source of knowledge for reflecting technological advance, so patent analysis has been widely used for identification of technological trends and formulation of technology strategies. But, identifying technological information from patent data entails some limitations such as, high cost, complexity, and inconsistency because it rely on the expert’ knowledge. To overcome these limitations, researchers have applied to a quantitative analysis based on the keyword technique. By using this method, you can include a technological implication, particularly patent documents, or extract a keyword that indicates the important contents. However, it only uses the simple-counting method by keyword frequency, so it cannot take into account the sematic relationship with the keywords and sematic information such as, how the technologies are used in their technology area and how the technologies affect the other technologies. To automatically analyze unstructured technological information in patents to extract the semantic information, it should be transformed into an abstracted form that includes the technological key concepts. Specific sentence structure ‘SAO’ (subject, action, object) is newly emerged by representing ‘key concepts’ and can be extracted by NLP (Natural language processor). An SAO structure can be organized in a problem-solution format if the action-object (AO) states that the problem and subject (S) form the solution. In this paper, we propose the new methodology that can extract the SAO structure through technical elements extracting rules. Although sentence structures in the patents text have a unique format, prior studies have depended on general NLP (Natural language processor) applied to the common documents such as newspaper, research paper, and twitter mentions, so it cannot take into account the specific sentence structure types of the patent documents. To overcome this limitation, we identified a unique form of the patent sentences and defined the SAO structures in the patents text data. There are four types of technical elements that consist of technology adoption purpose, application area, tool for technology, and technical components. These four types of sentence structures from patents have their own specific word structure by location or sequence of the part of speech at each sentence. Finally, we developed algorithms for extracting SAOs and this result offer insight for the technology innovation process by providing different perspectives of technology.

Keywords: NLP, patent analysis, SAO, semantic-analysis

Procedia PDF Downloads 239

66 A Corpus-Based Analysis of "MeToo" Discourse in South Korea: Coverage Representation in Korean Newspapers

Authors: Sun-Hee Lee, Amanda Kraley

Abstract:

The “MeToo” movement is a social movement against sexual abuse and harassment. Though the hashtag went viral in 2017 following different cultural flashpoints in different countries, the initial response was quiet in South Korea. This radically changed in January 2018, when a high-ranking senior prosecutor, Seo Ji-hyun, gave a televised interview discussing being sexually assaulted by a colleague. Acknowledging public anger, particularly among women, on the long-existing problems of sexual harassment and abuse, the South Korean media have focused on several high-profile cases. Analyzing the media representation of these cases is a window into the evolving South Korean discourse around “MeToo.” This study presents a linguistic analysis of “MeToo” discourse in South Korea by utilizing a corpus-based approach. The term corpus (pl. corpora) is used to refer to electronic language data, that is, any collection of recorded instances of spoken or written language. A “MeToo” corpus has been collected by extracting newspaper articles containing the keyword “MeToo” from BIGKinds, big data analysis, and service and Nexis Uni, an online academic database search engine, to conduct this language analysis. The corpus analysis explores how Korean media represent accusers and the accused, victims and perpetrators. The extracted data includes 5,885 articles from four broadsheet newspapers (Chosun, JoongAng, Hangyore, and Kyunghyang) and 88 articles from two Korea-based English newspapers (Korea Times and Korea Herald) between January 2017 and November 2020. The information includes basic data analysis with respect to keyword frequency and network analysis and adds refined examinations of select corpus samples through naming strategies, semantic relations, and pragmatic properties. Along with the exponential increase of the number of articles containing the keyword “MeToo” from 104 articles in 2017 to 3,546 articles in 2018, the network and keyword analysis highlights ‘US,’ ‘Harvey Weinstein’, and ‘Hollywood,’ as keywords for 2017, with articles in 2018 highlighting ‘Seo Ji-Hyun, ‘politics,’ ‘President Moon,’ ‘An Ui-Jeong, ‘Lee Yoon-taek’ (the names of perpetrators), and ‘(Korean) society.’ This outcome demonstrates the shift of media focus from international affairs to domestic cases. Another crucial finding is that word ‘defamation’ is widely distributed in the “MeToo” corpus. This relates to the South Korean legal system, in which a person who defames another by publicly alleging information detrimental to their reputation—factual or fabricated—is punishable by law (Article 307 of the Criminal Act of Korea). If the defamation occurs on the internet, it is subject to aggravated punishment under the Act on Promotion of Information and Communications Network Utilization and Information Protection. These laws, in particular, have been used against accusers who have publicly come forward in the wake of “MeToo” in South Korea, adding an extra dimension of risk. This corpus analysis of “MeToo” newspaper articles contributes to the analysis of the media representation of the “MeToo” movement and sheds light on the shifting landscape of gender relations in the public sphere in South Korea.

Keywords: corpus linguistics, MeToo, newspapers, South Korea

Procedia PDF Downloads 180

65 SFO-ECRSEP: Sensor Field Optimızation Based Ecrsep For Heterogeneous WSNS

Authors: Gagandeep Singh

Abstract:

The sensor field optimization is a serious issue in WSNs and has been ignored by many researchers. As in numerous real-time sensing fields the sensor nodes on the corners i.e. on the segment boundaries will become lifeless early because no extraordinary safety is presented for them. Accordingly, in this research work the central objective is on the segment based optimization by separating the sensor field between advance and normal segments. The inspiration at the back this sensor field optimization is to extend the time spam when the first sensor node dies. For the reason that in normal sensor nodes which were exist on the borders may become lifeless early because the space among them and the base station is more so they consume more power so at last will become lifeless soon.

Keywords: WSNs, ECRSEP, SEP, field optimization, energy

Procedia PDF Downloads 264